This MariaDB tutorial explains how to use the MariaDB REGEXP_SUBSTR function with syntax and examples.
The MariaDB REGEXP_SUBSTR function is an extension of the SUBSTR function. This function, introduced in MariaDB 10.0.5, will allow you to extract a substring in a string using regular expression pattern matching.
The syntax for the REGEXP_SUBSTR function in MariaDB is:
The string to search.
The regular expression matching information. It can be a combination of the following:
Value | Description |
---|---|
^ | Matches the beginning of a string. If used with a match_parameter of 'm', it matches the start of a line anywhere within expression. |
$ | Matches the end of a string. If used with a match_parameter of 'm', it matches the end of a line anywhere within expression. |
* | Matches zero or more occurrences. |
+ | Matches one or more occurrences. |
? | Matches zero or one occurrence. |
. | Matches any character except NULL. |
| | Used like an "OR" to specify more than one alternative. |
[ ] | Used to specify a matching list where you are trying to match any one of the characters in the list. |
[^ ] | Used to specify a nonmatching list where you are trying to match any character except for the ones in the list. |
( ) | Used to group expressions as a subexpression. |
{m} | Matches m times. |
{m,} | Matches at least m times. |
{m,n} | Matches at least m times, but no more than n times. |
\ | n is a number between 1 and 9. Matches the nth subexpression found within ( ) before encountering \ . |
[..] | Matches one collation element that can be more than one character. |
[::] | Matches character classes. |
[==] | Matches equivalence classes. |
\\d | Matches a digit character. |
\\D | Matches a nondigit character. |
\\w | Matches a word character. |
\\W | Matches a nonword character. |
\\s | Matches a whitespace character. |
\\S | matches a non-whitespace character. |
*? | Matches the preceding pattern zero or more occurrences. |
+? | Matches the preceding pattern one or more occurrences. |
?? | Matches the preceding pattern zero or one occurrence. |
{n}? | Matches the preceding pattern n times. |
{n,}? | Matches the preceding pattern at least n times. |
{n,m}? | Matches the preceding pattern at least n times, but not more than m times. |
The REGEXP_SUBSTR function can be used in the following versions of MariaDB:
Let's start by looking how to use the | pattern with the REGEXP_SUBSTR function in MariaDB.
For example:
These REGEXP_SUBSTR examples will return values such as Pan, Pen, or Pin. The | pattern tells us to look for the letter "a", "e", or "i" between the letters "P" and "n". The REGEXP_SUBSTR function performs a case-insensitive search so it doesn't matter what case the letters are in the string.
If we wanted to perform a case-sensitive search, we need to modify our REGEXP_SUBSTR function to use a binary string. This can be done one of two ways.
OR ...
In the first example, we used the keyword BINARY to convert our string to a binary string. In the second example, we used COLLATE to convert our string to a binary string.
Since we are performing a case-sensitive search, the pattern does not match on 'pancakes' because it wants a capital 'P', so the REGEXP_SUBSTR function will return NULL.
Now let's show how to use the REGEXP_SUBSTR function with a table column:
So let's say we have a contact table with the following data:
contact_id | last_name |
---|---|
1000 | Mark |
2000 | Smith |
3000 | Johnson |
Now, let's run the following query:
These are the results that would be returned by the query:
contact_id | last_name | First vowel |
---|---|---|
1000 | Mark | A |
2000 | Smith | i |
3000 | Johnson | o |
Next, let's look at how to use the REGEXP_SUBSTR function to extract the first word from a string.
For example:
This example will return 'AODBA' because it will extract all non-whitespace characters as specified by (\\\\S*) and stop when it hits the first whitespace character (ie: space character). That will result in the word 'AODBA' being returned.
If you wanted to return the first two words from a string, we could modify the example as follows:
This example would extract the first two words from the string. The (\\\\S*) matches on the first word in the string, the (\\\\s) matches on the first non-whitespace character, and the (\\\\S*) matches on the second word in the string.
Now, let's look next at how we would use the REGEXP_SUBSTR function to match on a single digit character pattern.
For example:
This example will extract the first numeric digit from the string as specified by \\\\d. In this case, it will match on the number 7.
We could change our pattern to search for a two-digit number.
For example:
This example will extract a number that has two digits side-by-side as specified by (\\\\d)(\\\\d). In this case, it will skip over the 7 and 8 numeric values and return 15.
Now, let's look how we would use the REGEXP_SUBSTR function with a table column and search for a two digit number.
For example:
In this example, we are going to extract the first two-digit value from the address field in the contacts table.