Mariadb Regexp_substr Function

MariaDB: REGEXP_SUBSTR Function

This MariaDB tutorial explains how to use the MariaDB REGEXP_SUBSTR function with syntax and examples.

Description

The MariaDB REGEXP_SUBSTR function is an extension of the SUBSTR function. This function, introduced in MariaDB 10.0.5, will allow you to extract a substring in a string using regular expression pattern matching.

Syntax

The syntax for the REGEXP_SUBSTR function in MariaDB is:

REGEXP_SUBSTR( string, pattern )

Parameters or Arguments

string

The string to search.

pattern

The regular expression matching information. It can be a combination of the following:

Value Description
^ Matches the beginning of a string. If used with a match_parameter of 'm', it matches the start of a line anywhere within expression.
$ Matches the end of a string. If used with a match_parameter of 'm', it matches the end of a line anywhere within expression.
* Matches zero or more occurrences.
+ Matches one or more occurrences.
? Matches zero or one occurrence.
. Matches any character except NULL.
| Used like an "OR" to specify more than one alternative.
[ ] Used to specify a matching list where you are trying to match any one of the characters in the list.
[^ ] Used to specify a nonmatching list where you are trying to match any character except for the ones in the list.
( ) Used to group expressions as a subexpression.
{m} Matches m times.
{m,} Matches at least m times.
{m,n} Matches at least m times, but no more than n times.
\ n is a number between 1 and 9. Matches the nth subexpression found within ( ) before encountering \ .
[..] Matches one collation element that can be more than one character.
[::] Matches character classes.
[==] Matches equivalence classes.
\\d Matches a digit character.
\\D Matches a nondigit character.
\\w Matches a word character.
\\W Matches a nonword character.
\\s Matches a whitespace character.
\\S matches a non-whitespace character.
*? Matches the preceding pattern zero or more occurrences.
+? Matches the preceding pattern one or more occurrences.
?? Matches the preceding pattern zero or one occurrence.
{n}? Matches the preceding pattern n times.
{n,}? Matches the preceding pattern at least n times.
{n,m}? Matches the preceding pattern at least n times, but not more than m times.

Note

  • The REGEXP_SUBSTR function performs a case-insensitive match, except when used with binary strings.
  • If the REGEXP_SUBSTR function does not find any occurrence of pattern, it will return NULL.
  • This page applies to MariaDB 10.0.5 and higher, which uses the PCRE regex library.
  • See also the SUBSTR function.

Applies To

The REGEXP_SUBSTR function can be used in the following versions of MariaDB:

  • MariaDB 10.0.5+

Example - Match on more than one alternative

Let's start by looking how to use the | pattern with the REGEXP_SUBSTR function in MariaDB.

For example:

SELECT REGEXP_SUBSTR ('Ball Point Pen','P(a|e|i)n');
Output: 'Pen'

SELECT REGEXP_SUBSTR ('Eggs and pancakes','P(a|e|i)n');
Output: 'pan'

These REGEXP_SUBSTR examples will return values such as Pan, Pen, or Pin. The | pattern tells us to look for the letter "a", "e", or "i" between the letters "P" and "n". The REGEXP_SUBSTR function performs a case-insensitive search so it doesn't matter what case the letters are in the string.

Case Sensitive Search

If we wanted to perform a case-sensitive search, we need to modify our REGEXP_SUBSTR function to use a binary string. This can be done one of two ways.

SELECT REGEXP_SUBSTR (BINARY 'Eggs and pancakes', 'P(a|e|i)n');<strong>
Result:</strong> NULL

OR ...

SELECT REGEXP_SUBSTR ('Eggs and pancakes' COLLATE utf8_bin, 'P(a|e|i)n');<strong>
Result:</strong> NULL

In the first example, we used the keyword BINARY to convert our string to a binary string. In the second example, we used COLLATE to convert our string to a binary string.

Since we are performing a case-sensitive search, the pattern does not match on 'pancakes' because it wants a capital 'P', so the REGEXP_SUBSTR function will return NULL.

Table Column Search

Now let's show how to use the REGEXP_SUBSTR function with a table column:

So let's say we have a contact table with the following data:

contact_id last_name
1000 Mark
2000 Smith
3000 Johnson

Now, let's run the following query:

SELECT contact_id, last_name, REGEXP_SUBSTR (last_name, 'a|e|i|o|u') AS "First Vowel"
FROM contacts;

These are the results that would be returned by the query:

contact_id last_name First vowel
1000 Mark A
2000 Smith i
3000 Johnson o

Example - Match on Words

Next, let's look at how to use the REGEXP_SUBSTR function to extract the first word from a string.

For example:

SELECT REGEXP_SUBSTR ('AODBA is a great resource', '(\\\\S*)');
Output: 'AODBA'

This example will return 'AODBA' because it will extract all non-whitespace characters as specified by (\\\\S*) and stop when it hits the first whitespace character (ie: space character). That will result in the word 'AODBA' being returned.

If you wanted to return the first two words from a string, we could modify the example as follows:

SELECT REGEXP_SUBSTR ('AODBA is a great resource', '(\\\\S*)(\\\\s)(\\\\S*)');
Output: 'AODBA is'

This example would extract the first two words from the string. The (\\\\S*) matches on the first word in the string, the (\\\\s) matches on the first non-whitespace character, and the (\\\\S*) matches on the second word in the string.

Example - Match on Digit Characters

Now, let's look next at how we would use the REGEXP_SUBSTR function to match on a single digit character pattern.

For example:

SELECT REGEXP_SUBSTR ('7, 8, and 15 are numbers in this example', '\\\\d');
Output: 7

This example will extract the first numeric digit from the string as specified by \\\\d. In this case, it will match on the number 7.

We could change our pattern to search for a two-digit number.

For example:

SELECT REGEXP_SUBSTR ('7, 8, and 15 are numbers in this example', '(\\\\d)(\\\\d)');
Output: 15

This example will extract a number that has two digits side-by-side as specified by (\\\\d)(\\\\d). In this case, it will skip over the 7 and 8 numeric values and return 15.

Now, let's look how we would use the REGEXP_SUBSTR function with a table column and search for a two digit number.

For example:

SELECT REGEXP_SUBSTR (address, '(\\\\d)(\\\\d)')
FROM contacts;

In this example, we are going to extract the first two-digit value from the address field in the contacts table.