Mariadb Regexp_instr Function

MariaDB: REGEXP_INSTR Function

This MariaDB tutorial explains how to use the MariaDB REGEXP_INSTR function with syntax and examples.

Description

The MariaDB REGEXP_INSTR function is an extension of the INSTR function. It returns the location of a regular expression pattern in a string. This function, introduced in MariaDB 10.0.5, will allow you to find a substring in a string using regular expression pattern matching.

Syntax

The syntax for the REGEXP_INSTR function in MariaDB is:

REGEXP_INSTR( string, pattern )

Parameters or Arguments

string

The string to search.

pattern

The regular expression matching information. It can be a combination of the following:

Value Description
^ Matches the beginning of a string. If used with a match_parameter of 'm', it matches the start of a line anywhere within expression.
$ Matches the end of a string. If used with a match_parameter of 'm', it matches the end of a line anywhere within expression.
* Matches zero or more occurrences.
+ Matches one or more occurrences.
? Matches zero or one occurrence.
. Matches any character except NULL.
| Used like an "OR" to specify more than one alternative.
[ ] Used to specify a matching list where you are trying to match any one of the characters in the list.
[^ ] Used to specify a nonmatching list where you are trying to match any character except for the ones in the list.
( ) Used to group expressions as a subexpression.
{m} Matches m times.
{m,} Matches at least m times.
{m,n} Matches at least m times, but no more than n times.
\ n is a number between 1 and 9. Matches the nth subexpression found within ( ) before encountering \ .
[..] Matches one collation element that can be more than one character.
[::] Matches character classes.
[==] Matches equivalence classes.
\\d Matches a digit character.
\\D Matches a nondigit character.
\\w Matches a word character.
\\W Matches a nonword character.
\\s Matches a whitespace character.
\\S matches a non-whitespace character.
*? Matches the preceding pattern zero or more occurrences.
+? Matches the preceding pattern one or more occurrences.
?? Matches the preceding pattern zero or one occurrence.
{n}? Matches the preceding pattern n times.
{n,}? Matches the preceding pattern at least n times.
{n,m}? Matches the preceding pattern at least n times, but not more than m times.

Note

  • The REGEXP_INSTR function performs a case-insensitive match, except when used with binary strings.
  • If the REGEXP_INSTR function does not find any occurrence of pattern, it will return NULL.
  • This page applies to MariaDB 10.0.5 and higher, which uses the PCRE regex library.
  • See also the INSTR function.

Applies To

The REGEXP_INSTR function can be used in the following versions of MariaDB:

  • MariaDB 10.0.5+

Example - Match on Single Character

Let's start by looking at the simplest case. Let's find the position of the first 'O' character in a string using the REGEXP_INSTR function.

For example:

SELECT REGEXP_INSTR ('AODBA', 'O');
Output: 1

This example will return 1 because it is performing a case-insensitive search of 'O'. Therefore, it matches on the 'O' character in the first position.

Case Sensitive Search

If we wanted to perform a case-sensitive search for only a lowercase 'O', we need to modify our REGEXP_INSTR function to use a binary string. This can be done one of two ways.

SELECT REGEXP_INSTR (BINARY 'AODBA', 'O');Output: 12

OR ...

SELECT REGEXP_INSTR ('AODBA' COLLATE utf8_bin, 'O');Output: 12

In the first example, we used the keyword BINARY to convert our string to a binary string. In the second example, we used COLLATE to convert our string to a binary string.

Now, the REGEXP_INSTR function will return 12 because it will perform a case-sensitive search and look for only 'O' values (and exclude 'O' values).

We could also have performed a case-sensitive search using pattern matching as follows:

SELECT REGEXP_INSTR ('AODBA', '(?-i)O');Output: 12

This example would also match on the first 'O' character in the string, and ignore 'O' values.

Example - Match on Multiple Characters

Let's look next at how we would use the REGEXP_INSTR function to match on a multi-character pattern.

For example:

SELECT REGEXP_INSTR ('Example shows how to use REGEXP_INSTR', 'how');
Output: 10

This example will return the first occurrence of 'ow' in the string. It will match on the 'how' in the word 'shows'.

Now, let's look how we would use the REGEXP_INSTR function with a table column and search for multiple characters.

For example:

SELECT REGEXP_INSTR (other_comments, 'the')
FROM contacts;

In this example, we are going to search for the pattern 'the' in the other_comments field in the contacts table.

Example - Match on more than one alternative

The next example that we will look at involves using the | pattern. The | pattern is used like an "OR" to specify more than one alternative.

For example:

SELECT REGEXP_INSTR ('Mark', 'a|e|i|o|u');
Output: 1

This example will return 1 because it is searching for the first vowel (a, e, i, o, or u) in the string. The REGEXP_INSTR function will perform a case-insensitive search which means that the 'A' in 'Mark' will be matched as an 'a'.

Now, let's quickly show how you would use this function with a column.

So let's say we have a contact table with the following data:

contact_id last_name
1000 Mark
2000 Smith
3000 Johnson

Now, let's run the following query:

SELECT contact_id, last_name, REGEXP_INSTR (last_name, 'a|e|i|o|u') AS first_occurrence
FROM contacts;

These are the results that would be returned by the query:

contact_id last_name first_occurrence
1000 Mark 1
2000 Smith 3
3000 Johnson 2