This Oracle tutorial explains how to use the Oracle/PLSQL REGEXP_COUNT function with syntax and examples.
The Oracle/PLSQL REGEXP_COUNT function counts the number of times that a pattern occurs in a string. Introduced in Oracle 11g, it uses regular expression pattern matching, so you can count occurrences of a pattern rather than only a fixed substring.
The syntax for the REGEXP_COUNT function in Oracle is:
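As a sketch, the general form is shown below as a comment, followed by a minimal call. The parameter names (string, pattern, start_position, match_parameter) are the ones used elsewhere in this tutorial, and the sample input 'abcabc' is purely illustrative.

```sql
-- General form (square brackets mark optional arguments):
--   REGEXP_COUNT( string, pattern [, start_position [, match_parameter ] ] )

-- Minimal call using only the two required arguments.
-- The input 'abcabc' is a made-up value for illustration.
SELECT REGEXP_COUNT('abcabc', 'b') AS total
FROM dual;
-- total: 2
```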
string: The string to search. It can be CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.
pattern: The regular expression matching information. It can be a combination of the following:
Value | Description |
---|---|
^ | Matches the beginning of a string. If used with a match_parameter of 'm', it matches the start of a line anywhere within string. |
$ | Matches the end of a string. If used with a match_parameter of 'm', it matches the end of a line anywhere within string. |
* | Matches zero or more occurrences. |
+ | Matches one or more occurrences. |
? | Matches zero or one occurrence. |
. | Matches any character except NULL. |
\| | Used like an "OR" to specify more than one alternative. |
[ ] | Used to specify a matching list where you are trying to match any one of the characters in the list. |
[^ ] | Used to specify a nonmatching list where you are trying to match any character except for the ones in the list. |
( ) | Used to group expressions as a subexpression. |
{m} | Matches exactly m times. |
{m,} | Matches at least m times. |
{m,n} | Matches at least m times, but no more than n times. |
\n | n is a number between 1 and 9. Matches the nth subexpression found within ( ) before encountering \n. |
[..] | Matches one collation element that can be more than one character. |
[::] | Matches character classes. |
[==] | Matches equivalence classes. |
\d | Matches a digit character. |
\D | Matches a nondigit character. |
\w | Matches a word character. |
\W | Matches a nonword character. |
\s | Matches a whitespace character. |
\S | Matches a non-whitespace character. |
\A | Matches only at the beginning of a string. |
\Z | Matches at the end of a string, or before a newline character at the end of a string. |
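*? | Matches the preceding pattern zero or more times, as few times as possible (non-greedy). |
+? | Matches the preceding pattern one or more times, as few times as possible (non-greedy). |
?? | Matches the preceding pattern zero or one time, preferring zero (non-greedy). |
{n}? | Matches the preceding pattern exactly n times (non-greedy). |
{n,}? | Matches the preceding pattern at least n times, as few times as possible (non-greedy). |
{n,m}? | Matches the preceding pattern at least n times, but not more than m times, as few times as possible (non-greedy). |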
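To illustrate how the quantifiers in the table affect the count, here is a minimal sketch comparing + with its non-greedy form +?. The input 'aaaa' is a made-up value; REGEXP_COUNT counts non-overlapping matches from left to right.

```sql
-- Greedy: 'a+' consumes the whole run of a's, so there is a single match.
SELECT REGEXP_COUNT('aaaa', 'a+') AS greedy_total
FROM dual;
-- greedy_total: 1

-- Non-greedy: 'a+?' stops after one character, so each 'a' is its own match.
SELECT REGEXP_COUNT('aaaa', 'a+?') AS nongreedy_total
FROM dual;
-- nongreedy_total: 4
```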
start_position: Optional. It is the position in string where the search will start. If omitted, it defaults to 1, which is the first position in the string.
match_parameter: Optional. It allows you to modify the matching behavior for the REGEXP_COUNT function. It can be a combination of the following:
Value | Description |
---|---|
'c' | Perform case-sensitive matching. |
'i' | Perform case-insensitive matching. |
'n' | Allows the period character (.) to match the newline character. By default, the period does not match the newline character. |
'm' | string is assumed to have multiple lines, where ^ is the start of a line and $ is the end of a line, regardless of the position of those characters in string. By default, string is assumed to be a single line. |
'x' | Whitespace characters in the pattern are ignored. By default, whitespace characters are matched like any other character. |
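As a small sketch of how a match_parameter changes the result, the query below runs the same pattern with and without 'm'. The two-line input, built with CHR(10) as the newline, is a made-up value for illustration.

```sql
-- Without 'm', ^ anchors only at the start of the whole string.
-- With 'm', ^ also anchors at the start of each line.
SELECT REGEXP_COUNT('cat' || CHR(10) || 'cat', '^cat')         AS single_line,
       REGEXP_COUNT('cat' || CHR(10) || 'cat', '^cat', 1, 'm') AS multi_line
FROM dual;
-- single_line: 1, multi_line: 2
```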
The REGEXP_COUNT function returns a numeric value: the number of times the pattern occurs in the string. If no match is found, it returns 0.
The REGEXP_COUNT function is available in Oracle 11g and later versions of Oracle/PLSQL.
Let's start by looking at the simplest case. Let's count the number of times the character 't' appears in a string.
For example:
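A query of this kind might look like the following. The sample string is an assumption chosen for illustration; it contains two lowercase 't' characters and two uppercase 'T' characters.

```sql
-- Count lowercase 't' only (case-sensitive under default settings).
-- The input string is an illustrative assumption.
SELECT REGEXP_COUNT('TechOnTheNet is a great resource', 't') AS total
FROM dual;
-- total: 2
```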
This example will return 2 because it counts the occurrences of 't' in the string. Since we did not specify a match_parameter value, the REGEXP_COUNT function falls back to its default case sensitivity, which is determined by the NLS_SORT setting and is case-sensitive under the default configuration. The 'T' characters are therefore not included in the count.
If we wanted to include both 't' and 'T' in our results and perform a case-insensitive search, we could modify our query as follows:
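A sketch of the modified query, assuming the same illustrative sample string with two lowercase 't' and two uppercase 'T' characters:

```sql
-- start_position = 1 searches from the first character;
-- match_parameter = 'i' makes the match case-insensitive.
-- The input string is an illustrative assumption.
SELECT REGEXP_COUNT('TechOnTheNet is a great resource', 't', 1, 'i') AS total
FROM dual;
-- total: 4
```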
Now because we have provided a start_position of 1 and a match_parameter of 'i', the query will return 4 as the result. This time, both 't' and 'T' values are included in the count.
If we wanted to count the number of 't' in a column, we could try something like this:
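One way to write this, assuming a contacts table with a last_name column as named in the text (the alias total is a hypothetical choice):

```sql
-- Returns one count per row: the number of 't' or 'T' characters
-- in each last_name value.
SELECT REGEXP_COUNT(last_name, 't', 1, 'i') AS total
FROM contacts;
```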
This would count the number of 't' or 'T' values in the last_name field from the contacts table.
Let's look next at how we would use the REGEXP_COUNT function to match on a multi-character pattern.
For example:
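A sketch of such a query follows. The sample sentence is an assumption chosen for illustration; it contains 'The' at the start and 'the' later on.

```sql
-- Case-insensitive count of the multi-character pattern 'the'.
-- The input string is an illustrative assumption.
SELECT REGEXP_COUNT('The example shows how to use the REGEXP_COUNT function', 'the', 1, 'i') AS total
FROM dual;
-- total: 2
```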
This example will return the number of times that the word 'the' appears in the string. It will perform a case-insensitive search so it will return 2.
For example:
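The same search with a later starting point might look like this, again assuming the illustrative sample sentence that begins with 'The':

```sql
-- start_position = 4 skips the first three characters ('The'),
-- so only the later 'the' is counted.
-- The input string is an illustrative assumption.
SELECT REGEXP_COUNT('The example shows how to use the REGEXP_COUNT function', 'the', 4, 'i') AS total
FROM dual;
-- total: 1
```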
This example will return the number of times that the word 'the' appears in the string starting from position 4. In this case, it will return 1 because it will skip over the first 3 characters in the string before searching for the pattern.
Now, let's look at how we would use the REGEXP_COUNT function with a table column and search for multiple characters.
For example:
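A minimal sketch, assuming the contacts table has an other_comments column as named in the text, and assuming a case-insensitive search as in the previous examples (the alias total is hypothetical):

```sql
-- Returns one count per row: occurrences of 'the' (any case)
-- in each other_comments value.
SELECT REGEXP_COUNT(other_comments, 'the', 1, 'i') AS total
FROM contacts;
```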
In this example, we count the number of occurrences of 'the' in the other_comments field in the contacts table.
The next example that we will look at involves using the | pattern. The | pattern is used like an "OR" to specify more than one alternative.
For example:
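A sketch of a vowel count using alternation, with 'James' as the input string:

```sql
-- Each alternative is a single lowercase vowel;
-- 'James' contains 'a' and 'e'.
SELECT REGEXP_COUNT('James', 'a|e|i|o|u') AS total
FROM dual;
-- total: 2
```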
This example will return 2 because it counts the vowels (a, e, i, o, or u) in the string 'James'. Since we did not specify a match_parameter value, the REGEXP_COUNT function will perform a case-sensitive search, which means that an uppercase vowel such as 'A' would not be counted.
We could modify our query to perform a case-insensitive search as follows:
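A sketch of the case-insensitive variant, again with 'James' as the input string:

```sql
-- 'i' makes the alternation match vowels in either case,
-- so 'A', 'E', 'I', 'O', and 'U' are counted as well.
SELECT REGEXP_COUNT('James', 'a|e|i|o|u', 1, 'i') AS total
FROM dual;
```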
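Now because we have provided a start_position of 1 and a match_parameter of 'i', the search is case-insensitive and uppercase vowels are counted as well. For 'James' the query still returns 2, because the string contains no uppercase vowels. For a string such as 'Anderson', the count would rise from 2 to 3 because the 'A' would now be included.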
Now, let's quickly show how you would use this function with a column.
So let's say we have a contacts table with the following data:
contact_id | last_name |
---|---|
1000 | James |
2000 | Smith |
3000 | Johnson |
Now, let's run the following query:
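A query along these lines would fit the data shown. It assumes the table is named contacts, as in the earlier examples, and that the vowel pattern from the previous example is reused with a case-insensitive search.

```sql
-- One row per contact, with the number of vowels (any case) in last_name.
SELECT contact_id,
       last_name,
       REGEXP_COUNT(last_name, 'a|e|i|o|u', 1, 'i') AS total
FROM contacts;
```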
These are the results that would be returned by the query:
contact_id | last_name | total |
---|---|---|
1000 | James | 2 |
2000 | Smith | 1 |
3000 | Johnson | 2 |