Oracle 11g get all matched occurrences by a regular expression
This is a little late, but I needed basically the same thing and could not find a good snippet. I needed to search a free text column of a table for some terms and collect them. As this might be useful to another I have included a version based on this question. While REGEXP_SUBSTR only returns one value, Oracle also provides a REGEXP_COUNT to tell you how many matching items are present in a given string, therefore you can join this with a list of indexes to select each as follows (with examples from this query as free text from some 'source_table'):
DEFINE MATCH_EXP = "'(^|\s)[A-Za-z]{2}[0-9]{5,}'"
WITH source_table
-- Represents some DB table with a 'free_text' column to be checked.
AS ( ( SELECT 'Txa233141 Ta233141 Ta232 Ta233142 Ta233141 Ta233148'
AS free_text FROM dual )
UNION ( SELECT 'Other stuff PH33399 mixed in OS4456908843 this line'
AS free_text FROM dual )
)
, source
-- For some table, select rows of free text and add the number of matches
-- in the line.
AS ( SELECT cnt
, free_text
FROM ( SELECT RegExp_Count(free_text, &MATCH_EXP) AS cnt
, free_text
FROM source_table )
WHERE cnt > 0 )
, iota
-- Index generator
AS ( SELECT RowNum AS idx
FROM dual
CONNECT BY RowNum <= ( SELECT Max(cnt) FROM source ) )
-- Extract the unique 'cnt' matches from each line of 'free_text'.
SELECT UNIQUE
RegExp_SubStr(s.free_text, &MATCH_EXP, 1, i.idx) AS result
FROM source s
JOIN iota i
ON ( i.idx <= s.cnt )
ORDER BY result ASC
;
It has the advantages of working for any list of selected rows and uses the CONNECT BY minimally (as this can be very slow).
How about adding a function that will loop through and return all of the values?
create or replace function regexp_substr_mr (
p_data clob,
p_re varchar
)
return varchar as
v_cnt number;
v_results varchar(4000);
begin
v_cnt := regexp_count(p_data, p_re, 1,'m');
if v_cnt < 25 then
for i in 1..v_cnt loop
v_results := v_results || regexp_substr(p_data,p_re,1,i,'m') || chr(13) || chr(10);
end loop;
else
v_results := 'WARNING more than 25 matches found';
end if;
return v_results;
end;
Then just call the function as part of the select query.
REGEXP_SUBSTR only returns one value. You could turn your string into a pseudo-table and then query that for matches. There's an XML-based way of doing this that escapes me at the moment, but using connect-by works, as long as you only have one source string:
SELECT REGEXP_SUBSTR(str, '[^ ]+', 1, LEVEL) AS substr
FROM (
SELECT 'Txa233141b Ta233141 Ta233142 Ta233147 Ta233148' AS str FROM DUAL
)
CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(str, '[^ ]+')) + 1;
... gives you:
SUBSTR
--------------------
Txa233141b
Ta233141
Ta233142
Ta233147
Ta233148
... and you can filter that with a slightly simpler version of your original pattern:
SELECT substr
FROM (
SELECT REGEXP_SUBSTR(str, '[^ ]+', 1, LEVEL) AS substr
FROM (
SELECT 'Txa233141b Ta233141 Ta233142 Ta233147 Ta233148' AS str
FROM DUAL
)
CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(str, '[^ ]+')) + 1
)
WHERE REGEXP_LIKE(substr, '^[A-Za-z]{2}[0-9]{5,}$');
SUBSTR
--------------------
Ta233141
Ta233142
Ta233147
Ta233148
Which isn't very pretty, but neither is holding multiple values in one field.