connect by clause in regex_substr
The "abuse" (as Colin 't Hart put it) of connected by
has a good purpose here:
by using REGEXP_SUBSTR
you can extract only one of the 4 matches (23,34,45,56): the regex [^,]+
matches any character sequence in the string which does not contain a comma.
If you'll try running:
SELECT REGEXP_SUBSTR ('23,34,45,56','[^,]+') as "token"
FROM DUAL
you'll get 23
.
and if you'll try running:
SELECT REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,1) as "token"
FROM DUAL
you'll also get 23
only that now we also set two additional parameters: start looking in position 1 (which is the default), and return the 1st occurrence.
Now lets run:
SELECT REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,2) as "token"
FROM DUAL
this time we'll get 34
(2nd occurrence) and using 3
as the last parameter will return 45
and so on.
The use of recursive connected by
along with level
makes sure you'll receive all the relevant results (not necessarily in the original order though!):
SELECT DISTINCT REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,LEVEL) as "token"
FROM DUAL
CONNECT BY REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,LEVEL) IS NOT NULL
order by 1
will return:
TOKEN
23
34
45
56
which not only contains all 4 results, but also breaks it into separate rows in the resultset!
If you'll fiddle with it - it might give you a clearer view of the subject.
connect by
has nothing to do with regex_substr
:
The first is to perform a hierarchical query, see http://docs.oracle.com/cd/B19306_01/server.102/b14200/queries003.htm
The second is to get a substring using regular expressions.
This query "abuses" the connect by
functionality to generate rows in a query on dual
.
As long as the expression passed to connect by
is true, it will generate a new row and increase the value of the pseudo column LEVEL
.
Then LEVEL
is passed to regex_substr
to get the nth value when applying the regular expression.