Python regex: matching a parenthesis within parentheses
First of all, using \( isn't enough to match a parenthesis. Python normally reacts to some escape sequences in its strings, which is why it interprets \( as simple (. You would either have to write \\( or use a raw string, e.g. r'\(' or r"\(".
Second, when you use re.match, you are anchoring the regex search to the start of the string. If you want to look for the pattern anywhere in the string, use re.search.
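To make the difference concrete, here is a small sketch. The sample text and pattern are made up for illustration and are not taken from the question.

```python
import re

# Hypothetical sample input, shortened from the one in the question.
text = "TEMPLATES = ( ('index.html', 'home'), )"
pattern = re.compile(r"\('index\.html'")

# match() only succeeds if the pattern matches at position 0.
at_start = pattern.match(text)
# search() scans forward and returns the first match anywhere.
anywhere = pattern.search(text)

print(at_start)           # None
print(anywhere.group(0))  # ('index.html'
print(anywhere.start())   # 14
```

Because the text starts with TEMPLATES rather than a parenthesis, match() returns None while search() finds the pattern further along.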
Like Joseph said in his answer, it's not exactly clear what you want to find. For example:
string = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"
print(re.findall(r'\([^()]*\)', string))
will print
["('index.html', 'home')", "('base.html', 'base')"]
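EDIT:
I stand corrected, @phooji is right: escaping is irrelevant in this specific case. Python leaves escape sequences it doesn't recognize, such as \(, unchanged, so '\(' and r'\(' are the same two-character string. A raw string is still the safer habit, since recent Python versions warn about unrecognized escapes. But re.match vs. re.search or re.findall is still important.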
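If you want to check for yourself what a given literal contains, a quick sketch like this may help. It compares the doubled-backslash spelling with the raw-string spelling, and the variable names are only illustrative.

```python
import re

escaped = "\\("   # ordinary string: backslash written twice
raw = r"\("       # raw string: backslash written once

# Both spellings produce the same two characters: a backslash and "(".
print(escaped == raw)  # True
print(len(raw))        # 2

# The difference shows up for escapes Python does recognize.
print(len("\n"), len(r"\n"))  # 1 2

# Either spelling works as a regex for a literal "(".
print(re.search(raw, "f(x)").start())  # 1
```

Raw strings mostly pay off once a pattern contains something like \b or \n, where Python's own escape processing would otherwise change the pattern before the regex engine sees it.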
Try this:
import re
w = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"
# find outer parens (raw strings keep the backslashes intact)
outer = re.compile(r"\((.+)\)")
m = outer.search(w)
inner_str = m.group(1)
# find inner pairs
innerre = re.compile(r"\('([^']+)', '([^']+)'\)")
results = innerre.findall(inner_str)
for x, y in results:
    print("%s <-> %s" % (x, y))
Output:
index.html <-> home
base.html <-> base
Explanation:
outer matches the first-starting group of parentheses using \( and \). search returns the leftmost match, and because .+ is greedy it extends as far as the last ) in the string, giving us the outermost ( ) pair. m.group(1) contains exactly what's between those outer parentheses; its content corresponds to the .+ bit of outer.
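The effect of greediness is easiest to see side by side. This is only a sketch: the lazy variant with .+? is not part of the answer's code and is added purely for comparison.

```python
import re

w = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"

# Greedy: .+ runs to the end, then backtracks to the LAST ")".
greedy = re.search(r"\((.+)\)", w)
# Lazy: .+? stops at the FIRST ")" that lets the pattern succeed.
lazy = re.search(r"\((.+?)\)", w)

print(repr(greedy.group(1)))
# " ('index.html', 'home'), ('base.html', 'base')"
print(repr(lazy.group(1)))
# " ('index.html', 'home'"
```

Both searches start at the same leftmost opening parenthesis; only the quantifier decides where the match ends.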
innerre matches exactly one of your ('a', 'b') pairs, again using \( and \) to match the content parens in your input string, and using two groups inside the ' ' to match the strings inside of those single quotes.
Then, we use findall (rather than search or match) to get all matches for innerre (rather than just one). At this point results is a list of pairs, as demonstrated by the print loop.
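What findall returns depends on how many capturing groups the pattern has, which is why results comes back as pairs. A minimal sketch, with the input shortened and the one-group pattern invented for illustration:

```python
import re

s = "('index.html', 'home'), ('base.html', 'base')"

# No groups: a list of the full matched substrings.
no_groups = re.findall(r"\([^()]*\)", s)
# One group: a list of that group's text.
one_group = re.findall(r"\('([^']+)',", s)
# Two groups: a list of tuples, one item per group.
two_groups = re.findall(r"\('([^']+)', '([^']+)'\)", s)

print(no_groups)   # ["('index.html', 'home')", "('base.html', 'base')"]
print(one_group)   # ['index.html', 'base.html']
print(two_groups)  # [('index.html', 'home'), ('base.html', 'base')]

# A list of pairs converts directly to a dict if that is more convenient.
print(dict(two_groups))  # {'index.html': 'home', 'base.html': 'base'}
```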
Update: To match the whole thing, you could try something like this:
rx = re.compile(r"^TEMPLATES = \(.+\)")
rx.match(w)
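Note that match returns either a match object or None, so the result should be checked before it is used. The sketch below is one possible variation, not the code from the answer: it adds a capturing group and uses fullmatch, on the assumption that you want the entire string to fit the pattern.

```python
import re

w = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"

# Assumption: the group around .+ is added here to pull out the contents.
rx = re.compile(r"TEMPLATES = \((.+)\)")

# fullmatch() requires the pattern to cover the whole string.
m = rx.fullmatch(w)
if m is None:
    print("no match")
else:
    print(m.group(1))

# With trailing text, fullmatch() fails while match() still succeeds.
longer = w + " # trailing"
print(rx.fullmatch(longer))           # None
print(rx.match(longer) is not None)   # True
```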