How to get the name of the matching group of a regular expression in Python?
First of all, your regular expression is syntactically wrong: you should write it as r'(?P<name>\w+)|(?P<number>\d+)'. Moreover, even this regular expression does not work as intended, since the special sequence \w matches all alphanumeric characters and hence also every character matched by \d.
You should change it to r'(?P<number>\d+)|(?P<name>\w+)' to give \d precedence over \w.
You can then get the name of the matching group from the lastgroup attribute of the match objects, i.e.:
[m.lastgroup for m in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'Ala ma 123 kota')]
producing:
['name', 'name', 'number', 'name']
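As a minimal sketch (Python 3 assumed; the helper name classify is made up for illustration), the same idea can pair each token with the name of the group that matched it:

```python
import re

# The more specific alternative (\d+) comes first so it wins over \w+.
TOKEN_RE = re.compile(r'(?P<number>\d+)|(?P<name>\w+)')

def classify(text):
    """Return (group_name, matched_text) pairs for every token in text."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

print(classify('Ala ma 123 kota'))
# [('name', 'Ala'), ('name', 'ma'), ('number', '123'), ('name', 'kota')]
```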
You can get this information from the compiled expression:
>>> pattern = re.compile(r'(?P<name>\w+)|(?P<number>\d+)')
>>> pattern.groupindex
{'name': 1, 'number': 2}
This uses the RegexObject.groupindex attribute:
A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers. The dictionary is empty if no symbolic groups were used in the pattern.
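Since groupindex maps names to numbers, it can be inverted to go from the number of the group that matched back to its name. This is only a sketch, assuming Python 3; the variable names index_to_name and matched_name are illustrative:

```python
import re

pattern = re.compile(r'(?P<number>\d+)|(?P<name>\w+)')

# groupindex maps name -> number; invert it to look names up by number.
index_to_name = {index: name for name, index in pattern.groupindex.items()}

match = pattern.search('kota 7')
# lastindex is the number of the last capturing group that matched.
matched_name = index_to_name[match.lastindex]
print(matched_name)  # name
```

In practice match.lastgroup gives the same answer directly; the inverted dictionary is mainly useful when you already hold a group number.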
If you only have access to the match object, you can get to the pattern with the MatchObject.re attribute:
>>> a = list(re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'Ala ma kota'))
>>> a[0]
<re.Match object; span=(0, 3), match='Ala'>
>>> a[0].re.groupindex
{'name': 1, 'number': 2}
If all you want to know is which group matched, look at the values; None means the group did not take part in the match:
>>> a[0].groupdict()
{'name': 'Ala', 'number': None}
The number group was never used to match anything, because its value is None.
You can then find the names of the groups that took part in the match with:
names_used = [name for name, value in matchobj.groupdict().items() if value is not None]
Or, if only one group can ever match, you can use MatchObject.lastgroup:
name_used = matchobj.lastgroup
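To see how the two approaches differ, here is a small sketch assuming Python 3. The pattern with the key and value groups and the wrapper function names_used are invented for illustration; they are not from the question:

```python
import re

def names_used(matchobj):
    """Names of all named groups that took part in the match."""
    return [name for name, value in matchobj.groupdict().items()
            if value is not None]

# Hypothetical pattern in which more than one named group can match at once.
pair_re = re.compile(r'(?P<key>\w+)=(?P<value>\d+)?')

m = pair_re.match('answer=')
print(names_used(m))  # ['key']
print(m.lastgroup)    # key

m2 = pair_re.match('answer=42')
print(sorted(names_used(m2)))  # ['key', 'value']
print(m2.lastgroup)            # value
```

When several groups can match together, lastgroup reports only the last one, so the groupdict() approach is the safer choice there.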
As a side note, your regular expression has a fatal flaw: everything that \d matches is also matched by \w. You'll never see number used where name can match first. Reverse the order of the alternatives to avoid this:
>>> for match in re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'word 42'):
...     print(match.lastgroup)
...
name
name
>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word 42'):
...     print(match.lastgroup)
...
name
number
Note, however, that words starting with digits are still split into a number and a name in this simple case:
>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word42 42word'):
...     print(match.lastgroup, repr(match.group(0)))
...
name 'word42'
number '42'
name 'word'
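One way to sidestep this is to require word boundaries (\b) around the digits, so that digits only count as a number when they form a whole word. This is only a sketch, assuming Python 3 and assuming that a token such as 42word should count as a single name:

```python
import re

# \b on both sides: \d+ only matches when the digits are a complete word.
pattern = re.compile(r'(?P<number>\b\d+\b)|(?P<name>\w+)')

for match in pattern.finditer('word42 42word 42'):
    print(match.lastgroup, repr(match.group(0)))
# name 'word42'
# name '42word'
# number '42'
```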