The groups() method in regular expressions in Python
This is the most specific regexp I came up with. Through its named groups you can see the protocol, the FQDN, and the filename (I forgot to add a group for the file extension).
["](?P<protocol>http(?P<secure>s)?://)(?P<fqdn>[a-zA-Z0-9]*(?P<subdomain>([.])[a-zA-Z0-9]*)*)[/](?P<filename>([a-zA-Z.])*)["]
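As a rough illustration of reading those groups by name, the sketch below runs the pattern against a made-up anchor tag. The sample URL and the names URL_RE, sample, and parts are hypothetical, and the dot in the subdomain group is written as [.] on the assumption that a literal dot was intended.

```python
import re

# Pattern from the answer above; the subdomain dot is written as [.]
# (assumption: a literal dot was intended, not "any character").
URL_RE = re.compile(
    r'["](?P<protocol>http(?P<secure>s)?://)'
    r'(?P<fqdn>[a-zA-Z0-9]*(?P<subdomain>([.])[a-zA-Z0-9]*)*)'
    r'[/](?P<filename>([a-zA-Z.])*)["]'
)

# Hypothetical input, made up for this example.
sample = '<a href="https://www.example.com/index.html">'

m = URL_RE.search(sample)
parts = m.groupdict()  # maps each group name to its captured text
for name, value in parts.items():
    print(f"{name}: {value!r}")
```

Note that subdomain is a repeated group, so only its last repetition ('.com' for this input) is kept, while fqdn wraps the whole repetition and therefore holds the full host name.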
From the docs:
If a group matches multiple times, only the last match is accessible:
>>> m = re.match(r"(..)+", "a1b2c3") # Matches 3 times.
>>> m.group(1) # Returns only the last match.
'c3'
Your group can only ever match one character, so c is the last match.
You mention that you'd expect to at least see 'abc'. If you want your group to match multiple characters, put the + inside the group:
>>> m = re.match("([abc]+)", "abc")
For details on re, consult the docs. In your case:
group(0) stands for the entire matched string, hence abc. That string is made up of three successive matches of the same group: a, b and c.
group(i) stands for the i-th group and, citing the documentation:
If a group matches multiple times, only the last match is accessible
hence group(1) stands for the last match, c.
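To make the difference between group(0) and group(1) concrete, here is a minimal sketch. It assumes the pattern in question was ([abc])+ applied to "abc"; the question itself is not quoted here, so treat that pattern as an assumption.

```python
import re

# Assumed pattern from the question: a one-character group repeated by +.
m = re.match(r"([abc])+", "abc")

whole = m.group(0)     # the entire match
last = m.group(1)      # only the last repetition of group 1 survives
captured = m.groups()  # tuple of all groups; there is just one here

print(whole)     # abc
print(last)      # c
print(captured)  # ('c',)
```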
Your + is interpreted as repetition of the group. If you want to repeat [abc] inside the group, move the + inside the parentheses:
>>> re.match("([abc])", "abc").groups()
('a',)
>>> re.match("([abc]+)", "abc").groups()
('abc',)
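If what you actually want is each repetition on its own, a single repeated group in re will not hand them back. One possible workaround, offered as a sketch rather than as part of the answers above, is to apply the one-character pattern repeatedly with re.findall or re.finditer. The variable names are made up for the example.

```python
import re

text = "abc"

# findall returns every non-overlapping match of the character class.
pieces = re.findall(r"[abc]", text)

# finditer yields match objects, handy when positions are needed too.
positions = [(m.group(0), m.start()) for m in re.finditer(r"[abc]", text)]

print(pieces)     # ['a', 'b', 'c']
print(positions)  # [('a', 0), ('b', 1), ('c', 2)]
```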