Match last occurrence with regex
For me the clearest way is:
>>> import re
>>> re.findall('<br>(.*?)<br>', text)[-1]
'Tizi Ouzou'
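One caveat: findall returns non-overlapping matches, so the closing <br> of one match cannot also be the opening <br> of the next. With the question's text this happens to work, because only the middle "tizi ouzou" segment gets skipped. With an odd number of segments, though, [-1] can miss the last one. Here is a minimal sketch on a made-up string (sample is purely illustrative). It shows the problem, and shows that a lookahead for the closing <br> (a variation, not part of the answer above) leaves that tag unconsumed:

```python
import re

sample = "<br>a<br>b<br>"

# Non-overlapping: after "<br>a<br>" is consumed, only "b<br>" remains,
# which has no opening <br>, so "b" is never matched.
plain = re.findall(r'<br>(.*?)<br>', sample)
print(plain)        # ['a']

# The lookahead checks for the closing <br> without consuming it,
# so it can serve as the opening <br> of the next match.
overlapping = re.findall(r'<br>(.*?)(?=<br>)', sample)
print(overlapping)  # ['a', 'b']
```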
A non-regex approach using the built-in str methods:
text = """
Pellentesque habitant morbi tristique senectus et netus et
malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae
ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam
egestas <br>semper<br>tizi ouzou<br>Tizi Ouzou<br> """
res = text.rsplit('<br>', 2)[-2]
# res == 'Tizi Ouzou'
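Calling rsplit('<br>', 2) splits only at the last two separators, so [-2] is the piece between them. If the string has fewer than two separators, [-2] either raises IndexError or returns a piece that is not enclosed by <br> at all. The sketch below wraps the same call in a function that checks for that case first. The function name last_between and its sep parameter are hypothetical:

```python
def last_between(text, sep='<br>'):
    """Return the text between the last two occurrences of sep, or None."""
    # maxsplit=2 from the right: [everything before, last enclosed piece, tail]
    parts = text.rsplit(sep, 2)
    if len(parts) < 3:
        # Fewer than two separators, so nothing is enclosed.
        return None
    return parts[-2]

print(last_between("x<br>a<br>b<br> "))  # b
print(last_between("a<br>b"))            # None
```

This version ignores whatever follows the last <br>. It does not require that only whitespace comes after it.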
Have a look at the related questions: you shouldn't parse HTML with regex. Use an HTML parser instead. For Python, I hear Beautiful Soup is the way to go.
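Beautiful Soup is a third-party package. As a stand-in that needs only the standard library, here is a minimal sketch built on html.parser.HTMLParser instead. The class BrSplitter and the function last_br_segment are hypothetical names. The parser collects the text runs between <br> tags, and because it lowercases tag names, <BR> and <br/> are handled too:

```python
from html.parser import HTMLParser

class BrSplitter(HTMLParser):
    """Collect the text runs separated by <br> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = ['']

    def handle_starttag(self, tag, attrs):
        # The default handle_startendtag calls this too, so <br/> is covered.
        if tag == 'br':
            self.chunks.append('')

    def handle_data(self, data):
        # Data may arrive in several pieces, so append to the current chunk.
        self.chunks[-1] += data

def last_br_segment(html):
    """Return the text between the last two <br> tags, or None."""
    parser = BrSplitter()
    parser.feed(html)
    parser.close()  # flush any buffered trailing data
    # The first and last chunks are not enclosed by <br> on both sides.
    inner = parser.chunks[1:-1]
    return inner[-1] if inner else None

print(last_br_segment("egestas <br>semper<br>tizi ouzou<br>Tizi Ouzou<br> "))
```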
Anyway, if you want to do it with regex, you need to make sure that .* cannot go past another <br>. To do that, before consuming each character, we can use a lookahead to make sure that it doesn't start another <br>:
<br>(?:(?!<br>).)*<br>\s*$
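The pattern above matches the last <br>...<br> pair but captures nothing. The sketch below adds a capture group around the tempered part so the enclosed text can be pulled out. The capture group and the helper name last_br_text are assumptions, not part of the original pattern. Because . does not match newlines without re.DOTALL, a segment that spans lines will not match:

```python
import re

# (?:(?!<br>).)*  consumes one character at a time, but only where no <br> starts
# <br>\s*$        the closing <br> may be followed only by whitespace
# The outer (...) is an added capture group (assumption) for the enclosed text.
LAST_BR = re.compile(r'<br>((?:(?!<br>).)*)<br>\s*$')

def last_br_text(html):
    """Return the text inside the final <br>...<br> pair, or None (hypothetical helper)."""
    m = LAST_BR.search(html)
    return m.group(1) if m else None

print(last_br_text("egestas <br>semper<br>tizi ouzou<br>Tizi Ouzou<br> "))  # Tizi Ouzou
print(last_br_text("<br>x<br>tail"))  # None
```

Unlike the rsplit approach, this returns None when something other than whitespace follows the last <br>.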