Regex to find words between two tags

You can use BeautifulSoup for this HTML parsing.

from bs4 import BeautifulSoup

input = """<person>John</person>went to<location>London</location>"""
soup = BeautifulSoup(input, "html.parser")
print(soup.find_all("person")[0].get_text())    # John
print(soup.find_all("location")[0].get_text())  # London

Also, it's not good practice to use str as a variable name in Python, because it shadows the built-in str type. The same caveat applies to input in the snippet above, which shadows the built-in input() function.
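To illustrate the point about shadowing, here is a minimal sketch. The function name shadow_demo is made up for this example; it only shows what happens once str has been rebound to an ordinary string in the current scope.

```python
def shadow_demo():
    # Inside this function, `str` is rebound to a plain string,
    # so the built-in type is no longer reachable by that name.
    str = "<person>John</person>"
    try:
        str(42)  # this now tries to call a string object
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "no error"


print(shadow_demo())
```

Outside the function the built-in is untouched, which is why this kind of bug tends to show up far away from the assignment that caused it.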

By the way, the regex can be:

import re
print(re.findall(r"<person>(.*?)</person>", input))      # ['John']
print(re.findall(r"<location>(.*?)</location>", input))  # ['London']
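The ? after .* in these patterns matters: it makes the match non-greedy, so it stops at the first closing tag. A rough sketch of the difference follows, using a made-up sample string with two person tags and a hypothetical helper between_tags that is not part of the re module.

```python
import re

# Illustrative sample with two <person> elements (not from the question).
text = "<person>John</person>went to<person>Mary</person>"

# Non-greedy: stops at the first closing tag, one match per element.
lazy = re.findall(r"<person>(.*?)</person>", text)

# Greedy: runs to the last closing tag and swallows everything between.
greedy = re.findall(r"<person>(.*)</person>", text)

print(lazy)    # ['John', 'Mary']
print(greedy)  # ['John</person>went to<person>Mary']


def between_tags(tag, text):
    """Return the text inside every <tag>...</tag> pair (hypothetical helper)."""
    pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag))
    return re.findall(pattern, text)


print(between_tags("person", text))  # ['John', 'Mary']
```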

import re

# simple example
pattern = r"<person>(.*?)</person>"
string = "<person>My name is Jo</person>"
print(re.findall(pattern, string))  # ['My name is Jo']

# multiline string example: re.DOTALL lets "." match newlines as well
string = "<person>My name is:\n Jo</person>"
print(re.findall(pattern, string, flags=re.DOTALL))  # ['My name is:\n Jo']

This example works for simple parsing only. Have a look at the official Python documentation for the re module.
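As a rough illustration of what "simple parsing only" means, here are two inputs (both invented for this sketch) where the same pattern quietly gives the wrong answer:

```python
import re

pattern = r"<person>(.*?)</person>"

# An attribute on the opening tag: the literal "<person>" no longer matches.
with_attr = '<person id="1">Jo</person>'
print(re.findall(pattern, with_attr))  # []

# Nested same-name tags: the lazy match stops at the first closing tag.
nested = "<person>Jo <person>Jr</person></person>"
print(re.findall(pattern, nested))  # ['Jo <person>Jr']
```

Neither case raises an error, so the failure is easy to miss.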

To parse HTML, you should consider @sabuj-hassan's answer, but please remember to check this Stack Overflow gem as well.
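If installing a third-party package is not an option, the standard library's html.parser can do a similar job. This is only a sketch under that assumption: the class TagTextCollector, the function text_in_tag, and the sample markup are names made up here, and a dedicated HTML library is usually more convenient.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside elements with a given tag name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.open_count = 0  # how many matching tags are currently open
        self.found = []      # one string per outermost matching element
        self.pieces = []     # text fragments of the element being read

    def handle_starttag(self, tag, attrs):
        if tag == self.wanted:
            self.open_count += 1

    def handle_endtag(self, tag):
        if tag == self.wanted and self.open_count > 0:
            self.open_count -= 1
            if self.open_count == 0:
                self.found.append("".join(self.pieces))
                self.pieces = []

    def handle_data(self, data):
        if self.open_count > 0:
            self.pieces.append(data)


def text_in_tag(tag, markup):
    """Return the text of every outermost <tag> element in markup."""
    collector = TagTextCollector(tag)
    collector.feed(markup)
    collector.close()
    return collector.found


markup = '<person id="1">John</person>went to<location>London</location>'
print(text_in_tag("person", markup))    # ['John']
print(text_in_tag("location", markup))  # ['London']
```

Unlike the literal regex, this copes with the attribute on the opening tag, because the parser reports the tag name separately from its attributes.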


