Beautiful Soup Using Regex to Find Tags?
Yes, see the docs:
http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html
import re
# Anchor the whole alternation so only tags named exactly "a" or "div" match
soup.findAll(re.compile("^(a|div)$"))
find_all() is the most commonly used method in the Beautiful Soup search API.
It accepts a variety of filters. To find multiple tags, pass a list:
>>> soup.find_all(['a', 'div'])
Example:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<html><body><div>asdfasdf</div><p><a>foo</a></p></body></html>', 'html.parser')
>>> soup.find_all(['a', 'div'])
[<div>asdfasdf</div>, <a>foo</a>]
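Or you can use a regular expression to find tags whose names contain a or div (the pattern is searched anywhere in the tag name, so it also matches names such as table or span):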
>>> import re
>>> soup.find_all(re.compile("(a|div)"))
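Because Beautiful Soup applies the compiled pattern with search(), an unanchored pattern matches any tag whose name merely contains the text. Here is a small sketch of the difference, using made-up markup and the built-in html.parser (both are my assumptions, not part of the original answer):

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup with tag names that contain the letter "a"
html = ("<html><body><div>x</div>"
        "<table><tr><td><a>foo</a></td></tr></table>"
        "<span>y</span></body></html>")
soup = BeautifulSoup(html, "html.parser")

# Unanchored: matches every tag name containing "a" or "div"
loose = [tag.name for tag in soup.find_all(re.compile("(a|div)"))]

# Anchored: matches only tags named exactly "a" or "div"
strict = [tag.name for tag in soup.find_all(re.compile("^(a|div)$"))]

print(loose)   # ['div', 'table', 'a', 'span']
print(strict)  # ['div', 'a']
```

If you only want the two exact tag names, the list form shown earlier is probably simpler than a regex.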
Note that you can also use regular expressions to search in attributes of tags. For example:
import re
from bs4 import BeautifulSoup
soup.find_all('a', {'href': re.compile(r'crummy\.com/')})
This example finds all <a> tags whose href attribute contains the substring 'crummy.com/'.
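A self-contained version of the attribute search might look like the following. The sample links are invented for illustration, and the keyword form href=... is used here as an equivalent to the dict form above:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample document; only the first link points at crummy.com
html = """
<a href="http://www.crummy.com/software/BeautifulSoup/">docs</a>
<a href="http://example.com/">other</a>
<a>no href</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Keep only <a> tags whose href contains "crummy.com/"
links = soup.find_all("a", href=re.compile(r"crummy\.com/"))
hrefs = [a["href"] for a in links]

print(hrefs)  # ['http://www.crummy.com/software/BeautifulSoup/']
```

Tags without an href attribute are skipped, since there is no value for the pattern to match.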