Get all HTML tags with Beautiful Soup
You don't have to specify any arguments to find_all()
- in this case, BeautifulSoup
would find you every tag in the tree, recursively.
Sample:
from bs4 import BeautifulSoup
html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>
"""
soup = BeautifulSoup(html, "html.parser")
print([tag.name for tag in soup.find_all()])
# ['div', 'div', 'div', 'p']
print([str(tag) for tag in soup.find_all()])
# ['<div>something</div>', '<div>something else</div>', '<div class="magical">hi there</div>', '<p>ok</p>']
Please try the below--
for tag in soup.findAll(True):
print(tag.name)
If you want to find some specific HTML tags then try this:
html = driver.page_source
# driver.page_source: "<div>something</div>\n<div>something else</div>\n<div class='magical'>hi there</div>\n<p>ok</p>\n"
soup = BeautifulSoup(html)
for tag in soup.find_all(['a','div']): # Mention HTML tag names here.
print(tag.text)
# Result:
# something
# something else
# hi there
I thought I'd share my solution to a very similar question for those that find themselves here, later.
Example
I needed to find all tags quickly but only wanted unique values. I'll use the Python calendar
module to demonstrate.
We'll generate an html calendar then parse it, finding all and only those unique tags present.
The below structure is very similar to the above, using set comprehensions:
from bs4 import BeautifulSoup
import calendar
html_cal = calendar.HTMLCalendar().formatmonth(2020, 1)
set(tag.name for tag in BeautifulSoup(html_cal, 'html.parser').find_all())
# Result
# {'table', 'td', 'th', 'tr'}