Get all HTML tags with Beautiful Soup

You don't have to pass any arguments to find_all(). Called with no arguments, Beautiful Soup returns every tag in the tree, recursively and in document order.

Sample:

from bs4 import BeautifulSoup

html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>
"""
soup = BeautifulSoup(html, "html.parser")

print([tag.name for tag in soup.find_all()])
# ['div', 'div', 'div', 'p']

print([str(tag) for tag in soup.find_all()])
# ['<div>something</div>', '<div>something else</div>', '<div class="magical">hi there</div>', '<p>ok</p>']
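The sample above is flat, so the "recursively" part is easy to miss. Here is a minimal sketch with nested markup (the HTML string and variable names are made up for illustration) comparing the default behaviour with recursive=False, which only looks at direct children:

```python
from bs4 import BeautifulSoup

# Hypothetical nested markup, chosen only to show the traversal order.
nested_html = "<div><p>inner <b>bold</b></p></div><span>outer</span>"
nested_soup = BeautifulSoup(nested_html, "html.parser")

# Default: descend into every level of the tree, in document order.
all_names = [tag.name for tag in nested_soup.find_all()]

# recursive=False: only direct children of the object find_all() is called on.
top_level_names = [tag.name for tag in nested_soup.find_all(recursive=False)]

print(all_names)        # ['div', 'p', 'b', 'span']
print(top_level_names)  # ['div', 'span']
```

Note that html.parser does not wrap the fragment in html/body tags; other parsers may, which would change both lists.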

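Alternatively, pass True explicitly, which matches every tag. (findAll is the older spelling of find_all and still works, but find_all is preferred.)

for tag in soup.find_all(True):
    print(tag.name)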

If you only want specific HTML tags, pass a list of tag names:

# `driver` is assumed to be an existing Selenium WebDriver instance.
html = driver.page_source
# driver.page_source: "<div>something</div>\n<div>something else</div>\n<div class='magical'>hi there</div>\n<p>ok</p>\n"
soup = BeautifulSoup(html, "html.parser")  # Name the parser explicitly to avoid a warning.
for tag in soup.find_all(['a', 'div']):  # List the tag names to match here.
    print(tag.text)

# Result:
# something
# something else
# hi there
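The snippet above depends on a live browser session, so here is a self-contained sketch of the same list filter that runs without Selenium. The HTML string (including the href value) is invented for the example:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source.
page_html = """<div>something</div>
<a href="#top">a link</a>
<p>ok</p>
"""
page_soup = BeautifulSoup(page_html, "html.parser")

# A list matches any tag whose name is in the list; results stay in document order.
matches = page_soup.find_all(["a", "div"])
pairs = [(tag.name, tag.get_text()) for tag in matches]

print(pairs)  # [('div', 'something'), ('a', 'a link')]
```

The p tag is skipped because its name is not in the list.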

I thought I'd share my solution to a very similar question for those who find themselves here later.

Example

I needed to find all tags quickly but only wanted unique values. I'll use the Python calendar module to demonstrate.

We'll generate an HTML calendar, then parse it and collect only the unique tag names present.

The structure below is very similar to the one above, but collects the tag names in a set:

from bs4 import BeautifulSoup
import calendar

html_cal = calendar.HTMLCalendar().formatmonth(2020, 1)
unique_tags = {tag.name for tag in BeautifulSoup(html_cal, 'html.parser').find_all()}
print(unique_tags)

# Result (set order may vary)
# {'table', 'td', 'th', 'tr'}
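If you also want to know how often each tag occurs, or need a stable order for the unique names, one possible variation is to feed the same generator into collections.Counter. This is only a sketch; the variable names are mine:

```python
import calendar
from collections import Counter

from bs4 import BeautifulSoup

html_cal = calendar.HTMLCalendar().formatmonth(2020, 1)
cal_soup = BeautifulSoup(html_cal, "html.parser")

# Count every tag name; the keys of the Counter are the unique tag names.
tag_counts = Counter(tag.name for tag in cal_soup.find_all())

# Sorting the keys gives a deterministic listing, unlike printing a set.
unique_sorted = sorted(tag_counts)

print(unique_sorted)        # ['table', 'td', 'th', 'tr']
print(tag_counts["table"])  # 1
```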
