How to grab all headers from a website using BeautifulSoup?

If you don't want to use a regular expression, you can do something like this. Note that it only collects h3 elements:

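from bs4 import BeautifulSoup
import requests

url = "http://nypost.com/business"

response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors

# "lxml" requires the lxml package; "html.parser" works without it
page = BeautifulSoup(response.text, "lxml")
for headline in page.find_all("h3"):
    # strip() removes the whitespace and newlines around each headline
    print(headline.text.strip())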

Sample output (the headlines change as the page is updated):

The epitome of chic fashion is the latest victim of retail's collapse
Rent-a-Center shares soar after rejecting takeover bid
NFL ad revenue may go limp with loss of erectile-dysfunction ads
'Pharma Bro' talked about sex with men to get my money, investor says

...and so on.
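The loop above only collects h3 elements. To get every heading level without a regex, find_all also accepts a list of tag names. Here is a minimal sketch that uses a small made-up HTML string (not the nypost page) and the built-in "html.parser" so it runs without network access or lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page
html = """
<html><body>
  <h1>Business</h1>
  <h2>Markets</h2>
  <p>Not a heading</p>
  <h3>First story</h3>
  <h6>Footnote heading</h6>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# A list of names matches any tag whose name is in the list,
# and the results come back in document order
heading_tags = ["h1", "h2", "h3", "h4", "h5", "h6"]
headings = [(tag.name, tag.text.strip()) for tag in soup.find_all(heading_tags)]

for name, text in headings:
    print(name, text)
```

Keeping the tag name alongside the text lets you tell the heading levels apart, for example to build an outline of the page.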

Alternatively, filter tag names with a regular expression to get every heading level, h1 through h6:

import re
headers = page.find_all(re.compile(r"^h[1-6]$"))

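This regex matches tag names that consist of an h followed by a single digit from 1 to 6, i.e. h1 through h6. BeautifulSoup tests the pattern with its search() method, so the ^ and $ anchors are what stop it from matching longer names that merely contain such a sequence.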
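To see which names that pattern accepts, the sketch below applies it with re.search, the same method BeautifulSoup's documentation says it uses for regex filters, to a list of tag names. The names are made up for illustration, and some (h7, h10, th1) are not real HTML tags:

```python
import re

heading_pattern = re.compile(r"^h[1-6]$")

# Illustrative tag names; only h1-h6 should pass the filter
tag_names = ["h1", "h3", "h6", "h7", "h10", "hr", "html", "head", "th1"]

matches = [name for name in tag_names if heading_pattern.search(name)]
print(matches)  # ['h1', 'h3', 'h6']
```

Dropping the anchors would let h10 and th1 through, because search() finds h1 inside both of them.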
