Beautiful Soup and extracting a div and its contents by ID
Beautiful Soup 4 supports most CSS selectors with the .select()
method, therefore you can use an id
selector such as:
soup.select('#articlebody')
If you need to specify the element's type, you can add a type selector before the id
selector:
soup.select('div#articlebody')
The .select()
method will return a collection of elements, which means that it would return the same results as the following .find_all()
method example:
soup.find_all('div', id="articlebody")
# or
soup.find_all(id="articlebody")
If you only want to select a single element, then you could just use the .find()
method:
soup.find('div', id="articlebody")
# or
soup.find(id="articlebody")
To find an element by its id
:
div = soup.find(id="articlebody")
You should post your example document, because the code works fine:
>>> import BeautifulSoup
>>> soup = BeautifulSoup.BeautifulSoup('<html><body><div id="articlebody"> ... </div></body></html')
>>> soup.find("div", {"id": "articlebody"})
<div id="articlebody"> ... </div>
Finding <div>
s inside <div>
s works as well:
>>> soup = BeautifulSoup.BeautifulSoup('<html><body><div><div id="articlebody"> ... </div></div></body></html')
>>> soup.find("div", {"id": "articlebody"})
<div id="articlebody"> ... </div>