How do you get the text from an HTML 'datacell' using BeautifulSoup?

The BeautifulSoup documentation should cover everything you need. In this case it looks like you want findNext, which returns the first matching tag that follows the given element in the document:

headerRows[0][10].findNext('b').string
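
As a minimal sketch of how this lookup behaves, assuming the newer `bs4` package (where findNext is spelled `find_next`). The question's `headerRows` list is not shown, so the table markup and variable names below are hypothetical stand-ins:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the table in the question.
html = (
    "<table><tr>"
    "<td><font><b>Name</b></font></td>"
    "<td><b>Age</b></td>"
    "</tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

cell = soup.find("td")  # the first data cell

# find_next walks forward in document order, starting just after
# the opening <td>, and returns the first <b> tag it meets.
first_bold_text = cell.find_next("b").string
print(first_bold_text)
```

Note that the search is not limited to the cell's own descendants: if the cell contained no <b>, find_next would carry on into the following cells and return the first <b> it found there.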

A more generic solution which doesn't rely on the <b> tag would be to use the text argument to findAll, which allows you to search only for NavigableString objects:

>>> soup = BeautifulSoup(u'<p>Test 1 <span>More</span> Test 2</p>')
>>> u''.join(soup.findAll(text=True))
u'Test 1 More Test 2'
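
The same idea can be sketched with the newer `bs4` package, assuming a reasonably recent release: there `find_all(string=True)` is the newer spelling of `findAll(text=True)`, and `get_text()` is a shortcut that collects and joins the text nodes for you. The cell markup below is invented for illustration:

```python
from bs4 import BeautifulSoup

# Invented cell markup; this approach does not need a <b> tag.
html = "<table><tr><td>Test 1 <span>More</span> Test 2</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

cell = soup.find("td")

# Collect every text node under the cell, then join them.
pieces = cell.find_all(string=True)
joined = "".join(pieces)
print(joined)

# get_text() yields the same string here without the manual join.
print(cell.get_text() == joined)
```

If the cell text has stray whitespace, `cell.get_text(strip=True)` or a plain `.strip()` on the result may be worth considering.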

headerRows[0][10].contents[0].find('b').string

Tags: Python, HTML, Parsing, BeautifulSoup
