Beautiful soup getting the first child
div.children returns an iterator.
for div in nsoup.find_all(class_='cities'):
for childdiv in div.find_all('div'):
print (childdiv.string) #london, york
AttributeError was raised, because of non-tags like '\n'
are in .children
. just use proper child selector to find the specific div.
(more edit) can't reproduce your exceptions - here's what I've done:
In [137]: print foo.prettify()
<div class="cities">
<div id="3232">
London
</div>
<div id="131">
York
</div>
</div>
In [138]: for div in foo.find_all(class_ = 'cities'):
.....: for childdiv in div.find_all('div'):
.....: print childdiv.string
.....:
London
York
In [139]: for div in foo.find_all(class_ = 'cities'):
.....: for childdiv in div.find_all('div'):
.....: print childdiv.string, childdiv['id']
.....:
London 3232
York 131
With modern versions of bs4 (certainly bs4 4.7.1+) you have access to :first-child css pseudo selector. Nice and descriptive. Use soup.select_one
if you only want to return the first match i.e. soup.select_one('.cities div:first-child').text
. It is considered good practice to test is not None
before using .text
accessor.
from bs4 import BeautifulSoup as bs
html = '''
<div class="cities">
<div id="3232"> London </div>
<div id="131"> York </div>
</div>
'''
soup = bs(html, 'lxml') #or 'html.parser'
first_children = [i.text for i in soup.select('.cities div:first-child')]
print(first_children)