Beautiful soup getting the first child

div.children returns an iterator.

for div in nsoup.find_all(class_='cities'):
    for childdiv in div.find_all('div'):
        print (childdiv.string) #london, york

AttributeError was raised, because of non-tags like '\n' are in .children. just use proper child selector to find the specific div.

(more edit) can't reproduce your exceptions - here's what I've done:

In [137]: print foo.prettify()
<div class="cities">
 <div id="3232">
  London
 </div>
 <div id="131">
  York
 </div>
</div>

In [138]: for div in foo.find_all(class_ = 'cities'):
   .....:     for childdiv in div.find_all('div'):
   .....:         print childdiv.string
   .....: 
 London 
 York 

In [139]: for div in foo.find_all(class_ = 'cities'):
   .....:     for childdiv in div.find_all('div'):
   .....:         print childdiv.string, childdiv['id']
   .....: 
 London  3232
 York  131

With modern versions of bs4 (certainly bs4 4.7.1+) you have access to :first-child css pseudo selector. Nice and descriptive. Use soup.select_one if you only want to return the first match i.e. soup.select_one('.cities div:first-child').text. It is considered good practice to test is not None before using .text accessor.

from bs4 import BeautifulSoup as bs

html = '''
<div class="cities"> 
       <div id="3232"> London </div>
       <div id="131"> York </div>
  </div>
  '''
soup = bs(html, 'lxml') #or 'html.parser'
first_children = [i.text for i in soup.select('.cities div:first-child')]
print(first_children)

Beautiful soup getting the first child

Tags:

Python

Beautifulsoup

Related

Recent Posts