Selecting the second child in Beautiful Soup with soup.select?
Beautiful Soup 4.7.0 (released at the beginning of 2019) now supports most selectors, including :nth-child:
As of version 4.7.0, Beautiful Soup supports most CSS4 selectors via the SoupSieve project. If you installed Beautiful Soup through pip, SoupSieve was installed at the same time, so you don’t have to do anything extra.
So, if you upgrade your version:
pip install -U beautifulsoup4
You'll be able to use nearly all selectors you'd ever need to, including nth-child.
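If you are unsure whether your installed version is new enough, you can compare bs4.__version__ against 4.7.0. The helper below, supports_css4_selectors, is hypothetical and assumes a plain "major.minor.patch" version string such as "4.12.3".

```python
def supports_css4_selectors(version: str) -> bool:
    """Return True if a Beautiful Soup version string is 4.7.0 or newer.

    Hypothetical helper: it compares only the major and minor numbers,
    assuming a plain "major.minor[.patch]" string such as "4.12.3".
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (4, 7)


# Usage with a real install (requires Beautiful Soup):
#   import bs4
#   print(supports_css4_selectors(bs4.__version__))
print(supports_css4_selectors("4.6.3"))   # False
print(supports_css4_selectors("4.12.3"))  # True
```

Comparing integer tuples rather than raw strings avoids the trap where "4.12" would sort before "4.7" as text.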
That said, note that in your input HTML, the <h2 id='names'> tag does not actually have any element children:
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
Here there are just three elements, all of them siblings, so #names > p:nth-child(1) wouldn't match anything, even in CSS or JavaScript.
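To make the tree structure concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree rather than Beautiful Soup (so it runs without any extra install). It assumes the snippet is wrapped in a <body> root so that it parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in a <body> root (an assumption) so it is valid XML.
markup = """
<body>
  <h2 id='names'>Names</h2>
  <p>John</p>
  <p>Peter</p>
</body>
"""

body = ET.fromstring(markup.strip())
h2 = body.find("h2[@id='names']")

# The <h2> holds only text ("Names"); it has no element children.
print(h2.text, len(h2))               # Names 0
# The <body> has three element children, and they are siblings of each other.
print([child.tag for child in body])  # ['h2', 'p', 'p']
```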
If the #names element had the <p> elements as children, your selector would work, to an extent:
from bs4 import BeautifulSoup
html = '''
<div id='names'>
<p>John</p>
<p>Peter</p>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
# :nth-child needs Beautiful Soup 4.7.0+ (SoupSieve)
soup.select("#names > p:nth-child(1)")
Output:
[<p>John</p>]
Of course, the John <p> is the first child of the #names parent. If you want Peter, use :nth-child(2).
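The counting rule behind :nth-child can be sketched in plain Python. This minimal sketch uses the standard library's xml.etree.ElementTree instead of Beautiful Soup, and the helper nth_child is hypothetical: it mirrors parent > tag:nth-child(n) by counting all element children and only then checking the tag.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def nth_child(parent: ET.Element, tag: str, n: int) -> Optional[ET.Element]:
    """Mimic `parent > tag:nth-child(n)` (hypothetical helper).

    Position n is 1-based and counted over all element children; the
    element at that position is returned only if its tag also matches.
    """
    children = list(parent)
    if 1 <= n <= len(children) and children[n - 1].tag == tag:
        return children[n - 1]
    return None


div = ET.fromstring("<div id='names'><p>John</p><p>Peter</p></div>")
print(nth_child(div, "p", 1).text)  # John
print(nth_child(div, "p", 2).text)  # Peter
print(nth_child(div, "p", 3))       # None
```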
If the elements are all adjacent siblings, you can use + to select the next sibling:
html = '''
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
'''
soup = BeautifulSoup(html, 'html.parser')
soup.select("#names + p + p")
Output:
[<p>Peter</p>]
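The + combinator simply steps to the next element sibling, so #names + p + p means "two steps forward, each landing on a <p>". Below is a minimal sketch with the standard library's xml.etree.ElementTree (not Beautiful Soup); the helpers next_element_sibling and adjacent_chain are hypothetical, and because ElementTree elements do not know their parent, the parent is passed in explicitly.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def next_element_sibling(parent: ET.Element, elem: ET.Element) -> Optional[ET.Element]:
    """Return the element immediately after elem among parent's children."""
    children = list(parent)
    i = children.index(elem)
    return children[i + 1] if i + 1 < len(children) else None


def adjacent_chain(parent: ET.Element, start: ET.Element, tags: List[str]) -> Optional[ET.Element]:
    """Mimic `start + tags[0] + tags[1] + ...` (hypothetical helper)."""
    current = start
    for tag in tags:
        current = next_element_sibling(parent, current)
        if current is None or current.tag != tag:
            return None  # the chain breaks: no matching adjacent sibling
    return current


body = ET.fromstring("<body><h2 id='names'>Names</h2><p>John</p><p>Peter</p></body>")
h2 = body.find("*[@id='names']")
print(adjacent_chain(body, h2, ["p"]).text)       # John   (#names + p)
print(adjacent_chain(body, h2, ["p", "p"]).text)  # Peter  (#names + p + p)
```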
'nth-child' is simply not implemented in beautifulsoup4 (at the time of writing, i.e. before version 4.7.0); there is no code in the Beautiful Soup codebase to handle it. The authors explicitly raise a NotImplementedError to make this clear.
Given the HTML you quote in your question, you are not looking for a child of h2#names.
What you are really looking for is the second adjacent sibling. I'm not a CSS selector guru, but I found that this worked:
soup.select("#names + p + p")
Adding your edit as an answer so that it can be more easily found by others:
Use nth-of-type instead of nth-child (as with nth-child, the > combinator only matches if the <p> elements are children of #names):
soup.select("#names > p:nth-of-type(1)")
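To see why nth-of-type can succeed where nth-child fails, here is a minimal sketch of the counting rule using Python's standard xml.etree.ElementTree instead of Beautiful Soup (so it runs without extra installs). The helper nth_of_type is hypothetical, and the markup, which puts a heading inside the #names container, is an assumed variation on the question's HTML.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def nth_of_type(parent: ET.Element, tag: str, n: int) -> Optional[ET.Element]:
    """Mimic `parent > tag:nth-of-type(n)` (hypothetical helper).

    Only element children with the given tag are counted, so other
    siblings (such as a heading) do not shift the position.
    """
    same_type = [child for child in parent if child.tag == tag]
    return same_type[n - 1] if 1 <= n <= len(same_type) else None


# Assumed markup: a heading sits inside the container, before the <p> elements.
div = ET.fromstring(
    "<div id='names'><h3>Names</h3><p>John</p><p>Peter</p></div>"
)

# The first element child is the <h3>, so p:nth-child(1) would match nothing...
print(list(div)[0].tag)               # h3
# ...but p:nth-of-type(1) and p:nth-of-type(2) still find the paragraphs.
print(nth_of_type(div, "p", 1).text)  # John
print(nth_of_type(div, "p", 2).text)  # Peter
```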