BeautifulSoup: get the inner text of a paragraph tag by id or class (code examples)

Example 1: find element in beautifulsoup by partial attribute value

# Use regex

# Suppose the page contains a list of 'div' elements with 'class' values as follows:
# <div class='abcd bcde cdef efgh'>some content</div>
# <div class='mnop bcde cdef efgh'>some content</div>
# <div class='abcd pqrs cdef efgh'>some content</div>
# <div class='hijk wxyz cdef efgh'>some content</div>

# As you can see, the class string of every div above ends with 'cdef efgh'.
# To extract all of them into a single list:

from bs4 import BeautifulSoup
import re # library for regex in python
# html: the HTML response as a string; 'html.parser' can be swapped for any installed parser
soup = BeautifulSoup(html, 'html.parser')
# $ anchors the pattern, so 'cdef efgh' must be the end of the class string
elements = soup.find_all('div', {'class': re.compile(r'cdef efgh$')})

# Note: This is just one case. Regex can express most other partial-match conditions too.
# Learn more and experiment with regex at https://regex101.com/
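
A self-contained run of the pattern above might look like the following sketch. The sample markup (including a third div that should not match) and the built-in 'html.parser' are assumptions chosen for illustration, not part of the original snippet.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup: the third div does NOT end with 'cdef efgh'.
html = """
<div class="abcd bcde cdef efgh">first</div>
<div class="mnop bcde cdef efgh">second</div>
<div class="cdef efgh zzzz">third</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The pattern is searched in the space-separated class string,
# so only divs whose class attribute ends with 'cdef efgh' are kept.
elements = soup.find_all("div", {"class": re.compile(r"cdef efgh$")})

matched_text = [el.get_text() for el in elements]
print(matched_text)  # ['first', 'second']
```

Dropping the `$` would also match the third div, because the pattern would then be allowed to appear anywhere in the class string.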

Example 2: scrape text from specific p tag

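from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 location of urlopen

# Download the page and parse it (the 'lxml' parser requires the lxml package)
with urlopen('http://meinparlament.diepresse.com/') as response:
    content = response.read()
soup = BeautifulSoup(content, 'lxml')

# Collect every <div class="content-question"> and print the text of its first <p>
questions = soup.find_all('div', attrs={"class": "content-question"})
for question in questions:
    paragraph = question.find('p')
    if paragraph is not None:  # skip divs that contain no <p>
        print(paragraph.text)

# Another way to retrieve the same divs, using a CSS selector:
# questions = soup.select('div.content-question')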
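
Example 3: get the text of a p tag by id

Neither example above looks a paragraph up by its id. A minimal sketch of that lookup is shown here, assuming hypothetical markup and made-up id values ('intro', 'other'). It uses the built-in 'html.parser' so it runs without extra packages.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the id values are invented for illustration.
html = """
<div class="content-question">
  <p id="intro">Hello <b>world</b></p>
  <p id="other">Something else</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None when nothing matches.
paragraph = soup.find("p", id="intro")
inner_text = paragraph.get_text() if paragraph is not None else None
print(inner_text)  # Hello world

# The CSS selector form selects the same element.
same_paragraph = soup.select_one("p#intro")
```

get_text() joins the text of nested tags as well, which is why the content of the <b> element is included in the result.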
