Scraping and parsing Google search results using Python

There is the twill library for emulating a browser. I used it when I needed to log in with a Google email account. While it's a great tool built on a great idea, it's pretty old and seems to lack support nowadays (the latest version was released in 2007). It might be useful if you want to retrieve results that require cookie handling or authentication, and twill is likely one of the best choices for that purpose. BTW, it's based on mechanize.
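If you only need the cookie-handling part, here is a minimal sketch using just the Python 3 standard library rather than twill or mechanize. The function names make_cookie_opener and fetch and the User-Agent string are my own placeholders, not anything from twill's API, and a real login flow would need more than this.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return (opener, jar); the opener stores cookies in the jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Placeholder User-Agent (an assumption); some sites reject the default Python one.
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (compatible; example-scraper)")]
    return opener, jar


def fetch(opener, url, timeout=10):
    """Fetch a URL through the opener and return the body as text."""
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Any cookies set by a response to fetch(opener, url) end up in the jar and are sent back on later requests made through the same opener.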

As for parsing, you are right, BeautifulSoup and Scrapy are great. One of the cool things about BeautifulSoup is that it can handle invalid HTML (unlike Genshi, for example).
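To show what "handling invalid HTML" means in practice, here is a small sketch. It does not use BeautifulSoup itself. Instead it uses the standard library's html.parser, which BeautifulSoup can use as its parsing backend and which is similarly tolerant, so it runs without installing anything. LinkCollector, extract_links, and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, ignoring how badly the markup nests."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Unclosed <a> and <p> tags and a mis-nested </div>: a strict XML parser would reject this.
broken = '<div><p>Results<a href="https://example.com/one">one<p><a href=/two>two</div>'
print(extract_links(broken))
```

The parser never complains about the missing closing tags; it just reports the start tags it sees, which is usually all you need for pulling links out of a results page.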


You may find xgoogle useful; much of what you seem to be asking for is there.


Have a look at this awesome urllib wrapper for web scraping: https://github.com/mattseh/python-web/blob/master/web.py