Scraping and parsing Google search results using Python
There is the twill library for emulating a browser. I used it when I needed to log in with a Google email account. While it's a great tool built on a great idea, it's pretty old and seems to lack support nowadays (the latest version was released in 2007).
It might be useful if you want to retrieve results that require cookie handling or authentication. twill is likely one of the best choices for those purposes. BTW, it's based on mechanize.
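For reference, here is a minimal sketch of the cookie-handling idea using only the Python 3 standard library rather than twill or mechanize. This is a swapped-in technique, not twill's API. The helper name `make_cookie_opener` and the User-Agent string are made up for illustration.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies via jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Hypothetical User-Agent string; some sites reject the default one.
    opener.addheaders = [("User-Agent", "example-scraper/0.1")]
    return opener, jar


# No request is sent here; the jar starts out empty.
opener, jar = make_cookie_opener()
```

Calling `opener.open(url)` afterwards would record any cookies the server sets in `jar` and send them back on later requests made through the same opener, which is roughly what a login-then-fetch flow needs.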
As for parsing, you are right, BeautifulSoup and Scrapy are great. One of the cool things about BeautifulSoup is that it can handle invalid HTML (unlike Genshi, for example).
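To illustrate the point about invalid HTML, here is a small sketch that assumes the `bs4` package (BeautifulSoup 4) is installed. The markup and the `extract_links` helper are invented for the example and are not real Google result markup.

```python
from bs4 import BeautifulSoup

# Deliberately broken markup: unclosed <p> and <a>, plus a stray </b>.
BROKEN_HTML = """
<html><body>
<p>Result one <a href="http://example.com/1">first</a>
<p>Result two <a href="http://example.com/2">second</b>
</body>
"""


def extract_links(html):
    """Return the href of every <a> tag, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


links = extract_links(BROKEN_HTML)
print(links)
```

Despite the missing closing tags, both links should come out of the parse, so you usually don't need to clean up a page before feeding it to BeautifulSoup.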
You may find xgoogle useful; much of what you seem to be asking for is there.
Have a look at this awesome urllib wrapper for web scraping: https://github.com/mattseh/python-web/blob/master/web.py