Web scraping to fill out (and retrieve) search forms?
Beautiful Soup is great for parsing webpages, and that's half of what you want to do. Python, Perl, and Ruby all have a version of Mechanize, and that's the other half:
http://wwwsearch.sourceforge.net/mechanize/
Mechanize lets you control a browser:
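import mechanize
browser = mechanize.Browser()
# Open a page first with browser.open(url), then:
# Follow a link (link_node is a link object found on the current page)
browser.follow_link(link_node)
# Submit a form
browser.select_form(name="search")
browser["authors"] = ["author #1", "author #2"]
browser["volume"] = "any"
search_response = browser.submit()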
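To make the form step more concrete, here is a minimal sketch of what "select a form, set its fields, submit" roughly boils down to for a simple GET form. It uses only the Python standard library instead of Mechanize, and the sample HTML, the `/search` action, and the `FormFieldCollector` class are all made up for illustration. A real site may use POST, hidden fields, or JavaScript, which this does not handle.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical search page, inlined so the sketch runs without a network.
SAMPLE_HTML = """
<form name="search" action="/search" method="get">
  <input type="text" name="authors" value="">
  <input type="text" name="volume" value="any">
  <input type="submit" value="Go">
</form>
"""


class FormFieldCollector(HTMLParser):
    """Collect the action URL and the named input fields of one form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self.in_form and "name" in attrs:
            # Inputs without a name (like the submit button here) are skipped.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


collector = FormFieldCollector("search")
collector.feed(SAMPLE_HTML)

# Fill in a field, comparable to browser["authors"] = ... in Mechanize.
collector.fields["authors"] = "author #1"

# For a GET form, submitting means requesting action + "?" + encoded fields.
query_url = collector.action + "?" + urlencode(collector.fields)
print(query_url)  # prints /search?authors=author+%231&volume=any
```

Mechanize does this bookkeeping for you (plus cookies, redirects, and POST bodies), which is why it is usually the more practical choice.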
With Mechanize and Beautiful Soup you have a great start. One extra tool I'd consider is Firebug, as used in this quick Ruby scraping guide:
http://www.igvita.com/2007/02/04/ruby-screen-scraper-in-60-seconds/
Firebug can speed up your construction of XPath expressions for parsing documents, saving you some serious time.
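As an illustration of the kind of XPath expression you might end up with, here is a small sketch using the standard library's `xml.etree.ElementTree`. The sample page and its `result` class are invented for the example. Note that ElementTree only supports a limited subset of XPath and needs well-formed markup, so for real-world pages you would likely want a more forgiving parser such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed results page, inlined so the sketch runs offline.
SAMPLE_PAGE = """
<html>
  <body>
    <div id="results">
      <div class="result"><a href="/paper/1">First paper</a></div>
      <div class="result"><a href="/paper/2">Second paper</a></div>
    </div>
    <div class="footer"><a href="/about">About</a></div>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE_PAGE)

# XPath: every <a> that is a direct child of a <div class="result">.
links = root.findall(".//div[@class='result']/a")

# Pair each link's text with its href; the footer link is not matched.
results = [(link.text, link.get("href")) for link in links]
for title, href in results:
    print(title, href)
```

The footer link is skipped because its parent `div` has a different class, which is the sort of targeting an XPath built in a browser inspector gives you.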
Good luck!
Python code for search forms, using Selenium WebDriver:
# Imports
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait  # available since 2.4.0
from selenium.webdriver.support import expected_conditions as EC  # available since 2.26.0
# Create a new instance of the Firefox driver
driver = webdriver.Firefox()
# Go to the Google home page
driver.get("http://www.google.com")
# The page is ajaxy, so the title is originally this:
print(driver.title)
# Find the element whose name attribute is q (the Google search box)
inputElement = driver.find_element(By.NAME, "q")
# Type in the search
inputElement.send_keys("cheese!")
# Submit the form (although Google automatically searches now without submitting)
inputElement.submit()
try:
    # We have to wait for the page to refresh; the last thing that seems to be updated is the title
    WebDriverWait(driver, 10).until(EC.title_contains("cheese!"))
    # You should see "cheese! - Google Search"
    print(driver.title)
finally:
    # Always close the browser, even if the wait times out
    driver.quit()
Source: https://www.seleniumhq.org/docs/03_webdriver.jsp