scrapy - how to get text from 'div'
I can get you the title of the film, but I'm not great with XPath, so the description XPath below returns everything inside the <div class="tabbertab" title="Synopsis"> element, tags and all. It's not ideal, but it's a starting point. Getting the image URL is left as an exercise for the OP. :)
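    import scrapy
    from scrapy.item import Field, Item

    class DmozItem(Item):
        title = Field()
        description = Field()

    class DmozSpider(scrapy.Spider):
        # Older Scrapy releases used BaseSpider and HtmlXPathSelector(response).select(...);
        # current releases use scrapy.Spider and response.xpath(...).
        name = "test"
        allowed_domains = ["roxie.com"]
        start_urls = [
            "http://www.roxie.com/events/details.cfm?eventID=4921702B-9E3D-8678-50D614177545A594"
        ]

        def parse(self, response):
            item = DmozItem()
            # Text nodes that are direct children of the title <div>
            item["title"] = response.xpath('//div[@style="width: 100%;"]/text()').extract()
            # Raw HTML of the Synopsis tab only; the @title test skips the other tabs
            item["description"] = response.xpath(
                '//div[@class="tabbertab" and @title="Synopsis"]').extract()
            return item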
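Just replace

    item['name'] = hxs.select("text()").extract()

with

    item['name'] = site.select("text()").extract()

hxs.select("text()") evaluates the relative path "text()" against the whole document, so it never looks inside the individual nodes you selected; site.select("text()") evaluates it against the current site node, which is what you want.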
Hope that helps.