scrapy - how to get text from 'div'

I can get you the title of the film, but I'm somewhat crappy with XPath, so the description XPath will get you everything within the <div class="tabbertab" title="Synopsis"> element, markup included. It's not ideal, but it's a starting point. Getting the image URL is left as an exercise for the OP. :)

from scrapy.item import Field, Item
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider


class DmozItem(Item):
    title = Field()
    description = Field()


class DmozSpider(BaseSpider):
    name = "test"
    allowed_domains = ["roxie.com"]
    start_urls = [
        "http://www.roxie.com/events/details.cfm?eventID=4921702B-9E3D-8678-50D614177545A594"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        item = DmozItem()
        item["title"] = hxs.select('//div[@style="width: 100%;"]/text()').extract()
        item["description"] = hxs.select('//div[@class="tabbertab"]').extract()
        return item
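
To get only the text rather than the whole element, the usual trick is to collect every text node below the div and join them. In the spider that would roughly mean appending //text() to the description XPath and joining the extracted strings. Here is a minimal sketch of the same idea using only the standard library, so it runs without Scrapy. The markup in SAMPLE and the helper name div_text are made up for illustration; the real page will differ and may not be well-formed enough for ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page markup (an assumption,
# not the real roxie.com HTML).
SAMPLE = """
<html><body>
  <div style="width: 100%;">Some Film Title</div>
  <div class="tabbertab" title="Synopsis">
    <p>First paragraph of the <b>synopsis</b>.</p>
    <p>Second paragraph.</p>
  </div>
</body></html>
"""


def div_text(markup, css_class):
    """Return the whitespace-normalised text of the first div with this class."""
    root = ET.fromstring(markup.strip())
    div = root.find(".//div[@class='{0}']".format(css_class))
    if div is None:
        return ""
    # itertext() yields every text node below the element, in document order.
    raw = "".join(div.itertext())
    # Collapse runs of whitespace (newlines, indentation) into single spaces.
    return " ".join(raw.split())


description = div_text(SAMPLE, "tabbertab")
print(description)
```

Joining first and normalising whitespace afterwards keeps punctuation attached to the word before it, which stripping each text node separately would not.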

Just replace

item['name'] = hxs.select("text()").extract()

with

item['name'] = site.select("text()").extract()
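
The reason this matters: a relative XPath such as text() is evaluated against whatever selector it is called on. Assuming the original code loops over a list of matched nodes with a loop variable named site (the question's code isn't shown here, so that's a guess), the idea looks roughly like this. It's sketched with the standard library's ElementTree instead of Scrapy, on made-up markup:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: several sibling divs, each with its own text.
SAMPLE = "<body><div class='site'>Alpha</div><div class='site'>Beta</div></body>"

root = ET.fromstring(SAMPLE)
names = []
for site in root.findall(".//div[@class='site']"):
    # Read the text from `site`, not from `root`, so each pass through
    # the loop sees only its own element.
    names.append((site.text or "").strip())

print(names)
```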

Hope that helps.

Tags: Html, Text, Web Crawler, Scrapy
