How to run a Scrapy spider programmatically, like a simple script?
You can run a spider directly in a Python script, without creating a project.
You have to use scrapy.crawler.CrawlerProcess
or scrapy.crawler.CrawlerRunner,
but I'm not sure if they give you all the functionality you get in a project.
See more in the documentation: Common Practices
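A minimal CrawlerRunner sketch, following the pattern from Common Practices and assuming the same MySpider class as in the working example below; unlike CrawlerProcess, it does not start the Twisted reactor for you:

# CrawlerRunner variant - you manage the Twisted reactor yourself
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

configure_logging()                    # CrawlerRunner does not set up logging for you
runner = CrawlerRunner({'USER_AGENT': 'Mozilla/5.0'})

d = runner.crawl(MySpider)             # returns a Twisted Deferred
d.addBoth(lambda _: reactor.stop())    # stop the reactor when the crawl finishes
reactor.run()                          # blocks until the crawl is done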
Or you can put your command in a bash script on Linux or in a .bat
file on Windows.
BTW: on Linux you can add a shebang as the first line (#!/bin/bash),
set the "executable" attribute -
i.e. chmod +x your_script
- and it will run like a normal program.
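A minimal sketch of that bash-script variant (the project path and spider name here are just placeholders for illustration):

#!/bin/bash
# run the spider from inside the (hypothetical) project directory
# and export the scraped items to output.csv
cd /path/to/your_project
scrapy crawl myspider -o output.csv

Save it as e.g. run_spider.sh, run chmod +x run_spider.sh once, and then start it with ./run_spider.sh.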
Working example
#!/usr/bin/env python3

import scrapy


class MySpider(scrapy.Spider):

    name = 'myspider'

    # domain only, without the scheme (the Scrapy demo site)
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com']

    #def start_requests(self):
    #    for tag in self.tags:
    #        for page in range(self.pages):
    #            url = self.url_template.format(tag, page)
    #            yield scrapy.Request(url)

    def parse(self, response):
        print('url:', response.url)
        yield {'url': response.url}  # yield an item so the CSV feed has something to write

# --- it runs without a project and saves the results in `output.csv` ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    'FEED_FORMAT': 'csv',
    'FEED_URI': 'output.csv',
})
c.crawl(MySpider)
c.start()
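To run it, save the script (e.g. as script.py - the name is just an example) and execute python3 script.py, or chmod +x script.py and ./script.py thanks to the shebang; the scraped items end up in output.csv in the current directory. Note that in newer Scrapy versions (2.1+) FEED_FORMAT/FEED_URI are deprecated in favour of a single FEEDS setting, e.g. 'FEEDS': {'output.csv': {'format': 'csv'}}.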