How to run a Scrapy spider programmatically, like a simple script?
You can run a spider directly in a Python script, without creating a project.
You have to use scrapy.crawler.CrawlerProcess
or scrapy.crawler.CrawlerRunner,
but I'm not sure if they give you all the functionality you get in a project.
See more in the documentation: Common Practices
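A minimal CrawlerRunner sketch, following the pattern from Common Practices and assuming the same MySpider class as in the working example below; unlike CrawlerProcess, it does not start the Twisted reactor for you:

# CrawlerRunner variant - you manage the Twisted reactor yourself
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

configure_logging()                    # CrawlerRunner does not set up logging for you
runner = CrawlerRunner({'USER_AGENT': 'Mozilla/5.0'})

d = runner.crawl(MySpider)             # returns a Twisted Deferred
d.addBoth(lambda _: reactor.stop())    # stop the reactor when the crawl finishes
reactor.run()                          # blocks until the crawl is done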
Or you can put your command in a bash script on Linux or in a .bat
file on Windows.
BTW: on Linux you can add a shebang as the first line (#!/bin/bash),
set the "executable" attribute -
i.e. chmod +x your_script
- and it will run like a normal program.
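A minimal sketch of that bash-script variant (the project path and spider name here are just placeholders for illustration):

#!/bin/bash
# run the spider from inside the (hypothetical) project directory
# and export the scraped items to output.csv
cd /path/to/your_project
scrapy crawl myspider -o output.csv

Save it as e.g. run_spider.sh, run chmod +x run_spider.sh once, and then start it with ./run_spider.sh.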
Working example
#!/usr/bin/env python3

import scrapy


class MySpider(scrapy.Spider):

    name = 'myspider'

    # domain only, without the scheme (the Scrapy demo site)
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com']

    #def start_requests(self):
    #    for tag in self.tags:
    #        for page in range(self.pages):
    #            url = self.url_template.format(tag, page)
    #            yield scrapy.Request(url)

    def parse(self, response):
        print('url:', response.url)
        yield {'url': response.url}  # yield an item so the CSV feed has something to write

# --- it runs without a project and saves the results in `output.csv` ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    'FEED_FORMAT': 'csv',
    'FEED_URI': 'output.csv',
})
c.crawl(MySpider)
c.start()
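To run it, save the script (e.g. as script.py - the name is just an example) and execute python3 script.py, or chmod +x script.py and ./script.py thanks to the shebang; the scraped items end up in output.csv in the current directory. Note that in newer Scrapy versions (2.1+) FEED_FORMAT/FEED_URI are deprecated in favour of a single FEEDS setting, e.g. 'FEEDS': {'output.csv': {'format': 'csv'}}.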