Python Scrapy: What is the difference between "runspider" and "crawl" commands?
In the command:
scrapy crawl [options] <spider>
<spider> is the name of a spider inside the project, i.e. the value of the name attribute defined on the spider class (not the project's BOT_NAME from settings.py).
And in the command:
scrapy runspider [options] <spider_file>
<spider_file> is the path to the file that contains the spider.
Otherwise, the options are the same:
Options
=======
--help, -h show this help message and exit
-a NAME=VALUE set spider argument (may be repeated)
--output=FILE, -o FILE dump scraped items into FILE (use - for stdout)
--output-format=FORMAT, -t FORMAT
format to use for dumping items with -o
Global Options
--------------
--logfile=FILE log file. if omitted stderr will be used
--loglevel=LEVEL, -L LEVEL
log level (default: DEBUG)
--nolog disable logging completely
--profile=FILE write python cProfile stats to FILE
--lsprof=FILE write lsprof profiling stats to FILE
--pidfile=FILE write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
set/override setting (may be repeated)
--pdb enable pdb on failure
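For example (the spider name, argument, and file names here are purely illustrative), the same options work with either command; a value passed with -a becomes an attribute on the spider instance, and -o items.json writes the scraped items to a JSON file:
$ scrapy crawl myspider -a category=books -o items.json
$ scrapy runspider myspider.py -a category=books -o items.json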
Since runspider doesn't depend on a project and its settings, you might find it more flexible, depending on how you are customising your scrapers.
A short explanation and the syntax of both:
runspider
Syntax: scrapy runspider <spider_file.py>
Requires project: no
Run a spider self-contained in a Python file, without having to create a project.
Example usage:
$ scrapy runspider myspider.py
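As a rough sketch, a self-contained myspider.py that runspider can execute directly could look like this (the URL and selector are only illustrative):
import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = ["https://example.com"]  # illustrative URL

    def parse(self, response):
        # Yield one item per page; the selector is only an example.
        yield {"title": response.css("title::text").get()}
No project is needed; the command imports the file and runs the spider defined in it.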
crawl
Syntax: scrapy crawl <spider>
Requires project: yes
Start crawling using a spider with the corresponding name.
Usage examples:
$ scrapy crawl myspider
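For crawl to find that spider, it has to live inside a project (the layout below assumes a project created with "scrapy startproject myproject"), and the argument to crawl must match the spider's name attribute:
# myproject/spiders/myspider.py
import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"   # the argument to "scrapy crawl" must match this value
    start_urls = ["https://example.com"]

    def parse(self, response):
        yield {"title": response.css("title::text").get()}
The project's settings.py, generated by startproject, sets SPIDER_MODULES = ["myproject.spiders"], which is where crawl looks for spider classes and matches them by name.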
The main difference is that runspider does not need a project. That is, you can write a spider in a myspider.py file and call scrapy runspider myspider.py.
The crawl command requires a project in order to find the project's settings, load available spiders from the SPIDER_MODULES setting, and look up the spider by name.
If you need a quick spider for a short task, runspider requires less boilerplate.