Crawler configuration, start and stop

Spider

Spider is the entry point for starting a crawler. Before starting a crawl, create a Spider object from a PageProcessor, then call run() to start it.

The other components of Spider (Downloader, Scheduler, Pipeline) can each be replaced through the corresponding set method.
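
As a minimal sketch of the steps described here, assuming the WebMagic library (package us.codecraft.webmagic) is on the classpath: the class name ExampleProcessor, the start URL, and the extracted "title" field are illustrative choices, not part of the original article. The Scheduler and Pipeline shown are ones WebMagic ships with; setting them explicitly only demonstrates where custom components would be plugged in.

```java
import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.Spider;
import us.codecraft.webmagic.pipeline.ConsolePipeline;
import us.codecraft.webmagic.processor.PageProcessor;
import us.codecraft.webmagic.scheduler.QueueScheduler;

// Hypothetical processor used only for illustration.
public class ExampleProcessor implements PageProcessor {

    // Default site settings; tuned separately through the Site object.
    private final Site site = Site.me();

    @Override
    public void process(Page page) {
        // Extract the page title and hand it to the pipelines.
        page.putField("title", page.getHtml().xpath("//title/text()").toString());
    }

    @Override
    public Site getSite() {
        return site;
    }

    public static void main(String[] args) {
        Spider.create(new ExampleProcessor())      // build a Spider from a PageProcessor
              .addUrl("https://example.com/")      // placeholder start URL (assumption)
              .setScheduler(new QueueScheduler())  // in-memory URL queue
              .addPipeline(new ConsolePipeline())  // print extracted fields to the console
              .thread(1)                           // single worker thread
              .run();                              // blocks until the crawl finishes
    }
}
```

Because run() blocks the calling thread, a program that needs to stop the crawl from elsewhere would typically start it with start() (or runAsync()) and later call stop() on the same Spider instance.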



Crawler configuration: Site

Site.me() creates a Site object that holds the crawler's configuration, including the encoding, the interval between requests, the timeout, and the number of retries. Here we set just two options: 3 retries and a request interval of one second.

Settings that belong to the target site itself, such as encoding, HTTP headers, timeout, retry strategy, and proxies, are all configured on the Site object.
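
The configuration described above might look roughly like the following, again assuming WebMagic is on the classpath. Only the retry count (3) and the one-second interval come from the text; the charset, timeout, header, and user-agent values are placeholder assumptions added to show where the other options would go, and the class name SiteConfigExample is hypothetical.

```java
import us.codecraft.webmagic.Site;

// Hypothetical holder class for the configuration sketch.
public class SiteConfigExample {

    static Site buildSite() {
        return Site.me()
                .setRetryTimes(3)       // retry a failed download up to 3 times
                .setSleepTime(1000)     // wait 1000 ms between requests
                .setCharset("UTF-8")    // assumed page encoding
                .setTimeOut(10000)      // assumed download timeout, in milliseconds
                .addHeader("Accept-Language", "en-US,en;q=0.9") // example HTTP header
                .setUserAgent("Mozilla/5.0 (compatible; ExampleCrawler/1.0)"); // example UA string
    }
}
```

A PageProcessor would return this object from its getSite() method so that the Spider picks the settings up when it runs.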



Origin blog.csdn.net/qq_39368007/article/details/105047471