Crawler configuration, start and stop
Spider
Spider is the entry point for starting a crawler. Before starting the crawler, we create a Spider object from a PageProcessor and then call run() to start it.
The other components of Spider (Downloader, Scheduler, Pipeline) can each be set through a setter method.
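The steps above can be sketched as follows, assuming the WebMagic library (package us.codecraft.webmagic) is on the classpath. ExamplePageProcessor, the URL https://example.com/ and the single thread are illustrative placeholders and not part of the original text; QueueScheduler and ConsolePipeline are chosen here only to show where components are plugged in.

```java
import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.Spider;
import us.codecraft.webmagic.pipeline.ConsolePipeline;
import us.codecraft.webmagic.processor.PageProcessor;
import us.codecraft.webmagic.scheduler.QueueScheduler;

final class SpiderSketch {

    // Hypothetical processor: stores the page title and follows no links.
    static class ExamplePageProcessor implements PageProcessor {
        private final Site site = Site.me();

        @Override
        public void process(Page page) {
            page.putField("title", page.getHtml().xpath("//title/text()").toString());
        }

        @Override
        public Site getSite() {
            return site;
        }
    }

    // Builds the Spider from a PageProcessor without starting it.
    static Spider buildSpider() {
        return Spider.create(new ExamplePageProcessor())
                .addUrl("https://example.com/")      // placeholder start URL
                .setScheduler(new QueueScheduler())  // in-memory URL queue
                .addPipeline(new ConsolePipeline())  // print extracted fields
                .thread(1);
    }

    public static void main(String[] args) {
        // run() blocks the calling thread until the crawl finishes.
        buildSpider().run();
    }
}
```

Keeping construction in buildSpider() separates configuration from the blocking run() call, so the Spider can be inspected or reconfigured before the crawl begins.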
Crawler configuration: Site
Site.me() creates a Site object that holds the crawler's configuration, including encoding, crawl interval, timeout, number of retries and the like. As a brief example, the number of retries can be set to 3 and the crawl interval to one second.
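A minimal sketch of that setting, assuming WebMagic is on the classpath. The class name RetryAndIntervalSketch and the method buildSite() are hypothetical; only Site.me(), setRetryTimes and setSleepTime come from the library.

```java
import us.codecraft.webmagic.Site;

final class RetryAndIntervalSketch {

    // Returns a Site with 3 retries and a one-second crawl interval.
    static Site buildSite() {
        return Site.me()
                .setRetryTimes(3)    // retry a failed download up to 3 times
                .setSleepTime(1000); // interval between requests, in milliseconds
    }

    public static void main(String[] args) {
        Site site = buildSite();
        System.out.println("retries=" + site.getRetryTimes()
                + ", sleepMs=" + site.getSleepTime());
    }
}
```

In a real crawler this Site would typically be returned from the PageProcessor's getSite() method.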
Configuration that concerns the target site itself, such as encoding, HTTP headers, timeout, retry strategy, proxies, etc., can be set on the Site object.
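The following sketch shows how several of these site-level settings might be chained on one Site object, assuming WebMagic is on the classpath. All concrete values (UTF-8, the 10-second timeout, the user agent string and the Accept-Language header) are assumptions chosen for illustration, and SiteConfigSketch is a hypothetical class name.

```java
import us.codecraft.webmagic.Site;

final class SiteConfigSketch {

    // Builds a Site with illustrative values; none come from the original text.
    static Site buildSite() {
        return Site.me()
                .setCharset("UTF-8")                  // assumed page encoding
                .setTimeOut(10000)                    // assumed timeout, in milliseconds
                .setUserAgent("example-crawler/0.1")  // placeholder user agent
                .addHeader("Accept-Language", "en");  // placeholder HTTP header
    }

    public static void main(String[] args) {
        Site site = buildSite();
        System.out.println("charset=" + site.getCharset()
                + ", timeoutMs=" + site.getTimeOut()
                + ", headers=" + site.getHeaders());
    }
}
```

Proxy settings are left out of this sketch; check the documentation for the WebMagic version in use before configuring them.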