Scrapy framework

Scrapy is an application framework for crawling websites and extracting structured data from them.

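I installed Scrapy with pip, and it kept failing with an error about Twisted. I downloaded the Twisted .whl file, but running pip install Twisted still failed, because that command looks the package up by name on PyPI instead of using the downloaded file. What worked was changing to the directory that contains the file and installing it by its filename: pip install Twisted-18.4.0-cp36-cp36m-win32.whl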
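The cp36-cp36m-win32 part of the filename is why the wheel has to match your Python installation. Under the wheel naming convention (PEP 427), a name has the form distribution-version(-build)-python tag-abi tag-platform tag. Here cp36 means CPython 3.6, cp36m is the ABI tag (the m flag marks a pymalloc build, used in tags up to Python 3.7), and win32 means 32-bit Windows, so a 64-bit Python would need a win_amd64 wheel instead. The sketch below uses hypothetical helpers, parse_wheel_filename and cpython_tag (they are not part of pip), to split such a name and compare it with the running interpreter before installing:

```python
import sys
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Hypothetical helper: split a wheel name following the PEP 427 pattern
    {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl"""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(dist, version, build, py, abi, plat)


def cpython_tag(major: int, minor: int) -> str:
    # Hypothetical helper: CPython interpreter tag, e.g. 3.6 -> "cp36"
    return f"cp{major}{minor}"


tags = parse_wheel_filename("Twisted-18.4.0-cp36-cp36m-win32.whl")
print(tags.python_tag, tags.abi_tag, tags.platform_tag)  # cp36 cp36m win32
# Compare the wheel's Python tag with the interpreter running this script.
print("matches this interpreter:", tags.python_tag == cpython_tag(*sys.version_info[:2]))
```

This only checks the Python tag; a real installer such as pip also checks the ABI and platform tags.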

Scrapy's workflow is roughly as follows:

  1. The engine takes the next link (URL) to crawl from the scheduler.
  2. The engine wraps the URL in a request (Request) and sends it to the downloader.
  3. The downloader fetches the resource and packages it as a response (Response).
  4. The spider parses the Response.
  5. If the parsed result is an entity (Item), it is handed to the item pipeline for further processing.
  6. If the parsed result is a link (URL), the URL is handed to the scheduler to wait to be crawled.
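The six steps can be mimicked by a toy, pure-Python loop. This is not Scrapy's implementation: the in-memory FAKE_WEB dictionary stands in for real HTTP downloads, and all class and function names here are hypothetical. Following the description above, this scheduler holds bare URLs and the engine wraps them into requests; in real Scrapy, spiders yield scrapy.Request objects and the scheduler queues those. The toy scheduler also skips URLs it has already seen, similar in spirit to Scrapy's duplicate filtering.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical in-memory "website": each URL maps to tokens that the toy
# spider turns into items ("item:...") or follow-up links ("link:...").
FAKE_WEB = {
    "http://example.com/": ["item:home", "link:http://example.com/a", "link:http://example.com/b"],
    "http://example.com/a": ["item:page-a", "link:http://example.com/b"],
    "http://example.com/b": ["item:page-b"],
}


@dataclass
class Request:
    url: str


@dataclass
class Response:
    url: str
    body: list


@dataclass
class Item:
    data: str


class Scheduler:
    """Holds URLs waiting to be crawled; skips URLs it has already seen."""

    def __init__(self):
        self.queue = deque()
        self.seen = set()

    def add(self, url):
        if url not in self.seen:
            self.seen.add(url)
            self.queue.append(url)

    def next_url(self):
        return self.queue.popleft() if self.queue else None


def download(request):
    # Step 3: "fetch" the resource and package it as a Response.
    return Response(request.url, FAKE_WEB.get(request.url, []))


def parse(response):
    # Step 4: the spider turns a Response into Items and/or new URLs.
    for token in response.body:
        kind, _, value = token.partition(":")
        if kind == "item":
            yield Item(value)
        elif kind == "link":
            yield value


class CollectPipeline:
    """Step 5: a trivial item pipeline that just stores item data."""

    def __init__(self):
        self.items = []

    def process_item(self, item):
        self.items.append(item.data)
        return item


def run_engine(start_url):
    scheduler = Scheduler()
    pipeline = CollectPipeline()
    scheduler.add(start_url)
    while (url := scheduler.next_url()) is not None:  # Step 1
        response = download(Request(url))  # Steps 2 and 3
        for result in parse(response):  # Step 4
            if isinstance(result, Item):
                pipeline.process_item(result)  # Step 5
            else:
                scheduler.add(result)  # Step 6: a URL goes back to the scheduler
    return pipeline.items


print(run_engine("http://example.com/"))  # ['home', 'page-a', 'page-b']
```

Page b is linked twice but crawled only once, because the scheduler ignores URLs it has already queued.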


Origin http://43.154.161.224:23101/article/api/json?id=325715237&siteId=291194637