Web crawler framework Scrapy explained - Scrapy installation instructions

Installing the Scrapy framework

1. First, upgrade pip from the terminal: python -m pip install --upgrade pip
2. Install wheel (network install recommended): pip install wheel
3. Install lxml (downloading the package and installing it is recommended)
4. Install Twisted (downloading the package and installing it is recommended; see the sketch after these steps)
5. Install Scrapy (network install recommended): pip install Scrapy
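
A compact run-through of the five steps (a sketch for Windows, where pip often cannot compile lxml and Twisted, so they are installed from downloaded .whl files; the wheel filenames below are hypothetical and depend on your Python version and architecture):

  python -m pip install --upgrade pip
  pip install wheel
  pip install lxml-4.x.x-cp37-cp37m-win_amd64.whl      # hypothetical downloaded wheel
  pip install Twisted-19.x.x-cp37-cp37m-win_amd64.whl  # hypothetical downloaded wheel
  pip install Scrapy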

Testing whether the Scrapy installation succeeded
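
A minimal check from Python, assuming the installation succeeded; importing scrapy should raise no error:

  import scrapy              # fails with ImportError if the install is broken
  print(scrapy.__version__)  # prints the installed version string

Running scrapy version at the terminal performs the same check.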


Scrapy framework commands

scrapy -h: view help

Available commands:
  bench         Run quick benchmark test (scrapy bench is a hardware test command: it measures how many pages per minute the current machine can crawl)
  fetch         Fetch a URL using the Scrapy downloader (scrapy fetch http://www.iqiyi.com/ fetches the HTML source of a page)
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console (see the example after this list)
  startproject  Create new project (cd into the directory where the project should live, then run scrapy startproject <project name> to create a Scrapy project)
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy
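
For example, the interactive console lets you try selectors against a live page before writing any spider code (a sketch; the URL and XPath are only illustrative):

  scrapy shell http://www.baidu.com/
  >>> response.url
  >>> response.xpath('//title/text()').extract_first()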

Creating a project and project description

scrapy startproject adc: create a project named adc

Project description

The directory structure is as follows:

├── firstCrawler
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       └── __init__.py
└── scrapy.cfg

  • scrapy.cfg: the project configuration file
  • items.py: defines the Items for the project, i.e. the attributes or fields of the scraping targets (see the sketch after this list)
  • pipelines.py: processes the Items extracted by the spiders; typical uses are cleaning, validation, and persistence (such as storing them in a database)
  • settings.py: the settings file for the project
  • spiders: the directory where custom spiders are implemented
  • middlewares.py: spider middleware, specific hooks between the engine and the spiders that process spider input (responses) and output (items and requests); they provide a convenient mechanism for extending Scrapy with custom code
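
A minimal sketch of items.py and pipelines.py, assuming a project named adc; the field names (title, url) are invented for illustration, and the pipeline only shows the process_item hook that Scrapy calls for every extracted item:

  # items.py -- define the fields of the scraping target
  import scrapy

  class AdcItem(scrapy.Item):
      title = scrapy.Field()
      url = scrapy.Field()

  # pipelines.py -- clean, verify, or persist extracted items here
  class AdcPipeline(object):
      def process_item(self, item, spider):
          # e.g. validate fields or save the item to a database
          return item  # return it so later pipelines also receive it

To activate the pipeline, it must also be registered in settings.py under ITEM_PIPELINES, e.g. {'adc.pipelines.AdcPipeline': 300}.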


Project commands

Project commands must be executed from inside the project directory: cd into the project first, then run the command.

scrapy -h: view help for the project commands

Available commands:
  bench         Run quick benchmark test
  check         Check spider contracts
  crawl         Run a spider
  edit          Edit spider
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  list          List available spiders
  parse         Parse URL (using its spider) and print the results
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version (scrapy version prints the Scrapy version information)
  view          Open URL in browser, as seen by Scrapy (scrapy view http://www.zhimaruanjian.com/ downloads a page and opens it in the browser)

Creating a spider file

Creating a spider file means generating a crawler file from one of Scrapy's built-in templates.

scrapy genspider -l: list the spider templates available for creating crawler files

Available templates (with explanations):
  basic     creates a basic spider file
  crawl     creates an automatic-crawl spider file
  csvfeed   creates a spider file for crawling CSV data
  xmlfeed   creates a spider file for crawling XML data

Creating a spider from the basic template (the others work the same way)

scrapy genspider -t <template name> <spider file name> <domain name> creates a spider file from the given template; the other templates work the same way.
For example: scrapy genspider -t basic pach baidu.com
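
The generated spider file looks roughly like this (a sketch; the exact boilerplate varies slightly across Scrapy versions):

  # pach.py -- created by: scrapy genspider -t basic pach baidu.com
  import scrapy

  class PachSpider(scrapy.Spider):
      name = 'pach'                    # the name used with scrapy crawl
      allowed_domains = ['baidu.com']  # requests outside this domain are filtered
      start_urls = ['http://baidu.com/']

      def parse(self, response):
          pass                         # extraction logic goes here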


scrapy check <spider file name> tests whether a spider file complies with its contracts.
For example: scrapy check pach



scrapy crawl <spider name>: execute the spider file and display the log [important]

scrapy crawl <spider name> --nolog: execute the spider file without displaying the log [important]
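
Putting the pieces together, parse can be filled in before running scrapy crawl (a sketch that reuses the hypothetical AdcItem from the items.py sketch above; the XPath is illustrative):

  # pach.py -- boilerplate with a filled-in parse method
  import scrapy
  from adc.items import AdcItem  # the hypothetical item defined earlier

  class PachSpider(scrapy.Spider):
      name = 'pach'
      allowed_domains = ['baidu.com']
      start_urls = ['http://baidu.com/']

      def parse(self, response):
          item = AdcItem()
          item['title'] = response.xpath('//title/text()').extract_first()
          item['url'] = response.url
          yield item  # hand the item to the pipelines

Run it with scrapy crawl pach, or add --nolog to suppress the log output.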


Origin: blog.51cto.com/14510224/2434869