Learn Scrapy, the popular open-source web crawling framework


Scrapy installation

Official website: https://scrapy.org/

Installation method

On any operating system, you can install Scrapy with pip:

$ pip install scrapy

To confirm that Scrapy has been installed successfully, first test whether the Scrapy module can be imported in Python:

>>> import scrapy  
>>> scrapy.version_info
(1, 8, 0)
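The same check can be scripted, for example in a setup or CI script. Below is a minimal sketch that uses only the standard library (`importlib.util.find_spec` and `importlib.metadata.version`). The helper names `installed_scrapy_version` and `version_tuple` are hypothetical and are not part of Scrapy. The sketch reads the installed version without importing Scrapy, and it returns `None` when Scrapy is missing.

```python
from importlib import metadata, util


def installed_scrapy_version():
    """Return the installed Scrapy version string, or None if Scrapy is not installed."""
    # find_spec locates the module without actually importing it
    if util.find_spec("scrapy") is None:
        return None
    try:
        # The package is published on PyPI under the distribution name "Scrapy"
        return metadata.version("Scrapy")
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn a plain 'X.Y.Z' release string into a tuple such as (1, 8, 0).

    Only plain numeric releases are handled; suffixes like 'rc1' are not parsed.
    """
    return tuple(int(part) for part in version.split(".")[:3])


version = installed_scrapy_version()
if version is None:
    print("Scrapy is not installed; run: pip install scrapy")
else:
    print("Scrapy", version)
```

`version_tuple` mirrors the shape of `scrapy.version_info` for plain releases. For example, `version_tuple("1.8.0")` gives `(1, 8, 0)`.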


Then, test whether the Scrapy command can be executed in the shell:

(base) λ scrapy 
Scrapy 1.8.0 - no active project 
Usage: 
  scrapy <command> [options] [args] 

Available commands: 
  bench Run quick benchmark test
  fetch Fetch a URL using the Scrapy downloader 
  genspider Generate new spider using pre-defined templates 
  runspider Run a self-contained spider (without creating a project) 
  settings Get settings values 
  shell Interactive scraping console 
  startproject Create new project 
  version Print Scrapy version 
  view Open URL in browser, as seen by Scrapy 

  [ more ] More commands available when run from project directory 

Use "scrapy <command> -h" to see more info about a command
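The shell test can be scripted as well. This is a minimal sketch, assuming the `scrapy` executable is on `PATH` when Scrapy is installed. It runs `scrapy version`, one of the commands listed above, through `subprocess`. It then extracts the number from the `Scrapy X.Y.Z` line that Scrapy prints. The helper names are hypothetical.

```python
import shutil
import subprocess


def parse_version_line(line):
    """Extract '1.8.0' from a line such as 'Scrapy 1.8.0'; return None if it does not match."""
    parts = line.split()
    if len(parts) >= 2 and parts[0] == "Scrapy":
        return parts[1]
    return None


def scrapy_cli_version():
    """Run `scrapy version` and return the reported version, or None if the command fails."""
    exe = shutil.which("scrapy")
    if exe is None:
        return None  # the scrapy command is not on PATH
    result = subprocess.run([exe, "version"], capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return None
    # Scan each stdout line in case something else is printed before the version line
    for line in result.stdout.splitlines():
        version = parse_version_line(line)
        if version is not None:
            return version
    return None
```

`parse_version_line` also accepts the header of the bare `scrapy` help output. For example, `"Scrapy 1.8.0 - no active project"` yields `"1.8.0"`.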

If both tests pass, Scrapy was installed successfully. As shown above, we installed version 1.8.0, the latest release at the time of writing.

Notes:

  • While installing Scrapy, you may run into errors such as missing VC++ (Microsoft Visual C++) components. In that case, you can install the missing modules from offline packages.
  • Even if running scrapy in CMD prints the help text shown above, the installation is not necessarily fully working. Run scrapy bench as a further check: if it finishes without errors, the installation succeeded.

For the detailed installation process, see http://doc.scrapy.org/en/latest/intro/install.html#intro-install-platform-notes, which covers installation on each platform.

Global commands

$ scrapy 
Scrapy 1.7.3 - no active project 
Usage: 
  scrapy <command> [options] [args] 

Available commands: 
  bench Run quick benchmark test 
        ## Tests the computer's performance.
  fetch Fetch a URL using the Scrapy downloader 
        ## Downloads the page source and displays it
  genspider Generate new spider using pre-defined templates 
        ## Creates a new spider file
  runspider Run a self-contained spider (without creating a project) 
        ## Unlike starting a spider with crawl, usage is: scrapy runspider <spider file name>
  settings Get settings values 
        ## Gets the current configuration
  shell Interactive scraping console 
        ## Enters Scrapy's interactive mode
  startproject Create new project 
        ## Creates a crawler project.
  version Print Scrapy version 
  view Open URL in browser, as seen by Scrapy 
        ## Downloads the web page document and displays it in the browser

  [ more ] More commands available when run from project directory 

Use "scrapy <command> -h" to see more info about a command
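The help text above lists only the commands that work anywhere. The `[ more ]` line refers to commands that Scrapy enables only inside a project directory. As a rough sketch, this split can be written as a lookup table. The global set is copied from the help output above. The project-only set (`crawl`, `check`, `list`, `edit`, `parse`) comes from Scrapy's command-line documentation. The function name `where_to_run` is hypothetical, and the exact lists may differ between Scrapy versions.

```python
# Commands shown in the "no active project" help output above
GLOBAL_COMMANDS = {
    "bench", "fetch", "genspider", "runspider", "settings",
    "shell", "startproject", "version", "view",
}

# Commands that only become available when scrapy runs inside a project directory
PROJECT_ONLY_COMMANDS = {"crawl", "check", "list", "edit", "parse"}


def where_to_run(command):
    """Say where a scrapy subcommand can be run: 'anywhere' or 'inside a project'."""
    if command in GLOBAL_COMMANDS:
        return "anywhere"
    if command in PROJECT_ONLY_COMMANDS:
        return "inside a project"
    raise ValueError(f"unknown scrapy command: {command!r}")


# runspider takes a spider file; crawl takes the name of a spider defined in a project
for cmd in ("runspider", "crawl"):
    print(cmd, "->", where_to_run(cmd))
```

This is the difference noted for runspider above. `scrapy runspider` works on a standalone spider file from any directory, while `scrapy crawl` needs a project.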

Project commands

  • scrapy startproject projectname
    Creates a project.
  • scrapy genspider spidername domain
    Creates a spider. Creating a project does not create a spider, so you still need this step afterwards.
  • scrapy crawl spidername
    Runs a spider. Pay attention to the directory the command is run from: crawl only works inside the project directory.
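Because `crawl` must be run from inside the project, both the order of the three commands and their working directory matter. The following sketch scripts them with `subprocess.run(..., cwd=...)`. It assumes that `startproject` creates a folder named after the project, containing `scrapy.cfg`, in the directory where it runs. The project name `myproject`, spider name `example`, domain `example.com`, and the helper names are placeholders.

```python
import subprocess
from pathlib import Path


def project_steps(base_dir, project, spider, domain):
    """Return (argv, cwd) pairs for creating a project, generating a spider, and running it."""
    base = Path(base_dir)
    # startproject creates <base>/<project>/, which holds scrapy.cfg
    project_dir = base / project
    return [
        (["scrapy", "startproject", project], base),
        # genspider and crawl run inside the project so Scrapy can find scrapy.cfg
        (["scrapy", "genspider", spider, domain], project_dir),
        (["scrapy", "crawl", spider], project_dir),
    ]


def run_steps(steps):
    """Run each command in its working directory, stopping at the first failure."""
    for argv, cwd in steps:
        subprocess.run(argv, cwd=cwd, check=True)
```

For example, `run_steps(project_steps(".", "myproject", "example", "example.com"))` creates the project, adds the spider, and starts the crawl. Keeping the plan separate from the execution makes it easy to print or inspect the commands before running them.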


Origin blog.csdn.net/m0_48405781/article/details/114371231