06_Scrapy crawler framework

0. Foreword:

  • While installing Scrapy I ran into an error. I had already changed the source for pip some time ago, but at that point I had no way to change the source for conda. I also worked out that the pip command downloads from the pip (PyPI) source, while the conda command downloads from the Anaconda source. In the end, I installed Scrapy today with conda install scrapy.
  • In fact, the point of changing the source is just to make downloads faster and more convenient. If you do not mind the extra typing, you can note down a few commonly used sources (Tsinghua, Baidu) and then, whether you are using pip or conda, specify the source explicitly each time you download; a sketch of making this permanent follows this list.
  • Pip command with a specified source (using the Tsinghua source): pip install package_name -i https://pypi.tuna.tsinghua.edu.cn/simple
  • Conda command (using the Tsinghua source): note that conda does not accept the pip-style -i index URL; it specifies a source with -c/--channel and an Anaconda channel mirror instead, e.g. conda install package_name -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
  • Note that conda commands are run in the Anaconda Prompt, while pip can be run in both cmd and the Anaconda Prompt.
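  • A minimal sketch of making the source change permanent, so that neither tool needs the source passed on every command. These exact commands are not from the original article; the pip index is the Tsinghua PyPI mirror above, and the conda channel is Tsinghua's commonly used Anaconda mirror:

        # pip: write the Tsinghua PyPI mirror into pip's configuration once
        pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

        # conda: add the Tsinghua Anaconda mirror as a channel and show channel URLs
        conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
        conda config --set show_channel_urls yes

    After this, a plain pip install scrapy or conda install scrapy will use the mirrors.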

2. Scrapy framework introduction:

  • Note: the purpose of a framework is to free up productivity and reduce repetitive work.
  • Scrapy is a crawler framework developed in Python, used to crawl structured data from websites. The framework provides many basic crawler-related components, has a clear structure and strong extensibility, and on top of Scrapy we can flexibly and efficiently meet all kinds of crawling needs.

3. Scrapy project initialization:

  • In PyCharm's terminal window, use the command scrapy startproject project_name to create a Scrapy crawler project. The project name follows the same naming rules as variables. After this command is executed, a Scrapy framework skeleton is created automatically in PyCharm, provided that Scrapy is already installed in the corresponding environment.
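    For example (the project name myspider here is illustrative, not from the original article), running this in the PyCharm terminal:

        scrapy startproject myspider

    creates a scrapy.cfg file plus a myspider/ package containing the framework skeleton.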

  • Create a Python crawler (spider) file
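    A crawler file can be written by hand under the spiders/ directory or generated with Scrapy's genspider command. Below is a minimal sketch of what such a file looks like; the spider name, domain and CSS selector are illustrative, not from the original article:

        # myspider/spiders/example.py
        import scrapy

        class ExampleSpider(scrapy.Spider):
            name = "example"                      # used later as: scrapy crawl example
            allowed_domains = ["example.com"]     # restrict crawling to this domain
            start_urls = ["https://example.com"]  # the initial request(s)

            def parse(self, response):
                # extract structured data from the downloaded page
                yield {"title": response.css("title::text").get()}

    The equivalent generator command would be: scrapy genspider example example.com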

  • Get to know the parts of the generated project
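    For reference, the standard layout Scrapy generates looks like this (using the illustrative project name myspider from above):

        myspider/
            scrapy.cfg            # project configuration / deployment entry point
            myspider/
                __init__.py
                items.py          # define the structured data (Item classes) to scrape
                middlewares.py    # spider and downloader middleware hooks
                pipelines.py      # item pipelines: clean, validate and store scraped items
                settings.py       # project settings (user agent, delays, enabled pipelines, ...)
                spiders/          # the crawler files live here
                    __init__.py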

  • Test-run the crawler: note that with the framework, the crawler must be run from the terminal with the command scrapy crawl spider_name. Each time the crawler finishes executing, it is closed automatically.
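    For example, with the illustrative spider above (the names are not from the original article):

        # run the spider named "example"; the process starts, crawls, then shuts itself down
        scrapy crawl example

        # optionally export the scraped items to a file (JSON here)
        scrapy crawl example -o items.json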


4. Workflow of the Scrapy crawler framework:

  • Framework architecture diagram: (image omitted)

  • Framework workflow (steps as described in Scrapy's documentation; a code sketch of the cycle follows this list):
    Step 1: The engine gets the initial crawl requests from the spider.
    Step 2: The engine puts the requests into the scheduler and asks for the next request to crawl.
    Step 3: The scheduler returns the next request to the engine.
    Step 4: The engine sends the request to the downloader, passing through the downloader middleware.
    Step 5: Once the page has been downloaded, the downloader generates a response and sends it back to the engine through the downloader middleware.
    Step 6: The engine passes the response on to the spider for processing, through the spider middleware.
    Step 7: The spider processes the response and returns scraped items and new follow-up requests to the engine.
    Step 8: The engine sends the items to the item pipelines and the new requests to the scheduler, and the cycle repeats from step 2 until the scheduler has no more requests.
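    A minimal sketch of how this cycle looks in code; all names, URLs and selectors here are illustrative (they follow Scrapy's well-known quotes.toscrape.com tutorial target, not this article). The spider's parse method yields both items, which the engine hands to the item pipelines, and new requests, which the engine sends back to the scheduler:

        # spiders/quotes.py (illustrative)
        import scrapy

        class QuotesSpider(scrapy.Spider):
            name = "quotes"
            start_urls = ["https://quotes.toscrape.com"]  # step 1: the initial request

            def parse(self, response):
                # step 7: the spider turns the response into items and new requests
                for quote in response.css("div.quote"):
                    yield {                               # item -> engine -> item pipelines
                        "text": quote.css("span.text::text").get(),
                        "author": quote.css("small.author::text").get(),
                    }
                next_page = response.css("li.next a::attr(href)").get()
                if next_page:
                    # new request -> engine -> scheduler, and the cycle repeats
                    yield response.follow(next_page, callback=self.parse)

        # pipelines.py (illustrative): receives every item the spider yields
        class SavePipeline:
            def process_item(self, item, spider):
                # clean, validate or store the item here; return it to keep it
                return item

    For the pipeline to actually run, it has to be enabled in settings.py through the ITEM_PIPELINES setting.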
