#scrapy实战# Crawling bidding website information (1)

First, some background on the project:

(image: table of the fields that need to be crawled)

The table above lists the information that needs to be crawled. Based on these extraction requirements, the first step is to work out where each piece of content is located on the target website.
Open the target website first. Here I use ebnew.com as the example, with the search keyword: router.

Open this page: https://ss.ebnew.com/tradingSearch/index.htm

On the search results list you can see: the information type, title, product category, bidding method, and bidding deadline.
(screenshot: search results page)
Open one of the project detail links to enter the secondary page. Here you can see the project number (empty in this case) and the industry:

(screenshot: top of the project detail page)

Scrolling further down the page, you can see the project number:

(screenshot: lower part of the project detail page)

Once you have confirmed where each required field sits on the page, the next step is to decide how to implement it. The Scrapy framework is used here, since it is well known in the crawling field. With the framework settled, build the Scrapy project step by step.
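Before running any commands, it helps to note how these fields might eventually map onto a Scrapy Item inside the project created below. This is only a minimal sketch; the class and field names are my own, hypothetical choices, not from the original project:

    # items.py - hypothetical item definition for the fields identified above
    import scrapy

    class ZhaobiaoItem(scrapy.Item):
        info_type = scrapy.Field()         # information type
        title = scrapy.Field()             # project title
        product_category = scrapy.Field()  # product category
        bid_method = scrapy.Field()        # bidding method
        deadline = scrapy.Field()          # bidding deadline
        project_number = scrapy.Field()    # project number (detail page)
        industry = scrapy.Field()          # industry (detail page)

The spider can then populate and yield instances of this item from its parse callbacks.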
Open a terminal (you can work directly in the terminal inside PyCharm):

Create a scrapy project:

scrapy startproject ZHAOBIAO

D:\爬虫\pythonProject\实战>scrapy startproject ZHAOBIAO
New Scrapy project 'ZHAOBIAO', using template directory 'd:\python3.8.6\lib\site-packages\scrapy\templates\project', created in:
    D:\爬虫\pythonProject\实战\ZHAOBIAO

You can start your first spider with:
    cd ZHAOBIAO
    scrapy genspider example example.com
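For reference, the template generates the standard Scrapy project layout, roughly as follows (the exact files may vary slightly between Scrapy versions):

    ZHAOBIAO/
        scrapy.cfg            # deploy configuration
        ZHAOBIAO/
            __init__.py
            items.py          # item definitions
            middlewares.py    # spider / downloader middlewares
            pipelines.py      # item pipelines
            settings.py       # project settings (headers, proxies, etc.)
            spiders/
                __init__.py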

Enter the project directory:

cd ZHAOBIAO

D:\爬虫\pythonProject\实战>cd ZHAOBIAO

D:\爬虫\pythonProject\实战\ZHAOBIAO>

Create a spider file:

scrapy genspider bilian "ebnew.com"

D:\爬虫\pythonProject\实战\ZHAOBIAO>scrapy genspider bilian "ebnew.com"
Created spider 'bilian' using template 'basic' in module:
  ZHAOBIAO.spiders.bilian

D:\爬虫\pythonProject\实战\ZHAOBIAO>

The Scrapy project has now been created, along with the bilian spider file.
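At this point, ZHAOBIAO/spiders/bilian.py should contain the standard "basic" spider template, roughly like this (reproduced from memory, so details such as the default start URL may differ by Scrapy version):

    import scrapy

    class BilianSpider(scrapy.Spider):
        name = 'bilian'
        allowed_domains = ['ebnew.com']
        start_urls = ['http://ebnew.com/']

        def parse(self, response):
            # parsing logic for the search results will go here
            pass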
Next, you need to configure a few things in Scrapy, mainly the request headers and the proxy IP.

Note: whenever crawling is involved, setting the request headers and a proxy IP should be an instinctive first priority.
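The follow-up article covers this configuration in detail, but as a rough sketch of where these settings usually live: default headers go in settings.py, and a per-request proxy is usually assigned in a downloader middleware. The header values, middleware name, and proxy address below are placeholders, not values from the original project:

    # settings.py (excerpt)
    USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'  # placeholder UA string
    DEFAULT_REQUEST_HEADERS = {
        'Accept-Language': 'zh-CN,zh;q=0.9',
    }
    DOWNLOADER_MIDDLEWARES = {
        'ZHAOBIAO.middlewares.ProxyMiddleware': 543,
    }

    # middlewares.py (excerpt) - hypothetical proxy middleware
    class ProxyMiddleware:
        def process_request(self, request, spider):
            # attach a proxy to every outgoing request; swap in a real proxy pool here
            request.meta['proxy'] = 'http://127.0.0.1:8888'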

So where do these settings actually go? To answer that you need a clear picture of the Scrapy framework itself, so here is the schematic diagram of its architecture:

(diagram: Scrapy architecture and data flow)

You can look up what each component means, and how data flows between them, on your own first. Covering all of it in one article would be too much, so the rest continues in #scrapy实战# Crawling bidding website information (2).

Source: blog.csdn.net/weixin_42961082/article/details/109922243