First, the background of the project.
The table above lists the information that needs to be crawled. Based on these extraction requirements, start by analyzing where the content to be extracted is located on the target website.
First open the target website. Here I take the Internet as an example, with the search keyword: Router
Open this website: https://ss.ebnew.com/tradingSearch/index.htm
What you can see includes: information type, title, product category, bidding method, bidding deadline
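The list-page fields named above could be captured in a small record type before any crawling code is written. This is only a sketch: the class name, the field names, and the sample values are hypothetical and do not come from the website or the original project.

```python
from dataclasses import dataclass, asdict


@dataclass
class BiddingListEntry:
    """One row of the search result list (field names are assumptions)."""
    info_type: str          # information type
    title: str              # title of the announcement
    product_category: str   # product category
    bidding_method: str     # bidding method
    bidding_deadline: str   # deadline, kept as the raw text shown on the page


# Placeholder values only, to show how a row would be filled in.
example = BiddingListEntry(
    info_type="<information type>",
    title="<title>",
    product_category="<product category>",
    bidding_method="<bidding method>",
    bidding_deadline="<deadline text>",
)
print(asdict(example))
```

Recent Scrapy versions can treat dataclass objects as items, so a record like this could later be yielded from a spider; a `scrapy.Item` subclass with the same fields would work equally well.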
Open one of the project details to enter the secondary page:
you can see the project number (empty here) and the industry.
Scrolling further down the page, you can see the project number. Once you have confirmed where the content you need is on the page, the next step is to decide how to implement the crawl. The Scrapy framework is used here, since it is relatively well known in the crawler field. With the framework decided, build a Scrapy project step by step.
Open the terminal (you can use the terminal in PyCharm):
Create a Scrapy project:
scrapy startproject ZHAOBIAO
D:\爬虫\pythonProject\实战>scrapy startproject ZHAOBIAO
New Scrapy project 'ZHAOBIAO', using template directory 'd:\python3.8.6\lib\site-packages\scrapy\templates\project', created in:
D:\爬虫\pythonProject\实战\ZHAOBIAO
You can start your first spider with:
cd ZHAOBIAO
scrapy genspider example example.com
Enter the project directory:
cd ZHAOBIAO
D:\爬虫\pythonProject\实战>cd ZHAOBIAO
D:\爬虫\pythonProject\实战\ZHAOBIAO>
Create a crawler file:
scrapy genspider bilian "ebnew.com"
D:\爬虫\pythonProject\实战\ZHAOBIAO>scrapy genspider bilian "ebnew.com"
Created spider 'bilian' using template 'basic' in module:
ZHAOBIAO.spiders.bilian
D:\爬虫\pythonProject\实战\ZHAOBIAO>
The Scrapy project has now been created successfully, along with the bilian crawler file.
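To confirm that both commands did what was expected, the project folder can be checked against the default layout that `scrapy startproject` and `scrapy genspider` normally produce. The sketch below assumes the default project template of recent Scrapy versions; the helper name `missing_project_files` is hypothetical.

```python
from pathlib import Path

# Files expected after `scrapy startproject ZHAOBIAO` and
# `scrapy genspider bilian "ebnew.com"` (assuming the default template).
EXPECTED_FILES = [
    "scrapy.cfg",
    "ZHAOBIAO/__init__.py",
    "ZHAOBIAO/items.py",
    "ZHAOBIAO/middlewares.py",
    "ZHAOBIAO/pipelines.py",
    "ZHAOBIAO/settings.py",
    "ZHAOBIAO/spiders/__init__.py",
    "ZHAOBIAO/spiders/bilian.py",
]


def missing_project_files(project_root):
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from inside the outer ZHAOBIAO directory (the one with scrapy.cfg).
    missing = missing_project_files(".")
    print("Missing files:", missing if missing else "none")
```

An empty result means the project skeleton and the bilian spider file are both in place.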
Next, you need to configure Scrapy, mainly to set the request headers and the proxy IP.
Note: whenever crawling is involved, setting the request headers and proxy IP should be the first thing that comes to mind.
So where are these set? To answer that, you need to sort out and understand the Scrapy framework. Here is the schematic diagram of the Scrapy framework:
For the meaning of each component, you can first look up each concept and the data flow diagram yourself.
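As a rough preview of where request headers and a proxy IP can live in a Scrapy project, here is a minimal sketch. `USER_AGENT`, `DEFAULT_REQUEST_HEADERS` and `DOWNLOADER_MIDDLEWARES` are standard Scrapy settings, and Scrapy's built-in proxy handling reads `request.meta["proxy"]`. Everything else is an assumption for illustration: the header values, the `ProxyMiddleware` class name, and the placeholder proxy address are not taken from the original project.

```python
# --- would go in ZHAOBIAO/settings.py (fragment) ---

# Browser-like User-Agent; the exact string is a placeholder.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

# Extra headers sent with every request (values are assumptions).
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
}

# Enable the hypothetical proxy middleware defined below.
DOWNLOADER_MIDDLEWARES = {
    "ZHAOBIAO.middlewares.ProxyMiddleware": 543,
}


# --- would go in ZHAOBIAO/middlewares.py ---

class ProxyMiddleware:
    """Hypothetical downloader middleware that routes requests via a proxy."""

    # Placeholder address; replace with a real proxy you are allowed to use.
    PROXY_URL = "http://127.0.0.1:8888"

    def process_request(self, request, spider):
        # Scrapy's HttpProxyMiddleware picks up the proxy from request.meta.
        request.meta["proxy"] = self.PROXY_URL
        return None  # let the request continue through the other middlewares
```

In terms of the data flow, the settings apply to the whole project, while the middleware sits between the engine and the downloader and touches each outgoing request.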
Covering everything in one article would be too much, so the follow-up content is in "#scrapy in practice# Crawling bidding website information (2)".