Targeted crawler for stock information
Function description:
Goal: obtain the names and trading information of all stocks listed on the Shanghai and Shenzhen stock exchanges.
Output: saved to a file.
Technical route: requests, bs4, re
Candidate crawl sites:
Sina Stock http://finance.sina.com.cn/stock/
Baidu Stock https://gupiao.baidu.com/stock/
Website selection:
Principle: the stock information must exist statically in the HTML page rather than being generated by JavaScript, and the site's robots protocol must not restrict crawling it.
Method: use the browser's F12 developer tools, view the page source, and so on.
Don't get hung up on a single website; look for several information sources and try them.
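The two selection checks above can be sketched in code. This is a minimal illustration rather than part of the original course material: the function names allowed_by_robots and appears_in_static_html are hypothetical, and the sample robots.txt and HTML strings are made up so that the snippet runs without network access. In real use you would pass in the text of the site's robots.txt and the page source fetched with requests.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def appears_in_static_html(html, marker):
    """Return True if `marker` (for example a stock name you can see in the
    browser) is already present in the raw HTML, i.e. it was not injected
    later by JavaScript."""
    return marker in html


if __name__ == "__main__":
    # Made-up sample data, only for demonstration.
    sample_robots = "User-agent: *\nDisallow: /private/\n"
    sample_html = "<html><body><a href='sh600000.html'>Example Stock</a></body></html>"
    print(allowed_by_robots(sample_robots, "https://example.com/stock/sh600000.html"))
    print(appears_in_static_html(sample_html, "Example Stock"))
```

If the marker text is visible in the browser but missing from the raw HTML, the page is probably filled in by a script and is a poor fit for a requests + bs4 crawler.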
(In the video, the teacher said that the HTML of the Sina Stock pages did not contain individual stock information. It does now, but I will use Baidu Stock first.)
Because the Baidu Stock page does not list all stocks, we first obtain the list of all stocks from Oriental Fortune (Eastmoney).
Program structure design:
1. Get the stock list from Oriental Fortune (Eastmoney)
2. Using that list, fetch each stock's information from Baidu Stock, one stock at a time
3. Save the result to a file
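The three steps above might be organized as in the following sketch, which uses the requests, bs4 and re route named earlier. Several details are assumptions rather than facts from these notes: the function names, the idea that stock codes look like sh600000 or sz000001 inside link hrefs, the stock-bets / bets-name class names and dt/dd layout of an individual stock page, and the prefix + code + ".html" URL pattern. Check each of these against the live pages before relying on it.

```python
import re

import requests
from bs4 import BeautifulSoup


def get_html_text(url, encoding="utf-8"):
    """Fetch a page and return its text, or "" on any request error."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = encoding
        return r.text
    except requests.RequestException:
        return ""


def parse_stock_list(html):
    """Step 1: collect stock codes from link hrefs.
    ASSUMPTION: codes appear as sh/sz followed by six digits."""
    soup = BeautifulSoup(html, "html.parser")
    codes = []
    for a in soup.find_all("a", href=True):
        match = re.search(r"s[hz]\d{6}", a["href"])
        if match and match.group(0) not in codes:
            codes.append(match.group(0))
    return codes


def parse_stock_info(html):
    """Step 2: turn one stock page into a dict.
    ASSUMED layout: <div class="stock-bets"> holds the name in an element
    with class "bets-name" and key/value pairs in <dt>/<dd> tags."""
    soup = BeautifulSoup(html, "html.parser")
    block = soup.find("div", attrs={"class": "stock-bets"})
    if block is None:
        return {}
    info = {}
    name = block.find(attrs={"class": "bets-name"})
    if name is not None:
        info["name"] = name.get_text(strip=True)
    for key, value in zip(block.find_all("dt"), block.find_all("dd")):
        info[key.get_text(strip=True)] = value.get_text(strip=True)
    return info


def crawl(list_url, info_url_prefix, out_path):
    """Run the three steps; the caller supplies both URLs and the file path.
    ASSUMPTION: a stock page lives at info_url_prefix + code + ".html"."""
    codes = parse_stock_list(get_html_text(list_url))
    with open(out_path, "a", encoding="utf-8") as f:
        for code in codes:
            info = parse_stock_info(get_html_text(info_url_prefix + code + ".html"))
            if info:
                # Step 3: one dict per line in the output file.
                f.write(str(info) + "\n")
```

Keeping the parsing functions separate from the fetching function means they can be tried on saved HTML before any request is sent.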