Crawler 10 - Targeted Crawl of Stock Information

Stock information targeted crawling

Function description:

The goal is to obtain the names and trading information of all stocks listed on the Shanghai and Shenzhen stock exchanges.

The output is saved to a file.

Technical route: requests, bs4, re

Candidate crawl sites:

Sina Stock http://finance.sina.com.cn/stock/

Baidu Stock https://gupiao.baidu.com/stock/

Website selection:

Selection principle: the stock information should exist statically in the HTML page rather than being generated by JavaScript, and the site should have no Robots protocol restrictions.

Selection method: open the browser developer tools (F12), view the page source, and so on.

Selection mindset: don't get stuck on one particular website; look for several information sources and try them.
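The two selection criteria can also be checked from a script instead of by eye. The following is only a minimal sketch: the helper names `is_static_in_html` and `allowed_by_robots` are made up for this illustration, and it assumes you already have the raw page HTML (for example from `requests.get(url).text`) and the text of the site's robots.txt.

```python
import urllib.robotparser


def is_static_in_html(html: str, marker: str) -> bool:
    """Return True if a value seen in the browser (e.g. a stock code or
    price) also appears in the raw HTML, i.e. it is not injected by JS."""
    return marker in html


def allowed_by_robots(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text allows `agent` to fetch `url`."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    sample_html = '<ul><li><a href="/stock/sh600000.html">sh600000</a></li></ul>'
    sample_robots = "User-agent: *\nDisallow: /private/\n"
    print(is_static_in_html(sample_html, "sh600000"))
    print(allowed_by_robots(sample_robots, "https://example.com/stock/"))
```

If the marker is visible in the browser but missing from the raw HTML, the page is probably filling it in with JavaScript and is a poor fit for this technical route.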

(In the video, the teacher said that the HTML of the Sina Stock pages contains no individual stock information. It does now, but I will use Baidu Stock for the time being.)

Since Baidu Stock has no single page that lists every stock, we first obtain the full stock list from Oriental Fortune.com (Eastmoney).

Program structure design:

1. Get the stock list from Oriental Fortune.com

2. For each stock in the list, visit its Baidu Stock page and get the individual stock information

3. Save the result to a file
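The three steps above can be sketched roughly as follows. This is an outline under stated assumptions, not the course's actual code: the function names are invented for this example, the list page is assumed to link each stock through an `href` ending in `<code>.html`, and each stock page is assumed to present its fields as `<dt>name</dt><dd>value</dd>` pairs. The parsing uses only `re` to keep the sketch short; with real pages, bs4 would normally locate the relevant tags first.

```python
import json
import re
from typing import Callable, Dict, List


def fetch_html(url: str) -> str:
    """Download a page and return its text (needs the requests package)."""
    import requests  # imported here so the parsing helpers work without it
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    resp.encoding = resp.apparent_encoding
    return resp.text


def get_stock_list(list_html: str) -> List[str]:
    """Step 1: extract codes such as sh600000 or sz000001 from the list page."""
    # Assumption: every stock is linked through an href ending in <code>.html.
    codes = re.findall(r'href="[^"]*/(s[hz]\d{6})\.html"', list_html)
    return list(dict.fromkeys(codes))  # drop duplicates, keep first-seen order


def parse_stock_info(stock_html: str) -> Dict[str, str]:
    """Step 2: turn one stock page into a {field name: value} dict."""
    # Assumption: fields appear as <dt>name</dt><dd>value</dd> pairs.
    pairs = re.findall(
        r"<dt[^>]*>(.*?)</dt>\s*<dd[^>]*>(.*?)</dd>", stock_html, re.S
    )
    return {name.strip(): value.strip() for name, value in pairs}


def crawl(list_url: str, stock_url: str, out_path: str,
          fetch: Callable[[str], str] = fetch_html) -> int:
    """Run steps 1 to 3 and return the number of stocks written."""
    written = 0
    codes = get_stock_list(fetch(list_url))
    with open(out_path, "w", encoding="utf-8") as out:
        for code in codes:
            try:
                info = parse_stock_info(fetch(stock_url + code + ".html"))
            except Exception:
                continue  # skip a stock whose page cannot be fetched or parsed
            if not info:
                continue  # page had no recognizable fields
            # Step 3: one JSON object per line, keeping non-ASCII text readable.
            record = {"code": code, **info}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

Passing `fetch` as a parameter keeps the download step separate from the parsing, so the parsing can be tried on saved HTML before any real request is made. A call might look like `crawl(list_page_url, "https://gupiao.baidu.com/stock/", "stocks.txt")`, where `list_page_url` stands for whichever list page you settle on.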

 
