Collecting Tiantian Fund data with Python to keep up with the latest fund trends

Implementation outline

1. Analysis

What data do we need, and where does it come from?

2. Code implementation

  1. Send the request
  2. Get the data
  3. Parse the data
  4. Crawl multiple pages
  5. Save the data

Key points

  • Sending requests with requests
  • Using the browser developer tools
  • Parsing JSON-style data
  • Using regular expressions

Development environment

  • Version: Python 3.8
  • Editor: PyCharm 2021.2

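Goal of this case

Collect the fund ranking data behind http://fund.eastmoney.com/data/fundranking.html, the page that the request code names as its Referer.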


Analyzing the website

Step 1: Open the developer tools: press F12, or right-click the page and choose Inspect
Step 2: Refresh the page, click the search tool in the developer tools, enter a fund code in the search box, and run the search

Step 3: Find the real URL that returns the data
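Once the real URL is found, it helps to split it into its query parameters to see what the request actually asks for. The sketch below uses only the standard library and the URL from this article. The meaning of each parameter is not documented here, so any reading of them (for example `pi` as the page index, `pn` as the page size, `sd` and `ed` as start and end dates) is an assumption based on their values.

```python
from urllib.parse import parse_qs, urlsplit

url = ('http://fund.eastmoney.com/data/rankhandler.aspx?op=ph&dt=kf&ft=all'
       '&rs=&gs=0&sc=6yzf&st=desc&sd=2020-12-16&ed=2021-12-16&qdii='
       '&tabSubtype=,,,,,&pi=1&pn=50&dx=1')

# keep_blank_values=True keeps parameters such as rs= and qdii= that carry no value
params = parse_qs(urlsplit(url).query, keep_blank_values=True)

for name, values in params.items():
    print(f'{name} = {values[0]!r}')
```

Comparing this listing for two different result pages is a quick way to spot which parameter changes.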

Writing the code

Import modules

import requests
import re
import csv

Send the request

url = 'http://fund.eastmoney.com/data/rankhandler.aspx?op=ph&dt=kf&ft=all&rs=&gs=0&sc=6yzf&st=desc&sd=2020-12-16&ed=2021-12-16&qdii=&tabSubtype=,,,,,&pi=1&pn=50&dx=1'
# Request headers copied from the browser's developer tools
headers = {
    'Cookie': 'HAList=a-sz-300059-%u4E1C%u65B9%u8D22%u5BCC; em_hq_fls=js; qgqp_b_id=7b7cfe791fce1724e930884be192c85e; _adsame_fullscreen_16928=1; st_si=59966688853664; st_asi=delete; st_pvi=79368259778985; st_sp=2021-12-07%2014%3A33%3A35; st_inirUrl=https%3A%2F%2Fwww.baidu.com%2Flink; st_sn=3; st_psi=20211216201351423-112200312936-0028256540; ASP.NET_SessionId=miyivgzxegpjaya5waosifrb',
    'Host': 'fund.eastmoney.com',
    'Referer': 'http://fund.eastmoney.com/data/fundranking.html',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
}
response = requests.get(url=url, headers=headers)
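A bare `requests.get` call waits indefinitely if the server never answers and does not complain about an error status. The following is a minimal sketch of a more defensive request, not part of the original code: `fetch_text` and its `get` parameter are hypothetical names, and the 10-second timeout is an arbitrary choice. `timeout=` and `raise_for_status()` are standard requests features.

```python
def fetch_text(url, headers, timeout=10.0, get=None):
    """Return the response body, raising on timeouts and HTTP error statuses."""
    if get is None:
        # Imported here so the helper can also be tried with a stand-in
        # `get` callable on machines where requests is not installed.
        import requests
        get = requests.get
    response = get(url, headers=headers, timeout=timeout)
    # Turns 4xx/5xx answers into an exception instead of parsing an error page
    response.raise_for_status()
    return response.text
```

With this helper, `data = fetch_text(url, headers)` would stand in for the `requests.get` call followed by `response.text`.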

Get the data

data = response.text

Parse the data

Filter the data

# Capture everything between the first '[' and the next ']'
data_str = re.findall(r'\[(.*?)\]', data)[0]
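To see what the regular expression captures, it can be tried on a small stand-in string. The `sample` text below is made up for illustration: its overall shape (a bracketed list of quoted, comma-separated records) is an assumption about the response, and the fund codes, names, and values are placeholders.

```python
import re

# Made-up sample; only the bracketed-list shape matters here
sample = 'var rankData = {datas:["000001,Fund A,1.23","000002,Fund B,4.56"],allRecords:2};'

# Non-greedy match: capture from the first '[' up to the first ']' that follows
match = re.search(r'\[(.*?)\]', sample)
data_str = match.group(1) if match else ''
print(data_str)
```

Using `re.search` with a check on the result avoids the `IndexError` that `re.findall(...)[0]` raises when nothing matches. Because the match stops at the first `]`, a record that itself contains `]` would be cut short.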

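Convert the data type

import ast

# literal_eval only accepts Python literals, so unlike eval it cannot run code sent by the server
tuple_data = ast.literal_eval(data_str)
for td in tuple_data:
    # Turn td into a list
    td_list = td.split(',')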
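As an alternative to evaluating the captured text as a Python expression, it can be treated as JSON. This is a sketch under the assumption that the captured text is a comma-separated run of double-quoted strings, as in the placeholder `data_str` below. Wrapping it in brackets then gives a JSON array.

```python
import json

# Placeholder text in the shape the regular expression is assumed to capture
data_str = '"000001,Fund A,1.23","000002,Fund B,4.56"'

# The brackets make this a JSON array, so the result is always a list,
# even when the page holds a single record
records = json.loads(f'[{data_str}]')

# Each record is one comma-separated string; split it into fields
rows = [record.split(',') for record in records]
print(rows)
```

This also sidesteps a corner case of evaluating the bare text: with only one record, the result would be a single string rather than a sequence of strings.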

Turn pages

Analyze how the URL changes from one page number to the next: only the pi parameter differs

for page in range(1, 193):
    print(f'-------------------------Crawling page {page}-----------------------')
    url = f'http://fund.eastmoney.com/data/rankhandler.aspx?op=ph&dt=kf&ft=all&rs=&gs=0&sc=6yzf&st=desc&sd=2020-12-16&ed=2021-12-16&qdii=&tabSubtype=,,,,,&pi={page}&pn=50&dx=1'
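The page loop can also be written as a small generator that yields one URL per page, which keeps URL building separate from requesting and parsing. This is only a sketch: `URL_TEMPLATE` and `page_urls` are hypothetical names, and the last page number is passed in because the right value depends on how many funds are listed when the script runs.

```python
URL_TEMPLATE = ('http://fund.eastmoney.com/data/rankhandler.aspx?op=ph&dt=kf&ft=all'
                '&rs=&gs=0&sc=6yzf&st=desc&sd=2020-12-16&ed=2021-12-16&qdii='
                '&tabSubtype=,,,,,&pi={page}&pn=50&dx=1')


def page_urls(first_page, last_page):
    """Yield the URL of every page from first_page to last_page, inclusive."""
    for page in range(first_page, last_page + 1):
        yield URL_TEMPLATE.format(page=page)


for url in page_urls(1, 3):
    print(url)
```

Calling `page_urls(1, 192)` covers the same pages as `range(1, 193)`.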

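Save the data

# These lines run inside the `for td in tuple_data` loop, once per fund.
# mode='a' appends to the file; '基金' in the file name means 'fund'.
with open('基金.csv', mode='a', encoding='utf-8', newline='') as f:
    csv_write = csv.writer(f)
    csv_write.writerow(td_list)
print(td)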
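Opening the file once per row works but repeats the same work for every fund. A possible variation, sketched below, collects the rows of one page and writes them in a single call. `save_rows` is a hypothetical helper, the rows are placeholders, and the demo writes to a temporary directory rather than to the real output file.

```python
import csv
import os
import tempfile


def save_rows(rows, path):
    """Append every row to the CSV file at `path`, opening the file only once."""
    with open(path, mode='a', encoding='utf-8', newline='') as f:
        csv.writer(f).writerows(rows)


# Demo with placeholder rows
rows = [['000001', 'Fund A', '1.23'], ['000002', 'Fund B', '4.56']]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'funds.csv')
    save_rows(rows, path)
    # Read the file back to confirm what was written
    with open(path, encoding='utf-8', newline='') as f:
        saved = list(csv.reader(f))
print(saved)
```

If the file is meant to be opened in Excel, `encoding='utf-8-sig'` is worth trying, since Excel relies on the byte order mark to recognize UTF-8.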

Run the code and get the data



Origin blog.csdn.net/m0_48405781/article/details/122012280