Python crawler case: a simple way to obtain stock, index, and three major financial statement data

Introduction

  • This case is easy to follow and the code is short; you can use it even if you have not learned Python. You only need to change the stock/index codes and the path where the data file is saved.
  • The purpose of crawling stock, index and financial statement data is to obtain real-time and historical data on financial markets and listed companies for further analysis, research and decision-making. Crawling stock data lets us track price changes and fluctuations, identify potential investment opportunities and risks, and trade more effectively. Crawling index data helps us understand the trend and performance of the market as a whole, monitor overall market risk, and provides an important reference for asset allocation and investment decisions.
  • For the three major financial statements of listed companies (the balance sheet, income statement and cash flow statement), crawling these data gives an in-depth view of a company's financial position, operating performance and cash flows. By analyzing them, investors and researchers can evaluate a company's profitability, solvency and growth potential, supporting investment decisions and risk management.

(For more information and the source code files, follow the WeChat official account "finance melatonin" and use the automatic reply to get them)

1. Preparation

  • Install Anaconda:

    1. First, go to the Anaconda official website (https://www.anaconda.com/products/individual) to download the Anaconda installation program suitable for your operating system.

    2. Run the downloaded Anaconda installer and follow the installation wizard. During installation you can choose whether to add Anaconda to the system's environment variables.

    3. After the installation is complete, open the terminal (Anaconda Prompt, or the command prompt for Windows users) and enter the following command to verify that Anaconda was installed successfully:

      conda --version
      
  • Start Jupyter Notebook:

    1. In the terminal, enter the following command to start Jupyter Notebook:

      jupyter notebook
      
    2. The Jupyter Notebook server will open a new page in the default browser. You will do all your Jupyter Notebook interaction on this page.

    3. In the upper right corner of the page, you can click the "New" button and select "Python 3" to create a new Python 3 Notebook.

  • Install crawler-related libraries

    Enter the following command in the terminal (Anaconda Prompt, or the command prompt for Windows users) to install each required library. This article uses tushare, baostock, akshare, pandas, numpy and matplotlib:

    pip install tushare baostock akshare pandas numpy matplotlib
    

2. Crawl data with different stocks and indices as columns and dates as rows

Call the tushare library to obtain data and save it locally

Select the stock codes '002549', '600008', '300332', '300055', '600292' for the call. The resulting DataFrame is shown below: the columns are the different stocks and the index is the time series (2020-11-01 to 2021-11-01).

(figure: preview of the closing-price DataFrame)
    # Import the required packages
    import pandas as pd
    import tushare as ts
    import numpy as np
    import matplotlib.pyplot as plt  # plotting
    
    symbols = ['002549', '600008', '300332', '300055', '600292']  # stock/index codes
    noa = len(symbols)
    indexes = pd.date_range('2020-11-01', '2021-11-01')
    data = pd.DataFrame(index=indexes)
    for sym in symbols:
        k_d = ts.get_k_data(sym, '2019-01-01', ktype='D')
        k_d['date'] = k_d['date'].astype('datetime64[ns]')
        k_d.set_index('date', inplace=True)
        data[sym] = k_d['close']  # build data from each stock's closing price
    data = data.dropna()
    # Take a quick look at the stock data:
    data.head()
    # Save the data
    data.to_csv('价格数据.csv')
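Once saved, the CSV can be reloaded and the price series normalized to a common starting point for comparison across stocks. A minimal sketch, using a small made-up price table in place of the real crawl result:

```python
import pandas as pd

# Tiny illustrative price table standing in for the saved 价格数据.csv
# (the prices here are made up for the example)
data = pd.DataFrame(
    {"002549": [10.0, 10.5, 10.2], "600008": [3.0, 3.1, 3.05]},
    index=pd.to_datetime(["2020-11-02", "2020-11-03", "2020-11-04"]),
)
data.to_csv("价格数据.csv")

# Reload the saved file and normalize each series to its first value
prices = pd.read_csv("价格数据.csv", index_col=0, parse_dates=True)
normalized = prices / prices.iloc[0]
print(normalized.round(3))
```

Normalizing each series to 1.0 on the first day makes trends of stocks at very different price levels directly comparable on one chart.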

Use the baostock library to obtain detailed data for a single stock

Use the baostock library to get detailed information on a single stock, such as open (opening price), high (intraday high), low (intraday low), preclose (previous day's close) and pctChg (percentage change).

The generated data has the time series as the index and the different fields as the columns.

(figure: preview of the CSI 300 DataFrame)
    import baostock as bs
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from datetime import datetime, date
    
    # Log in to the system
    lg = bs.login()
    code = 'sh.000300'
    start = '2022-01-01'
    end = '2023-07-01'
    
    # Get historical data for the index
    # CSI 300 index
    hs300_price = bs.query_history_k_data_plus(code, "date,code,open,high,low,close,preclose,pctChg",
                  start_date=start, end_date=end, frequency="d")
    # Assemble into a DataFrame
    data_list = []
    while (hs300_price.error_code == '0') and hs300_price.next():
        data_list.append(hs300_price.get_row_data())
    hs300 = pd.DataFrame(data_list, columns=hs300_price.fields)
    # Save the data
    hs300.to_csv('沪深300.csv')
    # Log out
    bs.logout()
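Note that baostock returns every field as a string, so numeric columns usually need converting before analysis. A sketch on a stand-in frame shaped like the query result (the two rows of values are illustrative):

```python
import pandas as pd

# Stand-in for the DataFrame assembled from query_history_k_data_plus rows;
# baostock delivers all fields as strings
hs300 = pd.DataFrame({
    "date": ["2022-01-04", "2022-01-05"],
    "code": ["sh.000300", "sh.000300"],
    "close": ["4917.77", "4807.11"],
    "pctChg": ["-0.4612", "-2.2505"],
})

num_cols = ["close", "pctChg"]
hs300[num_cols] = hs300[num_cols].apply(pd.to_numeric)  # strings -> floats
hs300["date"] = pd.to_datetime(hs300["date"])           # strings -> timestamps
hs300.set_index("date", inplace=True)
print(hs300.dtypes)
```

Converting up front avoids subtle bugs later, e.g. string comparison silently sorting "9" after "4807".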

3. Crawl the three major financial statement data

Use the akshare library to get a list of listed companies

The generated format is as shown below:

(figure: preview of the A-share stock list)

import pandas as pd
import numpy as np
import datetime
from matplotlib import pyplot as plt
import akshare as ak

# Get data for all A-share stocks and save it to stock_basic.csv
stock_zh=ak.stock_zh_a_spot()
stock_zh.to_csv("stock_basic.csv") 

Crawl company financial statement data

  • Get company stock data
stock_zh[stock_zh["名称"]=="复星医药"]  # look up the stock's code (600196)
stock_daily = ak.stock_zh_a_hist(symbol="600196", period="daily", start_date="20220629", end_date='20230629', adjust="qfq")
close_price=stock_daily[["日期","收盘","最高","最低"]]  # date, close, high, low
close_price.set_index("日期",inplace=True)
# Save the data
close_price.to_csv('600196.csv')
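With the closing prices in hand, daily returns follow directly from pandas' pct_change. A sketch on illustrative numbers standing in for the 收盘 (close) column:

```python
import pandas as pd

# Illustrative closing prices shaped like close_price["收盘"]
close = pd.Series(
    [30.0, 30.6, 30.3],
    index=pd.to_datetime(["2022-06-29", "2022-06-30", "2022-07-01"]),
    name="收盘",
)
returns = close.pct_change().dropna()  # simple daily returns
print(returns.round(4))
```

The first return is NaN (no previous day), hence the dropna(); the remaining values feed directly into volatility or cumulative-return calculations.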


  • Obtain the company's three major financial statements
  1. Cash flow statement

(figure: preview of the cash flow statement data)

  2. Income statement

(figure: preview of the income statement data)

  3. Balance sheet

(figure: preview of the balance sheet data)

# Get the cash flow statement
stock_financial_report_sina_df = ak.stock_financial_report_sina(stock="600196", symbol="现金流量表")

geli_sheet1=stock_financial_report_sina_df[stock_financial_report_sina_df["报表日期"]=="20221231"]
# Save the data
stock_financial_report_sina_df.to_excel('600196现金流量表.xlsx')


# Get the income statement
stock_financial_report_sina_lrb = ak.stock_financial_report_sina(stock="600196", symbol="利润表")

geli_sheet2=stock_financial_report_sina_lrb[stock_financial_report_sina_lrb["报表日期"]=="20221231"]
# Save the data
stock_financial_report_sina_lrb.to_excel('600196利润表.xlsx')


# Get the balance sheet (note: stored in its own variable, not reusing the income statement's)
stock_financial_report_sina_zcfzb = ak.stock_financial_report_sina(stock="600196", symbol="资产负债表")

geli_sheet3=stock_financial_report_sina_zcfzb[stock_financial_report_sina_zcfzb["报表日期"]=="20221231"]
# Save the data
stock_financial_report_sina_zcfzb.to_excel('600196资产负债表.xlsx')
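The three nearly identical blocks above can be collapsed into one loop. A sketch in which the fetch function is passed in as a parameter, so it can be set to ak.stock_financial_report_sina for real data or to an offline stub as shown here (the stub's columns are illustrative):

```python
import pandas as pd

# Cash flow statement, income statement, balance sheet
STATEMENTS = ["现金流量表", "利润表", "资产负债表"]

def fetch_statements(stock, fetch, report_date="20221231"):
    """Fetch each statement via fetch(stock=..., symbol=...) and keep the row
    for report_date. Pass fetch=ak.stock_financial_report_sina for live data."""
    results = {}
    for name in STATEMENTS:
        df = fetch(stock=stock, symbol=name)
        results[name] = df[df["报表日期"] == report_date]
    return results

# Offline stub standing in for the akshare call
def fake_fetch(stock, symbol):
    return pd.DataFrame({"报表日期": ["20221231", "20220930"],
                         "科目": [symbol, symbol]})

reports = fetch_statements("600196", fake_fetch)
for name, df in reports.items():
    print(name, len(df))
```

Parameterizing the fetcher also makes the loop easy to test without network access, and each statement could be saved inside the loop with df.to_excel(f'600196{name}.xlsx').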


4. Summary

  • This article introduced how to use Python financial data libraries to crawl stock, index and three major financial statement data.
  • If you want to learn more about financial data visualization and analysis, follow the WeChat official account "finance melatonin" for more content related to financial analysis.

Origin: blog.csdn.net/celiaweiwei/article/details/131863819