Tushare (financial data interface package)

First, Tushare Introduction

  Tushare is a free, open-source Python package for financial data. It handles the whole path from data collection through cleaning to storage for stocks and other financial data, and gives analysts data that is fast to obtain, clean, and varied. This greatly reduces the work of acquiring data, so more attention can go to researching and implementing strategies and models.

  Tushare retrieves its data from Sina Finance, Tencent Finance, the Shanghai Stock Exchange, and the Shenzhen Stock Exchange.

  Tushare official address

1, Installing and using Tushare

(1) Prerequisites

  Install Python and the pandas, lxml, bs4, and requests packages. The environment is as follows:

  

(2) Download and install

# Method one:
pip install tushare

# Method two:
Visit https://pypi.python.org/pypi/Tushare to download and install the package

(3) Upgrade

# Upgrade to the latest version
pip install tushare --upgrade

# Check the version information
import tushare
print(tushare.__version__)

Second, the historical quotes interfaces (get_hist_data / get_h_data)

  get_hist_data interface: retrieves historical trading data for a stock, including moving averages. The ktype parameter selects daily, weekly, or monthly K-lines, as well as 5-, 15-, 30-, and 60-minute K-line data. This interface only returns about the last 3 years of daily data, and suits stock screening and analysis that rely on moving averages.

            If start is empty, the API returns data from the earliest date it provides; if end is empty, it returns data up to the most recent trading day.

import tushare as ts

print(ts.get_hist_data("601318"))

"""
             open   high  close    low  ...    ma20      v_ma5     v_ma10     v_ma20
date                                    ...                                         
2020-01-23  84.01  84.56  83.49  82.48  ...  85.610  807119.55  711352.85  634487.17
2020-01-22  85.00  85.48  85.22  83.83  ...  85.632  690831.24  663221.99  596297.65
2020-01-21  87.00  87.29  85.60  85.60  ...  85.594  648759.18  652369.61  574079.03
2020-01-20  88.30  88.70  87.60  87.35  ...  85.528  646579.65  624830.50  560559.15
2020-01-17  86.15  86.90  86.25  85.85  ...  85.425  621487.27  594870.25  531672.07
...           ...    ...    ...    ...  ...     ...        ...        ...        ...
2017-07-31  51.88  52.64  52.02  51.41  ...  52.094  587775.69  587775.69  587775.69
2017-07-28  52.20  52.46  51.89  51.80  ...  52.113  580718.35  580718.35  580718.35
2017-07-27  51.85  52.74  52.36  51.09  ...  52.187  610526.22  610526.22  610526.22
2017-07-26  52.10  52.50  51.89  51.28  ...  52.100  582222.86  582222.86  582222.86
2017-07-25  52.62  53.05  52.31  52.18  ...  52.310  506834.84  506834.84  506834.84
"""
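To make the relationship between daily and weekly K-lines concrete, here is a minimal sketch that aggregates a few daily bars from the output above into weekly bars using only the standard library. It follows the usual convention (first open, highest high, last close, lowest low); it is not taken from Tushare's source, and the function names and the choice of labeling each week by its last trading day are assumptions made for illustration.

```python
from datetime import date
from itertools import groupby

# Daily bars as (date, open, high, close, low), copied from the
# get_hist_data output above and sorted oldest first.
daily_bars = [
    ("2020-01-17", 86.15, 86.90, 86.25, 85.85),
    ("2020-01-20", 88.30, 88.70, 87.60, 87.35),
    ("2020-01-21", 87.00, 87.29, 85.60, 85.60),
    ("2020-01-22", 85.00, 85.48, 85.22, 83.83),
    ("2020-01-23", 84.01, 84.56, 83.49, 82.48),
]


def iso_week(bar):
    """Return (ISO year, ISO week number) for a bar's date string."""
    year, week, _ = date.fromisoformat(bar[0]).isocalendar()
    return (year, week)


def to_weekly(bars):
    """Aggregate date-sorted daily bars into one bar per ISO week."""
    weekly = []
    for _, group in groupby(bars, key=iso_week):
        week = list(group)
        weekly.append({
            "date": week[-1][0],               # last trading day of the week (assumed label)
            "open": week[0][1],                # first open of the week
            "high": max(b[2] for b in week),   # highest high
            "close": week[-1][3],              # last close of the week
            "low": min(b[4] for b in week),    # lowest low
        })
    return weekly


weekly_bars = to_weekly(daily_bars)
for bar in weekly_bars:
    print(bar)
```

The five daily bars span two calendar weeks, so the sketch produces two weekly bars.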

  get_h_data interface: retrieves the full history of a stock; otherwise the same as above. (Deprecated)

           If start is empty it defaults to the current date; if end is empty it defaults to the same day last year.

  get_k_data interface: retrieves K-line data and merges the functionality of the get_hist_data and get_h_data interfaces. It can return low-frequency daily, weekly, and monthly data as well as higher-frequency 5-, 15-, 30-, and 60-minute data. It can also return forward- or backward-adjusted data going back to the listing date in a single line of code.

           If start is empty, data begins at the listing date; if end is empty, data runs to the most recent trading day.

import tushare as ts

# print(ts.get_hist_data("601318"))
print(ts.get_k_data("601318"))

"""
           date    open   close    high     low     volume    code
0    2017-06-15  44.946  43.984  45.212  43.727  1041983.0  601318
1    2017-06-16  43.908  44.479  44.936  43.908   807231.0  601318
2    2017-06-19  44.727  46.251  46.317  44.470   808481.0  601318
3    2017-06-20  46.451  45.812  46.603  45.403   616355.0  601318
4    2017-06-21  46.003  47.079  47.203  45.298   849757.0  601318
..          ...     ...     ...     ...     ...        ...     ...
635  2020-01-17  86.150  86.250  86.900  85.850   555370.0  601318
636  2020-01-20  88.300  87.600  88.700  87.350   936050.0  601318
637  2020-01-21  87.000  85.600  87.290  85.600   727579.0  601318
638  2020-01-22  85.000  85.220  85.480  83.830   736576.0  601318
639  2020-01-23  84.010  83.490  84.560  82.480  1080020.0  601318
"""

1, Parameter Description

  code: stock code, that is, the 6-digit code, or an index code (sh: Shanghai Composite Index, sz: Shenzhen Component Index, hs300: CSI 300 Index, sz50: SSE 50, zxb: SME Board, cyb: ChiNext)

  start: start date, format YYYY-MM-DD

  end: end date, format YYYY-MM-DD

  ktype: data type (D: daily K-line, W: weekly, M: monthly, 5: 5 minutes, 15: 15 minutes, 30: 30 minutes, 60: 60 minutes; default D)

  retry_count: number of retries after a network error, default 3

  pause: seconds to pause between repeated requests, to avoid problems caused by requests that are too close together; default 0

  autype: price adjustment type; qfq: forward-adjusted, hfq: backward-adjusted, None: unadjusted; default qfq

  index: whether code is an index; set to True to have code treated as an index code; default False
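The retry_count and pause parameters describe a common retry pattern. The sketch below shows one way such behavior could be written; it is not Tushare's implementation. The helper name fetch_with_retry, the simulated flaky_fetch function, and the assumption that retry_count is the total number of attempts are all illustrative.

```python
import time


def fetch_with_retry(fetch, retry_count=3, pause=0):
    """Call fetch() up to retry_count times, sleeping pause seconds between attempts."""
    last_error = None
    for attempt in range(retry_count):
        if attempt > 0:
            time.sleep(pause)          # wait before trying again
        try:
            return fetch()
        except IOError as error:       # stand-in for a network error
            last_error = error
    raise IOError("all %d attempts failed" % retry_count) from last_error


# Simulated data source that fails twice and then succeeds (hypothetical).
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("simulated network error")
    return "data"


result = fetch_with_retry(flaky_fetch, retry_count=3, pause=0)
print(result, calls["n"])
```

A non-zero pause spaces the attempts out, which is the purpose the parameter description gives for it.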

2, Return values

  date: date

  open: opening price

  high: highest price

  close: closing price

  low: lowest price

  volume: trading volume

  price_change: price change

  p_change: percentage change

  ma5: 5-day average price

  ma10: 10-day average price

  ma20: 20-day average price

  v_ma5: 5-day average volume

  v_ma10: 10-day average volume

  v_ma20: 20-day average volume

  turnover: turnover rate (not available for indexes)
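The derived fields can be reproduced from closing prices. The sketch below computes a trailing moving average and the day-over-day change for the five most recent closes shown in the get_hist_data output earlier. The function names are hypothetical, and the rounding and the definition of p_change as a percentage of the previous close are assumptions rather than Tushare's documented behavior.

```python
def moving_average(values, window):
    """Trailing mean over `window` items; None until enough data exists."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            result.append(round(sum(chunk) / window, 3))
    return result


def price_changes(closes):
    """Return (price_change, p_change in percent) per day versus the previous close."""
    changes = [(None, None)]           # the first day has no previous close
    for prev, cur in zip(closes, closes[1:]):
        diff = cur - prev
        changes.append((round(diff, 2), round(diff / prev * 100, 2)))
    return changes


# Closing prices for 2020-01-17 .. 2020-01-23, oldest first.
closes = [86.25, 87.60, 85.60, 85.22, 83.49]

ma5 = moving_average(closes, 5)
changes = price_changes(closes)
print(ma5)
print(changes)
```

The same moving_average helper applied to the volume column would give v_ma5, v_ma10, and v_ma20.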

3, Specifying an index by its code

  Set index to True to indicate that code is an index code.

import tushare as ts

df = ts.get_k_data("000016", index=True)
print(df)

"""
           date     open    close     high      low      volume      code
0    2017-06-15  2474.26  2461.97  2481.80  2453.13  20702545.0  sh000016
1    2017-06-16  2454.84  2452.79  2466.69  2448.31  16518044.0  sh000016
2    2017-06-19  2455.03  2484.12  2486.31  2453.35  20594004.0  sh000016
3    2017-06-20  2489.20  2474.43  2492.22  2467.77  17771153.0  sh000016
4    2017-06-21  2487.85  2497.25  2498.19  2468.51  20354217.0  sh000016
..          ...      ...      ...      ...      ...         ...       ...
635  2020-01-17  3054.04  3053.17  3067.56  3042.59  23304514.0  sh000016
636  2020-01-20  3070.67  3065.99  3072.29  3054.45  27274849.0  sh000016
637  2020-01-21  3048.95  3012.11  3050.82  3011.18  32801291.0  sh000016
638  2020-01-22  2996.81  3017.88  3023.90  2965.66  33159760.0  sh000016
639  2020-01-23  2993.77  2932.49  2993.77  2910.39  42839352.0  sh000016
"""

Third, price-adjusted data

1, The concept of price adjustment

  Price adjustment repairs a stock's price and volume series after ex-rights and ex-dividend events. The price chart is drawn according to the stock's actual gains and losses, and volume is rescaled to the same share-capital basis so that different periods can be compared on equal terms. Adjustment removes the distortion that ex-rights and ex-dividend events introduce into the price series and keeps the price series continuous.

  Unadjusted: after an ex-rights or ex-dividend event, the large gap on the price chart is not filled in, and the discontinuity is left in place.

  Forward adjustment takes the current price as the benchmark. Current prices stay unchanged and earlier prices are reduced, so the K-lines before the ex-rights date shift downward, the chart joins up, and the price series stays continuous. Put simply, prices from before the event are converted to the current price basis: the adjusted current price is unchanged and earlier prices are lowered.

    Forward adjustment formula: adjusted price = (pre-adjustment price - cash dividend) / (1 + proportional change in tradable shares)

  Backward adjustment takes the prices before the ex-rights date on the K-line chart as the benchmark and measures the post-event price on that basis. Put simply, prices after the event are converted to the earlier price basis: earlier adjusted prices are unchanged and current prices are raised. Backward adjustment shows the cumulative gain since listing, that is, the present value of a position bought at listing that took part in every distribution and dividend and was held until now.

    Backward adjustment formula: adjusted price = pre-adjustment price × (1 + proportional change in tradable shares) + cash dividend
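A worked example of the two formulas, as a small sketch. The event used here (a share change ratio of 1.0, meaning one new share for each share held, plus a cash dividend of 0.50 per share) and the prices are hypothetical, chosen only so that the arithmetic is easy to follow; the function names are also illustrative.

```python
def forward_adjust(price, cash_dividend, share_change_ratio):
    """Forward adjustment: convert a pre-event price to the post-event basis."""
    return (price - cash_dividend) / (1 + share_change_ratio)


def backward_adjust(price, cash_dividend, share_change_ratio):
    """Backward adjustment: convert a post-event price to the pre-event basis."""
    return price * (1 + share_change_ratio) + cash_dividend


# Hypothetical event: ratio 1.0 (shares double) and a 0.50 cash dividend per share.
dividend = 0.50
ratio = 1.0

pre_event_close = 20.50    # last close before the event (made-up value)
post_event_close = 10.00   # equivalent close after the event (made-up value)

adjusted_old = forward_adjust(pre_event_close, dividend, ratio)    # (20.50 - 0.50) / 2
adjusted_new = backward_adjust(post_event_close, dividend, ratio)  # 10.00 * 2 + 0.50

print(adjusted_old)  # 10.0
print(adjusted_new)  # 20.5
```

The two formulas are inverses of each other for a single event: forward-adjusting 20.50 gives 10.00, and backward-adjusting 10.00 gives 20.50 again.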

2, Looking up a stock's listing date

import tushare as ts

df = ts.get_stock_basics()
# .ix was removed in pandas 1.0; use label-based .loc instead
date = df.loc['600848', 'timeToMarket']
print(date)   # 19940324
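The listing date is printed as a number such as 19940324. Assuming the value is always in YYYYMMDD form, as the example output suggests, it can be converted to a proper date with the standard library. The helper name parse_listing_date is hypothetical.

```python
from datetime import datetime


def parse_listing_date(value):
    """Convert a YYYYMMDD integer or string such as 19940324 to a date object."""
    return datetime.strptime(str(value), "%Y%m%d").date()


listing_date = parse_listing_date(19940324)
print(listing_date.isoformat())  # 1994-03-24
```

The ISO string form matches the YYYY-MM-DD format that the start and end parameters expect.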

3, Retrieving price-adjusted stock data

import tushare as ts

# Retrieve price-adjusted data for a stock
df1 = ts.get_k_data('002337')     # forward-adjusted (default)
print(df1)
"""
           date   open  close   high    low   volume    code
0    2017-06-15  5.884  5.953  5.973  5.834  38690.0  002337
1    2017-06-16  5.953  5.963  5.973  5.913  21518.0  002337
2    2017-06-19  5.933  5.983  5.993  5.933  23512.0  002337
3    2017-06-20  6.003  5.953  6.003  5.943  27808.0  002337
4    2017-06-21  5.953  5.953  6.023  5.884  25967.0  002337
..          ...    ...    ...    ...    ...      ...     ...
635  2020-01-17  3.730  3.660  3.730  3.660  29073.0  002337
636  2020-01-20  3.650  3.660  3.690  3.620  23381.0  002337
637  2020-01-21  3.660  3.610  3.670  3.610  23591.0  002337
638  2020-01-22  3.600  3.610  3.630  3.520  32295.0  002337
639  2020-01-23  3.610  3.500  3.650  3.400  42815.0  002337
"""

df2 = ts.get_k_data('002337', autype='hfq')    # backward-adjusted
print(df2)
"""
           date    open   close    high     low   volume    code
0    2017-06-15  30.589  30.952  31.056  30.330  38690.0  002337
1    2017-06-16  30.952  31.004  31.056  30.745  21518.0  002337
2    2017-06-19  30.849  31.108  31.160  30.849  23512.0  002337
3    2017-06-20  31.211  30.952  31.211  30.900  27808.0  002337
4    2017-06-21  30.952  30.952  31.315  30.589  25967.0  002337
..          ...     ...     ...     ...     ...      ...     ...
635  2020-01-17  19.393  19.029  19.393  19.029  29073.0  002337
636  2020-01-20  18.977  19.029  19.185  18.821  23381.0  002337
637  2020-01-21  19.029  18.769  19.081  18.769  23591.0  002337
638  2020-01-22  18.717  18.769  18.873  18.301  32295.0  002337
639  2020-01-23  18.769  18.197  18.977  17.677  42815.0  002337
"""

df3 = ts.get_k_data('002337', autype=None)     # unadjusted
print(df3)
"""
           date  open  close  high   low   volume    code
0    2017-06-15  5.90   5.97  5.99  5.85  38690.0  002337
1    2017-06-16  5.97   5.98  5.99  5.93  21518.0  002337
2    2017-06-19  5.95   6.00  6.01  5.95  23512.0  002337
3    2017-06-20  6.02   5.97  6.02  5.96  27808.0  002337
4    2017-06-21  5.97   5.97  6.04  5.90  25967.0  002337
..          ...   ...    ...   ...   ...      ...     ...
635  2020-01-17  3.73   3.66  3.73  3.66  29073.0  002337
636  2020-01-20  3.65   3.66  3.69  3.62  23381.0  002337
637  2020-01-21  3.66   3.61  3.67  3.61  23591.0  002337
638  2020-01-22  3.60   3.61  3.63  3.52  32295.0  002337
639  2020-01-23  3.61   3.50  3.65  3.40  42815.0  002337
"""

df4 = ts.get_k_data('002337', start='2015-01-01', end='2015-03-16')    # forward-adjusted data between two dates
print(df4)
"""
          date    open   close    high     low     volume    code
0   2015-01-05   8.677   8.744   9.475   8.644   117051.0  002337
1   2015-01-06   8.385   9.153   9.309   8.182   142634.0  002337
2   2015-01-07   9.060   9.718  10.067   8.711   235935.0  002337
3   2015-01-08   9.342   9.406   9.715   9.246   128256.0  002337
..         ...     ...     ...     ...     ...        ...     ...
41  2015-03-10  13.123  13.552  13.632  12.684  1057530.0  002337
42  2015-03-11  13.313  13.113  13.442  12.964   591102.0  002337
43  2015-03-12  13.253  13.243  13.911  12.924   715057.0  002337
44  2015-03-13  13.004  13.333  13.343  12.964   405488.0  002337
45  2015-03-16  13.233  13.353  13.412  12.964   812129.0  002337
"""

Fourth, market index quotes list

  Retrieves real-time quotes for the major market indexes and returns them in tabular form.

1, Calling the method

import tushare as ts

df = ts.get_index()
print(df)

 

2, Return values

  code: index code

  name: index name

  change: percentage change

  open: opening level

  preclose: previous closing level

  close: closing level

  high: highest level

  low: lowest level

  volume: trading volume (lots)

  amount: turnover (RMB 100 million)

3, Sample output

      code                      name  change  ...        low       volume     amount
0   000001  Shanghai Composite Index   -2.75  ...  2955.3460    272763234  3274.9036
1   000002             A-share Index   -2.75  ...  3096.7191    272375143  3272.4163
2   000003             B-share Index   -3.47  ...   246.4306       388091     2.4873
3   000008           Composite Index   -2.51  ...  2834.5232     61940755   716.1844
4   000009                   SSE 380   -3.43  ...  4770.1956     57463666   700.2298
5   000010                   SSE 180   -2.90  ...  8513.7663     97054465  1428.8354
..     ...                       ...     ...  ...        ...          ...        ...
16  399005           SME Board Index   -3.31  ...  6944.9460   4628035966   830.3480
17  399006             ChiNext Index   -3.32  ...  1904.2750  11576635669  1710.6619
18  399008                   SME 300   -3.43  ...  1321.9150   9981028403  1409.6974
19  399100                 New Index   -3.49  ...  7974.4380  39492412447  4787.0620
20  399101       SME Board Composite   -3.30  ...  9899.0470  17783593837  2065.2582
21  399106  Shenzhen Composite Index   -3.45  ...  1740.1390  40289425936  4815.4754
22  399107    Shenzhen A-share Index   -3.45  ...  1820.2840  40248235084  4813.7496
23  399108    Shenzhen B-share Index   -2.26  ...   963.9610     41190852     1.7258
24  399333               SME Board R   -3.31  ...  7818.6200   4628035966   830.3480
25  399606                 ChiNext R   -3.32  ...  2008.9970   3195889692   672.5783

Fifth, data storage 

  The data storage module mainly guides users in saving data to local disk or to a database server, so that it is available later for quantitative analysis and backtesting.

  Saving to files on disk uses the methods that come with pandas.

1, CSV files

  The pandas DataFrame and Series objects provide a method that saves directly to CSV format. With a few parameters set, the data is easily written to local disk.

(1) Example: saving a CSV file

import tushare as ts

# Fetch the data to be saved as a CSV file
df = ts.get_k_data("000875")

# Save everything
df.to_csv('000875.csv')
# Save selected columns only
df.to_csv('000875.csv', columns=['open', 'high', 'low', 'close'])

  Saving everything produces the following result:

  

  Saving selected columns overwrites the content saved before and produces the following result:

  

(2) Example: appending data

  When several sets of data need to be stored in one large file, append each of them to the same file:

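import tushare as ts
import os

filename = 'bigfile.csv'
for code in ['000875', '600848', '000981']:
    df = ts.get_hist_data(code)
    if os.path.exists(filename):
        # File already exists: append rows without repeating the header
        df.to_csv(filename, mode='a', header=False)
    else:
        # First write: create the file and include the header
        df.to_csv(filename)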

  Running the program produces the following result:

  

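  Note: if repeated header rows do not matter, calling df.to_csv(filename, mode='a') directly is enough. Otherwise, without the check above, the column names are appended again on every loop iteration.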
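The header-once pattern can be tried without Tushare, pandas, or a network connection. The sketch below uses the standard library csv module in place of DataFrame.to_csv, which is a substitution made only so that the example is self-contained. The helper name append_rows and the row values are placeholders, not real quotes.

```python
import csv
import os
import tempfile


def append_rows(filename, header, rows):
    """Append rows to a CSV file, writing the header only when the file is new."""
    write_header = not os.path.exists(filename)
    with open(filename, "a", newline="") as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(header)
        writer.writerows(rows)


header = ["code", "date", "close"]
# Placeholder rows standing in for the data fetched per stock code.
batches = {
    "000875": [["000875", "2020-01-22", 1.0], ["000875", "2020-01-23", 1.1]],
    "600848": [["600848", "2020-01-23", 2.0]],
    "000981": [["000981", "2020-01-23", 3.0]],
}

workdir = tempfile.mkdtemp()
filename = os.path.join(workdir, "bigfile.csv")
for code, rows in batches.items():
    append_rows(filename, header, rows)

with open(filename, newline="") as handle:
    lines = list(csv.reader(handle))
print(lines)
```

Checking for the file before each write is what keeps the column names from being repeated for the second and third batches.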

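(3) Parameter description

  path_or_buf: path of the CSV file, or a StringIO object

  sep: field separator, comma by default

  na_rep: string written in place of NaN values, empty string by default

  float_format: format for float values

  columns: columns to save, default None

  header: whether to save the column names, default True

  index: whether to save the index, default True

  mode: whether to create a new file or append to an existing one, default is to create a new file

  encoding: file encoding

  date_format: date format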

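2, Excel files

  pandas can save data in Microsoft Excel format.

(1) Example call

import tushare as ts

df = ts.get_hist_data('000875')
# Save directly
df.to_excel('000875.xlsx')

# Set the data position (insert the data starting at row 3, column 6)
# df.to_excel('000875.xlsx', startrow=2,startcol=5)

(2) Parameter description

  excel_writer: file path or ExcelWriter object

  sheet_name: sheet name, default Sheet1

  sep: content separator, ',' by default (listed in the original, although pandas DataFrame.to_excel does not accept a sep argument)

  na_rep: string written in place of NaN values, empty string by default

  float_format: format for float values

  columns: columns to save, default None

  header: whether to save the column names, default True

  index: whether to save the index, default True

  encoding: file encoding

  startrow: number of blank rows to leave above the data

  startcol: number of blank columns to leave to the left of the data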

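3, Other storage types

  pandas uses the PyTables package to save data as HDF5 files. Note that the PyTables version at runtime must be >= 3.0.0.

  pandas can generate JSON files or strings.

  pandas provides convenient methods for writing data to relational databases. In newer versions of pandas, the connection to the database is made mainly through SQLAlchemy, which supports mainstream databases such as MySQL, PostgreSQL, Oracle, MS SQL Server, and SQLite.

  For parameters and usage, see: http://tushare.org/storing.html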
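To give a feel for what storing quotes in a relational database looks like, here is a dependency-free sketch that writes a few get_k_data rows from this article into an in-memory SQLite database. It uses the standard library sqlite3 module directly, not the pandas and SQLAlchemy route described above, and the table name k_data is a hypothetical choice.

```python
import sqlite3

# Rows copied from the get_k_data("601318") output earlier in this article:
# (date, open, close, high, low, volume, code)
rows = [
    ("2020-01-21", 87.000, 85.600, 87.290, 85.600, 727579.0, "601318"),
    ("2020-01-22", 85.000, 85.220, 85.480, 83.830, 736576.0, "601318"),
    ("2020-01-23", 84.010, 83.490, 84.560, 82.480, 1080020.0, "601318"),
]

conn = sqlite3.connect(":memory:")   # in-memory database, nothing is written to disk
conn.execute(
    "CREATE TABLE k_data ("
    "date TEXT, open REAL, close REAL, high REAL, low REAL, volume REAL, code TEXT)"
)
conn.executemany("INSERT INTO k_data VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
conn.commit()

# Read back the most recent close for one stock code.
latest = conn.execute(
    "SELECT date, close FROM k_data WHERE code = ? ORDER BY date DESC LIMIT 1",
    ("601318",),
).fetchone()
row_count = conn.execute("SELECT COUNT(*) FROM k_data").fetchone()[0]
conn.close()

print(latest, row_count)
```

Storing dates as YYYY-MM-DD text keeps them sortable with a plain ORDER BY, which is why the query above returns the latest trading day.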

 

Origin www.cnblogs.com/xiugeng/p/12232235.html