First, Tushare introduction
Tushare is a free, open-source Python package for financial data. It handles the full process for stocks and other financial data, from acquisition and cleaning to storage, and provides data that is fast, clean and diverse. This greatly reduces the effort financial analysts spend on acquiring data, so they can concentrate on researching and implementing strategies and models.
Tushare retrieves its data from Tencent Finance, the Shanghai Stock Exchange, the Shenzhen Stock Exchange and Reuters.
1, Installing and using Tushare
(1) Prerequisites
Install Python, then install the pandas, lxml, bs4 (BeautifulSoup4) and requests packages.
(2) Download and installation
# Method one:
pip install tushare
# Method two: download the package from
# https://pypi.python.org/pypi/Tushare
(3) Upgrade
# Upgrade to the latest version
pip install tushare --upgrade
# Check the installed version
import tushare
print(tushare.__version__)
Second, historical quote interfaces (get_hist_data / get_h_data)
get_hist_data interface: fetches a stock's historical trading data, including moving-average data. The ktype parameter selects daily, weekly or monthly K-lines, or 5-, 15-, 30- and 60-minute K-lines. This interface only returns about the last three years of daily data. Combined with the moving averages, this data is suitable for stock screening and data analysis.
start: when empty, defaults to the earliest date for which data is available. end: when empty, defaults to the most recent trading day.
import tushare as ts

print(ts.get_hist_data("601318"))
"""
             open   high  close    low  ...    ma20      v_ma5     v_ma10     v_ma20
date                                    ...
2020-01-23  84.01  84.56  83.49  82.48  ...  85.610  807119.55  711352.85  634487.17
2020-01-22  85.00  85.48  85.22  83.83  ...  85.632  690831.24  663221.99  596297.65
2020-01-21  87.00  87.29  85.60  85.60  ...  85.594  648759.18  652369.61  574079.03
2020-01-20  88.30  88.70  87.60  87.35  ...  85.528  646579.65  624830.50  560559.15
2020-01-17  86.15  86.90  86.25  85.85  ...  85.425  621487.27  594870.25  531672.07
...           ...    ...    ...    ...  ...     ...        ...        ...        ...
2017-07-31  51.88  52.64  52.02  51.41  ...  52.094  587775.69  587775.69  587775.69
2017-07-28  52.20  52.46  51.89  51.80  ...  52.113  580718.35  580718.35  580718.35
2017-07-27  51.85  52.74  52.36  51.09  ...  52.187  610526.22  610526.22  610526.22
2017-07-26  52.10  52.50  51.89  51.28  ...  52.100  582222.86  582222.86  582222.86
2017-07-25  52.62  53.05  52.31  52.18  ...  52.310  506834.84  506834.84  506834.84
"""
get_h_data interface: fetches all of a stock's historical data. Otherwise it works like the interface above. (Deprecated.)
start: when empty, defaults to the same day one year ago. end: when empty, defaults to today.
get_k_data interface: fetches K-line data and combines the functions of the get_hist_data and get_h_data interfaces. It can fetch low-frequency daily, weekly and monthly data as well as higher-frequency 5-, 15-, 30- and 60-minute data. You can also get forward- and backward-adjusted data since listing with a single line of code.
start: when empty, defaults to the first day of listing. end: when empty, defaults to the most recent trading day.
import tushare as ts

# print(ts.get_hist_data("601318"))
print(ts.get_k_data("601318"))
"""
           date    open   close    high     low     volume    code
0    2017-06-15  44.946  43.984  45.212  43.727  1041983.0  601318
1    2017-06-16  43.908  44.479  44.936  43.908   807231.0  601318
2    2017-06-19  44.727  46.251  46.317  44.470   808481.0  601318
3    2017-06-20  46.451  45.812  46.603  45.403   616355.0  601318
4    2017-06-21  46.003  47.079  47.203  45.298   849757.0  601318
..          ...     ...     ...     ...     ...        ...     ...
635  2020-01-17  86.150  86.250  86.900  85.850   555370.0  601318
636  2020-01-20  88.300  87.600  88.700  87.350   936050.0  601318
637  2020-01-21  87.000  85.600  87.290  85.600   727579.0  601318
638  2020-01-22  85.000  85.220  85.480  83.830   736576.0  601318
639  2020-01-23  84.010  83.490  84.560  82.480  1080020.0  601318
"""
1, Parameters
code: stock code (the six-digit ticker) or an index code (sh: SSE Composite Index, sz: SZSE Component Index, hs300: CSI 300 Index, sz50: SSE 50 Index, zxb: SME Board Index, cyb: GEM Index)
start: start date, format YYYY-MM-DD
end: end date, format YYYY-MM-DD
ktype: data type (D: daily K-line, W: weekly K-line, M: monthly K-line, 5: 5-minute, 15: 15-minute, 30: 30-minute, 60: 60-minute); default D
retry_count: number of retries after a network error; default 3
pause: seconds to pause between repeated requests, which avoids problems caused by requests sent too close together; default 0
autype: price-adjustment type: qfq for forward adjustment, hfq for backward adjustment, None for unadjusted; default qfq
index: whether code is an index code; set it to True when fetching an index; default False
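The ktype parameter only changes the bar frequency. To show what a weekly (W) bar contains, here is a minimal pandas sketch that aggregates daily bars into weekly ones. It is an illustration, not tushare's own implementation. The helper name to_weekly is hypothetical. The sample rows are the last five daily bars of 601318 from the get_k_data output above.

```python
import pandas as pd


def to_weekly(daily: pd.DataFrame) -> pd.DataFrame:
    """Aggregate daily OHLCV bars into weekly bars (hypothetical helper)."""
    rules = {
        "open": "first",   # open of the first trading day in the week
        "high": "max",     # highest high of the week
        "low": "min",      # lowest low of the week
        "close": "last",   # close of the last trading day in the week
        "volume": "sum",   # total volume traded in the week
    }
    # "W" groups into weeks ending on Sunday; dropna() discards weeks with no trades
    return daily.resample("W").agg(rules).dropna()


# Last five daily bars of 601318 as printed by get_k_data: date, open, close, high, low, volume
rows = [
    ("2020-01-17", 86.150, 86.250, 86.900, 85.850, 555370.0),
    ("2020-01-20", 88.300, 87.600, 88.700, 87.350, 936050.0),
    ("2020-01-21", 87.000, 85.600, 87.290, 85.600, 727579.0),
    ("2020-01-22", 85.000, 85.220, 85.480, 83.830, 736576.0),
    ("2020-01-23", 84.010, 83.490, 84.560, 82.480, 1080020.0),
]
daily = pd.DataFrame(rows, columns=["date", "open", "close", "high", "low", "volume"])
daily["date"] = pd.to_datetime(daily["date"])
daily = daily.set_index("date")

weekly = to_weekly(daily)
print(weekly)
```

Friday 2020-01-17 falls in the week ending Sunday 2020-01-19. The four trading days from 2020-01-20 to 2020-01-23 form the week labelled 2020-01-26.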
2, Return values
date: date
volume: trading volume; turnover: turnover rate (not available for indices)
open: opening price; close: closing price
high: highest price; low: lowest price
price_change: price change; p_change: percentage change
ma5: 5-day average price; ma10: 10-day average price; ma20: 20-day average price
v_ma5: 5-day average volume; v_ma10: 10-day average volume; v_ma20: 20-day average volume
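get_hist_data returns the ma* and v_ma* columns already computed. This sketch assumes they are simple (unweighted) rolling means of close and volume, which is how the descriptions above read, and reproduces them with pandas rolling. get_hist_data lists the newest rows first, so the frame has to be sorted oldest-first before rolling. The sample values are the last five closes and volumes of 601318 from the get_k_data output above. This is not tushare's code.

```python
import pandas as pd

# Newest-first, the order get_hist_data uses; values taken from the get_k_data output for 601318
df = pd.DataFrame(
    {
        "close": [83.49, 85.22, 85.60, 87.60, 86.25],
        "volume": [1080020.0, 736576.0, 727579.0, 936050.0, 555370.0],
    },
    index=pd.to_datetime(
        ["2020-01-23", "2020-01-22", "2020-01-21", "2020-01-20", "2020-01-17"]
    ),
)
df.index.name = "date"

# Rolling windows run from top to bottom, so sort oldest-first before computing
df = df.sort_index()
df["ma5"] = df["close"].rolling(window=5).mean()     # 5-day average price
df["v_ma5"] = df["volume"].rolling(window=5).mean()  # 5-day average volume

print(df[["close", "ma5", "v_ma5"]])
```

The first four rows are NaN because a full 5-day window is not yet available. Widen the window to 10 or 20 for ma10/ma20 and v_ma10/v_ma20.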
3, Specifying an index code with index
To fetch index data, pass the index code as code and set index=True.
import tushare as ts

df = ts.get_k_data("000016", index=True)
print(df)
"""
           date     open    close     high      low      volume      code
0    2017-06-15  2474.26  2461.97  2481.80  2453.13  20702545.0  sh000016
1    2017-06-16  2454.84  2452.79  2466.69  2448.31  16518044.0  sh000016
2    2017-06-19  2455.03  2484.12  2486.31  2453.35  20594004.0  sh000016
3    2017-06-20  2489.20  2474.43  2492.22  2467.77  17771153.0  sh000016
4    2017-06-21  2487.85  2497.25  2498.19  2468.51  20354217.0  sh000016
..          ...      ...      ...      ...      ...         ...       ...
635  2020-01-17  3054.04  3053.17  3067.56  3042.59  23304514.0  sh000016
636  2020-01-20  3070.67  3065.99  3072.29  3054.45  27274849.0  sh000016
637  2020-01-21  3048.95  3012.11  3050.82  3011.18  32801291.0  sh000016
638  2020-01-22  2996.81  3017.88  3023.90  2965.66  33159760.0  sh000016
639  2020-01-23  2993.77  2932.49  2993.77  2910.39  42839352.0  sh000016
"""
Third, price-adjusted data
1, The concept of price adjustment
Price adjustment repairs price and volume information after ex-rights and ex-dividend events. The price chart is redrawn according to the stock's actual gains and losses, and volume is restated on a consistent share-capital basis, so different periods can be compared like for like. Adjustment removes the distortion that ex-rights and ex-dividend events introduce into price movements and keeps the price series continuous.
Unadjusted prices: the large gaps that ex-rights and ex-dividend events leave on the price chart are not filled in artificially, so the breaks remain.
Forward adjustment (qfq) uses the current share price as the benchmark. Current prices stay unchanged and earlier prices are lowered, so the K-lines before each ex-rights date shift downward and join up with the rest of the chart, keeping the price series continuous. Put simply, prices before the ex-dividend date are converted to today's basis: current prices stay the same and earlier prices fall.
Forward-adjustment formula: forward-adjusted price = (unadjusted price - cash dividend per share) / (1 + change ratio of outstanding shares)
Backward adjustment (hfq) uses the prices before the ex-rights event in the K-line chart as the benchmark and restates the prices after it. Put simply, later prices are converted to the earlier basis: earlier prices stay unchanged and current prices rise. The backward-adjusted series shows the stock's cumulative gain since listing. It tells you what the holding would be worth today if you had bought at that time, taken part in every bonus-share issue and dividend, and held on ever since.
Backward-adjustment formula: backward-adjusted price = unadjusted price × (1 + change ratio of outstanding shares) + cash dividend per share
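For a single ex-rights event, the two formulas are inverses of each other. Below is a minimal sketch that covers only cash dividends and bonus shares, with rights issues ignored as in the formulas above. The function names and the event itself are hypothetical: a cash dividend of 0.5 per share plus 2 bonus shares per 10 held, which gives a share change ratio of 0.2.

```python
def forward_adjust(price: float, cash_dividend: float, share_ratio: float) -> float:
    """Forward adjustment: restate a pre-ex-date price on today's share basis."""
    return (price - cash_dividend) / (1 + share_ratio)


def backward_adjust(price: float, cash_dividend: float, share_ratio: float) -> float:
    """Backward adjustment: restate a post-ex-date price on the pre-event basis."""
    return price * (1 + share_ratio) + cash_dividend


# Hypothetical event: 0.5 cash dividend per share and 2 bonus shares per 10 held
DIVIDEND, RATIO = 0.5, 0.2

pre_event_closes = [12.0, 12.5]   # unadjusted closes before the ex-date
post_event_closes = [10.0, 10.3]  # unadjusted closes from the ex-date on

# Forward adjustment rewrites the history before the ex-date; later prices stay as-is
qfq = [forward_adjust(p, DIVIDEND, RATIO) for p in pre_event_closes] + post_event_closes
# Backward adjustment rewrites prices from the ex-date on; earlier prices stay as-is
hfq = pre_event_closes + [backward_adjust(p, DIVIDEND, RATIO) for p in post_event_closes]

print([round(p, 4) for p in qfq])
print([round(p, 4) for p in hfq])
```

Take the last pre-event close of 12.5. In the forward-adjusted series it becomes 10.0. In the backward-adjusted series, the first post-event close of 10.0 becomes 12.5. Either way, the gap at the ex-date disappears.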
2, Checking a stock's listing date
import tushare as ts

df = ts.get_stock_basics()
# .ix was removed in pandas 1.0; use .loc[row, column] instead
date = df.loc['600848', 'timeToMarket']
print(date)  # 19940324
3, Fetching price-adjusted stock data
import tushare as ts

# Fetch price-adjusted stock data
df1 = ts.get_k_data('002337')  # forward-adjusted (default autype='qfq')
print(df1)
"""
           date   open  close   high    low   volume    code
0    2017-06-15  5.884  5.953  5.973  5.834  38690.0  002337
1    2017-06-16  5.953  5.963  5.973  5.913  21518.0  002337
2    2017-06-19  5.933  5.983  5.993  5.933  23512.0  002337
3    2017-06-20  6.003  5.953  6.003  5.943  27808.0  002337
4    2017-06-21  5.953  5.953  6.023  5.884  25967.0  002337
..          ...    ...    ...    ...    ...      ...     ...
635  2020-01-17  3.730  3.660  3.730  3.660  29073.0  002337
636  2020-01-20  3.650  3.660  3.690  3.620  23381.0  002337
637  2020-01-21  3.660  3.610  3.670  3.610  23591.0  002337
638  2020-01-22  3.600  3.610  3.630  3.520  32295.0  002337
639  2020-01-23  3.610  3.500  3.650  3.400  42815.0  002337
"""
df2 = ts.get_k_data('002337', autype='hfq')  # backward-adjusted
print(df2)
"""
           date    open   close    high     low   volume    code
0    2017-06-15  30.589  30.952  31.056  30.330  38690.0  002337
1    2017-06-16  30.952  31.004  31.056  30.745  21518.0  002337
2    2017-06-19  30.849  31.108  31.160  30.849  23512.0  002337
3    2017-06-20  31.211  30.952  31.211  30.900  27808.0  002337
4    2017-06-21  30.952  30.952  31.315  30.589  25967.0  002337
..          ...     ...     ...     ...     ...      ...     ...
635  2020-01-17  19.393  19.029  19.393  19.029  29073.0  002337
636  2020-01-20  18.977  19.029  19.185  18.821  23381.0  002337
637  2020-01-21  19.029  18.769  19.081  18.769  23591.0  002337
638  2020-01-22  18.717  18.769  18.873  18.301  32295.0  002337
639  2020-01-23  18.769  18.197  18.977  17.677  42815.0  002337
"""
df3 = ts.get_k_data('002337', autype=None)  # unadjusted
print(df3)
"""
           date  open  close  high   low   volume    code
0    2017-06-15  5.90   5.97  5.99  5.85  38690.0  002337
1    2017-06-16  5.97   5.98  5.99  5.93  21518.0  002337
2    2017-06-19  5.95   6.00  6.01  5.95  23512.0  002337
3    2017-06-20  6.02   5.97  6.02  5.96  27808.0  002337
4    2017-06-21  5.97   5.97  6.04  5.90  25967.0  002337
..          ...   ...    ...   ...   ...      ...     ...
635  2020-01-17  3.73   3.66  3.73  3.66  29073.0  002337
636  2020-01-20  3.65   3.66  3.69  3.62  23381.0  002337
637  2020-01-21  3.66   3.61  3.67  3.61  23591.0  002337
638  2020-01-22  3.60   3.61  3.63  3.52  32295.0  002337
639  2020-01-23  3.61   3.50  3.65  3.40  42815.0  002337
"""
# Forward-adjusted data between two dates
df4 = ts.get_k_data('002337', start='2015-01-01', end='2015-03-16')
print(df4)
"""
          date    open   close    high     low     volume    code
0   2015-01-05   8.677   8.744   9.475   8.644   117051.0  002337
1   2015-01-06   8.385   9.153   9.309   8.182   142634.0  002337
2   2015-01-07   9.060   9.718  10.067   8.711   235935.0  002337
3   2015-01-08   9.342   9.406   9.715   9.246   128256.0  002337
..         ...     ...     ...     ...     ...        ...     ...
41  2015-03-10  13.123  13.552  13.632  12.684  1057530.0  002337
42  2015-03-11  13.313  13.113  13.442  12.964   591102.0  002337
43  2015-03-12  13.253  13.243  13.911  12.924   715057.0  002337
44  2015-03-13  13.004  13.333  13.343  12.964   405488.0  002337
45  2015-03-16  13.233  13.353  13.412  12.964   812129.0  002337
"""
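Dividing an adjusted price by the unadjusted price gives the adjustment factor in force on that day. The check below uses only closes copied from the df1 (qfq), df2 (hfq) and df3 (unadjusted) outputs above, namely the first five and last five rows. It shows three things: the forward-adjusted prices equal the raw prices in 2020, the backward factor is roughly constant within each period, and the 2017 forward factor matches the ratio of the two backward factors. It is an illustration based on the printed numbers, not a description of how tushare computes its factors. The helper mean_factor is hypothetical.

```python
import pandas as pd

# Closes copied from the printed outputs for 002337 (rows 0-4 in 2017, rows 635-639 in 2020)
raw = {"2017": [5.97, 5.98, 6.00, 5.97, 5.97], "2020": [3.66, 3.66, 3.61, 3.61, 3.50]}
qfq = {"2017": [5.953, 5.963, 5.983, 5.953, 5.953], "2020": [3.660, 3.660, 3.610, 3.610, 3.500]}
hfq = {"2017": [30.952, 31.004, 31.108, 30.952, 30.952],
       "2020": [19.029, 19.029, 18.769, 18.769, 18.197]}


def mean_factor(adjusted, unadjusted):
    """Average of adjusted / unadjusted over a window of days (hypothetical helper)."""
    return (pd.Series(adjusted) / pd.Series(unadjusted)).mean()


qfq_2017 = mean_factor(qfq["2017"], raw["2017"])
qfq_2020 = mean_factor(qfq["2020"], raw["2020"])
hfq_2017 = mean_factor(hfq["2017"], raw["2017"])
hfq_2020 = mean_factor(hfq["2020"], raw["2020"])

print(f"qfq factor 2017: {qfq_2017:.4f}, 2020: {qfq_2020:.4f}")
print(f"hfq factor 2017: {hfq_2017:.4f}, 2020: {hfq_2020:.4f}")
print(f"hfq_2017 / hfq_2020 = {hfq_2017 / hfq_2020:.4f}")
```

The step between the 2017 and 2020 backward factors reflects ex-rights events somewhere between those dates. The printed outputs do not show which events these were.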
Fourth, market index quotes list
Fetches real-time quotes for the major market indices and shows them as a table.
1, Calling the method
import tushare as ts

df = ts.get_index()
print(df)
2, Return values
code: index code
name: index name
change: percentage change
open: opening level
preclose: previous closing level
close: closing level
high: highest level
low: lowest level
volume: trading volume (lots)
amount: trading value (RMB 100 million)
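This sketch assumes change is the percentage move of close relative to preclose, which is the usual meaning of a percentage change. Under that assumption, change can be recomputed from the other two columns. The helper pct_change is hypothetical, and the two index levels below are illustrative inputs rather than values taken from the output.

```python
def pct_change(close: float, preclose: float) -> float:
    """Percentage change versus the previous close, rounded to two decimals."""
    return round((close - preclose) / preclose * 100, 2)


print(pct_change(2976.53, 3060.75))  # -2.75
```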
3, Sample output
      code                      name  change  ...        low       volume     amount
0   000001  Shanghai Composite Index   -2.75  ...  2955.3460    272763234  3274.9036
1   000002             A-share Index   -2.75  ...  3096.7191    272375143  3272.4163
2   000003             B-share Index   -3.47  ...   246.4306       388091     2.4873
3   000008       Comprehensive Index   -2.51  ...  2834.5232     61940755   716.1844
4   000009                   SSE 380   -3.43  ...  4770.1956     57463666   700.2298
5   000010                   SSE 180   -2.90  ...  8513.7663     97054465  1428.8354
..     ...                       ...     ...  ...        ...          ...        ...
16  399005           SME Board Index   -3.31  ...  6944.9460   4628035966   830.3480
17  399006                 GEM Index   -3.32  ...  1904.2750  11576635669  1710.6619
18  399008                   SME 300   -3.43  ...  1321.9150   9981028403  1409.6974
19  399100                 New Index   -3.49  ...  7974.4380  39492412447  4787.0620
20  399101       SME Board Composite   -3.30  ...  9899.0470  17783593837  2065.2582
21  399106        Shenzhen Composite   -3.45  ...  1740.1390  40289425936  4815.4754
22  399107    Shenzhen A-share Index   -3.45  ...  1820.2840  40248235084  4813.7496
23  399108    Shenzhen B-share Index   -2.26  ...   963.9610     41190852     1.7258
24  399333               SME Board R   -3.31  ...  7818.6200   4628035966   830.3480
25  399606                     GEM R   -3.32  ...  2008.9970   3195889692   672.5783
Fifth, data storage
The data storage module mainly shows users how to save data to local disk or to a database server, so it can easily be reused later for backtesting and quantitative analysis.
Saving data as files on local disk uses pandas' own built-in methods.
1, CSV files
pandas DataFrame and Series objects provide a method that saves directly to CSV format. A few parameters control what is written, which makes it easy to store the data on local disk.
(1) Example: saving to a CSV file
import tushare as ts

# Save as a CSV file
df = ts.get_k_data("000875")
# Save everything
df.to_csv('000875.csv')
# Save selected columns only (get_k_data column names are lowercase)
df.to_csv('000875.csv', columns=['open', 'high', 'low', 'close'])
Saving directly writes every column. Saving selected columns to the same file name overwrites the earlier file, so only the index and the open, high, low and close columns remain.
(2) Example: appending data
When data for several stocks has to go into one large file, append each batch to the same file:
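import os

import tushare as ts

filename = 'bigfile.csv'
for code in ['000875', '600848', '000981']:
    df = ts.get_hist_data(code)
    if os.path.exists(filename):
        # File already exists: append rows without repeating the header
        df.to_csv(filename, mode='a', header=False)
    else:
        # First write: create the file with a header row
        df.to_csv(filename)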
Running the program writes the rows for all three stocks into bigfile.csv, with the header written only once.
Note: if the header does not matter, df.to_csv(filename, mode='a') alone is enough. Otherwise the column names are appended again on every pass through the loop.
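The effect described in the note can be checked offline. This sketch writes three made-up one-row frames, standing in for three get_hist_data() results, into a temporary CSV file in two ways: once skipping the header on append, and once always writing it. It then counts the header lines. The helper save_all and its skip_repeat_header flag are hypothetical.

```python
import os
import tempfile

import pandas as pd


def save_all(frames, filename, skip_repeat_header=True):
    """Write several DataFrames into one CSV file, appending after the first (hypothetical)."""
    for df in frames:
        if os.path.exists(filename):
            # Append; omit the header unless we deliberately want to repeat it
            df.to_csv(filename, mode="a", header=not skip_repeat_header)
        else:
            df.to_csv(filename)


# Made-up one-row frames standing in for three get_hist_data() results
frames = [
    pd.DataFrame({"code": [code], "close": [price]},
                 index=pd.Index(["2020-01-23"], name="date"))
    for code, price in [("000875", 1.0), ("600848", 2.0), ("000981", 3.0)]
]

with tempfile.TemporaryDirectory() as tmp:
    clean_path = os.path.join(tmp, "bigfile.csv")
    naive_path = os.path.join(tmp, "bigfile_naive.csv")
    save_all(frames, clean_path)                              # header written once
    save_all(frames, naive_path, skip_repeat_header=False)    # header on every append
    with open(clean_path) as fh:
        clean_lines = fh.read().splitlines()
    with open(naive_path) as fh:
        naive_lines = fh.read().splitlines()

print(clean_lines)
print(naive_lines)
```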
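(3) Parameters
path_or_buf: path of the CSV file, or a StringIO object
sep: field delimiter; default comma
na_rep: string written in place of NaN values; default empty string ''
float_format: format for float values
columns: columns to save; default None (all columns)
header: whether to write the column names; default True
index: whether to write the index; default True
mode: whether to create a new file or append to an existing one; default creates a new file
encoding: file encoding
date_format: date format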
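The parameters can be tried without touching the disk by writing into an io.StringIO buffer, which path_or_buf accepts. The small frame below is made up. The NaN shows na_rep at work, the long float shows float_format, and the DatetimeIndex shows date_format, which pandas also applies to the index.

```python
import io

import numpy as np
import pandas as pd

# Made-up sample frame
df = pd.DataFrame(
    {
        "open": [84.01, np.nan],
        "close": [83.4912, 85.22],
        "volume": [1080020.0, 736576.0],
    },
    index=pd.DatetimeIndex(["2020-01-23", "2020-01-22"], name="date"),
)

buf = io.StringIO()  # path_or_buf can be a file path or a StringIO object
df.to_csv(
    buf,
    sep=";",                    # field delimiter
    na_rep="NA",                # written in place of NaN
    float_format="%.2f",        # format for float values
    columns=["open", "close"],  # only these columns are written
    header=True,                # write the column names
    index=True,                 # write the index (the dates)
    date_format="%Y/%m/%d",     # format for datetime values, including the index
)
text = buf.getvalue()
print(text)
```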
2, Excel files
pandas can save data in Microsoft Excel format.
(1) Example
import tushare as ts

df = ts.get_hist_data('000875')
# Save directly
df.to_excel('000875.xlsx')
# Set where the data starts (insert from row 3, column 6)
# df.to_excel('000875.xlsx', startrow=2, startcol=5)
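(2) Parameters
excel_writer: file path or ExcelWriter object
sheet_name: sheet name; default Sheet1
na_rep: string written in place of NaN values; default empty string ''
float_format: format for float values
columns: columns to save; default None (all columns)
header: whether to write the column names; default True
index: whether to write the index; default True
encoding: file encoding (older pandas versions only; the parameter has since been removed)
startrow: number of blank rows left above the data
startcol: number of blank columns left to the left of the data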
3, Other storage formats
pandas uses the PyTables package to save data as HDF5 files. PyTables version 3.0.0 or later is required at runtime.
pandas can produce JSON files or strings.
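A minimal JSON round trip with pandas' to_json and read_json, using a made-up two-row frame. orient="records" writes one JSON object per row. When reading back, pandas by default parses a column named date as datetimes, so the date column returns as datetime64 rather than strings.

```python
import io
import json

import pandas as pd

# Made-up sample frame
df = pd.DataFrame({"date": ["2020-01-22", "2020-01-23"], "close": [85.22, 83.49]})

# orient="records" writes a list with one JSON object per row;
# to write a file instead: df.to_json("000875.json", orient="records")
text = df.to_json(orient="records")
print(text)

records = json.loads(text)  # plain Python check of the structure
back = pd.read_json(io.StringIO(text), orient="records")
print(back.dtypes)
```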
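pandas also provides convenient methods for storing data in relational databases. Newer pandas versions connect to the database mainly through SQLAlchemy and support mainstream databases such as MySQL, PostgreSQL, Oracle, MS SQL Server and SQLite.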
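A minimal sketch of to_sql and read_sql. To stay self-contained it uses SQLite through the standard-library sqlite3 module, which pandas accepts directly in place of an SQLAlchemy connection. For MySQL, PostgreSQL and the other databases you would pass an SQLAlchemy engine instead. The table name k_data and the sample rows are made up.

```python
import sqlite3

import pandas as pd

# Made-up sample rows
df = pd.DataFrame(
    {"date": ["2020-01-22", "2020-01-23"], "code": ["601318", "601318"], "close": [85.22, 83.49]}
)

# For SQLite only, pandas also accepts a plain sqlite3 connection
conn = sqlite3.connect(":memory:")
df.to_sql("k_data", conn, index=False, if_exists="replace")   # create or overwrite the table

more = pd.DataFrame({"date": ["2020-01-21"], "code": ["601318"], "close": [85.60]})
more.to_sql("k_data", conn, index=False, if_exists="append")  # append further rows

back = pd.read_sql(
    "SELECT date, close FROM k_data WHERE code = ? ORDER BY date",
    conn,
    params=("601318",),
)
conn.close()
print(back)
```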
For parameters and usage, see: http://tushare.org/storing.html