Acquiring historical stock data with Python

There are many interfaces for acquiring stock data. Free ones include the Sina and Netease interfaces and Yahoo's API; paid ones are offered by securities companies and specialized data vendors.
Trial versions of the paid interfaces generally provide only the most recent one to three years of data, which is quite limiting unless you are willing to pay.
This article therefore discusses how to acquire and process the data for free.

Stock data interfaces available in China include sinajs, money.163.com, and yahoo. Their APIs differ, but each provides similar data, so you can pick any one of them for data processing.

There is now an open-source financial data package that wraps the interfaces mentioned above. You do not need to care which source the data comes from: it fetches from the fastest source first, which makes it very convenient to use. The package is TuShare; see its documentation for installation details.

This article builds on data acquired through TuShare and describes how to get the historical K-line data of every A-share stock.
First, get the list of A-share listed companies
import tushare as ts
import pandas as pd

def download_stock_basic_info():

    try:
       df = ts.get_stock_basics()
       # save directly to CSV
       print('choose csv')
       df.to_csv('stock_basic_list.csv')
       print('download csv finish')
    except Exception as e:
       print(str(e))
The stock list contains basic information about the 2,756 A-share stocks currently listed, including:
code, stock code
name, stock name
industry, industry
area, region
pe, price-to-earnings ratio
outstanding, outstanding (tradable) share capital
totals, total share capital (in units of 10,000)
totalAssets, total assets (in units of 10,000)
liquidAssets, current assets
fixedAssets, fixed assets
reserved, capital reserve
reservedPerShare, capital reserve per share
eps, earnings per share
bvps, net assets per share
pb, price-to-book ratio
timeToMarket, listing date
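Once the list is saved, it can be read back and filtered by any of these columns. The following is a minimal sketch using only the standard csv module. The two rows are made-up placeholder values rather than real TuShare output, and `stocks_in_industry` is a hypothetical helper name, not part of TuShare.

```python
import csv
import io

# Made-up sample using a subset of the columns in stock_basic_list.csv
# (placeholder values, NOT real market data)
SAMPLE_CSV = """code,name,industry,area,pe,timeToMarket
000001,Example Bank,Banking,Shenzhen,7.5,19910403
600000,Example Steel,Steel,Shanghai,12.1,19991110
"""

def stocks_in_industry(csv_text, industry):
    """Return the codes of all rows whose industry column matches."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row['code'] for row in reader if row['industry'] == industry]

print(stocks_in_industry(SAMPLE_CSV, 'Banking'))
```

The csv module reads every field as a string, so leading zeros in the stock code are preserved.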
Second, get the K-line history of a single stock

The K-line data retrieved includes:

date: trading date (index)
open: opening price (forward-adjusted, the default)
high: highest price (forward-adjusted, the default)
close: closing price (forward-adjusted, the default)
low: lowest price (forward-adjusted, the default)
open_nfq: opening price (unadjusted)
high_nfq: highest price (unadjusted)
close_nfq: closing price (unadjusted)
low_nfq: lowest price (unadjusted)
open_hfq: opening price (backward-adjusted)
high_hfq: highest price (backward-adjusted)
close_hfq: closing price (backward-adjusted)
low_hfq: lowest price (backward-adjusted)
volume: trading volume
amount: turnover (traded value)

The function below downloads the K-line history of the stock with the given code. By default it covers the period from the listing date to today. Incremental download is supported: for example, if local data for stock 60000 has already been downloaded through 2015-06-19, running the function again starts the download from 2015-06-20 and appends the result to the local CSV file.
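The date arithmetic behind the incremental download can be sketched on its own. This is an illustration only: `next_start_date` and `needs_download` are hypothetical helper names, and the sketch assumes dates are kept as "YYYY-MM-DD" strings.

```python
import datetime

def next_start_date(last_downloaded):
    """Given the last downloaded date as 'YYYY-MM-DD', return the following day."""
    last = datetime.datetime.strptime(last_downloaded, '%Y-%m-%d').date()
    return (last + datetime.timedelta(days=1)).strftime('%Y-%m-%d')

def needs_download(date_start, date_end):
    """ISO-formatted dates sort lexicographically, so comparing strings is enough."""
    return date_start < date_end

print(next_start_date('2015-06-19'))  # 2015-06-20
```

Because the strings are zero-padded and ordered year, month, day, a plain string comparison gives the same result as comparing the dates themselves.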

import datetime
import os

# By default, downloads the K-line data from the listing date to today
# A start and an end date can be specified, in the format "2015-06-28"
# cm and util are the author's own modules; see the annex for their definitions
def download_stock_kline(code, date_start='', date_end=datetime.date.today()):

    code = util.getSixDigitalStockCode(code) # format the stock code as 6 digits

    try:
       fileName = 'h_kline_' + str(code) + '.csv'

       writeMode = 'w'
       if os.path.exists(cm.DownloadDir + fileName):
           #print(">> exist:" + code)
           df = pd.read_csv(cm.DownloadDir + fileName, index_col=0, parse_dates=True)

           se = df.head(1).index # take the most recent date in the existing file
           dateNew = se[0] + datetime.timedelta(1)
           date_start = dateNew.strftime("%Y-%m-%d")
           #print(date_start)
           writeMode = 'a'

       if date_start == '':
           se = get_stock_info(code)
           date_start = se['timeToMarket']
           date = datetime.datetime.strptime(str(date_start), "%Y%m%d")
           date_start = date.strftime('%Y-%m-%d')
       date_end = date_end.strftime('%Y-%m-%d')

       # the local data is already up to date
       if date_start >= date_end:
           df = pd.read_csv(cm.DownloadDir + fileName)
           return df

       print('download ' + str(code) + ' k-line >>>begin (' + date_start + ' to ' + date_end + ')')
       df_qfq = ts.get_h_data(str(code), start=date_start, end=date_end) # forward-adjusted (default)
       df_nfq = ts.get_h_data(str(code), start=date_start, end=date_end, autype=None) # unadjusted
       df_hfq = ts.get_h_data(str(code), start=date_start, end=date_end, autype='hfq') # backward-adjusted

       if df_qfq is None or df_nfq is None or df_hfq is None:
           return None

       df_qfq['open_no_fq'] = df_nfq['open']
       df_qfq['high_no_fq'] = df_nfq['high']
       df_qfq['close_no_fq'] = df_nfq['close']
       df_qfq['low_no_fq'] = df_nfq['low']
       df_qfq['open_hfq']=df_hfq['open']
       df_qfq['high_hfq']=df_hfq['high']
       df_qfq['close_hfq']=df_hfq['close']
       df_qfq['low_hfq']=df_hfq['low']

       if writeMode == 'w':
           df_qfq.to_csv(cm.DownloadDir + fileName)
       else:
           df_old = pd.read_csv(cm.DownloadDir + fileName, index_col=0, parse_dates=True)

           # order by date, oldest first
           df_qfq = df_qfq.reindex(df_qfq.index[::-1])
           df_old = df_old.reindex(df_old.index[::-1])

           df_new = pd.concat([df_old, df_qfq])
           #print(df_new)

           # order by date, newest first
           df_new = df_new.reindex(df_new.index[::-1])
           df_new.to_csv(cm.DownloadDir + fileName)
           #df_qfq = df_new

       print('\ndownload ' + str(code) + ' k-line finish')
       return pd.read_csv(cm.DownloadDir + fileName)

    except Exception as e:
       print(str(e))

    return None
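The append branch keeps the file ordered newest first. The reordering can be pictured without pandas, as in the rough sketch below. `merge_newest_first` is a hypothetical name, and the rows are made-up (date, close) pairs, not real quotes.

```python
def merge_newest_first(old_rows, new_rows):
    """Merge two newest-first lists of (date, close) tuples into one newest-first list."""
    oldest_first = old_rows[::-1] + new_rows[::-1]  # chronological order
    return oldest_first[::-1]                       # back to newest first

# Made-up rows for illustration only
old_rows = [('2015-06-19', 10.0), ('2015-06-18', 9.8)]
new_rows = [('2015-06-23', 10.4), ('2015-06-22', 10.2)]

merged = merge_newest_first(old_rows, new_rows)
print([date for date, _ in merged])
```

Reversing both parts, concatenating, and reversing again gives the same result as placing the new rows in front of the old ones. The sketch spells out the intermediate chronological order because that is how the function above does it.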
########## Private Methods ##########

# Get basic information about a stock: name, industry, area, PE and so on, detailed below
# code, stock code
# name, stock name
# industry, industry
# area, region
# pe, price-to-earnings ratio
# outstanding, outstanding (tradable) share capital
# totals, total share capital (in units of 10,000)
# totalAssets, total assets (in units of 10,000)
# liquidAssets, current assets
# fixedAssets, fixed assets
# reserved, capital reserve
# reservedPerShare, capital reserve per share
# eps, earnings per share
# bvps, net assets per share
# pb, price-to-book ratio
# timeToMarket, listing date
# return value type: Series
# STOCK_BASIC_TABLE and engine (the database connection) are defined elsewhere in the author's project
def get_stock_info(code):
    se = None # returned as-is if the query fails
    try:
       sql = "select * from %s where code='%s'" % (STOCK_BASIC_TABLE, code)
       df = pd.read_sql_query(sql, engine)
       se = df.iloc[0]
    except Exception as e:
       print(str(e))
    return se
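The lookup itself can be tried out against an in-memory SQLite database. The sketch below is an assumption-laden illustration, not the author's setup: the table name `stock_basic`, the single placeholder row, and the helper `get_stock_info_sketch` are all hypothetical. It uses a parameterized query instead of string formatting.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # lets columns be read by name
conn.execute('CREATE TABLE stock_basic (code TEXT, name TEXT, timeToMarket INTEGER)')
# Placeholder row, not real data
conn.execute("INSERT INTO stock_basic VALUES ('000001', 'Example Bank', 19910403)")

def get_stock_info_sketch(conn, code):
    """Return the first row matching code, or None when there is no match."""
    cur = conn.execute('SELECT * FROM stock_basic WHERE code = ?', (code,))
    return cur.fetchone()

row = get_stock_info_sketch(conn, '000001')
print(row['timeToMarket'])  # 19910403
```

Passing the code as a query parameter avoids quoting problems and lets the caller check for None when a stock is missing.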
Third, get the K-line history of all stocks

from multiprocessing.dummy import Pool as ThreadPool

# download the K-line history of all stocks
def download_all_stock_history_k_line():
    print('download all stock k-line')
    try:
       df = pd.read_csv(cm.DownloadDir + cm.TABLE_STOCKS_BASIC + '.csv', index_col=0)
       pool = ThreadPool(processes=10)
       pool.map(download_stock_kline, df.index)
       pool.close()
       pool.join()

    except Exception as e:
       print(str(e))
    print('download all stock k-line')
map comes from functional languages such as Lisp; it applies a function to each element of a sequence in turn.

import urllib.request

urls = ['http://www.yahoo.com', 'http://www.reddit.com']
results = list(map(urllib.request.urlopen, urls))

Two libraries provide a parallel version of map: multiprocessing, and its little-known but powerful submodule multiprocessing.dummy.
dummy is a clone of the multiprocessing module. The only difference is that multiprocessing uses processes, while dummy uses threads (and so comes with all the usual limitations of Python threads).

The number of threads is set through the processes parameter.
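As a small self-contained illustration, the sketch below runs a trivial worker through a thread pool. `local_file_name` is a hypothetical stand-in for a real download function and does no network access.

```python
from multiprocessing.dummy import Pool as ThreadPool

def local_file_name(code):
    """Stand-in worker: build the CSV file name used for a stock code."""
    return 'h_kline_' + str(code) + '.csv'

codes = ['000001', '600000', '002756']

pool = ThreadPool(processes=4)  # four worker threads
results = pool.map(local_file_name, codes)
pool.close()
pool.join()

print(results)
```

pool.map returns the results in the same order as the input, just like the built-in map, even though the calls may run concurrently.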

Annex: the other functions and variables used in this article are defined as follows:
TABLE_STOCKS_BASIC = 'stock_basic_list'
DownloadDir = os.path.pardir + '/StockData/' # os.path.pardir: parent directory

# pad the stock code to 6 digits
# input: int or string
# output: string
def getSixDigitalStockCode(code):
    strZero = ''
    for i in range(len(str(code)), 6):
       strZero += '0'
    return strZero + str(code)
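For comparison, the same padding can be written with the built-in str.zfill. This is an alternative sketch rather than the author's code, and `six_digit_code` is a hypothetical name.

```python
def six_digit_code(code):
    """Left-pad a stock code (int or string) with zeros to 6 characters."""
    return str(code).zfill(6)

print(six_digit_code(1))         # 000001
print(six_digit_code('600000'))  # 600000
```

Codes that already have six digits are returned unchanged.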

Origin www.cnblogs.com/tan2810/p/12050651.html