Select the n stocks with the smallest market capitalization and output them to CSV

The small-cap factor is a very important stock-picking strategy; the main idea is that small-cap stocks generally rise more.
A concrete way to run it: every weekend (or at the end of each month) select the 10 stocks with the smallest market capitalization, buy them with equal capital at the start of the next week (or next month), select a fresh 10 at the end of that week (or month), and so on.
The code in this section fetches the latest trading data for all stocks, ranks them by market capitalization, and can also select the n stocks with the smallest market capitalization (ST stocks, suspended stocks, and delisting stocks are already excluded).
Approach: crawl the latest trading data for all stocks from the Sina website, rank by market capitalization, and save as CSV as a reference for buying and selling.
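The script below only produces the selection list; the select-and-allocate step the strategy describes (keep the n smallest-cap names, split the capital equally) can be sketched on toy data. All codes, names, and figures here are invented:

```python
import pandas as pd

# toy market snapshot: code, name, total market cap (units of 10,000 yuan); values are made up
snapshot = pd.DataFrame({
    'stock_code': ['sh600000', 'sz000002', 'sz300001', 'sz300002', 'sh600001'],
    'stock_name': ['A', 'B', '*ST C', 'D', 'E'],
    'total_market_cap': [5_000_000, 120_000, 80_000, 60_000, 90_000],
})

n = 3
capital = 100_000  # total capital in yuan, split equally across the n picks

# drop ST stocks, then keep the n smallest by total market cap
eligible = snapshot[~snapshot['stock_name'].str.contains('ST')]
selected = eligible.nsmallest(n, 'total_market_cap').copy()
selected['target_position'] = capital / n  # equal-weight allocation

print(selected[['stock_code', 'total_market_cap', 'target_position']])
```

At the next rebalancing date the same selection is recomputed on fresh data and positions are rotated into the new list.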
from urllib.request import urlopen  # urlopen from Python's built-in crawling library
import pandas as pd
from datetime import datetime
import time
import re  # regular-expression library
import os  # operating-system library
import json  # json library that comes with Python

pd.set_option('expand_frame_repr', False)  # do not wrap lines when there are many columns
pd.set_option('display.max_rows', 500)  # maximum number of rows to display

# ===== Function: fetch data from a web page and return the crawled content
def get_content_from_internet(url, max_try_num=10, sleep_time=5):
    """
    Use Python's built-in urlopen to grab data from a web page.
    :param url: the URL to crawl
    :param max_try_num: maximum number of crawl attempts
    :param sleep_time: pause time after each failed attempt
    :return: the crawled web content
    """
    get_success = False  # whether the content was crawled successfully
    content = None
    # crawl
    for i in range(max_try_num):
        try:
            content = urlopen(url=url, timeout=15).read()  # use Python's built-in library to fetch data from the network
            get_success = True  # content crawled successfully
            break
        except Exception as e:
            print('Error crawling data, attempt:', i + 1, 'error:', e)
            time.sleep(sleep_time)

    # check whether the crawl succeeded
    if get_success:
        return content
    else:
        raise ValueError('urlopen kept failing to crawl the page and reached the retry limit; stopping the program, please investigate as soon as possible')
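The retry loop above is a generic pattern that can be exercised without the network by injecting a flaky callable in place of urlopen. `fetch_with_retry` and `flaky` below are hypothetical names for this sketch, not part of the original script:

```python
import time

def fetch_with_retry(fetch, max_try_num=10, sleep_time=0):
    """Same retry shape as get_content_from_internet, but with an injectable fetch callable."""
    for i in range(max_try_num):
        try:
            return fetch()
        except Exception as e:
            print('attempt', i + 1, 'failed:', e)
            time.sleep(sleep_time)
    raise ValueError('reached the retry limit without success')

# a flaky stand-in for urlopen(...).read(): fails twice, then succeeds
calls = {'n': 0}
def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('simulated network error')
    return b'payload'

result = fetch_with_retry(flaky, max_try_num=5, sleep_time=0)
```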

# ===== Function: get the latest trading-day data for the specified stocks (indices also work) from Sina,
# organize it into a DataFrame with a fixed format, and return that DataFrame
def get_today_data_from_sinajs(code_list):
    """
    Return the latest trading data for the given stocks.
    The data comes from this URL: http://hq.sinajs.cn/list=sh600000,sz000002,sz300001
    Normal page URL: https://finance.sina.com.cn/realstock/company/sh600000/nc.shtml
    :param code_list: list of stock-code strings; multiple codes are allowed, e.g. ['sh600000', 'sz000002', 'sz300001']
    :return: a DataFrame storing the stock data
    """
    # construct the URL
    url = "http://hq.sinajs.cn/list=" + ",".join(code_list)  # join the codes in code_list with commas, then prepend http://...

    # fetch the data
    content = get_content_from_internet(url)
    content = content.decode('gbk')  # decode with gbk so the Chinese text decodes correctly; bytes before decoding, str after

    # convert the data into a DataFrame
    content = content.strip()  # remove surrounding whitespace and newlines
    data_line = content.split('\n')  # each line holds one stock's data; build a list
    data_line = [i.replace('var hq_str_', '').split(',') for i in data_line]  # strip the redundant prefix from each line, then split again, forming a nested list
    # df = pd.DataFrame(data_line, dtype='float')  # dtype='float' would produce numeric data directly
    df = pd.DataFrame(data_line)  # only build the DataFrame here; convert to float below

    # organize the DataFrame
    df[0] = df[0].str.split('="')  # df[0] selects column 0
    df['stock_code'] = df[0].str[0].str.strip()
    df['stock_name'] = df[0].str[-1].str.strip()
    df['candle_end_time'] = df[30] + ' ' + df[31]  # stock candlesticks are usually named after the candle's end time
    df['candle_end_time'] = pd.to_datetime(df['candle_end_time'])
    rename_dict = {1: 'open', 2: 'pre_close', 3: 'close', 4: 'high', 5: 'low', 6: 'buy1', 7: 'sell1',
                   8: 'volume', 9: 'amount', 32: 'status'}  # compare against the raw data yourself and you may discover more fields; columns 10 to 29 are the five-level order-book data
    # volume is in shares, amount is in yuan
    df.rename(columns=rename_dict, inplace=True)
    # convert the required columns to float
    df[['open', 'high', 'low', 'close', 'pre_close', 'amount', 'volume', 'buy1', 'sell1']] \
        = df[['open', 'high', 'low', 'close', 'pre_close', 'amount', 'volume', 'buy1', 'sell1']].astype('float')
    df['status'] = df['status'].str.strip('";')  # remove the redundant '";' characters from the status column

    df = df[['stock_code', 'stock_name', 'candle_end_time', 'open', 'high', 'low', 'close', 'pre_close', 'amount',
             'volume', 'buy1', 'sell1', 'status']]  # reorder the columns

    return df
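These parsing steps can be exercised offline against a fabricated response line in the same shape as the hq.sinajs.cn format described above (comma-separated fields, date at index 30, time at index 31, status at index 32); every field value below is made up:

```python
import pandas as pd

# a fabricated response line in the hq.sinajs.cn format (all values invented):
# name, open, pre_close, close, high, low, buy1, sell1, volume, amount,
# 20 order-book fields, date, time, status
sample = ('var hq_str_sh600000="Bank,10.50,10.40,10.60,10.70,10.30,10.59,10.60,'
          '1000000,10590000,' + ','.join(['0'] * 20) + ',2020-03-20,15:00:00,00";')

# same steps as in get_today_data_from_sinajs
data_line = [sample.replace('var hq_str_', '').split(',')]
df = pd.DataFrame(data_line)

df[0] = df[0].str.split('="')
df['stock_code'] = df[0].str[0].str.strip()
df['stock_name'] = df[0].str[-1].str.strip()
df['candle_end_time'] = pd.to_datetime(df[30] + ' ' + df[31])
df['status'] = df[32].str.strip('";')

print(df[['stock_code', 'stock_name', 'candle_end_time', 'status']])
```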

# ===== Function: get the latest trading data for all stocks from Sina and return it as one DataFrame
def get_all_today_stock_data_from_sina_marketcenter():
    """
    http://vip.stock.finance.sina.com.cn/mkt/#stock_hs_up
    :return: a DataFrame storing the stock data
    """

    # === data URL
    raw_url = 'http://vip.stock.finance.sina.com.cn/quotes_service/api/json_v2.php/Market_Center.getHQNodeData?page=%s' \
              '&num=80&sort=symbol&asc=1&node=hs_a&symbol=&_s_r_a=sort'
    page_num = 1

    # === DataFrame that accumulates the data
    all_df = pd.DataFrame()

    # === get the latest trading date of the Shanghai Composite Index. This code was not in the course video; it was added later
    df = get_today_data_from_sinajs(code_list=['sh000001'])
    sh_date = df.iloc[0]['candle_end_time'].date()  # latest trading date of the Shanghai Composite Index

    # === iterate page by page and fetch the stock data
    while True:
        # construct the url
        url = raw_url % page_num
        print('Starting to crawl page:', page_num)

        # fetch the data
        content = get_content_from_internet(url)
        content = content.decode('gbk')

        # check whether this page is empty
        if 'null' in content:
            print('The crawl has reached the last page; exiting the loop')
            break

        # regular expression: wrap the bare keys in quotes
        content = re.sub(r'(?<=[{,])([a-zA-Z][a-zA-Z0-9]*)(?=:)', r'"\1"', content)
        # convert the data to dict format
        content = json.loads(content)  # content is now a list whose elements are dicts

        # convert the data into a DataFrame
        df = pd.DataFrame(content, dtype='float')
        # organize the data
        # rename; total market cap and float market cap are in units of 10,000 yuan, volume is in shares, amount is in yuan
        rename_dict = {'symbol': 'stock_code', 'name': 'stock_name', 'trade': 'close',
                       'settlement': 'pre_close', 'mktcap': 'total_market_cap', 'nmc': 'float_market_cap'}
        df.rename(columns=rename_dict, inplace=True)  # columns without a mapping keep their original names, so the number of columns does not shrink
        # add the trade date
        # df['trade_date'] = pd.to_datetime(datetime.now().date())  # the line of code used in the course video
        df['trade_date'] = pd.to_datetime(sh_date)  # the video used the line above; switching to this line makes the program more robust
        # the fields in df also include: per (price-earnings ratio, like PE), pb (price-to-book), turnoverratio (turnover rate)
        # keep only the required columns
        df = df[['stock_code', 'stock_name', 'trade_date', 'open', 'high', 'low', 'close',
                 'pre_close', 'volume', 'amount', 'total_market_cap', 'float_market_cap']]

        # merge the data
        all_df = pd.concat([all_df, df], ignore_index=True)

        # page number + 1
        page_num += 1
        time.sleep(1)

    # === delete stocks suspended that day; this code was not in the course video, it was added later
    all_df = all_df[all_df['open'] - 0 > 0.00001]
    all_df.reset_index(drop=True, inplace=True)

    # === return the result
    return all_df
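The re.sub step exists because the Market_Center endpoint returns JavaScript-style object literals whose keys are not quoted, which json.loads rejects. A small self-contained demonstration on a fabricated fragment in that shape:

```python
import json
import re

# fabricated fragment shaped like the Market_Center response: bare keys, quoted values
raw = ('[{symbol:"sh600000",name:"X",trade:"10.60",mktcap:"5000000"},'
       '{symbol:"sz000002",name:"Y",trade:"28.10",mktcap:"3000000"}]')

# wrap every bare key (a letter followed by letters/digits, preceded by '{' or ',',
# followed by ':') in double quotes so the text becomes valid JSON
fixed = re.sub(r'(?<=[{,])([a-zA-Z][a-zA-Z0-9]*)(?=:)', r'"\1"', raw)
records = json.loads(fixed)  # a list of dicts, as the crawler expects

print(records[0]['symbol'], records[1]['mktcap'])
```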

# ===== main program below
select_stock_num = 10  # number of stocks to select

# get today's data for all stocks
df = get_all_today_stock_data_from_sina_marketcenter()

# stocks suspended today were already removed inside the function; here also delete ST and delisting stocks
df = df[df['stock_name'].str.contains('ST') == False]  # delete ST stocks
df = df[df['stock_name'].str.contains('退') == False]  # delete delisting stocks ('退' marks delisting in the name)

df['rank'] = df['total_market_cap'].rank()  # rank by total market capitalization
df.to_csv(r'C:\Users\lori\Desktop\stockinvest\project1\data\djl_data\based on the current market value of the total stock\all_stock_rank.csv',
          encoding='gbk', mode='w', index=False)
df.sort_values(by='total_market_cap', inplace=True)  # sort by total market capitalization

df_select = df[['trade_date', 'stock_code', 'stock_name', 'total_market_cap', 'float_market_cap', 'rank', 'close']]
df_select = df_select.iloc[:select_stock_num]
df_select.to_csv(r'C:\Users\lori\Desktop\stockinvest\project1\data\djl_data\based on the current market value of the total stock\select_stock.csv',
          encoding='gbk', mode='w', index=False)

  

 

Origin www.cnblogs.com/djlbolgs/p/12541213.html