The small-cap factor is a classic stock-picking strategy. Its core idea is that small-cap stocks tend to rise more than large-cap stocks.
A concrete implementation: every weekend (or at the end of each month), select the 10 stocks with the smallest market capitalization; at the start of the next week (or month), buy them with equal capital; at the end of that week (or month), sell them and pick a new set of 10; repeat.
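The "buy with equal capital" step above can be sketched as follows. This is a minimal illustration, not part of the original script: the capital amount and prices are made up, and `equal_capital_shares` is a hypothetical helper. A-shares trade in lots of 100 shares, so each slice is rounded down to whole lots.

```python
def equal_capital_shares(capital, prices, lot_size=100):
    """Split `capital` equally across the selected stocks and convert each
    slice into a share count, rounded down to whole lots (A-shares trade
    in lots of 100 shares)."""
    per_stock = capital / len(prices)  # equal capital per stock
    return {code: int(per_stock / price // lot_size) * lot_size
            for code, price in prices.items()}


# Hypothetical closing prices for three of the selected small-cap stocks
prices = {'sh600000': 7.9, 'sz000002': 8.2, 'sz300001': 21.3}
print(equal_capital_shares(90_000, prices))
# → {'sh600000': 3700, 'sz000002': 3600, 'sz300001': 1400}
```

Any capital left over after rounding down simply stays in cash until the next rebalance.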
The code in this section fetches recent trading data for all stocks, ranks it by market capitalization, and selects the n stocks with the smallest total market cap, with ST, suspended, and delisting stocks already excluded.
Approach: crawl recent trading data for all stocks from the Sina website, rank it by market capitalization, and save it as a CSV file for reference when buying and selling.
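To make the data format concrete, here is how one line of an hq.sinajs.cn response can be picked apart offline. The sample string is abridged and hand-written (the real response carries over 30 comma-separated fields per stock); it only illustrates the `var hq_str_<code>="…";` shape that the parsing code relies on.

```python
# Abridged, hand-written sample of one response line from hq.sinajs.cn
line = 'var hq_str_sh600000="浦发银行,7.90,7.85,7.95";'

# The stock code sits between the 'var hq_str_' prefix and the '='
code = line.split('=')[0].replace('var hq_str_', '').strip()

# The quoted payload is a comma-separated field list: name, open, pre_close, price, ...
fields = line.split('="')[1].rstrip('";').split(',')

print(code)    # → sh600000
print(fields)  # → ['浦发银行', '7.90', '7.85', '7.95']
```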
from urllib.request import urlopen  # Python's built-in library for fetching web pages
import pandas as pd
from datetime import datetime
import time
import re  # regular-expression library
import os  # operating-system library
import json  # JSON library that ships with Python

pd.set_option('expand_frame_repr', False)  # do not wrap wide DataFrames
pd.set_option('display.max_rows', 500)  # show up to 500 rows of data


# ===== Function: fetch a web page and return the crawled content
def get_content_from_internet(url, max_try_num=10, sleep_time=5):
    """
    Fetch data from a web page with Python's built-in urlopen.
    :param url: the URL to crawl
    :param max_try_num: maximum number of attempts
    :param sleep_time: seconds to pause after each failed attempt
    :return: the crawled page content
    """
    get_success = False  # whether the crawl succeeded
    for i in range(max_try_num):
        try:
            content = urlopen(url=url, timeout=15).read()  # fetch with the standard library
            get_success = True  # crawl succeeded
            break
        except Exception as e:
            print('Error crawling data, attempt:', i + 1, 'error:', e)
            time.sleep(sleep_time)
    # check whether the crawl succeeded
    if get_success:
        return content
    else:
        raise ValueError('urlopen kept failing and the retry limit was reached; stopping the program, please investigate.')


# ===== Function: get the most recent trading data for the specified stocks (indices work too)
# from Sina, organize it into a DataFrame of a fixed format, and return that DataFrame
def get_today_data_from_sinajs(code_list):
    """
    Return recent trading data for the given stocks.
    Data URL: http://hq.sinajs.cn/list=sh600000,sz000002,sz300001
    Normal page: https://finance.sina.com.cn/realstock/company/sh600000/nc.shtml
    (Note: Sina has at times required a Referer header for hq.sinajs.cn; if requests
    fail, try a urllib.request.Request with a Referer set.)
    :param code_list: list of stock-code strings, e.g. ['sh600000', 'sz000002', 'sz300001']
    :return: a DataFrame holding the stock data
    """
    # Construct the URL
    url = "http://hq.sinajs.cn/list=" + ",".join(code_list)  # join the codes with commas, prepend the http:// prefix

    # Fetch the data
    content = get_content_from_internet(url)
    content = content.decode('gbk')  # decode with gbk so Chinese text is readable; bytes before decoding, str after

    # Convert the data into a DataFrame
    content = content.strip()  # strip leading/trailing whitespace and newlines
    data_line = content.split('\n')  # one stock per line; build a list
    data_line = [i.replace('var hq_str_', '').split(',') for i in data_line]  # drop the prefix from every line, split again, giving a two-level list
    # df = pd.DataFrame(data_line, dtype='float')  # dtype='float' would produce numeric data directly
    df = pd.DataFrame(data_line)  # build the DataFrame first; cast to float below

    # Organize the DataFrame
    df[0] = df[0].str.split('="')  # df[0] selects column 0
    df['stock_code'] = df[0].str[0].str.strip()
    df['stock_name'] = df[0].str[-1].str.strip()
    df['candle_end_time'] = df[30] + ' ' + df[31]  # stock candlesticks are conventionally named after their end time
    df['candle_end_time'] = pd.to_datetime(df['candle_end_time'])
    rename_dict = {1: 'open', 2: 'pre_close', 3: 'close', 4: 'high', 5: 'low', 6: 'buy1', 7: 'sell1',
                   8: 'volume', 9: 'amount', 32: 'status'}
    # Compare the fields against the page yourself and you may discover more; columns 10 to 29 are the
    # five-level order book. volume is in shares, amount in yuan.
    df.rename(columns=rename_dict, inplace=True)
    # Cast the required columns to float
    df[['open', 'high', 'low', 'close', 'pre_close', 'amount', 'volume', 'buy1', 'sell1']] \
        = df[['open', 'high', 'low', 'close', 'pre_close', 'amount', 'volume', 'buy1', 'sell1']].astype('float')
    df['status'] = df['status'].str.strip('";')  # remove the stray '";' characters from the status column
    df = df[['stock_code', 'stock_name', 'candle_end_time', 'open', 'high', 'low', 'close', 'pre_close',
             'amount', 'volume', 'buy1', 'sell1', 'status']]  # redefine the column order
    return df


# ===== Function: get recent trading data for all stocks from Sina, page by page, and return one DataFrame
def get_all_today_stock_data_from_sina_marketcenter():
    """
    http://vip.stock.finance.sina.com.cn/mkt/#stock_hs_up
    :return: a DataFrame holding the stock data
    """
    # === data URL
    raw_url = 'http://vip.stock.finance.sina.com.cn/quotes_service/api/json_v2.php/Market_Center.getHQNodeData?page=%s' \
              '&num=80&sort=symbol&asc=1&node=hs_a&symbol=&_s_r_a=sort'
    page_num = 1

    # === DataFrame that accumulates all the data
    all_df = pd.DataFrame()

    # === Get the most recent trading date from the Shanghai Composite Index.
    # (This code is not in the course video; it was added afterwards.)
    df = get_today_data_from_sinajs(code_list=['sh000001'])
    sh_date = df.iloc[0]['candle_end_time'].date()  # most recent trading date of the Shanghai Composite Index

    # === iterate page by page and fetch the stock data
    while True:
        # Construct the URL
        url = raw_url % page_num
        print('Start crawling page:', page_num)

        # Fetch the data
        content = get_content_from_internet(url)
        content = content.decode('gbk')

        # An empty page means we are past the last page
        if 'null' in content:
            print('Crawled past the last page, exiting the loop')
            break

        # Regular expression: put the keys in quotes
        content = re.sub(r'(?<=[{,])([a-zA-Z][a-zA-Z0-9]*)(?=:)', r'"\1"', content)
        # Parse the JSON; content becomes a list whose elements are dicts
        content = json.loads(content)
        # Convert the data into a DataFrame
        df = pd.DataFrame(content, dtype='float')

        # Organize the data
        # Rename the columns. Units: total and float market cap in 10,000 yuan, volume in shares, amount in yuan.
        # (The original used Chinese column names; they are rendered in English here.)
        rename_dict = {'symbol': 'stock_code', 'name': 'stock_name', 'open': 'open', 'high': 'high',
                       'low': 'low', 'trade': 'close', 'settlement': 'pre_close', 'volume': 'volume',
                       'amount': 'amount', 'mktcap': 'total_mktcap', 'nmc': 'float_mktcap'}
        df.rename(columns=rename_dict, inplace=True)  # columns without a mapping keep their original names, so no columns are lost

        # === add the trading date
        # df['trade_date'] = pd.to_datetime(datetime.now().date())  # line used in the course video
        df['trade_date'] = pd.to_datetime(sh_date)  # the video uses the line above; this line is more robust
        # The data also contains: mktcap (total market cap), nmc (float market cap),
        # per (price/earnings, like PE), pb (price/book), turnoverratio (turnover rate)

        # Keep only the columns we need
        df = df[['stock_code', 'stock_name', 'trade_date', 'open', 'high', 'low', 'close', 'pre_close',
                 'volume', 'amount', 'total_mktcap', 'float_mktcap']]

        # Merge the data (DataFrame.append was removed in pandas 2.0, so use pd.concat)
        all_df = pd.concat([all_df, df], ignore_index=True)

        # Move to the next page
        page_num += 1
        time.sleep(1)

    # === Drop stocks suspended today; a suspended stock reports an open of 0.
    # (Not in the course video; added afterwards.)
    all_df = all_df[all_df['open'] - 0 > 0.00001]
    all_df.reset_index(drop=True, inplace=True)

    # === return the result
    return all_df


# ===== Main program
select_stock_num = 10  # number of stocks to select

# Get today's data for all stocks
df = get_all_today_stock_data_from_sina_marketcenter()

# Suspended stocks were already removed inside the function; now also drop ST and delisting stocks
df = df[df['stock_name'].str.contains('ST') == False]  # drop ST stocks
df = df[df['stock_name'].str.contains('退') == False]  # drop delisting stocks ('退' marks delisting in the name)

# Rank by total market cap
df['rank'] = df['total_mktcap'].rank()
# (The original paths used a Chinese folder name; it is rendered in English here.)
df.to_csv(r'C:\Users\lori\Desktop\stockinvest\project1\data\djl_data\select_by_total_mktcap\all_stock_rank.csv',
          encoding='gbk', mode='w', index=False)

# Sort by total market cap, ascending, and keep the smallest select_stock_num stocks
df.sort_values(by='total_mktcap', inplace=True)
df_select = df[['trade_date', 'stock_code', 'stock_name', 'total_mktcap', 'float_mktcap', 'rank', 'close']]
df_select = df_select.iloc[:select_stock_num]
df_select.to_csv(r'C:\Users\lori\Desktop\stockinvest\project1\data\djl_data\select_by_total_mktcap\select_stock.csv',
                 encoding='gbk', mode='w', index=False)
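One detail of the crawler deserves a standalone look: the Market_Center.getHQNodeData endpoint returns JavaScript object literals rather than strict JSON, with unquoted keys that `json.loads` rejects. The regex in the script quotes every key first. Here is the same technique on a made-up sample (the field values are invented):

```python
import json
import re

# Made-up sample in the endpoint's unquoted-key style
raw = '[{symbol:"sh600000",trade:"7.95",mktcap:"23000000"}]'

# Quote every bare key that follows '{' or ',' and precedes ':'
fixed = re.sub(r'(?<=[{,])([a-zA-Z][a-zA-Z0-9]*)(?=:)', r'"\1"', raw)
data = json.loads(fixed)  # now valid JSON: a list of dicts

print(data[0]['symbol'])  # → sh600000
```

The look-behind only fires right after `{` or `,`, so letters inside the quoted values are left untouched.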