20190919: Technical roadmap for the multi-factor stock selection model (scoring method)

 
1. Data preprocessing
1. Classify the factors and extract the underlying data by factor category (an overview of the factors used in the multi-factor stock selection model; t defaults to 1)
(1) Valuation factors: market data, daily indicators interface
  • Price-to-earnings ratio:
    • Price-to-earnings ratio (TTM): pe_ttm
    • Price-to-earnings ratio (total market value / net profit): pe
  • Price-to-book ratio:
    • Price-to-book ratio (total market value / net assets): pb
  • Price-to-sales ratio:
    • Price-to-sales ratio (TTM): ps_ttm
    • Price-to-sales ratio: ps
(2) Growth factors: financial data
  • Performance express interface
    • Growth rate of net assets per share (growth from the beginning of the year in net assets per share attributable to shareholders of the parent company) (growth_bps)
    • Growth rate of shareholders' equity (growth from the beginning of the year in shareholders' equity attributable to the parent company) (yoy_equity)
  • Financial indicator data interface
    • Year-on-year growth rate of operating income (%) (single quarter) (q_sales_yoy)
    • Quarter-on-quarter growth rate of operating income (%) (single quarter) (q_sales_qoq)
    • Year-on-year growth rate of operating profit (%) (single quarter) (q_op_yoy)
    • Quarter-on-quarter growth rate of operating profit (%) (single quarter) (q_op_qoq)
    • Year-on-year growth rate of net profit (%) (single quarter) (q_profit_yoy)
    • Quarter-on-quarter growth rate of net profit (%) (single quarter) (q_profit_qoq)
    • Year-on-year growth rate of net profit attributable to shareholders of the parent company (%) (single quarter) (q_netprofit_yoy)
    • Quarter-on-quarter growth rate of net profit attributable to shareholders of the parent company (%) (single quarter) (q_netprofit_qoq)
    • Year-on-year growth rate of net assets (equity_yoy)
    • Year-on-year growth rate of basic earnings per share (%) (basic_eps_yoy)
    • Year-on-year growth rate of diluted earnings per share (%) (dt_eps_yoy)
    • Year-on-year growth rate of net cash flow from operating activities per share (%) (cfps_yoy)
(3) Profitability factors: financial data, financial indicator data interface
  • Net profit margin: net profit margin on sales (single quarter) (q_netprofit_margin), net profit margin on sales (netprofit_margin)
  • Gross profit margin: gross profit margin on sales (single quarter) (q_gsprofit_margin), gross profit margin on sales (gsprofit_margin)
  • Return on equity: return on equity (single quarter) (q_roe), return on equity (roe)
  • Return on assets: return on total assets (roa), annualized return on total assets (roa2_yearly)
  • Operating cost ratio: total operating cost / total operating income (gc_of_gr), total operating cost / total operating income (single quarter) (q_gc_of_gr)
  • Financial expense ratio: financial expenses / total operating income (finaexp_of_gr), financial expenses / total operating income (single quarter) (q_finaexp_to_gr)
  • EBIT ratio: earnings before interest and tax / total operating income (ebit_of_gr)
(4) Momentum/reversal factor
  • Price change over the previous t days: pct_chg_t
(5) Trading factors: turnover rate (turnover_rate_t), volume ratio (volume_ratio_t), etc. over the previous t days
(6) Size factors: market data, daily indicators interface
  • Circulating market value: circ_mv
  • Total market value: total_mv
  • Circulating share capital: float_share
  • Total share capital: total_share
(7) Stock price volatility factor:
  • Prior-period price amplitude: the absolute difference between the day's highest and lowest prices, expressed as a fraction of the closing price t days earlier (the previous day's close when t = 1), namely abs(high_0-low_0)/close_t
(8) Target label: close_0/close_t-1
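The price-based factors above can be sketched in pandas. This is a minimal illustration, not the author's code: it assumes one stock's daily bars sorted by date in ascending order with columns named high, low and close, and it reads pct_chg_t as the percentage change in the closing price over t trading days. The function name add_price_factors and the amplitude column name are hypothetical.

```python
import pandas as pd

def add_price_factors(df: pd.DataFrame, t: int = 1) -> pd.DataFrame:
    """Add pct_chg_t and the price amplitude for one stock.

    Assumes df is sorted by trading date (ascending) and has
    columns 'high', 'low', 'close' (assumed names).
    """
    out = df.copy()
    close_t = out["close"].shift(t)  # closing price t trading days earlier
    # Percentage change of the close over t days (assumed reading of pct_chg_t)
    out[f"pct_chg_{t}"] = (out["close"] / close_t - 1.0) * 100.0
    # Amplitude: abs(high_0 - low_0) / close_t
    out["amplitude"] = (out["high"] - out["low"]).abs() / close_t
    return out

# Toy data, made up for illustration only
bars = pd.DataFrame({
    "high": [10.5, 11.0, 11.2],
    "low": [9.8, 10.2, 10.6],
    "close": [10.0, 10.8, 11.0],
})
factors = add_price_factors(bars, t=1)
```

The first t rows have no earlier close, so both new columns are NaN there; those rows would need to be dropped or handled before scoring.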
 
2. Missing data
        Only some of the factors, particularly among the growth and profitability factors, can be obtained through tushare, and the three most basic and important factor categories have severe gaps. After analysis, 27 factors remain, as follows:
(1) Valuation category:
  • Dynamic P/E ratio 'pe_ttm',
  • Static P/E ratio 'pe',
  • P/B ratio 'pb',
  • Dynamic price-to-sales ratio 'ps_ttm',
  • Static price-to-sales ratio 'ps';
(2) Growth category:
  • Year-on-year growth rate of operating income (%) (single quarter) 'q_sales_yoy',
  • Quarter-on-quarter growth rate of operating profit (%) (single quarter) 'q_op_qoq',
  • Year-on-year growth rate of net assets 'equity_yoy',
  • Year-on-year growth rate of basic earnings per share (%) 'basic_eps_yoy',
  • Year-on-year growth rate of diluted earnings per share (%) 'dt_eps_yoy',
  • Year-on-year growth rate of net cash flow from operating activities per share (%) 'cfps_yoy';
(3) Profitability category:
  • Net profit margin 'netprofit_margin',
  • Return on equity (single quarter) 'q_roe',
  • Return on equity 'roe',
  • Return on total assets 'roa',
  • Annualized return on total assets 'roa2_yearly',
  • Total operating cost / total operating income 'gc_of_gr',
  • Financial expenses / total operating income 'finaexp_of_gr',
  • Earnings before interest and tax / total operating income 'ebit_of_gr';
(4) Size category:
  • Circulating market value 'circ_mv',
  • Total market value 'total_mv',
  • Circulating share capital 'float_share',
  • Total share capital 'total_share';
(5) Momentum/reversal:
  • Price change 'pct_chg_0';
(6) Trading factors:
  • Turnover rate 'turnover_rate_0',
  • Volume ratio 'volume_ratio_0';
(7) Stock price volatility factor:
  • Prior-period price amplitude '(high_0-low_0)/close_0'.
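One way to carry out this kind of screening is to measure the share of missing values per factor column and keep only the columns below a cutoff. The sketch below is an assumption about how such a filter could look, not the procedure the author used: the helper select_factors_by_coverage, the 30% cutoff and the toy numbers are all hypothetical.

```python
import numpy as np
import pandas as pd

def select_factors_by_coverage(df: pd.DataFrame, max_missing: float = 0.3):
    """Return the factor columns whose missing ratio is at most max_missing.

    max_missing is a hypothetical cutoff chosen for illustration.
    """
    missing_ratio = df.isna().mean()  # fraction of NaN values per column
    kept = missing_ratio[missing_ratio <= max_missing].index.tolist()
    return kept, missing_ratio

# Toy factor table, made up for illustration only
sample = pd.DataFrame({
    "pe_ttm": [12.0, 15.5, np.nan, 9.8],
    "q_op_yoy": [np.nan, np.nan, np.nan, 4.2],
    "roe": [8.1, 7.4, 6.9, 10.2],
})
kept, missing_ratio = select_factors_by_coverage(sample, max_missing=0.3)
```

In this toy table, q_op_yoy is 75% missing and is dropped, while pe_ttm (25% missing) and roe (complete) are kept.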
 
3. Outlier processing
          Before standardizing the data, outliers must be handled first. Values that are too large or too small can distort the analysis; in regressions especially, outliers seriously bias the estimated relationship between a factor and the rate of return. Outliers are handled by pulling them back to an upper or lower limit, where the limits come from the criterion used to identify outliers. There are three common criteria: MAD, 3σ, and the percentile method. In each case the idea is to define the upper and lower limits first and then set any value beyond a limit to that limit. MAD is the most commonly used.
MAD-based distance from the center, where MAD is the Median Absolute Deviation:
(1) Calculate the median of all observations, median(X);
(2) Calculate the absolute deviation of each observation from the median, abs(X-median(X));
(3) Calculate the median of the absolute deviations from (2), that is, MAD = median(abs(X-median(X)));
(4) Divide the values from (2) by the value from (3) to obtain each observation's MAD-based distance from the center, abs(X-median(X))/MAD.
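The four steps above can be sketched with NumPy as follows. The function mad_distance follows steps (1) to (4) directly; mad_clip then pulls outliers back to the limits median ± n·MAD. The multiplier n = 3 is an assumption for illustration, since the text does not state which threshold is used, and both function names are hypothetical.

```python
import numpy as np

def mad_distance(x):
    """Steps (1)-(4): MAD-based distance of each observation from the center."""
    x = np.asarray(x, dtype=float)
    med = np.nanmedian(x)        # (1) median of all observations
    abs_dev = np.abs(x - med)    # (2) absolute deviation from the median
    mad = np.nanmedian(abs_dev)  # (3) MAD = median of the absolute deviations
    # (4) distance in units of MAD; inf or NaN results if mad == 0
    return abs_dev / mad, med, mad

def mad_clip(x, n=3.0):
    """Set values beyond median +/- n * MAD to those limits (n is an assumed threshold)."""
    x = np.asarray(x, dtype=float)
    _, med, mad = mad_distance(x)
    return np.clip(x, med - n * mad, med + n * mad)

values = [1.0, 2.0, 3.0, 4.0, 100.0]
dist, med, mad = mad_distance(values)
clipped = mad_clip(values, n=3.0)
```

For this toy input the median is 3 and the MAD is 1, so the value 100 lies 97 MADs from the center and is pulled back to the upper limit of 6; the other values are unchanged.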
 
 
4. Data standardization
z-score standardization
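A minimal z-score sketch is shown below: each factor value has the mean subtracted and is divided by the standard deviation, so the standardized factor has mean 0 and standard deviation 1. Using the population standard deviation (ddof=0) and ignoring NaN values are assumptions here, and the function name zscore is hypothetical.

```python
import numpy as np

def zscore(x):
    """Standardize a factor: (x - mean) / std, ignoring NaN values."""
    x = np.asarray(x, dtype=float)
    mean = np.nanmean(x)
    std = np.nanstd(x, ddof=0)  # population standard deviation (assumed choice)
    return (x - mean) / std

z = zscore([1.0, 2.0, 3.0, 4.0, 5.0])
```

In a cross-sectional stock selection setting, this would typically be applied to each factor separately across all stocks on the same trading date, after the outlier step, so that factors with different units can be compared and combined into a score.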
 
 
 


Origin blog.csdn.net/weixin_38192254/article/details/111627486