[Technology Sharing] Applying Machine Learning to Quantitative Trading: A Multi-Factor Stock Selection Strategy Based on Neural Networks

Original author: Oasis. Published with authorization.

1. Multi-factor stock selection background

Quantitative trading strategies come down to three things: timing, stock selection, and position control. Timing strategies are short-term arbitrage strategies, while stock selection is a medium-to-long-term strategy whose goal is to outperform the index over that horizon and capture the market excess return, alpha. The key to multi-factor stock selection is finding factors that correlate with stock returns, that is, factors with strong power to predict returns. The following steps are generally used:

Traditional multi-factor models often rely on investors' subjective judgment and logical reasoning when constructing large sets of factor features. Machine learning models, by contrast, learn adaptively and automatically from data, and are now widely used to build multi-factor models. Compared with shallow machine learning classifiers such as SVM and logistic regression, deep learning can learn more useful factor features by combining many hidden layers with large amounts of training data. The "deep model" is therefore only a means; the ultimate goal is to use a deep neural network for "feature learning" over the factors, and thereby improve the accuracy of classification or prediction.

2. Model features: multiple stock factors

Our stock pool is the constituent stocks of the CSI 500. From these companies' financial indicators and stock price behavior, we construct five groups of factors (profitability, technical, growth, valuation, and size) as model inputs:

Profitability factors: BPS; Dividend Yield; Dividend Yield Last Year; EPS; Sales/Enterprise Value; FCFF/Enterprise Value; Return on P/B; Return on P/E; ROE; ROIC

Technical factors: CloseAdj/OneMonthMaxCloseAdj; 1 Month return; 1 Month RSI; 1 Month ADR; 1 Month Amount Average; 3 Month return; 1 year daily Skewness; 1 Month Turn Over Average; 3 Month Turn Over Average; illiquidity

Size factors: Market Free Shares; Market Total Capital; Market log Total Capital

Valuation factors: Acca_Operating Finance Investment; Asset Impairment Loss To Gross Revenue; Cash From Sales/Operating Revenue; CFO/NOI; P/B; P/E

Growth factors: Net Profit Growth; Operation Profit Growth; Operation Income Growth; Revenue Growth; ROE Growth; Change Of Net Profit Growth

Based on these factors, we use a neural network to learn features from the factor data, select the 50 stocks predicted to perform best, and build the optimal portfolio from them.

3. Model construction: deep neural network

Deep neural networks are by now a well-worn topic, with countless posts on KM and elsewhere online, so only a brief introduction is given here. A deep neural network is a multi-layer network made up of an input layer, several hidden layers, and an output layer. Only nodes in adjacent layers are connected; nodes within the same layer, or in non-adjacent layers, are not. By combining low-level features, the network forms more abstract high-level representations of attribute categories or characteristics. With many hidden layers and large amounts of training data, it learns more useful features and ultimately improves the accuracy of classification or prediction. The advantage of this kind of model is that it needs little feature processing or screening: fed the features as a black box, it can often train to better results than models such as boosting or decision trees, and it shows its strength especially when data is plentiful and features are many.

The shortcomings are equally obvious:

1) The model is a black box and offers no intuitive business interpretation of the features.
2) Training is prone to gradient instability, such as exploding or vanishing gradients, so stable convergence is hard to achieve and training often ends in a local optimum or gets stuck at a saddle point.
3) The computational cost is high, and large data sets are generally trained on GPUs.
4) It is still better suited to data such as images. Financial data is mostly linear and cannot take full advantage of the ability of "deep" neural networks to fit nonlinear data.

The model in this article is trained and tested with the Keras framework.

4. Model strategy process

First, determine the rolling backtest schedule. Three years of stock factor scores serve as the training data for the neural network, so the training window is three years. Positions are rebalanced every 20 days, and the factor data for all stocks in the market on the rebalancing day serves as the test set, used to predict each stock's class for the next period. The corresponding factor training and test sets are constructed accordingly. When constructing the labels, note that for a 10-trading-day interval the label should be the return over the following 10 trading days (computed from the adjusted close, adjclose), from which the CSI 500 return is subtracted to obtain the alpha return.
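The label construction described above can be sketched as follows. This is only an illustration under stated assumptions: prices arrive as NumPy arrays of daily closes, the function name `forward_alpha` and the toy prices are hypothetical, and the 10-day horizon is taken from the text.

```python
import numpy as np

def forward_alpha(stock_adjclose, index_close, horizon=10):
    """Forward return of a stock minus the benchmark's forward return.

    The result has the same length as the input. The last `horizon`
    entries are NaN because their future prices are not yet known.
    """
    stock = np.asarray(stock_adjclose, dtype=float)
    index = np.asarray(index_close, dtype=float)
    alpha = np.full(stock.shape, np.nan)
    # Return from day t to day t + horizon, for every t with a future price.
    stock_ret = stock[horizon:] / stock[:-horizon] - 1.0
    index_ret = index[horizon:] / index[:-horizon] - 1.0
    alpha[:-horizon] = stock_ret - index_ret
    return alpha

# Toy data (hypothetical): the stock gains 1% a day, the index is flat.
stock = 10.0 * 1.01 ** np.arange(15)
index = np.full(15, 5000.0)
alpha = forward_alpha(stock, index, horizon=10)
```

Because each label looks `horizon` days into the future, the most recent `horizon` days of any training window have no usable label and would normally be dropped before fitting.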

Figure: schematic diagram of the rolling training window
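The rolling scheme can also be written down as a small generator. This is a sketch, not the author's code: the name `rolling_windows` is hypothetical, days are represented by integer positions in a trading calendar, and three years is approximated as 732 trading days (about 244 per year), which is an assumption.

```python
def rolling_windows(n_days, train_days, rebalance_days):
    """Yield (train_start, train_end, test_day) index triples.

    Training uses days [train_start, train_end). The fitted model is
    then applied to the factor data of `test_day`, the rebalancing day.
    """
    test_day = train_days
    while test_day < n_days:
        yield test_day - train_days, test_day, test_day
        test_day += rebalance_days

# Assumption: 3 years is taken as 732 trading days; rebalance every 20 days.
windows = list(rolling_windows(n_days=800, train_days=732, rebalance_days=20))
for train_start, train_end, test_day in windows:
    print(f"train on [{train_start}, {train_end}), predict on day {test_day}")
```

Each step slides the three-year window forward by one rebalancing period, so consecutive training sets overlap heavily and only the newest 20 days change.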

Target variable: on each rebalancing date, stocks are divided into a strong group and a weak group according to their return over the current month (binary classification, labels 0 and 1).
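The article does not state where the line between strong and weak is drawn. The sketch below assumes a split at the cross-sectional median on each rebalancing date; that rule, the function name `strong_weak_labels`, and the sample returns are all assumptions for illustration.

```python
import numpy as np

def strong_weak_labels(period_returns):
    """Label 1 (strong) if a stock's return is above the cross-sectional
    median for that rebalancing date, else 0 (weak)."""
    r = np.asarray(period_returns, dtype=float)
    return (r > np.median(r)).astype(int)

# Hypothetical returns of six stocks over one period.
returns = [0.05, -0.02, 0.01, 0.08, -0.04, 0.03]
labels = strong_weak_labels(returns)
```

Other splits, such as labelling only the top and bottom fractions and discarding the middle, would fit the same description; the median is used here only because it keeps the two classes balanced.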

Strategy model: build a deep neural network with four hidden layers and a softmax classifier as the last layer. The whole network is trained with supervision, tuning model parameters such as the number of hidden layers and nodes per layer (256-128-32-16), the activation function (ReLU), and the loss function (cross-entropy), so that it extracts the main features from the factors.
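To make the architecture concrete, here is a forward pass of a 256-128-32-16 network with ReLU hidden layers, a two-class softmax output, and a cross-entropy loss, written in plain NumPy. It is not the author's Keras code, and it does no training. The input width of 36 factors, the He-style initialization, and the random sample data are assumptions. In Keras the same stack would typically be a `Sequential` model of `Dense` layers.

```python
import numpy as np

rng = np.random.default_rng(0)
# Input width (assumed), four hidden layers from the text, 2-class output.
LAYER_SIZES = [36, 256, 128, 32, 16, 2]

# He-style random initialization (an assumption; the article does not say).
weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
           for m, n in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]
biases = [np.zeros(n) for n in LAYER_SIZES[1:]]

def forward(x):
    """Return class probabilities for a batch of factor vectors."""
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ w + b, 0.0)            # ReLU hidden layers
    logits = h @ weights[-1] + biases[-1]
    logits -= logits.max(axis=1, keepdims=True)   # for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)   # softmax

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

x = rng.normal(size=(8, 36))              # 8 stocks, 36 factor values each
y = np.array([0, 1, 0, 1, 1, 0, 0, 1])    # hypothetical weak/strong labels
probs = forward(x)
loss = cross_entropy(probs, y)
```

Column 1 of `probs` is the predicted probability of the strong class, which is the score a ranking step would use.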

Strategy output: for each rebalancing period, predict the return performance of every stock in the pool, rank the stocks by the probability scores output by the deep neural network, and select the top 10% (TOP50) as the optimal portfolio.
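The ranking step is simple enough to show directly. In this sketch the function name `select_top`, the tickers, and the probabilities are hypothetical; with the CSI 500 pool, `top_n=50` would give the top 10% mentioned above.

```python
import numpy as np

def select_top(tickers, strong_prob, top_n=50):
    """Return the `top_n` tickers with the highest predicted probability
    of belonging to the strong group, best first."""
    order = np.argsort(strong_prob)[::-1][:top_n]
    return [tickers[i] for i in order]

# Hypothetical scores for five stocks.
tickers = ["A", "B", "C", "D", "E"]
strong_prob = np.array([0.41, 0.77, 0.52, 0.90, 0.13])
picked = select_top(tickers, strong_prob, top_n=2)
```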

5. Model performance

Using the ranking of the model's predicted probability scores, we first select the TOP50 stocks, build an equal-weight portfolio from them, and compare it with the CSI 500 index.

The green line shows the cumulative return (net value) of the TOP50 portfolio selected by the model, and the gray line shows the CSI 500 index (the broad market). The portfolio selected by the multi-factor model far outperformed the broad market index. The end of the historical data used here happens to coincide with the 2015 stock market crash, and even when facing that level of risk the selected portfolio performed better than the broad market index.
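The two net value curves being compared can be computed as below. This is a minimal sketch with made-up numbers: each row holds the per-stock returns over one rebalancing period, and equal weighting means the portfolio return for the period is the plain average of that row. The helper name `net_value` is hypothetical.

```python
import numpy as np

def net_value(period_returns):
    """Cumulative net value of 1 unit invested, given per-period returns."""
    return np.cumprod(1.0 + np.asarray(period_returns, dtype=float))

# Rows are rebalancing periods, columns are the stocks held (hypothetical).
holdings_returns = np.array([[0.04, 0.02, 0.00],
                             [-0.01, 0.03, 0.01],
                             [0.02, -0.02, 0.06]])
benchmark_returns = np.array([0.01, 0.00, 0.01])

portfolio_returns = holdings_returns.mean(axis=1)   # equal weights
portfolio_nav = net_value(portfolio_returns)
benchmark_nav = net_value(benchmark_returns)
```

Trading costs are ignored here, matching the near-frictionless setting of the backtest.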

Below is the backtest performance of the equal-weight portfolio built from the 50 selected stocks. The annualized alpha (return in excess of the market) reaches 25%, and the Sharpe ratio is 1.22. Of course, the backtest assumes a rather idealized market with almost no friction costs, so good performance is to be expected.

Performance statistics     Backtest
annual_return              0.37
annual_volatility          0.29
sharpe_ratio               1.22
calmar_ratio               0.75
stability_of_timeseries    0.83
max_drawdown               -0.49
omega_ratio                1.24
sortino_ratio              1.67
skew                       -0.84
kurtosis                   3.75
tail_ratio                 0.90
beta                       1.01
information_ratio          0.17
alpha                      0.25
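For readers unfamiliar with these statistics, a few of them can be computed from a series of daily returns as sketched below. The article does not say which tool produced the table (the metric names resemble those reported by libraries such as pyfolio), so the formulas here are common textbook definitions rather than the author's exact ones. The 252-day annualization, the zero risk-free rate, and the sample returns are assumptions.

```python
import numpy as np

TRADING_DAYS = 252  # assumed number of trading days per year

def performance(daily_returns):
    """Compute a few headline statistics from simple daily returns."""
    r = np.asarray(daily_returns, dtype=float)
    # Net value curve, starting from 1 before the first return.
    nav = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    annual_return = nav[-1] ** (TRADING_DAYS / len(r)) - 1.0
    daily_vol = r.std(ddof=1)
    running_peak = np.maximum.accumulate(nav)
    max_drawdown = (nav / running_peak - 1.0).min()
    return {
        "annual_return": annual_return,
        "annual_volatility": daily_vol * np.sqrt(TRADING_DAYS),
        # Risk-free rate taken as zero in this sketch.
        "sharpe_ratio": r.mean() / daily_vol * np.sqrt(TRADING_DAYS),
        "max_drawdown": max_drawdown,
        "calmar_ratio": annual_return / abs(max_drawdown),
    }

# Hypothetical daily returns.
stats = performance([0.01, -0.02, 0.03, -0.01, 0.02])
```

As a consistency check on the table itself, the Calmar ratio is the annual return divided by the absolute maximum drawdown, and 0.37 / 0.49 is about 0.755, in line with the reported 0.75.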

6. Brief introduction to portfolio construction strategies

In general, when a fund company builds a quantitative index or wealth management product, selecting the best stocks is not the end of the job. It must also assign weights to the stocks, add bonds, currencies, and other instruments to hedge risk, and keep the portfolio's VaR at a reasonable level. These topics are not covered here.

Another popular approach is Smart Beta, which some mainstream quantitative fund indexes use. In essence, it no longer tracks an index closely but optimizes stock selection and weighting during index construction. It is not simple passive index investing, but a Beta investment strategy with ideas and character of its own. According to China Securities Index Co., Smart Beta has earned excess returns over traditional market-cap-weighted indexes. It combines the advantages of active and passive investing, breaks through the limits of market-cap weighting, offers investors more flexible and diversified portfolio strategies, and manages portfolio risk better. As a result, more and more professional investors are turning to Smart Beta strategies based on factors such as stock price volatility, dividend-paying capacity, or company performance.

7. Conclusion

The essence of the multi-factor stock selection model is simply to use machine learning algorithms in place of brokerage industry researchers, who study stock indicators, company performance metrics, industry analysis reports, and other data to make stock selection and other decisions. Which classifier is used is secondary. The key is to find factors with strong predictive power for stock prices and to keep expanding the factor library, just as researchers tirelessly watch the market every day, read industry reports, and abstract factors in their heads to support their stock selection decisions. Naturally, whoever holds a stock "factor" that others cannot grasp can more easily find the market's "alpha". If we had a data processor with enough computing power to turn every signal and indicator in the stock market into factors in real time and derive a stock selection strategy, could brokerage industry researchers be replaced? And once the market is transparent enough that all market information is fully interpreted, will there still be alpha to earn?

In fact, the main purpose of this article is to help readers understand the model strategies behind some of the common quantitative indexes, wealth management products, and stock selection software currently on the market. It is not to prove how powerful neural networks are at stock selection. If they were that powerful, the quantitative funds on the market would not perform so disappointingly, with some failing even to beat the broad market. Real stock trading is far more complicated, especially with the frequent "black swan" events of recent years. As the saying goes: "The stock market is risky, and you need to be cautious when entering the market."


Origin blog.csdn.net/qq_42933419/article/details/105098993