Quantitative Research | How Strategy Results Differ Between the Index and the Back-Adjusted Main Continuous Contract


About the Author

Lu Yangyang 

A working quantitative strategy researcher at a large asset management company. Familiar with data cleaning; skilled at using macro factors, industry factors, and similar inputs to model their impact on futures prices and to run correlation analysis; understands the underlying logic of methods such as multiple regression, SVM, XGBoost, and financial time series, and can wrap some of these algorithms in custom functions. Proficient with a range of machine learning and data analysis packages, including but not limited to Alphalens, pandas, web crawlers, sklearn, and statsmodels.

Introduction

As we all know, when it comes to programmatic futures trading, most individuals and small-to-medium institutions rely on third-party commercial software. The most widely used platform in China is TB (TradeBlazer). Setting aside questions such as software stability and customization, from the standpoint of strategy backtesting the issue that confuses and worries people most is the discrepancy between the futures index contract and the back-adjusted main continuous contract. Today I will share a differential analysis of the two, staying true to a practitioner's perspective, with careful calculation and detailed description.

Logical steps

Without further ado, let's go straight to the logic of the analysis.

Step 1: Characterize the correlation and cointegration between the index and the back-adjusted main continuous contract.

Purpose: to use basic statistics (I don't know the advanced ones, haha), quantitative measures, and qualitative observation to judge whether the two time series X and Y converge, or at least move closely together. There will of course be errors, but as long as the two series converge overall that is acceptable; if they do not converge, the index contract cannot be used. Every method of calculation and evaluation carries error, just as a regression has its error term epsilon, so the question is whether the error is acceptable.

Step 2: Run the same strategy (identical code, identical parameters) on K-line data of the same bar period and the same length, and backtest to see how performance differs between the index and the main continuous contract.

Purpose: backtesting an actual strategy on the two data sets X and Y gives the end result we actually care about.

Step 3: Run cointegration and correlation analysis on the equity curves obtained from the backtests.

Purpose: this step exists mainly because of my own data limitations. I cannot obtain the back-adjusted main continuous data to run a cointegration analysis locally, so cointegration and correlation analysis can only be done on the backtested equity curves of the two series X and Y. In other words, the cointegration analysis from Step 1 is moved into Step 3. The working assumption is that, because X and Y pass through the same mathematical transformation, their statistical relationship carries over. After all, a strategy is really a mapping function: it transforms the market data X into the performance curve Y through the strategy f( ).
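To make the idea of "the strategy as a mapping f( )" concrete, here is a minimal Python sketch. It is not the author's strategy: the moving-average rule, the parameter values, and both price series are hypothetical, chosen only to show the same function with the same parameters being applied to two different data sets.

```python
from typing import List


def sma(values: List[float], n: int, end: int) -> float:
    """Simple moving average of the n values ending at index `end` (inclusive)."""
    window = values[end - n + 1 : end + 1]
    return sum(window) / n


def backtest_long_only(prices: List[float], fast: int = 3, slow: int = 5) -> List[float]:
    """Map a price series X to a cumulative P&L curve Y = f(X).

    Hypothetical rule (an assumption for illustration): hold one unit long
    over the next bar whenever the fast average is above the slow average
    on the current bar. Costs and slippage are ignored.
    """
    equity = [0.0]
    for t in range(1, len(prices)):
        prev = t - 1
        in_market = prev >= slow - 1 and sma(prices, fast, prev) > sma(prices, slow, prev)
        pnl = prices[t] - prices[prev] if in_market else 0.0
        equity.append(equity[-1] + pnl)
    return equity


# Made-up closes standing in for the index and the main continuous contract.
index_close = [100, 101, 102, 103, 104, 105, 106, 105, 104, 103]
main_close = [100, 101, 103, 102, 104, 106, 105, 105, 103, 102]

# Same code, same parameters, two data sets: this is Step 2 in miniature.
equity_index = backtest_long_only(index_close)
equity_main = backtest_long_only(main_close)
print(equity_index[-1], equity_main[-1])
```

The two equity curves are what Step 3 would then compare.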

Data preparation

Because varieties differ and liquidity varies from sector to sector, we will analyze them one by one according to the following list.

Ferrous: rebar, coke, hot-rolled coil, iron ore, manganese, thermal coal

Non-ferrous: copper, aluminum, nickel

Chemical: PTA, PP, MA, RU, SC, EG, EB, LPG

Agricultural products: apples, eggs, rapeseed oil, palm oil, soybean meal, corn, sugar, cotton

These are essentially all the varieties I actually trade. Some have only 1-2 strategies and others have 6-8, so I am exploring these questions with the attitude of being accountable to myself.

Since this is the first article, there are probably more scientific and rigorous methods that I have not thought of, so I welcome your views and suggestions before we formally begin.

Changes in the correlation coefficient between the index and the back-adjusted main continuous contract

We use the Pearson correlation coefficient; its formula is shown in the following figure:

I won't go deep into the derivation and properties of the Pearson coefficient, just the basics; this is not a math class, after all. The Pearson correlation coefficient ranges from -1 to 1. A value of 1 means that the relationship between X and Y is described exactly by a straight line: all data points fall on the line, and Y increases as X increases. A value of -1 means the same, except that Y decreases as X increases.

Clearly, when X and Y fall on the same side of their respective means, the product (X - mean of X)(Y - mean of Y) is positive. In other words, when the two variables tend to be above or below their respective means at the same time, the correlation coefficient is positive; when they tend to fall on opposite sides, it is negative. I will not go over the rest of the geometric interpretation here; anyone interested can look it up on Baidu.
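For readers who want to reproduce the calculation outside the trading platform, here is a minimal Python sketch of the standard Pearson formula, r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)), applied over a rolling window. The function names and the sample prices are hypothetical, and a window in which either series is completely flat is not handled here (the denominator would be zero).

```python
import math
from typing import List, Optional


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Each product is positive when both values sit on the same side of their means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rolling_pearson(x: List[float], y: List[float], length: int = 10) -> List[Optional[float]]:
    """Correlation over the last `length` bars; None until enough bars exist."""
    out: List[Optional[float]] = []
    for t in range(len(x)):
        if t + 1 < length:
            out.append(None)
        else:
            out.append(pearson(x[t - length + 1 : t + 1], y[t - length + 1 : t + 1]))
    return out


# Made-up 15-minute closes for the index and the main continuous contract.
index_close = [3500, 3505, 3498, 3510, 3520, 3515, 3530, 3525, 3540, 3538, 3550, 3545]
main_close = [3480, 3486, 3477, 3492, 3500, 3497, 3511, 3504, 3522, 3519, 3528, 3526]

corr = rolling_pearson(index_close, main_close, length=10)
print(corr[-3:])
```

A window of 10 bars is used only because it matches the default parameter in the platform code later in the article.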

Figure 1: 2016.1.1-2020.8.3, 15-minute K-line chart

The figure above visualizes the correlation coefficient between the index (top) and the back-adjusted main continuous contract (bottom). The two horizontal lines mark 0.8 and 0.9. Over the 4 years and 7 months from 2016 to today, most of the correlation coefficients are above 0.9. Let's look at the periods where the correlation coefficient wobbles, working qualitatively from the most recent back.

Figure 2: 2019.11-2020.8.3, 15-minute K-line chart

From the correlation plot and the two K-line charts covering the past half year and more, we can see that the large, abnormal swings occur before the main contract is replaced and around the replacement date. The logic is simple: as the near month approaches, changes in premium/discount and open interest leave two contract months with nearly equal open-interest weight. The larger the premium or discount, the larger the gap, because the index in effect takes the average of the premium or discount. At such times the volatility of the index differs from that of the main contract actually traded, and the correlation coefficient confirms this. What we need to ask is which hurts us more: the size of the drop in correlation or how long it lasts. Both hurt, of course, but we cannot expect reality to match the ideal. The plot clearly shows a total of 15 dips below 0.8 over this period.
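A small numeric sketch may help here. It assumes the index is an open-interest-weighted average of the contract months, which is consistent with the description above but is not spelled out in the article; the prices and open-interest figures are made up.

```python
from typing import List


def oi_weighted_index(prices: List[float], open_interest: List[float]) -> float:
    """Open-interest-weighted average price across contract months (assumed index rule)."""
    total = sum(open_interest)
    return sum(p * oi for p, oi in zip(prices, open_interest)) / total


near_price, far_price = 3600.0, 3500.0  # hypothetical: far month at a 100-point discount

# Well before the roll: the near month carries most of the open interest.
early = oi_weighted_index([near_price, far_price], [90, 10])

# Close to the roll: open interest is split about evenly between the two months.
late = oi_weighted_index([near_price, far_price], [50, 50])

# Gap between the traded near-month contract and the index in each case.
print(near_price - early, near_price - late)
```

With the same 100-point discount, the gap between the traded contract and the index grows as the open-interest weights even out, and a larger discount would widen it further.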

Figure 3: Marker plot of correlation coefficient < 0.8

That is 15 times in 10 months, an average of 1.5 times a month. Let's look at how long each one lasts, as shown below:

Figure 4: 2019.11-2019.12 correlation coefficient graph

The two largest occurred on 2019.11.15 and 2019.12.2, each lasting 8 K-lines (measured from the drop below 0.9 to the recovery above 0.9). At 15 minutes per K-line, 8 K-lines is 2 hours in total. These two were also the longest-lasting episodes of the past half year and more.
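The counting described above can be sketched in Python as follows. The duration rule (from the first bar below 0.9 to the bar where the coefficient is back above 0.9, keeping only episodes that touch 0.8 or lower) follows the text; the function name and the sample series are hypothetical.

```python
from typing import List, Optional, Tuple


def low_correlation_episodes(
    corr: List[Optional[float]], recover: float = 0.9, breach: float = 0.8
) -> List[Tuple[int, int, float]]:
    """Return (start_index, length_in_bars, minimum) for each run of bars
    below `recover` whose minimum also falls below `breach`."""
    episodes = []
    start = None
    for i, c in enumerate(corr):
        below = c is not None and c < recover
        if below and start is None:
            start = i  # the coefficient has just dropped below 0.9
        elif not below and start is not None:
            episodes.append((start, i - start, min(corr[start:i])))
            start = None
    if start is not None:  # series ends while still below the recovery level
        episodes.append((start, len(corr) - start, min(corr[start:])))
    return [e for e in episodes if e[2] < breach]


sample = [0.95, 0.88, 0.75, 0.60, 0.85, 0.93, 0.97, 0.89, 0.91]
episodes = low_correlation_episodes(sample)
for start, bars, low in episodes:
    print(start, bars, bars * 15, low)  # bars * 15 = duration in minutes on 15-minute K-lines
```

In the sample, the brief slip to 0.89 is ignored because it never reaches 0.8, while the four-bar run that bottoms at 0.60 is counted as one episode.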

Here is the actual code:

Params
       // add parameters here
       Numeric length(10);     // look-back window, in bars
Vars
       // add variables here
       Numeric corr;
Events
       OnBar(ArrayRef<Integer> indexs)
       {
              // Pearson correlation between the closes of the two loaded data sources
              corr = CoefficientR(Data0.Close, Data1.Close, length);
              PlotNumeric("corr", corr);
              PlotNumeric("阈值2", 0.9, red);     // 0.9 threshold line
              PlotNumeric("阈值", 0.8, White);    // 0.8 threshold line
       }

The "CoefficientR" function computes the Pearson correlation coefficient described at the start, and it can be called directly in TBQ. The function's source code is broken down below.

Figure 5: Source code of Pearson correlation coefficient function

Reading the source code confirms that the correlation coefficient function is implemented correctly. Do not use the "Correlation" function that ships with TBQ: it is wrong, or at least does not follow the Pearson correlation algorithm used here.

Returning to Figure 1, the largest deviation in history occurred on 2016.3.7, when rebar hit its daily limit-up and the market kept rising the next day. To the naked eye, however, the index and the back-adjusted main continuous contract do not differ much. The calculation also produced a value of -2. This is not a calculation error: -2 is a default value set in the source code, as shown in the following figure:

Figure 6: Function content initialization
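The platform's source is only shown as an image, so the following Python sketch is a guess at the pattern rather than a transcription. It assumes the default of -2 is returned when the coefficient cannot be computed, for example when one series is flat across the whole window (as can happen at a price limit) and the denominator is zero. The names are hypothetical.

```python
import math
from typing import List

SENTINEL = -2.0  # outside the valid range [-1, 1], so it cannot be mistaken for a real value


def pearson_with_default(x: List[float], y: List[float], default: float = SENTINEL) -> float:
    """Pearson correlation that falls back to `default` when it is undefined."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumed trigger: a flat window leaves nothing to correlate.
        return default
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / math.sqrt(var_x * var_y)


def drop_sentinels(values: List[float], sentinel: float = SENTINEL) -> List[float]:
    """Remove placeholder values before counting dips or averaging."""
    return [v for v in values if v != sentinel]


flat_window = [2100.0, 2100.0, 2100.0, 2100.0]  # hypothetical closes pinned at a limit price
other_window = [2080.0, 2085.0, 2090.0, 2095.0]
print(pearson_with_default(flat_window, other_window))
```

Whatever the actual trigger is, the practical point is the same: readings of -2 are placeholders and should be excluded before counting dips in the correlation.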

The remaining periods of relatively large correlation swings are around 2018.11 and 2019.3, essentially just before the 01→05 and 05→10 contract rollovers. Measured the same way as before, the longest lasted 13 K-lines, a little over 3 hours.

Over the whole 4 years and 7 months there were 7 correlation swings of varying size, and not every main contract rollover reduced the correlation.

Backtest difference analysis with the same strategy

Now for the second step: backtest the same strategy on the index and on the back-adjusted main continuous contract, and see how the specific performance metrics and equity curves differ.

  Figure 7: TBQ Index Backtest Performance Chart

The picture above is the backtest performance chart of strategy A (long positions only) on the index for 2016.1.1-2020.8.3, with a commission of 1.5 per 10,000 and slippage of 1 tick. The overall performance can be seen in the figure. Now let's look at the performance chart for the back-adjusted main continuous contract.

Figure 8: TBQ back-adjusted main continuous contract backtest performance chart

We can see at a glance that the gap between the index and the back-adjusted main continuous contract is quite large; at the very least, the profit in the second half of the curve is clearly different. But don't dismiss the index outright. The purpose of this analysis is not just to eyeball the curves, but to see how large the gap is across all the performance indicators.

Figure 9: TBQ Index Backtest Performance

Figure 10: TBQ back-adjusted main continuous contract backtest performance

The range of differences in key performance indicators is shown in the table below:

Metric                   Index      Main continuous (back-adjusted)   Difference
Net profit               102157     57833                             -43%
Maximum drawdown         17729      21137                             19%
Sharpe ratio             1.08       0.7                               -35%
Win rate (%)             44.28      41.44                             -6.4%
Average profit           508        260                               -48%
Number of trades         201        222                               10%

As the indicators show, the return-type metrics are cut by almost half and the risk-type metrics rise by about 20%. In short, performance weakens across the board.
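The difference column can be recomputed with a few lines of Python. This sketch assumes the percentage is taken relative to the index figure, i.e. (main - index) / |index|, which matches the net profit, drawdown, and Sharpe rows; the figures are copied from the long-only table.

```python
def pct_diff(index_value: float, main_value: float) -> float:
    """Percentage change from the index result to the main continuous result."""
    return (main_value - index_value) / abs(index_value) * 100.0


# (index, main continuous) pairs from the long-only backtest.
long_only = {
    "net profit": (102157, 57833),
    "maximum drawdown": (17729, 21137),
    "sharpe": (1.08, 0.7),
    "win rate": (44.28, 41.44),
    "average profit": (508, 260),
    "number of trades": (201, 222),
}

for name, (index_value, main_value) in long_only.items():
    print(f"{name}: {pct_diff(index_value, main_value):+.1f}%")
```

Dividing by the absolute value of the index figure keeps the sign meaningful when a metric flips from positive to negative.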

What can we learn from this, and why go to the trouble? The goal is to check whether backtesting on the index, and using the index as a stand-in for the real market, is reliable or misleading.

As a first take, the data show that the real profit is only about half of what the index backtest reports, while the index understates risk by about 20%. Smoothness also suffers, with the Sharpe ratio roughly 35% lower, and the same impressions come through in the charts.

Next, to check that the result is objective, we repeat all of the steps above for the short-only data. This time we go straight to the results, without the explanatory charts.

Metric                   Index      Main continuous (back-adjusted)   Difference
Net profit               59526      -29171                            -149%
Maximum drawdown         23765      41385                             74%
Sharpe ratio             0.59       -0.33                             -155%
Win rate (%)             38         30                                -21%
Average profit           607        -277                              -145%
Number of trades         98         105                               7%

The short-side table shows that the strategy is profitable on the index contract but ends up losing money on the back-adjusted main continuous contract.

Thoughts

1. Do all strategies show this result on the index? A: No, it depends on the strategy. Not every short strategy differs by more than a factor of two between the index and the main continuous contract, but this shows that it can happen. In other words, a strategy like this one should not be adopted even though it looks profitable on the index, because on the actual back-adjusted main continuous contract it loses money.

2. Does that mean we can skip the index and simply backtest on the back-adjusted main continuous contract? Why go to all this effort? A: That is indeed an option. Even without this analysis, actual traded data should come first, since it is closest to the real market and to a realistic backtest.

3. Do index contracts still have value? A: My preliminary personal assessment (not a final conclusion) is yes. The index does filter out some of the signal noise in the back-adjusted main continuous contract: in both the long and short tables above, the index contract produces fewer trades than the main continuous contract. We can consider computing signals on the index and computing the performance between entry and exit on the main continuous contract. That raises another problem: if the index and the main continuous contract move in completely opposite directions, the index cannot be used. Apples and eggs, two odd varieties that we will discuss later, are examples.
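The idea in point 3, computing signals on the index and measuring performance on the main continuous contract, can be sketched in Python as below. The moving-average rule and the prices are hypothetical stand-ins, not the author's strategy, and costs are ignored.

```python
from typing import List


def signals_from(prices: List[float], fast: int = 3, slow: int = 5) -> List[int]:
    """1 = hold long over the next bar, 0 = stay flat (hypothetical rule)."""
    signals = []
    for t in range(len(prices)):
        if t < slow - 1:
            signals.append(0)  # not enough history yet
            continue
        fast_avg = sum(prices[t - fast + 1 : t + 1]) / fast
        slow_avg = sum(prices[t - slow + 1 : t + 1]) / slow
        signals.append(1 if fast_avg > slow_avg else 0)
    return signals


def pnl_on(signals: List[int], traded_prices: List[float]) -> List[float]:
    """Cumulative P&L when the signal from bar t-1 is held over bar t of the traded series."""
    equity = [0.0]
    for t in range(1, len(traded_prices)):
        equity.append(equity[-1] + signals[t - 1] * (traded_prices[t] - traded_prices[t - 1]))
    return equity


# Made-up closes: signals come from the index, fills from the main continuous contract.
index_close = [100, 101, 102, 103, 104, 105, 106, 105, 104, 103]
main_close = [100, 101, 103, 102, 104, 106, 105, 105, 103, 102]

index_signals = signals_from(index_close)
equity_hybrid = pnl_on(index_signals, main_close)    # index signal, main-contract P&L
equity_index_only = pnl_on(index_signals, index_close)
print(equity_hybrid[-1], equity_index_only[-1])
```

Comparing the two curves shows how much of the index backtest's result survives once the trades are priced on the contract that is actually traded.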

This is the first article in an in-depth series on the difference between the index and the back-adjusted main continuous contract. For reasons of space, I will take up the questions above in later articles, including whether to use the main continuous contract, the index, or the index for signals with the main continuous contract for performance. Following the same analytical logic, I will also publish the results for all varieties; some will be as expected and some will be surprising.

Finally, a question for everyone: if a strategy backtests very well on the back-adjusted main continuous contract, does it also do well on the index contract? Leave a comment to join the discussion. If you found this article helpful, please share it or give it a like at the end of the article. Writing takes effort.

This strategy is for learning and discussion only; investors are personally responsible for any profit or loss in live trading.


Origin blog.csdn.net/m0_56236921/article/details/123788188