Practical guide | How to convert high-frequency signals into discrete buy and sell signals in a time series database

In high-frequency trading, we usually first generate signals from tick-level quote and trade data, then convert these signals into discrete buy and sell signals such as 1 (buy), 0 (no change), and -1 (sell), and finally generate orders based on available funds, existing positions, and other optimization rules and send them to the trading system. This article discusses the second step: converting a floating-point array, signal, into an integer array, direction, whose values are 1, 0, or -1.

If the conversion rule is simple, for example +1 (buy) when the signal exceeds a threshold t1, -1 (sell) when it falls below a threshold t2, and 0 otherwise, the implementation is equally simple. In DolphinDB, the following expression is enough:

iif(signal > t1, 1, iif(signal <t2, -1, 0))
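For readers more familiar with Python, the same memoryless rule can be sketched with NumPy's `np.where`. The function name `simple_direction` and the sample values are assumptions chosen for illustration and do not come from the article.

```python
import numpy as np

def simple_direction(signal, t1, t2):
    """Map each value to +1 (above t1), -1 (below t2), or 0 otherwise."""
    signal = np.asarray(signal, dtype=float)
    # Nested np.where mirrors the nested iif: each element is judged on its own.
    return np.where(signal > t1, 1, np.where(signal < t2, -1, 0))

# Illustrative input: values are -1, 0, 1, 0, 0
print(simple_direction([10, 20, 70, 59, 42], t1=60, t2=20).tolist())
```

Because every element is judged independently, this rule can flip direction on any tick that crosses a threshold.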

In practice, this simple rule is usually avoided, because a robust system should not flip between buying and selling too often. A common alternative works as follows. When the signal exceeds a threshold t1, the output becomes a buy signal (+1) and stays +1 until the signal decays below t10. Likewise, when the signal falls below a threshold t2, the output becomes a sell signal (-1) and stays -1 until the signal rises above t20. In all other cases the output is 0. The thresholds t1, t10, t2, and t20 satisfy the following rule:

t1 > t10 > t20 > t2
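Read literally, the rule is a small state machine with one on/off flag per side. The plain-Python loop below is a reference sketch of that reading, not code from the article: the function name `hysteresis_direction` is hypothetical, and starting with both flags off is an assumption. The sample input is the one used in the article's own test.

```python
def hysteresis_direction(signal, t1, t10, t20, t2):
    """Row-by-row reference for the rule above; assumes t1 > t10 > t20 > t2."""
    buy = sell = 0  # assumption: both sides start inactive
    out = []
    for s in signal:
        # Buy side: switch on above t1, switch off below t10, otherwise hold.
        if s > t1:
            buy = 1
        elif s < t10:
            buy = 0
        # Sell side: switch on below t2, switch off above t20, otherwise hold.
        if s < t2:
            sell = 1
        elif s > t20:
            sell = 0
        out.append(buy - sell)
    return out

print(hysteresis_direction([10, 20, 70, 59, 42, 49, 19, 25, 26, 35], 60, 50, 30, 20))
# [-1, -1, 1, 1, 0, 0, -1, -1, -1, 0]
```

A loop like this is easy to verify by hand, which makes it a convenient baseline for checking any vectorized version.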

When the system follows these rules, the output depends not only on the current signal value but also on the state of the previous signal, which is a typical path-dependence problem. Path-dependent problems are usually considered unsuitable for vectorization, or at least to demand considerable skill. The languages used to backtest high-frequency data are usually scripting languages such as DolphinDB and kdb+. They are very efficient on vectorized problems, but processing a path-dependent problem row by row incurs high interpretation overhead and runs slowly. This article introduces a technique that resolves this conflict.

Let's find the buy signal first. In a vector it is easy to find the points greater than t1 (the trigger points of a buy signal), and just as easy to find the points that cannot be a buy signal (less than t10). This divides the points into three states: buy trigger points (+1), points that cannot be a buy signal (0), and points whose state is unknown (NULL). By the rules above, if the nearest known point before an unknown point is a buy trigger, the unknown point is also a buy signal; if the nearest known point before it is a non-buy point (0), the unknown point is also a non-buy point. A forward fill therefore does the job. The same method finds the sell signal (with a sell coded as +1 and everything else as 0). Subtracting the two gives the final signal. Any NULL values that remain are replaced with 0. The complete DolphinDB code is as follows:

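// 1h marks a buy trigger, 0h rules a buy out, 00h (NULL) is unknown until forward-filled
buy = iif(signal > t1, 1h, iif(signal < t10, 0h, 00h)).ffill()
// same idea for the sell side, with 1h marking a sell trigger
sell = iif(signal < t2, 1h, iif(signal > t20, 0h, 00h)).ffill()
// +1 buy, -1 sell; any remaining NULL becomes 0
direction = (buy - sell).nullFill(0h)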
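To see what the forward fill does, the three steps can be traced in pandas on a small input. This is only an illustrative sketch: the variable names are assumptions, the thresholds and input are the ones from the article's test, and `np.nan` stands in for the NULL state.

```python
import numpy as np
import pandas as pd

t1, t10, t20, t2 = 60, 50, 30, 20
signal = pd.Series([10, 20, 70, 59, 42, 49, 19, 25, 26, 35], dtype=float)

# Step 1: mark the known states; NaN means "unknown, inherit from the left".
buy_marks = pd.Series(np.where(signal > t1, 1.0, np.where(signal < t10, 0.0, np.nan)))
sell_marks = pd.Series(np.where(signal < t2, 1.0, np.where(signal > t20, 0.0, np.nan)))

# Step 2: forward fill, so each unknown point copies the nearest known state before it.
buy = buy_marks.ffill()
sell = sell_marks.ffill()

# Step 3: combine the two sides; anything still unknown becomes 0.
direction = (buy - sell).fillna(0).astype(int)

print(buy_marks.tolist())   # the value 59 (index 3) is NaN: between t10 and t1
print(buy.tolist())         # index 3 inherits the buy state set by 70
print(direction.tolist())
```

The only path-dependent step is the forward fill, which is itself a vectorized primitive, so no explicit loop is needed.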

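The above code can be combined into a single expression that forward-fills the difference instead of each side. The result is the same unless the signal jumps in one step from a sell state into [t10, t1] or from a buy state into [t2, t20]; in those cases the single expression keeps the previous direction, whereas the three-line version yields 0: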

direction = (iif(signal >t1, 1h, iif(signal < t10, 0h, 00h)) - iif(signal <t2, 1h, iif(signal > t20, 0h, 00h))).ffill().nullFill(0h)
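How the three-line form and the single expression relate can be checked with a small pandas sketch. The helper names `marks`, `separate`, and `combined` and the two-element input are assumptions made for illustration. On the article's test input the two forms agree; on an input that leaps from below t2 straight into the band between t10 and t1 they differ, because subtracting before filling turns that point into an unknown and the old direction is carried forward.

```python
import numpy as np
import pandas as pd

T1, T10, T20, T2 = 60, 50, 30, 20

def marks(signal):
    """Raw buy/sell marks: 1 = trigger, 0 = ruled out, NaN = unknown."""
    buy = pd.Series(np.where(signal > T1, 1.0, np.where(signal < T10, 0.0, np.nan)))
    sell = pd.Series(np.where(signal < T2, 1.0, np.where(signal > T20, 0.0, np.nan)))
    return buy, sell

def separate(signal):
    """Fill each side first, then subtract (the three-line form)."""
    buy, sell = marks(signal)
    return (buy.ffill() - sell.ffill()).fillna(0).astype(int).tolist()

def combined(signal):
    """Subtract first, then fill the difference (the single expression)."""
    buy, sell = marks(signal)
    return (buy - sell).ffill().fillna(0).astype(int).tolist()

test_input = pd.Series([10, 20, 70, 59, 42, 49, 19, 25, 26, 35], dtype=float)
jump_input = pd.Series([10, 55], dtype=float)  # sell zone, then straight into [t10, t1]

print(separate(test_input) == combined(test_input))  # True
print(separate(jump_input), combined(jump_input))    # [-1, 0] [-1, -1]
```

If such jumps can occur in the data, filling each side separately is the form that follows the stated rule.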

A simple test is as follows:

t1 = 60
t10 = 50
t20 = 30
t2 = 20
signal = 10 20 70 59 42 49 19 25 26 35
direction = (iif(signal > t1, 1h, iif(signal < t10, 0h, 00h)) - iif(signal < t2, 1h, iif(signal > t20, 0h, 00h))).ffill().nullFill(0h)

The result is:

[-1,-1,1,1,0,0,-1,-1,-1,0]

If you use kdb+ script instead, the expression is as follows:

direction: 0h^fills(-).(0N 1h)[(signal>t1;signal<t2)]^'(0N 0h)[(signal<t10;signal>t20)]

If implemented using pandas, the code is as follows:

import numpy as np
import pandas as pd

t1 = 60
t10 = 50
t20 = 30
t2 = 20
signal = pd.Series([10, 20, 70, 59, 42, 49, 19, 25, 26, 35])
# NaN marks an unknown state that the forward fill resolves
direction = (signal.apply(lambda s: 1 if s > t1 else (0 if s < t10 else np.nan)) -
             signal.apply(lambda s: 1 if s < t2 else (0 if s > t20 else np.nan))).ffill().fillna(0)

Below we generate a random signal array of 10 million values between 0 and 100 to test the performance of DolphinDB, kdb+, and pandas. The test machine is configured as follows:

CPU: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz 3.60 GHz

Memory: 16GB

OS: Windows 10

DolphinDB takes 330 ms, kdb+ takes 800 ms, and pandas takes about 6.8 s. The DolphinDB test script is as follows:

t1= 60
t10 = 50
t20 = 30
t2 = 20
signal = rand(100.0, 10000000)
timer direction = (iif(signal >t1, 1h, iif(signal < t10, 0h, 00h)) - iif(signal <t2, 1h, iif(signal > t20, 0h, 00h))).ffill().nullFill(0h)

The test script of kdb+ is as follows:

t1:60
t10:50
t20:30
t2:20
signal: 10000000 ? 100.0
\t  0h^fills(-).(0N 1h)[(signal>t1;signal<t2)]^'(0N 0h)[(signal<t10;signal>t20)]

The pandas test script is as follows:

import time

import numpy as np
import pandas as pd

t1 = 60
t10 = 50
t20 = 30
t2 = 20
signal = pd.Series(np.random.random(10000000) * 100)
start = time.time()
direction = (signal.apply(lambda s: 1 if s > t1 else (0 if s < t10 else np.nan)) -
             signal.apply(lambda s: 1 if s < t2 else (0 if s > t20 else np.nan))).ffill().fillna(0)
end = time.time()
print(end - start)  # elapsed seconds
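The pandas script calls a Python lambda once per element through `Series.apply`. A variant that builds the marks with `np.where` and forward-fills with an index trick keeps the work inside NumPy. The sketch below is an alternative that was not part of the article's benchmark, so no timing is claimed for it; the names `ffill_marks` and `direction_numpy` are hypothetical, and each side is filled separately as in the three-line DolphinDB form.

```python
import numpy as np

def ffill_marks(marks):
    """Forward-fill NaN entries of a 1-D float array with the last known value."""
    known = ~np.isnan(marks)
    # For each position, the index of the most recent known entry (0 if none yet).
    idx = np.maximum.accumulate(np.where(known, np.arange(marks.size), 0))
    # Leading unknowns point at index 0, which is itself NaN, so they stay NaN.
    return marks[idx]

def direction_numpy(signal, t1, t10, t20, t2):
    """Vectorized hysteresis rule; assumes t1 > t10 > t20 > t2."""
    signal = np.asarray(signal, dtype=float)
    buy = ffill_marks(np.where(signal > t1, 1.0, np.where(signal < t10, 0.0, np.nan)))
    sell = ffill_marks(np.where(signal < t2, 1.0, np.where(signal > t20, 0.0, np.nan)))
    diff = buy - sell
    # Anything still unknown after filling is treated as 0.
    return np.where(np.isnan(diff), 0.0, diff).astype(np.int8)

print(direction_numpy([10, 20, 70, 59, 42, 49, 19, 25, 26, 35], 60, 50, 30, 20).tolist())
# [-1, -1, 1, 1, 0, 0, -1, -1, -1, 0]
```

To time it on the same kind of input as the scripts above, one could pass `np.random.random(10000000) * 100` as the signal and wrap the call with `time.time()` in the same way.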

As the example shows, DolphinDB and kdb+ scripts have a great deal in common. A kdb+ script can basically be translated into DolphinDB statement by statement. One difference is that kdb+ evaluates expressions from right to left, while DolphinDB, like conventional programming languages, evaluates from left to right. Another is that kdb+ tends to use symbols to denote an operation, while DolphinDB prefers named functions, which is more readable but also a bit more verbose.
