What is the technical analysis indicator library

TA-Lib is a Python library that wraps many commonly used technical analysis indicators for financial trading, implemented in C. To make it easy to compute these indicators in DolphinDB, we implemented the indicator functions included in TA-Lib in DolphinDB script and packaged them in the DolphinDB ta module (ta.dos). The ta module requires DolphinDB Database Server 1.10.3 or above.

1. Naming and usage conventions for functions and parameters

  • TA-Lib writes all function names in uppercase and all parameter names in lowercase. The ta module instead uses camel case for both function names and parameter names.

For example, the syntax of the DEMA function in TA-Lib is DEMA(close, timeperiod=30). The corresponding function in the ta module is dema(close, timePeriod).

  • Some functions in TA-Lib have optional parameters. In the ta module, all parameters are mandatory.
  • To get meaningful results, the timePeriod parameter of functions in the ta module must be at least 2.

2. Usage examples

2.1 Direct use of indicator functions in scripts

Apply the wma function from the ta module directly to a vector:

use ta
close = 7.2 6.97 7.08 6.74 6.49 5.9 6.26 5.9 5.35 5.63
x = wma(close, 5);
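
For readers unfamiliar with the indicator, here is a minimal Python sketch of a linearly weighted moving average. It assumes the usual definition (weights 1..timePeriod, with the most recent value weighted highest) and is an illustration only, not the ta.dos implementation. The helper name wma and its loop-based structure are choices made for this sketch.

```python
import numpy as np

def wma(close, time_period):
    """Linearly weighted moving average; the oldest value in each window gets weight 1."""
    close = np.asarray(close, dtype=float)
    weights = np.arange(1, time_period + 1, dtype=float)
    out = np.full(close.size, np.nan)  # positions before the first full window stay empty
    for end in range(time_period, close.size + 1):
        window = close[end - time_period:end]
        out[end - 1] = np.dot(window, weights) / weights.sum()
    return out

close = [7.2, 6.97, 7.08, 6.74, 6.49, 5.9, 6.26, 5.9, 5.35, 5.63]
x = wma(close, 5)
```

The first four positions of x are empty because the window of 5 is not yet full there.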

2.2 Use in groups in SQL statements

Users often need to compute an indicator separately for each group in a table. In the following example, we construct a table containing 2 stocks:

close = 7.2 6.97 7.08 6.74 6.49 5.9 6.26 5.9 5.35 5.63 3.81 3.935 4.04 3.74 3.7 3.33 3.64 3.31 2.69 2.72
date = (2020.03.02 + 0..4 join 7..11).take(20)
symbol = take(`F,10) join take(`GPRO,10)
t = table(symbol, date, close)

Use the wma function in the ta module to compute the indicator for each stock:

update t set wma = wma(close, 5) context by symbol

2.3 Return results of multiple columns

Some functions, such as bBands, return results in multiple columns.

Examples of direct use:

close = 7.2 6.97 7.08 6.74 6.49 5.9 6.26 5.9 5.35 5.63
high, mid, low = bBands(close, 5, 2, 2, 2);

Examples used in SQL statements:

close = 7.2 6.97 7.08 6.74 6.49 5.9 6.26 5.9 5.35 5.63 3.81 3.935 4.04 3.74 3.7 3.33 3.64 3.31 2.69 2.72
date = (2020.03.02 + 0..4 join 7..11).take(20)
symbol = take(`F,10) join take(`GPRO,10)
t = table(symbol, date, close) 
select *, bBands(close, 5, 2, 2, 2) as `high`mid`low from t context by symbol

symbol date       close high     mid      low
------ ---------- ----- -------- -------- --------
F      2020.03.02 7.2
F      2020.03.03 6.97
F      2020.03.04 7.08
F      2020.03.05 6.74
F      2020.03.06 6.49  7.292691 6.786    6.279309
F      2020.03.09 5.9   7.294248 6.454    5.613752
F      2020.03.10 6.26  7.134406 6.328667 5.522927
F      2020.03.11 5.9   6.789441 6.130667 5.471892
F      2020.03.12 5.35  6.601667 5.828    5.054333
F      2020.03.13 5.63  6.319728 5.711333 5.102939
GPRO   2020.03.02 3.81
GPRO   2020.03.03 3.935
GPRO   2020.03.04 4.04
GPRO   2020.03.05 3.74
GPRO   2020.03.06 3.7   4.069365 3.817333 3.565302
GPRO   2020.03.09 3.33  4.133371 3.645667 3.157962
GPRO   2020.03.10 3.64  4.062941 3.609333 3.155726
GPRO   2020.03.11 3.31  3.854172 3.482667 3.111162
GPRO   2020.03.12 2.69  3.915172 3.198    2.480828
GPRO   2020.03.13 2.72  3.738386 2.993333 2.24828
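
The numbers in the table can be cross-checked with a short Python sketch. It rests on two assumptions that the text above does not state: that the last argument (2) selects a weighted moving average as the middle band, as in TA-Lib's MA type codes, and that the band width is a multiple of the population standard deviation of each window. Under those assumptions the sketch reproduces the first two populated rows for symbol F. The function name bbands_wma is hypothetical.

```python
import numpy as np

def bbands_wma(close, time_period, nb_dev_up, nb_dev_dn):
    """Bollinger bands with a WMA middle band (assumed) and population std bands."""
    close = np.asarray(close, dtype=float)
    weights = np.arange(1, time_period + 1, dtype=float)
    high = np.full(close.size, np.nan)
    mid = np.full(close.size, np.nan)
    low = np.full(close.size, np.nan)
    for end in range(time_period, close.size + 1):
        window = close[end - time_period:end]
        mid[end - 1] = np.dot(window, weights) / weights.sum()
        sd = window.std()  # population standard deviation (ddof=0)
        high[end - 1] = mid[end - 1] + nb_dev_up * sd
        low[end - 1] = mid[end - 1] - nb_dev_dn * sd
    return high, mid, low

close = [7.2, 6.97, 7.08, 6.74, 6.49, 5.9, 6.26, 5.9, 5.35, 5.63]
high, mid, low = bbands_wma(close, 5, 2, 2)
```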

3. Performance

When used directly, the functions in the ta module run at about the same average speed as the corresponding functions in TA-Lib. In grouped calculations, however, the ta module functions are far faster than their TA-Lib counterparts. The comparisons in this section use the wma function as an example.

3.1 Direct use performance comparison

In DolphinDB:

use ta
close = 7.2 6.97 7.08 6.74 6.49 5.9 6.26 5.9 5.35 5.63
close = take(close, 1000000)
timer x = wma(close, 5);

Applying the wma function in the ta module directly to a vector of length 1,000,000 takes 3 milliseconds.

The corresponding Python statement is as follows:

import time
import numpy as np
import talib

close = np.array([7.2,6.97,7.08,6.74,6.49,5.9,6.26,5.9,5.35,5.63,5.01,5.01,4.5,4.47,4.33])
close = np.tile(close,100000)

start_time = time.time()
x = talib.WMA(close, 5)
print("--- %s seconds ---" % (time.time() - start_time))

The WMA function in TA-Lib takes 11 milliseconds, 3.7 times as long as the wma function in the DolphinDB ta module.

3.2 Performance comparison of grouping

In DolphinDB, construct a table containing 1000 stocks with 1,000,000 rows in total:

n=1000000
close = rand(1.0, n)
date = take(2017.01.01 + 1..1000, n)
symbol = take(1..1000, n).sort!()
t = table(symbol, date, close)
timer update t set wma = wma(close, 5) context by symbol;

Computing wma for each stock with the ta module takes 17 milliseconds.

The corresponding Python statement is as follows:

import time
import numpy as np
import pandas as pd
import talib

close = np.random.uniform(size=1000000)
symbol = np.sort(np.tile(np.arange(1,1001),1000))
date = np.tile(pd.date_range('2017-01-02', '2019-09-28'),1000)
df = pd.DataFrame(data={'symbol': symbol, 'date': date, 'close': close})

start_time = time.time()
df["wma"] = df.groupby("symbol").apply(lambda df: talib.WMA(df.close, 5)).to_numpy()
print("--- %s seconds ---" % (time.time() - start_time))

Computing each stock with the WMA function in TA-Lib takes 535 milliseconds, 31.5 times as long as the wma function in the ta module.

4. Vectorization implementation

Like TA-Lib, all functions in the ta module are vector functions: the input is a vector and the output is a vector of the same length. TA-Lib is implemented in C underneath and is very efficient. Although the ta module is written in DolphinDB's scripting language, it makes full use of built-in vectorized functions and higher-order functions and avoids loops, so it is also very efficient. Of the 57 functions implemented so far, 28 run faster than their TA-Lib counterparts, the fastest at about 3 times the speed of TA-Lib; 29 are slower, the slowest at no less than 1/3 the speed of TA-Lib.

The function implementations in the ta module are also very concise. ta.dos has 765 lines in total, an average of about 14 lines per function. Excluding comments, blank lines, the first and last lines of each function definition, and the boilerplate that strips leading null values from the input, the core code of each function is about 4 lines. Users can browse the source of the ta module to learn how to write efficient vectorized programs in DolphinDB script.

4.1 Handling of null values

If the input vector begins with null values, TA-Lib starts the calculation from the first non-null position. The ta module uses the same strategy. For rolling or cumulative window functions, the result is null at each initial position where the window has not yet reached its full length. In this respect the results of TA-Lib and the ta module are consistent. If a null value appears after that, however, the result at that position and at all subsequent positions in a TA-Lib function may be null. In the ta module, by contrast, null values do not prevent a result from being produced unless the window contains too few non-null values to compute the indicator (for example, only one non-null value when calculating a variance).

// DolphinDB, using the ta module
use ta
close = [99.9, NULL, 84.69, 31.38, 60.9, 83.3, 97.26, 98.67]
ta::var(close, 5, 1)

[,,,,670.417819,467.420569,539.753584,644.748976]

# Python, using TA-Lib
import numpy as np
import talib
close = np.array([99.9, np.nan, 84.69, 31.38, 60.9, 83.3, 97.26, 98.67])
talib.VAR(close, 5, 1)

array([nan, nan, nan, nan, nan, nan, nan, nan])

In the population variance calculation above, the second value of close is null, so the outputs of the ta module and TA-Lib differ: the TA-Lib output is entirely null. If the null value is replaced with 81.11, the ta module and TA-Lib return the same result. If a null value is added before the first element 99.9, the two results still agree. In short, when only the first k elements of the input are null, the outputs of the ta module and TA-Lib are exactly the same.
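
The null-handling rule described here can be sketched in Python as follows. This is an illustration under stated assumptions, not the ta module code: the helper name rolling_pop_var is hypothetical, nulls are represented as NaN, each window simply skips its NaN entries, and a window with fewer than two valid values yields NaN. With the input above it reproduces the four non-null values of the DolphinDB output.

```python
import numpy as np

def rolling_pop_var(close, time_period):
    """Rolling population variance that skips NaN values inside each window."""
    close = np.asarray(close, dtype=float)
    n = close.size
    out = np.full(n, np.nan)
    not_nan = np.flatnonzero(~np.isnan(close))
    if not_nan.size == 0:
        return out
    first = not_nan[0]  # index of the first non-null value
    # The first full window starts at the first non-null position.
    for end in range(first + time_period, n + 1):
        window = close[end - time_period:end]
        valid = window[~np.isnan(window)]
        if valid.size >= 2:  # a single observation is not enough for a variance
            out[end - 1] = valid.var()  # population variance (ddof=0)
    return out

close = [99.9, np.nan, 84.69, 31.38, 60.9, 83.3, 97.26, 98.67]
result = rolling_pop_var(close, 5)
```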

4.2 Iterative processing

Many indicator calculations in technical analysis are iterative: the current indicator value depends on the previous indicator value and the current input, r[n] = coeff * r[n-1] + input[n]. For this type of calculation, DolphinDB provides the iterate function, which vectorizes the computation and avoids explicit loops.

def ema(close, timePeriod) {
1 	n = close.size()
2	b = ifirstNot(close)
3	start = b + timePeriod
4	if(b < 0 || start > n) return array(DOUBLE, n, n, NULL)
5	init = close.subarray(:start).avg()
6	coeff = 1 - 2.0/(timePeriod+1)
7	ret = iterate(init, coeff, close.subarray(start:)*(1 - coeff))
8	return array(DOUBLE, start - 1, n, NULL).append!(init).append!(ret)
}

Take the implementation of the ema function as an example. Line 5 computes the mean of the first window as the initial value of the iteration. Line 6 defines the iteration coefficient. Line 7 uses the iterate function to compute the ema sequence. The built-in iterate function is very efficient: for a vector of length 1,000,000 with a window length of 10, TA-Lib takes 7.4ms to compute the ema sequence, while the ta module takes only 5.0ms.
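
To make the recurrence concrete, here is a plain Python version with the loop written out, which is the part that iterate replaces in the DolphinDB code. It is a simplified sketch that assumes the input has no leading nulls; the structure (seed with the mean of the first window, then apply the recurrence) follows the ema listing above.

```python
import numpy as np

def ema(close, time_period):
    """Exponential moving average seeded with the mean of the first window."""
    close = np.asarray(close, dtype=float)
    n = close.size
    out = np.full(n, np.nan)
    if n < time_period:
        return out
    coeff = 1 - 2.0 / (time_period + 1)
    out[time_period - 1] = close[:time_period].mean()  # initial value of the iteration
    for i in range(time_period, n):
        # r[n] = coeff * r[n-1] + input[n], where input[n] = (1 - coeff) * close[n]
        out[i] = coeff * out[i - 1] + (1 - coeff) * close[i]
    return out

result = ema([1.0, 2.0, 3.0, 4.0, 5.0], 3)
```

With a window of 3 the coefficient is 0.5, so the seed is 2.0 and the following values are 0.5 * 2 + 0.5 * 4 = 3.0 and 0.5 * 3 + 0.5 * 5 = 4.0.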

4.3 Application of sliding window function

Most technical indicators specify a sliding window and compute the indicator value in each window. DolphinDB's built-in functions already cover a number of basic sliding-window calculations, including mcount, mavg, msum, mmax, mmin, mimax, mimin, mmed, mpercentile, mrank, mmad, mbeta, mcorr, mcovar, mstd, and mvar. These basic sliding-window functions have been fully optimized, and most of them run in O(n) time, independent of the window length. More complex sliding indicators can be built by combining or transforming these basic ones. For example, ta::var is the population variance, while DolphinDB's built-in mvar is the sample variance, so the result of mvar needs to be adjusted.

def var(close, timePeriod, nddev){
1	n = close.size()
2	b = close.ifirstNot()
3	if(b < 0 || b + timePeriod > n) return array(DOUBLE, n, n, NULL)
4	mobs =  mcount(close, timePeriod)
5	return (mvar(close, timePeriod) * (mobs - 1) \ mobs).fill!(timePeriod - 1 + 0:b, NULL)
}
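
The adjustment on line 5 is the standard conversion from sample variance to population variance: multiply by (obs - 1) / obs, where obs is the number of non-null observations in the window. A small NumPy check of that identity on a single window (the data values are taken from the example in section 4.1):

```python
import numpy as np

window = np.array([84.69, 31.38, 60.9, 83.3, 97.26])
obs = window.size

sample_var = window.var(ddof=1)                # divides by obs - 1, like mvar
population_var = sample_var * (obs - 1) / obs  # the adjustment used in ta::var

direct_population_var = window.var(ddof=0)     # divides by obs
```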

Below is a more complex example, the implementation of the linearreg_slope indicator. linearreg_slope computes the beta of close relative to the sequence 0 .. (timePeriod-1). At first glance this indicator does not seem vectorizable: it looks as if the data of each window must be extracted and beta computed in a loop. In this case, however, the independent variable is special: it is a fixed arithmetic sequence, so the beta of the next window can be computed incrementally. Since beta(A,B) = (sumAB-sumA*sumB/obs)/varB, and varB and sumB are fixed, only sumAB and sumA need to be updated as the window slides. After simplifying the formula, the change in sumAB between two consecutive windows can be computed in vectorized form; see line 10 of the code. Line 12 computes sumAB for the first window. In line 13, sumABDelta.cumsum() computes the sumAB values of all windows in one vectorized step.

def linearreg_slope(close, timePeriod){
1	n = close.size()
2	b = close.ifirstNot()
3	start = b + timePeriod
4	if(b < 0 || start > n) return array(DOUBLE, n, n, NULL)
5	x = 0 .. (timePeriod - 1)
6	sumB = sum(x).double()
7	varB = sum2(x) - sumB*sumB/timePeriod
8	obs = mcount(close, timePeriod)
9	msumA = msum(close, timePeriod)
10	sumABDelta = (timePeriod - 1) * close + close.move(timePeriod) - msumA.prev() 
11	sumABDelta[timePeriod - 1 + 0:b] = NULL
12	sumABDelta[start - 1] =  wsum(close.subarray(b:start), x)
13	return (sumABDelta.cumsum() - msumA * sumB/obs)/varB
}

For the linearreg_slope sequence of a vector of length 1,000,000 with a window length of 10, TA-Lib takes 13ms and the ta module takes 14ms. The two are almost the same, which is no small feat for a module implemented in script. When the window is increased to 20, TA-Lib's time rises to 22ms while the ta module still takes 14ms. This suggests that TA-Lib computes each window separately in a loop, whereas the ta module's vectorized calculation does not depend on the window length.
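
The incremental trick can be sketched in NumPy as follows. This is an illustration, not a port of the ta module code: it assumes the input has no nulls, and it uses the simplified per-step change sumAB[t] - sumAB[t-1] = timePeriod * close[t] - sumA[t], which is algebraically equivalent to line 10 of the listing. The cost is a few passes over the data regardless of the window length.

```python
import numpy as np

def linearreg_slope(close, time_period):
    """Rolling regression slope of close against 0..time_period-1, without a per-window loop."""
    close = np.asarray(close, dtype=float)
    n, T = close.size, time_period
    out = np.full(n, np.nan)
    if n < T:
        return out
    x = np.arange(T, dtype=float)
    sum_b = x.sum()
    var_b = (x * x).sum() - sum_b * sum_b / T
    # Rolling sum of close: msum_a[i] covers the window ending at index T - 1 + i.
    csum = np.concatenate(([0.0], np.cumsum(close)))
    msum_a = csum[T:] - csum[:-T]
    # sumAB of the first window, then its change for each later window.
    first_sum_ab = np.dot(close[:T], x)
    delta = T * close[T:] - msum_a[1:]
    sum_ab = first_sum_ab + np.concatenate(([0.0], np.cumsum(delta)))
    out[T - 1:] = (sum_ab - msum_a * sum_b / T) / var_b
    return out

rng = np.random.default_rng(0)
data = rng.random(50)
slopes = linearreg_slope(data, 10)
```

Each value can be compared against an ordinary least-squares fit of the corresponding window, for example np.polyfit(np.arange(10), data[i - 9:i + 1], 1)[0].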

4.4 Techniques to reduce data duplication

Operations such as slice, join, and append on a vector are likely to copy a large amount of data. Copying data is usually more time-consuming than many simple calculations. The following practical examples show some ways to reduce data copying.

4.4.1 Use vector view subarray to reduce data copying

Slicing a sub-window out of a vector for a calculation creates a new vector and copies the data, which costs both memory and time. To avoid this, DolphinDB introduced a new data structure, subarray. A subarray is a view of the original vector: it records only a pointer to the original vector together with the start and end positions. It does not allocate a large block of memory for a new vector, so no data is actually copied. All read-only vector operations can be applied directly to a subarray. The implementations of ema and linearreg_slope make extensive use of subarray. In the following example, we perform 100 slice operations on a vector of length 1,000,000, which takes 62ms in total, or 0.62ms per operation. Given that the ema computation on a vector of the same length in 4.2 takes only 5ms, saving 0.62ms per slice is significant.

close = rand(1.0, 1000000)
timer(100) close[10:]

Time elapsed: 62 ms
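
For readers more familiar with Python, NumPy draws a similar distinction between a view and a copy. The sketch below is an analogy only and says nothing about how DolphinDB implements subarray; note also that a NumPy view is writable, whereas the text above describes subarray as supporting read-only operations.

```python
import numpy as np

close = np.random.rand(1_000_000)

view = close[10:]           # basic slicing returns a view: no data is copied
copied = close[10:].copy()  # an explicit copy allocates and fills a new buffer

shares_view = np.shares_memory(close, view)
shares_copy = np.shares_memory(close, copied)
```

Here shares_view is True and shares_copy is False, while both arrays hold the same values.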

4.4.2 Specify capacity for the vector to avoid expansion

When data is appended to the end of a vector whose capacity is insufficient, a larger block of memory must be allocated, the old data copied into it, and the old block released. For a large vector this can be time-consuming. If the final length of a vector is known in advance, specifying that length as the vector's capacity at creation avoids any expansion. DolphinDB's built-in function array(dataType, [initialSize], [capacity], [defaultValue]) accepts a capacity when creating a vector. For example, line 8 of the ema function first creates a vector with a capacity of n and then appends the calculation results.
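
The same idea can be illustrated in NumPy, again as an analogy rather than a description of DolphinDB internals. NumPy arrays have no separate capacity, so the closest equivalent is to allocate the final length once and write into it, instead of growing the result with np.append, which allocates a new array and copies on every call. Both helper names are hypothetical.

```python
import numpy as np

def assemble_preallocated(n, start, init, ret):
    """Allocate the final length once, then fill it in place."""
    out = np.empty(n)
    out[:start - 1] = np.nan  # leading positions without a result
    out[start - 1] = init
    out[start:] = ret
    return out

def assemble_by_appending(start, init, ret):
    """Grow the result step by step; each np.append allocates and copies."""
    out = np.full(start - 1, np.nan)
    out = np.append(out, init)
    out = np.append(out, ret)
    return out

ret = np.arange(5, dtype=float)
a = assemble_preallocated(8, 3, 10.0, ret)
b = assemble_by_appending(3, 10.0, ret)
```

Both functions return the same values; the first touches each element once, while the second copies the partial result at every append.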

5. DolphinDB ta indicator list

Overlap Studies

[Image: 111.png]

Momentum Indicators

[Image: 222.png]

Volume Indicators

[Image: 333.png]

Volatility Indicators

[Image: 444.png]

Price Transform

[Image: 555.png]

Statistic Functions

[Image: 666.png]

Other Functions

  • For the Math Transform and Math Operators functions in TA-Lib, use the corresponding DolphinDB built-in functions instead. For example, the SQRT, LN, and SUM functions in TA-Lib can be replaced by the sqrt, log, and msum functions in DolphinDB, respectively.
  • The following TA-Lib functions have not been implemented in the ta module: all Pattern Recognition and Cycle Indicators functions, as well as HT_TRENDLINE (Hilbert Transform - Instantaneous Trendline), ADOSC (Chaikin A/D Oscillator), MAMA (MESA Adaptive Moving Average), SAR (Parabolic SAR), and SAREXT (Parabolic SAR - Extended).

6. Roadmap

  • The indicator functions that have not yet been implemented will be added in the next version, which is expected to be completed in April 2020.
  • Currently, DolphinDB's user-defined functions support neither default parameter values nor keyword arguments in function calls. Both will be supported in DolphinDB Server 1.20.0, at which point the ta module will adopt default parameters consistent with TA-Lib.
  • Before using the ta module, you must load it with the statement use ta, which is inconvenient in interactive queries. Starting with version 1.20, DolphinDB Server will allow modules to be preloaded during system initialization. The ta module functions will then have the same status as DolphinDB's built-in functions, and the loading step will no longer be needed.


Origin blog.51cto.com/15022783/2656668