How do the experts perform feature extraction on time series? Read on to find out!

Author | Sharmistha Chatterjee
Translator | Huohuojiang
Editor in charge | Jin Zhaoyu

Introduction

Organizations of all kinds now collect more and more data and often need to detect unusual or anomalous time series. For example, Yahoo operates a large number of mail servers and monitors them in real time. To identify servers or devices that are behaving abnormally, a large volume of performance measurements is collected from each server or IoT device every hour.

The Python library tsfeatures can help us compute a feature vector for each time series. The features measure different characteristics of the series, including lag correlation, strength of seasonality, spectral entropy, and so on.

In this article, we will discuss different feature extraction techniques in time series and demonstrate using two different time series as examples.

Commonly used feature extraction metrics

Principal Component Analysis (PCA), one of the most commonly used feature extraction mechanisms in data science, is also used for time series feature extraction. After performing a principal component analysis on the features, a variety of bivariate outlier detection methods can be applied to the first two principal components. This allows the most unusual series to be identified based on their feature vectors. The bivariate outlier detection method used here is based on highest density regions.
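As a minimal sketch of this idea, assuming each series has already been reduced to one row of numeric features, the snippet below standardizes the feature matrix, projects it onto the first two principal components with NumPy's SVD, and scores each series by how far it lies from the bulk of the points. The Mahalanobis-distance score is a simple stand-in for the highest-density-region method mentioned above, not that method itself. The function names and the synthetic feature matrix are hypothetical.

```python
import numpy as np

def first_two_pcs(features):
    """Project rows of a (n_series, n_features) matrix onto the first two PCs."""
    X = np.asarray(features, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal directions, ordered by explained variance
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:2].T

def outlier_scores(pcs):
    """Squared Mahalanobis distance of each point from the centre of the PC plane."""
    centred = pcs - pcs.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(centred, rowvar=False))
    return np.einsum("ij,jk,ik->i", centred, cov_inv, centred)

# Synthetic stand-in for real feature vectors: 50 series, 5 features each
rng = np.random.default_rng(0)
feature_matrix = rng.normal(size=(50, 5))
feature_matrix[7] += 8.0  # make series 7 unusual on every feature

pcs = first_two_pcs(feature_matrix)
scores = outlier_scores(pcs)
most_unusual = int(np.argmax(scores))
print("most unusual series:", most_unusual)
```

In practice the rows of `feature_matrix` would be the per-series feature vectors, and a density-based rule would replace the distance score.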

Changes in variance or volatility over time can cause problems when modeling time series with classical methods such as ARIMA.

The ARCH (Autoregressive Conditional Heteroskedasticity) method plays a crucial role in modeling highly volatile time series, such as stock prices. It models how the variance changes over time, for example when volatility increases or decreases.
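One common way to check for an ARCH effect is to regress the squared, demeaned series on its own lags and look at the R² of that regression: a high value means past volatility helps predict current volatility. The sketch below follows that general idea. The function name `arch_r2`, the choice of 12 lags, and the simulated series are assumptions for illustration, and the result is not claimed to match any library's output.

```python
import numpy as np

def arch_r2(x, lags=12):
    """R^2 from regressing the squared, demeaned series on its own lags."""
    x = np.asarray(x, dtype=float)
    sq = (x - x.mean()) ** 2
    y = sq[lags:]
    # Column k holds the squared series lagged by k + 1 steps
    lagged = np.column_stack(
        [sq[lags - k - 1 : len(sq) - k - 1] for k in range(lags)]
    )
    design = np.column_stack([np.ones(len(y)), lagged])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    total = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(residuals ** 2) / total

rng = np.random.default_rng(1)
n = 2000
white = rng.normal(size=n)  # constant variance, no ARCH effect

# Simulated ARCH(1): today's variance depends on yesterday's squared shock
arch = np.zeros(n)
for t in range(1, n):
    sigma2 = 0.2 + 0.5 * arch[t - 1] ** 2
    arch[t] = rng.normal() * np.sqrt(sigma2)

r2_white = arch_r2(white)
r2_arch = arch_r2(arch)
print("white noise R^2:", r2_white)
print("ARCH(1) R^2:   ", r2_arch)
```

The series with time-varying variance should give a clearly larger R² than the white noise series.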

Next, we will introduce some time series features, functions, and related information.

The code below shows how we can extract relevant features with one line of code.

Source code

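# Assumed setup (not shown in the original): `tf` is the tsfeatures package and
# `df2` is a pandas DataFrame loaded beforehand with a '# Direct_1' column.
import tsfeatures as tf

tsf_hp = tf.holt_parameters(df2['# Direct_1'].values)
print(tsf_hp)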

tsf_centrpy = tf.count_entropy(df2['# Direct_1'].values)
print(tsf_centrpy)

tsf_crossing_points = tf.crossing_points(df2['# Direct_1'].values)
print(tsf_crossing_points)

tsf_entropy = tf.entropy(df2['# Direct_1'].values)
print(tsf_entropy)

tsf_flat_spots = tf.flat_spots(df2['# Direct_1'].values)
print(tsf_flat_spots)

tsf_frequency = tf.frequency(df2['# Direct_1'].values)
print(tsf_frequency)

tsf_heterogeneity = tf.heterogeneity(df2['# Direct_1'].values)
print(tsf_heterogeneity)

tsf_guerrero = tf.guerrero(df2['# Direct_1'].values)
print(tsf_guerrero)

tsf_hurst = tf.hurst(df2['# Direct_1'].values)
print(tsf_hurst)

tsf_hw_parameters = tf.hw_parameters(df2['# Direct_1'].values)
print(tsf_hw_parameters)

tsf_intv = tf.intervals(df2['# Direct_1'].values)
print(tsf_intv)

tsf_lmp = tf.lumpiness(df2['# Direct_1'].values)
print(tsf_lmp)

tsf_acf = tf.acf_features(df2['# Direct_1'].values)
print(tsf_acf)

tsf_arch_stat = tf.arch_stat(df2['# Direct_1'].values)
print(tsf_arch_stat)

tsf_pacf = tf.pacf_features(df2['# Direct_1'].values)
print(tsf_pacf)

tsf_sparsity = tf.sparsity(df2['# Direct_1'].values)
print(tsf_sparsity)

tsf_stability = tf.stability(df2['# Direct_1'].values)
print(tsf_stability)

tsf_stl_features = tf.stl_features(df2['# Direct_1'].values)
print(tsf_stl_features)

tsf_unitroot_kpss = tf.unitroot_kpss(df2['# Direct_1'].values)
print(tsf_unitroot_kpss)

tsf_unitroot_pp = tf.unitroot_pp(df2['# Direct_1'].values)
print(tsf_unitroot_pp)

The results below are the feature values extracted from the fetal ECG data.

Results for time series 1 (fetal ECG data)

The figure below shows the time series data collected from the fetal ECG, from which the features were extracted.

{'alpha': 0.9998016430979507, 'beta': 0.5262228301908355}
{'count_entropy': 1.783469256071135}
{'crossing_points': 436}
{'entropy': 0.6493414196542769}
{'flat_spots': 131}
{'frequency': 1}
{'arch_acf': 0.3347171050143251, 'garch_acf': 0.3347171050143251, 'arch_r2': 0.14089508110660665, 'garch_r2': 0.14089508110660665}
{'hurst': 0.4931972012451876}
{'hw_alpha': nan, 'hw_beta': nan, 'hw_gamma': nan}
{'intervals_mean': 2516.801557547009, 'intervals_sd': nan}
{'guerrero': nan}
{'lumpiness': 0.01205944072461473}
{'x_acf1': 0.8262122472240574, 'x_acf10': 3.079891123506255, 'diff1_acf1': -0.27648384824011435, 'diff1_acf10': 0.08236265771293629, 'diff2_acf1': -0.5980110240921641, 'diff2_acf10': 0.3724461872893135}
{'arch_lm': 0.7064704126082555}
{'x_pacf5': 0.7303549429779813, 'diff1x_pacf5': 0.09311680507880443, 'diff2x_pacf5': 0.7105000333917864}
{'sparsity': 0.0}
{'stability': 0.16986190432765097}
{'nperiods': 0, 'seasonal_period': 1, 'trend': nan, 'spike': nan, 'linearity': nan, 'curvature': nan, 'e_acf1': nan, 'e_acf10': nan}
{'unitroot_kpss': 0.06485903737928193}
{'unitroot_pp': -908.3309773009415}

The results below are the feature values extracted from the daily temperature data.

Results for time series 2 (daily temperature data)

{'alpha': 0.4387345064923509, 'beta': 0.0}
{'count_entropy': -101348.71338310161}
{'crossing_points': 706}
{'entropy': 0.5089893350876903}
{'flat_spots': 10}
{'frequency': 1}
{'arch_acf': 0.016273743642920828, 'garch_acf': 0.016273743642920828, 'arch_r2': 0.015091960217949008, 'garch_r2': 0.015091960217949008}
{'hurst': 0.5716257806690483}
{'hw_alpha': nan, 'hw_beta': nan, 'hw_gamma': nan}
{'intervals_mean': 1216.0, 'intervals_sd': 1299.2740280633643}
{'guerrero': nan}
{'lumpiness': 5.464398615083545e-05}
{'x_acf1': -0.0005483958183129098, 'x_acf10': 3.0147995912148108e-06, 'diff1_acf1': -0.5, 'diff1_acf10': 0.25, 'diff2_acf1': -0.6666666666666666, 'diff2_acf10': 0.4722222222222222}
{'arch_lm': 3.6528279285796827e-06}
{'nonlinearity': 0.0}
{'x_pacf5': 1.5086491342316237e-06, 'diff1x_pacf5': 0.49138888888888893, 'diff2x_pacf5': 1.04718820861678}
{'sparsity': 0.0}
{'stability': 5.464398615083545e-05}
{'nperiods': 0, 'seasonal_period': 1, 'trend': nan, 'spike': nan, 'linearity': nan, 'curvature': nan, 'e_acf1': nan, 'e_acf10': nan}
{'unitroot_kpss': 0.29884876591708787}
{'unitroot_pp': -3643.7791982866393}
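To make a few of the reported numbers easier to read, the sketch below re-implements three of the features from their usual definitions: crossing points (how often the series crosses its median), stability (the variance of the means of non-overlapping windows) and lumpiness (the variance of the variances of those windows). The window width of 10 and the two toy series are assumptions, and the values need not match the library's output exactly.

```python
import numpy as np

def crossing_points(x):
    """Count how often the series crosses its median."""
    x = np.asarray(x, dtype=float)
    above = x > np.median(x)
    return int(np.sum(above[1:] != above[:-1]))

def window_stats(x, width=10):
    """Mean and variance of each non-overlapping window of `width` points."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) // width
    windows = x[: n_windows * width].reshape(n_windows, width)
    return windows.mean(axis=1), windows.var(axis=1, ddof=1)

def stability(x, width=10):
    """Variance of the window means."""
    means, _ = window_stats(x, width)
    return float(np.var(means, ddof=1))

def lumpiness(x, width=10):
    """Variance of the window variances."""
    _, variances = window_stats(x, width)
    return float(np.var(variances, ddof=1))

t = np.arange(200)
smooth = np.sin(2 * np.pi * t / 100)     # slow wave, two full cycles
jumpy = np.where(t % 2 == 0, 1.0, -1.0)  # flips sign at every step

print("crossing points:", crossing_points(smooth), crossing_points(jumpy))
print("stability:      ", stability(smooth), stability(jumpy))
print("lumpiness:      ", lumpiness(smooth), lumpiness(jumpy))
```

The rapidly flipping series crosses its median at almost every step, while its window means and variances barely change from one window to the next.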

Summary

  • Above, we walked through simple steps for extracting features from time series (both series have a seasonal period of 1), which can help us spot anomalies.

  • The calculated metrics suggest that the first series is more stable (it has higher stability and entropy values): its time-stamped data spans a longer period, and its fluctuations are small relative to that period.

  • The second time series shows higher volatility, reflected in its larger number of crossing points (706 versus 436).

  • Consequently, the second time series also has lower lumpiness and a lower intervals_mean, which indicates a lower variance of the variances. unitroot_kpss and unitroot_pp are unit-root test statistics for the series; they are less than 1 and negative, respectively, in both time series.

  • tsfeatures also supports evaluating custom functions that take a NumPy array as input and return a dictionary mapping feature names to their values.
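A custom feature of the kind described in the last point could look like the sketch below: a plain function that receives a NumPy array and returns a dictionary with the feature name as key. The feature itself (`mean_abs_change`) is hypothetical, and the extra `freq` argument is an assumption about the signature the library expects; check the tsfeatures README for how custom functions are registered in your version.

```python
import numpy as np

def mean_abs_change(x, freq=1):
    """Custom feature: average absolute step between consecutive observations.

    `freq` is unused here; it is kept only because tsfeatures-style feature
    functions are assumed to accept (x, freq).
    """
    x = np.asarray(x, dtype=float)
    return {"mean_abs_change": float(np.mean(np.abs(np.diff(x))))}

# Steps are 2, 1 and 3, so the mean absolute change is 2.0
result = mean_abs_change(np.array([1.0, 3.0, 2.0, 5.0]))
print(result)  # {'mean_abs_change': 2.0}
```

Returning a dictionary keeps the output in the same shape as the built-in features, so the values can be merged into one feature vector per series.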

References

  • https://github.com/FedericoGarza/tsfeatures

  • https://htmlpreview.github.io/?https://github.com/robjhyndman/M4metalearning/blob/master/docs/M4_methodology.html#features

  • https://cran.r-project.org/web/packages/tsfeatures/tsfeatures.pdf

  • https://robjhyndman.com/papers/icdm2015.pdf

  • https://math.berkeley.edu/~btw/thesis4.pdf

  • https://machinelearningmastery.com/develop-arch-and-garch-models-for-time-series-forecasting-in-python/

  • https://ir.nctu.edu.tw/bitstream/11536/14555/1/A1997YD78100005.pdf

  • Principal Component Analysis for Time Series and Other Non-Independent Data – https://link.springer.com/chapter/10.1007%2F0-387-22440-8_12

Original link: https://hackernoon.com/key-tactics-the-pros-use-for-feature-extraction-from-time-series-e7q3wfr

This article was translated by CSDN Cloud Computing; please credit the source when reprinting.
