Strategies for measuring time series similarity: from Euclidean distance to DTW and its variants

Original link: https://zhuanlan.zhihu.com/p/389388258

Depending on the characteristics of the time series themselves, there are many ways to measure the similarity of time series. This article starts from Euclidean distance and then moves on to Dynamic Time Warping (DTW), some shortcomings of DTW and the related solutions, and two variants of DTW: Derivative Dynamic Time Warping (DDTW) and Weighted Dynamic Time Warping (WDTW).


1 Introduction/Background

Time series are a ubiquitous data format across many fields of scientific research (further reading: A review of deep learning time series). In time series research, one of the most common needs is to determine whether two time series are similar. Comparing the similarity between time series effectively is critical in many scientific and engineering tasks, such as classification, clustering, speech recognition, and gait recognition.

Take as an example the time series data collected for certain characteristics of finished products in a manufacturing process. First, the time series of good and defective products differ in certain characteristics, and these differences have specific physical meanings tied to domain knowledge. Second, because of the inherent nature of the manufacturing process, the collected time series are not of equal length. Third, defects arise for many reasons, so finished products are not merely split into good and defective; they can be further divided into good, defective type 1, defective type 2, defective type 3, and so on. Finally, as with much real manufacturing data, there is far more data on good products than on defective products, and once the defective products are subdivided by type there is even less data per defect type, so the data set as a whole is severely imbalanced.

In order to achieve the multi-classification task of good products and defective products of different types in the normal production and manufacturing process, comparing the similarity between the collected time series is an important step. It is not difficult to understand intuitively that comparing the similarity of time series is equivalent to calculating the "distance" between time series. The greater the "distance" between two time series, the smaller the similarity between the two time series, and vice versa.

Therefore, this article starts from Euclidean distance and then moves on to Dynamic Time Warping (DTW), some shortcomings of DTW and the related solutions, and two variants of DTW: Derivative Dynamic Time Warping (DDTW) and Weighted Dynamic Time Warping (WDTW). Since many high-quality blog posts already cover the details of DTW, this article explains only the basic concepts and focuses on the differences between the methods, the logic that leads from one to the next, and the problems each method is suited to.


2 Euclidean distance

When it comes to measuring the distance between time series, Euclidean distance is the most direct method. The concept is simple and is not described in detail here. When Euclidean distance is used to compare two time series, the points of the two sequences are paired one-to-one in order, and the Euclidean distance computed over these pairs serves as the distance measure (similarity) between the two time series. As shown in Figure 1 below:

▲ Figure 1. Euclidean distance between two time series of equal length

When applying Euclidean distance, the i-th point in the first time series forms a one-to-one correspondence with the i-th point in the second time series. However, Euclidean distance can cause problems in some cases, as shown in Figure 2 below:

▲ Figure 2. Is Euclidean distance feasible for two time series of unequal length?

When the two time series are not of equal length, the longer one always has points left over that cannot be matched. How should the Euclidean distance be calculated in this case? Clearly, Euclidean distance is no longer feasible here. In addition, as the red circle in Figure 1 shows, the two time series are shifted relative to each other on the time axis but have a similar overall trend. If we were to align the two time series in Figure 1 by hand, the two troughs inside the red circle should naturally correspond to each other. The strictly in-order, point-to-point pairing of Euclidean distance cannot meet this need.

To sum up, as a distance measure between time series, Euclidean distance has the following limitations: (1) it is only suitable for time series of equal length; (2) it cannot account for shifts along the X-axis when aligning time series, so the alignment sometimes appears unnatural.

In particular, as a common standard distance measure, Euclidean distance is the special case p = 2 of a more general distance measure, the Minkowski distance. In the Minkowski distance, p = 1 and p = infinity correspond respectively to the Manhattan distance and to the maximum absolute difference between corresponding points of the two time series.
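The relationship between these distances can be illustrated with a minimal Python sketch. The function name `minkowski_distance` and the sample values are made up for illustration; the sketch assumes two plain lists of numbers with the same length.

```python
import math

def minkowski_distance(q, c, p=2.0):
    """Minkowski distance between two equal-length sequences.

    p=1 gives the Manhattan distance, p=2 the Euclidean distance, and
    p=math.inf the maximum absolute point-wise difference.
    """
    if len(q) != len(c):
        # Point-to-point pairing is undefined for unequal lengths.
        raise ValueError("sequences must have the same length")
    diffs = [abs(a - b) for a, b in zip(q, c)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

# Hypothetical sample data: the i-th point of q is paired with the i-th point of c.
q = [0.0, 1.0, 2.0, 1.0]
c = [0.0, 2.0, 4.0, 1.0]
manhattan = minkowski_distance(q, c, p=1)
euclidean = minkowski_distance(q, c, p=2)
maximum = minkowski_distance(q, c, p=math.inf)
```

Passing sequences of different lengths raises an error, which mirrors limitation (1) above.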


3  DTW (Dynamic Time Warping)

Euclidean distance has the two main problems noted above: it cannot handle data of unequal length, and its alignment can be unnatural even for data of equal length. To address distance measurement and matching for unequal-length data, DTW was proposed in the 1970s by the Japanese scholar Itakura and others. Over the past few decades, DTW has been widely used in scenarios such as isolated-word speech recognition, gesture recognition, data mining, and information retrieval, and it was once the mainstream method for speech recognition. The principle of DTW is briefly described as follows:

For two unequal length time series Q and C, the lengths are n and m respectively:

$$Q = q_1, q_2, \dots, q_i, \dots, q_n$$

$$C = c_1, c_2, \dots, c_j, \dots, c_m$$

To align two time series of unequal length with DTW, you first construct an n*m distance matrix. The element in the i-th row and j-th column represents the distance $d(q_i, c_j)$ between point $q_i$ of Q and point $c_j$ of C. Usually the (squared) Euclidean distance is used here, so $d(q_i, c_j) = (q_i - c_j)^2$. See Figure 3 below:

▲ Figure 3. Warping path diagram in DTW

The figure above shows the n*m matrix, with each square representing one element. DTW drops the one-to-one, in-order restriction of Euclidean distance. Its aim is to find a continuous matching that covers the correspondence between all points of the two time series (a match may pair the i-th point $q_i$ with the j-th point $c_j$, written $w_k = (i, j)$). The set of these matches forms the warping path W, the solid black line in Figure 3:

$$W = w_1, w_2, \dots, w_k, \dots, w_K, \qquad \max(m, n) \le K \le m + n - 1$$

To perform DTW matching, warping path W needs to meet the following conditions:

1-Boundary conditions: $w_1 = (1, 1)$ and $w_K = (n, m)$. In short, two DTW-aligned time series must be matched head-to-head and tail-to-tail. In the distance matrix, the warping path must start in one corner and end in the diagonally opposite corner.

2-Continuity: Each step of the warping path must be continuous. In the distance matrix, the next step can only go to a square adjacent to the current one (including diagonally adjacent squares). Mathematically: for $w_k = (a, b)$ and $w_{k-1} = (a', b')$, it must hold that $a - a' \le 1$ and $b - b' \le 1$.

3-Monotonicity: The correspondence between the two time series must proceed in order, and matches cannot cross. Mathematically: for $w_k = (a, b)$ and $w_{k-1} = (a', b')$, it must hold that $a - a' \ge 0$ and $b - b' \ge 0$.
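To make the three conditions concrete, here is a small Python sketch that checks whether a candidate path satisfies them. The function name `is_valid_warping_path` and the 1-based `(i, j)` tuple representation are assumptions made for this illustration. The sketch also rejects a step that repeats the same square, which the conditions as stated do not mention explicitly.

```python
def is_valid_warping_path(path, n, m):
    """Check a warping path given as a list of 1-based (i, j) index pairs.

    n and m are the lengths of the two time series.
    """
    # Boundary conditions: start at (1, 1) and end at (n, m).
    if not path or path[0] != (1, 1) or path[-1] != (n, m):
        return False
    for (a_prev, b_prev), (a, b) in zip(path, path[1:]):
        da, db = a - a_prev, b - b_prev
        # Monotonicity: indices never move backwards.
        if da < 0 or db < 0:
            return False
        # Continuity: each index advances by at most one per step.
        if da > 1 or db > 1:
            return False
        # Assumption of this sketch: a step must actually move.
        if da == 0 and db == 0:
            return False
    return True

# Hypothetical paths for series of length n=3 and m=4.
good_path = [(1, 1), (1, 2), (2, 3), (3, 4)]
jumping_path = [(1, 1), (3, 2), (3, 4)]       # breaks continuity
backward_path = [(1, 1), (2, 2), (1, 3), (2, 4), (3, 4)]  # breaks monotonicity
```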

There are still many W that meet these conditions, and DTW only looks for W that can minimize the warping cost:

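$$DTW(Q, C) = \min\left\{ \frac{\sqrt{\sum_{k=1}^{K} w_k}}{K} \right\}$$

In the above formula, $w_k$ stands for the distance $d(q_i, c_j)$ at the k-th element of the warping path, and K is the length of the warping path. Dividing by K compensates for the fact that warping paths can have different lengths.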

Finally, the corresponding relationship between two unequal length time series data can be obtained by solving the following recursive formula through dynamic programming:

$$\gamma(i, j) = d(q_i, c_j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\}$$

Here $\gamma(i, j)$ is the cumulative distance of the warping path up to row i and column j of the distance matrix.
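The dynamic programming step can be sketched in Python as follows. This is a minimal illustration, not an optimized implementation: the function name `dtw`, the squared point-wise difference as the local distance, and the backtracking rule are choices made for this sketch. It returns the cumulative cost at the final cell together with one optimal warping path in 1-based indices.

```python
import math

def dtw(q, c):
    """Return (cumulative cost, warping path) for sequences q and c."""
    n, m = len(q), len(c)
    # gamma[i][j] holds the cumulative cost up to row i and column j.
    # Row 0 and column 0 are padding, filled with infinity.
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2  # local squared distance
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    # Backtrack from (n, m) to (1, 1), always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i, j))
        _, i, j = min((gamma[i - 1][j - 1], i - 1, j - 1),
                      (gamma[i - 1][j], i - 1, j),
                      (gamma[i][j - 1], i, j - 1))
    path.reverse()
    return gamma[n][m], path

# Hypothetical sample data: c repeats the first value of q once,
# so the two series have different lengths but the same shape.
q = [1, 2, 3, 2, 1]
c = [1, 1, 2, 3, 2, 1]
cost, path = dtw(q, c)
```

In this example the first point of `q` is matched to the first two points of `c`, which is exactly the kind of one-to-many match that Euclidean distance cannot express. If a length-normalized value is wanted, it can be derived from the two return values as `math.sqrt(cost) / len(path)`.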


4 Problems faced by DTW and their solutions

Although DTW has been successfully applied in many fields, DTW still has shortcomings: sometimes DTW can produce unnatural distortion/warping during alignment. As shown in Figure 4 below:

▲ Figure 4. Singularities produced by DTW when aligning synthetic data

The solid and dotted lines in A show two synthetic signals (with the same mean and variance), B shows the natural "feature to feature" correspondence, and C shows the result of DTW. DTW clearly fails to match the peaks in the graph naturally. Instead, it maps a single point in one sequence to many points in the other sequence. This situation is called "Singularities". It occurs because the DTW algorithm tries to explain differences on the Y-axis by warping the X-axis.

In order to solve the "Singularities" problem, past research has proposed many solutions, which can be roughly classified into the following three categories:

1-Windowing: In the final analysis, singularities occur because points far apart on the two time series are easily warped together simply because their values are the same or similar. Restricting how far the warping path is allowed to stray during warping can therefore reduce singularities. This is done by setting up a warping window, hence the name Windowing. Mathematically: $|i - j| \le R$ for every path element $w_k = (i, j)$, where the window width R is a positive integer. The range between the two dotted lines in Figure 3 is the range allowed by the window; the warping path must stay inside this area.

2-Slope weighting: When the recursive formula in traditional DTW is changed to the following formula, slope weighting can be achieved.

$$\gamma(i, j) = d(q_i, c_j) + \min\{\gamma(i-1, j-1),\ X\,\gamma(i-1, j),\ X\,\gamma(i, j-1)\}$$

It is not difficult to find that the only difference is that X is added before the last two terms in the min function, and X is a positive real number. When the value of X is adjusted, the direction (slope) of the warping path can be adjusted to a certain extent. When X takes a larger value, the warping path selection will be more diagonal.

3-Step patterns: Changing the recursive formula in traditional DTW to the following formula can change the warping path step.

$$\gamma(i, j) = d(q_i, c_j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j-2),\ \gamma(i-2, j-1)\}$$

The recursive formula and the above formula in traditional DTW are visualized respectively as shown in A and B in Figure 5 below:

▲ Figure 5. Visualization of the recursive formulas of two different step patterns

A corresponds to the recursive formula of traditional DTW: a step can only come from one of the three adjacent squares in the distance matrix. B corresponds to the recursive formula with the changed step pattern: compared with A, each of the two non-diagonal predecessor squares is shifted one further step along the diagonal, which changes the step pattern.
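The windowing and slope-weighting ideas above can be combined in one short Python sketch. The function name `constrained_dtw` and the parameter names `window` (the window width R) and `slope_weight` (the weight X) are assumptions of this illustration. The sketch also widens the window to at least the length difference of the two series so that the end corner stays reachable, which is a choice of this sketch rather than something the text prescribes.

```python
import math

def constrained_dtw(q, c, window=None, slope_weight=1.0):
    """Cumulative DTW cost with an optional warping window and slope weight."""
    if slope_weight <= 0:
        raise ValueError("slope_weight must be a positive real number")
    n, m = len(q), len(c)
    if window is not None:
        # Assumption: keep cell (n, m) inside the window for unequal lengths.
        window = max(window, abs(n - m))
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        if window is None:
            j_lo, j_hi = 1, m
        else:
            # Windowing: only visit cells with |i - j| <= window.
            j_lo, j_hi = max(1, i - window), min(m, i + window)
        for j in range(j_lo, j_hi + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            # Slope weighting: the two non-diagonal terms are scaled by X.
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  slope_weight * gamma[i - 1][j],
                                  slope_weight * gamma[i][j - 1])
    return gamma[n][m]

# Hypothetical sample data.
q = [1, 2, 3]
c = [2, 3, 4]
plain = constrained_dtw(q, c)                        # traditional DTW
diagonal_only = constrained_dtw(q, c, window=0)      # path forced onto the diagonal
steep_penalty = constrained_dtw(q, c, slope_weight=10.0)
```

With `window=0` and equal lengths the path is forced onto the diagonal, so the result equals the sum of squared point-to-point differences. A large `slope_weight` has a similar effect here because non-diagonal moves become expensive.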

In general, the above three types of solutions are helpful to solve singularities to a certain extent. However, they still have the following shortcomings:

(1) It is possible to miss the correct warping path. The above three types of methods artificially limit and adjust the warping path without any preconditions to reduce warping, which is likely to miss the truly correct warping path.

(2) There is no clear guidance on parameter selection. The selection of R value in the Windowing method and X in the Slope weighting method are all subjective adjustments based on the specific scene, and there is no clear standard.


5  Derivative Dynamic Time Warping (DDTW)

In fact, the reason why DTW causes "Singularities" is essentially determined by the characteristics considered by the DTW algorithm itself: the DTW algorithm only considers the value of the data point on the Y-axis.

For example: two data points $q_i$ and $c_j$ have the same value, but one lies on a rising part of its time series and the other on a falling part. DTW easily matches these two points because their values are the same. Intuitively, however, two parts with opposite trends should not be matched. DDTW was introduced to avoid the "Singularities" problem caused by DTW considering only the Y-axis value.

DDTW does not consider the Y-axis value of the data point, but considers higher-level characteristics - the "shape" of the time series data. This method obtains information related to "shape" by calculating the first derivative of time series data, so it is called Derivative DTW.

The concept of DDTW is also simple. In traditional DTW, each element of the distance matrix is the distance between two points. In DDTW, each element of the "distance matrix" is instead the square of the difference between the estimated first derivatives of the two time series at those two points. Although there are many ways to estimate the first derivative, for simplicity and generality DDTW uses the following estimate:

$$D_x[q] = \frac{(q_i - q_{i-1}) + \left((q_{i+1} - q_{i-1}) / 2\right)}{2}, \qquad 1 < i < n$$

The estimate of the first derivative at a point is thus the average of two slopes: the slope of the line through the point and its left neighbor, and the slope of the line through the left neighbor and the right neighbor. Keogh and Pazzani note that this estimate is more robust to outliers than an estimate that considers only two data points.

Note that this estimate cannot be computed for the first and last data points of a time series. In practice, the derivative estimates of the second and the penultimate data points can be used for them instead. In addition, for noisy data sets, exponential smoothing can be applied before estimating the first derivative.
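A minimal Python sketch of DDTW under these rules follows. The function names `estimate_derivative` and `ddtw` are made up for this illustration. The sketch skips the optional smoothing step and returns the cumulative cost computed on the derivative estimates.

```python
import math

def estimate_derivative(x):
    """First-derivative estimate used by DDTW, with end points copied."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points to estimate derivatives")
    d = [0.0] * n
    for i in range(1, n - 1):
        # Average of the slope to the left neighbor and the slope
        # of the line through the left and right neighbors.
        d[i] = ((x[i] - x[i - 1]) + (x[i + 1] - x[i - 1]) / 2.0) / 2.0
    d[0] = d[1]      # first point borrows the second point's estimate
    d[-1] = d[-2]    # last point borrows the penultimate point's estimate
    return d

def ddtw(q, c):
    """Cumulative DTW cost computed on derivative estimates instead of raw values."""
    dq, dc = estimate_derivative(q), estimate_derivative(c)
    n, m = len(dq), len(dc)
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (dq[i - 1] - dc[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    return gamma[n][m]

# Hypothetical sample data: c is q shifted upwards by 5,
# so the raw values differ but the shape is identical.
q = [0, 1, 3, 2]
c = [5, 6, 8, 7]
shape_cost = ddtw(q, c)
```

Because only the shape is compared, a constant vertical offset between the two series does not contribute to the DDTW cost in this example.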


6  Weighted Dynamic Time Warping (WDTW)

As mentioned above, the classic DTW algorithm considers only the value on the Y-axis when matching points of two time series and ignores how far apart the matched points are on the X-axis, so it can cause the "Singularities" problem when matching time series data.

Ultimately, the "Singularities" problem arises to some extent from considering only the Y-axis values: a point on the first sequence can be matched to a point on the second sequence that is very far away from it (here "far" refers to the distance on the X-axis, that is, the difference in ordinal position between the two points).

DDTW solves this problem by estimating the first derivative of the time series data by considering the "shape", while WDTW adopts a different idea. Simply put, WDTW chooses to add a weight when calculating the Euclidean distance between two points on the two sequences, and this weight is related to the distance on the X-axis between the two points. The details are as follows (p=2):

$$d_w(q_i, c_j) = \left\| w_{|i-j|}\,(q_i - c_j) \right\|_p$$

As the above formula shows, with p = 2 it computes a weighted Euclidean distance between the two points $q_i$ and $c_j$ of the two sequences. Here $w_{|i-j|}$ is a weight that depends on the distance $|i-j|$ (the phase difference) between the two points on the X-axis. By adding this weight to the point-to-point distance, WDTW offers a new way of addressing the "Singularities" problem: weighted DTW is essentially a penalty-based DTW. When $|i-j|$ is large (that is, when the two points are far apart on the X-axis), assigning a larger weight $w_{|i-j|}$ discourages the algorithm from matching the two points.

For WDTW, Jeong, Jeong, and Omitaomu also proposed a logistic weight function for assigning the weights; interested readers can consult the original paper. It is worth mentioning that when the weight is a constant, WDTW penalizes all matches equally regardless of their distance on the X-axis, so it is equivalent to traditional DTW. When the weight grows extremely large for any nonzero phase difference, WDTW heavily penalizes every match between points at different positions, even a match between the i-th point and the (i-1)-th point, and in that case WDTW corresponds to the traditional Euclidean distance.
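The weighting idea can be sketched in Python as follows. This article does not give the exact form of the logistic weight function, so the formula used here, a logistic curve centered on the midpoint of the sequence length with a steepness parameter `g` and an upper bound `w_max`, is an assumption of this sketch, as are the function names and default values. The original paper should be checked for the authors' exact definition.

```python
import math

def logistic_weights(length, g=0.05, w_max=1.0):
    """Assumed logistic weights, one per phase difference 0..length-1."""
    mid = length / 2.0
    return [w_max / (1.0 + math.exp(-g * (k - mid))) for k in range(length)]

def wdtw(q, c, g=0.05, w_max=1.0):
    """Cumulative weighted DTW cost with p = 2 (squared weighted differences)."""
    n, m = len(q), len(c)
    weights = logistic_weights(max(n, m), g, w_max)
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The weight depends only on the phase difference |i - j|.
            d = (weights[abs(i - j)] * (q[i - 1] - c[j - 1])) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    return gamma[n][m]

# Hypothetical sample data.
q = [1, 2, 3]
c = [2, 3, 4]
flat = wdtw(q, c, g=0.0)    # constant weight: behaves like scaled traditional DTW
steep = wdtw(q, c, g=2.0)   # weights grow quickly with the phase difference
weights = logistic_weights(10)
```

With `g=0.0` every weight equals `w_max / 2`, so all matches are penalized equally and the result is just the traditional DTW cost scaled by a constant. Larger values of `g` make matches with a large phase difference increasingly expensive.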


7 Summary and additions

In summary, this article started from Euclidean distance, which can only handle equal-length data and easily produces unnatural alignments, and from there motivated DTW and explained why it matters. It then showed that the singularities produced by the traditional DTW algorithm can be reduced to some extent with windowing, slope weighting, and step patterns. Looking instead at the level of features the algorithm considers, DDTW addresses the singularities that can arise when DTW matches time series data by using a higher-level feature, shape, which it obtains by estimating the first derivative. Finally, WDTW is a broader distance measurement framework that includes both Euclidean distance and traditional DTW. WDTW also offers another way to address singularities, by taking the phase difference into account when matching time series data.

Because they all build the same n*m distance matrix, DTW and its variants have the same algorithmic complexity, O(nm). This article does not cover how to accelerate DTW for retrieval in large-scale data sets. For large-scale applications, past research has produced many methods for accelerating DTW, such as FastDTW and LB_Keogh.

References

Keogh, E., & Ratanamahatana, C. A. (2005). Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3), 358-386.
Keogh, E. J., & Pazzani, M. J. (2001, April). Derivative dynamic time warping. In Proceedings of the 2001 SIAM International Conference on Data Mining (pp. 1-11). Society for Industrial and Applied Mathematics.
Jeong, Y. S., Jeong, M. K., & Omitaomu, O. A. (2011). Weighted dynamic time warping for time series classification. Pattern Recognition, 44(9), 2231-2240.
