5 Outlier (Change Point) Detection Algorithms Every Data Scientist Should Know

Author: CSDN @ _乐多_

Time series analysis is a topic every data scientist encounters. It uses a range of mathematical tools to study time series data in order to understand what happened, when and why it happened, and what is likely to happen in the future.

Change points are abrupt shifts in time series data that may represent transitions between states. In time series forecasting use cases, detecting change points is crucial for determining when the underlying random process, or the probability distribution of the series, has changed.

Figure 1 Possible change points in the sample time series plot (highlighted)


This article discusses and implements 3 change point detection techniques and plots the result of each.
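
The code in this article operates on a DataFrame named ts_df with a column series_1, whose construction is not shown. As a stand-in, here is a minimal sketch that builds a synthetic daily series with two known mean shifts. The helper name make_synthetic_series, the segment means, the start date and the noise level are all assumptions chosen for illustration, not the data behind the figures.

```python
import numpy as np
import pandas as pd

def make_synthetic_series(n_per_segment=200, means=(0.0, 5.0, 2.0), noise_std=1.0, seed=0):
    """Build a daily series whose mean jumps at each segment boundary (hypothetical data)."""
    rng = np.random.default_rng(seed)
    # One block of Gaussian noise per segment, each centered on its own mean
    values = np.concatenate(
        [rng.normal(loc=m, scale=noise_std, size=n_per_segment) for m in means]
    )
    index = pd.date_range("2020-01-01", periods=len(values), freq="D")
    return pd.DataFrame({"series_1": values}, index=index)

# True change points sit at rows 200 and 400
ts_df = make_synthetic_series()
```

Because the change points are known in advance, this kind of series makes it easy to check whether a detector fires in the right place.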

1. Piece-wise Linear Regression

When a change point occurs, the pattern or trend of the time series changes. The basic idea of piecewise linear regression is to identify such changes by fitting a separate linear model to each region of the data. Where a change point is present, the coefficient of that region is noticeably higher or lower than that of the neighboring regions.

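Pseudocode for the implementation:

1. Split the time series data into sub-segments of x (say, 100) days
2. Iterate over the data of each sub-segment:
	Training data: the enumerated positions (0 to x-1) within the sub-segment
	Target data: the original time series values
	Train a linear regression model on the training data and target data
	Compute the coefficient of the trained linear regression model
3. Plot the coefficients

Figure 2 Results of linear piecewise change point detection algorithm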

The red line in the image above shows the coefficient of the linear regression model trained on each sub-segment of the time series data. The coefficient is the slope, that is, the value by which the predictor (the position within the sub-segment) is multiplied, so the steeper the trend within a sub-segment, the larger the absolute value of its coefficient, and vice versa.

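import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

def piece_wise_lr_cpd(data, columnName, peice_diff=100):
    '''
       Fit one linear regression per window of peice_diff rows and
       return the absolute slope of each window, repeated for every row.
    '''
    n = data.shape[0]
    X = [[x] for x in np.arange(peice_diff)]  # predictor: position within the window
    coeff = []
    st_idx, end_idx = 0, peice_diff
    while end_idx <= n:

        y = list(data.iloc[st_idx:end_idx][columnName])

        # min-max scale the window so slopes are comparable across windows
        min_v, max_v = min(y), max(y)
        scale = (max_v - min_v) or 1.0  # avoid division by zero on a flat window
        y = [(v - min_v) / scale for v in y]

        model = LinearRegression()
        model.fit(X, y)

        coeff.extend([abs(model.coef_[0])] * peice_diff)

        st_idx = end_idx
        end_idx = end_idx + peice_diff

    # rows left over after the last full window get NaN so the output matches the input length
    coeff.extend([np.nan] * (n - len(coeff)))
    return coeff

# compute results
ts_df['coeff_1'] = piece_wise_lr_cpd(ts_df, 'series_1', peice_diff=200)

# Plot the results: series on the left axis, per-window coefficient (red) and its mean (grey) on the right axis
fig, ax1 = plt.subplots(figsize=(16,4))
ax2 = ax1.twinx()
ax1.plot(ts_df.index, ts_df.series_1)
ax2.plot(ts_df.index, ts_df.coeff_1, color='red', alpha=0.5)
ax2.plot(ts_df.index, [np.nanmean(ts_df.coeff_1)]*ts_df.shape[0], color='grey', alpha=0.5)
ax1.grid(axis='x')
plt.title('Piece-wise linear CPD model')
plt.show()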
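
The plot draws the mean coefficient as a grey reference line, which suggests one simple way to turn the coefficients into candidate change regions: flag every sub-segment whose coefficient lies above the mean. The sketch below does that. Note that this thresholding rule and the helper name flag_segments_above_mean are assumptions for illustration; the article itself only inspects the plot visually.

```python
import numpy as np

def flag_segments_above_mean(coeff, segment_len):
    """Return start indices of segments whose coefficient exceeds the mean coefficient."""
    coeff = np.asarray(coeff, dtype=float)
    threshold = np.nanmean(coeff)  # same reference level as the grey line in the plot
    starts = np.arange(0, len(coeff), segment_len)
    # Each segment repeats one coefficient value, so checking its first row is enough
    return [int(s) for s in starts if coeff[s] > threshold]

# Toy coefficients: three segments of length 4, the middle one has a steep slope
toy_coeff = [0.001] * 4 + [0.004] * 4 + [0.001] * 4
flagged = flag_segments_above_mean(toy_coeff, segment_len=4)
print(flagged)  # [4]
```

A flagged segment only says that the change lies somewhere inside that window, so a smaller window gives a sharper location at the price of noisier slopes.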

2. Change Finder

ChangeFinder is an open source Python package that provides online (real-time) change point detection. It uses the SDAR (Sequentially Discounting AutoRegressive) learning algorithm, which assumes that the autoregressive process differs before and after a change point.
The SDAR method has two learning stages:

  • The first learning stage: generate an intermediate score, called anomaly score.
  • The second learning stage: generate a change-point score that can detect change points.
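
To make the two stages concrete, here is a deliberately simplified sketch of the idea. It is not the changefinder implementation: the autoregressive model is replaced by a plain Gaussian whose mean and variance are updated with a discounting rate r, and the names DiscountedGaussian and two_stage_scores are hypothetical. Stage one scores each point by its negative log-likelihood (the anomaly score); the scores are smoothed with a trailing window and fed into a second model of the same kind, whose output plays the role of the change-point score.

```python
import math

class DiscountedGaussian:
    """Online Gaussian whose mean and variance forget old data at rate r."""

    def __init__(self, r, init_var=1.0):
        self.r = r
        self.mean = 0.0
        self.var = init_var

    def score_and_update(self, x):
        # Score x under the current model first (negative log-likelihood) ...
        var = max(self.var, 1e-8)  # guard against a collapsed variance
        score = 0.5 * math.log(2 * math.pi * var) + (x - self.mean) ** 2 / (2 * var)
        # ... then learn from x, discounting what was learned before
        self.mean = (1 - self.r) * self.mean + self.r * x
        self.var = (1 - self.r) * self.var + self.r * (x - self.mean) ** 2
        return score

def two_stage_scores(series, r=0.05, smooth=5):
    """Return (anomaly scores, change-point scores) for a sequence of numbers."""
    stage1, stage2 = DiscountedGaussian(r), DiscountedGaussian(r)
    anomaly_scores, change_scores = [], []
    for x in series:
        # Stage 1: how surprising is this point?
        anomaly_scores.append(stage1.score_and_update(x))
        # Smooth the anomaly scores over a trailing window
        window = anomaly_scores[-smooth:]
        smoothed = sum(window) / len(window)
        # Stage 2: how surprising is the smoothed anomaly level?
        change_scores.append(stage2.score_and_update(smoothed))
    return anomaly_scores, change_scores

# Toy series: a small wiggle around 0, then a jump to 10 at index 100
toy = [0.1 * math.sin(i) for i in range(100)] + [10.0 + 0.1 * math.sin(i) for i in range(100)]
anomaly, change = two_stage_scores(toy)
```

On this toy series both scores peak at the jump. The smoothing step is what separates a one-off outlier, which raises the anomaly score for a single point, from a sustained shift.
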
Figure 3 Results of the ChangeFinder change point detection algorithm

import numpy as np
import matplotlib.pyplot as plt
import changefinder

# ChangeFinder
def findChangePoints_changeFinder(ts, r, order, smooth):
    '''
       r: Discounting rate
       order: AR model order
       smooth: smoothing window size T
    '''
    cf = changefinder.ChangeFinder(r=r, order=order, smooth=smooth)
    ts_score = [cf.update(p) for p in ts]
    
    return ts_score

def plotChangePoints_changeFinder(df, ts, ts_score, title):
        
    fig, ax1 = plt.subplots(figsize=(16,4))
    ax2 = ax1.twinx()
    ax1.plot(df.index, ts)
    ax2.plot(df.index, ts_score, color='red')
    
    ax1.set_ylabel('item sale')
    ax1.grid(axis='x', alpha=0.7)
    
    ax2.set_ylabel('CP Score')
    ax2.set_title(title)
    plt.show()
    
    
tsd = np.array(ts_df['series_1'])

cp_score = findChangePoints_changeFinder(tsd, r=0.01, order=3, smooth=7)
plotChangePoints_changeFinder(ts_df, tsd, cp_score, 'Change Finder CPD model')
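
The function above returns one score per observation rather than a list of change points. One possible way to extract discrete locations is to pick the highest-scoring indices while keeping them a minimum distance apart. The sketch below does this greedily; the helper name top_k_peaks and the parameters k and min_gap are assumptions for illustration and are not part of the changefinder package.

```python
import numpy as np

def top_k_peaks(scores, k=3, min_gap=10):
    """Greedily pick the k highest-scoring indices that are at least min_gap apart."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    picked = []
    for idx in order:
        # Skip indices that sit too close to an already selected peak
        if all(abs(int(idx) - p) >= min_gap for p in picked):
            picked.append(int(idx))
        if len(picked) == k:
            break
    return sorted(picked)

# Toy scores: two neighboring high values near 20 and a separate peak at 70
toy_scores = np.zeros(100)
toy_scores[20], toy_scores[21], toy_scores[70] = 5.0, 4.0, 3.0
peaks = top_k_peaks(toy_scores, k=2, min_gap=10)
print(peaks)  # [20, 70]
```

With real scores it is usually worth skipping the first few dozen values, since an online model tends to produce unstable scores while it is still warming up.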

3. Ruptures

Ruptures is an open source Python library that provides offline change point detection algorithms. This library detects change points by analyzing the entire sequence and segmenting non-stationary signals.
Ruptures provides 6 algorithms or techniques to detect change points in time series data:

  • Dynamic Programming
  • PELT (Pruned Exact Linear Time)
  • Kernel Change Detection
  • Binary Segmentation
  • Bottom-up Segmentation
  • Window Sliding Segmentation
Figure 4 Results of ruptures change point detection algorithm

import numpy as np
import matplotlib.pyplot as plt
import ruptures as rpt

def plot_change_points_ruptures(df, ts, ts_change_all, title):
    
    plt.figure(figsize=(16,4))
    plt.plot(df.index, ts)
    for x in [df.iloc[idx-1].name for idx in ts_change_all]:
        plt.axvline(x, lw=2, color='red')

    plt.title(title)
    plt.show()
    

tsd = np.array(ts_df['series_1'])

detector = rpt.Pelt(model="rbf").fit(tsd)
change_points = detector.predict(pen=3) #penalty
plot_change_points_ruptures(ts_df, tsd, change_points[:-1], 'Ruptures CPD model for series_1')
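
To give a feel for what an offline method does with the whole sequence, here is a minimal NumPy sketch of the idea behind binary segmentation, one of the techniques listed above. It is not the ruptures implementation: it assumes a one-dimensional signal, uses the sum of squared deviations from the segment mean as the cost, and the function names are hypothetical. At each step it splits the segment where the split reduces the total cost the most.

```python
import numpy as np

def l2_cost(x):
    """Sum of squared deviations from the segment mean."""
    return float(np.sum((x - x.mean()) ** 2)) if len(x) else 0.0

def best_split(x, min_size):
    """Return (gain, index) of the split that most reduces the cost, or (0.0, None)."""
    total = l2_cost(x)
    best_gain, best_idx = 0.0, None
    for i in range(min_size, len(x) - min_size + 1):
        gain = total - l2_cost(x[:i]) - l2_cost(x[i:])
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_gain, best_idx

def binary_segmentation(signal, n_bkps, min_size=5):
    """Greedily split the signal n_bkps times; return the sorted breakpoint indices."""
    signal = np.asarray(signal, dtype=float)
    segments = [(0, len(signal))]
    bkps = []
    for _ in range(n_bkps):
        candidates = []
        for start, end in segments:
            gain, idx = best_split(signal[start:end], min_size)
            if idx is not None:
                candidates.append((gain, start, start + idx, end))
        if not candidates:
            break  # no split improves the cost any further
        _, start, split, end = max(candidates)  # split with the largest gain
        segments.remove((start, end))
        segments += [(start, split), (split, end)]
        bkps.append(split)
    return sorted(bkps)

# Toy signal: three flat levels with changes at indices 50 and 100
toy_signal = np.concatenate([np.zeros(50), np.full(50, 5.0), np.full(50, 1.0)])
found = binary_segmentation(toy_signal, n_bkps=2)
print(found)  # [50, 100]
```

Unlike PELT with a penalty, this variant needs the number of change points up front, which is convenient when that number is known and limiting when it is not.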

Conclusion

In this article, we discussed 3 popular practical techniques for identifying change points in time series data. Change point detection algorithms have a variety of applications, including medical condition monitoring, human activity analysis, website tracking, and more.

In addition to the change point detection algorithms discussed above, there are other supervised and unsupervised change point detection algorithms.

References

Change finder documentation: https://pypi.org/project/changefinder/
Ruptures documentation: https://centre-borelli.github.io/ruptures-docs/


Origin blog.csdn.net/yyyyyyyyyyy_9/article/details/131844655