Filtering in Python data processing

In real-world water-quality prediction projects, we often encounter highly fluctuating data that is not useful for forecasting. For example, a sewage treatment plant on the upstream river may discharge periodically, so a certain water-quality factor rises periodically. Downstream, however, the river water is fully mixed and the water quality becomes relatively smooth.

Examples are as follows:

If the fluctuating upstream data is used directly for prediction, the predictions contain many unnecessary fluctuations and the results are unsatisfactory, as shown in the figure:

We can remove these unnecessary fluctuations by filtering while keeping the underlying trend. The code is as follows:

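import pandas as pd

# preb_cb: pandas Series holding the raw (fluctuating) predictions from the previous step
# Specify the filter window size
window_size = 21

# Apply a moving-average filter (trailing window; the first window_size - 1 values are NaN)
smoothed = preb_cb.rolling(window=window_size).mean()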

This uses a moving-average filter. After filtering, the result is as follows:
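To illustrate what the moving-average step does without the original water-quality data, here is a minimal sketch on synthetic data. The series (a slow trend plus a hypothetical 24-step periodic "discharge" component and some noise) is invented for illustration and is not the project's data. It also shows `center=True`, a standard `rolling` option that removes the lag of the trailing window used above.

```python
import numpy as np
import pandas as pd

# Synthetic, hypothetical data: slow trend + periodic pulses + noise
rng = np.random.default_rng(0)
t = np.arange(500)
trend = 5 + 0.01 * t
periodic = 2.0 * np.sin(2 * np.pi * t / 24)  # assumed 24-step discharge cycle
raw = pd.Series(trend + periodic + rng.normal(0, 0.3, t.size))

window_size = 21

# Trailing moving average: each value averages the current and previous 20 points,
# so the first window_size - 1 values are NaN and the output lags the input.
trailing = raw.rolling(window=window_size).mean()

# Centered moving average: averages 10 points on each side, so there is no lag
centered = raw.rolling(window=window_size, center=True).mean()

# Compare the remaining fluctuation around the true trend
print("raw std around trend:     ", np.std(raw - trend))
print("centered std around trend:", np.nanstd(centered - trend))
```

With a trailing window, each smoothed value reflects only past points, so the curve lags the true trend by roughly (window_size - 1) / 2 steps. A centered window avoids this but needs future points, so it only works on historical data. A window spanning whole discharge cycles suppresses the periodic part most completely.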

When using a moving-average filter, the window size must be chosen carefully. You can loop over candidate window sizes (1 to 23 below) and keep the one with the smallest mean absolute error against the observed data. The code is as follows:

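import pandas as pd
from sklearn.metrics import mean_absolute_error

def seek_windows(df):
    """
    df: time-series DataFrame with two columns; the first column holds the
    fluctuating values produced by direct prediction, the second the observed values.
    Finds the moving-average window size with the smallest mean absolute error.
    """
    df0 = df.iloc[:, 0]  # raw predictions
    df1 = df.iloc[:, 1]  # observed values
    errors = []

    for window in range(1, 24):
        # Trailing moving average of the predictions
        df2 = df0.rolling(window=window).mean()
        # Shift the smoothed series back by `window` steps and drop the trailing NaNs
        df0_shifted = df2.shift(-window).dropna()
        # Trim the observed series to the same length
        df1_trimmed = df1.iloc[:-window]
        mean_error = mean_absolute_error(df0_shifted, df1_trimmed)
        errors.append({"window": window, "mean": mean_error})

    df_errors = pd.DataFrame(errors)
    # Window of the row with the smallest error
    best_window = df_errors.loc[df_errors["mean"].idxmin(), "window"]
    return best_window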
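A usage sketch on synthetic data may help. To keep it self-contained, it repeats the window search above in compact form, with sklearn's `mean_absolute_error` replaced by an equivalent NumPy expression, and returns the whole error table as well as the best window. The data (observed values as a straight trend, predictions as that trend plus a hypothetical 12-step periodic component and noise) is invented for illustration.

```python
import numpy as np
import pandas as pd

def seek_windows(df, windows=range(1, 24)):
    """Compact copy of the window search; MAE computed with NumPy."""
    pred, actual = df.iloc[:, 0], df.iloc[:, 1]
    errors = {}
    for window in windows:
        # Same alignment as the original: smooth, shift back by `window`, drop NaNs
        smoothed = pred.rolling(window=window).mean().shift(-window).dropna()
        trimmed = actual.iloc[:-window]
        errors[window] = np.mean(np.abs(smoothed.to_numpy() - trimmed.to_numpy()))
    errors = pd.Series(errors, name="mean")
    return errors.idxmin(), errors

# Hypothetical data: column 0 = fluctuating predictions, column 1 = observed values
rng = np.random.default_rng(1)
t = np.arange(400)
observed = 5 + 0.01 * t
predicted = observed + 1.5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.2, t.size)
df = pd.DataFrame({"pred": predicted, "obs": observed})

best_window, errors = seek_windows(df)
print("best window:", best_window)
print(errors.sort_values().head())
```

On this synthetic series the search should favor a window near one full cycle of the periodic component, since such a window averages it out; with real data the result depends on the actual discharge pattern.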

There are many filtering methods. Commonly used filters include the moving-average filter, the exponential smoothing filter, and the Butterworth filter, among others. Moving-average and exponential smoothing filters are suitable for stationary or trending signals, while Butterworth filters, which work in the frequency domain, are more suitable for signals with periodic components above a chosen cutoff frequency.
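For comparison, here is a minimal sketch of the other two filters named above, applied to synthetic data with a hypothetical 24-step periodic component. Exponential smoothing uses pandas `Series.ewm`; the Butterworth low-pass uses SciPy's `butter` and `filtfilt`. The smoothing factor `alpha=0.1`, the filter order 4, and the cutoff of 1/48 cycles per sample are illustrative assumptions, not tuned values.

```python
import numpy as np
import pandas as pd
from scipy.signal import butter, filtfilt

# Hypothetical data: trend + 24-step periodic component + noise
rng = np.random.default_rng(2)
t = np.arange(500)
trend = 5 + 0.01 * t
raw = pd.Series(trend + 2.0 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.3, t.size))

# Exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}
# (adjust=False gives exactly this recursion, starting from s_0 = x_0)
ewm_smoothed = raw.ewm(alpha=0.1, adjust=False).mean()

# Butterworth low-pass filter, 4th order, cutoff at 1/48 cycles per sample
# (half the assumed discharge frequency). Without fs, Wn is normalized to the
# Nyquist frequency (0.5 cycles/sample), hence (1/48) / 0.5.
b, a = butter(4, (1 / 48) / 0.5, btype="low")
# filtfilt runs the filter forward and backward, so there is no phase lag
butter_smoothed = filtfilt(b, a, raw.to_numpy())

# Compare remaining fluctuation around the trend, ignoring edge effects
for name, series in [("raw", raw), ("ewm", ewm_smoothed), ("butterworth", butter_smoothed)]:
    resid = np.asarray(series) - trend
    print(f"{name:12s} std around trend: {np.std(resid[50:-50]):.3f}")
```

Exponential smoothing is cheap and works on streaming data but lags behind a rising trend, and a single `alpha` only partly damps a periodic component. The Butterworth filter lets you target the unwanted frequency directly, at the cost of choosing an order and cutoff and handling edge effects.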

In practice, choose the filter that best suits your data.

----

I hope this helps.

Origin blog.csdn.net/weixin_42984235/article/details/130038305