A time series filtering method based on moving average (implemented in Java)

 1 Understanding of moving average

  The moving average method is a common forecasting technique that uses a set of recent actual values to predict demand for a company's products, or its production capacity, one or more periods into the future. It is best suited to short-term forecasting. When demand is neither rising nor falling rapidly and has no seasonal component, the moving average effectively smooths out random fluctuations in the forecast. Variants of the method differ in the weight assigned to each value used in the prediction.

  The moving average method is a simple smoothing technique. Its basic idea is to step through the time series item by item, computing the average of a fixed number of consecutive items at each step, so that the sequence of averages reflects the long-term trend. When periodic changes and random fluctuations make the series vary so much that the underlying trend is hard to see, the moving average removes the influence of these factors and reveals the direction of development (that is, the trend line). The long-term trend of the series can then be analyzed and forecast from this trend line.

  Types of moving average method

  The moving average method can be divided into: simple moving average and weighted moving average.

 (1) Simple moving average method

  The simple moving average gives every element the same weight. It is calculated as follows: F_t = (A_{t-1} + A_{t-2} + A_{t-3} + … + A_{t-n}) / n, where

  · F_t: the predicted value for the next period;

  · n: the number of periods in the moving average;

  · A_{t-1}: the actual value of the previous period;

  · A_{t-2}, A_{t-3} and A_{t-n}: the actual values two, three and n periods earlier, respectively.
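
  The simple moving average formula can be sketched in Java roughly as follows. This is only an illustration: the class name `SimpleMovingAverageSketch`, the method `forecastNext`, and the sample values are made up for this example and do not come from the original article.

```java
// Illustrative sketch only; names and sample data are assumptions.
final class SimpleMovingAverageSketch {

    // Forecast F_t as the plain mean of the last n actual values A_{t-1} ... A_{t-n}.
    // The array is ordered oldest to newest, so A_{t-1} is the last element.
    static double forecastNext(double[] actuals, int n) {
        if (n <= 0 || n > actuals.length) {
            throw new IllegalArgumentException("n must be between 1 and actuals.length");
        }
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += actuals[actuals.length - k]; // A_{t-k}
        }
        return sum / n;
    }

    public static void main(String[] args) {
        double[] actuals = {10, 12, 11, 13, 14}; // made-up values
        // With n = 3 the forecast is the mean of 11, 13 and 14.
        System.out.println(forecastNext(actuals, 3));
    }
}
```

  Every one of the n values contributes equally, which is exactly what distinguishes this variant from the weighted one.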
  

 (2) Weighted moving average method

  The weighted moving average assigns a different weight to each value within a fixed span of periods. The rationale is that historical demand from different periods contributes differently to predicting demand in a future period. Apart from periodic variation with period n, values farther from the target period have less influence and should therefore receive lower weights. The weighted moving average is calculated as follows:

  F_t = w_1 A_{t-1} + w_2 A_{t-2} + w_3 A_{t-3} + … + w_n A_{t-n}, where

  · w_1: the weight of actual sales in period t-1;

  · w_2: the weight of actual sales in period t-2;

  · w_n: the weight of actual sales in period t-n;

  · n: the number of periods used in the forecast; the weights satisfy w_1 + w_2 + … + w_n = 1.

  When using the weighted moving average, the choice of weights deserves attention. Experience and trial and error are the simplest ways to choose them. In general, the most recent data are the best predictor of the future, so they should receive larger weights. For example, last month's profit and production capacity are a better basis for estimating next month's figures than the values from several months earlier. If the data are seasonal, however, the weights should reflect the seasonality as well.
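
  As a rough illustration of the weighted formula, the sketch below applies weights that decrease with age and checks that they sum to 1. The class name `WeightedMovingAverageSketch`, the sample sales figures, and the weights 0.4, 0.3, 0.2, 0.1 are assumptions chosen for the example, not values from the article.

```java
// Illustrative sketch only; names, data and weights are assumptions.
final class WeightedMovingAverageSketch {

    // weights[0] is w_1 (applied to A_{t-1}), weights[1] is w_2 (applied to A_{t-2}), etc.
    // The actuals array is ordered oldest to newest.
    static double forecastNext(double[] actuals, double[] weights) {
        int n = weights.length;
        if (n == 0 || n > actuals.length) {
            throw new IllegalArgumentException("need between 1 and actuals.length weights");
        }
        double weightSum = 0.0;
        for (double w : weights) {
            weightSum += w;
        }
        if (Math.abs(weightSum - 1.0) > 1e-9) {
            throw new IllegalArgumentException("weights must sum to 1");
        }
        double forecast = 0.0;
        for (int k = 1; k <= n; k++) {
            forecast += weights[k - 1] * actuals[actuals.length - k]; // w_k * A_{t-k}
        }
        return forecast;
    }

    public static void main(String[] args) {
        double[] sales = {100, 90, 105, 95};      // made-up values, oldest first
        double[] weights = {0.4, 0.3, 0.2, 0.1};  // most recent period weighted highest
        // 0.4*95 + 0.3*105 + 0.2*90 + 0.1*100, approximately 97.5
        System.out.println(forecastNext(sales, weights));
    }
}
```

  For seasonal data the weights would not simply decrease with age; the periods that correspond to the same season as the target period would receive the larger weights instead.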

  Advantages and disadvantages of moving average method

  Using the moving average method for forecasting can smooth out the impact of sudden fluctuations in demand on the forecast results. However, there are also the following problems when using the moving average method:

  1. Increasing the number of periods of the moving average method (that is, increasing the value of n) will make the smoothing effect better, but it will make the predicted value less sensitive to the actual changes in the data;

  2. The moving average does not always reflect the trend well. Because it is an average, the predicted value always stays at past levels and cannot anticipate values higher or lower than those already observed;

  3. The moving average method requires a large amount of past data records.
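
  The trade-off in point 1 can be seen on a small made-up series with a sudden level shift. The sketch below (the class name `WindowSizeTradeoffSketch` and the data are assumptions for illustration) compares a 2-period and a 5-period trailing average: the shorter window catches up with the new level quickly, while the longer one is smoother but lags behind.

```java
// Illustrative sketch only; names and data are assumptions.
final class WindowSizeTradeoffSketch {

    // Mean of the n values ending at index `end` (inclusive).
    static double trailingMean(double[] data, int end, int n) {
        if (n <= 0 || end - n + 1 < 0 || end >= data.length) {
            throw new IllegalArgumentException("window does not fit inside the data");
        }
        double sum = 0.0;
        for (int i = end - n + 1; i <= end; i++) {
            sum += data[i];
        }
        return sum / n;
    }

    public static void main(String[] args) {
        // Made-up demand that jumps from 10 to 20 at index 5.
        double[] demand = {10, 10, 10, 10, 10, 20, 20, 20};
        for (int end = 4; end < demand.length; end++) {
            System.out.println("index " + end
                    + "  n=2: " + trailingMean(demand, end, 2)
                    + "  n=5: " + trailingMean(demand, end, 5));
        }
    }
}
```

  Three periods after the jump, the 2-period average has already reached 20, while the 5-period average is still at 16.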

  Case analysis of the moving average method

  Application of the simple moving average method in real estate

  The monthly prices of a certain type of real estate in 2001 are shown in the second column of the table below. Because the monthly price is affected by various uncertain factors, it is sometimes high and sometimes low, and the fluctuations are large. Without further analysis, the trend is hard to see. Averaging the prices over every few months to build a moving-average series smooths the data, makes the direction and magnitude of the change clearly visible, and provides a basis for predicting future prices.

  The number of months to average each time depends on the length of the time series and on the period of its variation. If the series is long and the period of variation is long, a window of 6 or even 12 months can be used; otherwise, a window of 2 or 5 months can be used. For the 2001 real estate prices in this example, the moving average is computed over every 5 months of actual values. The prices from January to May are added up and divided by 5 to give 684 yuan/square meter, the prices from February to June give 694 yuan/square meter, the prices from March to July give 704 yuan/square meter, and so on (see the third column of the table). The month-to-month increase of the 5-month moving average is then calculated (see the fourth column of the table).
  

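[Table: monthly prices in 2001 (column 2), 5-month moving averages (column 3), and monthly increase of the moving average (column 4); image not available]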

To predict the price of this type of real estate in January 2002, proceed as follows. The last moving average, 762 (August to December), is centered on October and is therefore 3 months away from January 2002. With a monthly increase of 12 in the moving average, the forecast price for January 2002 is 762 + 12 × 3 = 798 (yuan/square meter).
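
The arithmetic of this case can be reproduced with a short sketch. It assumes that the twelve monthly prices are the ones used in the `main` method of the Java code in section 2 (their 5-month averages match the 684, 694, 704 and 762 quoted in the text), and that the "monthly increase" is the difference between the last two moving averages. The class name `RealEstateForecastSketch` and its methods are made up for this illustration.

```java
// Illustrative sketch only; names are assumptions, data taken from the main method in section 2.
final class RealEstateForecastSketch {

    // Averages over every run of `window` consecutive values, without padding.
    static double[] movingAverages(double[] prices, int window) {
        if (window <= 0 || window > prices.length) {
            throw new IllegalArgumentException("window must be between 1 and prices.length");
        }
        double[] result = new double[prices.length - window + 1];
        for (int i = 0; i < result.length; i++) {
            double sum = 0.0;
            for (int j = i; j < i + window; j++) {
                sum += prices[j];
            }
            result[i] = sum / window;
        }
        return result;
    }

    // Extend the last moving average along the most recent monthly increase.
    static double forecast(double[] prices, int window, int monthsAhead) {
        double[] ma = movingAverages(prices, window);
        if (ma.length < 2) {
            throw new IllegalArgumentException("need at least two moving averages");
        }
        double last = ma[ma.length - 1];
        double monthlyIncrease = last - ma[ma.length - 2];
        return last + monthlyIncrease * monthsAhead;
    }

    public static void main(String[] args) {
        double[] prices = {670, 680, 690, 680, 700, 720, 730, 740, 740, 760, 780, 790};
        // The last 5-month average is centered on October, so January 2002 is 3 months ahead.
        System.out.println(forecast(prices, 5, 3)); // prints 798.0
    }
}
```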

2 Java implementation of moving average

  • Number of trailing points without a full window = window length - 1 = number of padding values required
  • If the trailing points are ignored, the length of the result array = length of the original array - window length + 1
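
  The two counts above can be checked with a few lines of Java. This is only a sketch: the class name `WindowCountSketch` and its methods are made up, and it assumes the input has at least as many points as the window.

```java
// Illustrative sketch only; names are assumptions.
final class WindowCountSketch {

    // Number of full windows (and therefore results) when no padding is used.
    static int fullWindowCount(int inputLength, int windowSize) {
        return Math.max(0, inputLength - windowSize + 1);
    }

    // Number of trailing points that lack a full window; this many padding
    // values are needed to give every input point a result.
    static int paddingCount(int windowSize) {
        return windowSize - 1;
    }

    public static void main(String[] args) {
        int inputLength = 12;
        int windowSize = 5;
        int full = fullWindowCount(inputLength, windowSize);
        int padding = paddingCount(windowSize);
        System.out.println("results without padding: " + full);
        System.out.println("padding values needed:   " + padding);
        System.out.println("results with padding:    " + (full + padding));
    }
}
```

  With 12 input points and a window of 5, there are 8 full windows and 4 padding values, so the padded version returns 12 results, one per input point.
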
import java.util.Arrays;

import static org.apache.commons.math.stat.StatUtils.mean;

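/**
 * Moving average calculation.
 * Created by dandan.
 * Names used in this class:
 * movWindowSize: size of the sliding window used for the moving average
 * movLeaveTemp: temporary array for the trailing points that cannot be averaged (not used in the code shown here)
 * movResBuff: array holding the final result
 * movResBuffLen: current index into the result array
 * inputBuff: input data array
 * winArray: array holding the values of the current window
 * winArrayLen: current index into the window array
 * tempCal: original array extended with the padding values
 */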

public class movingAverage {

    private static final int WINDOWS = 5;
    private int movWindowSize = WINDOWS; // window size

    public movingAverage() {

    }

    public movingAverage(int size) {

        movWindowSize = size;
    }

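    // Mean filter: takes an inputBuff array and returns a movResBuff array. The two arrays use
    // different indices: the input side is indexed by i, movResBuff by movResBuffLen.
    // Likewise, the temporary winArray is indexed by winArrayLen.
    public double[] movingAverageCal(double[] inputBuff) {
        int movResBuffLen = 0;
        int winArrayLen = 0;
        // Result array: one smoothed value per input point
        double[] movResBuff = new double[inputBuff.length];
        // Array holding the values of the current window
        double[] winArray = new double[movWindowSize];
        // Mean of the whole input series, used as the padding value
        double replace = mean(inputBuff);
        // Extend the original array to make room for the padding. The number of trailing points
        // is window size - 1, and the number of padding values equals the number of trailing points.
        double[] tempCal = new double[inputBuff.length + movWindowSize - 1];
        // Copy the original array into the temporary calculation array
        System.arraycopy(inputBuff, 0, tempCal, 0, inputBuff.length);
        // Fill the tail with the mean
        for (int m = inputBuff.length; m < tempCal.length; m++) {
            tempCal[m] = replace;
        }
        // Start the calculation
        for (int i = 0; i < tempCal.length; i++) {
            if ((i + movWindowSize) > tempCal.length) {
                break; // fewer than movWindowSize points remain
            } else {
                // Copy the current window, then store its mean
                for (int j = i; j < (movWindowSize + i); j++) {
                    winArray[winArrayLen] = tempCal[j];
                    winArrayLen++;
                }
                movResBuff[movResBuffLen] = mean(winArray);
                movResBuffLen++;
                winArrayLen = 0;
            }
        }
        return movResBuff;
    }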

    public static void main(String[] args) {

       double[] inputBuff={670,680,690,680,700,720,730,740,740,760,780,790};

        movingAverage movingAverage = new movingAverage();

        double[] filter = movingAverage.movingAverageCal(inputBuff);

        System.out.println(filter.length);
        System.out.println(Arrays.toString(filter));
        System.out.println(mean(filter));

    }
}

3 Hive UDF implementation

Building on the array returned above, the UDF averages the smoothed values once more to produce a single feature value.

import org.apache.hadoop.hive.ql.exec.UDF;
import java.util.ArrayList;
import java.util.List;
import static org.apache.commons.math.stat.StatUtils.mean;

public class movingAverageFeaCal extends UDF {

    public static void main(String[] args) {
        String num_all = "3.1002," +
                "3.0984," +
                "3.147," +
                "3.197," +
                "3.1002," +
                "3.1002," +
                "3.0854," +
                "3.0982," +
                "3.12," +
                "3.09," +
                "3.a091";

        movingAverageFeaCal movingAverageFeaCal = new movingAverageFeaCal();

        Double evaluate = movingAverageFeaCal.evaluate(num_all, 3);
        System.out.println(evaluate);
    }


    public Double evaluate(String num_all,int windowSize) {

        if (num_all == null || num_all.isEmpty()) {
            return null; // missing argument, nothing to compute
        }
        String[] numArr = num_all.split(",");
        List<Double> list = new ArrayList<>();
        double num = 0;

        if (numArr.length > 0) {
            for (String aNumArr : numArr) {
                // Skip empty, "null" and non-numeric tokens.
                // Utils.isNumber is a helper from the author's project that is not shown here.
                if (aNumArr != null && !aNumArr.isEmpty() && !aNumArr.equals("null") && !aNumArr.equals("NULL") && Utils.isNumber(aNumArr)) {
                    num = Double.parseDouble(aNumArr);
                    list.add(num);
                }
            }
            // Unbox the List<Double> into a primitive double[]
            double[] inputBuff = list.stream().mapToDouble(i -> i).toArray();
            movingAverage movingAverage = new movingAverage(windowSize);
            double[] feaArr = movingAverage.movingAverageCal(inputBuff);
            return mean(feaArr);

        } else {
            return null;
        }

    }


}

 


Origin blog.csdn.net/godlovedaniel/article/details/114635797