[Efficient Alchemy] Exponential Moving Average (EMA): a handy tool in deep learning

What is the exponential moving average (EMA)?

The exponential moving average (EMA) is a commonly used smoothing method. The principle is very simple: it is a weighted average over sequence data, in which recent data points receive more weight and earlier data points receive less. This effectively smooths time series data, making it more continuous and stable.

What is the use of exponential moving average (EMA) in deep learning?

In deep learning, EMA is often used to smooth the updates of model parameters. Specifically, after each parameter update, an EMA of the model parameters is maintained, which dampens the fluctuation of individual updates and makes the model more stable. In addition, EMA can be used inside optimizers to maintain moving averages of gradients, further improving the performance and generalization ability of the model.
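As a minimal sketch of the parameter-smoothing idea (this is illustrative code, not a specific library API; `alpha` and the toy model are arbitrary choices): after every training step, a "shadow" copy of each parameter is nudged toward the live parameter with weight `alpha`.

```python
import torch

alpha = 0.1  # weight given to the current parameters; illustrative value

model = torch.nn.Linear(4, 2)
# Shadow copy of every parameter, detached from the autograd graph.
shadow = {name: p.detach().clone() for name, p in model.named_parameters()}

def update_shadow(model, shadow, alpha):
    # shadow <- (1 - alpha) * shadow + alpha * current parameters
    with torch.no_grad():
        for name, p in model.named_parameters():
            shadow[name].mul_(1 - alpha).add_(p, alpha=alpha)

# Simulate a few training steps that perturb the parameters.
for _ in range(5):
    with torch.no_grad():
        for p in model.parameters():
            p.add_(0.01 * torch.randn_like(p))
    update_shadow(model, shadow, alpha)
```

At evaluation time, the shadow parameters (rather than the live ones) would be loaded into the model.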

Interpretation of the exponential moving average (EMA) calculation formula

EMA[t] = α * x[t] + (1 - α) * EMA[t-1]

Among them, t represents the time step, x[t] represents the original data at time point t, and α is the smoothing factor, usually taking a value between 0 and 1, which represents the weight of the current sample; (1 - α) represents the weight of the historical data, and EMA[t-1] represents the EMA value at the previous time point.

The meaning of the formula is as follows: multiply the current data point x[t] by its weight α, multiply the EMA value at the previous time point EMA[t-1] by the weight of the historical data (1 - α), and add the two together to obtain the EMA value at the current time point, EMA[t].

Through this calculation formula, we can see that the essence of EMA is a weighted average of historical data, in which the weight of each data point decays the further it lies from the current time point. The advantage of this is that it can effectively smooth time series data, making it more continuous and stable.
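The formula can be verified by hand on a tiny series. In this sketch we use alpha = 0.5 and initialize the EMA at the first observation (one common convention; the class below instead starts from zero):

```python
alpha = 0.5
xs = [1.0, 2.0, 3.0]

ema = xs[0]          # initialize at the first data point
history = [ema]
for x in xs[1:]:
    ema = alpha * x + (1 - alpha) * ema  # EMA[t] = α·x[t] + (1-α)·EMA[t-1]
    history.append(ema)

print(history)  # [1.0, 1.5, 2.25]
```

For example, the last value is 0.5 · 3 + 0.5 · 1.5 = 2.25: the new point counts, but the past still pulls the average back.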

PyTorch code implementation

Here is a simple PyTorch implementation of an exponential moving average (EMA):

import torch

class EMA():
    def __init__(self, alpha):
        self.alpha = alpha    # smoothing factor alpha (weight of the current sample)
        self.average = None   # running average, lazily initialized
        self.count = 0        # number of updates seen so far

    def update(self, x):
        if self.average is None:  # first call: start from a zero tensor shaped like x
            self.average = torch.zeros_like(x)
        self.average = self.alpha * x + (1 - self.alpha) * self.average  # EMA update
        self.count += 1   # track how many samples have been averaged

    def get(self):
        # Bias correction: because the average starts at zero, after `count`
        # updates its weights only sum to 1 - (1 - alpha) ** count; dividing
        # by that factor removes the startup bias toward zero.
        return self.average / (1 - (1 - self.alpha) ** self.count)

In this class, we define three methods: __init__, update, and get. The __init__ method initializes the smoothing factor alpha, the running average average, and the counter count; the update method applies the EMA update; and the get method returns the final EMA value.

When using this class, we first instantiate an EMA object, then call the update method at each time step to fold in the new value, and finally call the get method to obtain the final EMA value. For example:

ema = EMA(alpha=0.5)
for value in data:
    ema.update(torch.tensor(value))
smoothed_data = ema.get()

In this example, we initialize the EMA object with alpha=0.5, then iterate through each data point in data and call the update method to fold it into the average. Finally, we call the get method to obtain the final smoothed value.
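To see the smoothing effect numerically, here is a self-contained sketch (independent of the class above, using plain Python for clarity): we add Gaussian noise to a constant signal and check that the EMA output stays closer to the true signal on average. The signal level, noise scale, and alpha are all illustrative choices.

```python
import random

random.seed(0)
alpha = 0.1
signal = 5.0
noisy = [signal + random.gauss(0, 1.0) for _ in range(1000)]

ema = noisy[0]
smoothed = []
for x in noisy:
    ema = alpha * x + (1 - alpha) * ema
    smoothed.append(ema)

# Mean squared deviation from the true signal, before and after smoothing.
raw_err = sum((x - signal) ** 2 for x in noisy) / len(noisy)
ema_err = sum((x - signal) ** 2 for x in smoothed) / len(smoothed)
print(ema_err < raw_err)  # smoothing shrinks the deviation
```

A smaller alpha smooths more aggressively but lags further behind changes in the underlying signal; choosing alpha is a trade-off between noise suppression and responsiveness.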

Let’s use EMA together! Make training smoother and the model more stable!


Origin blog.csdn.net/luxu1220/article/details/130573150