One of the LLaMA improvements: Root Mean Square Layer Normalization (RMSNorm)

Layer Normalization (LayerNorm) vs. Root Mean Square Layer Normalization (RMSNorm)

Principle
LayerNorm is a normalization method that computes the mean and variance of each sample and uses these statistics to normalize it. Because the statistics are computed per sample, this approach is independent of the batch size, which makes the model more stable.
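As a minimal sketch of this principle (assuming PyTorch, normalization over the last dimension, and omitting the learnable gain and bias), LayerNorm can be written as:

```python
import torch
import torch.nn.functional as F

def layer_norm(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Re-center: subtract the per-sample mean over the last dimension.
    mean = x.mean(dim=-1, keepdim=True)
    # Re-scale: divide by the per-sample standard deviation (biased variance).
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps)

x = torch.randn(2, 2, 4)
# Matches PyTorch's built-in LayerNorm (affine parameters disabled).
print(torch.allclose(layer_norm(x), F.layer_norm(x, (4,)), atol=1e-6))  # True
```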
RMSNorm is an improvement on LayerNorm that drops the re-centering operation (the mean term is removed), so it can be regarded as the special case of LayerNorm in which the mean is 0. The paper shows experimentally that the re-centering operation is not important.
RMSNorm is also a normalization method, but unlike LayerNorm it does not use the sample's mean and variance; instead it normalizes by the root mean square of the elements, which likewise reduces the impact of noise.
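A matching sketch of RMSNorm under the same assumptions (PyTorch, last-dimension normalization, learnable gain omitted); the check at the end illustrates the special-case claim above:

```python
import torch
import torch.nn.functional as F

def rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # No re-centering: divide only by the root mean square of the elements.
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x / rms

# Special-case check: on zero-mean inputs RMSNorm equals LayerNorm,
# i.e. RMSNorm is LayerNorm with the mean fixed at 0.
x = torch.randn(2, 2, 4)
x = x - x.mean(dim=-1, keepdim=True)   # force zero mean over the last dim
print(torch.allclose(rms_norm(x), F.layer_norm(x, (4,), eps=1e-6), atol=1e-6))  # True
```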
Formula

LayerNorm (with learnable gain $\gamma$ and bias $\beta$):

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta$$

RMSNorm (with learnable gain $g_i$):

$$\bar{a}_i = \frac{a_i}{\mathrm{RMS}(\mathbf{a})}\, g_i, \qquad \mathrm{RMS}(\mathbf{a}) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} a_i^2}$$
Formula explanation

The x here denotes the elements along the dimension of the tensor being normalized. For example, for a tensor of shape (2, 2, 4) with normalization applied over the third dimension, the statistic is computed across the four (2, 2, 1) slices along that dimension, i.e. each of the 2 × 2 length-4 vectors performs the above calculation once. In the RMSNorm formula, a_i plays the same role as x in LayerNorm. The authors report that this scheme simplifies LayerNorm and reduces running time by about 7%∼64% across models.
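A short sketch of the shape bookkeeping described above (again assuming PyTorch; the (2, 2, 4) tensor is the example from the text):

```python
import torch

x = torch.randn(2, 2, 4)
# Normalizing the third dimension: the statistic is computed over the 4
# elements of each length-4 vector, giving a (2, 2, 1) tensor of RMS values.
rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + 1e-6)
print(rms.shape)   # torch.Size([2, 2, 1])
y = x / rms        # broadcasts back over the third dimension
print(y.shape)     # torch.Size([2, 2, 4])
```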
Origin blog.csdn.net/qq_39970492/article/details/131125752