Normalization and standardization

References:

https://blog.csdn.net/weixin_36604953/article/details/102652160

https://blog.csdn.net/zenghaitao0128/article/details/78361038

https://blog.csdn.net/u012768474/article/details/99871942


Normalization and standardization are two of the four common feature scaling methods. The four are:

  1. Rescaling (min-max normalization), sometimes referred to simply as normalization:
    x' = (x − min(x)) / (max(x) − min(x))

  2. Mean normalization:
    x' = (x − mean(x)) / (max(x) − min(x))

  3. Standardization (Z-score normalization):
    x' = (x − μ) / σ

  4. Scaling to unit length:
    x' = x / ‖x‖
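
To make the formulas concrete, here is a minimal NumPy sketch of the four methods (the function names are mine, not a standard API):

```python
import numpy as np

def rescale(x):
    # Min-max normalization: maps x into [0, 1]
    return (x - x.min()) / (x.max() - x.min())

def mean_normalize(x):
    # Center on the mean, scale by the range
    return (x - x.mean()) / (x.max() - x.min())

def standardize(x):
    # Z-score: mean 0, standard deviation 1
    return (x - x.mean()) / x.std()

def scale_to_unit_length(x):
    # Divide by the L2 norm so the result has length 1
    return x / np.linalg.norm(x)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(rescale(x))              # [0.   0.25 0.5  0.75 1.  ]
print(mean_normalize(x))       # [-0.5 -0.25  0.    0.25  0.5 ]
print(standardize(x))          # mean 0, std 1
print(scale_to_unit_length(x)) # result has unit L2 norm
```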

Feature scaling is a way of transforming data so that an original column of values is mapped into a certain range.

The role of feature scaling

In machine learning, the different features of a feature vector often use different measurement units.

For example, take a feature vector describing a person: [height, weight].
Normally the corresponding standard units are [m, kg], but the data is not always that tidy: height may arrive in cm.
In that case the height feature becomes numerically very large.
Because the two features (height and weight) use different units, their values can differ by orders of magnitude, turning the samples into ill-conditioned ("singular") data.


Such ill-conditioned data stretches the objective function, producing elongated, flat contours. Gradient descent then deviates easily from the direct path to the minimum.

The purpose of feature scaling is to rescale the sample data (all features) so that the feature values no longer deviate wildly from one another in magnitude. (This effectively eliminates the oversized values introduced purely by the choice of units.)

Scaling: small feature values are magnified and large feature values are shrunk, while the data keeps its original distribution.

After feature scaling, the objective function's contours become relatively round and smooth, so gradient descent is far less likely to go astray.

Normalization

x' = (x − min(x)) / (max(x) − min(x))    (maps into [0, 1]; for a general interval [a, b], use a + x'·(b − a))

Normalization maps a column of data into a fixed interval, and the interval can be arbitrary.

Usually the data is mapped into [0, 1]; mapping into [-1, 1] is also common.

Because the denominator is the maximum minus the minimum, the method effectively puts every feature of every sample on a common percentage-like scale, so the features become dimensionless.
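
As a quick illustration (assuming scikit-learn is available), MinMaxScaler applies exactly this transform column by column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Each row is a sample: [height_cm, weight_kg]
X = np.array([[170.0, 60.0],
              [180.0, 80.0],
              [160.0, 50.0]])

scaler = MinMaxScaler(feature_range=(0, 1))  # map each column into [0, 1]
X_scaled = scaler.fit_transform(X)
print(X_scaled)
# [[0.5        0.33333333]
#  [1.         1.        ]
#  [0.         0.        ]]
```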

Use cases:

  1. Suitable when the values are relatively concentrated; otherwise, when the maximum and minimum are far apart, the denominator becomes huge and the smaller numbers are crushed toward zero.

  2. When the task does not involve distance measures or covariance calculation, and the data does not follow a normal distribution, the first method or other normalization methods (excluding the z-score method) can be used. For example, in image processing, an RGB image converted to grayscale has its values limited to the range [0, 255].

Disadvantages:

  • If the maximum and minimum are unstable (the gap between them varies too much), the normalized result is unstable as well, and so is anything built on top of it.

As an aside: normalization can also use the L2 norm, and non-linear functions can be used to normalize as well.

L2-norm normalization: each element of the feature vector is divided by the vector's L2 norm.
x' = x / ‖x‖₂,  where ‖x‖₂ = √(x₁² + x₂² + … + xₙ²)
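
A minimal sketch using scikit-learn's normalize, which divides each row by its L2 norm:

```python
import numpy as np
from sklearn.preprocessing import normalize

x = np.array([[3.0, 4.0]])        # one feature vector per row
x_unit = normalize(x, norm='l2')  # divide each row by its L2 norm
print(x_unit)                     # [[0.6 0.8]], since ||(3, 4)|| = 5
print(np.linalg.norm(x_unit))     # 1.0 -- unit length
```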


z-score standardization

x' = (x − μ) / σ,  where μ is the mean and σ is the standard deviation of the column

This transforms the data into a distribution with mean 0 and standard deviation 1 (the standard normal distribution when the input is normal). Remember: the result is not necessarily a normal distribution.

For example, PyTorch's LayerNorm layer uses this transform: y = (x − E[x]) / √(Var[x] + ε) · γ + β.

A small value ε is added to the denominator to keep it from being zero.
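
A small usage sketch of PyTorch's nn.LayerNorm; the eps argument is exactly that small value in the denominator:

```python
import torch
import torch.nn as nn

# Normalize over the last dimension of size 4; eps guards the denominator
layer_norm = nn.LayerNorm(normalized_shape=4, eps=1e-5)

x = torch.tensor([[1.0, 2.0, 3.0, 4.0]])
y = layer_norm(x)
print(y)                             # ~[-1.342, -0.447, 0.447, 1.342]
print(y.mean().item())               # ~0.0
print(y.std(unbiased=False).item())  # ~1.0
```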

It is used to eliminate the measurement bias introduced by differences in distribution.

Standardization preserves sample spacing better. When the sample contains outliers, normalization can "squeeze" the normal samples together. For example, suppose a feature takes the values 1, 2, and 10000 across three samples, and 10000 is an outlier. After min-max normalization, the normal values 1 and 2 end up squeezed together. If, unluckily, 1 and 2 carry opposite class labels, a classification model trained with gradient descent will take longer to converge, because it has to work harder to separate them. Standardization does better in this respect: at least it does not squeeze the samples together.
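
The squeezing effect is easy to check numerically; a small NumPy sketch (in this toy case z-score keeps roughly twice the separation between the two normal samples):

```python
import numpy as np

x = np.array([1.0, 2.0, 10000.0])  # 10000 is an outlier

x_minmax = (x - x.min()) / (x.max() - x.min())
x_zscore = (x - x.mean()) / x.std()

print(x_minmax)                    # [0.      0.0001  1.    ]
print(x_zscore)                    # [-0.7072 -0.7070  1.4142]

# Gap between the two normal samples after each transform:
print(x_minmax[1] - x_minmax[0])   # ~1.0e-04
print(x_zscore[1] - x_zscore[0])   # ~2.1e-04, roughly twice as wide
```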

So when sample spacing (distance) matters, use z-score standardization!

Use cases:

  1. Used to rescale the data when the original data is approximately Gaussian (normally distributed).

  2. In classification and clustering algorithms, when distance is needed to measure similarity, or when PCA is used for dimensionality reduction, z-score standardization performs better.

Note that a distribution standardized with z-score is not necessarily normal: standardization only changes the mean and standard deviation, not the type of the original distribution. For details, see the first blog linked above, where the author explains this thoroughly.
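
A quick NumPy/SciPy sketch that confirms this: standardizing right-skewed exponential data changes the mean and standard deviation but not the skewness (i.e., not the shape):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # strongly right-skewed

z = (x - x.mean()) / x.std()                  # z-score standardization

print(round(z.mean(), 6), round(z.std(), 6))  # 0.0 and 1.0
print(round(skew(x), 3), round(skew(z), 3))   # both ~2.0: shape unchanged
```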

Connections and differences

Connection: both are linear transformations, and neither changes the ordering of the original data.

Differences:

  1. Normalization strictly limits the range of the transformed data, so it is not well suited to data where distance information matters (distance-based measures).

    Standardized data has no strict range limit; it simply ends up with mean 0 and standard deviation 1.

  2. The normalization scaling factor depends only on the extreme values (the factor is Xmax − Xmin).

    The standardization scaling factor depends on every value: change a single number and, with high probability, both the mean and the standard deviation change. (See the sketch after this list.)
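
A small sketch illustrating difference 2: changing one interior value leaves the other points' min-max results untouched, but shifts every z-score:

```python
import numpy as np

def minmax(x):
    return (x - x.min()) / (x.max() - x.min())

def zscore(x):
    return (x - x.mean()) / x.std()

a = np.array([1.0, 2.0, 3.0, 10.0])
b = np.array([1.0, 2.0, 5.0, 10.0])  # only the interior value 3 -> 5 changed

print(minmax(a))  # [0.    0.111 0.222 1.   ]
print(minmax(b))  # [0.    0.111 0.444 1.   ] -- other points unchanged
print(zscore(a))  # every entry differs from zscore(b) ...
print(zscore(b))  # ... because the mean and std both moved
```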


Original post: blog.csdn.net/qq_43477218/article/details/115307069