Dimensional and dimensionless, standardization, normalization, regularization

Table of Contents

1 Dimensional and dimensionless

1.1 Dimension

1.2 Dimensionless

2 Standardization

3 Normalization

The benefits of normalization

4 Regularization

5 Summary


1 Dimensional and dimensionless

1.1 Dimension

A dimensional quantity is one whose magnitude depends on the unit of measurement. For example, 1 yuan and 1 cent are two different dimensions, because they use different units of measurement.

1.2 Dimensionless

A dimensionless quantity is one whose magnitude does not depend on any unit, such as an angle, a gain, or the ratio of two lengths.

2 Standardization

Gradient descent is one of the algorithms that benefit from feature scaling. One feature scaling method is standardization. Standardization gives the data the properties of a standard normal distribution, which helps gradient descent converge faster.
Standardization shifts and rescales each feature so that its mean (expected value) is 0 and its standard deviation is 1:

$$x' = \frac{x - \mu}{\sigma}$$

where $\mu$ is the mean of the feature and $\sigma$ is its standard deviation.

Why does standardization help gradient descent? The optimizer needs a number of steps to find a good or optimal solution (the global minimum of the cost). The figure shows the cost as a function of two weights in a binary classification problem.
Look at the figure again: the center represents the global optimum (minimum cost). In the left plot the cost surface is narrow along the w2 direction and wide along the w1 direction, so during gradient descent the gradient along w2 is very small and the optimizer keeps hunting for the optimum along that direction. The consequence is more iterations and lower efficiency.
After standardization the data has mean 0 and standard deviation 1, as in the right plot, and gradient descent no longer wastes iterations because the gradient in one direction is too small.
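
A minimal sketch of standardization in NumPy (my own illustration, not code from the original post):

```python
import numpy as np

def standardize(X):
    """Z-score standardization: each column ends up with mean 0 and standard deviation 1."""
    mu = X.mean(axis=0)      # per-feature mean
    sigma = X.std(axis=0)    # per-feature standard deviation
    return (X - mu) / sigma

# Two features on very different scales
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0]])
X_std = standardize(X)
print(X_std.mean(axis=0))  # ~[0. 0.]
print(X_std.std(axis=0))   # ~[1. 1.]
```

scikit-learn's `StandardScaler` performs the same transformation and also stores the fitted mean and standard deviation so they can be reapplied to test data.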

3 Normalization

Normalization and standardization are not fundamentally different; both are feature scaling methods.

According to some sources, normalization is the process of compressing data into [0, 1] and converting a dimensional quantity into a dimensionless one, which makes calculation and comparison convenient.
There are two common methods for bringing different features onto the same scale: normalization and standardization. The two terms are used loosely in different fields, and their exact meaning usually has to be inferred from context. Generally speaking, normalization refers to scaling a feature to the interval [0, 1], which is a special case of min-max scaling.
For each feature column vector, the commonly used normalization method is min-max scaling, whose formula is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

One more point about normalization: after the values are mapped into the interval [0, 1], their original ordering is preserved. For example, [1, 5, 3] has rank order (1, 3, 2); after normalization it becomes [0, 1, 0.5], whose rank order is still (1, 3, 2), unchanged.
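
A minimal sketch of min-max scaling in NumPy (my own illustration, not from the original post); it reproduces the [1, 5, 3] example above:

```python
import numpy as np

def min_max_scale(X):
    """Min-max scaling: each column is mapped linearly onto [0, 1]."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

x = np.array([[1.0], [5.0], [3.0]])
print(min_max_scale(x).ravel())  # [0.  1.  0.5] -- the ordering is preserved
```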

The benefits of normalization

  1. Improve the convergence speed of the model
  2. Improve the accuracy of the model

So the essential difference between normalization and standardization is: normalization scales a feature to the interval [0, 1], while standardization scales a feature to have mean 0 and standard deviation 1.

4 Regularization

Regularization is something completely different from standardization and normalization. Regularization adds a penalty term that punishes parameters which fit the training data too well, preventing the model from overfitting and improving its generalization ability.

Regularization is also used to deal with collinearity, i.e. high correlation among features, to clean up noise in the data, and ultimately to prevent overfitting. In practice, regularization introduces additional information (a bias) that penalizes extreme parameter (weight) values.
(Figure: overfitting)

The curve in Figure 3 fits the nonlinear data too perfectly, which is overfitting. The cause is that there are too many features and the model has been trained too well. In my view, "too many features" corresponds to the collinearity (high correlation among features) mentioned above.

Let's analyze the formulas in Fig. 2 and Fig. 3. The extra features are x^3 and x^4. Why do these two additional terms cause the disaster in Fig. 3 (the fit to the training data is too good, so the model performs very poorly on new data)? Recall the Taylor series: a polynomial can approximate almost any curve, and the same thing is happening here. So how do we fix overfitting? There are two methods:

1. Reduce the number of features (feature reduction):
    manually keep only some features (do you think you can do that? I don't think I could)
    use a model selection algorithm (PCA, SVD, chi-square)
2. Regularization: keep all the features, but penalize the coefficients θ so that they stay close to 0. A small coefficient makes a small contribution, which corresponds to "penalizing extreme parameter values" mentioned above.

Regularization usually means L2 regularization, whose penalty term is as follows:

$$\frac{\lambda}{2}\lVert \mathbf{w} \rVert^{2} = \frac{\lambda}{2}\sum_{j=1}^{m} w_{j}^{2}$$

where λ is the regularization parameter (it controls the strength of the penalty), and the factor 1/2 is only there to make the derivative cleaner.
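
As an illustrative sketch (my own example, not from the original post): the code below fits the degree-4 polynomial scenario above by least squares, with and without the L2 penalty. Ridge regression, i.e. least squares plus the λ‖w‖² term, shrinks the weight vector, which is exactly the "penalize extreme parameter values" idea:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple curve
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=x.shape)

# Degree-4 polynomial features: columns [1, x, x^2, x^3, x^4]
X = np.vander(x, N=5, increasing=True)

def fit(X, y, lam):
    """Least squares with an L2 penalty: w = (X^T X + lam * I)^-1 X^T y.
    For brevity this simple version also penalizes the bias term."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_plain = fit(X, y, lam=0.0)  # ordinary least squares
w_ridge = fit(X, y, lam=1.0)  # L2-regularized (ridge)

print("||w|| without penalty:", np.linalg.norm(w_plain))
print("||w|| with penalty:   ", np.linalg.norm(w_ridge))  # always smaller or equal
```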

As for why the L2 norm is usually used instead of the L1 norm, I personally think there are two reasons:

  1. On a computer, squaring is simpler to compute than taking an absolute value;
  2. The L2 norm is smooth and differentiable everywhere, while the L1 norm is not differentiable at 0.
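
To make the second point concrete (a short derivation I am adding here, not in the original post), compare the derivatives of the two penalties for a single weight $w$:

$$\frac{d}{dw}\, w^{2} = 2w, \qquad \frac{d}{dw}\, |w| = \operatorname{sign}(w)\ (w \neq 0),\ \text{undefined at } w = 0.$$

The L2 term has a smooth gradient everywhere, while the L1 term has a kink at 0 that plain gradient descent has to handle specially (for example with subgradients).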
     

5 Summary

The difference between dimensional and dimensionless is whether the magnitude of the physical quantity depends on the unit of measurement.
There is no significant difference between standardization and normalization; which one is meant has to be determined from context. Normalization scales features to [0, 1], while standardization scales features to mean 0 and standard deviation 1.
Regularization is a completely different thing from standardization and normalization. It penalizes parameters that fit the training data too well, prevents the model from overfitting, and improves its generalization ability.

Original (in Chinese): https://blog.csdn.net/qq_35357274/article/details/109371492
