Normalizing input vectors

1. Why is normalization needed? Wikipedia gives two reasons: 1) normalization speeds up gradient descent's search for the optimal solution; 2) normalization may improve accuracy.

1) Speeding up gradient descent

  The Stanford Machine Learning video explains it well: https://class.coursera.org/ml-003/lecture/21

      As shown in the figure below, the blue circles are the contour lines of the cost function over two features. In the left plot, the scales of X1 and X2 differ greatly: X1 ranges over [0, 2000] while X2 ranges over [1, 5]. The resulting contours are long, narrow ellipses, so gradient descent is likely to follow a "zigzag" path (perpendicular to the contour lines) and needs many iterations to converge.

      The plot on the right shows the same two features after normalization: the contours become nearly circular, and gradient descent converges much faster.

      Therefore, if a model is trained with gradient descent, normalization is usually necessary; without it, training may converge very slowly or fail to converge at all.
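The effect above can be sketched numerically. The following is an illustrative example (not from the original post): the data, learning rates, and tolerance are all assumptions, chosen to mirror the [0, 2000] vs. [1, 5] ranges described in the text.

```python
import numpy as np

# Assumed synthetic data mirroring the text: X1 in [0, 2000], X2 in [1, 5].
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 2000, 200),   # feature X1, large scale
                     rng.uniform(1, 5, 200)])     # feature X2, small scale
y = 0.002 * X[:, 0] + 1.0 * X[:, 1]               # a simple linear target

def gd_iterations(X, y, lr, tol=1e-6, max_iter=100_000):
    """Gradient descent on mean squared error; returns the number of
    iterations until the gradient norm drops below tol (or max_iter)."""
    w = np.zeros(X.shape[1])
    for i in range(max_iter):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)
        if np.linalg.norm(grad) < tol:
            return i
        w -= lr * grad
    return max_iter

X_norm = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize each feature

raw_iters  = gd_iterations(X, y, lr=1e-7)         # tiny lr needed to avoid divergence
norm_iters = gd_iterations(X_norm, y, lr=0.1)
print(raw_iters, norm_iters)
```

On the raw features the step size must be tiny to keep the large-scale direction stable, so the small-scale direction barely moves and the run exhausts its iteration budget; on standardized features the loss surface is nearly round and convergence takes only a handful of steps.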

2) Normalization improves accuracy

  For methods that depend on distances between samples, such as KNN, classification is based on computing those distances. If one feature's scale is much larger than the others', it dominates the distance and can distort the classification results.
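A small illustration of this dominance effect (the feature names and numbers here are hypothetical, not from the original post):

```python
import numpy as np

# Hypothetical samples with features on very different scales:
# (income in dollars, age in years).
a = np.array([50000.0, 25.0])   # query sample
b = np.array([50100.0, 60.0])   # similar income, very different age
c = np.array([52000.0, 26.0])   # similar age, income differs by 2000

# Raw Euclidean distances: the income axis dominates, so b looks closest.
print(np.linalg.norm(a - b), np.linalg.norm(a - c))

# Min-max normalize each feature over the three samples, then recompute.
pts = np.vstack([a, b, c])
scaled = (pts - pts.min(axis=0)) / (pts.max(axis=0) - pts.min(axis=0))
sa, sb, sc = scaled
print(np.linalg.norm(sa - sb), np.linalg.norm(sa - sc))  # now c is closest
```

Before scaling, the $2000 income gap outweighs the 35-year age gap; after scaling, both features contribute comparably and the nearest neighbor flips.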

 

2. Types of Normalization

1) Linear normalization

x' = (x - min(x)) / (max(x) - min(x))

      This method is suitable when the values are concentrated within a known range. Its flaw is that if max and min are unstable (for example, new samples fall outside the observed range), the normalization result is unstable, and so is anything built on top of it. In practice, max and min can be replaced by empirical constants.
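A minimal sketch of this formula, with optional empirical constants in place of the observed min/max as suggested above (the function name and defaults are assumptions):

```python
import numpy as np

def min_max_normalize(x, lo=None, hi=None):
    """Linear (min-max) normalization: x' = (x - min) / (max - min).
    lo/hi may be supplied as empirical constants instead of the
    observed min/max, per the caveat in the text."""
    x = np.asarray(x, dtype=float)
    lo = x.min() if lo is None else lo
    hi = x.max() if hi is None else hi
    return (x - lo) / (hi - lo)

print(min_max_normalize([1, 2, 3, 4, 5]))        # [0.  0.25 0.5  0.75 1. ]
print(min_max_normalize([3, 4], lo=0, hi=10))    # empirical bounds: [0.3 0.4]
```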

2) Standard deviation normalization

  The processed data follow a standard normal distribution, i.e., mean 0 and standard deviation 1. The transformation is:

x' = (x - μ) / σ

  where μ is the mean of all sample data and σ is the standard deviation of all sample data.
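The transformation can be sketched as follows (the function name is an assumption; the result has mean 0 and standard deviation 1 by construction):

```python
import numpy as np

def z_score_normalize(x):
    """Standardization: x' = (x - mu) / sigma,
    where mu and sigma are the sample mean and standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

z = z_score_normalize([2, 4, 6, 8])
print(z.mean(), z.std())   # approximately 0.0 and 1.0
```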

3) Nonlinear normalization

     It is often used when the data are highly spread out, with some values very large and others very small. The original values are mapped through a mathematical function such as log, exponential, or tangent. The curve of the nonlinear function, e.g. log(V, 2) versus log(V, 10), should be chosen according to the data distribution.
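As one example of the log variant, a sketch that maps positive values through a logarithm and then rescales to [0, 1] (the function name and the min-max rescaling step are assumptions; the text leaves the exact mapping open):

```python
import numpy as np

def log_normalize(x, base=10):
    """Nonlinear normalization: map positive values through log_base,
    then rescale linearly to [0, 1]. The base (e.g. 2 vs. 10) is chosen
    to match the spread of the data; assumes all x > 0."""
    logged = np.log(np.asarray(x, dtype=float)) / np.log(base)
    return (logged - logged.min()) / (logged.max() - logged.min())

print(log_normalize([1, 10, 100, 1000]))   # evenly spaced: [0, 1/3, 2/3, 1]
```

Values spanning three orders of magnitude become evenly spaced, which is exactly the compression of large values this technique is used for.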

 
