Normalization

What is normalization

Normalization is a data-processing technique that rescales data into a fixed range.

Normalization takes two common forms. The first maps numbers into the interval [0, 1], which makes subsequent data processing more convenient. For example, in image processing an image is often normalized from [0, 255] to [0, 1]; this does not change the information the image carries, but it can speed up subsequent network processing. Data can likewise be mapped to [-1, 1] or some other fixed range. The second form uses normalization to turn a dimensional quantity into a dimensionless one.

So what are dimensions, and why do we want to remove them? Consider a concrete example. When we predict housing prices, the collected features, such as the floor area of the house, the number of rooms, the distance to the subway station, and the air quality near the residence, all carry dimensions, with units such as square meters, rooms, meters, and AQI. These differing units make the features incomparable. At the same time, features differ in order of magnitude: the distance to a subway station can be thousands of meters, while the number of rooms is typically in the single digits. Normalization not only removes the influence of dimension but also brings every feature to the same magnitude, solving the comparability problem between features.
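To illustrate the first form, here is a minimal sketch in Python with NumPy; the `image` array is a hypothetical random stand-in for real image data:

```python
import numpy as np

# A minimal sketch: rescale an 8-bit image from [0, 255] to [0, 1].
# `image` is a hypothetical random array standing in for real image data.
image = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)

normalized = image.astype(np.float32) / 255.0  # values now lie in [0, 1]
print(normalized.min(), normalized.max())
```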

Why normalize

Normalization turns dimensional quantities into dimensionless ones and brings the data to the same magnitude, solving the comparability problem between features. In a regression model, inconsistent scales across the independent variables can lead to uninterpretable or misinterpreted regression coefficients. In algorithms that rely on distance calculations, such as KNN and K-means, features with larger magnitudes can dominate the distance computation and distort the learning results.
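The distance effect is easy to see numerically. Below is a toy sketch with assumed feature values and ranges, comparing Euclidean distances before and after min-max scaling:

```python
import numpy as np

# A toy illustration of how magnitude dominates distance-based methods like KNN.
# Features (hypothetical): [distance to subway in meters, number of rooms].
a = np.array([1000.0, 3.0])
b = np.array([1200.0, 8.0])

# Unnormalized: the 200 m difference swamps the 5-room difference.
print(np.linalg.norm(a - b))  # ~200.06

# Min-max normalized, assuming ranges [0, 5000] m and [0, 10] rooms:
a_n = np.array([1000 / 5000, 3 / 10])
b_n = np.array([1200 / 5000, 8 / 10])
print(np.linalg.norm(a_n - b_n))  # ~0.50, now driven mostly by the room count
```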

After normalization, the process of seeking the optimal solution becomes smoother and converges to the optimum more quickly.

Why normalization can improve the speed of finding optimal solutions

Earlier we mentioned the example of predicting house prices. Assume the only independent variables are the distance from the house to the subway station, $x_1$, and the number of rooms in the house, $x_2$, and the dependent variable is the house price. The prediction formula and loss function are:

$$y = \theta_1 x_1 + \theta_2 x_2$$

$$J = (\theta_1 x_1 + \theta_2 x_2 - y_{label})^2$$

Without normalization, the distance to the subway station ranges over 0 to 5000 while the number of rooms ranges only over 0 to 10. Suppose $x_1 = 1000$ and $x_2 = 3$; the loss function can then be written as:

$$J = (1000\,\theta_1 + 3\,\theta_2 - y_{label})^2$$

The process of finding the optimal solution of this loss function can be visualized as in the figure below:
[Figure: contours of the loss function before normalization (left) and after normalization (right), with gradient-descent paths.]

The red ellipses in the left panel are the contours of the loss function before normalization, the blue segments are the gradient updates, and the arrows give the update direction. Finding the optimal solution is the process of gradient update, and each update moves perpendicular to the contour line. Because $x_1$ and $x_2$ differ so much in magnitude, the contours form a thin, elongated ellipse. As shown on the left, such an ellipse makes the gradient-descent path zigzag, so descent is slow.

After min-max normalization, $x_1' = \frac{1000 - 0}{5000 - 0} = 0.2$ and $x_2' = \frac{3 - 0}{10 - 0} = 0.3$, and the loss function becomes:

$$J = (0.2\,\theta_1 + 0.3\,\theta_2 - y_{label})^2$$

The normalized features are now of the same magnitude, the contours of the loss function form a rounder ellipse (right panel), and the path to the optimum becomes faster and gentler, so gradient descent converges more quickly.
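A sketch of this effect, using the house-price loss above with a hypothetical label value: with the same learning rate, the unnormalized features make the update overshoot along the large $x_1$ direction, while the normalized features converge smoothly.

```python
import numpy as np

# A sketch of the house-price example above: gradient descent on
# J = (theta1*x1 + theta2*x2 - y_label)^2 with the SAME learning rate,
# before and after min-max scaling. The label value is hypothetical.

def descend(x1, x2, y_label, lr, steps=200):
    theta = np.zeros(2)
    err = y_label  # placeholder so err is defined if steps == 0
    for step in range(steps):
        err = theta[0] * x1 + theta[1] * x2 - y_label
        if not np.isfinite(err):
            return "diverged"
        if abs(err) < 1e-6:
            return f"converged in {step} steps"
        theta -= lr * 2 * err * np.array([x1, x2])  # gradient step
    return f"|error| = {abs(err):.3g} after {steps} steps"

# Unnormalized (x1 = 1000, x2 = 3): the update overshoots along the
# large x1 direction and the loss blows up.
print(descend(1000.0, 3.0, y_label=500.0, lr=1.0))   # diverged

# Normalized (x1' = 0.2, x2' = 0.3): the same learning rate converges.
print(descend(0.2, 0.3, y_label=500.0, lr=1.0))      # converged in ~70 steps
```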

What are the types of normalization?

1. Min-max normalization (rescaling):

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

The normalized range is [0, 1], where $\min(x)$ and $\max(x)$ are the minimum and maximum of the sample data, respectively.

2. Mean normalization:

$$x' = \frac{x - \mathrm{mean}(x)}{\max(x) - \min(x)}$$

The normalized range is [-1, 1], where $\mathrm{mean}(x)$ is the mean of the sample data.

3. Z-score normalization (standardization):

$$x' = \frac{x - \mu}{\sigma}$$

The normalized range is the whole real line, where $\mu$ and $\sigma$ are the mean and standard deviation of the sample data, respectively.

4. Nonlinear normalization:

  • Logarithmic normalization: $x' = \frac{\lg x}{\lg \max(x)}$
  • Arctangent normalization: $x' = \arctan(x) \cdot \frac{2}{\pi}$; the normalized range is [-1, 1].
  • Decimal scaling normalization: $x' = \frac{x}{10^j}$, where $j$ is the smallest integer such that $\max(|x'|) < 1$; the normalized range is therefore (-1, 1).
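For concreteness, here are minimal NumPy sketches of the types above, assuming a 1-D sample array (the logarithmic variant additionally assumes data greater than 1):

```python
import numpy as np

# Minimal sketches of the normalization types listed above, for a 1-D array x.

def min_max(x):
    return (x - x.min()) / (x.max() - x.min())      # range [0, 1]

def mean_norm(x):
    return (x - x.mean()) / (x.max() - x.min())     # range [-1, 1]

def z_score(x):
    return (x - x.mean()) / x.std()                 # mean 0, std 1

def log_norm(x):
    return np.log10(x) / np.log10(x.max())          # assumes x > 1

def arctan_norm(x):
    return np.arctan(x) * 2 / np.pi                 # range (-1, 1)

def decimal_scaling(x):
    # smallest integer j with max(|x'|) < 1
    j = int(np.floor(np.log10(np.abs(x).max()))) + 1
    return x / 10.0**j

x = np.array([1000.0, 250.0, 3.0, 4500.0])          # hypothetical sample data
print(min_max(x))
print(z_score(x))
print(decimal_scaling(x))
```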

When to use each normalization

1. Min-max normalization and mean normalization are suitable when the maximum and minimum are known and stable. For example, in image processing the gray value is confined to [0, 255], so min-max normalization can map it into [0, 1]. When the maximum and minimum are unclear, every newly added data point may change them, making the normalization result, and anything built on it, unstable. The data also need to be reasonably well behaved: if there are extreme outliers, large or small, min-max normalization and mean normalization perform poorly. Conversely, when the processed data must fall within a strict range, min-max or mean normalization is the right choice.

2. Z-score normalization, also called standardization, produces data with mean 0 and standard deviation 1. It can be used when the data contain outliers or the maximum and minimum are not fixed. Standardization changes the location and scale of the data distribution, but not the type of distribution. Z-score normalization is especially common in neural networks; we will cover this in detail in subsequent articles.

3. Nonlinear normalization is usually used when the data are spread across widely different scales; the original values are mapped through a mathematical function such as a logarithm or an arctangent.
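The outlier sensitivity in point 1 versus the robustness in point 2 can be seen on a small assumed sample:

```python
import numpy as np

# Illustrating points 1 and 2: a single extreme outlier squashes the
# min-max result, while z-scores keep the typical points distinguishable.
x = np.array([10.0, 12.0, 11.0, 13.0, 1000.0])      # hypothetical data + outlier

mm = (x - x.min()) / (x.max() - x.min())
zs = (x - x.mean()) / x.std()

print(mm)  # typical points squeezed into [0, 0.003] of the [0, 1] range
print(zs)  # typical points near -0.5, outlier at +2 standard deviations
```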

The connection and difference between normalization and standardization

The terms normalization and standardization can cause some conceptual confusion. According to the definition of feature scaling on Wikipedia, standardization is in fact z-score normalization; in other words, standardization is a type of normalization. By convention, z-score normalization is called standardization and min-max normalization is called normalization. In what follows we likewise use standardization to mean z-score normalization and normalization to mean min-max normalization.

In fact, both normalization and standardization are essentially linear transformations. Take the normalization formula above: once the data are given, we can let $a = \max(x) - \min(x)$ and $b = \min(x)$, so the formula can be rewritten as:

$$x' = \frac{x - b}{a} = \frac{x}{a} - \frac{b}{a} = \frac{x}{a} - c$$

The standardization formula has the same shape as this rewritten form: once the data are given, $\mu$ and $\sigma$ can be regarded as constants. Both transformations can therefore be viewed as dividing $x$ by a constant $a$ and then shifting it by a constant $c$. Since both are linear transformations, neither changes the original ordering of the data values.
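Since both are monotonic linear maps, order preservation is easy to verify on hypothetical data:

```python
import numpy as np

# A quick check that both transformations preserve the ordering of the data:
# the argsort is identical before and after.
x = np.array([3.0, 1.0, 4.0, 1.5, 9.0])             # hypothetical values

mm = (x - x.min()) / (x.max() - x.min())             # normalization
zs = (x - x.mean()) / x.std()                        # standardization

assert (np.argsort(x) == np.argsort(mm)).all()
assert (np.argsort(x) == np.argsort(zs)).all()
print("ordering preserved by both transforms")
```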

So what is the difference between normalization and standardization?

  • Normalization does not change the state distribution of the data, while standardization does;
  • Normalization confines the data to a specific range, such as [0, 1], while standardization does not; it only makes the data have mean 0 and standard deviation 1.
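Both differences are easy to verify numerically (a random sample with an arbitrary scale is assumed):

```python
import numpy as np

# Verifying both bullet points on hypothetical data: min-max output is pinned
# to [0, 1]; z-score output has mean 0 and std 1 but no fixed range.
x = np.random.randn(1000) * 50 + 200                 # arbitrary scale and offset

mm = (x - x.min()) / (x.max() - x.min())
zs = (x - x.mean()) / x.std()

print(mm.min(), mm.max())    # exactly 0.0 and 1.0
print(zs.mean(), zs.std())   # ~0.0 and ~1.0
print(zs.min(), zs.max())    # not confined to any preset interval
```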

Source: blog.csdn.net/weixin_49346755/article/details/127366789