Why do we need to "normalize" the data?

When watching others do data analysis or train models, you will almost always see a "data normalization" step in the preprocessing stage. With scikit-learn it looks like this:

from sklearn import preprocessing
x = ...  # x is the sample data
min_max_scaler = preprocessing.MinMaxScaler()
x_new = min_max_scaler.fit_transform(x)
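
As a quick check of what MinMaxScaler actually does, here is a minimal runnable sketch (the 3x2 array is made up for illustration); each column is mapped to [0, 1] via (x - min) / (max - min):

import numpy as np
from sklearn import preprocessing

x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [4.0, 40.0]])
scaler = preprocessing.MinMaxScaler()
print(scaler.fit_transform(x))
# each column is rescaled independently:
# [[0.     0.    ]
#  [0.3333 0.3333]
#  [1.     1.    ]]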


So why is this step necessary? The main reason is numerical stability.

The following discussion uses a multi-layer perceptron (MLP) as an example.

The backpropagation (BP) algorithm is the usual means by which a neural network "learns". In a single-layer perceptron, we have the following equation:

$$ y = \sigma(wx + b) $$

where w is the weight, x is the input, b is the bias, and σ(·) is the sigmoid function.
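
A minimal sketch of this forward pass (the weight, bias, and input values below are made-up numbers):

import numpy as np

def sigmoid(z):
    # sigmoid activation: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -0.3])        # weights (illustrative)
b = 0.1                          # bias (illustrative)
x = np.array([1.0, 2.0])         # input (illustrative)
y = sigmoid(np.dot(w, x) + b)    # y = sigma(w.x + b)
print(y)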

[Figure 1: a multi-layer perceptron]

In an MLP like the one in Figure 1, we have the following equations:

$$ a^{[0]} = x $$

$$ z^{[L]} = W^{[L]} a^{[L-1]} + b^{[L]} $$

$$ a^{[L]} = \sigma\left(z^{[L]}\right) $$

Here the superscript [L-1] denotes the (L-1)-th layer, which is easy to read: the (L-1)-th layer's activations are used to compute the L-th layer's.
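
A minimal sketch of these equations as code (the layer sizes and random weights are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights, biases):
    # a[0] = x; for each layer L: z[L] = W[L] a[L-1] + b[L], a[L] = sigma(z[L])
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)),   # layer 1: 2 inputs -> 3 units
           rng.normal(size=(3, 3)),   # layer 2: 3 -> 3
           rng.normal(size=(1, 3))]   # layer 3: 3 -> 1 output
biases = [np.zeros(3), np.zeros(3), np.zeros(1)]
print(mlp_forward(np.array([0.5, -1.0]), weights, biases))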

Now derive the gradient via backpropagation, taking the path from the first hidden layer back to the input x as an example:

$$ \frac{\partial \mathcal{L}}{\partial x} = \frac{\partial \mathcal{L}}{\partial a^{[L]}} \prod_{i=1}^{L} w_i \, g_i', \qquad g_i' = \sigma'\!\left(z^{[i]}\right) $$

Each factor in this product has the form w_i · g_i'. As the network deepens, the chain rule multiplies more and more of these factors together; if each w_i · g_i' is relatively large (greater than 1), the product, and hence the gradient, explodes.
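
A toy numerical sketch of this blow-up (the weight value 5 and the pre-activation 0 are made up; only the repeated multiplication matters):

import numpy as np

def sigmoid_prime(z):
    # derivative of the sigmoid; its maximum value is 0.25, at z = 0
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

w = 5.0                          # a "large" weight, reused in every layer
factor = w * sigmoid_prime(0.0)  # 5 * 0.25 = 1.25 > 1
for depth in (1, 10, 50, 100):
    # the chain rule contributes one w * g' factor per layer
    print(depth, factor ** depth)   # grows to ~4.9e9 at depth 100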

Without normalization, the weights grow large (the thickness of the red lines reflects the weight magnitudes):

[Figure: network with thick red connections, i.e. large weights]

With normalization, the weights stay small:

[Figure: network with thin red connections, i.e. small weights]

In addition, the weight increment dw depends on the input x (for a single neuron, dw = δ · x, where δ is the backpropagated error term), so the larger x is, the larger dw is. During gradient descent, the weights attached to large-valued features are therefore updated faster than those attached to small-valued features, and more iterations are needed to reach the optimum. If all features are normalized to the same numerical range, the contour plot of the optimization objective becomes more rounded (the right-hand plot in the figure below), the update speeds become more consistent, and the optimum is easier to find:

[Figure: loss contours, elongated ellipses without normalization (left) vs. near-circular ones with normalization (right)]

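A toy sketch of this contour argument: plain gradient descent on a 2-D quadratic bowl L(w) = (a·w1² + b·w2²)/2, once with very unequal curvatures along the two axes (elongated contours, as with unscaled features) and once with equal curvatures (round contours). All constants here are made up:

import numpy as np

def gd_steps(a, b, lr, w0=(1.0, 1.0), tol=1e-6, max_steps=1_000_000):
    # gradient descent on L(w) = (a*w1^2 + b*w2^2) / 2 until the gradient is tiny
    w = np.array(w0, dtype=float)
    for step in range(1, max_steps + 1):
        grad = np.array([a * w[0], b * w[1]])
        if np.linalg.norm(grad) < tol:
            return step
        w -= lr * grad
    return max_steps

# elongated contours: curvatures 1 vs 1000; stability caps lr near 2/1000
print("elongated:", gd_steps(a=1.0, b=1000.0, lr=0.001))   # ~14,000 steps
# round contours: equal curvature, so a large lr is stable and converges fast
print("round:    ", gd_steps(a=1.0, b=1.0, lr=1.0))        # a couple of steps
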
Summary: normalization makes the scales (dimensions) of the features consistent, which makes the model easier to train.
——————————————
Copyright statement: this is an original article by CSDN blogger "Dreamcatcher Wind", licensed under CC 4.0 BY-SA. Please include the original source link and this statement when reposting.
Original link: https://blog.csdn.net/Wind_2028/article/details/123341678
