Normalization Methods in Machine Learning


1. Overview


Data preprocessing plays an important role in many deep learning algorithms. In practice, many algorithms achieve their best results only after the data has been normalized and whitened. However, the precise preprocessing parameters are not obvious unless one has extensive experience with these algorithms.

2. Data Normalization and Its Applications


In data preprocessing, the standard first step is data normalization. While there is a range of possible approaches, the choice is usually made explicitly based on the specifics of the data. Commonly used feature normalization methods include the following (a brief code sketch of the last two items follows the list):

  • Simple scaling
  • Per-sample mean subtraction (also known as removing the DC component)
  • Feature standardization (gives every feature in the dataset zero mean and unit variance)
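
A minimal NumPy sketch of the last two items above (the array shape and data are hypothetical):

```python
import numpy as np

# Hypothetical batch: 4 samples, 10 features each.
X = np.random.rand(4, 10)

# Per-sample mean subtraction: each row has its own mean removed
# (the "DC component" of that individual sample).
X_centered = X - X.mean(axis=1, keepdims=True)

# Feature standardization: each column (feature) ends up with zero mean
# and unit variance across the dataset; the small epsilon is an added
# guard against constant (zero-variance) features.
X_standardized = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
```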

In practical machine learning work, most of the time is spent on feature processing, and a key step is normalizing the feature data. Why normalize? Many students do not understand this; the explanation given by Wikipedia is:

  • After normalization, gradient descent converges to the optimal solution faster;
  • Normalization may improve accuracy.

Speeding up gradient descent

As shown in the figure below, the blue circles represent the loss contour lines over the two features. In the left image the ranges of the two features X1 and X2 differ greatly: X1 lies in [0, 2000] while X2 lies in [1, 5], so the contour lines are extremely elongated. When gradient descent searches for the optimal solution, it is very likely to follow a "zigzag" path (stepping perpendicular to the contour lines), requiring many iterations to converge.

[Figure: loss contours for the raw features (left) and the normalized features (right)]

In the figure on the right, the two original features have been normalized; the corresponding contour lines become nearly circular, so gradient descent converges much faster.

Therefore, when a machine learning model is trained with gradient descent, normalization is usually essential; without it, training may converge very slowly or fail to converge at all.
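
A rough, self-contained sketch of this effect: a linear model trained by plain batch gradient descent on two synthetic features with the ranges from the figure (the model, data, and learning rates are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 2000, size=200)           # wide-range feature, as in the figure
x2 = rng.uniform(1, 5, size=200)              # narrow-range feature
X = np.column_stack([np.ones(200), x1, x2])   # intercept column + the two features
y = 3.0 * x1 + 2.0 * x2 + 1.0                 # made-up linear target

def final_mse(X, y, lr, steps=5000):
    """Plain batch gradient descent on squared error; returns the final MSE."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return np.mean((X @ w - y) ** 2)

# Raw features: the wide range of x1 caps the usable learning rate,
# so the directions tied to x2 and the intercept barely improve.
print(final_mse(X, y, lr=3e-7))

# Min-max scale the two feature columns (not the intercept).
mins = X[:, 1:].min(axis=0)
ranges = X[:, 1:].max(axis=0) - mins
X_scaled = X.copy()
X_scaled[:, 1:] = (X[:, 1:] - mins) / ranges

# Same step budget, but a much larger learning rate is now stable
# and the loss is driven close to zero.
print(final_mse(X_scaled, y, lr=0.3))
```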

Normalization may improve accuracy

Some classifiers, such as KNN, need to compute distances between samples (for example, the Euclidean distance). If one feature has a very large value range, the distance calculation is dominated by that feature, which may contradict the actual situation (for example, when the feature with the smaller range is actually the more important one).
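
To see why, here is a small made-up example in which an unscaled feature dominates the Euclidean distance (the feature names and ranges are invented for illustration):

```python
import numpy as np

# Two features per sample: annual income (~0-100000) and age (~0-100).
a = np.array([50000.0, 25.0])
b = np.array([52000.0, 60.0])

# The raw Euclidean distance is dominated almost entirely by income,
# even though the age difference may matter more in practice.
print(np.linalg.norm(a - b))          # ~2000.3

# After min-max scaling with assumed feature ranges, both features
# contribute on a comparable scale.
mins, maxs = np.array([0.0, 0.0]), np.array([100000.0, 100.0])
a_s = (a - mins) / (maxs - mins)
b_s = (b - mins) / (maxs - mins)
print(np.linalg.norm(a_s - b_s))      # ~0.35; age now contributes most
```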

Simple scaling | Min-max normalization | Deviation normalization


In simple scaling, our goal is to rescale the values of each dimension of the data (these dimensions may be independent of each other) so that the final data vector falls in the [0, 1] or [−1, 1] range (depending on the data). This matters for subsequent processing because many default parameters (such as epsilon in PCA whitening) assume the data has already been scaled to a reasonable interval. Example: when processing natural images, the pixel values lie in the interval [0, 255], and the usual treatment is to divide these values by 255 so that they fall in [0, 1].

This method is a linear transformation of the original data that maps the result into the [0, 1] interval. The transformation function is as follows (a short code sketch follows the formula):

  • x' = (x - min)/(max - min)
    • max: the maximum value of the sample data
    • min: the minimum value of the sample data
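
A minimal NumPy sketch of this transformation, applied column by column (the small epsilon is an added safeguard against constant columns, not part of the formula):

```python
import numpy as np

def min_max_normalize(X: np.ndarray) -> np.ndarray:
    """Rescale each column of X into the [0, 1] interval."""
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    # Epsilon avoids division by zero when a feature is constant.
    return (X - col_min) / (col_max - col_min + 1e-12)

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])
print(min_max_normalize(X))
# approximately:
# [[0.  0. ]
#  [0.5 0.5]
#  [1.  1. ]]
```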

Applicable scenarios

This normalization method is best suited to cases where the values are relatively concentrated. However, if max and min are unstable, the normalized results are unstable as well, making downstream use unreliable. In practice, max and min can be replaced with empirical constants. Note also that when new data is added, max and min may change and need to be recomputed.

This first method (or another normalization method) can be used when no distance metrics or covariance calculations are involved and the data does not follow a normal distribution. For example, in image processing, after an RGB image is converted to grayscale, its values are confined to the range [0, 255].

Standard deviation standardization | z-score (zero-mean normalization)


The processed data has zero mean and unit standard deviation (and follows the standard normal distribution if the original data is normally distributed). The transformation function is (a short sketch follows the formula):

  • x' = (x - μ)/σ
    • μ: the mean of all sample data
    • σ: the standard deviation of all sample data
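
A corresponding z-score sketch in NumPy; the mean and standard deviation are computed per feature on the training data and reused for any new data (the names and values are illustrative):

```python
import numpy as np

def z_score_standardize(X: np.ndarray):
    """Return standardized X plus the (mean, std) needed for new data."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-12   # guard against constant features
    return (X - mu) / sigma, mu, sigma

X_train = np.array([[1.0, 200.0],
                    [2.0, 400.0],
                    [3.0, 600.0]])
X_std, mu, sigma = z_score_standardize(X_train)
print(X_std.mean(axis=0))   # ~[0, 0]
print(X_std.std(axis=0))    # ~[1, 1]

# New samples reuse the training mean and std rather than their own.
X_new = np.array([[2.5, 500.0]])
print((X_new - mu) / sigma)
```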

In classification and clustering algorithms, when distances are used to measure similarity, or when PCA is used for dimensionality reduction, this second method (z-score standardization) performs better.

Nonlinear normalization


Nonlinear normalization is often used when the data spans widely differing scales, with some values very large and others very small. The original values are mapped through a mathematical function; common choices include the logarithm, exponential, and tangent. The curve of the nonlinear function, such as log(V, 2) or log(V, 10), should be chosen according to the data distribution.
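
A brief sketch of a log-based mapping for a heavily skewed feature (the base, the example values, and the follow-up min-max step are arbitrary illustrative choices):

```python
import numpy as np

# A skewed feature: a few values are orders of magnitude larger than the rest.
x = np.array([3.0, 10.0, 250.0, 4_000.0, 1_000_000.0])

# log10 compresses the large values; log1p is a common variant that
# also handles zeros gracefully.
x_log10 = np.log10(x)
x_log1p = np.log1p(x)

# Optionally follow with min-max scaling to land in [0, 1].
x_scaled = (x_log10 - x_log10.min()) / (x_log10.max() - x_log10.min())
print(x_log10)
print(x_scaled)
```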

This article is reprinted from https://www.cnblogs.com/sddai/p/6250094.html
