1. The significance of data normalization
Multi-feature data sets often run into the following problem: the value ranges of different features can differ greatly, sometimes by orders of magnitude. This tends to hurt the accuracy of deep learning algorithms, so it makes sense to normalize the data first.
2. Common normalization methods
2.1 Min-Max Normalization
Formula: $x' = \frac{x - min(x)}{max(x) - min(x)}$
This is a linear method that maps the original data onto the range [0, 1], where x is the original data.
It is best suited to cases where the values are relatively concentrated.
Disadvantage: if max and min are unstable (for example, dominated by outliers), the normalization result is unstable as well.
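The min-max formula above can be sketched by hand with NumPy (the sample array here is a hypothetical illustration, not data from the article):

```python
import numpy as np

# Hypothetical example data: 5 values on very different scales
x = np.array([1.0, 5.0, 10.0, 50.0, 100.0])

# Min-max normalization: map x linearly onto [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())

print(x_minmax)  # smallest value maps to 0.0, largest to 1.0
```

Note that a single extreme value in x stretches max(x) - min(x) and compresses all the other results toward 0, which is exactly the instability mentioned above.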
2.2 z-score normalization
Formula: $x^{*} = \frac{x - \mu}{\sigma}$, where $\mu$ and $\sigma$ are the mean and standard deviation of the original data, respectively.
This transforms the original data to have mean 0 and variance 1.
This method works best when the original data is approximately Gaussian; otherwise the normalized result is much less meaningful.
3. Implementing normalization with sklearn
Create test data
# Create test data
import pandas as pd
import numpy as np

x = np.random.randint(1, 1000, (10000, 5))
x = pd.DataFrame(x)
print(x)