[Introduction to Artificial Intelligence] Use Python to normalize data


1. The significance of data normalization processing

  • In datasets with many features, the value ranges of different features often differ greatly, sometimes by orders of magnitude. This can reduce the accuracy of a deep learning algorithm, because features with large values tend to dominate the others, so normalizing the data is worthwhile.

2. Common normalization methods

2.1 Min-Max Normalization

  • Formula: $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$
  • This is a linear mapping that maps the original data to the range [0, 1], where x is the original data;
  • It is more suitable when the values are relatively concentrated;
  • Disadvantage: if the max and min are unstable (for example, because of outliers or newly arriving data), the normalized result is unstable too;
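
The formula above can be sketched by hand with NumPy. This is only a minimal illustration: the helper name `min_max_normalize`, the sample arrays, and the guard for constant columns are assumptions added here, not part of the original article.

```python
import numpy as np

def min_max_normalize(x):
    """Linearly map each column of x to [0, 1] (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    span = col_max - col_min
    # Assumption: a constant column (max == min) is left at 0 instead of dividing by zero.
    span = np.where(span == 0, 1.0, span)
    return (x - col_min) / span

# Two features on very different scales end up on the same [0, 1] scale.
data = np.array([[1.0, 100.0],
                 [2.0, 300.0],
                 [3.0, 500.0]])
scaled = min_max_normalize(data)
print(scaled)

# A single outlier changes max(x), which squeezes the other values toward 0.
with_outlier = np.array([1.0, 2.0, 3.0, 1000.0])
squeezed = min_max_normalize(with_outlier)
print(squeezed)
```

The second call illustrates the disadvantage listed above: once the maximum moves to 1000, the values 1, 2 and 3 all land very close to 0.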

2.2 z-score normalization

  • Formula: $x^{*} = \frac{x - \mu}{\sigma}$, where μ and σ are the mean and standard deviation of the original data, respectively.
  • Normalize the original data to data with mean 0 and variance 1;
  • This method works best when the distribution of the original data is approximately Gaussian; otherwise the normalized result can be poor.
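
As a rough sketch of the z-score formula, the same computation can be written directly in NumPy. The helper name `z_score_normalize` and the sample data are assumptions for illustration; the population standard deviation (`ddof=0`) is used here, which is also what sklearn's `StandardScaler` uses.

```python
import numpy as np

def z_score_normalize(x):
    """Shift each column to mean 0 and scale it to variance 1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)  # population standard deviation (ddof=0)
    # Assumption: a constant column (sigma == 0) is left at 0 instead of dividing by zero.
    sigma = np.where(sigma == 0, 1.0, sigma)
    return (x - mu) / sigma

data = np.array([[1.0, 100.0],
                 [2.0, 300.0],
                 [3.0, 500.0]])
z = z_score_normalize(data)
print(z.mean(axis=0))  # close to 0 for every column
print(z.std(axis=0))   # close to 1 for every column
```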

3. Use sklearn to achieve normalization

  • Create test data
# Create the data: 10,000 rows and 5 columns of random integers in [1, 1000)
import numpy as np
import pandas as pd

x = np.random.randint(1, 1000, (10000, 5))
x = pd.DataFrame(x)
print(x)

  • View the mean and variance of the original data
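# View the mean and variance of the original data
from IPython.display import display  # already available in Jupyter; needed when run as a script

print("Mean of the original data")
display(x.mean())
print("Variance of the original data")
display(x.var())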

  • Min-Max Normalization
# Min-Max Normalization
from sklearn.preprocessing import MinMaxScaler

x_min = MinMaxScaler().fit_transform(x)
x_min = pd.DataFrame(x_min)
display(x_min.mean())
display(x_min.var())
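
To check that `MinMaxScaler` really applies the formula from section 2.1, its output can be compared with a manual computation. This is a sketch under assumptions: the seeded generator `rng` and the variable names are introduced here for reproducibility and are not from the original article.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Assumption: a seeded generator replaces np.random.randint so the run is reproducible.
rng = np.random.default_rng(0)
x = rng.integers(1, 1000, size=(10000, 5))

scaler = MinMaxScaler()
x_scaled = scaler.fit_transform(x)

# Manual min-max formula, applied column by column.
manual = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
print(np.allclose(x_scaled, manual))

# Every column now spans exactly [0, 1].
print(x_scaled.min(axis=0), x_scaled.max(axis=0))

# The fitted scaler remembers min and max, so the mapping can be undone.
x_back = scaler.inverse_transform(x_scaled)
print(np.allclose(x_back, x))
```

Keeping the fitted `scaler` object, instead of calling `MinMaxScaler().fit_transform(x)` in one step, is what makes `inverse_transform` available afterwards.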

  • z-score normalization
# z-score normalization
from sklearn.preprocessing import StandardScaler

x_std = StandardScaler().fit_transform(x)
x_std = pd.DataFrame(x_std)
display(x_std.mean())
display(x_std.var())
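
One detail worth noting: the variance printed by pandas will be slightly above 1 (about 1.0001 for 10,000 rows) rather than exactly 1. `StandardScaler` divides by the population standard deviation (`ddof=0`), while `DataFrame.var()` defaults to the sample variance (`ddof=1`). The sketch below shows the difference; the seeded generator and variable names are assumptions added for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumption: a seeded generator replaces np.random.randint so the run is reproducible.
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.integers(1, 1000, size=(10000, 5)))

x_std = pd.DataFrame(StandardScaler().fit_transform(x))

var_sample = x_std.var()            # pandas default, ddof=1: n / (n - 1)
var_population = x_std.var(ddof=0)  # same convention as StandardScaler

print(var_sample)
print(var_population)
```

With `ddof=0` the variance of every column comes out as 1 up to floating-point error, which matches the "variance 1" claim in section 2.2.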

Origin blog.csdn.net/qq_44928822/article/details/130345140