✌ Data normalization, standardization, and regularization
1. ✌ Normalization
Normalization scales each feature (column) to the interval [0, 1] using the formula (x - min) / (max - min), where min and max are that feature's minimum and maximum.
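As a minimal sketch (the small matrix below is made up for illustration), the formula can be applied by hand with NumPy and compared against scikit-learn's MinMaxScaler, which uses the same formula with its default feature_range=(0, 1):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative data: rows are samples, columns are features
x = np.array([[1.0, 200.0],
              [5.0, 400.0],
              [9.0, 600.0]])

# (x - min) / (max - min), computed column by column
col_min = x.min(axis=0)
col_max = x.max(axis=0)
x_manual = (x - col_min) / (col_max - col_min)

# MinMaxScaler with the default feature_range=(0, 1) applies the same formula
x_sklearn = MinMaxScaler().fit_transform(x)

print(x_manual)
print(np.allclose(x_manual, x_sklearn))  # True
```

One practical difference: the hand-written version divides by zero for a constant column, whereas MinMaxScaler guards against a zero range.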
2. ✌ Standardization
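Standardization (z-score scaling) transforms each feature to have a mean of 0 and a variance of 1 using the formula (x - mean) / std. It only shifts and rescales the data; it does not make the data normally distributed, so the shape of each feature's distribution stays the same.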
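A minimal sketch of the z-score formula, again on a made-up matrix, checked against StandardScaler. np.std uses the population standard deviation (ddof=0) by default, which is also what StandardScaler uses:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data: rows are samples, columns are features
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

# z = (x - mean) / std, per column; np.std defaults to ddof=0
x_manual = (x - x.mean(axis=0)) / x.std(axis=0)
x_sklearn = StandardScaler().fit_transform(x)

print(np.allclose(x_manual, x_sklearn))  # True
print(x_manual.mean(axis=0))             # approximately [0, 0]
print(x_manual.var(axis=0))              # approximately [1, 1]
```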
3. ✌ Regularization
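The main purpose of regularization is to reduce overfitting. Adding a regularization term to the model's loss function limits the model's complexity and balances that complexity against how well the model fits the training data.
Commonly used regularization methods include L1 regularization and L2 regularization, both of which act as penalty terms added to the loss function. L1 regularization adds λ·Σ|w_i| (the sum of the absolute values of the weights), and L2 regularization adds λ·Σw_i² (the sum of the squared weights), where λ controls the strength of the penalty. The "penalty" restricts the model parameters: large weights increase the loss, so training is pushed toward smaller weights (L2) or toward sparse weights with many exact zeros (L1).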
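To make the two penalty terms concrete, here is a small sketch of a mean-squared-error loss with an L1 or L2 penalty added. The function name penalized_mse, the toy data, the weight vector, and lam=0.1 are illustrative assumptions, not part of any library. In scikit-learn the same ideas appear in Lasso (L1) and Ridge (L2), although their objectives use somewhat different scaling constants than this sketch:

```python
import numpy as np

def penalized_mse(w, X, y, lam=0.1, kind="l2"):
    """Hypothetical helper: mean squared error plus an L1 or L2 penalty on w."""
    residual = X @ w - y
    mse = np.mean(residual ** 2)
    if kind == "l1":
        penalty = lam * np.sum(np.abs(w))  # L1: sum of absolute weights
    elif kind == "l2":
        penalty = lam * np.sum(w ** 2)     # L2: sum of squared weights
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    return mse + penalty

# Toy data and weights, chosen only so the arithmetic is easy to follow
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([1.0, 2.0])
w = np.array([0.5, -1.0])

print(penalized_mse(w, X, y, kind="l1"))  # 13.25 + 0.15 = 13.4 (up to float rounding)
print(penalized_mse(w, X, y, kind="l2"))  # 13.25 + 0.125 = 13.375 (up to float rounding)
```

The unpenalized MSE is 13.25 in both cases; only the penalty differs, and it grows as the weights grow, which is what discourages large weights during training.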
4. ✌ Code test
4.1 ✌ Import libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
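from sklearn.preprocessing import Normalizer
from IPython.display import display  # display() is built into Jupyter; this import also makes it available in a plain IPython session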
4.2 ✌ Create data
x=np.random.randint(1,1000,(10000,5))  # 10000 samples x 5 features, integers from 1 to 999 (the upper bound is exclusive)
x=pd.DataFrame(x)
4.3 ✌ View the mean and variance of the original data
display(x.mean())
display(x.var())
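A quick sanity check on what these numbers should be, assuming the data really is uniform: np.random.randint(1, 1000, ...) draws integers from 1 to 999, and a discrete uniform distribution on a..b has mean (a + b) / 2 and variance ((b - a + 1)² - 1) / 12. The sketch below fixes a seed (0 is an arbitrary choice) so the run is repeatable:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, only for repeatability
x = pd.DataFrame(np.random.randint(1, 1000, (10000, 5)))  # integers 1..999

low, high = 1, 999
expected_mean = (low + high) / 2                 # 500.0
expected_var = ((high - low + 1) ** 2 - 1) / 12  # about 83166.67

print("expected:", expected_mean, round(expected_var, 2))
print("observed means:", x.mean().round(1).tolist())
print("observed variances:", x.var().round(1).tolist())
```

With 10000 samples per column, the observed means and variances should land close to these theoretical values, which is what the original cell shows before any scaling is applied.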
4.4 ✌ Normalization
from sklearn.preprocessing import MinMaxScaler
x_min=MinMaxScaler().fit_transform(x)  # learns each column's min and max, then scales the column to [0, 1]
x_min=pd.DataFrame(x_min)
display(x_min.mean())
display(x_min.var())
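Since each column is roughly uniform, the normalized columns should have a minimum of 0, a maximum of 1, a mean near 0.5, and a variance near 1/12 ≈ 0.0833 (the variance of a continuous uniform distribution on [0, 1]). A sketch that checks this, with an arbitrary seed for repeatability:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

np.random.seed(0)  # arbitrary seed, only for repeatability
x = pd.DataFrame(np.random.randint(1, 1000, (10000, 5)))
x_min = pd.DataFrame(MinMaxScaler().fit_transform(x))

print(x_min.min().tolist())  # 0.0 for every column (up to float rounding)
print(x_min.max().tolist())  # 1.0 for every column (up to float rounding)

# Roughly uniform data on [0, 1]: mean near 0.5, variance near 1/12
print(x_min.mean().round(3).tolist(), "reference:", 0.5)
print(x_min.var().round(4).tolist(), "reference:", round(1 / 12, 4))
```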
4.5 ✌ Standardization
from sklearn.preprocessing import StandardScaler
x_std=StandardScaler().fit_transform(x)  # subtracts each column's mean and divides by its standard deviation
x_std=pd.DataFrame(x_std)
display(x_std.mean())
display(x_std.var())
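When running the cell above, the variances typically come out as about 1.0001 rather than exactly 1. The reason is that StandardScaler divides by the population standard deviation (ddof=0), while pandas' DataFrame.var() defaults to the sample variance (ddof=1). A sketch showing both, with an arbitrary seed for repeatability:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

np.random.seed(0)  # arbitrary seed, only for repeatability
x = pd.DataFrame(np.random.randint(1, 1000, (10000, 5)))
x_std = pd.DataFrame(StandardScaler().fit_transform(x))

# DataFrame.var() uses ddof=1, so each value is n / (n - 1) = 10000 / 9999
print(x_std.var().round(5).tolist())        # about 1.0001 per column
# ddof=0 matches how StandardScaler measured the spread
print(x_std.var(ddof=0).round(5).tolist())  # 1.0 per column (after rounding)
```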
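4.6 ✌ Regularization (sklearn Normalizer)
Note that sklearn's Normalizer is not the L1/L2 loss penalty from section 3: it rescales each sample (row) independently so that its norm equals 1, using the L2 norm by default (norm="l1" and norm="max" are also available).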
from sklearn.preprocessing import Normalizer
x_nor=Normalizer().fit_transform(x)  # rescales each row (sample) to unit L2 norm
x_nor=pd.DataFrame(x_nor)
display(x_nor.head())
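One way to see what Normalizer actually did is to check the row norms. This sketch (seeded for repeatability, with data of the same shape as above) reproduces the default L2 behavior by hand and also shows the norm="l1" option:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import Normalizer

np.random.seed(0)  # arbitrary seed, only for repeatability
x = pd.DataFrame(np.random.randint(1, 1000, (10000, 5)))

# Default norm="l2": each row is divided by its own Euclidean length
x_nor = Normalizer().fit_transform(x)
values = x.to_numpy()
x_manual = values / np.linalg.norm(values, axis=1, keepdims=True)

print(np.allclose(x_nor, x_manual))                     # True
print(np.allclose(np.linalg.norm(x_nor, axis=1), 1.0))  # True: every row has unit L2 norm

# norm="l1": each row's absolute values sum to 1 instead
x_l1 = Normalizer(norm="l1").fit_transform(x)
print(np.allclose(np.abs(x_l1).sum(axis=1), 1.0))       # True
```

Because Normalizer works row by row, it is typically used when the direction of each sample vector matters more than its magnitude, for example before computing cosine similarity.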