Day 1 focused on data preprocessing with sklearn.
1. Missing-value imputation: Imputer
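import numpy as np
# Imputer lives in sklearn.preprocessing (removed in scikit-learn 0.22; newer versions use sklearn.impute.SimpleImputer)
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean', verbose=0)
X1 = [[1, 2], [np.nan, 3]]
# fit() returns the fitted imputer, not the data, so keep the data in its own variable
imputer = imputer.fit(X1)
X1 = imputer.transform(X1)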
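The Imputer class used above is no longer available in scikit-learn 0.22 and later. As a rough sketch of the same mean imputation with the newer API, assuming a recent scikit-learn install, something like the following should work (the names X_raw and X_filled are illustrative and not from the original code):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data with the same values as above: one missing entry in the first column
X_raw = np.array([[1, 2], [np.nan, 3]], dtype=float)

# In this API the missing marker is np.nan rather than the string 'NaN'
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit_transform() learns the per-column means and fills the gaps in one step
X_filled = imputer.fit_transform(X_raw)

print(imputer.statistics_)  # per-column means computed from the observed values
print(X_filled)
```

The missing entry is replaced by the mean of the observed values in its column, which here is just the single value 1.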
2. One-hot encoding: OneHotEncoder
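from sklearn.preprocessing import OneHotEncoder
# fit() is an instance method, so create the encoder before fitting
X = [[1, 2], [3, 4], [5, 6]]
encoder = OneHotEncoder().fit(X)
# Rows passed to transform() need the same two columns, with values seen during fit()
XC = encoder.transform([[1, 2], [3, 6], [5, 4]]).toarray()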
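To make it easier to see what a fitted encoder learns, and how it can treat a value such as 0 that never appeared during fitting, here is a minimal sketch. It assumes scikit-learn 0.20 or later; the handle_unknown="ignore" setting and the names X_cat and encoded are choices made for this illustration, not part of the original code.

```python
from sklearn.preprocessing import OneHotEncoder

X_cat = [[1, 2], [3, 4], [5, 6]]

# handle_unknown="ignore" encodes unseen categories as all zeros instead of raising an error
encoder = OneHotEncoder(handle_unknown="ignore").fit(X_cat)

# One array of learned categories per input column
print(encoder.categories_)

# 0 was never seen in the first column, so its three indicator columns stay 0
encoded = encoder.transform([[0, 2], [5, 6]]).toarray()
print(encoded)
```

Each input column expands into one indicator column per learned category, so the two columns here become six.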
3. Standardization: StandardScaler
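from sklearn.preprocessing import StandardScaler
# Numeric feature matrix to scale (rows are samples, columns are features)
X = [[1, 2], [3, 4], [5, 6]]
# fit() is an instance method, so create the scaler before fitting
scaler = StandardScaler().fit(X)
X = scaler.transform(X)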
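StandardScaler rescales each column to zero mean and unit variance. The sketch below checks that against a manual calculation; the data and the names X_num, X_scaled and X_manual are illustrative assumptions rather than values from the linked script.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_num = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

scaler = StandardScaler().fit(X_num)
X_scaled = scaler.transform(X_num)

# Statistics learned during fit(), one value per column
print(scaler.mean_)   # column means
print(scaler.scale_)  # column standard deviations (population form, ddof=0)

# Manual equivalent: z = (x - mean) / std, computed column by column
X_manual = (X_num - X_num.mean(axis=0)) / X_num.std(axis=0)
print(np.allclose(X_scaled, X_manual))
```

In practice the scaler is usually fitted on the training data only and then reused to transform the test data, so that both are scaled with the same statistics.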
Code: https://github.com/chenguiyuan/Machine-learning/blob/master/100Days/Day1_Data_preprocessing.py