1.sklearn.preprocessing.Imputer
Fills in the missing values in a dataset using a chosen strategy.
Imputer(missing_values='NaN', strategy='mean', axis=0, verbose=0, copy=True)
Parameters
----------
missing_values : integer or "NaN", optional (default="NaN")
    The placeholder for the missing values. All occurrences of
    `missing_values` will be imputed. For missing values encoded as np.nan,
    use the string value "NaN".
strategy : string, optional (default="mean")
    The imputation strategy.
    - If "mean", then replace missing values using the mean along
      the axis.
    - If "median", then replace missing values using the median along
      the axis.
    - If "most_frequent", then replace missing using the most frequent
      value along the axis.
axis : integer, optional (default=0)
    The axis along which to impute.
    - If `axis=0`, then impute along columns.
    - If `axis=1`, then impute along rows.
verbose : integer, optional (default=0)
    Controls the verbosity of the imputer.
copy : boolean, optional (default=True)
    If True, a copy of X will be created. If False, imputation will
    be done in-place whenever possible. Note that, in the following cases,
    a new copy will always be made, even if `copy=False`:
    - If X is not an array of floating values;
    - If X is sparse and `missing_values=0`;
    - If `axis=0` and X is encoded as a CSR matrix;
    - If `axis=1` and X is encoded as a CSC matrix.
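To make the `strategy="mean"` and `axis` parameters concrete, here is a minimal NumPy-only sketch of what mean imputation along an axis amounts to. The helper `impute_mean` and the array `X_demo` are made up for illustration. They are not part of scikit-learn, and this is not how the library implements it internally.

```python
import numpy as np

def impute_mean(X, axis=0):
    """Hypothetical helper: replace NaN entries with the mean taken along `axis`."""
    X = np.asarray(X, dtype=float)
    # np.nanmean ignores NaN entries; keepdims=True lets the result broadcast back.
    means = np.nanmean(X, axis=axis, keepdims=True)
    # Keep observed values and substitute the matching mean where a value is missing.
    return np.where(np.isnan(X), means, X)

X_demo = np.array([[1.0, np.nan],
                   [3.0, 4.0],
                   [np.nan, 2.0]])

by_column = impute_mean(X_demo, axis=0)  # each NaN is replaced by its column mean
by_row = impute_mean(X_demo, axis=1)     # each NaN is replaced by its row mean
```

With `axis=0` the missing entry in the first row becomes 3.0 (the mean of 4.0 and 2.0 in its column). With `axis=1` it becomes 1.0, the only other value in its row.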
Example:
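import numpy as np
from sklearn.preprocessing import Imputer  # only in scikit-learn < 0.22

X = np.array([[1., np.nan],
              [3., 4.],
              [1., 2.],
              [3., 4.],
              [1., 2.],
              [np.nan, 4.],
              [1., np.nan],
              [3., 4.]])
imp = Imputer()    # defaults: strategy='mean', axis=0 (column means)
imp.fit(X)         # learn the mean of each column, ignoring NaN
imp.transform(X)   # replace each NaN with the mean of its column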
Out[199]:
array([[1. , 3.33333333],
[3. , 4. ],
[1. , 2. ],
[3. , 4. ],
[1. , 2. ],
[1.85714286, 4. ],
[1. , 3.33333333],
[3. , 4. ]])
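`Imputer` was deprecated in scikit-learn 0.20 and removed in 0.22. On newer versions, the same column-wise mean imputation can be sketched with `sklearn.impute.SimpleImputer`, which has no `axis` parameter and always imputes column by column. The variable name `X_filled` is chosen here for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer  # requires scikit-learn >= 0.20

X = np.array([[1., np.nan],
              [3., 4.],
              [1., 2.],
              [3., 4.],
              [1., 2.],
              [np.nan, 4.],
              [1., np.nan],
              [3., 4.]])

# Missing values are marked with np.nan itself rather than the string "NaN".
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imp.fit_transform(X)  # fit and transform in one step

# The per-column fill values learned during fit.
print(imp.statistics_)
```

The learned fill values are the column means 13/7 and 20/6, which agree with the 1.85714286 and 3.33333333 shown in the output above.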
2.To be continued