Summary of Methods for Handling Missing Values

1. sklearn.preprocessing.Imputer

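Fills the missing values in the data according to a chosen strategy. Note that Imputer was deprecated in scikit-learn 0.20 and removed in 0.22 in favor of sklearn.impute.SimpleImputer, so the signature below applies to older versions only.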

Imputer(missing_values='NaN', strategy='mean', axis=0, verbose=0, copy=True)
Parameters
----------
missing_values : integer or "NaN", optional (default="NaN")
    The placeholder for the missing values. All occurrences of
    `missing_values` will be imputed. For missing values encoded as np.nan,
    use the string value "NaN".
strategy : string, optional (default="mean")
    The imputation strategy.
    - If "mean", then replace missing values using the mean along
      the axis.
    - If "median", then replace missing values using the median along
      the axis.
    - If "most_frequent", then replace missing using the most frequent
      value along the axis.
axis : integer, optional (default=0)
    The axis along which to impute.
    - If `axis=0`, then impute along columns.
    - If `axis=1`, then impute along rows.
verbose : integer, optional (default=0)
    Controls the verbosity of the imputer.
copy : boolean, optional (default=True)
    If True, a copy of X will be created. If False, imputation will
    be done in-place whenever possible. Note that, in the following cases,
    a new copy will always be made, even if `copy=False`:
    - If X is not an array of floating values;
    - If X is sparse and `missing_values=0`;
    - If `axis=0` and X is encoded as a CSR matrix;
    - If `axis=1` and X is encoded as a CSC matrix.
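
To make the `strategy` and `axis=0` parameters concrete, here is a minimal NumPy-only sketch of column-wise imputation. It is not the library's implementation; the helper name `impute_columns` and the demo array are hypothetical, and only the "mean" and "median" strategies are covered.

```python
import numpy as np

def impute_columns(X, strategy="mean"):
    """Fill NaNs in each column with that column's statistic (axis=0 behaviour).

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    X = np.asarray(X, dtype=float)
    if strategy == "mean":
        fill = np.nanmean(X, axis=0)      # per-column mean, ignoring NaNs
    elif strategy == "median":
        fill = np.nanmedian(X, axis=0)    # per-column median, ignoring NaNs
    else:
        raise ValueError("unsupported strategy: %r" % strategy)
    # Broadcast the per-column fill values into the NaN positions only.
    return np.where(np.isnan(X), fill, X)

X_demo = np.array([[1.0, np.nan],
                   [3.0, 4.0],
                   [np.nan, 2.0],
                   [8.0, 6.0]])

filled_mean = impute_columns(X_demo, "mean")
filled_median = impute_columns(X_demo, "median")
```

With this demo array the two strategies differ in the first column: the observed values 1, 3, 8 have mean 4 and median 3, so the choice of strategy matters when a column is skewed.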
 

Example:

import numpy as np
from sklearn.preprocessing import Imputer  # available in scikit-learn < 0.22 only

X = np.array([[ 1., np.nan],
       [ 3.,  4.],
       [ 1.,  2.],
       [ 3.,  4.],
       [ 1.,  2.],
       [np.nan,  4.],
       [ 1., np.nan],
       [ 3.,  4.]])


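imp = Imputer()  # defaults: missing_values='NaN', strategy='mean', axis=0

imp.fit(X)  # learn the mean of each column, ignoring NaN entries

imp.transform(X)  # replace each NaN with the mean of its column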

Out[199]: 
array([[1.        , 3.33333333],
       [3.        , 4.        ],
       [1.        , 2.        ],
       [3.        , 4.        ],
       [1.        , 2.        ],
       [1.85714286, 4.        ],
       [1.        , 3.33333333],
       [3.        , 4.        ]])
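
On scikit-learn 0.22 and later, where Imputer is no longer available, the same result can be obtained with sklearn.impute.SimpleImputer. The following is a sketch under that assumption rather than part of the original post; SimpleImputer has no `axis` parameter and always imputes column by column.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Same data as in the example above.
X = np.array([[1., np.nan],
              [3., 4.],
              [1., 2.],
              [3., 4.],
              [1., 2.],
              [np.nan, 4.],
              [1., np.nan],
              [3., 4.]])

# Missing values are given as np.nan itself, not the string "NaN".
imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit_transform learns the column means and fills the NaNs in one step.
X_filled = imp.fit_transform(X)

# The learned per-column fill values are stored in statistics_.
print(imp.statistics_)
```

The fill values are the column means of the observed entries, 13/7 for the first column and 20/6 for the second, which matches the output shown above.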

2. To be continued

Reposted from blog.csdn.net/zs15321583801/article/details/81545295