Filling missing values with sklearn (summary)

First, check the shape of the data:

data.shape

View each column's data type and its number of non-null values:

data.info()
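A minimal sketch of these checks on a small toy DataFrame. The column names and values are made up for illustration. `data.isnull().mean()` gives the fraction of missing values per column, which is the ratio you would otherwise work out from the non-null counts that `info()` prints.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "age" is numeric, "city" is categorical
data = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": pd.Series(["Beijing", "Shanghai", np.nan, "Beijing"], dtype=object),
})

print(data.shape)  # (4, 2): 4 rows, 2 columns
data.info()        # dtype and non-null count for each column

# Fraction of missing values in each column
missing_ratio = data.isnull().mean()
print(missing_ratio)
```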

Use SimpleImputer to fill missing values:

from sklearn.impute import SimpleImputer as si
imp_mean = si()  # defaults: missing_values=np.nan, strategy="mean"
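A short sketch of the default behaviour on a toy numeric array (the data is invented for illustration): `fit` learns one mean per column, stored in `statistics_`, and each `np.nan` is replaced by its column's mean.

```python
import numpy as np
from sklearn.impute import SimpleImputer as si

# Two numeric columns, each with one missing value (toy data)
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

imp_mean = si()                     # strategy="mean" by default
X_filled = imp_mean.fit_transform(X)

print(imp_mean.statistics_)         # learned column means: 2.0 and 15.0
print(X_filled)                     # NaNs replaced by those means
```

Because the means are learned in `fit`, new data should go through `imp_mean.transform(...)` so that it is filled with the training means rather than its own.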

By default, missing values are filled with the column mean. The parameters are as follows:

  • missing_values: the value that marks an entry as missing. Default: np.nan.

Note: pandas' built-in fillna only fills values that pandas already treats as missing (NaN/None), whereas here you can specify what counts as a missing value, for example "?" or "N/A".

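  • strategy: one of "mean", "median", "most_frequent", "constant". Default: "mean".
  • fill_value: the value to fill in; used only when strategy="constant".
  • copy: whether to work on a copy of the data (default True); if False, imputation is done in place where possible.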
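A sketch comparing the four strategies on one toy column, plus a custom placeholder. The data and the `-999` sentinel are hypothetical, chosen so each strategy gives a different result.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Observed values 1, 1, 3, 7, 8; the last row is missing
X = np.array([[1.0], [1.0], [3.0], [7.0], [8.0], [np.nan]])

results = {}
for strategy, kwargs in [
    ("mean", {}),
    ("median", {}),
    ("most_frequent", {}),
    ("constant", {"fill_value": -1}),
]:
    imp = SimpleImputer(strategy=strategy, **kwargs)
    results[strategy] = imp.fit_transform(X)[5, 0]  # value put in the missing row

print(results)  # mean 4.0, median 3.0, most_frequent 1.0, constant -1.0

# missing_values does not have to be NaN: here -999 is treated as missing
Y = np.array([[5.0], [-999.0], [7.0]])
Y_filled = SimpleImputer(missing_values=-999.0, strategy="mean").fit_transform(Y)
print(Y_filled.ravel())  # -999 replaced by the mean of 5 and 7
```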

For continuous data, missing values are generally filled with the mean; for categorical data, with the mode (strategy="most_frequent").
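One way to apply both rules to a mixed DataFrame in a single step is scikit-learn's `ColumnTransformer`; this is a sketch on invented data, not the author's code. The continuous "age" column gets the mean, the categorical "city" column gets the most frequent value.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

data = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": pd.Series(["Beijing", "Shanghai", np.nan, "Beijing"], dtype=object),
})

# Mean for the continuous column, mode for the categorical column
ct = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), ["age"]),
    ("cat", SimpleImputer(strategy="most_frequent"), ["city"]),
])
filled = ct.fit_transform(data)  # columns come out in the order listed above
print(filled)
```

Keeping the two imputers inside one transformer means they are fitted and applied together, which is convenient when the same filling has to be repeated on new data.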

For example, to treat "?" as missing and fill it with 0:

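imp_0 = si(missing_values="?", strategy="constant", fill_value=0)  # "?" counts as missing, fill with 0
data_filled = imp_0.fit_transform(data_)  # keep the fitted imputer; data_ must be a 2-D array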

Note: sklearn expects 2-D input. If data_ is a single column (a 1-D array), it must first be reshaped into a 2-D column:

data_ = data["column_name"].values.reshape(-1, 1)  # replace "column_name" with the actual column name
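Putting the "?" example together as a runnable sketch. The "age" column and its values are hypothetical. Because the column mixes numbers and "?", it has object dtype, so the imputer's output is also an object array and is converted to float afterwards.

```python
import pandas as pd
from sklearn.impute import SimpleImputer as si

# Hypothetical column where "?" marks a missing entry (object dtype)
data = pd.DataFrame({"age": pd.Series([25, "?", 31, "?"], dtype=object)})

# sklearn expects 2-D input: reshape the single column to (n_samples, 1)
data_ = data["age"].values.reshape(-1, 1)

imp_0 = si(missing_values="?", strategy="constant", fill_value=0)
data_filled = imp_0.fit_transform(data_)

# The result is still an object array; convert to float once "?" is gone
data["age"] = data_filled.ravel().astype(float)
print(data)
```

Alternatively, `data[["age"]]` (double brackets) already selects a 2-D DataFrame, so the reshape step can be skipped.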

Besides the mean, a constant such as 0, the mode, and the median, missing values can also be filled by a model (algorithmic imputation), by multiple imputation, and so on. However, values filled in by a random forest are relatively hard to interpret.
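The post gives no code for these methods. The sketch below swaps in scikit-learn's experimental `IterativeImputer`, which is one way (not necessarily the author's) to do both: with a `RandomForestRegressor` as the estimator it predicts each incomplete column from the others (roughly in the spirit of missForest), and with `sample_posterior=True` and different seeds it yields several completed datasets for multiple imputation. The synthetic data and parameter values are assumptions for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, unlocks IterativeImputer
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + X[:, 1]        # third column depends on the other two
mask = rng.rand(*X.shape) < 0.1    # knock out about 10% of the entries
X_missing = X.copy()
X_missing[mask] = np.nan

# Model-based fill: each column with gaps is regressed on the other columns
rf_imp = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=20, random_state=0),
    max_iter=5,
    random_state=0,
)
X_rf = rf_imp.fit_transform(X_missing)

# Multiple imputation: sample from the posterior of the default BayesianRidge
# estimator with different seeds to get several plausible completed datasets
X_multi = [
    IterativeImputer(sample_posterior=True, random_state=seed).fit_transform(X_missing)
    for seed in range(5)
]
```

Observed entries are left unchanged; only the missing ones are replaced. Downstream analyses would normally be run on each of the `X_multi` datasets and their results pooled.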


Source: www.cnblogs.com/heenhui2016/p/10987948.html