First, check the shape of the data:
data.shape
View each column's data type and its number of non-null values:
data.info()
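To show these two checks together, the sketch below builds a hypothetical toy DataFrame with made-up columns `age` and `city`. It also computes the share of missing values per column with `isnull().mean()`, because `info()` reports only counts, not ratios.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for `data`
data = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 58.0],
    "city": ["NY", "LA", None, "NY"],
})

print(data.shape)   # (number of rows, number of columns)
data.info()         # dtype and non-null count of each column

# Fraction of missing values in each column (True counts as 1)
missing_ratio = data.isnull().mean()
print(missing_ratio)
```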
Fill missing values with SimpleImputer:
from sklearn.impute import SimpleImputer as si
imp_mean=si()
By default, SimpleImputer fills missing values with the column mean. Its parameters are as follows:
- missing_values: the placeholder that marks a missing value. Default np.nan.
Note: pandas' own fillna only fills values that pandas treats as missing (such as np.nan or None), whereas here you can specify which value counts as missing, for example "?" or "N/A".
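- strategy: one of "mean", "median", "most_frequent", "constant".
- fill_value: the value used for filling when strategy="constant".
- copy: whether to work on a copy of the input (default True); if False, imputation is done in place where possible.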
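A minimal sketch of the parameters above on small NumPy arrays; the arrays `X` and `S` and the fill value -1 are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer as si

X = np.array([[1.0], [np.nan], [3.0], [10.0]])

# Default: missing_values=np.nan, strategy="mean"
filled_mean = si().fit_transform(X).ravel()                     # NaN -> (1 + 3 + 10) / 3
filled_median = si(strategy="median").fit_transform(X).ravel()  # NaN -> 3.0

# A custom placeholder such as "N/A" in an object (string) array
S = np.array([["a"], ["N/A"], ["b"], ["a"]], dtype=object)
filled_mode = si(missing_values="N/A", strategy="most_frequent").fit_transform(S).ravel()

# Constant fill with a user-chosen value
filled_const = si(strategy="constant", fill_value=-1).fit_transform(X).ravel()

print(filled_mean, filled_median, filled_mode, filled_const)
```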
For continuous data, missing values are generally filled with the mean; for categorical data, they are filled with the mode (most_frequent).
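A minimal sketch of that rule of thumb, assuming a hypothetical DataFrame with one continuous column (`height`) and one categorical column (`color`). Each column gets its own imputer, and `ravel()` flattens the 2-D output back into a column.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "height": [170.0, np.nan, 180.0, 160.0],  # continuous
    "color": ["red", "blue", np.nan, "red"],  # categorical
})

# Continuous column: fill with the mean
df["height"] = SimpleImputer(strategy="mean").fit_transform(df[["height"]]).ravel()

# Categorical column: fill with the most frequent value (the mode)
df["color"] = SimpleImputer(strategy="most_frequent").fit_transform(df[["color"]]).ravel()

print(df)
```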
For example, to treat "?" as a missing value and fill it with 0:
imp_0 = si(missing_values="?", strategy="constant", fill_value=0)
data_filled = imp_0.fit_transform(data_)
Note: data_ must be two-dimensional here. A single column is one-dimensional, so reshape it into a column vector first (replace column_name with the actual column):
data_ = data["column_name"].values.reshape(-1, 1)
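Putting the "?"-to-0 example together end to end, assuming a hypothetical string column `score` in which "?" marks a missing entry. The column is created with object dtype so that `.values` returns a plain NumPy array. The final conversion to float is an extra step, not part of the original example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer as si

# Hypothetical column in which "?" marks a missing entry
data = pd.DataFrame({"score": pd.Series(["5", "?", "7", "?"], dtype=object)})

# A single column is 1-D; SimpleImputer expects 2-D input (n_samples, n_features)
data_ = data["score"].values.reshape(-1, 1)
print(data_.shape)  # (4, 1)

imp_0 = si(missing_values="?", strategy="constant", fill_value=0)
data_filled = imp_0.fit_transform(data_)

# Once "?" is gone, the column can be converted to numbers
data["score"] = data_filled.ravel().astype(float)
print(data)
```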
Besides the mean, 0, the mode, and the median, missing values can also be filled with a model (algorithmic imputation), with multiple imputation, and so on. However, filling with a random forest, for example, is relatively hard to interpret.
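One way to do random-forest filling is to train a RandomForestRegressor on the rows where the target column is known and use it to predict the rows where that column is missing. This is a sketch of one possible approach, not a fixed recipe. The toy data and the helper `rf_impute` are hypothetical, and the sketch assumes the feature columns have no missing values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: "target" depends on x1 and x2
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3 * x1 - 2 * x2 + rng.normal(scale=0.1, size=n)
df = pd.DataFrame({"x1": x1, "x2": x2, "target": y})

# Knock out 20 values of "target" to simulate missingness
missing_idx = rng.choice(n, size=20, replace=False)
df.loc[missing_idx, "target"] = np.nan


def rf_impute(frame, column, features):
    """Hypothetical helper: fill NaNs in `column` with random forest predictions."""
    known = frame[column].notna()
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    # Train only on rows where the column is observed
    model.fit(frame.loc[known, features], frame.loc[known, column])
    filled = frame.copy()
    # Predict the missing rows from the (complete) feature columns
    filled.loc[~known, column] = model.predict(frame.loc[~known, features])
    return filled


df_filled = rf_impute(df, "target", ["x1", "x2"])
print(df_filled["target"].isna().sum())  # 0
```

The filled values come from an ensemble of trees, so unlike a mean or mode there is no single number that explains why a given value was chosen. scikit-learn's `IterativeImputer` (enabled via `sklearn.experimental.enable_iterative_imputer`) can also take a random forest as its `estimator` and repeats this idea across several columns.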