Data Preprocessing: Missing Value Handling

1. Introduction

Missing values are a tricky problem in data preprocessing, and a large body of literature is devoted to them. Missing data means the following: suppose there are n samples, each with 20 features, but in some samples one or more feature values are invalid or absent for some reason, so those samples are incomplete. In many cases such samples cannot simply be discarded; making them usable again is the job of missing value handling.

2. Missing value handling in features

(1) Fill the missing values with the mean of the feature's available values
(2) Fill the missing values with a special value, such as 0
(3) Ignore (delete) the samples that have missing values
(4) Fill the missing values with the mean of similar samples
(5) Predict the missing values with another machine learning algorithm
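
Strategies (1) to (4) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy data, the use of None as the missing-value marker, the function names, and the similarity measure (mean squared difference over the features both samples have) are all made up for this example and are not from any particular library. In practice a library such as pandas or scikit-learn would usually do this work.

```python
from statistics import mean

# Hypothetical toy data: each row is a sample, None marks a missing value.
samples = [
    [1.0, 2.0, 3.0],
    [2.0, None, 6.0],
    [3.0, 4.0, None],
    [4.0, 6.0, 12.0],
]

def column_means(rows):
    """Mean of each feature, computed over the observed values only."""
    return [mean(v for v in col if v is not None) for col in zip(*rows)]

def fill_with_mean(rows):
    """Strategy (1): replace each missing value with its feature's mean."""
    means = column_means(rows)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def fill_with_constant(rows, value=0.0):
    """Strategy (2): replace each missing value with a fixed special value."""
    return [[value if v is None else v for v in row] for row in rows]

def drop_incomplete(rows):
    """Strategy (3): keep only the samples without missing values."""
    return [row for row in rows if None not in row]

def distance(a, b):
    """Mean squared difference over the features observed in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum((x - y) ** 2 for x, y in shared) / len(shared)

def fill_with_neighbor_mean(rows, k=2):
    """Strategy (4): fill from the mean of the k most similar samples.

    Assumes every feature is observed in at least one other sample.
    """
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is None:
                # Other samples that actually have a value for feature j.
                donors = [r for idx, r in enumerate(rows)
                          if idx != i and r[j] is not None]
                donors.sort(key=lambda r: distance(row, r))
                new_row[j] = mean(r[j] for r in donors[:k])
        filled.append(new_row)
    return filled

print(fill_with_mean(samples))
print(fill_with_constant(samples))
print(drop_incomplete(samples))
print(fill_with_neighbor_mean(samples, k=2))
```

Deletion is the simplest option but shrinks the data set, here from four samples to two. The fill strategies keep every sample at the cost of introducing estimated values.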

3. Missing data labels

(1) Delete the samples whose labels are missing
(2) Predict the missing labels with logistic regression
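
Both options for missing labels can be illustrated with a small, self-contained sketch. It is a toy version under stated assumptions: the data, the function names, the learning rate, and the number of epochs are hypothetical, the features are assumed to be roughly centered around zero already, and the logistic regression is a bare batch gradient descent written for this example, not a library implementation.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: p(y=1 | x) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    """Return label 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical toy data: (features, label); None marks a missing label.
data = [
    ([-2.0, -1.5], 0),
    ([-1.5, -2.5], 0),
    ([-1.0, -1.0], 0),
    ([1.0, 1.5], 1),
    ([2.0, 1.0], 1),
    ([1.5, 2.5], 1),
    ([-1.2, -1.8], None),
    ([1.8, 1.2], None),
]

# Option (1): simply drop the samples whose labels are missing.
labeled = [(x, y) for x, y in data if y is not None]

# Option (2): train on the labeled samples, then predict the missing labels.
w, b = train_logistic_regression([x for x, _ in labeled],
                                 [y for _, y in labeled])
completed = [(x, predict(w, b, x) if y is None else y) for x, y in data]

print(len(labeled), "labeled samples kept by deletion")
print(completed)
```

Predicted labels are only estimates, so it can be worth keeping track of which labels were predicted rather than observed.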
