"Who Said Rookies Can't Analyze Data" Study Notes 2: Missing Data Processing

A missing-value rate below 10% is generally considered acceptable.
 
A missing value occurs when the value of one or more attributes in the dataset is incomplete.
 
Values go missing for various reasons, which fall mainly into two groups: mechanical reasons and human reasons.

Mechanical reasons: data is missing because collection or storage failed, for example a storage failure, memory corruption, or a mechanical fault that prevented data from being collected for a period of time.

Human reasons: data is missing because of subjective human error, historical limitations, or intentional concealment. For example, a respondent in market research refuses to answer a question or gives an invalid answer, or data entry staff make a mistake and skip an entry.

 
In a data table, missing values most commonly show up as blank (null) cells or as error indicators.
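Outside Excel, the same idea can be sketched in Python. This is only a minimal sketch and not something from the book: it treats `None`, empty strings, and a small set of Excel error strings as missing, then computes the share of missing cells so it can be compared with the 10% guideline noted at the top. The names `ERROR_MARKERS`, `is_missing`, and `missing_rate`, and the sample table, are assumptions made for illustration.

```python
# Hypothetical helper names; the set of error markers is an assumption
# based on common Excel error values, not an exhaustive list.
ERROR_MARKERS = {"#N/A", "#DIV/0!", "#VALUE!", "#REF!", "#NAME?", "#NUM!", "#NULL!"}


def is_missing(cell):
    """Return True for a blank cell or an error indicator."""
    if cell is None:
        return True
    if isinstance(cell, str):
        text = cell.strip()
        return text == "" or text in ERROR_MARKERS
    return False


def missing_rate(rows):
    """Fraction of cells in a list-of-rows table that are missing."""
    cells = [cell for row in rows for cell in row]
    if not cells:
        return 0.0
    return sum(is_missing(c) for c in cells) / len(cells)


# Made-up sample data: 2 of the 12 cells are missing.
table = [
    ["Ann", 23, 5200],
    ["Bob", None, 4800],
    ["Cid", 31, "#N/A"],
    ["Dee", 28, 6100],
]
rate = missing_rate(table)
print(f"missing rate: {rate:.1%}")
print("within the 10% guideline:", rate < 0.10)
```

Counting a numeric `0` as present rather than missing is a deliberate choice here; whether that is right depends on how the data was recorded.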
 
How to quickly find all missing values:
 
1: Locate and enter: on the Home tab, in the Editing group, choose Find & Select > Go To Special, or press Ctrl+G to open the "Go To" dialog box and click Special. Select Blanks, then click OK.
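For data that lives outside a workbook, the effect of locating blanks can be imitated with a short Python sketch. It is an illustration under assumptions, not how Excel does it: the table is a plain list of rows, only `None` and empty strings count as blank, and `column_letter` and `find_blank_cells` are hypothetical helper names.

```python
def column_letter(index):
    """Convert a 0-based column index to an Excel-style letter (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def find_blank_cells(rows):
    """Return Excel-style addresses of cells that are None or empty strings."""
    blanks = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if cell is None or (isinstance(cell, str) and cell.strip() == ""):
                blanks.append(f"{column_letter(c)}{r + 1}")
    return blanks


# Made-up sheet; row 1 is the header row.
sheet = [
    ["name", "age", "salary"],
    ["Ann", 23, 5200],
    ["Bob", "", 4800],
    ["Cid", 31, None],
]
print(find_blank_cells(sheet))  # → ['B3', 'C4']
```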
 
Four ways to handle missing values:

Method 1: Replace the missing value with a sample statistic. The most typical choice is the sample mean of the variable. This is the method most commonly used in practice.

Method 2: Replace the missing value with a value calculated by a statistical model. Commonly used models include regression models and discriminant models. However, this requires professional data analysis software.

Method 3: Delete the records that contain missing values. The drawback is that this may reduce the sample size.

Method 4: Keep the records that contain missing values and exclude them only from the analyses that need the missing variable. This method is relatively feasible when the survey sample is fairly large, the number of missing values is small, and the variables are not highly correlated.
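The four methods above can be sketched in Python on a tiny two-column dataset. This is a rough illustration under assumptions, not the book's procedure: the records are made up, `None` marks a missing value, all function names are hypothetical, and Method 2 is shown only as a simple least-squares line with one predictor.

```python
from statistics import mean

# Made-up sample: (advertising spend, sales); None marks a missing value.
records = [(1.0, 2.1), (2.0, 3.9), (3.0, None), (4.0, 8.1), (None, 10.0)]


def impute_mean(values):
    """Method 1: replace missing entries with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_regression(pairs):
    """Method 2: predict a missing y from x with a least-squares line
    fitted on the complete pairs (simple linear regression only)."""
    complete = [(x, y) for x, y in pairs if x is not None and y is not None]
    x_bar = mean(x for x, _ in complete)
    y_bar = mean(y for _, y in complete)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in complete)
             / sum((x - x_bar) ** 2 for x, _ in complete))
    intercept = y_bar - slope * x_bar
    return [(x, intercept + slope * x) if y is None and x is not None else (x, y)
            for x, y in pairs]


def drop_incomplete(pairs):
    """Method 3: delete every record that has any missing value."""
    return [p for p in pairs if None not in p]


def column_mean_available(pairs, col):
    """Method 4: keep all records, skip missing cells only in this calculation."""
    return mean(p[col] for p in pairs if p[col] is not None)


sales = [y for _, y in records]
print("Method 1:", impute_mean(sales))
print("Method 2:", impute_regression(records))
print("Method 3:", drop_incomplete(records))
print("Method 4:", column_mean_available(records, 0), column_mean_available(records, 1))
```

Note how Method 3 leaves only three of the five records, while Method 4 still uses four values for each column mean; that difference is the sample-size trade-off described above.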

 
 
2: Ctrl+Enter
Ctrl+Enter is useful for entering the same data or formula into several non-adjacent areas at once.
For example:

Hold down Ctrl and select several non-adjacent cells, then release Ctrl. Type the content, "Xiaobai", into the last selected cell (the active cell) and press Ctrl+Enter. All of the selected cells now contain "Xiaobai".

Ctrl+Enter also works well together with locating. After selecting the blank cells with F5 or Ctrl+G, type the value you want and press Ctrl+Enter, and every blank cell is filled with that value.
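A rough Python equivalent of "locate the blanks, type once, press Ctrl+Enter" is to fill every blank cell with the same value. The sketch below assumes the table is a list of rows and that only `None` and empty strings count as blank; `fill_blanks` is a hypothetical name.

```python
def fill_blanks(rows, value):
    """Return a copy of the table with every blank cell set to `value`."""
    return [
        [value if cell is None or cell == "" else cell for cell in row]
        for row in rows
    ]


# Made-up sheet with two blank cells.
sheet = [
    ["Ann", 23, "Sales"],
    ["Bob", None, ""],
    ["Cid", 31, "Finance"],
]
filled = fill_blanks(sheet, "Xiaobai")
print(filled[1])  # → ['Bob', 'Xiaobai', 'Xiaobai']
```

The function returns a new table instead of changing the original, so the untouched data stays available for comparison.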
 
3: Find and replace
When missing values appear as error indicators, find and replace can be used instead.
Shortcuts: Ctrl+F (Find), Ctrl+H (Replace), Ctrl+G (Go To).
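The same find-and-replace step can be sketched in Python for data held as a list of rows. This is a minimal illustration, assuming the error indicator is stored as text such as "#N/A" and that a cell is replaced only when its whole content matches; `find_and_replace` is a hypothetical name.

```python
def find_and_replace(rows, find, replace):
    """Return a copy of the table where cells equal to `find` become `replace`."""
    return [
        [replace if cell == find else cell for cell in row]
        for row in rows
    ]


# Made-up sheet where two cells hold the error indicator "#N/A".
sheet = [
    ["Ann", 23, 5200],
    ["Bob", "#N/A", 4800],
    ["Cid", 31, "#N/A"],
]
cleaned = find_and_replace(sheet, "#N/A", 0)
print(cleaned)
```

Replacing with `0` is only an example; any of the four handling methods above could supply the replacement value.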
