Data mining study notes | Task 2: EDA (exploratory data analysis)

1. Reference link: https://tianchi.aliyun.com/notebook-ai/detail?spm=5176.12586969.1002.12.1cd8593aw4bbL5&postId=95457
In machine learning, data exploration is generally called EDA (Exploratory Data Analysis):

It is a data analysis approach that explores existing data (especially raw data obtained from surveys or observation) under as few prior assumptions as possible, using plotting, tabulation, equation fitting, and the calculation of summary statistics to uncover the structure and patterns in the data.

Why data exploration matters: it helps us find correlations in the data and identify informative features, which is very helpful for later feature construction.

So how is EDA done? It can be approached from the following angles.

  1. Descriptive statistics: use sum(), describe(), mean(), etc. to check the size of the training and test data, the feature types, the missing values per feature, and each feature's mean and variance.

  2. Missing-value analysis: for features with missing values, missingno can be used to see the distribution of missing values very clearly. For later model validation and analysis, decide whether to fill the missing values (and how: with the mode, the mean, or 0?) or to drop them.

  3. Outlier analysis: likewise decide whether outliers should be removed or filled. First check how skewed the data is, and consider removing outliers from the more heavily skewed features.

  4. Look at the distribution of the prediction target and the unique-value distribution of each feature. Features can be split into numeric and categorical here and analyzed separately. Next, run a feature correlation analysis with corr(). Many visualizations are possible, such as bar charts, box plots, and frequency histograms.

  5. Going further, analyze the correlation between each feature and the label, as well as the correlations among the features themselves.
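
The five steps above can be sketched with pandas roughly as follows. This is a minimal sketch, not the code from the referenced notebook: the DataFrame `train`, its column names (`power`, `kilometer`, `fuel_type`, `price`) and the synthetic values are hypothetical stand-ins for a real training set, and the IQR rule used for outliers is one common choice that the notes above do not prescribe.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real training set.
rng = np.random.default_rng(0)
n = 200
train = pd.DataFrame({
    "power": rng.normal(120, 30, n),
    "kilometer": rng.uniform(1, 15, n),
    "fuel_type": rng.choice(["gas", "diesel", "electric"], n),
})
train["price"] = (50 * train["power"] - 300 * train["kilometer"]
                  + rng.normal(0, 500, n))
# Inject some missing values so that step 2 has something to find.
train.loc[rng.choice(n, 20, replace=False), "power"] = np.nan

# Step 1: descriptive statistics (size, types, summary).
print(train.shape)
print(train.dtypes)
print(train.describe())

# Step 2: count missing values per feature, then pick a fill strategy.
missing = train.isnull().sum()
print(missing[missing > 0])
filled = train.copy()
# Mean filling is used here; the mode or 0 are the other options named above.
filled["power"] = filled["power"].fillna(filled["power"].mean())

# Step 3: check skewness, then flag outliers (IQR rule is an assumption).
numeric_cols = filled.select_dtypes(include="number").columns
skewness = filled[numeric_cols].skew()
print(skewness)

def iqr_mask(s, k=1.5):
    """Return True for values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.between(q1 - k * iqr, q3 + k * iqr)

inliers = iqr_mask(filled["price"])
cleaned = filled[inliers]

# Step 4: target distribution and unique values per feature.
print(filled["price"].describe())
print(filled.nunique())
print(filled["fuel_type"].value_counts())

# Step 5: correlation between numeric features and the label.
corr_matrix = filled[numeric_cols].corr()
corr_with_price = corr_matrix["price"].sort_values(ascending=False)
print(corr_with_price)
```

Restricting `corr()` and `skew()` to the numeric columns keeps the categorical `fuel_type` column out of computations that are only defined for numbers. For a visual view of missing values, missingno (mentioned in step 2) could be applied to `train` in place of the plain counts.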


Origin blog.csdn.net/weixin_39294199/article/details/105058194