A must for data analysts - data preprocessing mind map (data exploration)

Foreword:

As data analysts, the first thing we need when we get a dataset is a data processing framework, or template, in our heads. Once we have firmly memorized that template and understand each of its modules in detail, we can tackle the work one module at a time, which makes learning data analysis much easier. Note, however, that like a template for an English composition, it should not be applied too rigidly: data analysis should always be carried out according to our actual needs.


Background of data preprocessing:

When we first get our hands on data, it rarely meets our expectations: there may be missing values, accuracy problems, too many indicators, and so on. Only after a series of analysis and manipulation steps do we end up with the data we actually want. This is where an important step comes in: data preprocessing. Personally, I think data preprocessing is extremely important. Data quality is the lifeblood of data, and preprocessing is the key to controlling data quality. The data preprocessing flow chart above is my own summary based on references and literature (many versions of this process exist). Data preprocessing is mainly divided into five steps: data exploration, data cleaning, data integration, data reduction, and data transformation. Don't worry if some of these terms are unfamiliar; just get a rough idea of the steps for now. I will explain them one by one later.
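To make the problems listed above more concrete, here is a minimal sketch of a missing-value audit in plain Python. It assumes the data is a list of dictionaries (one per record) and treats an absent key, `None`, or an empty string as missing. The `missing_report` function and the sample records are hypothetical, for illustration only; real datasets may encode missing values differently (for example `NaN` or sentinel values such as `-999`), which this sketch does not detect.

```python
# Hypothetical helper: count missing values per column in a list-of-dicts dataset.
def missing_report(rows):
    """Return {column: missing_count} across all rows."""
    # Collect every column name that appears in any record.
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        # A value counts as missing if the key is absent, None, or an empty string.
        report[col] = sum(1 for row in rows if row.get(col) in (None, ""))
    return report


# Hypothetical sample records.
sample = [
    {"age": 34, "income": 5200, "city": "Beijing"},
    {"age": None, "income": 4800, "city": ""},
    {"age": 29, "income": None, "city": "Shanghai"},
    {"age": 41, "income": 6100},  # "city" key is absent
]

report = missing_report(sample)
print(report)       # {'age': 1, 'city': 2, 'income': 1}
print(len(report))  # 3 indicators (columns)
```

The number of keys in the report doubles as a quick count of the indicators, which is a first hint when a dataset has more variables than the analysis needs.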

The first step of data preprocessing: the data exploration phase

Here is the diagram of the data exploration steps; start with it to get a preliminary understanding.


After we have observed, investigated, and collected an initial sample dataset, we must consider the following questions: Do the quantity and quality of the sample meet the requirements of the model's architecture? Are there data states we never anticipated? Are there any obvious patterns or trends? How are the different factors related to one another? The data exploration phase is about answering these questions. This part should be easy to understand, so I won't dwell on it. One thing I would add is that data exploration is closely related to a concept we will meet later in data mining: discovering interesting patterns in data. Simply put, in most real-world scenarios we receive data without knowing the meaning, rules, or value behind it, so we need to mine it for interesting patterns. (Haha, data mining is an advanced topic for data analysts, so let's leave it there for now.)
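The questions above can be turned into a few quick checks. The sketch below uses only Python's standard `statistics` module and hypothetical store data (`ad_spend`, `sales`): `describe` covers the quantity and range of the sample, `iqr_outliers` flags unexpected values using Tukey's 1.5 * IQR fences (one common rule of thumb, not the only one), and `pearson` measures the linear relationship between two factors. On Python 3.10+ `statistics.correlation` computes the same Pearson coefficient; it is written out here to show the formula. All function names are illustrative assumptions, not a standard API.

```python
import statistics


def describe(values):
    """Basic summary of one numeric column: count, min, max, mean, median."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
    }


def iqr_outliers(values):
    """Return values outside Tukey's fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # needs at least 2 values
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric columns."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)  # raises ZeroDivisionError if a column is constant


# Hypothetical sample: advertising spend vs. sales for eight stores.
ad_spend = [10, 12, 14, 16, 18, 20, 22, 90]
sales = [105, 118, 142, 160, 175, 205, 221, 390]

print(describe(ad_spend))                  # quantity and range of the sample
print(iqr_outliers(ad_spend))              # [90], a data state worth investigating
print(round(pearson(ad_spend, sales), 3))  # strength of the linear relationship
```

Comparing the result with `pearson(ad_spend[:-1], sales[:-1])` shows how much the coefficient depends on the single outlying store, which is one reason it can be worth checking for unexpected data states before interpreting relationships between factors.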
