[Data analysis study notes] Data preprocessing


Data preprocessing serves two purposes: it improves the quality of the data, and it makes the data better suited to specific mining techniques or tools.

The main steps of data preprocessing are data cleaning, data integration, data transformation, and data reduction.

The key points are summarized below:


The main process of data preprocessing 

Data cleaning: removes irrelevant and duplicate records from the original data set, smooths noisy data, filters out data unrelated to the mining topic, and handles missing values and outliers.
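
As a rough illustration of data cleaning, the sketch below drops duplicate records, discards out-of-range values as outliers, and fills missing values with the median. The record layout, the `age` field, the 0 to 120 valid range, and the `clean` function name are all assumptions made for this example; they do not come from the original notes.

```python
from statistics import median

def clean(records, lo=0, hi=120):
    """Deduplicate, drop out-of-range ages, and fill missing ages (illustrative only)."""
    # 1. Remove exact duplicate records while preserving order.
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))  # copy so the input is not mutated

    # 2. Drop outliers: ages outside the assumed valid range [lo, hi].
    in_range = [r for r in unique if r["age"] is None or lo <= r["age"] <= hi]

    # 3. Fill missing ages with the median of the remaining known ages.
    known = [r["age"] for r in in_range if r["age"] is not None]
    fill = median(known)
    for r in in_range:
        if r["age"] is None:
            r["age"] = fill
    return in_range

raw = [
    {"id": 1, "age": 34},
    {"id": 1, "age": 34},    # duplicate record
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 29},
    {"id": 4, "age": 480},   # outlier
]
cleaned = clean(raw)
```

Filling with the median rather than the mean is one common choice because it is less sensitive to any extreme values that remain; other strategies (dropping the row, interpolation) may fit a given data set better.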

Data integration: combines data from multiple sources and stores it in a consistent data store, such as a data warehouse.
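
A minimal sketch of data integration, assuming two hypothetical sources that identify the same customer under different key names (`customer_id` and `cust_id`). The sources, field names, and the `integrate` function are invented for illustration; the example performs a simple inner join and emits records with one consistent schema.

```python
def integrate(customers, orders):
    """Join two sources on customer id and emit records with a unified schema."""
    # Index the first source by its key for fast lookup.
    by_id = {c["customer_id"]: c for c in customers}

    merged = []
    for order in orders:
        customer = by_id.get(order["cust_id"])  # second source uses a different key name
        if customer is None:
            continue  # inner join: skip orders with no matching customer
        merged.append({
            "customer_id": customer["customer_id"],
            "name": customer["name"],
            "amount": order["amount"],
        })
    return merged

customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"cust_id": 1, "amount": 20.0},
    {"cust_id": 1, "amount": 5.5},
    {"cust_id": 3, "amount": 9.0},  # no matching customer
]
integrated = integrate(customers, orders)
```

Real integration work usually also has to reconcile units, encodings, and conflicting values between sources; this sketch only covers key matching.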

Data transformation: normalizes the data and converts it into a form suited to the needs of the mining task and algorithm.
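
Two common normalization schemes can be sketched as follows: min-max scaling to the range [0, 1] and z-score standardization. The notes do not say which scheme is intended, so both are shown as examples; the function names and sample values are assumptions.

```python
from statistics import mean, pstdev

def min_max(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center values on the mean and scale by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

values = [10, 20, 30, 40, 50]
scaled = min_max(values)
standardized = z_score(values)
```

Min-max scaling keeps the original ordering and bounds the result, while z-scores are often preferred when an algorithm assumes roughly centered inputs.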

Data reduction: complex analysis and mining on large data sets takes a long time. Data reduction produces a new data set that is smaller but preserves the integrity of the original data, so analysis and mining on the reduced data set is more efficient.
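
Data reduction can take many forms. The sketch below shows two simple ones under stated assumptions: dropping attributes that have the same value in every row (a basic form of attribute reduction) and drawing a random sample of rows (a basic form of numerosity reduction). The function names, the sample data, and the fixed seed are illustrative choices, not part of the original notes.

```python
import random

def drop_constant_attributes(rows):
    """Remove attributes whose value is identical in every row."""
    keep = [k for k in rows[0].keys() if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]

def sample_rows(rows, fraction, seed=0):
    """Return a random sample containing roughly `fraction` of the rows."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    n = max(1, round(len(rows) * fraction))
    return rng.sample(rows, n)

rows = [{"id": i, "country": "CN", "value": i * 1.5} for i in range(10)]
narrow = drop_constant_attributes(rows)    # "country" carries no information here
reduced = sample_rows(narrow, fraction=0.5)
```

Whether a sample really preserves the properties of the full data set depends on the data and the analysis; stratified sampling or dimensionality-reduction methods may be more appropriate in practice.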


Origin blog.csdn.net/seagal890/article/details/105375036