Structured data preprocessing map (basic)

1. Data cleaning

1.1 Data quality concept

Data cleaning is an important step to ensure data quality. The most important aspects of data quality are accuracy, completeness, and consistency, followed by timeliness, credibility, and interpretability.

How much each of these aspects matters depends on who uses the data and for what purpose.
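
Two of the aspects above, completeness and consistency, can be measured directly on a table. The following is a minimal sketch, not a standard implementation: the sample records, the function names, and the rule that country codes must be two upper-case letters are all assumptions made for illustration. Accuracy is left out because checking it requires a trusted reference to compare against.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical sample records; None marks a missing value.
RECORDS = [
    {"id": 1, "age": 34, "country": "CN"},
    {"id": 2, "age": None, "country": "cn"},
    {"id": 3, "age": 29, "country": None},
    {"id": 4, "age": 41, "country": "US"},
]


def completeness(records: list[dict[str, Any]], field: str) -> float:
    """Fraction of records whose `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def consistency(
    records: list[dict[str, Any]],
    field: str,
    is_valid: Callable[[Any], bool],
) -> float:
    """Fraction of non-missing values of `field` that satisfy `is_valid`."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 1.0
    return sum(1 for v in values if is_valid(v)) / len(values)


def is_upper_code(value: Any) -> bool:
    # Assumed convention: country codes are two upper-case letters.
    return isinstance(value, str) and len(value) == 2 and value.isupper()


if __name__ == "__main__":
    print("age completeness:", completeness(RECORDS, "age"))
    print("country consistency:", consistency(RECORDS, "country", is_upper_code))
```

Here one of four `age` values is missing, so completeness is 0.75, and one of the three non-missing `country` values ("cn") breaks the assumed format, so consistency is 2/3.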

 

There are many sources of data quality issues:

Human factors:

  • Accidental human factors: such as recall errors and measurement errors;
  • Systematic human factors: collection-side factors and application-side factors.
    • On the collection side, the people, tools, and environment involved introduce systematic errors, which reduces accuracy.
    • On the application side, attributes are chosen subjectively according to interest, which reduces completeness.

 

System factors:

  • Flaws in the design of data collection: for example, if the user leaves a field blank, the system fills in a value automatically
  • Errors in data transmission
  • ……
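
Values that a system fills in automatically look like real data, so they are easy to miss. One rough heuristic is to flag any value that takes up an unusually large share of a column. The sketch below assumes a hypothetical birth-date column in which the form pre-fills "1900-01-01"; the function name, the sample data, and the 0.3 threshold are illustrative choices, not part of the original article.

```python
from __future__ import annotations

from collections import Counter
from typing import Hashable, Iterable


def suspected_defaults(
    values: Iterable[Hashable | None], min_share: float = 0.3
) -> dict[Hashable, float]:
    """Return each value whose share of the non-missing entries is >= min_share."""
    present = [v for v in values if v is not None]
    if not present:
        return {}
    counts = Counter(present)
    total = len(present)
    return {v: c / total for v, c in counts.items() if c / total >= min_share}


# Hypothetical column: the form is assumed to pre-fill "1900-01-01"
# when the user leaves the birth date blank.
BIRTH_DATES = [
    "1900-01-01", "1988-04-12", "1900-01-01", "1975-09-30",
    "1900-01-01", "1992-11-02", None, "1900-01-01",
]

if __name__ == "__main__":
    for value, share in suspected_defaults(BIRTH_DATES).items():
        print(f"{value!r} covers {share:.0%} of non-missing entries")
```

A high share is only a hint. Columns with few distinct values (gender, yes/no flags) naturally have frequent values, so flagged values still need to be checked against how the collection form actually behaves.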

 

1.2 Data cleaning process and common method framework

Data preprocessing

Origin www.cnblogs.com/mx0813/p/12676336.html