Reading Notes on "The Road to Big Data: Alibaba's Big Data Practice", Chapter 15: Data Quality

  • As the IT era gives way to the DT (data technology) era, data matters more than ever and is being applied more and more widely. The more data is valued, the more pressing the question of how to ensure its quality becomes.

  • Data quality is the foundation for valid and accurate analytical conclusions, and a precondition for everything built on the data. Guaranteeing data quality and data availability is a step that cannot be skipped in building Alibaba's data warehouse.

 

1. Data quality assurance principles

  • Data quality is evaluated along four dimensions

    • Completeness

    • Accuracy

    • Consistency

    • Timeliness

 

  • 1. Completeness

    • Completeness refers to whether data records and the information in them are complete, or whether anything is missing. Missing data takes two main forms: missing records, and missing values for a field within a record. Both make statistical results inaccurate, so completeness is the most basic guarantee of data quality. For example, in transaction data the number of paid orders is about 1,000,000 per day; if that number suddenly drops to around 10,000 on a certain day, records are very likely missing. As for missing field information, fields such as an order's product ID and seller ID must always be present, so the number of null values in these fields must be 0. Any count greater than 0 violates the completeness constraint.
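
The two checks described above (a sudden drop in record count, and null values in required fields) can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used at Alibaba: the `Order` class, the function names, and the 50% drop threshold are all assumptions introduced here for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    # Hypothetical order record; field names are assumptions for this sketch.
    order_id: str
    product_id: Optional[str]
    seller_id: Optional[str]


def count_nulls(orders, field):
    """Count records whose `field` is missing (None or empty string)."""
    return sum(1 for o in orders if not getattr(o, field))


def record_count_ok(today_count, baseline_count, max_drop_ratio=0.5):
    """Return False when today's count fell more than max_drop_ratio below the baseline.

    The 0.5 default is an arbitrary illustrative threshold.
    """
    if baseline_count <= 0:
        return True  # no usable baseline to compare against
    return today_count >= baseline_count * (1 - max_drop_ratio)


def completeness_violations(orders, baseline_count,
                            required_fields=("product_id", "seller_id")):
    """Collect human-readable descriptions of completeness violations."""
    violations = []
    # Check 1: missing records, detected as a sharp drop in volume.
    if not record_count_ok(len(orders), baseline_count):
        violations.append(
            f"record count {len(orders)} is far below baseline {baseline_count}"
        )
    # Check 2: missing field values; the null count must be exactly 0.
    for field in required_fields:
        nulls = count_nulls(orders, field)
        if nulls > 0:
            violations.append(f"{field} has {nulls} null value(s)")
    return violations
```

With a baseline of 1,000,000 paid orders per day, `record_count_ok(10_000, 1_000_000)` returns `False`, which corresponds to the missing-record case in the example above. In practice the baseline and the threshold would come from historical data rather than fixed constants.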

  • 2. Accuracy

    • Accuracy refers to whether the information recorded in the data is accurate and whether it contains discrepancies.

Origin blog.csdn.net/u012965373/article/details/105548880