Four Assessment Criteria for Data Quality

Data quality is the foundation of any data application. It is usually evaluated against four criteria: integrity (completeness), consistency, accuracy, and timeliness. Checking data against these four criteria shows whether it meets the expected quality requirements.
                 

Integrity 

Integrity refers to whether any data is missing. An entire record may be missing, or a single field within a record may be missing. Incomplete data is worth much less for analysis, which makes integrity the most basic criterion of data quality. Integrity is relatively easy to assess: in most cases it can be checked with record counts and unique-value counts. For example, the daily visit volume in a website log is a record count. If the usual daily volume is around 1000 but it suddenly drops to 100 one day, check whether data is missing. Similarly, the region names in a website's geographic distribution report are unique values. If the expected list of provinces and municipalities in China has 32 entries, as in this example, and the statistics return fewer than 32 unique values, data is probably missing.
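The two integrity checks described above can be sketched as follows. This is a minimal illustration, not a standard method: the function names, the choice of the historical median as the baseline, and the 0.5 ratio threshold are all assumptions made for the example.

```python
from statistics import median


def volume_looks_incomplete(daily_counts, today_count, ratio=0.5):
    """Return True if today's record count falls below `ratio` times
    the median of the historical daily counts (assumed threshold)."""
    baseline = median(daily_counts)
    return today_count < ratio * baseline


def missing_values(observed, expected):
    """Return the expected unique values that never appear in the data."""
    return sorted(set(expected) - set(observed))


# Record-count check: usual volume is around 1000 visits per day.
history = [980, 1010, 1005, 995, 1020]
print(volume_looks_incomplete(history, 100))   # True
print(volume_looks_incomplete(history, 990))   # False

# Unique-value check against a (shortened, illustrative) region list.
expected_regions = {"Beijing", "Shanghai", "Guangdong"}
observed_regions = ["Beijing", "Guangdong", "Beijing"]
print(missing_values(observed_regions, expected_regions))  # ['Shanghai']
```

In practice the expected region list would hold the full set of region codes used by the reporting system, and the threshold would be tuned to the normal day-to-day variation of the log volume.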

Consistency 
Consistency refers to whether the data follows a unified specification and whether the data set keeps a unified format. It shows up mainly in two ways: whether records follow the specification and whether the data is logically coherent. Specification means that a piece of data has a specific format. For example, a mainland China mobile phone number must have 11 digits (13 if the 86 country code is included), and an IPv4 address must consist of four numbers between 0 and 255 separated by ".". Logic means that there is a fixed logical relationship between multiple pieces of data. For example, PV (page views) must be greater than or equal to UV (unique visitors), and the bounce rate must be between 0 and 1. Most data has standard coding rules, so checking the consistency of records is relatively simple: verify that each value matches the standard code. For example, if the standard code for a region is "Beijing", a variant spelling such as "Beijing City" should be mapped to it. It is enough to map each observed unique value to its standard unique value.
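A minimal sketch of these consistency checks is shown below. The field names `pv`, `uv`, and `bounce_rate`, the function names, and the alias table are hypothetical and chosen only for illustration; the IPv4 check relies on Python's standard `ipaddress` module.

```python
import ipaddress

# Hypothetical alias table mapping variant spellings to the standard code.
REGION_ALIASES = {"Beijing City": "Beijing", "BJ": "Beijing"}


def is_valid_ipv4(text):
    """Format rule: four numbers between 0 and 255 separated by dots."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


def check_logic(row):
    """Return the logical-rule violations found in one statistics row."""
    problems = []
    if row["pv"] < row["uv"]:
        problems.append("pv < uv")
    if not 0 <= row["bounce_rate"] <= 1:
        problems.append("bounce_rate outside [0, 1]")
    return problems


def normalize_region(name):
    """Map a region name to its standard code, leaving unknown names as-is."""
    cleaned = name.strip()
    return REGION_ALIASES.get(cleaned, cleaned)


print(is_valid_ipv4("192.168.0.1"))   # True
print(is_valid_ipv4("256.1.1.1"))     # False
print(check_logic({"pv": 80, "uv": 100, "bounce_rate": 1.2}))
print(normalize_region(" Beijing City "))  # Beijing
```

Keeping the alias table as data rather than code makes it easy to extend when new variant spellings turn up in the records.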

Accuracy
Accuracy refers to whether the information recorded in the data contains anomalies or errors. Unlike consistency problems, accuracy problems are not merely violations of format rules. The most common accuracy error is garbled characters. Abnormally large or small values are also invalid data. Accuracy problems may affect individual records or an entire data set, for example when values are recorded at the wrong order of magnitude. Such errors can be audited with maximum and minimum statistics. Much data is approximately normally distributed, so if a problem affects only a small share of the data, you can judge it by comparing that share with the shares of other low-frequency values. If the erroneous values are not statistically conspicuous, they are the hardest to detect, and finding them requires more complex statistical analysis and comparison. Data analysis tools can help here; specific data correction methods are not covered in this article.
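As a rough illustration of auditing for abnormally large or small values, the sketch below combines a simple range check with an outlier test. The outlier test is a swapped-in technique, not one named in this article: it uses the median and the median absolute deviation (MAD) instead of the mean and standard deviation, because a single extreme value distorts the latter in small samples. The function names and the 3.5 threshold are assumptions.

```python
from statistics import median


def out_of_range(values, lower, upper):
    """Return the values that fall outside the allowed [lower, upper] range."""
    return [v for v in values if not lower <= v <= upper]


def robust_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on median and MAD)
    exceeds `threshold`. The threshold is an assumed rule of thumb."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical: flag anything that differs.
        return [v for v in values if v != med]
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Min/max style audit: a bounce rate must stay within [0, 1].
print(out_of_range([0.2, 1.4, -0.1, 0.9], 0, 1))   # [1.4, -0.1]

# An order-of-magnitude error hidden among ordinary daily values.
daily_values = [102, 98, 101, 99, 100, 97, 103, 1000]
print(min(daily_values), max(daily_values))        # 97 1000
print(robust_outliers(daily_values))               # [1000]
```

Checks like these only catch conspicuous errors; values that are wrong but plausible still need the comparative analysis described above.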

Timeliness 
Timeliness refers to the time interval between when data is generated and when it becomes available for viewing, also known as the data delay. Data analysis itself does not place high demands on timeliness, but if the analysis cycle plus the time needed to make the data available is too long, the conclusions drawn from the analysis may lose their reference value.
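The data delay can be measured directly when both timestamps are recorded. The following is a minimal sketch; the function names, the sample timestamps, and the four-hour limit are hypothetical.

```python
from datetime import datetime, timedelta


def data_delay(generated_at, available_at):
    """Time between data generation and the moment it can be viewed."""
    return available_at - generated_at


def is_stale(generated_at, available_at, max_delay):
    """Return True if the delay exceeds the acceptable maximum."""
    return data_delay(generated_at, available_at) > max_delay


generated = datetime(2024, 1, 1, 23, 59)   # last event of the day
available = datetime(2024, 1, 2, 6, 0)     # report ready the next morning
print(data_delay(generated, available))                      # 6:01:00
print(is_stale(generated, available, timedelta(hours=4)))    # True
```

The acceptable delay depends on the analysis cycle: a limit measured in hours may suit a daily report, while a monthly review can tolerate far more.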
