Data Governance: Data Quality Management (Day 1)

Table of contents

Foreword

1. Data and quality

        1.1 What is data

        1.2 What is quality

2. Management

        2.1 What to manage

        2.2 Reduce the generation of unqualified data

                1. Automation instead of manual ledger

                2. Process optimization

        2.3 Clean unqualified data

                1. Business cleaning

                2. Data cleaning

3. Summary

Foreword:

First, let's go over the seven common methods of data governance. Today's post mainly shares experience with data quality management based on personal practice, partly to record my own data governance experience for later review.

  1. Data classification and standardization: classify and standardize data according to defined rules for subsequent management and use.

  2. Metadata management: manage the data's metadata, including its definition, structure, and format, to ensure accuracy and consistency.

  3. Data quality management: manage data quality, including integrity, accuracy, consistency, and reliability, to ensure that data quality meets business needs.

  4. Data security management: manage and protect data security, including access control, encryption, and backup and recovery.

  5. Data lifecycle management: manage the full lifecycle of data, including collection, storage, processing, analysis, and use, to ensure traceability and compliance.

  6. Data governance organizational structure: establish an organizational structure for data governance, including a data governance committee, data administrators, and data quality managers, to ensure effective implementation.

  7. Data governance process management: establish process management for data governance, including process design, execution, and monitoring, to ensure the work is standardized and normalized.

1. Data and quality

        1.1 What is data

        Data refers to the symbols that record and identify objective events; it is the form and carrier of information. Data includes not only numbers in the narrow sense, but also symbols, text, voice, graphics, and video.
        In computer science, data is the general term for all symbols and media that can be input into a computer and processed by a program. Processed data becomes information.

According to IDC's report "Data Age 2025", the volume of data generated globally each year will grow from 33 ZB in 2018 to 175 ZB by 2025, equivalent to roughly 491 EB of data generated every day.


         1.2 What is quality

        Data quality is how data measures up across a set of dimensions. Just as we judge the quality and cost-effectiveness of goods, data also has criteria for judging whether it is good or bad.

        These dimensions include data integrity, accuracy, consistency, and reliability, and together they ensure that data quality meets business needs.

2. Management

        2.1 What to manage

        As stated above, the goal of data quality management is to ensure that data quality meets business requirements. It therefore requires the joint participation of two parties: business and IT. The business side includes the main application departments as well as upstream and downstream related departments. Furthermore, as long as data conforms to the defined business rules, we consider it good data; otherwise it is bad data. This gives us a clear standard for judging data quality.

Target: ensure data quality meets business needs.

Standard:

        Good data: conforms to business rules.

        Bad data: does not conform to business rules.

Influence:

        Good data: assists decision-making and makes intelligent decisions possible.

        Bad data: loss of trust in the data prevents management from making decisive decisions.

         Therefore, management means making the data conform to business rules and preventing it from deviating; that is, reducing the generation of unqualified data plus cleaning the unqualified data that already exists.

        2.2 Reduce the generation of unqualified data

        Reducing unqualified data is generally divided into two steps

        1. Automation instead of manual ledger

        Go paperless: let machines replace manual recording of production data, for example through digital transformation with IoT, cloud computing, and big-data systems that support automatic, standardized data collection.
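At the data level, "machines replace manual recording" means every reading is captured in a fixed schema with a machine-generated timestamp, so format drift cannot creep in. A minimal sketch, with hypothetical field names and units:

```python
# Automated, standardized collection replacing a manual ledger: readings
# are stored in a fixed schema with a machine-generated timestamp.
# The field names ("device_id", "temperature_c") are illustrative assumptions.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class Reading:
    device_id: str
    metric: str          # e.g. "temperature_c"
    value: float
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def collect(device_id: str, metric: str, value: float) -> dict:
    """Record one reading in the standardized schema."""
    return asdict(Reading(device_id, metric, value))

row = collect("line1-sensor07", "temperature_c", 86.4)
print(row["device_id"], row["metric"], row["value"])
```

Because the schema and timestamp are fixed by code rather than typed by hand, downstream consumers never see missing fields or free-form dates.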

        2. Process optimization

        When sorting out enterprise processes, the preliminary research should be comprehensive and specific: first gather the leaders' requirements (business control points), then combine them with specific business needs (process integrity and standardization).

        A new system or new feature usually goes through a running-in period of one to three months after launch. An efficient enterprise will generally expose the system's gaps within the first month and then solidify the improved process.

        Automation solves the problems of efficiency and manual-entry errors; standardization determines whether the data can meet business needs. Together they reduce the generation of garbage data.

        2.3 Clean unqualified data

        There are also two steps in this process:

        1. Business cleaning

        Through the previous step we have gained a deeper understanding of the enterprise's business. This is also the stage where data governance can really deliver value, after the simple "data collection" of the early stage.

        Due to many objective and subjective factors, we will find many unreasonable places in the existing business flow. For example: to reduce workload and cost, multiple samples are grouped under a single printed label and inspected together in one mixed test.

        But, as with pooled nucleic-acid testing, when a problem is found every sample in the mixed batch must be re-tested, roughly doubling the time once an abnormality is discovered. In process manufacturing, the cost caused by one abnormality can be more than ten times the cost of testing each sample separately.

        Let's look at another case. Before digital tools, the entire closed loop of abnormality control took more than a month to complete, because the full process of handling an abnormality, from occurrence through determination, correction, effect evaluation, repeatability inspection, and effect verification, is quite complicated, and process performance problems often prolonged the closed-loop time. Therefore, to keep the business moving, real-time process reminders were added, business judgments were moved earlier in the flow, and part of the decision-making authority was delegated to speed up the process.

        Through this series of business self-cleaning steps, processes get better and business flows more smoothly.

        2. Data cleaning

        In past big-data projects, data cleaning was mostly treated as an IT problem, something IT could handle on its own. But business practice shows that IT's cleaning methods are really just tools.

        During recent digital construction, two types of problems have appeared in practical application.

        (1) Incomplete coverage of digitalization

        To ensure the "normal operation" of the business under abnormal conditions (network outage, power failure, system downtime), a manual fallback is usually reserved for handling exceptions.

        For example, if the label printing system fails, manual printing from Excel is allowed. When that manual data is later entered into the system, data quality can still be guaranteed, provided the operator is familiar with both the business and the system.

        But in reality, the human error rate can be as high as 60-70%. For high-quality data applications, data that does not conform to the specification becomes unusable: when entering a batch number, for instance, the model code is omitted or the year/month/day has the wrong number of digits. Where the specified batch number is F1-20230407-WH503, manual entry often produces F1-23047-WH503, F1-2023407-WH503, F1-2023047-WH503, and so on.

        Therefore, if an enterprise allows manual data entry, the system needs a data verification mechanism: manually entered data is reviewed through a workflow and only enters the system after the review passes.
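A sketch of such a verification mechanism: entered batch numbers are checked against the specified pattern, and anything that fails goes to a review queue instead of straight into the system. The pattern (line code, 8-digit date, warehouse code) is inferred from the example F1-20230407-WH503 and is an assumption about the actual specification.

```python
# Validate manually entered batch numbers against the assumed specification
# "F1-20230407-WH503": line code, 8-digit date, warehouse code.
import re

BATCH_PATTERN = re.compile(r"^[A-Z]\d-\d{8}-[A-Z]{2}\d{3}$")

def validate_batch_no(batch_no: str) -> bool:
    if not BATCH_PATTERN.fullmatch(batch_no):
        return False
    # The date segment must also be a plausible calendar date.
    date = batch_no.split("-")[1]
    month, day = int(date[4:6]), int(date[6:8])
    return 1 <= month <= 12 and 1 <= day <= 31

entries = [
    "F1-20230407-WH503",  # correct
    "F1-23047-WH503",     # year truncated, wrong length
    "F1-2023407-WH503",   # 7-digit date
    "F1-2023047-WH503",   # 7-digit date
]
accepted = [e for e in entries if validate_batch_no(e)]
review_queue = [e for e in entries if not validate_batch_no(e)]
print("accepted:", accepted)
print("needs review:", review_queue)
```

Only the entries that pass validation flow into the system automatically; everything in the review queue waits for a human check, as described above.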

        (2) Continuously emerging requirements

        We often talk about data mining; in fact, requirements are discovered step by step. At the beginning we only needed simple data statistics → then data analysis → then data-driven decisions → then AI-driven data reminders, and so on.

        In this process, only after collecting the data and displaying it on a platform did we find that it still could not replace the previous offline statistics. So the business raised new online requirements: merging multiple "same" batches into one, consolidating multiple batches of test data into one record, displaying a single row for multiple batches, automatically distinguishing the differences between batch data and the standard, and automatically pushing abnormal data in real time to applications such as enterprise WeChat.

        For example, the data for the batches 1-H1-8-F1-4-08004-006CC, 1-H1-8-F1-4-08004-006FC, and 1-H1-8-F1-4-08004-006HF needs to be cleaned according to the above requirements:

        1. Multi-batch cleaning

        2. Multi-test-item cleaning

        3. Abnormal real-time push
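The three cleaning steps above can be sketched with the batch IDs from the text: merge sub-batches that share a parent prefix, consolidate their test results per item, and flag out-of-spec values for real-time push. The measurement values, spec limits, and batch-ID structure are illustrative assumptions.

```python
# Multi-batch cleaning, multi-test-item consolidation, and abnormal-value
# flagging. Values, spec limits, and batch-ID structure are assumptions.
from collections import defaultdict

SPEC = {"purity": (98.0, 100.0)}  # assumed lower/upper limits per test item

raw = [
    ("1-H1-8-F1-4-08004-006CC", "purity", 99.1),
    ("1-H1-8-F1-4-08004-006FC", "purity", 98.7),
    ("1-H1-8-F1-4-08004-006HF", "purity", 97.2),  # out of spec
]

def parent_batch(batch_id: str) -> str:
    """Multi-batch cleaning: assume sub-batches share everything up to the last '-'."""
    return batch_id.rsplit("-", 1)[0]

# Multi-test-item cleaning: consolidate results per parent batch and item.
merged: dict[tuple, list] = defaultdict(list)
for batch_id, item, value in raw:
    merged[(parent_batch(batch_id), item)].append(value)

# Abnormal real-time push: report any value outside the spec limits
# (in production this print would be a webhook call to enterprise WeChat).
alerts = []
for (parent, item), values in merged.items():
    lo, hi = SPEC[item]
    bad = [v for v in values if not lo <= v <= hi]
    if bad:
        alerts.append((parent, item, bad))
        print(f"ALERT {parent} {item}: out-of-spec values {bad}")
```

The same grouping key drives all three steps, which is why cleaning rules like these are business decisions first and IT tooling second.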

 3. Summary

        Data governance is therefore largely a management effort. According to industry and business needs, business governance should take the lead in driving data governance, ensuring data quality and generating more good data to assist intelligent decision-making. In the next article, let's talk about how to get the business to participate enthusiastically in data governance.

Origin blog.csdn.net/qq_29061315/article/details/129881649