Data Governance Professional Certification CDMP Study Notes (Mind Map and Knowledge Points) - Chapter 13 Data Quality

Hello everyone, I am Dugufeng, a former port coal worker, now head of big data at a state-owned enterprise and author of the official account Big Data Flow. Over the last two years, driven by my company's needs and the growth of big data, I began studying data governance.

Data governance takes systematic study to truly master, and professional certification helps demonstrate one's learning ability and command of the field. If you have questions about data governance or the CDMP certification, my earlier articles below give a detailed introduction.

5000 words explain how to get started with data governance (with international data governance certification exam-CDMP study group)

What exactly is CDMP - a super-comprehensive introduction to the international certification of data governance

Total text: 10,272 words, 11 pictures

Estimated reading time: 26 minutes

This document is compiled from learning materials on data governance and forms the study notes (mind map and knowledge points) for the CDMP data governance professional certification.

The article is long; bookmarking before reading is recommended.

For subsequent documents, please follow the official account Big Data Flow, which will continue to be updated~

This document is part of data quality management and is divided into 5 parts.

Because of page display limits, some levels of the mind map cannot be fully expanded. The structure is shown in the figure below.


[Figure: mind map of the chapter structure]

1. Overview of Data Quality

Data should be of high quality.

Factors contributing to poor data quality: lack of organizational understanding of the impact of low-quality data, lack of planning, siloed system design, inconsistent development processes, incomplete documentation, lack of standards, lack of governance, etc.

No organization has perfect business processes, perfect technical processes, or perfect data management practices, and all organizations experience issues related to data quality . Organizations that implement formal data quality management experience fewer problems than those that do not.

Data quality management is not a project or a one-off job; it is continuous work. Long-term success depends on changing organizational culture and establishing a quality mindset. High-quality data is not an end in itself; it is a means to the organization's success.

The data quality context diagram is as follows:

[Figure: data quality context diagram]

Business Drivers :

1) Increase the value of organizational data and opportunities for data utilization.

2) Reduce risks and costs caused by low-quality data.

3) Improve organizational efficiency and productivity.

4) Protect and enhance the reputation of the organization.

Consequences of poor quality data:

1) Failure to invoice correctly.

2) Increased customer service calls and decreased ability to resolve them.

3) Loss of income due to missed business opportunities.

4) Delays to integration during mergers and acquisitions.

5) Increased risk of fraud.

6) Losses from bad business decisions driven by bad data.

7) Loss of business due to lack of good credit standing.

Goal :

1) Develop a governed approach to making data fit for purpose, based on the needs of data consumers.

2) Define standards and specifications for data quality control as part of the entire data lifecycle.

3) Define and implement a process for measuring, monitoring and reporting data quality levels.

Principle :

1) Importance. Prioritize improvements based on the criticality of the data and the level of risk if the data is incorrect.

2) Full life cycle management. Data quality management should cover the full lifecycle of data from creation or procurement to disposal.

3) Prevention. The focus should be on preventing data errors and the conditions that reduce the usability of data.

4) Root cause correction. Improving quality requires changing the processes and the systems that support them, not just treating symptoms at the surface.

5) Governance. Data governance activities must support the development of high-quality data, and data quality planning activities must support and maintain a governed data environment.

6) Standards-driven. Quantifiable data quality requirements should be defined as measurable standards and expectations.

7) Objective measurement and transparency. Data quality levels need to be measured objectively and consistently.

8) Embedded in business processes. Business process owners are responsible for the quality of data produced by their processes and must enforce data quality standards within them.

9) System enforcement. System owners must make the system enforce data quality requirements.

10) Linked to service level. Data quality reporting and problem management should be included in the service level agreement (SLA).

Activities :

1. Define high-quality data.

2. Define a data quality strategy.

3. Identify key data and business rules. (1) Identify key data. (2) Identify existing rules and patterns.

4. Perform an initial data quality assessment. (1) Identify and prioritize issues. (2) Perform a root cause analysis of the problem.

5. Identify and prioritize areas for improvement. (1) Prioritize actions based on business impact. (2) Develop preventive and corrective measures. (3) Confirm the planned action.

6. Define data quality improvement goals.

7. Develop and deploy data quality operations. (1) Develop data quality operating rules. (2) Correct data quality defects. (3) Measure and monitor data quality. (4) Report data quality levels and findings.


2. Basic concepts

1. Data quality

Data quality refers to the relevant characteristics of high-quality data and also refers to the process used to measure or improve data quality.

One of the challenges of data quality management is that the expectations related to quality are not always known.

2. Key data

Expectations related to quality are not always known; customers may not articulate their own quality expectations, and data managers often do not ask.

Data quality management focuses improvement on the data most important to the organization and its customers. Critical data is typically identified by its role in: 1) regulatory reporting; 2) financial reporting; 3) business policy; 4) ongoing operations; 5) business strategy, especially differentiating competitive strategy.

3. Data quality dimensions*

A data quality dimension is a measurable characteristic of data.

Data quality dimension framework: Strong-Wang framework (1996)

The Strong-Wang framework (1996) describes 15 dimensions of data quality across 4 categories:

(1) Intrinsic data quality. 1) Accuracy. 2) Objectivity. 3) Believability. 4) Reputation.

(2) Contextual data quality. 1) Value-added. 2) Relevancy. 3) Timeliness. 4) Completeness. 5) Appropriate amount of data.

(3) Representational data quality. 1) Interpretability. 2) Ease of understanding. 3) Representational consistency. 4) Concise representation.

(4) Accessibility data quality. 1) Accessibility. 2) Access security.

Redman's "Data Quality for the Information Age" defines a data item as a "representable triple": a value from the domain of an attribute within an entity. Dimensions can be associated with any component of data: the model (entities and attributes) and the values. Redman also defines a category of representation dimensions for the rules of recording data items. Across three categories, more than 20 dimensions are described:

(1) Data model

◆ 1) Content. ① Relevance of the data. ② Ability to obtain the values. ③ Clarity of definitions.

◆ 2) Level of detail. ① Granularity of attributes. ② Precision of attribute domains.

◆ 3) Composition. ① Naturalness. Each attribute should have a simple counterpart in the real world, and each attribute should bear a single fact about the entity. ② Identifiability. Each entity should be distinguishable from every other entity. ③ Homogeneity. ④ Minimum necessary redundancy.

◆ 4) Consistency. ① Semantic consistency of the components of the model. ② Structural consistency of attributes across entity types.

◆ 5) Reaction to change. ① Robustness. ② Flexibility.

(2) Data values. ① Accuracy. ② Completeness. ③ Currency (timeliness). ④ Consistency.

(3) Representation. ① Appropriateness. ② Interpretability. ③ Portability. ④ Format precision. ⑤ Format flexibility. ⑥ Ability to represent null values. ⑦ Efficient use of storage. ⑧ Physical instances of data in accord with their formats.

Larry English's "Improving Data Warehouse and Business Information Quality" proposes two broad categories: inherent quality characteristics and pragmatic quality characteristics.

(1) Inherent quality characteristics. 1) Definitional conformance. 2) Completeness of values. 3) Validity, or business rule conformance. 4) Accuracy to a surrogate source. 5) Accuracy to reality. 6) Precision. 7) Non-duplication. 8) Equivalence of redundant or distributed data. 9) Concurrency of redundant or distributed data.

(2) Pragmatic quality characteristics. 1) Accessibility. 2) Timeliness. 3) Contextual clarity. 4) Usability. 5) Integratability of multi-source data. 6) Rightness, or fact completeness.

In 2013, DAMA UK described 6 core dimensions of data quality :

1) Completeness. The proportion of data stored against the potential for 100% completeness.

2) Uniqueness. No entity instance (thing) should be recorded more than once, based on how that thing is identified.

3) Timeliness. The degree to which data represent reality as of the requested point in time.

4) Validity. Data is valid if it conforms to the syntax (format, type, range) of its definition.

5) Accuracy. The degree to which data correctly describe the "real world" object or event being described.

6) Consistency. The absence of difference when comparing two or more representations of a thing against a definition.

The DAMA UK white paper also describes other properties that have an impact on quality.

1) Usability

2) Timing Issues (beyond timeliness itself).

3) Flexibility.

4) Confidence.

5) Value.

Common data quality dimensions :

[Figure: table of common data quality dimensions]

4. Data quality and metadata

Metadata is critical to managing data quality.

5. Data quality ISO standard.

ISO 8000, which includes:

1. Data quality planning

2. Data quality control

3. Data quality assurance

4. Data quality improvement

6. Data Quality Improvement Lifecycle

A version of the Deming cycle: plan-do-check-act

1) Plan stage. The data quality team assesses the scope, impact, and priority of known issues and evaluates alternatives for addressing them.

2) Do stage. The data quality team works to resolve the root causes of issues and plans for ongoing monitoring of the data.

3) Check stage. Actively monitoring data quality as measured against requirements and thresholds.

4) Act stage. Activities to address and resolve emerging data quality issues.

A new Deming cycle begins when: ① an existing measurement falls below its threshold; ② a new data set comes under investigation; ③ new data quality requirements are raised for an existing data set; ④ business rules, standards, or expectations change.
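As a minimal sketch of the Check and Act stages, assuming a rule function and an SLA-style threshold (all names below are illustrative, not from DMBOK):

```python
# Minimal sketch of the Check/Act stages of the Deming cycle applied to
# data quality monitoring; THRESHOLD and measure_quality are hypothetical.

THRESHOLD = 0.97  # acceptability threshold, e.g., agreed in a data quality SLA

def measure_quality(records, rule):
    """Check stage: the share of records that satisfy a data quality rule."""
    passed = sum(1 for r in records if rule(r))
    return passed / len(records) if records else 1.0

def check_and_act(records, rule):
    """Act stage: trigger a new Plan stage when the measurement falls
    below the threshold (trigger 1 in the list above)."""
    level = measure_quality(records, rule)
    if level < THRESHOLD:
        print(f"quality {level:.1%} below {THRESHOLD:.0%}: open an issue, re-enter Plan")
    else:
        print(f"quality {level:.1%} within tolerance: keep monitoring")

# Example: a completeness rule on a customer_id field.
records = [{"customer_id": "C1"}, {"customer_id": None}, {"customer_id": "C3"}]
check_and_act(records, lambda r: r.get("customer_id") is not None)
```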

7. Types of data quality business rules

Business rules describe how the business should run internally in order to successfully align with the outside world.

Data quality business rules describe how useful and usable data exists within an organization.

Common business rule types:

1) Define consistency.

2) Value presence and record completeness. Rules defining whether missing values are acceptable.

3) Format compliance.

4) Value range matching.

5) Scope consistency.

6) Mapping conformance. The value assigned to a data element must correspond to one selected from a value domain that maps to other equivalent value domains.

7) Consistency rules. Conditional assertions on the relationship between two (or more) attributes, based on the actual values of those attributes.

8) Verification of accuracy.

9) Uniqueness verification.

10) Timeliness verification. Rules that indicate characteristics associated with expectations about the accessibility and availability of data.
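To make the rule types concrete, here is a minimal sketch, assuming simple record dictionaries, of three of the types above (value presence, format compliance, value range matching); the field names and patterns are illustrative assumptions:

```python
import re

def check_presence(record, field):
    """Value presence / record completeness: the field must not be missing or empty."""
    return record.get(field) not in (None, "")

def check_format(record, field, pattern=r"\d{6}"):
    """Format compliance: e.g., a six-digit postal code."""
    return re.fullmatch(pattern, str(record.get(field) or "")) is not None

def check_range(record, field, low=0, high=150):
    """Value range matching: e.g., an age between 0 and 150."""
    value = record.get(field)
    return value is not None and low <= value <= high

record = {"postcode": "100010", "age": 42, "name": ""}
print(check_presence(record, "name"))    # False: empty value
print(check_format(record, "postcode"))  # True
print(check_range(record, "age"))        # True
```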

8. Common causes of data quality problems

(1) Problems caused by lack of leadership.

Common sense and research both suggest that many data quality problems stem from a lack of organizational commitment to high-quality data, which is itself a lack of leadership in the form of governance and management.

Barriers and root causes of managing information as a business asset:

[Figure: barriers and root causes of managing information as a business asset]

Barriers to effectively managing data quality include: 1) Lack of awareness among leaders and employees. 2) Lack of governance. 3) Lack of leadership and management skills. 4) Difficulty in justifying improvements. 5) Inappropriate or ineffective tools for measuring value.

(2) Problems caused by the data entry process.

1) Data entry interface issues. 2) List entry placement. 3) Field overloading. 4) Training issues. 5) Changes to business processes. 6) Inconsistent business process execution.

(3) Problems caused by data processing functions.

1) Incorrect assumptions about the source of the data. 2) Outdated business rules. 3) Changed data structure.

(4) Problems caused by system design.

1) Failure to enforce referential integrity. Consequences include: ① duplicate data that violates uniqueness constraints; ② orphan rows, which can be included in or excluded from reports, causing the same calculation to produce different values; ③ inability to upgrade when referential integrity requirements are reverted or changed; ④ inaccurate data, because missing data is assigned default values. (A small orphan-record check is sketched after this list.)

2) Unique constraints are not enforced.

3) Coding inaccuracies and disagreements.

4) The data model is not accurate.

5) Field overloading.

6) Time data mismatch.

7) Weak master data management.

8) Data replication. Harmful replication patterns mainly include: ① single source, multiple local instances; ② multiple sources, single local instance.

(5) Problems caused by fixing problems, such as manual data patches made outside normal processing that introduce new errors.
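A common way to detect the orphan data described under system design issue 1) is an anti-join between a child table and its parent. A minimal pandas sketch, with hypothetical table and column names:

```python
import pandas as pd

# Hypothetical parent and child tables; in practice these come from the
# systems whose referential integrity is in question.
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
orders = pd.DataFrame({"order_id": [10, 11, 12], "customer_id": [1, 2, 99]})

# Anti-join: orders whose customer_id has no match in the parent table.
merged = orders.merge(customers, on="customer_id", how="left", indicator=True)
orphans = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(orphans)  # order 12 references the non-existent customer 99
```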

9. Data Profiling

A form of data analysis used to examine data and assess quality.

Data profiling uses statistical techniques to discover the true structure, content, and quality of data collections. It includes column statistics, cross-column analysis, and inter-table analysis. Resolving the problems it surfaces requires other forms of analysis.

Statistics identify patterns of problems, such as: 1) Number of nulls. Identify the presence of null values and check whether nulls are allowed. 2) Maximum/minimum value. Identify outliers, such as negative values. 3) Maximum/minimum length. Identify outliers or invalid values for fields with specific length requirements. 4) Frequency distribution of values within individual columns. Used to assess plausibility. 5) Data type and format.
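A minimal profiling sketch in pandas that produces the statistics listed above; the sample data is hypothetical:

```python
import pandas as pd

# Hypothetical sample; real profiling runs against source tables.
df = pd.DataFrame({
    "age": [34, None, 29, -1],  # -1 is a suspicious outlier
    "phone": ["3175551234", "317555", None, "3175559999"],
})

for col in df.columns:
    s = df[col]
    print(f"--- {col} ---")
    print("null count:", s.isna().sum())  # 1) number of nulls
    if pd.api.types.is_numeric_dtype(s):
        print("min/max value:", s.min(), s.max())  # 2) value outliers
    else:
        lengths = s.dropna().str.len()
        print("min/max length:", lengths.min(), lengths.max())  # 3) length outliers
    print("frequency:", s.value_counts(dropna=False).to_dict())  # 4) distribution
    print("dtype:", s.dtype)  # 5) data type and format
```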

10. Data quality and data processing

Data quality can be improved through some form of data processing:

(1) Data cleansing (data scrubbing).

Data can be transformed to conform to data standards and domain rules. Cleaning involves detecting and correcting data errors to bring data quality to an acceptable level.

The need for data cleansing can be reduced by: 1) implementing controls to prevent data entry errors; 2) correcting data in the source system; 3) improving the business processes around data entry. Correcting data midstream is less expensive than repeatedly correcting it downstream.

(2) Data augmentation.

The process of adding attributes to a dataset to improve its quality and usability.

Examples of data augmentation:

1) Timestamps. Useful for tracking historical data events and for narrowing the time frame in which an issue arose.

2) Audit data. Auditing can document data lineage, which is important for history tracking and verification.

3) Reference vocabularies. Business-specific terminology, ontologies, and glossaries enhance understanding and control of data within a customized business context.

4) Contextual information. Adding context, such as location, environment, or access method, and tagging data for review and analysis.

5) Geographic information. Enhanced through address standardization and geocoding.

6) Demographic information. Customer data can be enhanced with demographic information such as age, marital status, gender, income or ethnic code.

7) Psychographic information. Data used to segment target populations by specific behaviors, habits, or preferences, such as product and brand preferences, organization memberships, leisure activities, commuting transport modes, preferred shopping times, etc.

8) Valuation information. Use this kind of enhancement for asset valuation, inventory, and sales data.

(3) Data parsing and formatting.

The process of analyzing data using predetermined rules to interpret its content or values.

Data quality tools parse data values that match recognized patterns and transform them into a single standardized form, simplifying assessment, similarity analysis, and remediation. Pattern-based parsing can automatically identify and standardize meaningful components of a value, such as splitting a telephone number into area code, exchange, and line number.
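A minimal sketch of pattern-based parsing for the telephone example: split a raw value into area code, exchange, and line number, then emit one normalized form. The regex covers a few common 10-digit formats and is an illustrative assumption:

```python
import re

# A few ways a 10-digit phone number might arrive; illustrative only.
PHONE_PATTERN = re.compile(r"^\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})$")

def normalize_phone(raw):
    """Parse area code, exchange, and line number, then reformat them
    into a single standardized representation."""
    match = PHONE_PATTERN.match(raw.strip())
    if match is None:
        return None  # flag the value for manual remediation
    area, exchange, line = match.groups()
    return f"({area}) {exchange}-{line}"

for raw in ["317.555.1234", "(317) 5551234", "317-555-1234", "55512"]:
    print(raw, "->", normalize_phone(raw))
```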

(4) Data conversion and standardization.

Rule-based transformations map data values from their source formats and patterns into target representations.

Parsed components are rearranged, corrected, or otherwise changed according to rules in a knowledge base.
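Rule-based standardization can be sketched as a lookup against a small knowledge base of value mappings; the mapping table below is a hypothetical example:

```python
# A toy "knowledge base" mapping raw variants to standard forms;
# the entries are illustrative assumptions.
STREET_TYPE_MAP = {"st": "Street", "street": "Street", "rd": "Road",
                   "ave": "Avenue", "av": "Avenue"}

def standardize_address(raw):
    """Replace each parsed component with its standard form when one is known."""
    tokens = raw.replace(",", " ").split()
    return " ".join(STREET_TYPE_MAP.get(t.lower().rstrip("."), t) for t in tokens)

print(standardize_address("123 Main st."))  # -> 123 Main Street
print(standardize_address("9 Elm AVE"))     # -> 9 Elm Avenue
```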


3. Activities

[Activity 1] Define high-quality data.

Based on a set of questions, understand the current state and assess the organization's readiness for data quality improvement:

1) What does "high quality data" mean?

2) What is the impact of low quality data on business operations and strategy?

3) How can higher quality data empower business strategies?

4) What priorities drive the need for data quality improvement?

5) What is the tolerance for low quality data?

6) What is the governance in place to support data quality improvements?

7) What supporting governance structure is needed?

Get a comprehensive look at the current state of data quality, exploring the issue from different angles:

1) Understand business strategy and goals.

2) Interview stakeholders to identify pain points, risks, and business drivers.

3) Assess data directly through profiling and other forms of analysis.

4) Record the data dependencies in the business process.

5) Record the technical architecture and system support of the business process.

[Activity 2] Define the data quality strategy.

The strategy must be aligned with the business strategy. A framework should include the following:

1) Understand and prioritize business needs.

2) Determine the key data to meet business needs.

3) Define business rules and data quality standards based on business requirements.

4) Evaluate the data against expectations.

5) Share findings and get feedback from stakeholders.

6) Prioritize and manage issues.

7) Identify and prioritize improvement opportunities.

8) Measure, monitor and report on data quality.

9) Manage metadata generated through the data quality process.

10) Integrate data quality control into business and technical processes.

[Activity 3] Identify key data and business rules.

Data quality improvement initiatives typically start with master data.

Determine the key data, and then identify business rules that can describe or imply the requirements for data quality characteristics.

Data quality metrics revolve around whether data is being used appropriately.

Rules are described (e.g., field X is mandatory and must have a value), and results are reported through quality indicators (in fact 3% of records have no value in this field, so the completeness rate is only 97%).

[Activity 4] Perform an initial data quality assessment.

The goal is to understand the data in order to define an actionable improvement plan. It is usually best to start with a small, focused effort, such as a proof of concept (POC).

Steps include:

1) Define the objectives of the assessment. These goals will drive the work forward.

2) Identify the data to evaluate. The focus should be on a small data set, or even a single data element, or a specific data quality issue.

3) Identify the purpose of the data and the users of the data.

4) Identify known risks associated with the data being assessed, including the potential impact of data problems on organizational processes.

5) Check the data against known and suggested rules.

6) Document the level of inconsistency and the type of issue.

7) Conduct additional, in-depth analysis based on initial findings in order to: ① quantify the results; ② prioritize problems based on business impact; ③ develop hypotheses about the root causes of data issues.

8) Meet with data stewards, domain experts and data consumers to identify issues and priorities.

9) Use the findings as a basis for planning: ① solving problems, ideally by finding their root causes; ② correcting and improving processes to prevent recurrence; ③ ongoing control and reporting.

【Activity 5】Identify and prioritize improvement directions.

Potential improvement measures need to be identified and prioritized.

Improvements can be identified through comprehensive profiling of larger data sets to understand the breadth of existing problems, or through other means, such as talking with stakeholders about the data problems that affect them and tracking the business impact of those problems.

Prioritization is settled through discussion. Steps: define goals; understand data usage and risks; measure against rules; document results and validate them with domain experts; use this information to prioritize remediation and improvement work. Determining impact requires involving stakeholders along the data chain.

【Activity 6】Define data quality improvement goals.

Blockers: system limitations, the age of the data, ongoing projects that use the problematic data, the overall complexity of the data environment, and resistance to cultural change.

Data improvement must show a positive return on investment; nobody cares about the level of field completeness unless there is a business impact.

Determine improved ROI based on:

1) The criticality of the affected data (ranking of importance).

2) The amount of data affected.

3) The age of the data.

4) The number and type of business processes affected by the problem.

5) Number of consumers, customers, suppliers, or employees affected by the problem.

6) Risks associated with the problem.

7) The cost of correcting the root cause.

8) Potential costs of work-arounds.

【Activity 7】Develop and deploy data quality operations

[Activity 7-1] Develop and deploy data quality operations - manage data quality rules.

Predefined rules will:

1) Set clear expectations for data quality characteristics.

2) Provide requirements for system edits and controls that prevent data problems from being introduced.

3) Provide data quality requirements to suppliers and other external parties.

4) Create a foundation for ongoing data quality measurement and reporting.

Manage rules as metadata:

1) Documented consistently.

2) Defined in terms of data quality dimensions. Quality dimensions help people understand what is being measured; applying dimensions consistently simplifies measurement and issue management.

3) Tied to business impact. Metrics with no connection to business processes should not be used.

4) Backed by data analysis. Data quality analysts should not guess at rules; they should test them against actual data.

5) Confirmed by domain experts. Knowledge arises when subject matter experts confirm or interpret the results of data analysis.

6) Accessible to all data consumers.

[Activity 7-2] Development and Deployment - Measuring and Monitoring Data Quality

Reasons: 1) to inform data consumers of quality levels; 2) to manage the risk that changes to business or technical processes introduce.

Knowledge gained from past problems should be applied to risk management.

Measurement results can be described at two levels: the detail related to the execution of individual rules, and overall aggregated results. Measurement formulas: valid data quality = (total number of tests - number of exceptions) / total number of tests; invalid data quality = number of exceptions / total number of tests.
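Expressed directly in code, a trivial sketch of the two formulas:

```python
def valid_dq(total_tests, exceptions):
    """Valid data quality = (total tests - exceptions) / total tests."""
    return (total_tests - exceptions) / total_tests

def invalid_dq(total_tests, exceptions):
    """Invalid data quality = exceptions / total tests."""
    return exceptions / total_tests

# Example matching Activity 3: 3 of 100 records fail a mandatory-field rule.
print(valid_dq(100, 3))    # 0.97, i.e., 97% completeness
print(invalid_dq(100, 3))  # 0.03
```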

Data quality rules provide the basis for operational management of data quality.

By building control and measurement into information processing, conformance to data quality rules can be monitored continuously, in-process or in batch, and measured at three levels of granularity: data element values, data instances or records, and data sets.

Examples of data quality metrics:

[Figure: examples of data quality metrics]

Data quality monitoring technology:

[Figure: data quality monitoring techniques]

[Activity 7-3] Development and Deployment—Develop an operational process for managing data issues.

Steps:

(1) Diagnose the problem. 1) Review the data issue within the relevant information processing flow and isolate where the defect occurs. 2) Assess whether any environmental changes might have caused the error. 3) Assess whether other process issues contributed to the data quality incident. 4) Determine whether problems in external data affect data quality.

(2) Formulate remediation plans. 1) Correct non-technical root causes, such as lack of training, lack of leadership support, and unclear accountability and ownership. 2) Modify systems to eliminate technical root causes. 3) Develop controls to prevent the problem from occurring. 4) Introduce additional inspection and monitoring. 5) Correct the defective data directly. 6) Take no action, based on an analysis of the cost and impact of the change versus the value of the correction.

(3) Resolve the problem. 1) Evaluate the relative costs and benefits of the alternatives. 2) Recommend one of the alternatives. 3) Provide a plan for developing and implementing the solution. 4) Implement the solution.

Effective tracking requires the following:

1) Standardize data quality issues and activities.

2) Provide an assignment process for data issues. Operational procedures direct analysts to assign data quality incidents to individuals for diagnosis and resolution; the incident tracking system should steer assignment toward people with the relevant domain expertise.

3) Manage the issue escalation process. Handling data quality issues requires establishing a clear escalation mechanism based on the impact, duration, or urgency of the issue, and clearly specifying the escalation sequence in the data quality service level agreement (SLA).

4) Manage data quality solution workflow. Data quality service level agreements (SLAs) specify objectives for monitoring, control, and resolution, all of which define the set of operational workflows. An incident tracking system can support workflow management to track the progress of problem diagnosis and resolution.

【Activity 7-4】Development and Deployment—Develop a data quality service level agreement.

An SLA specifies the organization's expectations for response to and remediation of data quality issues in each system.

The data quality SLA includes:

1) The data elements covered by the protocol.

2) The business impact associated with the data defect.

3) Data quality indicators associated with each data element.

4) Quality expectations for the data elements of each identified indicator, in each application system along the data value chain.

5) A method of measuring these expectations.

6) The acceptability threshold for each measurement.

7) The data steward to be notified if an acceptability threshold is not met.

8) Expected time and deadlines for resolving or remediating issues.

9) An escalation strategy, and possible rewards and penalties.
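As a sketch, one SLA entry covering several of the items above could be captured as structured metadata; the field names here are hypothetical:

```python
# Hypothetical structure for one data quality SLA entry, covering several
# of the items listed above.
dq_sla_entry = {
    "data_element": "customer.email",                # 1) element covered
    "business_impact": "failed invoicing",           # 2) impact of defects
    "metric": "format_conformance_rate",             # 3) associated indicator
    "measurement": "regex check in nightly batch",   # 5) how it is measured
    "threshold": 0.98,                               # 6) acceptability threshold
    "notify": "customer-data steward",               # 7) who is notified on breach
    "resolution_deadline_hours": 72,                 # 8) expected remediation time
    "escalation": ["data steward", "domain owner", "DQ board"],  # 9) escalation path
}
```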

【Activity 7-5】Development and Deployment——Write a data quality report.

The report should have:

1) Data quality scorecard. Provides a high-level view of the scores associated with various metrics, reported to different levels of the organization against established thresholds.

2) Data quality trends. Show data quality measurements over time and whether the trend is up or down.

3) Service Level Agreement (SLA) indicators.

4) Data quality issue management. Monitor the status of issues and solutions.

5) Alignment of data quality teams with governance policies.

6) Alignment of data quality policies between IT and business teams.

7) Positive impacts of improvement projects.


4. Tools, Methods, Implementation Guidelines

1. Tools

1) Data profiling tools. Produce high-level statistics about the data.

2) Data querying tools. Support deeper, ad hoc querying of the data.

3) Modeling and ETL tools. These have a direct impact on data quality.

4) Data quality rule templates. Capture customers' expectations of the data.

5) Metadata repositories. The definition of high-quality data is itself metadata, so the repository is a natural home for it.

2. Method

1. Preventive measures.

The best approach is to prevent low-quality data from entering the organization.

Prevention methods include:

1) Establish data entry controls. 2) Training data producers. 3) Define and enforce rules. 4) Require data suppliers to provide high-quality data. 5) Implement data governance and management systems. 6) Establish formal change control.

2. Corrective action.

Data quality should be addressed systematically and at its root, minimizing the cost and risk of corrective action.

"Fix it in place" is a best practice in data quality management.

There are generally three ways to perform data correction:

1) Automated correction. 2) Manually directed correction. 3) Manual correction.

3. QA and audit code modules. Create shareable, linkable, reusable code modules that developers can pull from a repository to repeat data quality checks and audits, simplifying maintenance and preventing data quality issues. (A small sketch of such a module follows.)
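A minimal sketch of a reusable check registry: checks are defined once and any pipeline can pull them in, so audit logic is not re-implemented per project. All names are illustrative:

```python
# Sketch of a shareable check registry; names are illustrative assumptions.
CHECKS = {}

def register(name):
    """Decorator that adds a check function to the shared registry."""
    def wrapper(fn):
        CHECKS[name] = fn
        return fn
    return wrapper

@register("non_empty")
def non_empty(value):
    return value not in (None, "")

@register("positive")
def positive(value):
    return isinstance(value, (int, float)) and value > 0

def audit(record, rules):
    """Run named checks against fields; returns the failures for reporting."""
    return [(field, check) for field, check in rules
            if not CHECKS[check](record.get(field))]

# Any pipeline reuses the same registry:
print(audit({"name": "", "amount": 12},
            [("name", "non_empty"), ("amount", "positive")]))
# -> [('name', 'non_empty')]
```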

4. Effective data quality indicators. 1) Measurability. 2) Business relevance. 3) Acceptability. 4) Accountability/management system. 5) Controllability. 6) Trend analysis.

5. Statistical process control (SPC). SPC is based on the assumption that when a process with consistent inputs is executed consistently, it produces consistent outputs. It uses measures of central tendency and of variability around a central value to establish tolerances for variation in a process.

SPC measures the predictability of process outcomes by identifying changes in the process.
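A minimal SPC sketch, assuming daily error-rate measurements: establish the center line and 3-sigma control limits from a stable baseline, then flag out-of-control points:

```python
import statistics

# Hypothetical baseline of daily error rates from a stable period.
baseline = [0.021, 0.019, 0.022, 0.020, 0.018]
mean = statistics.mean(baseline)    # central tendency (center line)
sigma = statistics.stdev(baseline)  # variability around the center
ucl, lcl = mean + 3 * sigma, mean - 3 * sigma  # common 3-sigma control limits

# New measurements are compared against the established limits.
for day, value in enumerate([0.021, 0.051, 0.019], start=1):
    flag = "OUT OF CONTROL" if not (lcl <= value <= ucl) else "ok"
    print(f"day {day}: {value:.3f} {flag}")
```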

6. Root cause analysis. Common root cause analysis techniques include Pareto analysis (80/20 rule), fishbone diagram analysis, track and trace, process analysis, and five whys.

3. Implementation Guide

Most data quality program implementations need to plan for:

1) Indicators about the value of data and the cost of low-quality data.

2) An operating model for IT/business interaction. Business people know what the data means and why it matters; IT data custodians know where and how it is stored.

3) Changes in the way projects are executed.

4) Changes to business processes.

5) Provide funding for remedial and improvement projects.

6) Fund data quality operations.

1. Readiness assessment, risk assessment

An organization's readiness to adopt data quality practices can be assessed by the following characteristics:

1) Management commitment to managing data as a strategic asset.

2) The organization's current understanding of data quality, including obstacles and pain points.

3) The actual state of the data. Objectively describing the data conditions that cause pain points is the first step toward improvement; quantification both measures and describes the data.

4) Risks associated with data creation, processing or use.

5) Culture and technology readiness for scalable data quality monitoring. Data quality can be negatively affected by business and technical processes.

2. Organizational and cultural change

Data quality cannot be improved with tools and slogans alone. It improves by helping employees and stakeholders develop a mindset of continuous improvement, in which data quality and the needs of the business and its customers are always considered. Getting an organization to take data quality seriously often requires significant cultural change, and that change demands vision and leadership.

The first step is raising awareness of the role and importance of data to the organization. All employees must handle and raise data quality issues responsibly, demand high-quality data as consumers, and provide quality information to others. Ultimately, for employees to produce higher-quality data and manage data in ways that ensure quality, they need to think and act differently, and that requires training and reinforcement.

Training should focus on:

1) Common causes of data problems.

2) Relationships within the organization's data ecosystem, and why improving data quality requires an enterprise-wide approach.

3) Consequences of bad data.

4) The need for continuous improvement (why improvement is not a one-off).

5) "Data language" is required to explain the impact of data on organizational strategy and success, regulatory reporting and customer satisfaction.


5. Data Quality and Data Governance

Integrating data quality into the overall governance effort enables the data quality program team to work with a range of stakeholders and enablers:

1) Risk and security personnel, who can help identify data-related organizational vulnerabilities.

2) Business process engineering and training staff, who can help teams implement process improvements.

3) Business and operational data stewards and data owners, who can identify critical data, define standards and quality expectations, and prioritize the remediation of data issues.

Governance organizations can expedite the work of data quality initiatives by :

1) Set priorities.

2) Identify and coordinate persons authorized to participate in various data quality-related decisions and related activities.

3) Develop and maintain data quality standards.

4) Report on relevant measures of data quality across the enterprise.

5) Provide guidance that facilitates employee engagement.

6) Establish a communication mechanism for knowledge sharing.

7) Develop and apply data quality and compliance policies.

8) Monitor and report performance.

9) Share data quality inspection results to raise awareness, identify opportunities for improvement, and build consensus on improvements.

10) Resolve changes and conflicts and provide directional guidance.

1. Data quality policy

The data quality policy should include:

1) The purpose, scope and applicability of the system.

2) Definition of terms.

3) Responsibilities of the data quality team.

4) Responsibilities of other stakeholders.

5) Report.

6) The implementation of the strategy, including the associated risks, preventive measures, compliance, data protection and data security, etc.

2. Metrics

Much of the work of the data quality team will focus on quality measurement and reporting.

High-level indicators of data quality include:

1) Return on investment. Statements of the cost of improvement efforts versus the benefits of improved data quality.

2) Quality levels. The number and ratio of errors.

3) Data quality trends.

4) Data issue management metrics. ① Issue counts by classification. ② Issue status (resolved, open, escalated). ③ Issues ranked by priority and severity. ④ Time to resolution.

5) Consistency in service levels.

6) Data quality program overview. Current status and rollout roadmap.


To be continued~

I have also organized a CDMP self-study exchange group, only for those who want to learn data governance or plan to take the CDMP certification exam.

(Because the group already has more than 200 members, you cannot join directly; add me on WeChat with the note CDMP and I will invite you in.)


Recommendation of Popular Articles on Big Data Flow

    From a port coal worker to a state-owned enterprise big data leader: How did the once Internet-addicted teenager do it?

    Big Data Data Governance | WeChat Exchange Group~

    5000 words explain how to get started with data governance (with international data governance certification exam-CDMP study group)

    What exactly is CDMP - a super-comprehensive introduction to the international certification of data governance

    Open Source Data Quality Solutions - Apache Griffin Getting Started

    One-stop Metadata Governance Platform - Datahub Getting Started

    Pre-research on data quality management tools - Griffin VS Deequ VS Great expectations VS Qualitis

    Thousand-character long text - Datahub offline installation manual

    Metadata Management Platform Datahub2022 Annual Review

Big Data Flow: a self-media account on big data, real-time computing, data governance, and data visualization practice. It regularly publishes hands-on articles on data governance and metadata management, and shares related technologies and materials for implementing data governance.

Provide learning exchange groups such as big data introduction, data governance, Superset, Atlas, Datahub, etc.

Big data flows, and the learning of big data technology will never stop.

I'm Dugufeng, if you like my article, I hope you can forward it, like it, watch it and support me, see you in the next article!
