What is data governance? How to get started?

Hello everyone, I am Dugufeng, a former port coal worker who now heads big data at a state-owned enterprise and runs the official account Big Data Flow.

Over the last two years, driven by my company's needs and the direction big data is heading, I have been learning data governance.

As the Internet boom recedes, the focus is shifting from the consumer Internet to the industrial Internet, which gives big data a real role in traditional enterprises. Positions related to data governance are becoming more common. If you already have a foundation in big data technology and data analysis, it is easier to move into these positions, and the pay is usually noticeably better. Data practitioners are in fact the main source of data governance practitioners: there are currently no graduates trained directly for data governance, so everyone in the field has transferred in through learning.

Below is a data governance job posting from a major tech company; take a brief look. Many companies' requirements for data architects now also include data governance skills, and data architect has long been one of the higher-paying positions, which looks set to continue.

[Image: data governance job posting from a major tech company]

What is data governance?

In today's digital age, data plays a vital role, so data governance is becoming more and more important. Data governance can be understood as a set of norms and processes for managing and maintaining data assets within an organization.

The purpose of data governance is to ensure the accuracy, integrity, consistency and reliability of data. It covers all aspects of data collection, storage, processing, sharing and use. Through data governance, organizations can standardize the definition, naming, and classification of data to ensure data standardization and consistency. In addition, data governance also pays attention to the quality of data, including data accuracy, integrity and reliability, and ensures the high quality of data through measures such as data cleaning and verification. At the same time, data governance also involves data security and privacy protection, ensuring data confidentiality and compliance, and preventing data leakage and abuse.

Data governance is important in several ways. First, data is an important asset for organizations and is critical to decision making and business operations. Good data governance ensures the accuracy and consistency of data, which makes decisions more reliable. Data-driven decision-making helps organizations respond to market changes, optimize operations, and innovate. Second, as data grows in size and complexity, compliance and security become key issues. Data governance helps organizations comply with relevant regulations and industry standards and reduces the risk of data leakage. It also improves the organization's data security and protects the confidentiality of sensitive data.

Additionally, data governance can facilitate data sharing and collaboration. Within an organization, different departments and teams may use different data sources and definitions, leading to data inconsistencies and conflicts. Through data governance, shared data dictionaries and specifications can be established to promote data unification and collaboration. This helps cross-departmental communication, avoids data and information silos, and improves organizational efficiency and innovation.

Data governance is a key practice for managing and maintaining data assets within an organization. It not only focuses on data accuracy, consistency and reliability, but also data security and compliance. Through good data governance, organizations can ensure the quality and reliability of data, support decision making and business operations, and improve efficiency and innovation. In today's data-driven era, the importance of data governance cannot be ignored, and it has great implications for an organization's success and competitive advantage.

For example

Let us take a multinational retail enterprise as an example to illustrate the concept of data governance.

Assume that the multinational retail business operates in multiple countries and has both online and brick-and-mortar stores. The enterprise collects a large amount of data, including sales data, customer data, inventory data and so on. In this context, data governance is a key practice to ensure consistency and reliability in the management and use of data.

First, data governance involves specifying data definitions and standards. In this example, data governance would clearly define different types of data, such as sales data, customer data, and product data. For example, sales data might include fields for order number, date, sales amount, and so on. These definitions and standards ensure a consistent understanding of data across different teams and systems, avoiding confusion and errors.
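
To make the idea of shared definitions concrete, here is a minimal sketch of a data dictionary entry for sales data and a check that a record follows it. The class name, field names, and types are assumptions made for illustration; the article only names order number, date, and sales amount as example fields.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class FieldDefinition:
    """One entry in a shared data dictionary (hypothetical structure)."""
    name: str
    dtype: type
    description: str
    required: bool = True


# Illustrative definition of "sales data"; the field names are assumptions.
SALES_RECORD = [
    FieldDefinition("order_number", str, "Unique identifier of the order"),
    FieldDefinition("order_date", date, "Calendar date the order was placed"),
    FieldDefinition("sales_amount", Decimal, "Order total in the store currency"),
    FieldDefinition("store_country", str, "Country of the store", required=False),
]


def conforms(record: dict, definition: list) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for spec in definition:
        value = record.get(spec.name)
        if value is None:
            # Missing optional fields are allowed; missing required ones are not.
            if spec.required:
                problems.append(f"missing required field: {spec.name}")
            continue
        if not isinstance(value, spec.dtype):
            problems.append(f"{spec.name}: expected {spec.dtype.__name__}")
    return problems
```

Keeping the definition as data rather than scattering it through code means every team and system can validate against the same source.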

Second, data governance focuses on data quality and data cleaning. This means validating, verifying, and cleaning data to ensure its accuracy and integrity. In this example, data governance can identify and correct erroneous sales records and weed out duplicate or incomplete customer data, improving data quality and avoiding wrong decisions based on inaccurate data.
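
A cleaning pass of this kind can be sketched in a few lines. The rules below are assumptions for illustration only: each row is a dict, `customer_id` and `email` are treated as required, and the first occurrence of a `customer_id` is kept.

```python
def clean_customers(rows):
    """Split customer rows into (clean, rejected) lists.

    Assumptions for this sketch: "customer_id" and "email" are required,
    and the first row seen for a given customer_id wins.
    """
    required = ("customer_id", "email")
    seen = set()
    clean, rejected = [], []
    for row in rows:
        if any(not row.get(key) for key in required):
            # A missing or empty required value makes the row incomplete.
            rejected.append((row, "incomplete"))
        elif row["customer_id"] in seen:
            rejected.append((row, "duplicate"))
        else:
            seen.add(row["customer_id"])
            clean.append(row)
    return clean, rejected
```

Returning the rejected rows together with a reason, instead of silently dropping them, leaves a trail that can be reviewed and corrected at the source.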

Third, data governance involves data security and privacy protection. For multinational retail enterprises, data governance needs to ensure the confidentiality and compliance of customer data. This may involve taking security measures to prevent data leakage and unauthorized access, while complying with applicable privacy laws and regulations.

Fourth, data governance covers the control of data access and sharing. In this example, data governance sets access permissions and roles so that only authorized employees have access to certain types of data. It can also establish rules and processes for data sharing, so that data can be safely shared between different teams or departments, facilitating collaboration and decision-making.
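
Role-based access of this kind reduces to a lookup from role to permitted data categories. The roles and categories below are hypothetical, and a real deployment would load them from a policy store rather than hard-code them.

```python
# Hypothetical roles and data categories, hard-coded only for illustration.
ROLE_PERMISSIONS = {
    "sales_analyst": {"sales"},
    "marketing": {"sales", "customer"},
    "inventory_manager": {"inventory"},
}


def can_read(role: str, data_category: str) -> bool:
    """Return True only if the role is explicitly granted the category."""
    # Unknown roles get an empty permission set, so access is denied by default.
    return data_category in ROLE_PERMISSIONS.get(role, set())
```

The deny-by-default behaviour is the design choice worth noting: a role or category that nobody has defined yet never gains access by accident.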

Data governance plays a key role in this multinational retail enterprise. It ensures data consistency, accuracy and completeness, improving data quality and reliability. It also ensures data security, privacy protection, and compliance with relevant regulations. Through data governance, this enterprise can better manage and utilize data assets to support decision making, optimize operations, and succeed in a highly competitive market.

How to get started?

Getting started with data governance is not easy. There is a lot of work to do, such as:

  1. Understand the basic concepts of data governance: Before starting to learn data governance, it is important to understand the definition, goals and basic principles of data governance. You can read related books, articles, or online resources to gain a basic understanding of data governance.

  2. Learn best practices in data governance: Study data governance best practices and industry standards to understand successful data governance frameworks and methodologies. Learn about the key components of data governance, such as data quality management, metadata management, security and privacy protection, and more.

  3. Assess your organization's current situation: Understand your organization's data management situation, and assess the status of existing data management processes, data quality, and security. Identify data governance pain points and opportunities to prioritize improvements.

  4. Develop a data governance strategy: Based on the organization's needs and goals, formulate a data governance strategy and plan suitable for the organization. This includes clarifying the objectives, scope, processes, and assignment of responsibilities for data governance.

  5. Establish a data governance team: Form a cross-functional data governance team, including business representatives, data management experts, and technical staff. Ensure the team has the skills and knowledge required for data governance and is accountable for driving the execution of the data governance program.

  6. Determine the data governance process: Formulate data governance processes and specifications, including data collection, storage, cleaning, sharing, and security. Ensure data processes are aligned with data governance policies and best practices.

  7. Implement data quality management: Establish a data quality management mechanism, including data quality assessment, data cleaning and correction, and data monitoring and reporting. Ensure data accuracy, consistency and integrity.

  8. Adopt metadata management: Establish a metadata management system to record and manage information such as the definition, structure, relationships and use of data. Metadata management facilitates better understanding and utilization of data and supports data governance processes.

  9. Strengthen data security and privacy protection: Formulate data security policies and measures to ensure data confidentiality, integrity and availability. At the same time, comply with relevant regulations and compliance requirements to protect user privacy.

  10. Continuous monitoring and improvement: Data governance is an ongoing process. Establish a monitoring mechanism, regularly evaluate the performance of data governance, and make improvements and optimizations based on the evaluation results.
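
Steps 7 and 10 both depend on measuring quality regularly. As a rough sketch, the snippet below computes one common metric, completeness, and reports the columns that fall below a target. The function names and thresholds are assumptions for illustration, not part of any framework mentioned in this article.

```python
def completeness(rows, column):
    """Share of rows in which `column` is present and non-empty (0.0 to 1.0)."""
    if not rows:
        return 1.0  # Nothing to check, so nothing is missing.
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def check_rules(rows, thresholds):
    """Return {column: score} for columns whose completeness is below target."""
    failures = {}
    for column, minimum in thresholds.items():
        score = completeness(rows, column)
        if score < minimum:
            failures[column] = score
    return failures
```

Running a check like this on a schedule and tracking the scores over time is one simple way to turn "continuous monitoring" into something measurable.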

As you can see, there is a lot to learn in data governance, so theory and practice should be studied in parallel.

On the theory side, the mainstream international data governance frameworks include the ISO data governance standards, the DGI data governance framework, and the DAMA data management framework. Understanding them helps us establish a data governance system that meets the enterprise's own business needs.

DAMA (Data Management Association International) is a global non-profit association of volunteer data management and business professionals dedicated to the research and practice of data management. Its book "DAMA Data Management Body of Knowledge Guide" (DAMA-DMBOK for short) is regarded by the industry as the "Bible of Data Management", and the second edition, DAMA-DMBOK2, has been published.

In China, research on data governance frameworks and standard systems started relatively late. At present, there are two main standards: GB/T 34960 and DCMM.

GB/T 36073-2018 "Data Management Capability Maturity Assessment Model" (DCMM) is a national standard compiled by the National Information Technology Standardization Technical Committee under the guidance of the National Standardization Management Committee. It was released and took effect in 2018.

DCMM analyzes and summarizes data management capabilities along four dimensions (organization, system, process, and technology) and distills eight process areas of organizational data management: data strategy, data governance, data architecture, data application, data security, data quality, data standards, and data life cycle.

DCMM divides an organization's data capability maturity into five development levels: initial level, managed level, robust level, quantitative management level, and optimized level to help organizations evaluate the maturity of data management capabilities.
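
To show how the five levels and eight areas fit together, here is a small sketch that records a self-assessment and finds the weakest areas. It is not the official DCMM scoring method: the `weakest_domains` helper and the idea of rating each area independently are assumptions for illustration.

```python
from enum import IntEnum


class MaturityLevel(IntEnum):
    """The five DCMM levels, ordered from lowest to highest."""
    INITIAL = 1
    MANAGED = 2
    ROBUST = 3
    QUANTITATIVELY_MANAGED = 4
    OPTIMIZED = 5


# The eight areas named in the standard, as listed in this article.
DOMAINS = (
    "data strategy", "data governance", "data architecture", "data application",
    "data security", "data quality", "data standards", "data life cycle",
)


def weakest_domains(assessment):
    """Return, in alphabetical order, the areas rated at the lowest level."""
    lowest = min(assessment.values())
    return sorted(name for name, level in assessment.items() if level == lowest)
```

Using an ordered enumeration lets levels be compared directly, which is convenient when deciding which area to improve first.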

At present, the most authoritative and practical system is the DAMA data management system, which is why you often hear DAMA-related vocabulary when learning data governance.

Because DAMA is the most authoritative data governance framework, mastering its body of knowledge and combining it with practice is enough for a basic introduction to data governance. After several years of accumulated experience in an enterprise, you can also become a data governance expert.

Theoretical study

For theoretical study, I recommend taking the CDMP international data governance certification exam. Having a certificate really does help prove your professionalism in data governance and related fields.

There are in fact many data governance certifications now, and I have shared some of them before, such as a certain data governance certification and a certain data manager certification.

Because China's Ministry of Industry and Information Technology has not yet issued a professional qualification for big data or data governance comparable to that of a registered professional engineer, the more authoritative option is still the international data governance certification, which is recognized both in China and abroad.

DAMA Data Management Professional Certification CDMP

Please remember the abbreviation CDMP: this is the international professional data governance certification.

There are four levels in total. Most companies do not require a particular level, and reaching the A level is already a strong result.

The four levels are distinguished as follows:

[Image: comparison of the four CDMP levels]

At present, more and more employers recognize CDMP, and some job postings list the certificate as a direct requirement.

The certificate looks like this:

[Image: sample CDMP certificate]

Just as important, preparing for the exam pushes you to learn the relevant theory while you earn the certificate.

Practical learning

Data governance as a whole is driven from the top level and starts from the business side. Novices, however, should pay more attention to the hands-on work of data governance.

Metadata management is the starting point for data governance.

Simply put, metadata management is about organizing data assets effectively.

It uses metadata to help organizations manage their data, and it helps data professionals collect, organize, access and enrich metadata to support data governance.

Thirty years ago, a data asset might be a table in an Oracle database. In the modern enterprise, however, we have a dizzying array of different types of data assets. It could be a table in a relational database or NoSQL store, real-time streaming data, a feature in an AI system, a metric in a metrics platform, or a dashboard in a data visualization tool. Modern metadata management should embrace all of these types of data assets and enable data workers to use them more efficiently to get their work done.

Therefore, metadata management should have the following functions:

  • Search and discovery: data tables, fields, tags, usage information

  • Access control: access control groups, users, policies

  • Data lineage: pipeline executions, queries

  • Compliance: taxonomy of data privacy/compliance annotation types

  • Data management: data source configuration, ingestion configuration, retention configuration, data purge policies

  • AI explainability and reproducibility: feature definitions, model definitions, training run executions, problem statements

  • Data operations: pipeline executions, processed data partitions, data statistics

  • Data quality: data quality rule definitions, rule execution results, data statistics
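
Two of these functions, search and lineage, can be illustrated with a toy in-memory catalog. This is a sketch under stated assumptions, not the API of Atlas, DataHub, or any other platform: the `Asset` and `Catalog` classes and their methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    kind: str  # e.g. "table", "stream", "dashboard"
    tags: set = field(default_factory=set)
    upstream: list = field(default_factory=list)  # names of source assets


class Catalog:
    """Toy metadata catalog supporting search and lineage lookup."""

    def __init__(self):
        self._assets = {}

    def register(self, asset):
        self._assets[asset.name] = asset

    def search(self, term):
        """Names of assets whose name contains the term or that carry it as a tag."""
        term = term.lower()
        return sorted(
            a.name for a in self._assets.values()
            if term in a.name.lower() or term in {t.lower() for t in a.tags}
        )

    def lineage(self, name):
        """All transitive upstream asset names, without duplicates."""
        found = []
        stack = list(self._assets[name].upstream)
        while stack:
            current = stack.pop()
            if current in found:
                continue  # Already visited; also guards against cycles.
            found.append(current)
            if current in self._assets:
                stack.extend(self._assets[current].upstream)
        return found
```

Because every asset type shares the same record shape, a table, a stream, and a dashboard can all be searched and traced through one interface, which is the point the paragraph above makes about modern metadata management.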

The current mainstream metadata management platforms include Apache Atlas and DataHub. The following is a feature comparison.

[Image: feature comparison of metadata management platforms]

Learning in this area should be practice-based: the more hands-on work you do, the more proficient you become.

Open source frameworks for data governance keep emerging, and I have been keeping an eye on them.

For more on data governance, follow the Big Data Flow official account, or add the author on WeChat to discuss in detail.

Origin blog.csdn.net/xiangwang2206/article/details/131651112