A Comprehensive Reference for Building the 8 Core Modules of Data Governance

Data governance is a systems engineering effort characterized by decentralization and multi-party participation. A comprehensive, clearly defined data governance system helps an organization build an ecological, collaborative governance path, maximize overall data quality, realize its data strategy, and unlock new productivity.

Taking the eight modules of metadata, data standards, master data, data exchange, data assets, data quality, data security, and data lifecycle as its core thread, this article lays out a comprehensive guide to building a data governance system, in the hope of serving as a reference for government and enterprise data governance.

01

Metadata

1. Definition

Metadata is data that describes other data, i.e., "data about data".

Metadata management refers to the activities concerned with ensuring that metadata is properly created, stored, and controlled so that data is defined consistently across the enterprise.

2. Types

Metadata is divided into business metadata, technical metadata and operational metadata.

The three are closely related: business metadata guides technical metadata, technical metadata is designed with reference to business metadata, and operational metadata supports the management of both.

3. Five steps of metadata management

(1) Define the metadata strategy: the enterprise starts with a metadata strategy plan, involves key stakeholders and departments, evaluates existing metadata resources and the information architecture, conducts interviews with key employees, and formulates reasonable strategic goals.

(2) Understand metadata requirements: a metadata management solution must satisfy specific functional requirements, such as update frequency, synchronization, historical information, access rights, storage structure, inheritance requirements, operations and maintenance requirements, management requirements, quality requirements, and security requirements.

(3) Define the metadata architecture: metadata architectures generally fall into three types: centralized, distributed, and hybrid. Each suits different scenarios, and enterprises should choose according to their own conditions.

(4) Create and maintain metadata: the organization should sort out and integrate metadata across the enterprise, combining technical metadata with business, process, and management metadata, and standardizing metadata processing so that it is easy to understand and analyze.

(5) Query, report on, and analyze metadata: the metadata repository should have front-end applications that support query and retrieval, so as to meet the needs of the various data asset management activities.

4. Metadata management applications

(1) Data asset map: a panoramic map of enterprise data assets generated automatically from the metadata dictionary. It displays the various kinds of metadata and the data processing flows visually, serving different business analysis needs.

(2) Metadata lineage: the chain of connections between different data. When erroneous data is found downstream, lineage can be traced back to quickly locate the upstream data source, understand the processing steps along the way, and find the cause of the error.

(3) Metadata impact analysis: tells us where data goes, what processing it undergoes, and which applications, databases, or departments use it. When a data problem occurs, impact analysis reveals the propagation chain of the erroneous data so that the downstream results it corrupted can be fixed quickly (a minimal sketch of both analyses follows).
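To make lineage tracing and impact analysis concrete, here is a minimal sketch over a hypothetical table-level lineage graph (the table names and the in-memory dictionary are illustrative; a real metadata platform would load this graph from its repository):

```python
# Hypothetical table-level lineage: each edge points from a source table
# to the tables derived from it.
LINEAGE = {
    "ods_orders": ["dwd_orders"],          # raw layer feeds detail layer
    "dwd_orders": ["dws_order_stats"],     # detail layer feeds summary layer
    "dws_order_stats": ["report_sales"],   # summary layer feeds the report
}

def downstream(table: str) -> set[str]:
    """Impact analysis: every table that consumes `table`, directly or not."""
    seen, stack = set(), [table]
    while stack:
        for child in LINEAGE.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def upstream(table: str) -> set[str]:
    """Lineage tracing: every table that feeds `table`, directly or not."""
    reverse: dict[str, list[str]] = {}
    for src, targets in LINEAGE.items():
        for tgt in targets:
            reverse.setdefault(tgt, []).append(src)
    seen, stack = set(), [table]
    while stack:
        for parent in reverse.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(upstream("report_sales"))   # trace a bad report back to its sources
print(downstream("ods_orders"))   # see everything a bad load would affect
```

The same traversal answers the lineage question when run upstream and the impact question when run downstream; production platforms layer column-level edges and job metadata on top of this idea.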

02

Master Data

1. Definition

Master data is the basic organizational information that supports cross-departmental business collaboration and reflects the state attributes of core business entities.

Master data management refers to the set of specifications, technologies, and solutions used to generate and maintain enterprise master data, ensuring its integrity, consistency, and accuracy.

2. Master data project management implementation framework

Master data project management is implemented in four steps: current-state analysis and evaluation, planning and management system design, construction and implementation planning, and platform deployment.

3. Ten key activities in the implementation stage of a master data project

(1) Master data standardization system

Taking materials as an example, a complete material standardization system consists of two parts: the formulation of material data standards, and the construction of the basic capabilities needed to manage material data standardization.

(2) Classification design principles

There are four classification design principles: no repetition and no omission; reasonable granularity; meeting business needs; conforming to industry conventions.

(3) Code design

Code design must follow the principles of global applicability, uniqueness, moderation, flexibility, and scalability. Each encoding method has its own advantages and disadvantages; a minimal sketch of one possible scheme follows.
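As an illustration only: a material code built from a two-letter category prefix, a zero-padded sequence, and a simple modulus-10 check digit, so that mistyped codes can be caught on entry (the segment layout is hypothetical, not prescribed here):

```python
def material_code(category: str, seq: int) -> str:
    """Build a code: 2-letter category prefix + 6-digit sequence + check digit.
    The segment layout is hypothetical; real projects define their own."""
    body = f"{category.upper():2.2}{seq:06d}"
    check = sum(ord(c) for c in body) % 10   # simple modulus-10 check digit
    return f"{body}{check}"

def is_valid(code: str) -> bool:
    """Recompute the check digit to catch typos when a code is keyed in."""
    body, check = code[:-1], code[-1]
    return check == str(sum(ord(c) for c in body) % 10)

code = material_code("rm", 42)
print(code, is_valid(code))        # RM0000423 True
print(is_valid("RM0000424"))       # False -- a one-digit slip is detected
```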

(4) Sorting out attribute standards: attributes can be sorted out at three levels: business standards, technical standards, and management standards.

(5) Management and control process design: process review and verification are carried out while the business system is being built.

(6) Historical data integration and cleaning: divided into six steps: data access, preliminary marking, classification and cleaning, splitting before merging, sorting and cleaning, and inspection and feedback.

(7) Data switching strategy: there are three common data switching strategies, each with its own advantages and disadvantages.

(8) Data production and maintenance strategies: there are two types, centralized and distributed.

(9) Master data distribution strategy: There are three ways to distribute master data.

(10) Example of master data integration

03

Data Standards

1. Definition

Data standards are the normative constraints that ensure consistency and accuracy when data is used and exchanged inside and outside the organization. Data standard management is a system composed of management policies, control processes, and technical tools; by promoting this system, unified data definitions, classifications, record formats, conversions, and encodings are applied to achieve data standardization.

2. Classification

(1) Business standards: generally include the business definition, the standard's name, its classification, and so on.

(2) Technical standards: look at data standards from a technical point of view, covering data type, length, format, encoding rules, and so on.

(3) Management standards: for example, who manages a data standard, how entries are added or deleted, and under what conditions the standard can be accessed; these all belong to the data specification requirements seen from the management perspective.

3. Implementation steps of data standard management

[Figure: Data standard implementation flowchart]

(1) Formulate goals and define scope: the organization first formulates data standard goals and clarifies its strategic direction, then develops data standards according to its own management and business development needs.

(2) Data standard research: survey and summarize data standard management across the whole organization. By investigating the current state of enterprise data standards, find out which systems have serious data standard problems and which fields fail to meet the standards, providing support and guidance for the subsequent implementation.

(3) Clarify the organization and process: determine the data standard management roles (data governance committee, data standard management posts, data standard specialists, the IT project team, etc.) and formulate the processes for standard change, implementation, and management, to ensure the data standard project actually advances.

(4) Compile and release data standards: collect national and industry standard requirements, combine them with the company's own management and business requirements, and, after coordination among the business, technology, and management departments, produce the initial version of the data standard management document. After review, the final version of the data standard is released.

(5) Data standard publicity: organize internal data standard publicity meetings to raise awareness of data standard management, improve users' proficiency, and enable data standards to be implemented better and faster, thereby realizing their full value.

(6) Put the data standards into operation on a platform: enter the formulated data standards into the corresponding data standard platform, check the effect along the management, technology, and business dimensions, make appropriate modifications to meet most requirements, and put the standards to use in real scenarios (a minimal validation sketch follows). In addition, data standards should be evaluated regularly and improved continuously so that they remain suited to enterprise management and operations.
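As a sketch of what platform-side checking might look like, here is a minimal validator for the technical dimension of a standard (the two standard entries, field names, and patterns are hypothetical):

```python
import re

# Hypothetical technical-standard entries: data type, max length, format.
STANDARDS = {
    "customer_id": {"type": str, "max_len": 10, "pattern": r"^C\d{9}$"},
    "phone":       {"type": str, "max_len": 11, "pattern": r"^\d{11}$"},
}

def check_field(name: str, value) -> list[str]:
    """Return the list of standard violations for a single field value."""
    std, errors = STANDARDS[name], []
    if not isinstance(value, std["type"]):
        errors.append(f"{name}: expected {std['type'].__name__}")
    elif len(value) > std["max_len"]:
        errors.append(f"{name}: length {len(value)} exceeds {std['max_len']}")
    elif not re.match(std["pattern"], value):
        errors.append(f"{name}: value {value!r} violates the format rule")
    return errors

print(check_field("customer_id", "C123456789"))  # [] -> conforms
print(check_field("phone", "12345"))             # format violation
```

Running checks like these continuously against production systems is one way to "check the effect" along the technical dimension mentioned above.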

04

Data Quality

1. Definition

Data quality means that, in a business environment, data serves the purposes of its consumers; data quality must meet the specific needs of the business scenario. It covers two aspects: the quality of the data itself and the quality of the processes around the data.

Data quality management means carrying out a series of management activities, such as identification, measurement, monitoring, and early warning, for the quality problems that may arise at each stage of the data lifecycle (planning, acquisition, sharing, maintenance, application, and retirement), and further improving data quality by raising the organization's management maturity.

2. Four common data quality problems

(1) Missing data: important fields are left unfilled.

For missing data, enterprises can locate the unfilled data and its related attributes through simple statistical analysis and fill in plausible values.

(2) Abnormal data: data that differs greatly from usual business and management figures, distorting the conclusions drawn from analysis.

For abnormal data, historical data can serve as the basis for setting maximum and minimum values; the system then judges whether a variable exceeds the reasonable range and raises an automatic alarm when it does.

(3) Inconsistent data: the same data, distributed across multiple systems, disagrees after integration and aggregation.

For inconsistent data, the enterprise can pay attention to the data extraction rules, identifying, correcting, and merging records that are mostly the same but inconsistent.

(4) Duplicate or erroneous data: some records are counted more than once, or fields are filled in incorrectly.

For duplicate data, enterprises can set filter restrictions in the system to remove the duplicates (a combined sketch of these checks follows this list).
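A minimal sketch covering three of these checks on a toy record set (field names, thresholds, and the duplicate key are all illustrative; cross-system consistency checks would additionally compare the same key across sources):

```python
# Illustrative records; an `amount` outside [0, 10000] counts as abnormal.
records = [
    {"id": 1, "name": "Alice", "amount": 120.0},
    {"id": 2, "name": None,    "amount": 99999.0},  # missing name, abnormal amount
    {"id": 1, "name": "Alice", "amount": 120.0},    # duplicate of the first row
]

def quality_report(rows, lo=0.0, hi=10000.0):
    issues, seen = [], set()
    for i, row in enumerate(rows):
        if any(v is None for v in row.values()):
            issues.append((i, "missing value"))
        if not lo <= row["amount"] <= hi:
            issues.append((i, "abnormal value"))
        key = (row["id"], row["name"])              # illustrative dedup key
        if key in seen:
            issues.append((i, "duplicate record"))
        seen.add(key)
    return issues

print(quality_report(records))
# [(1, 'missing value'), (1, 'abnormal value'), (2, 'duplicate record')]
```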

3. Six dimensions of data quality evaluation

The National Information Technology Standardization Technical Committee has proposed data quality evaluation indicators (GB/T 36344-2018) covering six dimensions: completeness, consistency, accuracy, timeliness, uniqueness, and accessibility.

4. Seven steps of data quality management

(1) Define high-quality data

Align goals and priorities for data quality improvement through a thorough understanding of the relevant pain points, risks, and business drivers, as well as the landscape of business processes and systems, the technical architecture, and data dependencies.

(2) Define the data quality strategy

Data quality priorities must align with business strategy, and defining a data quality framework can help guide strategy and data quality management activities.

(3) Identify key business and quality rules

Data can be prioritized by importance based on factors such as regulatory requirements, financial value, and immediate impact on customers. After identifying the critical data, identify the business rules that capture its required data quality characteristics.

(4) Perform an initial data quality assessment

After identifying key business needs and data, perform an initial data quality assessment to understand the data, define an actionable improvement plan, and use the assessment results to identify issues and priorities as the basis for data quality planning.

(5) Identify improvement directions and determine priorities

After the initial data quality assessment, identify potential improvement measures and prioritize them, either through comprehensive analysis of large data sets to understand the breadth of each problem, or through communication with stakeholders to analyze its business impact, with priorities settled in a final discussion.

(6) Define data quality improvement goals

Set specific, achievable goals based on quantifying the business value of data quality improvements.

(7) Develop and deploy data quality operations

To ensure data quality, develop an implementation plan around the data quality program: manage data quality rules and standards, monitor how consistently data conforms to the rules, identify and manage data quality problems, and report quality levels.

05

Data Assets

1. Definition

Data assets are data resources that can generate value for the organization; forming them requires active management and effective control of data resources. Data asset management refers to the set of activities and functions that plan, control, and provision data assets, including developing, implementing, and monitoring the plans, policies, programs, projects, processes, methods, and procedures related to data, so as to control, protect, deliver, and increase the value of data assets.

2. Data asset inventory

(1) Top-down inventory

From the business perspective, comprehensively analyze the enterprise's relevant system documents, information systems, business processes, business documents, and so on, decompose them layer by layer, and sort out a three-level catalog of data assets along with their business and management attributes.

(2) Bottom-up inventory

From the technical perspective, start from the IT systems, database tables, and data structures, work upward by induction, and gradually clarify the technical attributes of the data assets.

Combining the top-down and bottom-up inventories establishes the mapping between the business and technical perspectives, forming a complete data resource catalog.

3. Data asset catalog

The data asset catalog answers a series of questions: where the data is, who is responsible for it, and how it is used (a minimal sketch follows). A practical, friendly data asset catalog opens up the query/retrieval path, connects basic data with indicator data, and, through technologies such as artificial intelligence and machine learning, better supports data exploration and recommendation of related assets.
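A minimal sketch of a catalog entry and keyword lookup that answer exactly those questions (the entries, owners, and locations are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One data asset: where it lives, who owns it, how it is described."""
    name: str
    location: str                 # e.g. database.schema.table
    owner: str                    # the accountable person or team
    tags: list = field(default_factory=list)

CATALOG = [
    CatalogEntry("Customer master", "crm.public.customers", "CRM team",
                 ["master data", "customer"]),
    CatalogEntry("Daily sales summary", "dw.dws.order_stats", "BI team",
                 ["report", "sales"]),
]

def search(keyword: str) -> list[CatalogEntry]:
    """Keyword lookup across names and tags: the query/retrieval entry point."""
    kw = keyword.lower()
    return [e for e in CATALOG
            if kw in e.name.lower() or any(kw in t for t in e.tags)]

for entry in search("sales"):
    print(entry.name, "->", entry.location, "owned by", entry.owner)
```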

4. Four steps of data asset management

(1) Overall planning: the first stage of data asset management covers three steps: assessing management capabilities, releasing the data strategy, and establishing a corporate accountability system, laying the foundation and setting the direction for subsequent management and operations.

(2) Management implementation: the goal of the second stage is to carry out the full range of data asset management activities, establishing a rule system, relying on data asset management platform tools, and taking the data lifecycle as the main line, so that the results of the first stage land in practice. This stage consists of four steps: establishing a standards system, building the management platform, managing the entire process, and innovating data applications.

(3) Audit and inspection: this stage ensures that the management functions established during implementation actually take effect. It includes tasks such as checking the implementation of data standards, auditing data quality, and supervising the data lifecycle.

(4) Asset operation: through the first three stages, the enterprise has established basic data asset management capabilities and the ability to derive value from data. Asset operation is the final stage, in which data asset management realizes value; it includes data asset value evaluation, and data asset operation and circulation.

06

Data Exchange

1. Definition

Data exchange and sharing enable users in different places, on different computers and software, to read one another's data and perform various operations and analyses on it.

2. Data exchange and sharing methods

(1) Electronic or digital file transfer

Data can be exchanged by electronic or digital file transfer, moving files between two systems via a file transfer (communication) protocol. Organizations need to consider the security risks of different protocols; secure options include FTPS, HTTPS, and SCP (a minimal FTPS sketch follows this item).
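As a minimal sketch of one of these options, the following uploads a file over FTPS using Python's standard ftplib (the host, credentials, and paths are placeholders):

```python
from ftplib import FTP_TLS

# Placeholders: substitute the real exchange endpoint and credentials.
HOST, USER, PASSWORD = "ftps.example.org", "exchange_user", "change-me"

def send_file(local_path: str, remote_name: str) -> None:
    """Upload one file over FTPS with the data channel encrypted."""
    with FTP_TLS(HOST) as ftps:
        ftps.login(USER, PASSWORD)
        ftps.prot_p()            # upgrade the data connection to TLS
        with open(local_path, "rb") as f:
            ftps.storbinary(f"STOR {remote_name}", f)

# send_file("report.csv", "incoming/report.csv")  # needs a live endpoint
```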

(2) Portable storage device

In some cases it may be necessary to exchange data on a portable storage device, such as removable media (a DVD or a USB drive). Organizations need to consider the impact level of the data being transferred, and of the receiving system, to determine whether adequate measures protect the exchanged data.

(3) Email

Organizations often share data as email attachments. They need to consider the impact level of the data and the security controls already in place in the participating organizations' email infrastructure to determine whether the exchanged data is adequately protected; for example, an email infrastructure protected only to a medium impact level is insufficient for high-impact data.

(4) Database

Database sharing, or the exchange of database transaction information, includes giving users from another organization access to the data. Organizations should consider the feasibility of providing data access rather than transferring the data, reducing the risk of duplicate data sets and of losing data confidentiality and integrity.

(5) File sharing service

File sharing services include, but are not limited to, sharing and accessing data through web-based file sharing or storage (such as Dropbox, Google Drive, Microsoft Teams, or Microsoft OneDrive). With such a system, data owners typically have no visibility into where the servers are located and no physical or logical control over the facilities, servers, and data.

3. Five principles of data exchange and sharing

(1) Consistency principle: before a data sharing service is offered, the authoritative source unit of each data item must be determined; that unit is responsible for the data's accuracy and consistency. Reducing data "movement" reduces the inconsistencies caused by secondary downstream transmission.

(2) Black-box principle: data users should not need to understand technical details in order to consume the different types of data sharing services.

(3) Agile response principle: once a data sharing service is built, data users need not repeatedly build integration channels; they obtain data quickly by "subscribing" to the service.

(4) Self-service principle: providers of data sharing services need not care how users "consume" the data, avoiding the problem of continuous supplier development failing to keep up with users' flexible, changing demands.

(5) Traceability principle: the use of every data sharing service can be managed, so that data suppliers know accurately and promptly "who" is using their data, ensuring it is used appropriately (a minimal sketch combining subscription and audit logging follows).
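A minimal sketch tying the agile-response and traceability principles together: consumers subscribe once, and every delivery is logged so the supplier knows who used what (the service and consumer names are hypothetical):

```python
from datetime import datetime

class SharingService:
    """Consumers subscribe once; every delivery is recorded for audit."""
    def __init__(self, dataset: str):
        self.dataset = dataset
        self.subscribers: set[str] = set()
        self.audit_log: list[tuple] = []

    def subscribe(self, consumer: str) -> None:
        self.subscribers.add(consumer)       # agile response: one-time setup

    def deliver(self, payload: dict) -> None:
        for consumer in self.subscribers:
            # Traceability: record who received which dataset, and when.
            self.audit_log.append((datetime.now(), consumer, self.dataset))
            # ...the actual push to the consumer would happen here...

service = SharingService("customer_master")
service.subscribe("billing_system")
service.deliver({"id": 1, "name": "Alice"})
print(service.audit_log)   # [(timestamp, 'billing_system', 'customer_master')]
```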

07

Data Security

1. Definition

Data security refers to taking the necessary measures to keep data effectively protected and lawfully used, and to maintaining that secure state continuously.

Data security governance starts from factors such as compliance requirements, business development needs, and risk tolerance, and relies on data security management and technical capabilities to build a security mechanism in which business and security develop together.

2. Data security management capabilities

(1) Organizational governance

The data security governance organization can adopt a five-layer structure: a decision-making layer, a management layer, an execution layer, a supervision layer, and a participation layer.

(2) Institutional governance

The data security policy and institutional system is mainly constructed at four levels.

3. Data security technology capabilities

Data security technical capability governance is mainly about building technical measures: taking the appropriate protection measures at each stage of the data lifecycle, including intelligent identification, data classification and grading, database auditing, encrypted transmission, data leakage prevention, data masking (desensitization), data watermarking, user behavior analysis, knowledge graphs, and so on (a minimal masking sketch follows).
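Two of these measures are easy to illustrate; here is a minimal sketch of data masking for display and one-way pseudonymization for analytics (the salt and formats are illustrative, and a production salt must be kept secret):

```python
import hashlib

def mask_phone(phone: str) -> str:
    """Partial masking: keep the head and tail digits, hide the middle."""
    return phone[:3] + "****" + phone[-4:]

def pseudonymize(value: str, salt: str = "org-wide-secret") -> str:
    """One-way hash so records can be joined on an ID without exposing it."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

print(mask_phone("13812345678"))   # 138****5678
print(pseudonymize("C123456789"))  # a stable 16-hex-character pseudonym
```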

4. Data security operation capability

By building mechanisms for discovering and handling hidden data security risks, for data security risk assessment, for emergency response, and for security monitoring and auditing, a standardized, process-driven, intelligent long-term security operation system takes shape.

08

Data Lifecycle

1. Definition

The data lifecycle is the course a collection of data takes from generation or acquisition to destruction. It is divided into stages: collection, storage, integration, presentation and use, analysis and application, archiving, and destruction.

Data lifecycle management is a policy-based approach to managing the flow of data in an information system throughout its lifecycle: from data creation and initial storage until it is deleted or destroyed when it becomes obsolete.

2. Common data lifecycle management models

A data lifecycle management model defines a macro framework: a panoramic view of data from its production stage to its retirement. In the field of data management, many researchers in academia and industry have proposed different data lifecycle management models.

3. Four phases of data lifecycle management

(1) "Entry" period

This stage covers more than the creation and reception of data; effective data asset management should begin before data is generated. First comes planning: a data asset inventory, a data governance plan, a data demand plan, and so on. Then data standards are defined and data management specifications formulated, ensuring data is generated according to the standards and quality is controlled at the source.

(2) "Deposit" period

This stage must store and process diverse data structures: structured, semi-structured, and unstructured data, in both batch and streaming forms. Facing different data structures, data forms, timeliness, performance requirements, and storage and computing costs, appropriate storage formats and computing engines should be chosen.

(3) "use" period

Data appreciates due to use, and this stage is the cycle in which data really generates value. During the period of "use", special emphasis should be placed on "data reuse", which is very important for saving costs and improving efficiency. In the future, an important indicator for enterprises or organizations to evaluate whether a data product is worth developing should be whether it can be reused.

(4) "out" period

The "out" phase is to save the data whose life cycle is coming to an end to low-performance and cheap storage media or directly destroy it, which is an essential step in data life cycle management. For data destruction, enterprises should have a strict management system, establish an approval process for data destruction, and create a strict data destruction checklist. Only data that has passed the checklist and been approved by the process can be destroyed.

09

One Platform to Cover the Whole System

Data governance is a systematic, large-scale, and long-term undertaking, and a comprehensive response to data problems. Choosing an appropriate data governance platform makes governance more effective.

Ruizhi Intelligent Data Governance Platform is a one-stop, comprehensive data governance solution independently developed by Yixin Huachen. It covers the core modules of the data governance system described above: metadata management, data standard management, data quality management, master data management, data exchange, data asset management, data security management, and data lifecycle management. It also includes data integration management and real-time computing and storage, helping organizations achieve centralized data management, distributed storage, and real-time decision support, and connecting every aspect of data governance.

These ten functional modules can be used independently or combined freely to meet the various data governance scenarios of government and enterprises, helping data standards land, improving data quality, and turning data into assets.
