Reading Notes - The Art of Data Governance

       Continuing from the previous article on learning and thinking about the art of data governance, this part covers the core details of the book. The content is fairly complex, and it took a long time to summarize and organize, hence the slower update. The "art" of data governance refers to its operational-level techniques and methods. Data governance technology comprises seven core capabilities: data sorting and modeling, metadata management, data standard management, master data management, data quality management, data security governance, and data integration and sharing. Together these form the seven core technical capabilities, or methods, of data governance, and provide a technical foundation for enterprise digital transformation and for carrying out data governance work.

       These seven chapters of the book explain, from the perspective of concrete measures and techniques, that any technology or method serves the goals of data governance. Enterprises differ in their data governance goals and needs, so the technologies and methods they adopt differ in emphasis, but the commonly used ones are mainly these seven. In the digital age, data is the core element of enterprise competitiveness, and development requires comprehensive, effective management and use of data, forming corresponding rules and models. Core (master) data is the foundation of business applications, data analysis, and system integration; the basis of data interconnection and of data analysis and mining; and the cornerstone of a successful digital transformation. Reliable data quality is the premise of big data analysis and mining, and its level directly affects the effectiveness of informatization, business collaboration, management innovation, and decision support. Data security governance gradually moves security work from disorder to order, from rule by individuals to rule by established rules. Finally, to meet the urgent business needs and application-integration needs that arise during digital transformation, data integration and sharing must effectively coordinate and manage the movement of data between systems, departments, and enterprises. By cultivating and consolidating these seven technical capabilities, an enterprise gains workable technical guarantees on the road to digital transformation and technical support for the overall goals of subsequent data governance, ultimately realizing data-driven business collaboration, management innovation, and decision support for the digital enterprise.

1. Data combing and modeling

Data asset sorting is the starting point and basic work of enterprise data governance: finding out what data the enterprise has, where it is, and how it is currently managed. There are two sorting approaches, top-down and bottom-up. The former is comprehensive and systematic but has a longer cycle and higher cost, covering data domains, subjects, entities, and model design. The latter is purpose-driven and yields quick results but is relatively partial, covering requirements analysis, presentation, analysis logic, and data modeling. One point is clear throughout: without a data model there is no data management.

What is a data model? It is a set of data specifications and related diagrams reflecting data requirements and design, with three elements: data structure, data operations, and data constraints. By application level, models are divided into conceptual, logical, and physical models. These are, in fact, the models we gradually form at the start of information system construction, after researching and analyzing requirements or while discussing business needs with users. The three types differ in purpose and audience. The conceptual model captures entities, attributes, and relationships at the business-concept level, for a business audience. The logical model is a complete model of business requirements, generally following the three normal forms of database design, and serves as a bridge between business and technology. The physical model adds indexes, primary and foreign key relationships, and so on to the logical model, while also considering storage and runtime performance. The data model is the basis for subsequent technical work: a well-designed model brings better quality, lower cost, clearer scope, faster performance, fewer data errors, and a good start for data quality.
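As a minimal sketch of how a conceptual model becomes a physical one, the following uses a hypothetical customer/order example (the table and column names are illustrative, not from the book); the physical layer is where keys, constraints, and indexes appear:

```python
import sqlite3

# Conceptual model: Customer --places--> Order (entities and a relationship).
# Logical model: normalized tables with attributes (three normal forms).
# Physical model: concrete DDL with keys, constraints, and an index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,               -- surrogate primary key
    name        TEXT NOT NULL,
    email       TEXT UNIQUE                        -- uniqueness constraint
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customer(customer_id),  -- foreign key relationship
    amount      REAL CHECK (amount >= 0)           -- data constraint
);
CREATE INDEX idx_order_customer
    ON customer_order(customer_id);                -- physical-level concern
""")
```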

Data modeling methods mainly include dimensional modeling and ER modeling; common ER modeling notations include UML.

The data model is a communication tool for reaching consensus across the enterprise. Through model-driven development and a shared data model, it provides a full range of data perspectives: improving team collaboration, eliminating information silos, improving business processes, preventing project construction risks, accelerating data governance work, and supporting lineage analysis and impact analysis of data.

2. Metadata management

Metadata is data that describes data (see another article for details). Metadata management means knowing what data the enterprise owns, where it is, who is responsible for it, what the specific values represent, what its life cycle is, what security and privacy protection it needs, how good its quality is, who uses it, and for what business purpose. This is the familiar 5W1H model (who, what, when, where, why, how).

There are generally three types of metadata: business, technical, and operational metadata.

The meaning of metadata is to describe, locate, retrieve, manage, evaluate and interact with data objects.
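The three metadata types and the 5W1H questions can be sketched as one record per data asset; all field names here are illustrative assumptions, not a standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class TableMetadata:
    # Business metadata: what the data means and who is responsible (who/what/why).
    name: str
    description: str
    owner: str
    # Technical metadata: where and how it is stored (where/how).
    system: str
    columns: list = field(default_factory=list)
    # Operational metadata: how it is maintained and used (when).
    refresh_schedule: str = "daily"
    last_loaded: str = ""

md = TableMetadata(
    name="customer",
    description="Master record of all customers",
    owner="CRM team",
    system="crm_db",
    columns=["customer_id", "name", "email"],
)
```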

Metadata management can be described from technical, business, and management perspectives, and needs to achieve three goals: establishing an indicator interpretation system, improving data traceability, and building a data quality audit system.

Metadata management has four maturity stages: distributed bridging, central repository, metadata warehouse, and intelligent management. Most enterprises are currently in the central repository or metadata warehouse stage.

Metadata management serves four main purposes: establishing an enterprise data asset catalog; eliminating redundancy and strengthening data reuse; reducing the risk of knowledge loss caused by staff turnover; and improving data lineage detection capabilities and data analysis quality.

The design and construction of the metadata management system includes organizational guarantee, system guarantee, process guarantee, technical tools, operation and maintenance, monitoring management, statistical analysis and publicity and promotion, etc.

Metadata management technologies include collection, management, application and interface.

Metadata applications mainly include data asset maps, data lineage analysis, impact analysis, hot/cold data analysis, and correlation analysis. See the figure below for details.
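Lineage analysis and impact analysis amount to traversing a dependency graph; the table names and edges below are made up for illustration:

```python
# Hypothetical lineage graph: "table -> tables derived from it" edges.
lineage = {
    "src_orders":      ["dwd_orders"],
    "dwd_orders":      ["dws_sales_daily", "dws_customer_ltv"],
    "dws_sales_daily": ["ads_revenue_report"],
}

def impact(node, graph):
    """Return every downstream asset affected if `node` changes."""
    affected, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in affected:
                affected.add(child)
                stack.append(child)
    return affected
```

Running the same traversal on the reversed graph would give upstream lineage (where a value came from) rather than downstream impact.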

3. Data standard management

Data standards are the rules of the enterprise's digital environment: the process of establishing rules and regulations for data. A data standard is an abstraction of a series of normative constraints that ensure consistency and accuracy when data is used and exchanged internally and externally. Data standardization defines and normalizes a unified enterprise understanding of data. A data standard system must not only fit the current situation but also look ahead to alignment with international and domestic standards, considering the standard's forward-looking nature. It generally covers definitions of business terms, combinations of data elements, policies and procedures for business rules and agreements, a description framework for data forms, a common language, and the single data set shared through data integration.

The role of data standards

1. Strengthen consensus across departments and improve communication efficiency

2. Provide the basis for system data integration and sharing

3. Promote an enterprise-level single data view and support the development of data management capabilities

4. Unify and standardize data, eliminate data barriers, and support standardized business processes

5. Improve data quality and support the issuance of quality inspection reports

6. Facilitate standardized management of enterprise data assets

The significance of data standard management

1. In terms of business, improve business standardization and business efficiency, and reduce communication costs caused by data inconsistency

2. In terms of technology, promote data sharing and integration, improve system implementation efficiency, and improve data quality

3. In terms of management, enable data-driven management, accurate data analysis, and self-service analysis by business personnel

Data standards generally cover four areas: data model standards, basic standards, master data and reference data standards, and indicator data standards. Indicator data carries business, technical, and management attributes and includes the business indicators of each business domain or unit, serving as the basis for self-service analytics.
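A single data element standard can be sketched as an entry with business, technical, and management attributes, plus a conformance check; the element name, attribute names, and the phone-number pattern are illustrative assumptions:

```python
import re

# Hypothetical standard entry for one data element ("mobile_phone").
phone_standard = {
    "business":   {"term": "mobile_phone", "definition": "Customer mobile number"},
    "technical":  {"type": "string", "length": 11, "pattern": r"1\d{10}"},
    "management": {"owner": "data standards office", "status": "published"},
}

def conforms(value, std):
    """Check a value against the technical attributes of a standard entry."""
    t = std["technical"]
    return (isinstance(value, str)
            and len(value) == t["length"]
            and re.fullmatch(t["pattern"], value) is not None)
```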

The data standard management system includes organization, processes, and management methods. See the figure below for details.

Four best practices of data standard management: business-oriented (value chain-oriented), step-by-step (step-by-step implementation), dynamic management and application-oriented (serving the business and improving business efficiency).

4. Master data management

Master data is the golden data of the enterprise: highly shared basic data and the most core data an enterprise has (see another article for details). Master data is the foundation of business applications, data analysis, and system integration; the basis of data interconnection and of data analysis and mining; and the cornerstone of a successful enterprise digital transformation.

Master data has three vertical characteristics (high value, high sharing, and relative stability) and four horizontal characteristics (cross-department, cross-business, cross-system, and cross-technology).

Master data management is a solution that integrates methods, standards, processes, systems, technologies, and tools.

The meaning of master data management

1. Break down silos and improve data quality

2. Unify cognition and improve business efficiency

3. Centralized management and control to improve management efficiency

4. Data-driven, improve decision-making level

Master data management method, see the figure below

Master data management techniques are classification, coding and integration.

Master data classification methods include line (hierarchical) classification, faceted classification, and hybrid classification. Coding can be significant (meaningful) or non-significant. Integration approaches include web-service-based data synchronization, ETL-based data synchronization, integration with consuming systems, and the joint integration-debugging process. See the figures below for details.
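Non-significant coding can be sketched as a sequence number plus a check digit; the "MAT" prefix and the simple mod-10 checksum below are illustrative assumptions, not the book's coding rules:

```python
# Non-significant code: prefix + zero-padded sequence + mod-10 check digit.
def make_code(prefix, seq):
    body = f"{prefix}{seq:06d}"
    check = sum(int(d) for d in body if d.isdigit()) % 10
    return f"{body}{check}"

def verify(code):
    """Detect single-digit corruption via the trailing check digit."""
    body, check = code[:-1], int(code[-1])
    return sum(int(d) for d in body if d.isdigit()) % 10 == check
```

The check digit lets every consuming system reject a mistyped code locally, before it pollutes downstream data.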

7 Best Practices for Master Data Management

1. Big goals, small steps. Implement the master plan step by step; data governance is a marathon.

2. Twin engines: business-driven and technology-driven. Driven by business needs, rooted in source business, and serving the business. Master data combined with big data, cloud computing, artificial intelligence, and microservices improves the quality of big data analysis, opens up cloud data fusion channels, enhances data management, and enables the loose coupling that facilitates front-end business innovation.

3. Emphasize master data coding design. Appropriate coding is a matter of management refinement and granularity.

4. Data cleansing is hard, unglamorous work. It requires cultural and mindset building, supportive management policies, and a combination of manual work and automation.

5. Implement master data standards smoothly. Approaches range from a simple, blunt hard cutover and breakpoint switching to a smooth transition mode.

6. Integrate the enterprise's small data with social big data: invoke social data services, improve data intelligence services, and build data analysis services based on knowledge graphs.

7. Master data operation is ordinary but not simple; it is quiet, behind-the-scenes work. As the most important data asset of an enterprise, master data deserves great attention: its quality directly affects business operation efficiency and the level of management decision-making.

5. Data Quality Management

Reliable data quality is the premise of big data analysis and mining; its level directly affects the effectiveness of informatization, business collaboration, management innovation, and decision support. Data quality refers both to characteristics of the data itself and to the processes used to measure or improve it. Data quality management is an ongoing process, and the best time to start is now.

The DIKW pyramid model is shown in the figure below

The consequences of poor data quality include financial and reputational damage, increased tangible costs (communication, operational, and economic) and intangible costs (the cost of wrong decisions), and misleading or latent operational risks.

Data quality dimensions are the aspects used to measure or evaluate data quality, generally including consistency, completeness, uniqueness, accuracy, authenticity, timeliness, and relevance. Data quality measurement must be purposeful, repeatable, and interpretable.
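Two of these dimensions, completeness and uniqueness, can be sketched as simple rule-based measurements over plain records; the field names and sample rows are made up for illustration:

```python
# Sample records with deliberate quality problems.
records = [
    {"id": 1, "email": "a@x.com"},
    {"id": 2, "email": None},        # completeness violation on "email"
    {"id": 2, "email": "b@x.com"},   # uniqueness violation on "id"
]

def completeness(rows, field):
    """Fraction of rows where `field` is filled in."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def uniqueness(rows, field):
    """Fraction of distinct values of `field` across all rows."""
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)
```

Each score is a repeatable, interpretable ratio, which is exactly what makes a dimension usable for monitoring and early warning.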

Data quality management means managing the quality problems that can arise at each stage of the data life cycle, from planning, acquisition, storage, sharing, and maintenance through application and retirement, via management activities such as identification, measurement, monitoring, and early warning. The ultimate goal is to increase the value of data use and gain economic benefit through reliable data.

Root cause analysis of data problems seeks the most fundamental cause of a quality problem. Root causes generally include environmental conditions, human factors, and system behavior or process factors. Problems can arise at any phase: planning and design, creation, use, aging, and retirement.

The causes of problems fall into three categories: management, business application, and technical operation.

There are four steps in the problem analysis method, as shown in the figure below

Problem analysis tools include the fishbone diagram, 5-Why analysis, the fault tree diagram, and others; see the following three diagrams for details.

Data Quality Management System Framework

1. ISO 9001 (PDCA): the core is customer focus, emphasizing leadership, the process approach, continuous improvement, evidence-based decision making, and relationship management. See the figure below for details.

2. Data quality management based on Six Sigma (DMAIC); see the figure below for details.

3. Data Quality Assessment Framework (DQAF), see the figure below for details.

Data quality management strategy and technology take prevention (pre-event control) as the core and meeting business needs as the goal, acting across three stages of management: before, during, and after the event. See below.

Outlier detection methods for data quality are based on statistics, distance, density, and clustering.
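The statistics-based approach can be sketched with a z-score rule (flag values more than three standard deviations from the mean); the distance, density, and clustering approaches would use techniques like k-NN, LOF, or k-means instead:

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose z-score exceeds `threshold` (statistics-based)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []                      # all values identical: no outliers
    return [v for v in values if abs(v - mean) / stdev > threshold]
```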

6. Data Security Governance

In the digital age, data is a factor of production, an important enterprise asset, and the lifeblood of enterprise development, but it is a double-edged sword that also brings data security challenges. Data security is a quality attribute of data focused on confidentiality, integrity, and availability. It spans management aspects (security governance, management system construction, system operation and maintenance) and technical aspects (operating system, application, and database security). Data security risks come mainly from targeted external attackers, third parties, malicious insiders, and insiders who make mistakes.

Data security governance refers to the strategies, technologies, and activities adopted to ensure the availability, integrity, and confidentiality of data, including corporate strategy, culture, organizational construction, business processes, rules and regulations, and technical tools. Its aim is to improve the ability to respond to data security risks, chiefly by controlling and reducing risk.

The relationship between data governance and data security governance is shown in the table below

Data security governance is a branch of data governance, a kind of special-purpose governance: formulating corresponding strategies, establishing order, and moving data security from disorder to order, from rule by individuals to rule by established rules. It includes the following:

1. Data security governance system, see the figure below.

2. Governance goals are data-centric, not system-centric, so that data can be used safely and have the ability to be visible, controllable, and manageable. See below.

3. Governance organization and accountability strategy, see the figure below.

4. Governance system, including data, system and personnel levels, see the figure below.

5. Governance training, awareness training and skills training

6. Operation and maintenance system, regular audit strategy (compliance check, user behavior audit), data backup strategy, dynamic protection strategy (protection, detection and response)

The basic idea of data security governance technology is isolation: drawing clear security boundaries, as follows:

1. Data sorting and sensitive data identification: working out what is in the enterprise's data asset catalog and which of it is sensitive data.

2. Data classification and grading strategy: classification organizes data for better management and use, while grading assigns confidentiality levels (general, sensitive, and confidential).

3. Identity authentication, including authentication architecture, single sign-on, authentication mode and password management strategy

4. Authorization: system authorization and the user authorization model are shown in the figure below, using an access control matrix and the principle of least privilege.

5. Access control: common strategies include user-based, role-based, attribute-based, ACL-based, and IP-based control.

6. Security auditing, to quickly discover potentially risky behavior.

7. Asset protection, forming an asset protection strategy for the entire data life cycle, see the figure below.

8. Data desensitization (masking): in effect, blurring or mosaicing sensitive values.

9. Data encryption technology: symmetric and asymmetric encryption, digital certificates, signatures, and watermarks (for tracking and tracing).
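Several of the steps above (grading, least-privilege access control, and masking) can be sketched together in one small example; the column names, roles, levels, and masking rules are all illustrative assumptions, not the book's specifics:

```python
# Grading policy: map sensitive columns to confidentiality levels;
# anything unlisted is "general".
GRADING = {"id_number": "confidential", "phone": "sensitive"}

def grade(column, default="general"):
    return GRADING.get(column, default)

# Access-control matrix keyed by (role, level). Least privilege:
# anything not explicitly granted is denied.
MATRIX = {
    ("analyst", "general"):      {"read"},
    ("analyst", "sensitive"):    {"read_masked"},
    ("dba",     "confidential"): {"read", "write"},
}

def allowed(role, column, action):
    return action in MATRIX.get((role, grade(column)), set())

# Static masking ("mosaicing"): keep just enough of the value to stay useful.
def mask_phone(phone):
    return phone[:3] + "****" + phone[-4:]
```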

Data Security Policies and Regulations

1. GDPR: The European Union's data security laws and regulations, the General Data Protection Regulation. The strictest personal privacy protection law in history.

2. CCPA: California Consumer Privacy Act

3. Data Security Law: China's law elevating data security to the national level, coordinating the security and development of data elements.

7. Data integration and sharing
Data integration and sharing addresses the effective coordination and management of data movement between systems, departments, and enterprises, driven by the urgent business needs and integrated application needs of enterprises. What is generally called data integration is mainly application integration, as follows:
1. Portal integration: identity authentication, single sign-on, interface integration, to-do integration, key indicator integration, and content management; the last two mainly involve interface integration and data integration.
2. Service integration: SOA and microservice architecture integration, a technology for realizing process integration and data integration.
3. Process integration: cross-system automated business processes, mixed automated and manual collaborative processes, and purely manual processes. Process integration builds on service integration and data integration; the process integration module orchestrates the services in each application.
4. Data integration: data replication, data federation (VDB), and interface integration (interface application packages and adapters). At present, about 80% of application integration is data integration.
Architecture evolution of data integration
1. Point-to-point architecture: the number of paths = number of connected objects × (number of connected objects − 1) / 2. Its disadvantages are chaos, lack of centralized control, and tight coupling. See below.
2. Electronic Data Interchange (EDI) architecture, see the figure below.
3. SOA architecture: the ESB is the foundation, providing a service management center, mediation, transformation and decoupling, and service orchestration and recomposition; the bus can become a performance bottleneck. See below.
4. Microservice architecture: maximizes the componentization and service-orientation of business requirements, covering UI, data, API, and external integration, as shown in the figure below.
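The path-count formula for the point-to-point architecture above can be checked directly; it grows quadratically, which is why that architecture becomes unmanageable as systems are added:

```python
def p2p_paths(n):
    """Number of direct connections needed to link n systems point-to-point."""
    return n * (n - 1) // 2
```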

Application integration mainly comes down to data integration. At present there are four typical data integration application modes:
1. Based on middleware exchange and sharing mode, see the figure below.
2. Master data application integration mode forms a unified view centered on master data, see the figure below for details.
3. Data warehouse application mode, theme-oriented, integrated, relatively stable data integration that reflects historical changes, see the figure below for details.
4. Data lake application mode: emphasizes easy ingestion and keeping data in its original format; supports multi-source formats (structured, semi-structured, and unstructured) and batch, stream, and real-time processing, with processing and governance done after data enters the lake. See the figure below for details.

The steps and methods of data integration are shown in the figure below.
1. Requirements analysis for data integration: requirements research, summary, analysis, and confirmation
2. Integration solution design: select the integration mode (interface, file exchange, etc.), then write and confirm the integration plan
3. Interface development and joint debugging: data integration development, testing, and joint debugging, including checking the completeness of records and attributes
4. Deployment, operation, and evaluation: interface deployment, joint trial operation, and data integration evaluation (degree of data integration, business collaboration, and decision-analysis capability). Through business collaboration, processes are opened up end to end, realizing end-to-end business management, improving business processing efficiency, and fully accumulating data.


Origin blog.csdn.net/hhue2007/article/details/131363283