Transformation and Innovation of Data Governance in the Cloud Native Era

As digitalization deepens, enterprises rely more and more on data, and data resources have become increasingly important. How to manage and use data well, govern it effectively, and realize its full value has become a key question for improving quality and efficiency.

In this live broadcast, we introduced the data governance system, its technical framework, and the advantages of a cloud-native data platform for data governance. The following is an edited write-up based on the transcript of the live session.

Traditional data governance system and framework

In digital transformation, strategy is the driving force, data governance is the foundation, and data intelligence is the direction.

Data governance is the process of continuously changing data usage behavior from the perspectives of organization, management, and technology, throughout the entire data life cycle. The fundamental goal of data governance is to ensure data security and enhance data value.

At the same time, data governance is a system that focuses on implementation at the information-system level. It aims to combine the knowledge and views of the IT department and the business departments in order to grow the value of data assets iteratively, empower business development, support business strategy, ensure data security, and reduce the risk of privacy leaks.

The data governance system involves multiple levels of organization, management, and technology, and is closely related to business departments. Usually, the data governance work of an enterprise is led by the business department and assisted by the IT department. 

Figure 1 Architecture Diagram of Data Governance System 

As shown in the figure above, the data governance system is divided into three levels, which are data governance objects, data governance tasks, and data governance support.

  • Data governance objects are divided by theme. Different enterprises have different data themes, which depend closely on each enterprise's organization and department structure, so they are not described here.
  • Data governance support includes the enterprise's organizational structure, the roles in the data governance process, the internal policies and processes built around data governance, and the IT architecture and platform.
  • Data governance tasks mainly include master data management, metadata management, data standard management, data quality management, data asset management, data security management, and data life cycle management. Among them, master data management, metadata management, and data quality management are the key tasks.

Master Data Management

Master data is data that describes core business entities, such as customers, products, employees, and accounts. It has high business value, is reused across business departments, and exists in multiple heterogeneous application systems.

Master data management integrates the most central and most widely shared data (that is, master data) from the enterprise's business systems, cleans and enriches it centrally, and distributes it as a service to the operational and analytical applications within the enterprise.

The purpose of master data management is to ensure that master data remains consistent when data is used across systems and platforms.
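
The integrate, clean, and distribute flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CustomerRecord` schema, the system names, and the "most recently updated non-empty value wins" rule are all hypothetical and are not taken from any particular master data product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRecord:
    """One business system's view of a customer (hypothetical schema)."""
    customer_id: str
    source: str
    updated: date
    name: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None


def clean(record: CustomerRecord) -> CustomerRecord:
    """Normalize obvious formatting differences before merging."""
    return CustomerRecord(
        customer_id=record.customer_id.strip(),
        source=record.source,
        updated=record.updated,
        name=record.name.strip().title() if record.name else None,
        email=record.email.strip().lower() if record.email else None,
        phone="".join(ch for ch in record.phone if ch.isdigit()) if record.phone else None,
    )


def build_golden_records(records):
    """Merge per-system records into one master record per customer_id.

    Survivorship rule (an assumption for this sketch): for each attribute,
    the most recently updated non-empty value wins.
    """
    golden = {}
    # Process oldest first so that newer values overwrite older ones.
    for rec in sorted((clean(r) for r in records), key=lambda r: r.updated):
        master = golden.setdefault(rec.customer_id, {"customer_id": rec.customer_id})
        for field in ("name", "email", "phone"):
            value = getattr(rec, field)
            if value:
                master[field] = value
    return golden


if __name__ == "__main__":
    records = [
        CustomerRecord("C001", "crm", date(2023, 1, 10), name=" alice zhang ", email="Alice@Example.com"),
        CustomerRecord("C001", "billing", date(2023, 3, 5), phone="+86 138-0000-0000"),
    ]
    print(build_golden_records(records))
```

In practice the merged records would be published through a service interface rather than returned as a dictionary, and the survivorship rule is usually decided per attribute together with the business departments.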

Data Quality Management

The data quality management system covers the assessment of data quality and all the activities and processes that guarantee and improve it, with the aim of managing data quality end to end.

Within it, data quality optimization and improvement targets existing (stock) data: data is analyzed and cleaned in batches, by business system or by topic, to raise the quality of what is already stored.

Controlling data quality first requires the enterprise to establish the relevant rules and policies and to assign a responsible department to each subject area, so that data quality keeps improving.

Figure 2 Attribution and traceability of data quality problems 

As the fishbone diagram above shows, data quality problems have many causes. Problems arising from personnel, processes, and front-end entry in business systems are primary data quality problems; problems arising from back-end database design, data extraction, data loading, and so on are secondary data quality problems.

For primary data quality problems, the probability of errors during front-end entry can be reduced by making data entry more automated and easier to use.

For secondary data quality problems, checkpoints are embedded in the data transfer process to compare data between stages and catch errors.
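
As a rough illustration of such a comparison step, the sketch below reconciles a source table with its copy after a transfer by comparing row counts and per-row digests. The function names, the `id` key column, and the in-memory list-of-dicts representation are assumptions made for the example; a real pipeline would typically compute the same figures inside each database.

```python
import hashlib


def table_fingerprint(rows, key):
    """Return (row_count, {key_value: digest}) for a list of dict rows."""
    digests = {}
    for row in rows:
        # Serialize columns in a fixed order so equal rows give equal digests.
        canonical = "|".join(f"{col}={row[col]}" for col in sorted(row))
        digests[row[key]] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return len(rows), digests


def reconcile(source_rows, target_rows, key="id"):
    """Compare a source table with its copy after extraction and loading."""
    src_count, src = table_fingerprint(source_rows, key)
    tgt_count, tgt = table_fingerprint(target_rows, key)
    return {
        "count_match": src_count == tgt_count,
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


if __name__ == "__main__":
    source = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
    target = [{"id": 1, "amount": 100}, {"id": 2, "amount": 205}]
    print(reconcile(source, target))
```

Note that a matching row count alone is not sufficient: a row can be dropped and another duplicated or altered without changing the count, which is why the sketch also compares keys and digests.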

Metadata Management

Metadata is data that describes data, comparable to the catalog of a library. It is explanatory information that lets data users understand the characteristics, content, function, and access methods of the data, and assess whether the data meets their requirements.

In the data governance system, metadata can be divided into four types: business metadata, technical metadata, operational metadata, and management metadata.

Metadata management is divided into four levels: collection, management, classification, and service. It involves many steps, such as defining standards, supplementing and maintaining metadata, managing classifications, lineage analysis, and query statistics.

For large enterprises, the volume of metadata is large, and metadata management requires a great deal of manpower and time; project cycles are often measured in years.
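
Lineage analysis, one of the steps listed above, can be illustrated with a small directed graph in which an edge means "this dataset feeds that dataset". The `LineageGraph` class and the dataset names below are hypothetical and only sketch the idea; metadata tools usually derive these edges automatically from ETL jobs or SQL.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph of datasets; an edge means 'source feeds target'."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_edge(self, source, target):
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    def _walk(self, start, edges):
        # Breadth-first traversal collecting every reachable dataset.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, dataset):
        """Everything the dataset is derived from (traceability)."""
        return self._walk(dataset, self._upstream)

    def downstream(self, dataset):
        """Everything affected if the dataset changes (impact analysis)."""
        return self._walk(dataset, self._downstream)


if __name__ == "__main__":
    graph = LineageGraph()
    graph.add_edge("crm.customers", "dw.dim_customer")
    graph.add_edge("dw.dim_customer", "mart.sales_report")
    print(sorted(graph.downstream("crm.customers")))
```

The upstream direction answers "where did this number come from", and the downstream direction answers "what breaks if this table changes".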

In addition, enterprise data governance also involves data standards, data security, data life cycle, etc., which will not be repeated here.

Data Governance Challenges

With the continuous emergence of new technologies and the rapid growth of data volume, the traditional data governance system faces the following challenges during implementation:

  • High metadata management costs: Business metadata must be reviewed and identified by business personnel and entered on every data platform, and the operational metadata of each data transfer must be recorded as well, so the cost of recording is high.
  • Data quality is difficult to guarantee: Transferring data between platforms introduces secondary data quality problems and requires a large amount of checkpoint verification work.
  • Data standards are complicated: Every data platform must verify data standards, and it is difficult to keep data standards consistent across platforms.
  • Complex data synchronization strategy: The master data platform must synchronize master data to multiple data platforms. This requires a complex synchronization strategy; without one, master data versions may become inconsistent.
  • Sensitive data is difficult to manage centrally: Sensitive data on each data platform must be identified regularly, and cross-platform data transfer requires encryption and decryption, which makes maintenance difficult.
  • Long data service response cycle: Data services must pass through processing and ETL on multiple data platforms, which lengthens their response time.
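
To make the sensitive data challenge concrete, the sketch below shows one simple way sensitive columns might be identified: sampling rows and matching values against regular expressions. The two patterns (email address and mainland China mobile number), the 50% threshold, and the function name are assumptions chosen for illustration; on a multi-platform estate a scan like this has to be repeated on every platform, which is the maintenance burden described above.

```python
import re

# Hypothetical detection rules; real deployments maintain a much larger set.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "cn_mobile": re.compile(r"1[3-9]\d{9}"),
}


def scan_columns(rows, threshold=0.5):
    """Return {column: label} for columns whose values look sensitive.

    A column is flagged when at least `threshold` of its non-empty sampled
    values fully match one of the patterns.
    """
    hits = {}
    columns = {col for row in rows for col in row}
    for col in sorted(columns):
        values = [str(row[col]) for row in rows if row.get(col) not in (None, "")]
        if not values:
            continue
        for label, pattern in PATTERNS.items():
            matched = sum(1 for v in values if pattern.fullmatch(v))
            if matched / len(values) >= threshold:
                hits[col] = label
                break
    return hits


if __name__ == "__main__":
    sample = [
        {"id": 1, "contact": "alice@example.com", "mobile": "13800000000"},
        {"id": 2, "contact": "bob@example.org", "mobile": "13912345678"},
    ]
    print(scan_columns(sample))
```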

Data Governance under Cloud Native Data Platform

Cloud computing has profoundly changed enterprise IT architecture. A cloud-native platform can greatly reduce the data management and governance burdens described above and shorten response cycles, which is why data governance based on cloud-native platforms has emerged.

 Figure 3 Cloud Native Platform System and Data Governance 

As shown in the figure above, compared with the traditional data system, the cloud-native data platform system has the following characteristics:

  • One-stop agile data service: Through a one-stop data portal, users can quickly retrieve data assets, develop new data services and products in an agile way, release them quickly, iterate based on user feedback, and steadily accumulate the value of enterprise data assets.
  • Cloud-native data platform: It adopts a storage-computing separation architecture and natively supports OneData, which ensures the consistency of core enterprise data and reduces the complexity of data governance. Depending on the workload, it provides elastic resource scaling, dynamic scheduling, and high concurrency to meet the flexibility requirements of different business scenarios. It also has self-healing capabilities, which improve system availability.
  • Cloud-native big data support platform: It provides stable support for the cloud-native data platform, delivering storage-computing separation, flexible scheduling, and better resource isolation, and it supports deployment in hybrid cloud and other heterogeneous environments to ensure business continuity. This flexibility helps companies achieve business goals quickly.

As new technologies continue to mature, application scenarios continue to increase, and business models continue to become more complex, the concept of global data governance is increasingly valued by enterprises. Global data covers internal and external data related to the enterprise, and is closely related to the business and commercial nature of the enterprise.

The storage-computing separation of the cloud-native data platform enables global data fusion within the enterprise: it integrates scattered data platforms, eliminates data silos, and allows data security to be managed centrally, which reduces security loopholes and secondary data quality problems. Master data, metadata, data standards, data architecture, and data models for global data are managed in one place, greatly reducing the complexity of data governance. At the same time, cloud-native data asset services make the business more agile, helping it adapt to a rapidly changing market and continuously iterate on data assets to achieve digital transformation.

At present, HashData, as a leading cloud-native data platform in China, has achieved large-scale commercial use in many fields such as finance, telecommunications, government affairs, energy, transportation, etc., helping enterprises to carry out global data governance efficiently and conveniently.

Figure 4 HashData deployment at a large state-owned bank

Taking a large state-owned bank as an example, the HashData cloud-native data platform was used to integrate all P9 analysis platforms. Global data was divided by subject domain, achieving centralized data management and control and a unified, integrated data architecture.

At the same time, the data platforms of all branches were consolidated so that global data and computing resources are provided in a unified manner. Shared storage holds a unified, analysis-oriented, enterprise-level view of the whole bank's data, and multiple computing clusters are set up for different application scenarios. Once authorized, any computing cluster can access any data in the shared storage to complete business processing and computation, or to run online queries and analysis.
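
The arrangement described above (one shared store, several computing clusters, access controlled by authorization) can be modeled with a toy in-memory example. The `SharedStorage` and `ComputeCluster` classes, the table name, and the grant mechanism are invented for illustration and do not represent HashData's actual interfaces.

```python
class SharedStorage:
    """Single copy of the data, shared by every computing cluster."""

    def __init__(self):
        self._tables = {}

    def write(self, table, rows):
        self._tables[table] = list(rows)

    def read(self, table):
        return self._tables[table]


class ComputeCluster:
    """A computing cluster that holds no data of its own."""

    def __init__(self, name, storage):
        self.name = name
        self._storage = storage
        self._grants = set()

    def grant(self, table):
        """Authorize this cluster to read a table in shared storage."""
        self._grants.add(table)

    def query(self, table, predicate=lambda row: True):
        if table not in self._grants:
            raise PermissionError(f"{self.name} is not authorized to read {table}")
        return [row for row in self._storage.read(table) if predicate(row)]


if __name__ == "__main__":
    storage = SharedStorage()
    storage.write("retail.accounts", [{"id": 1, "balance": 500}, {"id": 2, "balance": 50}])

    risk = ComputeCluster("risk", storage)
    risk.grant("retail.accounts")
    print(risk.query("retail.accounts", lambda row: row["balance"] > 100))
```

Because every cluster reads the same stored copy, there is no cross-platform synchronization step in which secondary data quality problems could be introduced.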

Due to the unified data platform, the best practice of unified data model can be used in the whole bank, avoiding model differences caused by different technology stacks, reducing secondary data quality problems and metadata operations, and greatly reducing data management costs.

In the future, we look forward to bringing the latest cloud-native technology practices to all industries, helping enterprises realize OneData and unlock the value of their data with ease.

Origin blog.csdn.net/m0_54979897/article/details/131400125