The road to building a data middle platform

A data middle platform consolidates the business capabilities and data that each business unit accumulates during the digital transformation of governments and enterprises. It provides a system for building, managing, and using data, covering data technology, data governance, and data operations, so that data can empower the business. The data middle platform is the core of the new information application framework.

1. What is a data middle platform?

As enterprises accelerate their digital transformation, the concept of the data middle platform has become widely familiar, and demand from governments and enterprises for building one keeps growing.

The data middle platform consolidates the business and data of existing and new information systems. It is an intermediate, supporting platform that lets data empower new businesses and new applications. For many people, however, it is still a vague concept, and vendors define it differently.

Gartner: The data middle platform should sit at the core of the enterprise digital platform, as the data and analytics platform that Gartner defines. It helps business users of the enterprise's digital platforms (customer experience platform, ecosystem platform, Internet of Things platform, and internal information systems) make better decisions, and it forms reusable data analysis capabilities through cooperation with, and incubation by, those platforms. Data analysis capabilities should be ubiquitous and highly self-service on the business side, ultimately helping the digital platform deliver what Gartner calls Packaged Business Capabilities.

Alibaba: Data unification, called OneData, is achieved through OneModel, OneID, and OneService. OneModel unifies data construction and management, eliminating data ambiguity and enabling code to be generated automatically within minutes. OneID turns core business elements into assets and enables global connection, tag extraction, and a multidimensional profile of the data. OneService unifies data services and simplifies data queries by exposing them as topic-oriented services.

Xinghuan Technology: The data middle platform allows enterprises to be highly data-driven, adapt to rapid changes in their main, innovation, and incubation businesses, and support their digital transformation. By unifying the underlying architecture, data governance, and data services, and by supporting personalized data applications, the enterprise can ultimately turn its data into assets, turn those assets into value, and personalize that value.

As a result, different vendors, and even different teams or people within the same vendor, find it hard to agree on a single definition or core solution for the data middle platform. What is the essence of the middle platform: big data? A data warehouse? A data lake? Data governance? Data services? A cloud platform? ...

2. Digital management needs in the era of digital transformation

The "14th Five-Year Plan" Digital Economy Development Plan calls for treating data as the key element, taking the deep integration of digital technology and the real economy as the main line, strengthening digital infrastructure, improving the governance system of the digital economy, and jointly promoting digital industrialization and industrial digitization. The aim is to empower traditional industries to transform and upgrade, cultivate new industries, new business formats, and new models, keep making China's digital economy stronger, better, and bigger, and provide strong support for building a digital China.
Enterprise digital transformation requires a well-developed "digital brain" that covers storage and computing capabilities, governance capabilities, service capabilities, and personalized applications.

This means:

  • Need for a unified data foundation

When an open source big data hybrid architecture handles different types of business, it has to store data in different databases, which creates a large amount of redundant data. With siloed ("chimney-style") development, the data must first be fetched through different query methods before it can be processed, which makes development complex and processing inefficient.

  • Need for unified data governance

Data governance has to answer four questions: what data exists, where the data is, where the data comes from, and who uses the data. Without unified data governance, data quality is low: the data lacks availability, consistency, integrity, compliance, and security.

  • Need for unified data services

Whenever a new business requirement or data usage requirement appears, developers have to start from the underlying raw data and develop layer by layer until the data service is finally complete. The whole development cycle is long, and chimney-style development means that the resulting data services cannot be reused directly.

  • Need for agile and flexible personalized application construction capabilities

Building a new business system requires an independent environment and the necessary data for testing and launch. The whole process of environment preparation, data preparation, and application development is complicated. The technology department responsible for unified construction usually becomes a resource bottleneck, and the lack of unified application management results in applications that cannot be reused.
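The four governance questions above (what data exists, where it is, where it comes from, and who uses it) can be illustrated with a small metadata catalog. The following is only a minimal sketch: the class names, field names, dataset names, and storage paths are all hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One catalog entry; every field name here is an illustrative assumption."""
    name: str                                           # what data exists
    location: str                                       # where the data is
    upstream: list[str] = field(default_factory=list)   # where it comes from
    consumers: set[str] = field(default_factory=set)    # who uses it


class DataCatalog:
    """In-memory catalog that answers the four governance questions."""

    def __init__(self) -> None:
        self._records: dict[str, DatasetRecord] = {}

    def register(self, record: DatasetRecord) -> None:
        self._records[record.name] = record

    def record_access(self, dataset: str, user: str) -> None:
        self._records[dataset].consumers.add(user)

    def lineage(self, dataset: str) -> list[str]:
        """Return all upstream datasets, nearest first, without duplicates."""
        seen: list[str] = []
        queue = list(self._records[dataset].upstream)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            if current in self._records:
                queue.extend(self._records[current].upstream)
        return seen

    def consumers_of(self, dataset: str) -> set[str]:
        return set(self._records[dataset].consumers)


# Hypothetical three-layer example: source layer -> detail layer -> report.
catalog = DataCatalog()
catalog.register(DatasetRecord("ods_orders", "hdfs://lake/ods/orders"))
catalog.register(
    DatasetRecord("dwd_orders", "hdfs://lake/dwd/orders", upstream=["ods_orders"])
)
catalog.register(
    DatasetRecord("ads_sales_report", "hdfs://lake/ads/sales", upstream=["dwd_orders"])
)
catalog.record_access("ads_sales_report", "bi_team")

print(catalog.lineage("ads_sales_report"))
print(catalog.consumers_of("ads_sales_report"))
```

A real governance tool would persist this metadata and collect lineage automatically; the point of the sketch is only that one shared catalog can answer all four questions for every business unit.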

3. The pain of data middle platform transformation

However, if the data middle platform is not planned systematically and is instead driven business by business, with different technical components and tools selected for each business, its construction becomes chaotic. Most businesses only use the source layer, which is built in a chimney style, resulting in data chimneys, tool chimneys, and service chimneys. As a result, the development team is worn out by operating and maintaining the various technical components and by handling data model requests from specific businesses, and has no energy left for more valuable work such as improving technology, abstracting common data models, organizing data services, and developing applications.

4. Evolution of the data middle platform

Therefore, building a data middle platform generally goes through three versions of evolution, with the goal of "releasing data productivity and improving data production relations."

Data middle platform 1.0: hybrid underlying architecture + unified tool layer

This is the approach that most businesses, especially small and medium-sized ones, adopt instinctively. However, the results it delivers are mediocre. As Xu Zhisheng said, "Young people always have to take some detours to reach the other side!"

Data middle platform 2.0: unified underlying architecture, global data integration, and unified data foundation

Across the value chain activities of the entire enterprise, the unified underlying architecture improves storage and computing efficiency, unified data governance builds data assets, and unified data services activate data value, ultimately allowing the enterprise to be highly data-driven and supporting its digital transformation.

Data middle platform 3.0: agile application development model, running through the value chain, efficiently data-driven

Building on the data foundation of 2.0, unified data governance builds data assets and unified data services activate data value. This ultimately allows the enterprise to be highly data-driven, adapt to rapid changes in data applications across its personalized main business, innovation business, and incubation business, and support its digital transformation.

5. Ideas for building a data middle platform
5.1 Construction goals

So, what ideas should guide the construction of a data middle platform?

The construction goal is a big data middle platform with "unified access, unified storage, unified governance, unified development, and unified services". It should achieve unified collection of multi-source data, unified management of business data, and unified support for internal and ecosystem applications, while lowering the barrier to use through self-service, data autonomy, and platform self-maintenance. It should also realize the "three ones": one intelligent analysis and operations ecosystem, one routine lean governance system, and one intensive data platform foundation.

The data middle platform should integrate enterprise data governance and management with data asset development and operations, and should connect and drive the concepts and best practices of data sharing and services and of data development and operations. Its overall functional framework should be consistent with the enterprise data governance framework. Once completed, it can provide solid technical support for all digital management work.

5.2 Construction content

Building a data middle platform generally includes the following parts:

The big data basic platform provides engines and tools for analysis, computation, and storage for the entire big data middle platform and is its underlying functional support. It provides components such as a distributed data warehouse, a distributed NoSQL database, real-time computing, data retrieval, and data mining.

Data development and governance tools (platforms) provide components for data access, data development, data governance, data services, and data management to support the development of the big data middle platform. In this way, they establish the entire data development and sharing system for the big data middle platform, from data collection, data exchange, data storage, and data governance through to data sharing and services.
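The flow described above, from collection through governance to sharing, can be pictured as a chain of stages that each take and return a list of records. The sketch below is an illustration under assumed names only: the stage functions, the record fields, and the two source systems ("crm", "erp") are hypothetical, and real tools would run these stages on distributed engines rather than on in-memory lists.

```python
from typing import Callable, Iterable

Record = dict
Stage = Callable[[list[Record]], list[Record]]


def collect(sources: Iterable[Iterable[Record]]) -> list[Record]:
    """Data collection: merge records from several source systems."""
    merged: list[Record] = []
    for source in sources:
        merged.extend(source)
    return merged


def govern(records: list[Record]) -> list[Record]:
    """Data governance: drop records without an id and de-duplicate by id."""
    seen_ids = set()
    cleaned: list[Record] = []
    for record in records:
        record_id = record.get("id")
        if record_id is None or record_id in seen_ids:
            continue
        seen_ids.add(record_id)
        cleaned.append(record)
    return cleaned


def share(records: list[Record]) -> list[Record]:
    """Data sharing: expose only the fields approved for consumers."""
    approved_fields = ("id", "name")  # assumption: phone numbers stay private
    return [{key: record[key] for key in approved_fields} for record in records]


def run_pipeline(records: list[Record], stages: list[Stage]) -> list[Record]:
    """Apply each stage in order, feeding its output to the next stage."""
    for stage in stages:
        records = stage(records)
    return records


crm = [
    {"id": 1, "name": "A", "phone": "123"},
    {"id": None, "name": "?", "phone": "000"},
]
erp = [
    {"id": 1, "name": "A", "phone": "123"},
    {"id": 2, "name": "B", "phone": "456"},
]

shared = run_pipeline(collect([crm, erp]), [govern, share])
print(shared)
```

Because every stage has the same signature, a new stage (for example, data exchange or masking) can be added to the list without touching the others, which is the reuse that a unified toolchain is meant to provide.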

5.2.1 Big data basic platform

Based on a multi-model unified technical architecture, the platform provides a unified interface layer, a unified computing engine layer, a unified distributed storage management layer, and a unified resource scheduling layer. While ensuring high performance, high reliability, and high availability for the different data models, it makes resource allocation more flexible and operation and maintenance simpler.

The big data basic platform provides engines and tools for analysis, computation, and storage for the entire data middle platform and is its underlying functional support. It needs to provide high-performance, highly stable, and highly available database software for building data warehouses, together with components for offline processing, stream processing, full-text retrieval, and data mining.

Based on an analysis of the platform's underlying storage design and business characteristics, it is recommended to design the big data basic platform on the Hadoop ecosystem. The platform must meet the following requirements:

  • It must support large-scale processing and analysis of massive data in the form of product components, engines, or tools, including but not limited to offline batch processing, real-time stream processing and analysis, concurrent data queries, full-text retrieval, data mining, BI analysis, and interactive analysis.

  • It must handle not only structured data but also unstructured and semi-structured data, so that multi-source heterogeneous data such as configuration, logs, web pages, audio and video, IoT data, and web crawler output can be loaded and stored.

  • It needs complete multi-tenant functions: unified control and management of computing resources, storage resources, and data access resources, with efficient scheduling and usage control for computing resources, quota management for storage resources, and strict permission management for data access.

  • It needs a unified visual operation and maintenance interface for managing installation, configuration, monitoring, and alarms.
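The multi-tenant requirement (storage quotas plus strict data access permissions) can be sketched in a few lines. This is a simplified illustration, not how any specific Hadoop distribution implements it; the `Tenant` and `TenantManager` classes, the tenant name, and the dataset names are all hypothetical.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised when a write would exceed the tenant's storage quota."""


@dataclass
class Tenant:
    name: str
    storage_quota_gb: float
    storage_used_gb: float = 0.0
    readable_datasets: frozenset = frozenset()


class TenantManager:
    """Central place where quotas and read permissions are enforced."""

    def __init__(self) -> None:
        self._tenants: dict[str, Tenant] = {}

    def add_tenant(self, tenant: Tenant) -> None:
        self._tenants[tenant.name] = tenant

    def write(self, tenant_name: str, size_gb: float) -> float:
        """Account for a write and return the remaining quota in GB."""
        tenant = self._tenants[tenant_name]
        if tenant.storage_used_gb + size_gb > tenant.storage_quota_gb:
            raise QuotaExceeded(
                f"{tenant_name}: writing {size_gb} GB exceeds the quota "
                f"of {tenant.storage_quota_gb} GB"
            )
        tenant.storage_used_gb += size_gb
        return tenant.storage_quota_gb - tenant.storage_used_gb

    def can_read(self, tenant_name: str, dataset: str) -> bool:
        return dataset in self._tenants[tenant_name].readable_datasets


manager = TenantManager()
manager.add_tenant(
    Tenant(
        "marketing",
        storage_quota_gb=100,
        readable_datasets=frozenset({"dwd_orders"}),
    )
)

remaining = manager.write("marketing", 60)  # accepted, 40 GB left
try:
    manager.write("marketing", 50)          # rejected, would reach 110 GB
    rejected = False
except QuotaExceeded as error:
    rejected = True
    print(error)

print(remaining, manager.can_read("marketing", "dwd_salaries"))
```

A rejected write leaves the usage counter unchanged, so a tenant that hits its quota can still perform smaller writes that fit.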

5.2.2 Data development governance tools

The data development and governance platform is built on Docker and Kubernetes and adopts a microservice development framework to implement visual tools for data integration, data development, task scheduling, data governance, data services, and data malls.
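Task scheduling, one of the tools listed above, usually comes down to running data jobs in dependency order. As a minimal sketch, Python's standard `graphlib.TopologicalSorter` (Python 3.9+) can compute such an order; the task names below are invented for illustration and do not come from any product.

```python
from graphlib import CycleError, TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
# All task names are hypothetical.
dependencies = {
    "integrate_orders": set(),
    "integrate_customers": set(),
    "clean_orders": {"integrate_orders"},
    "build_customer_orders": {"clean_orders", "integrate_customers"},
    "publish_data_service": {"build_customer_orders"},
}


def execution_order(deps: dict) -> list:
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps: dict) -> bool:
    """Detect circular dependencies, which would make scheduling impossible."""
    try:
        execution_order(deps)
    except CycleError:
        return True
    return False


order = execution_order(dependencies)
print(order)
print(has_cycle({"a": {"b"}, "b": {"a"}}))
```

Several orders can be valid for the same graph (the two integration tasks are independent), so a scheduler is free to run such tasks in parallel.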

5.2.3 Construction steps

Taking Xinghuan Technology's data middle platform product as an example, the following construction steps can be adopted:

1. Middle platform planning

Build a unified data foundation (unified resource management, unified storage management, unified computing engine, and unified query language) and, on top of it, data marts, data warehouses, and data lakes; build unified data governance to create data assets; and build unified data services to activate the value of data. The ultimate aim is to enable the enterprise to be highly data-driven, adapt to rapid changes in data applications across its personalized main business, innovation business, and incubation business, and support its digital transformation.

2. Platform deployment

Through the cloud-native operating system and data management platform, heterogeneous processors (X86, ARM), GPUs, and heterogeneous operating systems (UOS, NeoKylin, Galaxy Kylin) are managed in a unified way to meet the requirements of information technology application innovation and localization.

By shielding the underlying technical architecture, the platform offers cloud products a unified resource layer that exposes only the CPU architecture of each resource, providing a good deployment environment for the data middle platform system.
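If the resource layer exposes only the CPU architecture, placing a workload reduces to matching the architecture its image was built for against the nodes that have enough free capacity. The sketch below illustrates that idea with made-up node names and a made-up `place` function; it does not reproduce the scheduler of Kubernetes or of any vendor platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str
    arch: str            # the only hardware detail the resource layer exposes
    free_cpu_cores: int


def place(workload_arch: str, cores_needed: int, nodes: list) -> Optional[Node]:
    """Pick the matching-architecture node with the most free cores, if any."""
    candidates = [
        node
        for node in nodes
        if node.arch == workload_arch and node.free_cpu_cores >= cores_needed
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda node: node.free_cpu_cores)


# Hypothetical mixed cluster with X86 and ARM machines.
cluster = [
    Node("node-a", "x86_64", 8),
    Node("node-b", "arm64", 16),
    Node("node-c", "arm64", 4),
]

arm_choice = place("arm64", 8, cluster)
x86_choice = place("x86_64", 16, cluster)
print(arm_choice, x86_choice)
```

Returning `None` when nothing fits keeps the decision explicit: the caller can queue the workload or report that no node of the requested architecture has capacity.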

3. Data access and operation

Through enterprise-level data development and management capabilities and multi-modal big data processing capabilities, enterprises can build data lakes, data middle platforms, data warehouses, and similar systems more efficiently, and can turn data into assets and business value more efficiently.

4. Application support

Empower business users according to their needs so that they can build personalized applications independently and with agility.

5. Operation and maintenance management

Through process design, the asset application process and the data entry process are standardized, and monitoring, alarming, and data security protection functions are provided to achieve all-round operation and maintenance of the platform.
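As one possible illustration of monitoring and alarming, the sketch below checks the completeness of selected fields against a threshold and produces an alarm message when a field falls short. The rule format, field names, and thresholds are assumptions made for the example; a production platform would typically check many more dimensions (consistency, timeliness, security) and route alarms to an on-call channel.

```python
def completeness(records: list, field_name: str) -> float:
    """Fraction of records whose field is present and non-empty."""
    if not records:
        return 1.0  # convention for this sketch: an empty batch raises no alarm
    filled = sum(1 for record in records if record.get(field_name) not in (None, ""))
    return filled / len(records)


def check_rules(records: list, minimum_completeness: dict) -> list:
    """Return one alarm message per field that falls below its threshold."""
    alarms = []
    for field_name, threshold in minimum_completeness.items():
        ratio = completeness(records, field_name)
        if ratio < threshold:
            alarms.append(
                f"ALARM: field '{field_name}' completeness {ratio:.0%} "
                f"is below the required {threshold:.0%}"
            )
    return alarms


# Hypothetical batch of customer records loaded into the platform.
batch = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": ""},
]

alarms = check_rules(batch, {"id": 1.0, "email": 0.9})
for message in alarms:
    print(message)
```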

6. Ending

When the wind is blowing and the tide is flat, it is time to set sail and break the waves; when the road ahead is long and arduous, we need to spur our horses hard.

As digital transformation deepens, the data middle platform, as an important piece of infrastructure for enterprise data management, has broad prospects for future development.

Its future development will focus more on real-time processing, intelligence, cloud-native architecture, ubiquity, and security compliance. Enterprises need to keep up with technology trends and continuously upgrade and improve how they build and apply their data middle platforms, so that the platforms better support enterprise digital transformation and development.

Origin blog.csdn.net/leyang0910/article/details/135296387