Understanding the data middle platform architecture

At present, most enterprises prefer to collect and store data in a centralized manner and to build their systems in layers. On the one hand, this approach supports rapid deployment of application systems; on the other, it ensures centralized management and operation of data, reflecting data's nature as an asset and a resource.

The emergence of the data middle platform compensates for shortcomings such as the poor responsiveness caused by the mismatch between the pace of data development and that of application development.

The data middle platform is a concept proposed by Chinese practitioners, originating from Alibaba's strategy of a "big middle platform, small front end". Alibaba's middle platform takes a management perspective: a dedicated middle-platform business unit centralizes functions that previously spanned multiple departments, such as data discovery, technology and products, and data sharing. Other organizations or enterprises do not necessarily need to set up such a business unit to build a data middle platform, but the underlying idea of centralized data management and more efficient conversion of data into value is the same.

General architecture of the data middle platform

Starting from the two dimensions of data processing and data governance, a decoupled data middle platform architecture can be designed. The architecture is flexible: modules can be combined according to enterprise application requirements, or a single module can be extended, which meets the needs of most enterprise middle-platform construction projects.

The general architecture of the data middle platform is shown in the figure above. Following the principles of reducing functional redundancy and improving reuse, the architecture decouples the platform into six functional subsystems that can be built and evolved independently.

The data storage and data processing subsystems are the core of the architecture, and data governance is an important means of enhancing the value of data. The generality of the architecture shows in the following points.

  • The architecture comprehensively considers all the elements of a data middle platform. Building with reference to it can effectively increase the value of data assets and support data and service sharing.

  • Referring to this architecture, enterprises can plan once and implement step by step: first build the data processing and data storage subsystems, then gradually add the data collection, data security, and data governance subsystems as business needs develop.

  • The data middle platform consists of six decoupled subsystems, which enterprises can combine flexibly when setting up a project: each subsystem can be tendered and built independently, or several subsystems can be bundled into one tender. The general architecture comprises six parts: the data storage framework, data collection framework, data processing framework, data governance framework, data security framework, and data operation framework.

1. Data storage framework

The core of the data middle platform is data: it is acquired through the collection framework, processed by the processing framework, and managed by both the data governance framework and the data security framework; finally, data whose value is to be opened up is exposed as external data services through the data operation framework.

The data architecture of the middle platform should be planned independently, with appropriate technical architectures chosen to store different types of data.

In the data storage framework, regardless of whether data uses object storage, block storage, or database storage technology, the various kinds of middle-platform data can be classified and managed as shown in the figure above.

Source data is managed mainly by the collection framework. The data governance framework simply divides it into two categories, structured and unstructured, according to its characteristics, while standardized domain data is the full data set normalized and arranged by domain under the governance framework. Wide-table data is the result of joining related data: it can be used to build complete profiles of people, events, places, objects, groups, and other entities, and it can also serve as the intermediate layer beneath the upper model data.

Metadata and label data are both descriptions of data. Metadata expresses the objective attributes of data, while label data leans toward a manager's subjective classification of it, such as quality-level labels, security labels, and attribute labels. Master data is frequently updated and exchanged between systems, and it requires independent storage space for maintenance and management.
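To make the distinction concrete, here is a minimal Python sketch (the field names and values are purely hypothetical) of how metadata and label data describe the same dataset from different angles:

```python
from dataclasses import dataclass, field

@dataclass
class Metadata:
    """Objective, system-derived attributes of a dataset."""
    table_name: str
    owner: str
    row_count: int
    updated_at: str

@dataclass
class Labels:
    """Subjective, manager-assigned classifications of the same dataset."""
    quality_level: str            # e.g. a quality-level label
    security_level: str           # e.g. a security label
    tags: list = field(default_factory=list)  # free-form attribute labels

meta = Metadata("dw.customer_wide", "data-team", 1_200_000, "2023-03-01")
labels = Labels("gold", "confidential", ["customer", "wide-table"])
```

Metadata changes when the system changes; labels change when management policy changes, which is why the two are maintained separately.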

2. Data collection framework

The collection framework should collect and manage, in a unified way, all the source data included in the data middle platform. It should offer a variety of collection methods, such as file-transfer collection, database collection, API-based collection, stream collection, and web crawling.

At the same time, the collection framework should preprocess source data according to the data collection specification, removing obviously unnecessary and redundant data, and it should manage the collection process itself. Although there is no unified template for a data middle platform's overall architecture, the collection frameworks of most enterprises are essentially the same.
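As an illustration of collection-time preprocessing, a minimal Python sketch (the record shape and field names are assumptions) that drops obviously unusable records and redundant duplicates:

```python
def preprocess(records, required_fields=("id", "source", "payload")):
    """Drop records missing required fields, then deduplicate by id,
    keeping the first occurrence: a minimal collection-time filter."""
    seen = set()
    cleaned = []
    for r in records:
        if not all(f in r and r[f] is not None for f in required_fields):
            continue          # obviously unusable record
        if r["id"] in seen:
            continue          # redundant duplicate
        seen.add(r["id"])
        cleaned.append(r)
    return cleaned

raw = [
    {"id": 1, "source": "crm", "payload": "a"},
    {"id": 1, "source": "crm", "payload": "a"},   # duplicate
    {"id": 2, "source": "web", "payload": None},  # missing payload
]
cleaned = preprocess(raw)
```

Real collection pipelines would apply far richer rules, but the principle is the same: filter at the entrance so downstream storage only holds usable data.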

3. Data processing framework

Data processing is a basic step in every data application. The classic extract-transform-load (ETL) flow appears in many places, such as collection-time preprocessing, data integration, and data modeling. Building a dedicated processing framework supports centralized development and management of processing tool components, and it also supports coordinated scheduling of processing tasks across the middle platform.

The data processing framework is responsible for processing-related tasks, including batch processing, stream processing, AI analysis, data cleaning, data exchange, and query. Related tool components can also be configured within it. The task-scheduling module plays the commanding role in the framework, monitoring running processing tasks and handling exceptions.
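A toy Python sketch of the classic ETL flow described above (the data and transformation rules are invented purely for illustration):

```python
def extract(source):
    """Extract: read raw rows from the source (here, an in-memory list)."""
    return list(source)

def transform(rows):
    """Transform: normalize names and drop rows with missing amounts."""
    out = []
    for name, amount in rows:
        if amount is None:
            continue
        out.append((name.strip().lower(), float(amount)))
    return out

def load(rows, store):
    """Load: aggregate amounts per name into the target store."""
    for name, amount in rows:
        store[name] = store.get(name, 0.0) + amount
    return store

raw = [(" Alice ", "10"), ("BOB", None), ("alice", "5")]
warehouse = load(transform(extract(raw)), {})
```

In a real middle platform each stage would be a scheduled task, and the scheduler would retry or alert on failure rather than letting an exception kill the pipeline.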

4. Data Governance Framework

Data governance in the broad sense includes not only content that enhances data value, such as data management, the data catalog, and data quality, but also data security management and data sharing services.

Data security management and data value enhancement pull in opposite directions. If a single vendor or development team builds software for both, its implementation will inevitably be biased toward one side, the tension between the two will stay hidden, and there are few good ways to resolve it.

Data sharing raises the same problem. This article therefore suggests that the data governance framework of the middle platform exclude content related to data security and data sharing.

The data governance framework includes four modules: data catalog, data management, model management and data quality:

  • The main functions of data maps, data asset catalogs, knowledge graphs, and data lineage are to display the attributes and relationships of data, so they all belong to the data catalog module.

  • Data models improve the middle platform's ability to respond to external application demands, and the solidified intermediate model data needs dedicated management. Model management includes the model catalog, model lineage, model map, and so on.

  • Data management can be subdivided into metadata management, master data management, label data management and source data management.

  • The data quality management module manages the quality of middle-platform data according to established data standards and data audit rules.
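As a sketch of rule-based quality auditing (the rules themselves are hypothetical examples), each field can be checked against an agreed audit rule:

```python
# Hypothetical audit rules: each maps a field name to a validity check.
RULES = {
    "age":   lambda v: isinstance(v, int) and 0 <= v <= 130,
    "email": lambda v: isinstance(v, str) and "@" in v,
}

def audit(record, rules=RULES):
    """Return the list of fields in the record that violate their rule."""
    return [f for f, ok in rules.items() if f in record and not ok(record[f])]

bad_age = audit({"age": 200, "email": "x@y.com"})
bad_email = audit({"age": 30, "email": "not-an-address"})
```

A production quality module would also track violation rates over time and feed them back into the data standards, but the check-per-rule structure is the core.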

5. Data Security Framework

Data has become an asset, and the data security framework is an indispensable part of the data middle platform. Security cuts across the other functional frameworks: every link, including data collection, processing, exchange, and sharing, must implement security control strategies. The security framework can be divided into functional modules such as log management, user authentication, permission management, and encryption/decryption.

In addition, a security portal can package security capabilities for external use and display the security posture and security views of the data platform.

6. Data Operation Framework

The core function of the data middle platform is to consolidate the data processing and data governance work of many data applications: centralized construction, centralized management, less redundancy, more reuse. Its ultimate goal is to provide data services to other applications and developers, and the external data service function directly faces an uncertain set of external consumers.

Building data operation as a separate subsystem therefore helps in two ways. On the one hand, it can provide functions tailored to external users; on the other, the operation module acts as an intermediate layer between users and the platform's core data services, effectively preventing external users from directly controlling or accessing core data and applications, which protects the platform's security and the stability of its internal functions.

Based on the above, data operation should provide functions such as an operation portal, capability opening, data opening, and operation monitoring:

  • Operation portal: provides a management portal for platform administrators, a developer portal for developers, an internal application portal for internal applications, and an external application portal for external ones. The portal gives different users different channels and opens different platform capabilities to each.

  • Capability opening: packages the platform's data processing and data analysis capabilities and offers them to users as microservices or APIs, or directly as secondary development capabilities.

  • Data opening: provides data services to other data application systems through data catalogs and data/model presentation (visualization, data views, etc.).

  • Operation monitoring: monitors and manages the overall operation of the platform, including the hardware and software environment, defines monitoring indicators, produces routine operation reports as required, and handles alarm information.
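A minimal Python sketch of the capability-opening idea (the key registry, caller names, and the wrapped query are all hypothetical): an internal capability is wrapped with authentication and per-caller metering before being exposed to outside consumers:

```python
import functools

API_KEYS = {"key-analytics": "reporting-app"}   # hypothetical key registry
usage = {}                                      # per-caller call metering

def open_capability(func):
    """Wrap an internal capability with authentication and metering,
    the checks the operation layer applies before exposing it."""
    @functools.wraps(func)
    def wrapper(api_key, *args, **kwargs):
        caller = API_KEYS.get(api_key)
        if caller is None:
            raise PermissionError("unknown API key")
        usage[caller] = usage.get(caller, 0) + 1
        return func(*args, **kwargs)
    return wrapper

@open_capability
def customer_count(region):
    """Stand-in for a real middle-platform query capability."""
    return {"east": 1200, "west": 800}.get(region, 0)
```

In practice this role is played by an API gateway, but the division of labor is the same: the core capability stays untouched while the operation layer controls who may call it and records how often.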

Typical architecture of the data middle platform

The goal of the data middle platform is to keep data in continuous use. Through the tools, methods, and operating mechanisms the platform provides, data is turned into a service capability that the business can consume more conveniently. The figure below shows the overall architecture of the data middle platform: a complete system sitting between the underlying storage and compute platform and the upper-layer data applications.

The overall architecture of the data middle platform

The data middle platform shields the complexity of the underlying storage and compute technology, lowers the demand for specialist technical talent, and makes data cheaper to use. Enterprise data assets are established through the platform's data aggregation and data development modules; through asset management, governance, and data services, those assets are turned into data service capabilities that serve the business. The data security system and data operation system ensure the platform's long-term, healthy, continuous operation.

1. Data Aggregation

Data aggregation is the entry point for data into the middle platform. The platform itself generates almost no data; all of it comes from business systems, logs, files, networks, and so on. Scattered across different network environments and storage platforms, this data is hard to use and hard to turn into business value.

Data aggregation is therefore a core tool the platform must provide: data from heterogeneous networks and heterogeneous data sources can be conveniently collected into centralized storage, ready for subsequent processing and modeling. Common aggregation methods include database synchronization, event tracking (embedded instrumentation), web crawlers, and message queues; in terms of timeliness, there is offline batch aggregation and real-time collection.
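A toy Python sketch contrasting offline batch aggregation with real-time collection from a message queue (table names and event shapes are invented for illustration):

```python
import queue

def batch_aggregate(tables):
    """Offline batch aggregation: copy full table snapshots at once."""
    lake = {}
    for name, rows in tables.items():
        lake[name] = list(rows)
    return lake

def stream_aggregate(q, lake):
    """Real-time collection: drain change events from a message queue
    into the lake as they arrive."""
    while True:
        try:
            event = q.get_nowait()
        except queue.Empty:
            break
        lake.setdefault(event["table"], []).append(event["row"])
    return lake

q = queue.Queue()
q.put({"table": "orders", "row": {"id": 1, "amount": 9.9}})
lake = batch_aggregate({"users": [{"id": 7}]})
lake = stream_aggregate(q, lake)
```

Batch jobs trade freshness for throughput; the queue path trades the reverse, which is why most platforms run both side by side.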

2. Data development

Data that has just been aggregated into the middle platform has not been processed; it is essentially piled up in its original state, so the business still cannot use it easily. Data development is a set of tools for data processing and process control. Experienced data developers and algorithm modelers can use the functions of the data processing module to quickly process data into forms valuable to the business.

The data development module mainly serves developers and analysts, providing offline, real-time, and algorithm development tools along with integrated facilities for task management, code release, operations and maintenance, monitoring, and alerting, improving both ease of use and efficiency.

3. Data asset system

With the data aggregation and data development modules, the middle platform already has the basic capabilities of a traditional data warehouse platform: it can aggregate data, carry out all kinds of data development, and establish an enterprise data asset system. As noted earlier, the data asset system is the flesh and blood of the middle platform; everything developed, managed, and used is data. In the big data era, data volume is large and growing fast, the business depends on data ever more heavily, and the consistency and reusability of data must be considered; the vertical, siloed "chimney" style of building data and data services cannot last.

Different enterprises have different data because their businesses differ, and the content of data construction also differs, but the construction method can be similar, and data should be built in a unified way. The author recommends building data uniformly in layers: source (mirror) data, a unified data warehouse, label data, and application data.
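The recommended layering can be sketched in Python (field names and the labeling rule are hypothetical): raw records land in a source layer, are standardized in the warehouse layer, summarized into labels, and finally answer an application-level question:

```python
def to_source_layer(raw):
    """Source layer: land raw records unchanged."""
    return list(raw)

def to_warehouse_layer(ods):
    """Unified warehouse: standardize field names and types."""
    return [{"user_id": r["uid"], "spend": float(r["amt"])} for r in ods]

def to_label_layer(dw):
    """Label layer: derive a per-user tag from warehouse facts."""
    totals = {}
    for r in dw:
        totals[r["user_id"]] = totals.get(r["user_id"], 0.0) + r["spend"]
    return {u: ("high-value" if t >= 100 else "regular")
            for u, t in totals.items()}

def to_app_layer(labels, user_id):
    """Application layer: answer a concrete business question."""
    return labels.get(user_id, "unknown")

raw = [{"uid": "u1", "amt": "80"}, {"uid": "u1", "amt": "30"},
       {"uid": "u2", "amt": "15"}]
labels = to_label_layer(to_warehouse_layer(to_source_layer(raw)))
```

Each layer only depends on the one below it, which is what makes the assets consistent and reusable across applications.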

4. Data asset management

The data assets established through the asset system are still a technical data system, which business staff find hard to understand. Asset management presents the enterprise's data assets to all employees in a way they can better understand (subject, of course, to permission and security controls). Data asset management covers the data asset catalog, metadata, data quality, data lineage, data life cycle, and so on, displaying the enterprise's data assets more intuitively and raising the organization's data awareness.
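Data lineage, for example, can be held as a simple dependency graph and walked upstream (the dataset names below are hypothetical):

```python
# Upstream dependencies of each dataset (hypothetical names).
LINEAGE = {
    "report.daily_sales": ["dw.orders_wide"],
    "dw.orders_wide": ["ods.orders", "ods.customers"],
}

def upstream(dataset, graph=LINEAGE):
    """Walk the lineage graph to find every source a dataset depends on."""
    seen, stack = set(), [dataset]
    while stack:
        node = stack.pop()
        for parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

sources = upstream("report.daily_sales")
```

This is the query behind impact analysis: if a source table changes, every downstream asset returned by the reverse walk needs review.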

5. Data service system

Data aggregation and data development build the enterprise's data assets, and asset management displays them, but the value of the data has still not been fully realized. The data service system turns data into a service capability: through data services, data participates in the business and activates the whole middle platform. The data service system is where the platform's value is realized.

Enterprise data services vary endlessly. Middle-platform products can carry some standard services, but these rarely satisfy every service demand; most services still need to be customized quickly on top of the platform's capabilities. The service module of the data middle platform therefore ships with few built-in services, instead providing rapid service generation plus functions such as service control, authentication, and metering.

6. Operating system and security system

With data aggregation, data development, data assets, asset management, and data services in place, the data middle platform is established and already delivering a degree of business value.

The operation system and security system are the basis for the platform's healthy, continuous operation. Without them, a data middle platform tends to go the way of an ordinary project: after the first phase builds the platform, loads some data, and tries one or two application scenarios, it stalls, stops operating normally, and stops delivering value from data applications, entirely missing the point of building a middle platform in the first place.

 

Architecture diagrams of enterprise data middle platforms

1. Technology middle platform architecture diagram

Before the middle-platform concept emerged, in the traditional informatization model the front end was the application side supporting the business, and the back end was the set of application systems serving front-end users such as customers, suppliers, partners, and society. But as the market, user needs, and the business kept changing, the rigid underlying applications could not provide timely support.

Enterprises need a strong middle layer to support high-frequency, fast-changing business and to provide multi-terminal access channels for different audiences. This has had a considerable conceptual impact on vendors, including traditional application software vendors.

At the same time, microservice technology and architecture, the containerization ecosystem, and DevOps concepts and tools were all developing rapidly, and the informatization model based on a "big middle platform, small front end" eventually became popular.

2. Bank data architecture system

At the data architecture level, data is arranged rationally from a non-functional perspective through data classification and layered deployment. Overall architectural control and design support business operations and management analysis applications (systems), meeting the data needs of business development and IT transformation. The scalability and adaptability of the architecture improve the timeliness, flexibility, and accuracy of data analysis applications.

In practice, each bank's data architecture differs, with different evolution paths and development directions depending on its business development, customer data volume, transaction data volume, and functional requirements.

Generally, national banks such as state-owned and joint-stock banks have more complex businesses and larger data volumes, so their data architectures have evolved quickly. A common data architecture partitioning is shown in the figure below:

3. Retail industry middle platform architecture

This is a schematic logical architecture that mixes technology and business. The front-end application part lists several consumer-facing application systems in the retail and consumer goods industries; under the middle-platform architecture, however, they differ greatly from traditional "application systems" and have become very "lightweight".

 

4. Business middle platform architecture

The front end follows the user interface and is inherently unstable: there are always new data requests, and that is inevitable.

The back end is mainly responsible for data storage, organizing data of different forms and scales in appropriate ways. Big data is highly dynamic, yet the back end needs a certain degree of stability.

If every front-end request had to be served directly by the back end, the back end would have far too much to manage.

5. Back-end architecture

The back end is shared by many front ends. If it provided flexible data services directly to each front end, the coupling between front ends would rise and maintenance costs would immediately spike.

Nor is it appropriate to push this data processing into the front end: for one thing, it is less secure; for another, front-end teams are busy making the interface look better and feel smoother, with little time to think about data. A back-end architecture like this one balances the contradiction reasonably well.

6. Real-time data platform

The following is a logical architecture for a real-time data platform; it is easy to follow, and the most important part is the real-time model layer.

7. The development process of the enterprise middle platform

The picture below summarizes the three stages of middle-platform development. In the end, for companies that already run an ERP system, the essence of middle-platform construction is to use a microservice architecture to build an open business platform that replaces the closed, monolithic ERP system.

8. Alibaba middle platform architecture

The middle platform is an architectural concept and method. The essence of any architectural method is to use techniques such as separation, combination, decomposition, and recombination to restructure a system in an orderly way, reducing the system's "entropy" and enabling it to evolve continuously.

 

9. Alibaba core architecture diagram

The technology middle platform is deployed on the Alibaba Cloud platform, supporting the group's shared business units, which in turn provide service-oriented output to each front-end business line.

10. Omni-channel retail middle platform 

Simply packing everything into one "big back end" does not really solve IT's pain points, because it is still an IT system. Beyond business functions, what an IT system must consider is often more important and more valuable:

11. Omni-channel integration architecture

2007-2012 was the era when the "integration" concept was pitched most heavily. It went by the name "SOA", and SOA was the "omni-channel middle platform" of its day.

 

 

12. NetEase Yanxuan data middle platform system

The core responsibility of the data middle platform is to efficiently empower the data front end to deliver value for the business. To understand the platform, first understand the data front end: the search, recommendation, BI reports, data dashboards, and similar features mentioned above all belong to it.

 

Industry data middle platform solutions

▲Real estate industry solutions 

 ▲Securities industry solutions

▲Retail industry solutions 

▲Manufacturing industry solutions

▲Media industry solutions 

▲Inspection industry solutions

Summary

Building a data middle platform enables efficient management of an enterprise's or institution's data assets and maximizes data value, bringing the organization a data-platform operating mechanism that promises to resolve the mismatch between application development and data development speed. A data middle platform also gathers the organization's core technologies and teams, builds a strong internal data development and operation team, and strengthens both the team's hard and soft capabilities.

Although a good architecture matters greatly for an information system's later expansion, operation, and maintenance, the overall architecture design is only the first step in building a data middle platform; each functional module still leaves much room for refinement, such as storage technology selection for different data types, data security compliance auditing, and data model design. In concrete projects, the balance between data sharing and security protection, the introduction of new technologies, and similar questions all need further detailed study.


Origin blog.csdn.net/SHYLOGO/article/details/129407729