Data center delivery experts explain: how can a data architecture be layered more sensibly?

Author: Cogan

Overall, the data center system architecture can be divided into three layers: the data collection layer, the data calculation layer, and the data service layer. Together, these three layers provide data support for upper-level data applications.

Data collection layer

For enterprises, massive amounts of data are being generated all the time, and data collection is particularly important as the first link of the data system.

The data collection layer therefore needs a standardized collection system that gathers massive data comprehensively, with high performance, and in a consistent form, and then transmits it to the big data platform.

The Internet log collection system consists of two parts: a web-side log collection solution and an app-side log collection solution.

On top of the collection technology, enterprises can apply event-tracking specifications designed for different scenarios to meet business needs such as opening up log data. At the same time, a high-performance, highly reliable data transmission system can be established to move data from the production business systems to the big data system. The transmission covers both incremental database data and log data, and it needs to support real-time streaming computing as well as batch computing over various time windows. In addition, a data synchronization tool connects directly to heterogeneous databases (standby databases) to extract data for various time windows.
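Extracting data for a time window from a standby database can be sketched roughly as follows. This is only an illustration under stated assumptions: the `orders` table, its `updated_at` column, and the helper names are hypothetical, and an in-memory SQLite database stands in for the standby database.

```python
import sqlite3


def extract_window(conn, start, end):
    """Return rows whose updated_at falls in the half-open window [start, end)."""
    cur = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at >= ? AND updated_at < ? ORDER BY id",
        (start, end),
    )
    return cur.fetchall()


def build_demo_db():
    """Create a hypothetical source table; stands in for a standby database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 10.0, "2020-10-20 23:59:59"),
            (2, 25.5, "2020-10-21 00:00:00"),
            (3, 7.25, "2020-10-21 12:30:00"),
            (4, 99.0, "2020-10-22 00:00:00"),
        ],
    )
    return conn


conn = build_demo_db()
# One daily window; a shorter window (for example one hour) works the same way.
daily_batch = extract_window(conn, "2020-10-21 00:00:00", "2020-10-22 00:00:00")
print(daily_batch)
```

A half-open window means that consecutive windows neither overlap nor leave gaps, so a row updated exactly at midnight is picked up by exactly one batch.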

The following figure shows the position of the data collection layer in the data layering:
1.png

Data calculation layer

Raw data from the collection system only becomes useful once it has been integrated and calculated. Only then can it reveal business patterns, surface hidden information, realize the value of big data, and empower or even create business. The large volume of raw data therefore enters the data calculation layer for further integration and calculation.

To handle massive data and complex calculations, the data calculation layer includes two systems: a data storage and computing cloud platform, and a data integration and management system.


-Data storage and computing cloud platform. For example, MaxCompute is an offline big data platform independently developed by Alibaba. Its rich functions and strong storage and computing capabilities give enterprises a powerful storage and computing engine for big data. StreamCompute is Alibaba's self-developed streaming big data platform, which supports streaming computing needs within the enterprise.

-Data integration and management system. "OneModel" is a methodology and toolset for data integration and management. Under this system, big data engineers build a unified, standardized, and shareable global data system that avoids redundant data and duplicated construction, avoids data silos and inconsistencies, and makes full use of the scale and diversity of big data. With this unified methodology, an enterprise can build its data common layer, and similar big data projects can be implemented quickly.

The data processing pipeline in the data center also follows the industry's layering concept. It includes the operational data layer (ODS, Operational Data Store), the detail data layer (DWD, Data Warehouse Detail), the summary data layer (DWS, Data Warehouse Summary), and the application data layer (ADS, Application Data Store). Processing between these layers turns data assets into information assets, with effective metadata management and data quality handling across the whole process.

The following figure shows the position of the data common layer (ODS+DWD+DWS) and the data application layer (ADS) in the data layering:
2.png
Figure: Relationship between data common layer and data application layer
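The flow through the four layers named above (ODS, DWD, DWS, ADS) can be illustrated with a small, self-contained sketch. The record fields, the cleaning rule, and the function names are assumptions made for illustration; they do not come from any Alibaba product.

```python
from collections import defaultdict

# ODS: raw records kept close to the source (all strings, may contain dirty rows).
ods_orders = [
    {"order_id": "1", "shop": "A", "amount": "10.0"},
    {"order_id": "2", "shop": "A", "amount": "5.0"},
    {"order_id": "3", "shop": "B", "amount": "bad"},  # dirty row
    {"order_id": "4", "shop": "B", "amount": "20.0"},
]


def to_dwd(rows):
    """DWD: cleaned, typed detail records; rows that fail parsing are dropped."""
    detail = []
    for row in rows:
        try:
            detail.append(
                {
                    "order_id": int(row["order_id"]),
                    "shop": row["shop"],
                    "amount": float(row["amount"]),
                }
            )
        except ValueError:
            continue
    return detail


def to_dws(detail):
    """DWS: reusable summary, here the total amount per shop."""
    totals = defaultdict(float)
    for row in detail:
        totals[row["shop"]] += row["amount"]
    return dict(totals)


def to_ads(summary):
    """ADS: application-specific view, here shops ordered by total amount."""
    return sorted(summary.items(), key=lambda item: item[1], reverse=True)


dwd = to_dwd(ods_orders)
dws = to_dws(dwd)
ads = to_ads(dws)
print(ads)
```

Each function reads only the output of the layer before it, which mirrors the idea that applications build on the common layer rather than on raw source data.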

(1) Unified data base layer
The rich data collected through various methods will enter the unified ODS data base layer after being cleaned and structured.

Its main functions include:
-Synchronization: Incremental or full synchronization of structured data to the data center
-Structuring: Parse unstructured data (logs) into structured form and store it in the data center
-History accumulation and cleaning: Retain historical data and clean data according to business requirements and audit requirements
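The structuring step in the list above can be sketched as follows. The log layout, the field names, and the `structure_log_line` helper are hypothetical; real tracking logs will have their own format.

```python
import re

# Assumed log layout: "<timestamp> user=<id> event=<name> page=<path>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) "
    r"user=(?P<user_id>\d+) event=(?P<event>\w+) page=(?P<page>\S+)$"
)


def structure_log_line(line):
    """Turn one raw log line into a structured record, or None if it is malformed."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["user_id"] = int(record["user_id"])
    return record


raw_lines = [
    "2020-10-21T08:00:01 user=42 event=click page=/home",
    "corrupted line without fields",
    "2020-10-21T08:00:05 user=7 event=view page=/item/123",
]

# Cleaning: malformed lines are filtered out before loading into the ODS layer.
structured = [rec for rec in map(structure_log_line, raw_lines) if rec is not None]
print(structured)
```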

In terms of rights and responsibilities, all data should be unified at the source: there is a single data base layer, and a single team is responsible for it and controls it. Other teams have no right to copy data in the data base layer.

(2) Data middle layer
Through data modeling and development, we build a data middle layer that does not shift easily when the business, and especially the organizational structure, changes. It includes the DWD detail data middle layer and the DWS summary data middle layer.

Its main functions include:
-Combine related and similar data: Use detail wide tables, reuse related calculations, and reduce data scanning.
-Unified processing of public indicators: Based on the OneData system, build statistical indicators with standardized names, consistent definitions, and unified algorithms to provide public indicators for upper-level data products, applications, and services; build logical summary tables.
-Establish consistent dimensions: Build consistent dimension tables for data analysis to reduce the risk of inconsistent indicator definitions and algorithms.
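As a rough illustration of the three functions above, the sketch below pre-joins a fact table with a shared dimension table into a detail wide table and then computes one public indicator from it. The table contents, the "paid amount" definition, and the function names are assumptions for illustration only.

```python
# Shared dimension: one definition of "shop", reused by every fact table.
dim_shop = {
    101: {"shop_name": "North Store", "region": "north"},
    102: {"shop_name": "South Store", "region": "south"},
}

fact_orders = [
    {"order_id": 1, "shop_id": 101, "amount": 30.0, "status": "paid"},
    {"order_id": 2, "shop_id": 101, "amount": 12.0, "status": "cancelled"},
    {"order_id": 3, "shop_id": 102, "amount": 8.0, "status": "paid"},
]


def build_wide_table(facts, dim):
    """Detail wide table: pre-join each order with its shop attributes."""
    return [{**fact, **dim[fact["shop_id"]]} for fact in facts]


def paid_amount_by(wide_rows, key):
    """Public indicator with one agreed definition: sum of amount over paid orders."""
    totals = {}
    for row in wide_rows:
        if row["status"] == "paid":
            totals[row[key]] = totals.get(row[key], 0.0) + row["amount"]
    return totals


wide = build_wide_table(fact_orders, dim_shop)
paid_by_region = paid_amount_by(wide, "region")
paid_by_shop = paid_amount_by(wide, "shop_name")
print(paid_by_region, paid_by_shop)
```

Because the join happens once in the wide table, every downstream summary reuses the same shop attributes and the same "paid" rule instead of redefining them.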

In terms of rights and responsibilities, before services are provided to the business, a unified team abstracts data domains from the business (derived from the business, but distinct from it) and then leads the unified construction of the data middle layer. This covers the detail middle layer, where detailed data is pre-JOINed, and the summary middle layer, which focuses on reusable, application-oriented dimensions and indicators. In particular, a single team is responsible for adding core business data to the data middle layer. For part of the business data, an independent data team is allowed to build a data system in accordance with the unified OneModel methodology. Because of their unity and reusability, the ODS data base layer and the DWD+DWS data middle layer are together called the data common layer.

(3) Data application layer
When providing services for applications, the business team, or the data team embedded in a business line, has great freedom. As long as it relies on the data common layer, it can freely build the ADS data application layer.

Its main functions include:
-Personalized indicator processing: indicators that are not shared (non-public) or that are complex (index-type, ratio-type, and ranking indicators)
-Application-based data assembly: wide-table data marts, horizontal-to-vertical table conversion, trend indicator series
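Two of the operations listed above, a ratio-type indicator and a horizontal-to-vertical table conversion, can be sketched like this. The `conversion_rate` indicator, the sample rows, and the function names are hypothetical.

```python
def wide_to_long(rows, id_key, indicator_keys):
    """Horizontal to vertical: one output row per (id, indicator) pair."""
    long_rows = []
    for row in rows:
        for name in indicator_keys:
            long_rows.append({id_key: row[id_key], "indicator": name, "value": row[name]})
    return long_rows


def add_conversion_rate(rows):
    """Application-specific ratio: orders / visits (0.0 when there are no visits)."""
    return [
        {**row, "conversion_rate": row["orders"] / row["visits"] if row["visits"] else 0.0}
        for row in rows
    ]


shop_daily = [
    {"shop": "A", "visits": 200, "orders": 10},
    {"shop": "B", "visits": 0, "orders": 0},
]

long_rows = wide_to_long(shop_daily, "shop", ["visits", "orders"])
with_rate = add_conversion_rate(shop_daily)
print(long_rows)
print(with_rate)
```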

Data service layer

After the data has been integrated and calculated, it needs to be provided to products and applications for consumption. For better performance and experience, it is necessary to build a data service layer that exposes data through service interfaces. To meet different needs, the data service layer is backed by a variety of databases, such as MySQL and HBase.

Data services hide the underlying data storage from applications and open massive amounts of data to applications within the group conveniently and efficiently. The challenges are how to serve users better in terms of performance, stability, and scalability; how to meet applications' complex data service requirements; and how to keep data service interfaces highly available. As the business develops and requirements grow more complex, data services keep evolving.

Whether it is the data common layer or the application layer, it ultimately needs to provide services to the business. To make it easier for business departments to find, view, and use data, we upgraded OpenAPI to the OneService system, which combines a methodology with products. OneService cushions the impact of business changes on the data model, so it can provide unified public services while remaining compatible with the services that individual applications need.
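The idea of one service interface in front of several storage engines can be sketched as a small facade. This is not the OneService API: the `DataService` class, the dataset names, and the in-memory dictionaries standing in for a relational store and a key-value store are all assumptions for illustration.

```python
class DataService:
    """Single entry point that hides which store backs each logical dataset."""

    def __init__(self):
        self._backends = {}

    def register(self, dataset, fetch):
        """Map a logical dataset name to a callable fetch(key) -> record or None."""
        self._backends[dataset] = fetch

    def get(self, dataset, key):
        """Look up one record; callers never see the underlying storage."""
        if dataset not in self._backends:
            raise KeyError(f"unknown dataset: {dataset}")
        return self._backends[dataset](key)


# In-memory stand-ins for a relational store and a key-value store.
relational_rows = {1: {"shop": "A", "paid_amount": 30.0}}
kv_rows = {"user:42": {"last_page": "/home"}}

service = DataService()
service.register("shop_summary", relational_rows.get)
service.register("user_profile", lambda user_id: kv_rows.get(f"user:{user_id}"))

print(service.get("shop_summary", 1))
print(service.get("user_profile", 42))
```

If a dataset later moves to a different store, only its registered fetch function changes, and applications calling `get` are unaffected.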

The following figure shows the position of the data service layer in the data layering:
3.png
Figure: Relationship between the data application layer and the data service layer

In summary, the enterprise data center relies on the data collection layer, the data calculation layer, and the data service layer to provide data support for upper-level data products and business systems. Dataphin, a cloud-based data center product, provides enterprises with a one-stop data center platform implementation covering "collection, construction, management, and use". Together with the Alibaba Cloud product family, it enables stable and efficient construction of the entire enterprise data center platform.

 

 

This article is the original content of Alibaba Cloud and may not be reproduced without permission.


Origin blog.csdn.net/yunqiinsight/article/details/109195808