What does a layered data warehouse look like, and how do you create its layers?

1. Create data warehouse layers

Data warehouse layering is the overall architecture design and hierarchical division of the data model, based on a comprehensive analysis of business scenarios, actual data, and the systems that use the data. It sorts data that serves different purposes into different layers, so that you can organize, manage, and maintain the data more easily. This article describes how to create and manage data warehouse layers.

1.1 Background information

The data warehouse is a collection of all data: log information, database data, text data, external data, and so on are all integrated in it. Elements such as data warehouse layering, data domains, business processes, data marts, and subject domains jointly determine the logical data warehouse architecture for a given modeling effort. Data domains and business processes are located in the public layer and are used to build the public-layer data model. Data marts and subject domains are located in the application layer and are used to build models for specific business applications.
Layering the data warehouse ensures that data is cleaned and filtered before it enters the data warehouse, so that the original data is no longer messy. It also optimizes the query process and improves the efficiency of data acquisition, statistics, and analysis. In addition, layering associates data across different dimensions, which makes multi-dimensional analysis more convenient and supports data analysis and decision-making from multiple angles and levels.

1.2 How to build the data warehouse model

We build the model in four steps, following the dimensional modeling approach described in The Data Warehouse Toolkit:

  1. Select the business process: import all the business tables used by the JavaEE application. These tables include entity tables, dimension tables, transaction fact tables, periodic snapshot fact tables, and accumulating snapshot fact tables. After importing them, make these tables the columns of the matrix.
  2. Declare the grain: a row can represent a single event, or a summary by day, by week, by month, and so on. With reference to other architectures and the indicators you want to analyze later, choose the finest appropriate grain. Here, one row represents one consumption.
  3. Confirm the dimensions: follow standard data warehouse modeling practice and aim to keep only first-level dimensions around the fact table. The relevant dimensions are themes such as time, place, person, specific activity, and coupon. At the same time, flatten the dimensions of the user-related and product-related tables so that they become first-level dimensions wherever possible.
  4. Confirm the facts: this step determines not the fact table itself but its measures, such as the number of orders, the order amount, the number of orders placed, and other additive fields.
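
The four steps above can be sketched as a tiny star schema. This is only an illustration using Python's built-in sqlite3 module: the table names (fact_order, dim_date, dim_user), the columns, and the sample rows are all hypothetical and do not come from the original project.

```python
import sqlite3

# Hypothetical star schema: every table and column name is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 3: dimensions sit one level away from the fact table.
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT);
    CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, city TEXT);
    -- Steps 1, 2 and 4: the business process is an order, the grain is
    -- one row per consumption, and the facts are additive measures.
    CREATE TABLE fact_order (
        order_id INTEGER PRIMARY KEY,
        date_id  INTEGER REFERENCES dim_date(date_id),
        user_id  INTEGER REFERENCES dim_user(user_id),
        quantity INTEGER,
        amount   REAL
    );
    INSERT INTO dim_date VALUES (1, '2023-06-09'), (2, '2023-06-10');
    INSERT INTO dim_user VALUES (10, 'Hangzhou'), (11, 'Shanghai');
    INSERT INTO fact_order VALUES
        (100, 1, 10, 2, 20.0),
        (101, 1, 11, 1, 15.0),
        (102, 2, 10, 3, 30.0);
""")

# Additive measures can be counted and summed along any dimension.
rows = conn.execute("""
    SELECT u.city, COUNT(*), SUM(f.quantity), SUM(f.amount)
    FROM fact_order AS f
    JOIN dim_user AS u ON u.user_id = f.user_id
    GROUP BY u.city
    ORDER BY u.city
""").fetchall()
print(rows)
```

Because the grain is a single consumption, the same fact table can later be rolled up by day, week, or month without being rebuilt.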

1.3 Plan data warehouse layering

Data warehouse layering needs to be designed in combination with business scenarios, data scenarios, and system scenarios. You can plan the layering of your data model according to actual business needs.
Following the layering used by Alibaba Cloud's data warehouse, the layers can be divided as follows:

1.3.1 Data introduction layer ODS (Operational Data Store)

The ODS layer receives and processes the raw data that needs to be stored in the data warehouse system. Its table structures are consistent with the table structures in the source system where the raw data resides. It is the data preparation area of the data warehouse. The ODS layer performs the following operations on the raw data:
Synchronize the raw structured data to the data warehouse, incrementally or in full.
Structure the raw unstructured data (for example, log information) and store it in MaxCompute.
According to actual business needs, record the historical changes of the raw data or perform simple cleaning on it.
Table names in the ODS layer must start with ods, and the life cycle is 366 days.
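
As a rough sketch of the ODS rules above, the following Python snippet structures a raw log line, checks the ods naming rule, and computes a date 366 days after loading. The log format, the function names, and the interpretation of the life cycle as "load date plus 366 days" are assumptions made for illustration; they are not MaxCompute APIs.

```python
import re
from datetime import date, timedelta
from typing import Optional

ODS_LIFECYCLE_DAYS = 366  # life cycle stated in the text

# Hypothetical log format: "<timestamp> <ip> <user_id> <action>"
LOG_PATTERN = re.compile(r"^(\S+) (\d+\.\d+\.\d+\.\d+) (\w+) (\w+)$")


def is_valid_ods_name(table_name: str) -> bool:
    """Naming rule: an ODS table name must start with 'ods'."""
    return table_name.lower().startswith("ods")


def structure_log_line(line: str) -> Optional[dict]:
    """Turn one raw log line into a structured row, or None if malformed."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    timestamp, ip, user_id, action = match.groups()
    return {"ts": timestamp, "ip": ip, "user_id": user_id, "action": action}


def expiry_date(load_date: date) -> date:
    """Assumed reading of the life cycle: load date plus 366 days."""
    return load_date + timedelta(days=ODS_LIFECYCLE_DAYS)


row = structure_log_line("2023-06-09T10:00:00 10.0.0.1 u42 login")
print(row, is_valid_ods_name("ods_user_log"), expiry_date(date(2023, 6, 9)))
```

Malformed lines return None rather than raising, so a loader can count or quarantine them instead of stopping the whole synchronization.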

1.3.2 Detailed data layer DWD (Data Warehouse Detail)

The DWD layer builds its data model around the enterprise's business activity events. Based on the characteristics of each specific business event, it builds detail tables at the finest grain. Depending on how the enterprise uses the data, you can store some important dimension attribute fields redundantly in the detail tables, which is known as wide-table processing. This reduces the joins between detail tables and dimension tables and makes the detail tables easier to use.
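
Wide-table processing can be illustrated with a minimal Python sketch: selected dimension attributes are copied onto each detail row so that later queries do not need to join the dimension table. The order rows, the dim_user dictionary, and the widen function are hypothetical.

```python
# Hypothetical detail rows and user dimension; all names are illustrative.
orders = [
    {"order_id": 100, "user_id": 10, "amount": 20.0},
    {"order_id": 101, "user_id": 99, "amount": 15.0},  # user missing in dim
]
dim_user = {10: {"city": "Hangzhou", "level": "gold"}}


def widen(order_rows, user_dim, attrs=("city",)):
    """Copy the chosen dimension attributes onto every detail row."""
    wide_rows = []
    for row in order_rows:
        dim_row = user_dim.get(row["user_id"], {})
        wide_row = dict(row)  # keep the original detail columns
        for attr in attrs:
            # Unknown users get None instead of dropping the detail row.
            wide_row[f"user_{attr}"] = dim_row.get(attr)
        wide_rows.append(wide_row)
    return wide_rows


dwd_order_wide = widen(orders, dim_user)
print(dwd_order_wide)
```

Only the attributes that queries actually need are made redundant, which keeps the wide table from growing with every column of the dimension.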

1.3.3 Summary data layer DWS (Data Warehouse Summary)

The DWS layer builds its data model around the subject objects being analyzed. Based on the indicator requirements of upper-level applications and products, it builds summary indicator fact tables at a common grain.
For example, make a preliminary classification and summary of user behavior from the ODS layer and abstract some common dimensions, say time, IP, and ID. Then calculate relevant data along these dimensions, such as the number of products each user purchased from different login IPs in each time period. A further layer of light summarization can then be added in the DWS layer to make calculations more efficient. For example, computing behavior over only 7, 30, and 90 days on top of this summary saves a lot of time.
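
The example above can be sketched in Python as two stages: a light daily summary by (day, IP, user), followed by a roll-up into 7-, 30-, and 90-day totals. The sample events, the function names, and the choice of trailing windows that include the as-of day are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detail records: (day, login_ip, user_id, items_purchased)
events = [
    (date(2023, 6, 19), "10.0.0.1", "u1", 2),
    (date(2023, 6, 15), "10.0.0.1", "u1", 1),
    (date(2023, 5, 25), "10.0.0.2", "u1", 4),
    (date(2023, 4, 1), "10.0.0.1", "u2", 3),
]


def daily_summary(rows):
    """Light summary: items purchased per (day, ip, user)."""
    totals = defaultdict(int)
    for day, ip, user, items in rows:
        totals[(day, ip, user)] += items
    return dict(totals)


def window_totals(daily, as_of, windows=(7, 30, 90)):
    """Roll the daily summary up to per-user totals per trailing window."""
    result = {w: defaultdict(int) for w in windows}
    for (day, _ip, user), items in daily.items():
        age_days = (as_of - day).days
        for w in windows:
            if 0 <= age_days < w:  # the window includes the as-of day
                result[w][user] += items
    return {w: dict(per_user) for w, per_user in result.items()}


totals = window_totals(daily_summary(events), as_of=date(2023, 6, 19))
print(totals)
```

The window totals are computed from the daily summary rather than from the raw events, which is where the time saving described above comes from.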

1.3.4 Application data layer ADS (Application Data Service)

The ADS layer stores the personalized statistical indicator data of data products and outputs various reports. For example, for an e-commerce company, the quantity and ranking of the major ball sports products sold in Hangzhou from June 9th to June 19th.
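
A report like the one in the example could be produced along the following lines. This is a hedged sketch: the sales rows, the product names, the year 2023, and the sales_ranking function are all made up for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical summary rows: (day, city, product, quantity)
sales = [
    (date(2023, 6, 9), "Hangzhou", "football", 5),
    (date(2023, 6, 12), "Hangzhou", "basketball", 8),
    (date(2023, 6, 19), "Hangzhou", "football", 4),
    (date(2023, 6, 20), "Hangzhou", "football", 9),    # outside the range
    (date(2023, 6, 10), "Shanghai", "basketball", 7),  # another city
]


def sales_ranking(rows, city, start, end):
    """Total quantity per product in one city and date range, ranked."""
    totals = Counter()
    for day, row_city, product, quantity in rows:
        if row_city == city and start <= day <= end:
            totals[product] += quantity
    # Highest quantity first; ties are broken by product name.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, product, qty)
            for rank, (product, qty) in enumerate(ranked, start=1)]


report = sales_ranking(sales, "Hangzhou", date(2023, 6, 9), date(2023, 6, 19))
print(report)
```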

1.3.5 Common dimension layer DIM (Dimension)

The DIM layer uses dimensions to build its data model. Depending on the actual business, it can store the dimension tables of the logical model or the dimension definitions of the conceptual model. By defining dimensions, determining their primary keys, adding dimension attributes, and associating different dimensions, you can build consistent analysis dimension tables for the entire enterprise, which reduces the risk of inconsistent calculation definitions and algorithms.
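
One way to picture a consistent dimension is a single shared definition with a primary key and a fixed set of attributes that every report looks up. The Dimension class below is a hypothetical sketch, not part of any product; the dim_city table and its values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """A shared dimension definition: primary key plus fixed attributes."""
    name: str
    primary_key: str
    attributes: tuple
    rows: dict = field(default_factory=dict)

    def add_row(self, row: dict) -> None:
        expected = {self.primary_key, *self.attributes}
        if set(row) != expected:
            raise ValueError(f"{self.name}: expected columns {sorted(expected)}")
        key = row[self.primary_key]
        if key in self.rows:
            raise ValueError(f"{self.name}: duplicate primary key {key!r}")
        self.rows[key] = row

    def lookup(self, key, attribute):
        """Every report resolves attributes through the same definition."""
        return self.rows[key][attribute]


dim_city = Dimension("dim_city", "city_id", ("city_name", "region"))
dim_city.add_row({"city_id": 1, "city_name": "Hangzhou", "region": "East China"})
print(dim_city.lookup(1, "region"))
```

Rejecting duplicate keys and unexpected columns at load time is what keeps two reports from silently using different versions of the same dimension.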

Origin blog.csdn.net/m0_58353740/article/details/131489056