Introduction to Data Warehousing: Data Warehouse Modeling

      Anyone who works with Hive soon runs into data warehouse modeling, which is a core skill for a data warehouse engineer. A good layered design makes the whole data system easier to understand and use. To build a data warehouse well, you first need to understand why it is layered.
      Layering is very important, so here is my understanding of it.

1. The basis for layering

1. Sort out business data

      As the data volume and the number of business tables keep growing, we need to sort out the scope of the data so that the source of any piece of data can be found clearly.

2. Avoid repeated computation

      Without layering, the same calculations are repeated and the same tables are joined again and again. Layering stores intermediate results, which reduces development cost and avoids computing from the raw tables for every query.
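
      As a minimal sketch of this idea, the example below joins two raw tables once, keeps the result as an intermediate table, and lets two different queries reuse it. It uses Python's built-in sqlite3 module in place of Hive so that it can be run anywhere; the table and column names (orders, users, order_detail) are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical tables and columns, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
    CREATE TABLE users  (user_id INTEGER, city TEXT);
    INSERT INTO orders VALUES (1, 10, 20.0), (2, 10, 30.0), (3, 11, 50.0);
    INSERT INTO users  VALUES (10, 'Beijing'), (11, 'Shanghai');

    -- Join the raw tables once and keep the result as an intermediate layer.
    CREATE TABLE order_detail AS
    SELECT o.order_id, o.user_id, u.city, o.amount
    FROM orders o
    JOIN users u ON o.user_id = u.user_id;
""")

# Two different downstream queries reuse the intermediate table
# instead of repeating the join against the raw tables.
amount_by_city = dict(conn.execute(
    "SELECT city, SUM(amount) FROM order_detail GROUP BY city"))
orders_by_user = dict(conn.execute(
    "SELECT user_id, COUNT(*) FROM order_detail GROUP BY user_id"))

print(amount_by_city)
print(orders_by_user)
```

      In a real warehouse the intermediate table would be rebuilt on a schedule rather than once, but the point is the same: the join is written and computed in one place.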

3. Make data easier to use

      A well-designed set of warehouse layers can support most data needs, because each requirement can find the data it needs in the appropriate layer.

4. Avoid data divergence

      Unify the metric definitions (the data "caliber"), ensure data quality, and avoid ending up with several different definitions of the same metric.

5. General layering

A common and simple layering for dimensional modeling has three layers: ODS, CDM and ADS.
[Figure: layering schematic (image from the internet; it will be removed on request if it infringes)]

1. ODS layer, operational data layer

      This layer stores data from the operational systems in the data warehouse almost unprocessed. Its main tasks are:
      1. Synchronize structured business data, either incrementally or in full;
      2. Parse unstructured data such as logs into a structured form and load it into the data warehouse;
      3. Accumulate historical data: keep and clean historical data according to business, audit and other requirements. The retained data snapshots also make backtracking easier.
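
      Task 2 and the snapshot idea in task 3 can be sketched roughly as follows. This is only an illustration under assumptions: the log format, the parse_log_line helper, the ods_user_log table and its dt column (standing in for a daily partition) are all hypothetical, and sqlite3 stands in for Hive.

```python
import sqlite3

# Hypothetical raw log format: "<timestamp> <user_id> <action>"
raw_logs = [
    "2020-03-26T10:00:00 10 login",
    "2020-03-26T10:05:00 11 click",
    "malformed line",
]

def parse_log_line(line):
    """Turn one raw log line into a (ts, user_id, action) row, or None if it does not match."""
    parts = line.split()
    if len(parts) != 3 or not parts[1].isdigit():
        return None
    ts, user_id, action = parts
    return ts, int(user_id), action

conn = sqlite3.connect(":memory:")
# dt plays the role of a daily partition, so each day's load is kept as a snapshot.
conn.execute(
    "CREATE TABLE ods_user_log (ts TEXT, user_id INTEGER, action TEXT, dt TEXT)")

snapshot_date = "2020-03-26"
rows = [row + (snapshot_date,)
        for row in map(parse_log_line, raw_logs)
        if row is not None]
conn.executemany("INSERT INTO ods_user_log VALUES (?, ?, ?, ?)", rows)

loaded = conn.execute(
    "SELECT COUNT(*) FROM ods_user_log WHERE dt = ?", (snapshot_date,)
).fetchone()[0]
print(loaded)
```

      Apart from turning the lines into columns, nothing is changed at this stage; the heavier cleaning is left to the layers above.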

2. CDM layer, common dimensional model layer

      This layer stores detailed fact data, dimension data and summaries of common metrics. It unifies metric definitions, keeps data consistent and reduces repeated computation. The CDM layer is divided into the DWD layer and the DWS layer.

1. DWD layer, detail data layer

      The DWD layer cleans and standardizes business data, for example by removing fraudulent (cheating) records and naming fields consistently to avoid ambiguity. In addition, dimensions can be degenerated into the fact tables, that is, dimension attributes are stored directly on the fact rows. This reduces the joins between fact tables and dimension tables and makes the detail tables easier to use.
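
      A rough sketch of what a DWD job might do, again with sqlite3 standing in for Hive: drop the records flagged as cheating, rename terse fields to unambiguous names, and copy a dimension attribute (the user's city) onto each fact row. The tables ods_order, dim_user and dwd_order_detail, and the is_cheat flag, are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical ODS and dimension tables.
    CREATE TABLE ods_order (id INTEGER, uid INTEGER, amt REAL, is_cheat INTEGER);
    CREATE TABLE dim_user  (user_id INTEGER, city TEXT);
    INSERT INTO ods_order VALUES (1, 10, 20.0, 0), (2, 10, 999.0, 1), (3, 11, 50.0, 0);
    INSERT INTO dim_user  VALUES (10, 'Beijing'), (11, 'Shanghai');

    CREATE TABLE dwd_order_detail AS
    SELECT o.id   AS order_id,      -- standardized, unambiguous field names
           o.uid  AS user_id,
           u.city AS user_city,     -- dimension attribute degenerated into the fact table
           o.amt  AS order_amount
    FROM ods_order o
    LEFT JOIN dim_user u ON o.uid = u.user_id
    WHERE o.is_cheat = 0;           -- remove records flagged as cheating
""")

dwd_rows = conn.execute(
    "SELECT order_id, user_id, user_city, order_amount "
    "FROM dwd_order_detail ORDER BY order_id"
).fetchall()
print(dwd_rows)
```

      Queries on dwd_order_detail can now filter or group by user_city without joining dim_user again.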

2. DWS layer, summary data layer

      The DWS layer goes further in degenerating dimensions into the metric tables and relies more on wide tables to build a common metric layer. This improves the reusability of common metrics and reduces repeated processing.
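
      For illustration, a DWS table could be a per-user, per-day summary built from the detail layer, with the dimension attribute kept on the same wide row. The names dwd_order_detail and dws_user_order_1d and their columns are hypothetical, and sqlite3 is used only so the sketch is runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical detail table as it might come out of the DWD layer.
    CREATE TABLE dwd_order_detail (
        order_id INTEGER, user_id INTEGER, user_city TEXT,
        order_amount REAL, dt TEXT);
    INSERT INTO dwd_order_detail VALUES
        (1, 10, 'Beijing',  20.0, '2020-03-26'),
        (2, 10, 'Beijing',  30.0, '2020-03-26'),
        (3, 11, 'Shanghai', 50.0, '2020-03-26');

    -- One wide summary row per user per day, holding the common metrics.
    CREATE TABLE dws_user_order_1d AS
    SELECT dt, user_id, user_city,
           COUNT(*)          AS order_count,
           SUM(order_amount) AS total_amount
    FROM dwd_order_detail
    GROUP BY dt, user_id, user_city;
""")

summary = conn.execute(
    "SELECT user_id, user_city, order_count, total_amount "
    "FROM dws_user_order_1d ORDER BY user_id"
).fetchall()
print(summary)
```

      Any report that needs order counts or amounts per user can read this table, so the aggregation logic is defined once.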

3. ADS layer, application data layer

      The ADS layer stores application-specific metrics, that is, metrics that are not shared and are more complex (index, ratio and ranking metrics). It also assembles data for applications, for example wide-table data marts, horizontal-to-vertical table conversions and trend-indicator strings. Because some ADS metrics are specific to one application, try not to expose them as services to other consumers.
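
      As a small sketch of a ratio metric and a ranking metric, the function below takes summary rows and produces a report for a single application. The input rows and the build_ads_city_report helper are made up for this example; a real ADS job would read from the DWS layer.

```python
# Hypothetical rows as they might come from a DWS summary table: (city, order_amount)
dws_city_amount = [("Shanghai", 30.0), ("Beijing", 50.0), ("Shenzhen", 20.0)]

def build_ads_city_report(rows):
    """Rank cities by order amount and compute each city's share of the total."""
    total = sum(amount for _, amount in rows)
    if total == 0:
        return []  # no meaningful share can be computed
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return [
        {"city": city, "rank": position, "share": round(amount / total, 4)}
        for position, (city, amount) in enumerate(ranked, start=1)
    ]

report = build_ads_city_report(dws_city_amount)
for line in report:
    print(line)
```

      The rank and share only make sense for this particular report, which is why metrics like these stay in the ADS layer instead of the common layer.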

      Some of this may be wrong; I will revise it later.

Origin blog.csdn.net/weixin_42845682/article/details/105134997