Layering of BI Data Warehouse

Why layer your data warehouse:
a) Trade space for time: extensive preprocessing improves the user experience (efficiency) of the application system, so the data warehouse will hold a large amount of redundant data;
 
b) Without layering, a change to the business rules of a source business system affects the entire data cleaning process, and the resulting rework is enormous;
 
c) Layered data management simplifies the data cleaning process. Dividing what was a one-step job into multiple steps splits a complex job into several simple ones and turns a large black box into a white box. The processing logic of each layer is relatively simple and easy to understand, which makes it easier to ensure the correctness of each step. When data errors occur, we often only need to adjust one step locally.
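
Point c) can be illustrated with a minimal Python sketch. The row fields (`order_id`, `region`, `amount`) and the two cleaning rules are hypothetical and chosen only for illustration; the idea is that each step is small enough to check on its own, and every intermediate result stays visible.

```python
def drop_missing_amount(rows):
    """Step 1 (hypothetical rule): discard rows that have no amount."""
    return [r for r in rows if r.get("amount") is not None]


def normalize_region(rows):
    """Step 2 (hypothetical rule): trim whitespace and upper-case the region."""
    return [{**r, "region": r["region"].strip().upper()} for r in rows]


def run_steps(rows, steps):
    """Apply each step in order and keep every intermediate result."""
    snapshots = [rows]
    for step in steps:
        snapshots.append(step(snapshots[-1]))
    return snapshots


raw = [
    {"order_id": 1, "region": " east ", "amount": 10.0},
    {"order_id": 2, "region": "West", "amount": None},
]
snapshots = run_steps(raw, [drop_missing_amount, normalize_region])
for i, rows in enumerate(snapshots):
    print(f"after step {i}: {rows}")
```

If a rule turns out to be wrong, only the corresponding function needs to change, and the snapshot before and after that step shows where the data went wrong.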
 
 
A standard data warehouse can be divided into four layers: ODS (temporary storage layer), PDW (data warehouse layer), MID (data mart layer), and APP (application layer).
 
 
ODS layer:
The temporary storage layer is a staging area for interface data, held in preparation for the next step of data processing. Generally speaking, ODS data has the same structure as the source system data, and the main purpose is to simplify subsequent data processing. In terms of granularity, the ODS layer holds the finest-grained data. ODS tables usually fall into two categories: one stores the data that currently needs to be loaded, and the other stores historical data that has already been processed. Historical data is generally kept for 3-6 months and then cleared to save space. Different projects should be treated differently, however: if the source system's data volume is small, the data can be retained longer, or even in full.
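
The two ODS table categories and the retention rule might be sketched as follows. This is an illustration under assumptions, not a prescribed implementation: tables are modeled as lists of dicts, the function name `archive_and_purge` and the `load_date` field are hypothetical, and 180 days stands in for the "3-6 months" mentioned above.

```python
from datetime import date, timedelta


def archive_and_purge(current, history, load_date, retention_days=180):
    """Move the processed batch into history, then drop expired history rows.

    Returns (new_current, new_history); the current table is emptied so it
    is ready for the next load.
    """
    cutoff = load_date - timedelta(days=retention_days)
    stamped = [{**row, "load_date": load_date} for row in current]
    kept = [row for row in history + stamped if row["load_date"] >= cutoff]
    return [], kept


history = [
    {"order_id": 1, "load_date": date(2023, 1, 1)},   # older than the window
    {"order_id": 2, "load_date": date(2023, 11, 1)},  # still inside the window
]
current = [{"order_id": 3}]
current, history = archive_and_purge(current, history, load_date=date(2024, 1, 1))
print([row["order_id"] for row in history])
```

For a source system with little data, `retention_days` could simply be set very large so that nothing is purged.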
 
 
PDW layer:
This is the data warehouse layer. Data in the PDW layer should be consistent, accurate, and clean; that is, it is the source system data after cleaning (removing impurities). Data in this layer generally follows the third normal form, and its granularity is usually the same as that of the ODS layer. The PDW layer keeps all historical data in the BI system, for example 10 years of data.
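
As a rough sketch of what "clean and normalized" could mean in practice, the following Python example takes flat ODS rows, removes duplicates by business key, trims and types the values, and splits customer attributes from order facts. The field names and the "last duplicate wins" rule are assumptions made for the example.

```python
def normalize_orders(ods_rows):
    """Split flat ODS rows into a customers table and an orders table.

    Keying each dict by its business key removes duplicates
    (the last occurrence wins, an assumption for this sketch).
    """
    customers = {}
    orders = {}
    for row in ods_rows:
        customer_id = row["customer_id"]
        customers[customer_id] = {
            "customer_id": customer_id,
            "customer_name": row["customer_name"].strip(),
        }
        orders[row["order_id"]] = {
            "order_id": row["order_id"],
            "customer_id": customer_id,  # reference only; the name lives in customers
            "amount": float(row["amount"]),
        }
    return list(customers.values()), list(orders.values())


ods_rows = [
    {"order_id": 1, "customer_id": "C1", "customer_name": " Alice ", "amount": "10.5"},
    {"order_id": 1, "customer_id": "C1", "customer_name": "Alice", "amount": "10.5"},
    {"order_id": 2, "customer_id": "C2", "customer_name": "Bob", "amount": "7"},
]
customers, orders = normalize_orders(ods_rows)
print(customers)
print(orders)
```

The granularity is unchanged (one row per order), which matches the statement that PDW granularity is usually the same as in the ODS layer.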
 
MID layer:
This is the data mart layer. Data in this layer is organized by subject, usually in a star or snowflake schema. In terms of granularity, it is lightly aggregated, and detailed data no longer exists. In terms of time span, it usually holds only part of the PDW layer's history, because its main purpose is to meet users' analysis needs, and for analysis users usually only need recent data (such as the past three years). In terms of breadth, it still covers all business data.
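
A minimal sketch of this light aggregation, assuming PDW order rows with hypothetical `order_date`, `region`, and `amount` fields: detail rows are summed per month and region, and only recent calendar years are kept. Using whole calendar years for the time window is a simplification chosen for the example.

```python
from collections import defaultdict
from datetime import date


def build_mart(pdw_orders, as_of, years=3):
    """Aggregate order amounts by (month, region) for the most recent years."""
    min_year = as_of.year - years + 1  # e.g. 2022 for as_of in 2024, years=3
    totals = defaultdict(float)
    for row in pdw_orders:
        if row["order_date"].year < min_year:
            continue  # older history stays in the PDW layer only
        key = (row["order_date"].strftime("%Y-%m"), row["region"])
        totals[key] += row["amount"]
    return dict(totals)


pdw_orders = [
    {"order_date": date(2024, 3, 5), "region": "EAST", "amount": 10.0},
    {"order_date": date(2024, 3, 20), "region": "EAST", "amount": 5.0},
    {"order_date": date(2024, 4, 1), "region": "WEST", "amount": 7.0},
    {"order_date": date(2019, 1, 1), "region": "EAST", "amount": 100.0},
]
mart = build_mart(pdw_orders, as_of=date(2024, 6, 30))
print(mart)
```

All regions are still present, so the breadth of the data is unchanged; only the level of detail and the time span are reduced.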
 
 
APP layer:
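This is the application layer. Data in this layer is built entirely to satisfy specific analysis requirements and is also organized in a star or snowflake schema. In terms of granularity, it is highly aggregated. In terms of breadth, it does not necessarily cover all business data; it is a proper subset of the MID layer data and, in a sense, a duplicate of it. In the extreme case, a model can be built in the APP layer for each individual report, trading space for time.

The standard layering of a data warehouse is only a recommendation. In practice, the layering should be decided according to the actual situation, and different types of data may use different layering approaches.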
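
To make the APP layer concrete, here is a small sketch of a per-report model built from MID-style data. It assumes the mart is a dict keyed by (month, region), a hypothetical shape chosen for the example, and produces yearly totals for a single region, that is, a highly aggregated subset of the mart.

```python
def build_region_report(mart, region):
    """Roll monthly mart totals up to yearly totals for one region."""
    report = {}
    for (month, mart_region), amount in mart.items():
        if mart_region != region:
            continue  # the report covers only a subset of the mart
        year = month[:4]  # "2024-03" -> "2024"
        report[year] = report.get(year, 0.0) + amount
    return report


mart = {
    ("2023-12", "EAST"): 20.0,
    ("2024-03", "EAST"): 15.0,
    ("2024-04", "WEST"): 7.0,
    ("2024-05", "EAST"): 5.0,
}
east_report = build_region_report(mart, "EAST")
print(east_report)
```

Storing such a result per report duplicates information already in the mart, which is the space-for-time trade described above.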
 
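--- [Supplement]
Data cache layer:
This database layer stores the raw data provided by the interface side. Its table structures stay essentially the same as the source data. How long data is kept depends on the data volume and the project; if the volume is large, only recent data needs to be kept and the historical data can be backed up. The purpose of this layer is data transfer and backup.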
 
 
 
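Core data layer:
Data in this layer is integrated to some degree on top of the data cache layer and is referred to as a data mart; it is still stored in a relational model. The purpose of this layer is to perform the necessary data integration in preparation for the multidimensional model in the next step.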
 
 
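Analysis application layer:
Data in this layer is multidimensional model data built according to business analysis needs. It can be used directly for analysis and presentation.
Note: The division into layers can be trimmed to fit the needs of the actual project. If the business is relatively simple and self-contained, the core data layer and the analysis application layer can be merged. In addition, analysis applications can draw their data from the multidimensional model, from the relational model, or even from the raw data.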
