Data Warehouse: Several Approaches to Implementing Slowly Changing Dimensions

Dimension table design for slowly changing dimensions in a data warehouse

 

Slowly changing dimensions:

  Dimension data changes over time, but relatively slowly. Dimensions of this kind are usually called slowly changing dimensions. Because a data warehouse needs to track historical changes, especially for important data, some mechanism is also needed to preserve historical states.

 

 

Implementations roughly fall into the following approaches:

 

1) Full snapshot:

Save a full snapshot of the current data every day. This approach suits dimensions with a small data volume and preserves historical state in a simple way.
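A minimal sketch of the daily full snapshot, using Python's built-in sqlite3 module as a stand-in for the warehouse engine. The table names dim_employee and dim_employee_snapshot and the helper functions are hypothetical; in Hive the snapshot_date column would more typically be a partition.

```python
import sqlite3

def take_snapshot(conn, snapshot_date):
    """Copy the whole current dimension into the snapshot table under one date."""
    # Deleting first makes a re-run of the same day idempotent.
    conn.execute(
        "DELETE FROM dim_employee_snapshot WHERE snapshot_date = ?",
        (snapshot_date,),
    )
    conn.execute(
        "INSERT INTO dim_employee_snapshot (snapshot_date, id, name, dept) "
        "SELECT ?, id, name, dept FROM dim_employee",
        (snapshot_date,),
    )

def dept_on(conn, emp_id, snapshot_date):
    """Return the department recorded in the snapshot of the given day, or None."""
    row = conn.execute(
        "SELECT dept FROM dim_employee_snapshot "
        "WHERE snapshot_date = ? AND id = ?",
        (snapshot_date, emp_id),
    ).fetchone()
    return row[0] if row else None

def demo_snapshot():
    """Build a tiny in-memory example: one employee, two daily snapshots."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
    )
    conn.execute(
        "CREATE TABLE dim_employee_snapshot ("
        "snapshot_date TEXT, id INTEGER, name TEXT, dept TEXT, "
        "PRIMARY KEY (snapshot_date, id))"
    )
    conn.execute("INSERT INTO dim_employee VALUES (1, 'jiangtai', 'Dep1')")
    take_snapshot(conn, "20180509")
    # The source changes, and the next day's snapshot captures the new state.
    conn.execute("UPDATE dim_employee SET dept = 'Dep2' WHERE id = 1")
    take_snapshot(conn, "20180510")
    return conn
```

Reading history is then a plain equality filter on the snapshot date, at the cost of storing every row once per day.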

 

2) Extra columns to hold historical state

Add one or more extra columns that retain one or more previous values.

Id  name      dept  Last_dept
1   jiangtai  Dep1  Dep3
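The extra-column approach can be sketched as a single UPDATE that moves the current value into the history column before overwriting it. This is an illustration with Python's sqlite3 module; the table name dim_employee and the helper names are assumptions, not part of the original design.

```python
import sqlite3

def create_dim(conn):
    """Create a dimension table with one extra column for the previous value."""
    conn.execute(
        "CREATE TABLE dim_employee ("
        "id INTEGER PRIMARY KEY, name TEXT, dept TEXT, last_dept TEXT)"
    )

def change_dept(conn, emp_id, new_dept):
    """Overwrite dept and keep the value it had before in last_dept."""
    # The right-hand sides of SET are evaluated against the row's old values,
    # so last_dept receives the department from before this update.
    # The dept <> ? guard skips updates that would not change anything.
    conn.execute(
        "UPDATE dim_employee SET last_dept = dept, dept = ? "
        "WHERE id = ? AND dept <> ?",
        (new_dept, emp_id, new_dept),
    )

def current_and_last(conn, emp_id):
    """Return (dept, last_dept) for one employee."""
    return conn.execute(
        "SELECT dept, last_dept FROM dim_employee WHERE id = ?", (emp_id,)
    ).fetchone()
```

With a single history column only the most recent previous value survives: moving from Dep3 to Dep1 and then to Dep2 leaves Dep1 in last_dept, and Dep3 is lost.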

 

 

3) Zipper table technique

       When dimension data changes, the old record is marked as expired, and the changed data is inserted into the dimension table as a new record that takes effect from then on. This records the history of data changes at a chosen granularity.

 

This is combined with the surrogate key mentioned earlier: Uid_org is the original business primary key, and Uid_agency is the surrogate key.

Uid_agency  Uid_org  name      dept  Start_date  End_date
1           1        jiangtai  Dep1  20180501    20180509
2           1        jiangtai  Dep2  20180510    20991231
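The maintenance step behind the table above (close the old row, insert the new one) might look like the following sketch with Python's sqlite3 module. The helper names are hypothetical, the sentinel 20991231 is taken from the example table, and the sketch assumes day granularity with at most one change per business key per day.

```python
import sqlite3
from datetime import datetime, timedelta

OPEN_END = "20991231"  # sentinel end date marking the currently valid row

def day_before(yyyymmdd):
    """Return the previous calendar day in the same yyyymmdd format."""
    d = datetime.strptime(yyyymmdd, "%Y%m%d") - timedelta(days=1)
    return d.strftime("%Y%m%d")

def create_zipper(conn):
    """Create the zipper table; uid_agency is an auto-generated surrogate key."""
    conn.execute(
        "CREATE TABLE lalian_table ("
        "uid_agency INTEGER PRIMARY KEY AUTOINCREMENT, uid_org INTEGER, "
        "name TEXT, dept TEXT, start_date TEXT, end_date TEXT)"
    )

def apply_change(conn, uid_org, name, dept, bizdate):
    """Apply one source record; return the surrogate key valid from bizdate."""
    current = conn.execute(
        "SELECT uid_agency, name, dept FROM lalian_table "
        "WHERE uid_org = ? AND end_date = ?",
        (uid_org, OPEN_END),
    ).fetchone()
    if current is not None:
        if (current[1], current[2]) == (name, dept):
            return current[0]  # nothing changed, keep the open row
        # Expire the old version the day before the new one starts.
        conn.execute(
            "UPDATE lalian_table SET end_date = ? WHERE uid_agency = ?",
            (day_before(bizdate), current[0]),
        )
    cur = conn.execute(
        "INSERT INTO lalian_table (uid_org, name, dept, start_date, end_date) "
        "VALUES (?, ?, ?, ?, ?)",
        (uid_org, name, dept, bizdate, OPEN_END),
    )
    return cur.lastrowid
```

Calling apply_change for Dep1 on 20180501 and then for Dep2 on 20180510 reproduces the two rows shown in the table.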

 

Question 1: How do you get the state of the data on a given day from the zipper table?

SELECT *
FROM lalian_table
WHERE start_date <= '${bizdate}' AND end_date >= '${bizdate}'
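To make the behavior of this filter concrete, here is a small runnable sketch with Python's sqlite3 module, loaded with the two example rows. The function names are hypothetical, and the bizdate placeholder is replaced by a bound parameter. It relies on dates being stored as fixed-width yyyymmdd strings, which compare correctly as text.

```python
import sqlite3

# The two versions of the same business key from the example table.
ROWS = [
    (1, 1, "jiangtai", "Dep1", "20180501", "20180509"),
    (2, 1, "jiangtai", "Dep2", "20180510", "20991231"),
]

def build_table():
    """Create an in-memory zipper table holding the example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lalian_table ("
        "uid_agency INTEGER, uid_org INTEGER, name TEXT, dept TEXT, "
        "start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO lalian_table VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn

def state_on(conn, bizdate):
    """Return (uid_org, dept) for every row that was valid on bizdate."""
    # Both bounds are inclusive, so validity intervals must not overlap.
    return conn.execute(
        "SELECT uid_org, dept FROM lalian_table "
        "WHERE start_date <= ? AND end_date >= ? ORDER BY uid_org",
        (bizdate, bizdate),
    ).fetchall()
```

Because each version ends the day before the next one starts, any date matches at most one row per business key; a date before the first start_date matches none.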

 

Ext: Slowly changing dimensions

       Surrogate keys are highly recommended in dimensional modeling. They insulate the data warehouse structure from instability caused by changes in the source systems, and they can also improve data retrieval performance.

       However, as you can see, surrogate keys are expensive to maintain, especially during data loading, where they have a considerable impact on the fact table. The impact is even greater when building a Hive-based data warehouse: surrogate keys must be generated, the keys in the fact table must be mapped to the correct surrogate key, and non-equi joins are not supported, all of which make the ETL process more complicated.
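As an illustration of the fact-table cost, the sketch below resolves the surrogate key for each fact row against the zipper table. It is written with Python's sqlite3 module; fact_orders and its columns are hypothetical. One common workaround when the engine only accepts equality conditions in ON is assumed here: join on the business key alone and move the date-range test into WHERE.

```python
import sqlite3

def build_demo():
    """Create the example zipper table plus a hypothetical fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lalian_table ("
        "uid_agency INTEGER, uid_org INTEGER, dept TEXT, "
        "start_date TEXT, end_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO lalian_table VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "Dep1", "20180501", "20180509"),
            (2, 1, "Dep2", "20180510", "20991231"),
        ],
    )
    # fact_orders is an invented fact table carrying only the business key.
    conn.execute(
        "CREATE TABLE fact_orders (order_id INTEGER, uid_org INTEGER, order_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO fact_orders VALUES (?, ?, ?)",
        [(100, 1, "20180505"), (101, 1, "20180510")],
    )
    return conn

def resolve_surrogate_keys(conn):
    """Return (order_id, uid_agency) pairs for the version valid on order_date."""
    # ON keeps only the equality on the business key; the range condition
    # lives in WHERE. For an inner join the result is the same as a range join.
    return conn.execute(
        "SELECT f.order_id, d.uid_agency "
        "FROM fact_orders f JOIN lalian_table d ON f.uid_org = d.uid_org "
        "WHERE f.order_date >= d.start_date AND f.order_date <= d.end_date "
        "ORDER BY f.order_id"
    ).fetchall()
```

The equi-join first pairs each fact row with every historical version of its dimension row before the filter discards the wrong ones, which is part of why this lookup gets expensive on large dimensions. Fact rows with no matching version are dropped by the inner join and would need separate handling.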

       Therefore, in a big data system, use surrogate keys with caution. For slowly changing dimensions, you can also consider trading space for time by keeping a full snapshot of the dimension table every day. This adds storage cost, so weigh it against your actual situation.


Origin blog.csdn.net/u010003835/article/details/104420843