Data warehouse data model: extreme storage with the historical zipper table

When designing a data warehouse data model, you often run into requirements like these:


1. The data volume is relatively large;
2. Some fields in the table get updated, such as a user's address, a product description, or an order's status;
3. You need to view historical snapshots at a point or over a period in time, for example, the status of an order at some moment in the past, or how many times a user's record was updated during some past period;
4. Changes are neither very frequent nor a large share of the data; for example, out of 10 million members in total, about 100,000 are added or changed each day;
5. If you keep a full copy of the table every day, each copy stores a lot of unchanged information, which wastes a great deal of storage.
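A rough back-of-the-envelope comparison, using the figures from point 4, shows why point 5 matters. This Python sketch assumes, for simplicity, that the table stays at 10 million rows and that each of the 100,000 daily additions or changes adds exactly one row to a zipper table; the function names are hypothetical.

```python
BASE_ROWS = 10_000_000    # members in the table (figure from the text)
DAILY_CHANGES = 100_000   # rows added or changed per day (figure from the text)


def full_copy_rows(days, base=BASE_ROWS):
    # One full copy per day; ignores table growth, so this is if anything low
    return base * days


def zipper_rows(days, base=BASE_ROWS, daily_changes=DAILY_CHANGES):
    # Day 1 loads every row once; each later day appends one row per change
    return base + daily_changes * (days - 1)


for days in (30, 365):
    print(days, full_copy_rows(days), zipper_rows(days))
```

Under these assumptions, a month of daily full copies stores 300 million rows, while the zipper table holds about 12.9 million.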


A historical zipper table can both preserve the historical states of the data and save as much storage as possible.

For example, suppose an order table holds 3 records on June 20:

As of June 21, the table holds 5 records:

As of June 22, the table holds 6 records:

The straightforward ways to keep this table in the data warehouse are:

 

1. Keep only one full copy, whose data matches the records as of June 22. If you then need the status of order 001 on June 21, that request cannot be met;

2. Keep a full copy every day. The data warehouse table then holds 14 records (3 + 5 + 6), but many of them, such as orders 002 and 004, are stored again and again without any change. With a large data volume, this wastes a great deal of storage;
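The arithmetic behind option 2 can be checked with a small Python sketch. The order statuses below ('created', 'paid', 'shipped') are hypothetical, chosen only to agree with the text: three orders on June 20, five on June 21, six on June 22, order 001 changing between June 21 and June 22, and orders 002 and 004 never changing. The function, whose name is also hypothetical, counts the stored rows and flags every row that merely repeats the previous day's copy.

```python
# Hypothetical daily full copies of the order table: {order_id: status}
DAILY_COPIES = {
    "2012-06-20": {"001": "created", "002": "created", "003": "paid"},
    "2012-06-21": {"001": "paid", "002": "created", "003": "paid",
                   "004": "created", "005": "created"},
    "2012-06-22": {"001": "shipped", "002": "created", "003": "shipped",
                   "004": "created", "005": "paid", "006": "created"},
}


def count_full_copies(copies):
    """Return (total rows stored, rows identical to the previous day's copy)."""
    total = sum(len(orders) for orders in copies.values())
    days = sorted(copies)  # ISO dates sort chronologically as strings
    repeats = []
    for prev_day, day in zip(days, days[1:]):
        for order_id, status in copies[day].items():
            if copies[prev_day].get(order_id) == status:
                repeats.append((day, order_id))
    return total, repeats


total, repeats = count_full_copies(DAILY_COPIES)
print(total, repeats)
```

With this made-up data, 4 of the 14 stored rows are unchanged repeats of the day before.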

 

 

If the data warehouse stores this table as a historical zipper table instead, the table looks like this:

Notes:

 

1. dw_begin_date is the start date of a record's life cycle, and dw_end_date is its end date (inclusive: the last day on which that version of the record was valid);

2. dw_end_date = '9999-12-31' means the record is currently valid;

3. To query all currently valid records: select * from order_his where dw_end_date = '9999-12-31'

 

4. To query the historical snapshot as of 2012-06-21: select * from order_his where dw_begin_date <= '2012-06-21' and dw_end_date >= '2012-06-21'. This statement returns the following records:

These are exactly the same as the source table's records on June 21:
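To try queries 3 and 4 end to end, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for the warehouse engine. The order_his rows, the order_id and status columns, and the status values are hypothetical, made up to be consistent with the text; dates are stored as ISO 'YYYY-MM-DD' text so that string comparison orders them correctly.

```python
import sqlite3

# Hypothetical zipper rows: (order_id, status, dw_begin_date, dw_end_date)
ORDER_HIS = [
    ("001", "created", "2012-06-20", "2012-06-20"),
    ("001", "paid",    "2012-06-21", "2012-06-21"),
    ("001", "shipped", "2012-06-22", "9999-12-31"),
    ("002", "created", "2012-06-20", "9999-12-31"),
    ("003", "paid",    "2012-06-20", "2012-06-21"),
    ("003", "shipped", "2012-06-22", "9999-12-31"),
    ("004", "created", "2012-06-21", "9999-12-31"),
    ("005", "created", "2012-06-21", "2012-06-21"),
    ("005", "paid",    "2012-06-22", "9999-12-31"),
    ("006", "created", "2012-06-22", "9999-12-31"),
]


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table order_his (order_id text, status text, "
        "dw_begin_date text, dw_end_date text)"
    )
    conn.executemany("insert into order_his values (?, ?, ?, ?)", ORDER_HIS)
    return conn


def current_records(conn):
    # Query 3: versions that have not been closed yet
    return conn.execute(
        "select order_id, status from order_his "
        "where dw_end_date = '9999-12-31' order by order_id"
    ).fetchall()


def snapshot(conn, day):
    # Query 4: versions whose [dw_begin_date, dw_end_date] range contains `day`
    return conn.execute(
        "select order_id, status from order_his "
        "where dw_begin_date <= ? and dw_end_date >= ? order by order_id",
        (day, day),
    ).fetchall()


conn = open_db()
print(current_records(conn))
print(snapshot(conn, "2012-06-21"))
```

The 2012-06-21 snapshot returns five rows (001 paid, 002 created, 003 paid, 004 created, 005 created), which is what a full copy taken that day would hold under the same made-up data.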

As you can see, a historical zipper table like this meets the need for historical data while saving a great deal of storage.
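The article does not show how the zipper table is maintained. One common approach, sketched here in Python as an assumption rather than the author's method, compares each day's full copy with the currently open rows: an unchanged record is skipped, a changed record has its open row closed on the previous day and a new version opened with dw_end_date = '9999-12-31', and a new record simply gets a new open row. The snapshot data, status values, and function names are hypothetical, and deletions from the source table are not handled.

```python
from datetime import date, timedelta

OPEN_END = "9999-12-31"  # marks a currently valid record, as in the article

# Hypothetical daily full copies of the source order table: {order_id: status}
SNAPSHOTS = [
    ("2012-06-20", {"001": "created", "002": "created", "003": "paid"}),
    ("2012-06-21", {"001": "paid", "002": "created", "003": "paid",
                    "004": "created", "005": "created"}),
    ("2012-06-22", {"001": "shipped", "002": "created", "003": "shipped",
                    "004": "created", "005": "paid", "006": "created"}),
]


def previous_day(iso_day):
    return (date.fromisoformat(iso_day) - timedelta(days=1)).isoformat()


def build_zipper(snapshots):
    """Fold daily full copies, in date order, into zipper rows.

    Assumes orders are never deleted from the source table."""
    rows = []      # [order_id, status, dw_begin_date, dw_end_date]
    open_row = {}  # order_id -> index of its currently valid row
    for day, orders in snapshots:
        for order_id in sorted(orders):
            status = orders[order_id]
            i = open_row.get(order_id)
            if i is not None and rows[i][1] == status:
                continue  # unchanged: the open row already covers this day
            if i is not None:
                rows[i][3] = previous_day(day)  # close the old version
            open_row[order_id] = len(rows)
            rows.append([order_id, status, day, OPEN_END])
    return [tuple(r) for r in rows]


zipper = build_zipper(SNAPSHOTS)
full_copy_rows = sum(len(orders) for _, orders in SNAPSHOTS)
print(len(zipper), "zipper rows vs", full_copy_rows, "rows in daily full copies")
```

On this made-up data the zipper table needs 10 rows instead of 14, and filtering it with the article's snapshot condition for 2012-06-21 returns exactly that day's full copy.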

 

