Description of the zipper table

Original address: https://blog.csdn.net/xiepeifeng/article/details/42431027

When designing the data model of a data warehouse, the following situation comes up often:


1. The amount of data is relatively large;
2. Some fields in the table will be updated, such as the user's address, product description information, order status, etc.;
3. You need to view historical snapshots at a given point in time or over a period, for example the status of a certain order at some point in the past, or how many times a certain user's record was updated during a past period;
4. The proportion and frequency of changes are small. For example, out of 10 million members, about 100,000 are added or changed each day;
5. If you keep a full copy of this table every day, every copy stores a lot of unchanged information, which is a great waste of storage.

A zipper history table can both reflect the historical state of the data and save storage to the greatest extent.

As a simple example, suppose there is an order table with 3 records on June 20:

As of June 21, there are 5 records in the table:

As of June 22, there are 6 records in the table:

There are two obvious ways to retain this table in the data warehouse:

1. Keep only the latest full copy. The data is then identical to the records on June 22, so a request for the status of order 001 on June 21 cannot be satisfied;

2. Keep a full copy every day. The table in the data warehouse then holds 14 records (3 + 5 + 6), but many of them, such as orders 002 and 004, are stored repeatedly without any change. When the data volume is large, this wastes a lot of storage;
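
To make the cost of keeping a full copy every day concrete, here is a minimal sketch in Python using the standard sqlite3 module. The order_daily table, its columns, and the status values are assumptions made up for illustration; only the daily row counts (3, 5, 6) come from the example above. The sketch counts the total rows stored and how many of them merely repeat the previous day's row for the same order.

```python
import sqlite3

# Hypothetical daily snapshots of the order table as (order_id, status).
# Only the row counts per day (3, 5, 6) come from the article; values are made up.
snapshots = {
    "2012-06-20": [("001", "created"), ("002", "created"), ("003", "paid")],
    "2012-06-21": [("001", "paid"), ("002", "created"), ("003", "completed"),
                   ("004", "created"), ("005", "created")],
    "2012-06-22": [("001", "completed"), ("002", "created"), ("003", "completed"),
                   ("004", "created"), ("005", "paid"), ("006", "created")],
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_daily (snapshot_date TEXT, order_id TEXT, status TEXT)"
)
for day, rows in snapshots.items():
    conn.executemany(
        "INSERT INTO order_daily VALUES (?, ?, ?)",
        [(day, order_id, status) for order_id, status in rows],
    )

# Total rows stored when every day keeps a full copy.
total_rows = conn.execute("SELECT COUNT(*) FROM order_daily").fetchone()[0]

# Rows that only repeat the previous day's row for the same order.
redundant_rows = conn.execute("""
    SELECT COUNT(*)
    FROM order_daily AS cur
    JOIN order_daily AS prev
      ON prev.order_id = cur.order_id
     AND prev.snapshot_date = date(cur.snapshot_date, '-1 day')
     AND prev.status = cur.status
""").fetchone()[0]

print("rows stored:", total_rows)
print("rows repeating the previous day:", redundant_rows)
```

With this made-up data, 4 of the 14 stored rows carry no new information. On a table with millions of mostly unchanged rows, the share of repeated rows would be far higher.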

 

 

If the table is instead designed as a zipper history table in the data warehouse, it looks like this:

Notes:

1. dw_begin_date is the start of the record's life cycle, and dw_end_date is the end of its life cycle;

2. dw_end_date = '9999-12-31' means that the record is currently valid;

3. To query all currently valid records: select * from order_his where dw_end_date = '9999-12-31';

4. To query the historical snapshot of 2012-06-21: select * from order_his where dw_begin_date <= '2012-06-21' and dw_end_date >= '2012-06-21'. This statement returns the following records:

These are exactly the same as the records of the source table on June 21:

It can be seen that such a zipper history table not only meets the demand for historical data, but also saves a large amount of storage.
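
The idea can be checked end to end with a small sketch in Python and the standard sqlite3 module. Everything except the order_his table name, the dw_begin_date and dw_end_date columns, and the '9999-12-31' marker is an assumption: the order_id and status columns, the sample values, and the build_zipper helper are hypothetical. The sketch also assumes that orders are never deleted and that a closed record ends on the day before the change. It folds three daily snapshots into zipper rows, then runs the two queries described above.

```python
import sqlite3
from datetime import date, timedelta

OPEN_END = "9999-12-31"  # marks a record that is still valid

# Hypothetical daily snapshots as (order_id, status); values are made up.
snapshots = {
    "2012-06-20": [("001", "created"), ("002", "created"), ("003", "paid")],
    "2012-06-21": [("001", "paid"), ("002", "created"), ("003", "completed"),
                   ("004", "created"), ("005", "created")],
    "2012-06-22": [("001", "completed"), ("002", "created"), ("003", "completed"),
                   ("004", "created"), ("005", "paid"), ("006", "created")],
}


def build_zipper(daily):
    """Fold daily snapshots into (order_id, status, dw_begin_date, dw_end_date)."""
    rows = []      # all zipper rows, kept as mutable lists while building
    open_row = {}  # order_id -> the row that is currently open for it
    for day in sorted(daily):
        day_before = (date.fromisoformat(day) - timedelta(days=1)).isoformat()
        for order_id, status in daily[day]:
            current = open_row.get(order_id)
            if current is not None and current[1] == status:
                continue  # unchanged: the open row already covers this day
            if current is not None:
                current[3] = day_before  # close the previous version
            new_row = [order_id, status, day, OPEN_END]
            rows.append(new_row)
            open_row[order_id] = new_row
    return [tuple(row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_his ("
    "order_id TEXT, status TEXT, dw_begin_date TEXT, dw_end_date TEXT)"
)
conn.executemany("INSERT INTO order_his VALUES (?, ?, ?, ?)", build_zipper(snapshots))

zipper_row_count = conn.execute("SELECT COUNT(*) FROM order_his").fetchone()[0]

# All currently valid records.
current_rows = conn.execute(
    "SELECT order_id, status FROM order_his "
    "WHERE dw_end_date = ? ORDER BY order_id",
    (OPEN_END,),
).fetchall()


def snapshot_on(day):
    """Return the (order_id, status) rows that were valid on the given day."""
    return conn.execute(
        "SELECT order_id, status FROM order_his "
        "WHERE dw_begin_date <= ? AND dw_end_date >= ? ORDER BY order_id",
        (day, day),
    ).fetchall()


print("zipper rows:", zipper_row_count)
print("snapshot of 2012-06-21:", snapshot_on("2012-06-21"))
```

With this sample data the zipper table holds 10 rows instead of the 14 needed for daily full copies, and the snapshot query for 2012-06-21 returns the same five records as that day's source table. Comparing dates as text works here only because they are stored in ISO YYYY-MM-DD form.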


Origin blog.csdn.net/qq_32323239/article/details/100568680