What is a zipper table?

1. What is a zipper table

A zipper table is a way of storing table data in a data warehouse design. As the name suggests, the zipper records history: it keeps every change to an entity, from its initial state through to its current state.

2. Where is it used

When designing the data model of a data warehouse, you often run into requirements like these:

The data volume is relatively large;
Some fields in the table are updated over time, such as a user's address, a product description, or an order's status;
You need to view historical snapshots at a point in time or over a period, for example the status of an order at some point in the past, or how many times a user updated their information during some past period;
Changes are neither large in proportion nor very frequent. For example, out of 10 million members in total, about 100,000 are added or changed each day;
If a full copy of this table is kept every day, each copy repeats a large amount of unchanged data, which wastes a great deal of storage.

A zipper history table both preserves the historical states of the data and keeps storage to a minimum.
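To make the storage argument concrete, here is a small Python sketch comparing row counts for the member example above. The 30-day window is an assumption added for illustration, and the full-snapshot estimate ignores growth in the member count, so it understates the cost of daily full copies.

```python
# Figures from the member example above; the 30-day window is an assumption.
MEMBERS = 10_000_000
DAILY_NEW_OR_CHANGED = 100_000
DAYS = 30


def full_snapshot_rows(members: int, days: int) -> int:
    """One complete copy per day (member growth ignored, so this is a lower bound)."""
    return members * days


def zipper_rows(members: int, daily_new_or_changed: int, days: int) -> int:
    """Day 1 loads every member once; on each later day, every new or changed
    member adds one new version row (the old version is closed in place, not copied)."""
    return members + daily_new_or_changed * (days - 1)


full = full_snapshot_rows(MEMBERS, DAYS)
zipper = zipper_rows(MEMBERS, DAILY_NEW_OR_CHANGED, DAYS)
print(f"daily full copies: {full:,} rows")    # 300,000,000
print(f"zipper table:      {zipper:,} rows")  # 12,900,000
print(f"ratio: {full / zipper:.1f}x")         # 23.3x
```

The gap widens with every additional day kept, because full copies grow by the whole table per day while the zipper table grows only by the changed rows.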

3. An example

As a simple example, suppose an order table has 3 records on June 20:

By June 21, the table has 5 records:

By June 22, the table has 6 records:

There are several ways to keep this table in the data warehouse:

If only one full copy is kept, the data matches the June 22 records, so there is no way to check the status of order 001 on June 21;
If a full copy is kept every day, the data warehouse table holds 14 records (3 + 5 + 6), but many of them, such as orders 002 and 004, are stored repeatedly even though nothing changed; with large data volumes this wastes a lot of storage;

If the table is instead designed as a zipper history table in the data warehouse, it looks like this:

Notes:

dw_begin_date is the start of the record's life cycle, and dw_end_date is the end of it;
dw_end_date = '9999-12-31' means the record is currently valid;
To query all currently valid records: select * from order_his where dw_end_date = '9999-12-31'
To query the historical snapshot as of 2012-06-21: select * from order_his where dw_begin_date <= '2012-06-21' and dw_end_date >= '2012-06-21'. This returns the following records:

They are exactly the same as the records in the source table on June 21.
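The two queries on order_his can be tried end to end with Python's built-in sqlite3 module. The order statuses below are hypothetical, since the article's sample tables are not reproduced here; they were chosen only to match the described row counts (3 orders on June 20, 5 on June 21, 6 on June 22, with orders 002 and 004 never changing). Dates are stored as ISO-8601 strings, so plain string comparison orders them correctly, and dw_end_date is treated as the last day (inclusive) on which a version was valid.

```python
import sqlite3

OPEN_END = "9999-12-31"  # marks the currently valid version

# Hypothetical zipper rows: (order_id, status, dw_begin_date, dw_end_date).
ROWS = [
    ("001", "created", "2012-06-20", "2012-06-20"),
    ("001", "paid", "2012-06-21", "2012-06-21"),
    ("001", "completed", "2012-06-22", OPEN_END),
    ("002", "created", "2012-06-20", OPEN_END),
    ("003", "created", "2012-06-20", "2012-06-20"),
    ("003", "paid", "2012-06-21", OPEN_END),
    ("004", "created", "2012-06-21", OPEN_END),
    ("005", "created", "2012-06-21", "2012-06-21"),
    ("005", "paid", "2012-06-22", OPEN_END),
    ("006", "created", "2012-06-22", OPEN_END),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_his ("
    "order_id TEXT, status TEXT, dw_begin_date TEXT, dw_end_date TEXT)"
)
conn.executemany("INSERT INTO order_his VALUES (?, ?, ?, ?)", ROWS)


def current_rows(conn):
    """All currently valid records."""
    return conn.execute(
        "SELECT order_id, status FROM order_his WHERE dw_end_date = ? ORDER BY order_id",
        (OPEN_END,),
    ).fetchall()


def snapshot(conn, day):
    """Historical snapshot: every record whose life cycle covers `day`."""
    return conn.execute(
        "SELECT order_id, status FROM order_his "
        "WHERE dw_begin_date <= ? AND dw_end_date >= ? ORDER BY order_id",
        (day, day),
    ).fetchall()


print(snapshot(conn, "2012-06-21"))
# [('001', 'paid'), ('002', 'created'), ('003', 'paid'), ('004', 'created'), ('005', 'created')]
print(len(current_rows(conn)))  # 6
```

These 10 rows cover the same three days that daily full copies would need 14 rows for.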

As this shows, a zipper history table meets the need for historical data while greatly reducing storage.
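One common way to keep such a table up to date is to apply each day's update table in two steps: close the currently open version of every changed order by setting its dw_end_date to the day before the change, then append a new version with dw_begin_date set to the change date and an open-ended dw_end_date. The plain-Python sketch below illustrates this; ZipperRow, merge_daily_update and snapshot are hypothetical names, the statuses are made up, and the sketch assumes the update table contains only new or changed orders (deleted orders are not handled).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta

OPEN_END = date(9999, 12, 31)  # "currently valid", as in the article


@dataclass
class ZipperRow:
    order_id: str
    status: str
    dw_begin_date: date
    dw_end_date: date


def merge_daily_update(zipper: list[ZipperRow], updates: dict[str, str], day: date) -> list[ZipperRow]:
    """Apply one day's new or changed orders (order_id -> status) to the zipper table.

    dw_end_date is the last day (inclusive) a version was valid, which is what the
    snapshot condition dw_begin_date <= d and dw_end_date >= d expects.
    """
    merged: list[ZipperRow] = []
    for row in zipper:
        if row.dw_end_date == OPEN_END and row.order_id in updates:
            # Close the currently valid version on the day before the change.
            merged.append(ZipperRow(row.order_id, row.status, row.dw_begin_date, day - timedelta(days=1)))
        else:
            merged.append(row)  # history rows and unchanged orders are kept as-is
    for order_id, status in updates.items():
        # Every new or changed order gets a new open-ended version starting today.
        merged.append(ZipperRow(order_id, status, day, OPEN_END))
    return merged


def snapshot(zipper: list[ZipperRow], day: date) -> dict[str, str]:
    """Reconstruct the table as it was on `day`."""
    return {r.order_id: r.status for r in zipper if r.dw_begin_date <= day <= r.dw_end_date}


# Replay three days of hypothetical update tables.
zipper: list[ZipperRow] = []
zipper = merge_daily_update(zipper, {"001": "created", "002": "created", "003": "created"}, date(2012, 6, 20))
zipper = merge_daily_update(zipper, {"001": "paid", "003": "paid", "004": "created", "005": "created"}, date(2012, 6, 21))
zipper = merge_daily_update(zipper, {"001": "completed", "005": "paid", "006": "created"}, date(2012, 6, 22))

print(len(zipper))  # 10
print(snapshot(zipper, date(2012, 6, 21)))
```

In a real warehouse the same close-then-append logic would be expressed in SQL over the zipper table and the daily update table; the in-memory version is only meant to make the date bookkeeping visible.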

4. Supplement

Finally, here is how to obtain the daily user update table. In my experience, the daily user increment can be obtained, directly or indirectly, in the following ways. Because this step is important, I will go through it in detail:

Monitor changes to the MySQL data, for example with Canal, and merge each day's changes to obtain the final state of each record.
If a full snapshot of the data is taken every day, the difference between two consecutive days' snapshots can serve as the daily update table.
Use a daily change log (flow) table.
Use an ETL tool to incrementally extract data from the operational database into the ODS or the data warehouse based on a time field (each day, extract the previous day's data), producing the daily incremental data. This is the most common approach in practice.
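The first two methods above can be sketched in Python as follows. Both functions are hypothetical helpers over in-memory dictionaries keyed by order_id: last_state_from_changes stands in for merging a day's captured change events (it does not use Canal's actual API or message format), and daily_update_from_snapshots takes the difference between two daily snapshots. Neither detects deleted rows.

```python
from __future__ import annotations


def last_state_from_changes(events: list[tuple[str, str, str]]) -> dict[str, str]:
    """Collapse one day's change events (order_id, status, timestamp) into the
    last status per order. ISO-8601 timestamp strings sort chronologically."""
    latest: dict[str, str] = {}
    for order_id, status, _ts in sorted(events, key=lambda e: e[2]):
        latest[order_id] = status  # a later event overwrites an earlier one
    return latest


def daily_update_from_snapshots(prev: dict[str, str], curr: dict[str, str]) -> dict[str, str]:
    """Orders that are new in `curr` or whose status differs from `prev`."""
    return {k: v for k, v in curr.items() if prev.get(k) != v}


# Hypothetical data.
events = [
    ("001", "paid", "2012-06-21T09:00"),
    ("001", "shipped", "2012-06-21T18:00"),
    ("003", "paid", "2012-06-21T10:00"),
]
print(last_state_from_changes(events))  # {'001': 'shipped', '003': 'paid'}

june20 = {"001": "created", "002": "created", "003": "created"}
june21 = {"001": "paid", "002": "created", "003": "paid", "004": "created", "005": "created"}
print(daily_update_from_snapshots(june20, june21))
# {'001': 'paid', '003': 'paid', '004': 'created', '005': 'created'}
```

Either result has the shape of a daily update table: one row per new or changed order, carrying its latest state for the day.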