Three types of SCD (Slowly Changing Dimensions)

5.5.2 SCD1 (Slowly Changing Dimension Type 1)
SCD1 directly overwrites the existing value by updating the dimension record in place. No history is kept. It is generally used to correct erroneous data, that is, cases where the old value was simply wrong and has no further use.

With SCD1, the data in the data warehouse stays consistent with the business data. The business key from the business database, CustomerID, is carried in the Customer dimension and used to track changes in the business data. Once a change occurs, the old value in the warehouse is overwritten with the new one.

The records in the DW pick up the latest City value by matching on the CustomerID from the business database, and are updated directly in the DW.
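The overwrite described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name dim_customer, its column names, the helper scd1_upsert, and the CustomerID value 1 are hypothetical and not taken from the original example.

```python
import sqlite3

def scd1_upsert(conn, customer_id, name, city):
    """Overwrite the dimension row for customer_id in place; no history is kept."""
    cur = conn.execute(
        "UPDATE dim_customer SET name = ?, city = ? WHERE customer_id = ?",
        (name, city, customer_id),
    )
    if cur.rowcount == 0:
        # Business key not seen before: insert it as a new dimension row.
        conn.execute(
            "INSERT INTO dim_customer (customer_id, name, city) VALUES (?, ?, ?)",
            (customer_id, name, city),
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    " customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

scd1_upsert(conn, 1, "BIWORK", "Beijing")  # initial load
scd1_upsert(conn, 1, "BIWORK", "Sanya")    # city changed in the business database

rows = conn.execute(
    "SELECT customer_id, name, city FROM dim_customer"
).fetchall()
print(rows)  # [(1, 'BIWORK', 'Sanya')]
```

After the second call only the Sanya row remains, so any report run against the dimension sees Sanya for all of the customer's history.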

5.5.3 SCD2 (Slowly Changing Dimension Type 2)
SCD2 creates a new **"version" record** for the dimension record whenever the source data changes, so that the dimension's history is preserved. SCD2 does not delete existing rows or overwrite their attribute values; only the validity period of the old version is closed. An SCD2 table is also called a zipper table.
Many requirements in a data warehouse involve summarizing and analyzing historical data. Historical data from the business system is therefore preserved as far as possible, so that the warehouse faithfully captures how the data changed over time.
Continuing the example above, suppose the analysis shows that customer BIWORK's purchase amount was broadly stable in 2012 but has declined since 2013. The cause may be the customer's city: there may be more stores in Beijing than in Sanya.
In this case it is not appropriate to overwrite BIWORK's current city in the data warehouse directly; otherwise all of this customer's purchases would be attributed to Sanya.
Each version is identified by its start time (Valid From), and a Valid To (chain-closure time) of NULL marks the current row. Instead of NULL, a far-future year such as 2999, 3000, or 9999 can be used, as long as the convention is applied consistently across the data warehouse. Each version produces a new row of data.
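A minimal sketch of this versioning, again with Python's built-in sqlite3 module. The table layout, the surrogate key customer_sk, the helpers scd2_apply and city_as_of, and the exact change dates are assumptions made for illustration; the source only states that the change happened around 2013.

```python
import sqlite3

def scd2_apply(conn, customer_id, city, change_date):
    """Close the current version if the city changed, then add a new version row."""
    current = conn.execute(
        "SELECT customer_sk, city FROM dim_customer "
        "WHERE customer_id = ? AND valid_to IS NULL",
        (customer_id,),
    ).fetchone()
    if current is not None:
        if current[1] == city:
            return  # nothing changed, keep the current version open
        # Close the chain on the old version; its attribute values stay untouched.
        conn.execute(
            "UPDATE dim_customer SET valid_to = ? WHERE customer_sk = ?",
            (change_date, current[0]),
        )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to) "
        "VALUES (?, ?, ?, NULL)",
        (customer_id, city, change_date),
    )

def city_as_of(conn, customer_id, date):
    """Return the city that was valid for the customer on the given ISO date."""
    row = conn.execute(
        "SELECT city FROM dim_customer WHERE customer_id = ? "
        "AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)",
        (customer_id, date, date),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    " customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,"
    " customer_id INTEGER, city TEXT, valid_from TEXT, valid_to TEXT)"
)

scd2_apply(conn, 1, "Beijing", "2012-01-01")  # assumed start date
scd2_apply(conn, 1, "Sanya", "2013-01-01")    # assumed change date

history = conn.execute(
    "SELECT city, valid_from, valid_to FROM dim_customer ORDER BY customer_sk"
).fetchall()
print(history)
# [('Beijing', '2012-01-01', '2013-01-01'), ('Sanya', '2013-01-01', None)]
print(city_as_of(conn, 1, "2012-06-30"))  # Beijing
```

Because fact rows can be joined to the version that was valid at the time of the purchase, the 2012 purchases remain attributed to Beijing. If the warehouse uses a far-future date instead of NULL, the two NULL checks would compare against that sentinel value instead.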

5.5.4 SCD3 (Slowly Changing Dimension Type 3)
In practice SCD1 and SCD2 cover most needs, but other schemes exist, such as SCD3, which is intended for cases where only a small amount of history needs to be kept.
For each attribute whose history is tracked, add an extra column, and on each change update only the Current column and the Previous column. Only the two most recent values are kept, and both live in the same row. If many attributes need to be tracked this becomes cumbersome, because each one needs its own Current and Previous columns. SCD3 is therefore less common than SCD1 and SCD2, and it is suitable only when storage space is tight and users accept limited history.
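The Current/Previous column pair can be sketched as follows with Python's built-in sqlite3 module. The table and column names, the helper scd3_update, and the third city (Shanghai) are hypothetical and only serve to show how older values drop out.

```python
import sqlite3

def scd3_update(conn, customer_id, new_city):
    """Shift the current city into previous_city and store the new value."""
    # SQL evaluates the right-hand sides against the row as it was before the
    # update, so previous_city receives the old current_city.
    conn.execute(
        "UPDATE dim_customer "
        "SET previous_city = current_city, current_city = ? "
        "WHERE customer_id = ? AND current_city <> ?",
        (new_city, customer_id, new_city),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " current_city TEXT, previous_city TEXT)"
)
conn.execute("INSERT INTO dim_customer VALUES (1, 'Beijing', NULL)")

scd3_update(conn, 1, "Sanya")
after_first = conn.execute("SELECT * FROM dim_customer").fetchall()
print(after_first)   # [(1, 'Sanya', 'Beijing')]

scd3_update(conn, 1, "Shanghai")  # hypothetical second move
after_second = conn.execute("SELECT * FROM dim_customer").fetchall()
print(after_second)  # [(1, 'Shanghai', 'Sanya')]
```

After the second change Beijing is no longer recorded anywhere, which is the limited history that SCD3 trades for a single row per customer.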


Origin blog.csdn.net/xianyu120/article/details/112168703