SCD Slowly Changing Dimension Zipper Table: SQL Implementation

1 Overview of Slowly Changing Dimensions

Slowly Changing Dimensions (SCD) are an important concept in data warehouse modeling. A data warehouse is built on historical data, and how changes to that history are captured depends on how the dimensions are defined. A slowly changing dimension is a method for tracking and representing changes to a dimension table.

Note: 1 The demonstration code runs on SQL Server and is based on the MERGE syntax; the approach is similar in other databases.

      2 See also: SCD slowly changing dimension implementation in Kettle.

 

There are three common types, denoted Type1, Type2, and Type3. Suppose we have a user dimension table customer (cust_id: user number, name: name, age: age). If a user's age is updated to 40 at some point, the three processing methods behave as follows.

  • Type1 (does not record historical changes to key field values in the dimension; the old value is overwritten):

The original customer table is:

cust_id  name       age
1        Zhang San  30

The new customer table is:

cust_id  name       age
1        Zhang San  40

  • Type2 (keeps the full history by adding a new row for each change):

For the same change as above, assume the update takes place on 2020-10-21:

cust_id  name       age  start_date  end_date    is_current
1        Zhang San  30   2020-10-10  2020-10-21  0
1        Zhang San  40   2020-10-21  9999-10-21  1

  • Type3

Records only the current value and the value before the last change:

cust_id  name       age  pre_age
1        Zhang San  40   30

In summary, Type1 and Type3 cannot record dimension changes well: Type1 keeps no history at all, and Type3 keeps only the most recent change. Type2 keeps the complete history, and it is the approach implemented by the zipper table in the code that follows.
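The three processing methods can be sketched in T-SQL as follows. This is only an illustration of the age change described above: the table names customer_t1, customer_t2, and customer_t3 and the column types are assumptions made for this example and are not part of the demonstration schema used later.

```sql
-- Hypothetical tables for illustration only; names and types are assumptions.

-- Type1: overwrite in place, no history is kept.
CREATE TABLE customer_t1 (cust_id int PRIMARY KEY, name nvarchar(50), age int);
INSERT INTO customer_t1 VALUES (1, N'Zhang San', 30);
UPDATE customer_t1 SET age = 40 WHERE cust_id = 1;

-- Type2: close out the current row, then add a new current row.
CREATE TABLE customer_t2 (
    cust_id int, name nvarchar(50), age int,
    start_date date, end_date date, is_current bit);
INSERT INTO customer_t2 VALUES (1, N'Zhang San', 30, '2020-10-10', '9999-10-21', 1);
UPDATE customer_t2
SET end_date = '2020-10-21', is_current = 0
WHERE cust_id = 1 AND is_current = 1;
INSERT INTO customer_t2 VALUES (1, N'Zhang San', 40, '2020-10-21', '9999-10-21', 1);

-- Type3: keep only the current value and the previous value.
CREATE TABLE customer_t3 (cust_id int PRIMARY KEY, name nvarchar(50), age int, pre_age int);
INSERT INTO customer_t3 VALUES (1, N'Zhang San', 30, NULL);
-- Both assignments read the pre-update row, so pre_age receives the old age.
UPDATE customer_t3 SET pre_age = age, age = 40 WHERE cust_id = 1;
```

After these statements, customer_t1 holds a single row with no trace of the old age, customer_t2 holds two rows that together describe the history, and customer_t3 holds one row carrying both the current and the previous age.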

 

2 Code and comments

2.1 Table structure and data

-- Step 1: prepare the tables and data (runs on SQL Server).

-- Customer table in the business system (OLTP)
CREATE TABLE Customer(
       ID int IDENTITY(1,1) NOT NULL,
       FullName nvarchar(50) NULL,
       City nvarchar(50) NULL,
       Occupation nvarchar(50) NULL)

      
-- Customer dimension table in the data warehouse (OLAP)
CREATE TABLE DimCustomer(
              CustomerID int IDENTITY(1,1) NOT NULL,
              CustomerAlternateKey int NULL,
              FullName nvarchar(50) NULL,
              City nvarchar(50) NULL,
              Occupation nvarchar(50) NULL,
              StartDate datetime NULL,
              EndDate datetime NULL,
              IsCurrent bit NULL,
       PRIMARY KEY CLUSTERED
       (
              CustomerID ASC
       )
)

GO

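-- New dimension rows are marked as current by default
ALTER TABLE DimCustomer ADD  DEFAULT ((1)) FOR IsCurrent

-- Load sample customers into the source table
INSERT INTO Customer(FullName,City,Occupation)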
SELECT 'BIWORK','Beijing','CEO' UNION ALL
SELECT 'ZhangSan','Shanghai','Education' UNION ALL
SELECT 'Lisi','Guangzhou','IT' UNION ALL
SELECT 'Wangwu','Beijing','Finance'

2.2 Slowly changing dimension code

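-- Step 2: SCD module
-- 1 Update status: insert customers that are new to the dimension, and close
--   out the current row of any customer whose City or Occupation has changed.
--   FullName changes are not tracked, and <> does not detect changes to or from NULL.
MERGE INTO dbo.DimCustomer AS Dim
USING dbo.Customer AS Src
    ON Dim.CustomerAlternateKey = Src.ID
WHEN NOT MATCHED BY TARGET
    THEN INSERT (CustomerAlternateKey,FullName,City,Occupation,StartDate,EndDate,IsCurrent)
         VALUES(Src.ID,Src.FullName,Src.City,Src.Occupation,GETDATE(),NULL,1)
WHEN MATCHED AND (Dim.City <> Src.City OR Dim.Occupation <> Src.Occupation) AND Dim.IsCurrent=1
    THEN UPDATE SET Dim.EndDate = CASE WHEN Dim.EndDate IS NULL THEN GETDATE() ELSE Dim.EndDate END,
                    Dim.IsCurrent = 0;

-- 2 Update data: insert the new version of each changed customer as the current row.
--   Matching only against current rows (IsCurrent = 1) ensures that a customer who
--   reverts to an earlier City/Occupation still receives a new current row.
MERGE INTO dbo.DimCustomer AS Dim
USING dbo.Customer AS Src
    ON Dim.CustomerAlternateKey = Src.ID
       AND Dim.City = Src.City AND Dim.Occupation = Src.Occupation
       AND Dim.IsCurrent = 1
WHEN NOT MATCHED BY TARGET
    THEN INSERT (CustomerAlternateKey,FullName,City,Occupation,StartDate,EndDate,IsCurrent)
         VALUES(Src.ID,Src.FullName,Src.City,Src.Occupation,GETDATE(),NULL,1);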
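The two MERGE statements of the SCD module can also be combined into one statement by feeding the OUTPUT of a MERGE into an outer INSERT. The following is a minimal sketch rather than the original author's method. It assumes SQL Server 2008 or later, and it assumes that DimCustomer has no triggers and takes part in no foreign key relationship, since SQL Server places those restrictions on the target of an INSERT that reads from a nested MERGE. The alias Changes and the column name MergeAction are names chosen for this example.

```sql
-- Sketch: one-statement variant of the SCD module (assumes SQL Server 2008+).
-- The inner MERGE inserts brand-new customers and closes out changed rows;
-- the outer INSERT adds the new current row for each closed-out customer.
INSERT INTO dbo.DimCustomer
    (CustomerAlternateKey, FullName, City, Occupation, StartDate, EndDate, IsCurrent)
SELECT ID, FullName, City, Occupation, GETDATE(), NULL, 1
FROM (
    MERGE INTO dbo.DimCustomer AS Dim
    USING dbo.Customer AS Src
        ON Dim.CustomerAlternateKey = Src.ID
       AND Dim.IsCurrent = 1              -- compare against the current row only
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerAlternateKey, FullName, City, Occupation, StartDate, EndDate, IsCurrent)
        VALUES (Src.ID, Src.FullName, Src.City, Src.Occupation, GETDATE(), NULL, 1)
    WHEN MATCHED AND (Dim.City <> Src.City OR Dim.Occupation <> Src.Occupation) THEN
        UPDATE SET Dim.EndDate = GETDATE(), Dim.IsCurrent = 0
    OUTPUT $action, Src.ID, Src.FullName, Src.City, Src.Occupation
) AS Changes (MergeAction, ID, FullName, City, Occupation)
WHERE MergeAction = 'UPDATE';             -- only closed-out rows need a new version
```

The two-statement form is easier to read and debug. The single-statement form closes the old row and inserts its replacement in one atomic statement, at the cost of the restrictions listed above.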

 

2.3 Verifying data changes

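-- Step 3: verification
-- Insert a new customer
INSERT INTO Customer(FullName,City,Occupation) VALUES
('qinliu','Beijing','Finance')


-- Case 1: run the update below, then run the SCD module. The ID depends on the
-- value generated by the identity column, so adjust it to match your data.

UPDATE Customer
SET Occupation = 'IT'
WHERE ID = 6


-- Case 2: run the update below, then run the SCD module. As in Case 1, the ID
-- depends on the value generated by the identity column.
UPDATE Customer
SET Occupation = 'Publisher',
    City = 'Hangzhou'
WHERE ID = 6

-- After each change, check the DimCustomer table to confirm that the history of the data change was tracked.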
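The check on DimCustomer can be done with queries along the following lines. This is a sketch that uses only the columns defined in section 2.1; the alias CurrentRows is a name chosen for this example.

```sql
-- Full history per customer, oldest version first.
SELECT CustomerID, CustomerAlternateKey, FullName, City, Occupation,
       StartDate, EndDate, IsCurrent
FROM dbo.DimCustomer
ORDER BY CustomerAlternateKey, StartDate;

-- Sanity check: every customer should have exactly one current row.
-- Any row returned by this query points to a problem in the load.
SELECT CustomerAlternateKey,
       SUM(CASE WHEN IsCurrent = 1 THEN 1 ELSE 0 END) AS CurrentRows
FROM dbo.DimCustomer
GROUP BY CustomerAlternateKey
HAVING SUM(CASE WHEN IsCurrent = 1 THEN 1 ELSE 0 END) <> 1;
```

After Case 1 and Case 2, the first query should show one closed row (IsCurrent = 0 with an EndDate) for each earlier version of the updated customer, followed by a single open row (IsCurrent = 1, EndDate NULL) holding the latest values.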

 

Origin blog.csdn.net/shenliang1985/article/details/114079616