Dimensional modeling and data warehouse design: theory and practical cases

Definition

Dimensional modeling is a data warehouse design technique that aims to make database structures intuitive and easy to use, especially for non-technical users who query data and build reports. It rests on two core concepts: the fact table and the dimension table.

  • Fact table: The core of the data warehouse, storing the measures (quantitative data) of a business process. For example, a fact table for a retail business might record the total amount, quantity, and time of each sale.
  • Dimension tables: These tables hold the descriptive information that gives the measures in the fact table their context, such as customer, product, and store information in the retail example. The fact table references each dimension table through a foreign key.

Case: Retail Sales Data Warehouse

Let's say we want to build a data warehouse for a retailer. In this example we might have:

Fact Table: Sales Facts

  1. Sales ID
  2. Product ID (foreign key to the product dimension table)
  3. Customer ID (foreign key to the customer dimension table)
  4. Store ID (foreign key to the store dimension table)
  5. Sales date (foreign key to the time dimension table)
  6. Sales amount
  7. Sales quantity

Dimension table: Product dimension

  1. Product ID
  2. Product name
  3. Product category
  4. Product price

Dimension table: Customer dimension

  1. Customer ID
  2. Customer name
  3. Customer address
  4. Customer type

Dimension table: Store dimension

  1. Store ID
  2. Store name
  3. Store location
  4. Store type

Dimension table: Time dimension

  1. Date
  2. Week
  3. Month
  4. Quarter
  5. Year

In this example, the fact table records the measurable results of the business process (for example, the amount and quantity of each sale), while the dimension tables provide the context needed to interpret those measures (for example, which store the sale occurred in, which customer made the purchase, and which product was sold). In this way, dimensional modeling lets users analyze complex business data intuitively.
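To make the split between measures and context concrete, here is a minimal sketch that sums the sales measures per store type. It uses Python's built-in sqlite3 module with an in-memory database rather than the database used in the rest of this article, and the store names and values are hypothetical sample data chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database (an assumption of this sketch, not the article's setup).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimStore (
    StoreID INTEGER PRIMARY KEY,
    StoreName TEXT,
    StoreType TEXT
);
CREATE TABLE FactSales (
    SaleID INTEGER PRIMARY KEY,
    StoreID INTEGER REFERENCES DimStore(StoreID),
    Amount NUMERIC,
    Quantity INTEGER
);
-- Hypothetical sample rows
INSERT INTO DimStore VALUES
    (1, 'Store A', 'Online'),
    (2, 'Store B', 'Online'),
    (3, 'Store C', 'Physical');
INSERT INTO FactSales VALUES
    (1, 1, 5000, 1),
    (2, 2, 6000, 2),
    (3, 3, 4500, 1);
""")

# The fact table supplies the measures (Amount, Quantity);
# the dimension table supplies the attribute we group by (StoreType).
sales_by_store_type = conn.execute("""
    SELECT s.StoreType, SUM(f.Amount) AS TotalAmount, SUM(f.Quantity) AS TotalQuantity
    FROM FactSales AS f
    JOIN DimStore AS s ON f.StoreID = s.StoreID
    GROUP BY s.StoreType
    ORDER BY s.StoreType
""").fetchall()

print(sales_by_store_type)
```

The same pattern applies to any other dimension: swap in the product or customer dimension and group by one of its attributes.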

Practice

Create the dimension tables

Product dimension table

CREATE TABLE test.DimProduct (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(255),
    Category VARCHAR(255),
    Price DECIMAL(10, 2)
);

Customer dimension table

CREATE TABLE test.DimCustomer (
    CustomerID INT PRIMARY KEY,
    CustomerName VARCHAR(255),
    Address VARCHAR(255),
    CustomerType VARCHAR(255)
);

Store dimension table

CREATE TABLE test.DimStore (
    StoreID INT PRIMARY KEY,
    StoreName VARCHAR(255),
    Location VARCHAR(255),
    StoreType VARCHAR(255)
);

Time dimension table

CREATE TABLE test.DimTime (
    DateKey DATE PRIMARY KEY,
    Day INT,
    Month INT,
    Quarter INT,
    Year INT
);

Create the fact table

Sales fact table

CREATE TABLE test.FactSales (
    SaleID INT PRIMARY KEY,
    -- Keys pointing at the dimension tables
    ProductID INT,
    CustomerID INT,
    StoreID INT,
    DateKey DATE,
    -- Measures
    Amount DECIMAL(10, 2),
    Quantity INT,
    -- Schema-qualified references, so the statement also works when test is not the current database
    FOREIGN KEY (ProductID) REFERENCES test.DimProduct(ProductID),
    FOREIGN KEY (CustomerID) REFERENCES test.DimCustomer(CustomerID),
    FOREIGN KEY (StoreID) REFERENCES test.DimStore(StoreID),
    FOREIGN KEY (DateKey) REFERENCES test.DimTime(DateKey)
);
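The foreign keys are what tie every fact row to a valid dimension row. The following sketch shows that behavior with Python's built-in sqlite3 module and an in-memory database. It is an illustration under assumptions: the `test.` schema prefix is dropped, the tables are reduced to the columns needed, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, which may differ from the database used in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE DimProduct (
    ProductID INTEGER PRIMARY KEY,
    ProductName VARCHAR(255)
);
CREATE TABLE FactSales (
    SaleID INTEGER PRIMARY KEY,
    ProductID INT,
    Amount DECIMAL(10, 2),
    FOREIGN KEY (ProductID) REFERENCES DimProduct(ProductID)
);
INSERT INTO DimProduct VALUES (1, 'Apple phone');
""")

# A fact row that points at an existing product is accepted.
conn.execute("INSERT INTO FactSales VALUES (1, 1, 5000.00)")

# A fact row that points at a missing product (ID 99) is rejected.
try:
    conn.execute("INSERT INTO FactSales VALUES (2, 99, 100.00)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM FactSales").fetchone()[0]
print(rejected, fact_rows)
```

This is also why the dimension rows have to be inserted before the fact rows that reference them.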

Insert dimension table data

Insert data into product dimension table

INSERT INTO test.DimProduct (ProductID, ProductName, Category, Price) VALUES
(1, 'Apple phone', 'Electronics', 5000.00),
(2, 'Samsung TV', 'Electronics', 3000.00),
(3, 'Lenovo laptop', 'Electronics', 4500.00);

SELECT * FROM test.DimProduct;

Insert data into the customer dimension table

INSERT INTO test.DimCustomer (CustomerID, CustomerName, Address, CustomerType) VALUES
(1, 'Zhang San', 'Beijing', 'Individual'),
(2, 'Li Si', 'Shanghai', 'Business'),
(3, 'Wang Wu', 'Guangzhou', 'Individual');

SELECT * FROM test.DimCustomer;

Insert data into the store dimension table

INSERT INTO test.DimStore (StoreID, StoreName, Location, StoreType) VALUES
(1, 'JD.com', 'Online', 'E-commerce'),
(2, 'Suning', 'Online', 'E-commerce'),
(3, 'GOME', 'Offline', 'Physical store');

SELECT * FROM test.DimStore;

Insert data into the time dimension table

INSERT INTO test.DimTime (DateKey, Day, Month, Quarter, Year) VALUES
('2023-01-01', 1, 1, 1, 2023),
('2023-02-01', 1, 2, 1, 2023),
('2023-03-01', 1, 3, 1, 2023);

SELECT * FROM test.DimTime;
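Typing time dimension rows by hand only works for a few dates. As a sketch of how a full year could be generated instead, the following Python snippet derives the Day, Month, Quarter, and Year columns from each date and loads them into an in-memory SQLite table. The helper name `time_dimension_rows` is hypothetical, and SQLite stands in for the article's database only to keep the example self-contained.

```python
import sqlite3
from datetime import date, timedelta


def time_dimension_rows(start, end):
    """Yield (DateKey, Day, Month, Quarter, Year) for each day from start to end inclusive."""
    current = start
    while current <= end:
        quarter = (current.month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, and so on
        yield (current.isoformat(), current.day, current.month, quarter, current.year)
        current += timedelta(days=1)


conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE DimTime (
    DateKey DATE PRIMARY KEY,
    Day INT,
    Month INT,
    Quarter INT,
    Year INT
)""")

# Load one row per calendar day of 2023.
conn.executemany(
    "INSERT INTO DimTime (DateKey, Day, Month, Quarter, Year) VALUES (?, ?, ?, ?, ?)",
    time_dimension_rows(date(2023, 1, 1), date(2023, 12, 31)),
)

row_count = conn.execute("SELECT COUNT(*) FROM DimTime").fetchone()[0]
q1_days = conn.execute("SELECT COUNT(*) FROM DimTime WHERE Quarter = 1").fetchone()[0]
print(row_count, q1_days)  # 365 90
```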


Insert fact table data

Insert data into sales fact table

INSERT INTO test.FactSales (SaleID, ProductID, CustomerID, StoreID, DateKey, Amount, Quantity) VALUES
(1, 1, 1, 1, '2023-01-01', 5000.00, 1),
(2, 2, 2, 2, '2023-02-01', 6000.00, 2),
(3, 3, 3, 3, '2023-03-01', 4500.00, 1);

SELECT * FROM test.FactSales;


Insert, delete, update, and query

Add a new product to the product dimension table

INSERT INTO test.DimProduct (ProductID, ProductName, Category, Price) VALUES
(4, 'HP printer', 'Electronics', 800.00);

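Delete a customer from the customer dimension table
Suppose the customer with customer ID 3 is no longer our customer; we can delete this record from the customer dimension table. Note that the sales fact table still references this customer through its foreign key, so the dependent fact rows have to be dealt with before the delete can succeed.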
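The article does not show the statement for this step; it would presumably be `DELETE FROM test.DimCustomer WHERE CustomerID = 3;`. The sketch below tries that delete against a reduced copy of the tables in an in-memory SQLite database (an assumption made to keep the example runnable, with foreign key enforcement switched on). Removing the dependent fact rows first is only one possible way to proceed; a real warehouse might prefer to keep the sales history and mark the customer as inactive instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
conn.executescript("""
CREATE TABLE DimCustomer (
    CustomerID INTEGER PRIMARY KEY,
    CustomerName VARCHAR(255)
);
CREATE TABLE FactSales (
    SaleID INTEGER PRIMARY KEY,
    CustomerID INT,
    Amount DECIMAL(10, 2),
    FOREIGN KEY (CustomerID) REFERENCES DimCustomer(CustomerID)
);
INSERT INTO DimCustomer VALUES (1, 'Zhang San'), (2, 'Li Si'), (3, 'Wang Wu');
INSERT INTO FactSales VALUES (1, 1, 5000.00), (2, 2, 6000.00), (3, 3, 4500.00);
""")

# Deleting customer 3 directly fails because sale 3 still references it.
try:
    conn.execute("DELETE FROM DimCustomer WHERE CustomerID = 3")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

# One option: remove the dependent fact rows first, then the dimension row,
# inside a single transaction.
with conn:
    conn.execute("DELETE FROM FactSales WHERE CustomerID = 3")
    conn.execute("DELETE FROM DimCustomer WHERE CustomerID = 3")

remaining = [row[0] for row in conn.execute(
    "SELECT CustomerID FROM DimCustomer ORDER BY CustomerID")]
print(blocked, remaining)
```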

Update the store information in the store dimension table
If the store with store ID 3 changes its name, we can update this information.

UPDATE test.DimStore SET StoreName = 'New GOME' WHERE StoreID = 3;

Query for total sales in the first quarter of 2023
This query joins the fact table with the time dimension table to calculate the total sales within a specific time period.

SELECT
    SUM(test.FactSales.Amount) AS TotalSales
FROM
    test.FactSales
JOIN
    test.DimTime ON test.FactSales.DateKey = test.DimTime.DateKey
WHERE
    test.DimTime.Year = 2023 AND test.DimTime.Quarter = 1;


  • FROM FactSales: This is the main table of the query; we start from the fact table.
  • JOIN DimTime ON FactSales.DateKey = DimTime.DateKey: Here we connect the FactSales table and the DimTime table with a JOIN. The join condition is FactSales.DateKey = DimTime.DateKey, which means we only keep rows that have matching dates in both tables.
  • WHERE DimTime.Year = 2023 AND DimTime.Quarter = 1: This condition filters the joined rows down to the first quarter of 2023.
  • SELECT SUM(FactSales.Amount) AS TotalSales: Finally, we aggregate the remaining rows and calculate the total sales.
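The steps above can be checked end to end with a small script. This sketch loads the three sample sales and their dates into an in-memory SQLite database through Python's sqlite3 module (an assumption for the sake of a runnable example; the `test.` prefix is dropped and the tables are reduced to the columns the query needs). It runs the quarterly total with table aliases and then reuses the same join for a per-month breakdown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimTime (
    DateKey DATE PRIMARY KEY,
    Day INT, Month INT, Quarter INT, Year INT
);
CREATE TABLE FactSales (
    SaleID INTEGER PRIMARY KEY,
    DateKey DATE,
    Amount DECIMAL(10, 2),
    FOREIGN KEY (DateKey) REFERENCES DimTime(DateKey)
);
INSERT INTO DimTime VALUES
    ('2023-01-01', 1, 1, 1, 2023),
    ('2023-02-01', 1, 2, 1, 2023),
    ('2023-03-01', 1, 3, 1, 2023);
INSERT INTO FactSales VALUES
    (1, '2023-01-01', 5000.00),
    (2, '2023-02-01', 6000.00),
    (3, '2023-03-01', 4500.00);
""")

# Total sales for the first quarter of 2023.
total_sales = conn.execute("""
    SELECT SUM(f.Amount) AS TotalSales
    FROM FactSales AS f
    JOIN DimTime AS t ON f.DateKey = t.DateKey
    WHERE t.Year = 2023 AND t.Quarter = 1
""").fetchone()[0]

# Same join and filter, broken down by month.
by_month = conn.execute("""
    SELECT t.Month, SUM(f.Amount) AS MonthlySales
    FROM FactSales AS f
    JOIN DimTime AS t ON f.DateKey = t.DateKey
    WHERE t.Year = 2023 AND t.Quarter = 1
    GROUP BY t.Month
    ORDER BY t.Month
""").fetchall()

print(total_sales, by_month)
```

With the sample data the quarterly total is 5000 + 6000 + 4500 = 15500.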

Query the purchase history of a specific customer
This query displays the products purchased by the customer with customer ID 1 at different times.

SELECT
    DimCustomer.CustomerName,
    DimProduct.ProductName,
    FactSales.DateKey,
    FactSales.Amount
FROM
    test.FactSales
JOIN
    test.DimCustomer ON FactSales.CustomerID = DimCustomer.CustomerID
JOIN
    test.DimProduct ON FactSales.ProductID = DimProduct.ProductID
WHERE
    DimCustomer.CustomerID = 1;

In this query, we first join FactSales with DimCustomer and then join the result with DimProduct. The final result set contains the customer name, product name, purchase date, and purchase amount, drawn from three different tables.
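As a sketch of how this lookup might be reused for any customer, the query can be wrapped in a small function that takes the customer ID as a bound parameter. The function name `purchase_history` is hypothetical, and the example again runs against an in-memory SQLite copy of the sample rows rather than the article's database.

```python
import sqlite3


def purchase_history(conn, customer_id):
    """Return (customer name, product name, date, amount) rows for one customer."""
    return conn.execute("""
        SELECT c.CustomerName, p.ProductName, f.DateKey, f.Amount
        FROM FactSales AS f
        JOIN DimCustomer AS c ON f.CustomerID = c.CustomerID
        JOIN DimProduct AS p ON f.ProductID = p.ProductID
        WHERE c.CustomerID = ?
        ORDER BY f.DateKey
    """, (customer_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (CustomerID INTEGER PRIMARY KEY, CustomerName VARCHAR(255));
CREATE TABLE DimProduct (ProductID INTEGER PRIMARY KEY, ProductName VARCHAR(255));
CREATE TABLE FactSales (
    SaleID INTEGER PRIMARY KEY,
    ProductID INT REFERENCES DimProduct(ProductID),
    CustomerID INT REFERENCES DimCustomer(CustomerID),
    DateKey DATE,
    Amount DECIMAL(10, 2)
);
INSERT INTO DimCustomer VALUES (1, 'Zhang San'), (2, 'Li Si'), (3, 'Wang Wu');
INSERT INTO DimProduct VALUES (1, 'Apple phone'), (2, 'Samsung TV'), (3, 'Lenovo laptop');
INSERT INTO FactSales VALUES
    (1, 1, 1, '2023-01-01', 5000.00),
    (2, 2, 2, '2023-02-01', 6000.00),
    (3, 3, 3, '2023-03-01', 4500.00);
""")

history = purchase_history(conn, 1)
print(history)
```

Passing the ID as a parameter instead of formatting it into the SQL string avoids quoting problems and SQL injection.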


Origin blog.csdn.net/weixin_46211269/article/details/134779815