Five steps to build a data warehouse

First, determine the theme

That is, determine the subject of the data analysis or of the front-end display. (The examples here come from a KPI management and analysis system for the automobile industry.)

For example: we want to analyze the sales of the stores in a certain region over a certain period. That is a theme.

A theme reflects the relationship between the analysis perspectives (dimensions) and the statistical numerical data (measures) of one particular aspect, so both should be considered together when determining it. The statistical data (measures) are stored in the central fact table; each analysis perspective is a dimension; and measures are examined through combinations of dimensions.

So for the theme "sales of the stores in a certain region over a certain period", we examine the sales measure through a combination of three dimensions: time, region, and store.

Different themes therefore draw on different subsets of the data warehouse, which we can call data marts. A data mart reflects one aspect of the information in the data warehouse, and multiple data marts together make up the data warehouse.
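As a minimal sketch of examining one measure through a combination of dimensions, the snippet below sums sales over any chosen subset of the time, region, and store dimensions. The rows, the names `FACT_ROWS` and `sales_by`, and the month-level time values are illustrative assumptions, not part of the original system.

```python
from collections import defaultdict

# Hypothetical fact rows: (month, region, store, sales)
FACT_ROWS = [
    ("2020-01", "East", "Store A", 120.0),
    ("2020-01", "East", "Store B", 80.0),
    ("2020-01", "West", "Store C", 50.0),
    ("2020-02", "East", "Store A", 100.0),
]
DIMENSIONS = ("month", "region", "store")


def sales_by(rows, *dims):
    """Sum the sales measure over the chosen combination of dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[3]  # the last column is the sales measure
    return dict(totals)


by_month_region = sales_by(FACT_ROWS, "month", "region")
by_store = sales_by(FACT_ROWS, "store")
```

The same measure answers different questions depending only on which dimensions are combined, which is the point of separating dimensions from measures.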

 

Second, determine the measure

After determining the theme, we consider the indicators to be analyzed, such as sales. These are generally numerical data.

We either sum this data, count it, count its distinct values, or take its minimum or maximum, and so on; such data is called a measure. Measures are the indicators to be computed, so they must be chosen appropriately in advance. More complex key performance indicators (KPIs) can then be designed and calculated on top of different measures.
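The common ways of aggregating a measure (sum, count, distinct count, minimum, maximum) can be sketched with Python's built-in `sqlite3` module. The `sales` table, its columns, and the "average sales per customer" KPI are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", "c1", 100.0), ("A", "c2", 50.0), ("A", "c1", 30.0), ("B", "c3", 70.0)],
)

# Basic measures for store A: sum, count, distinct count, min, max.
total, orders, customers, smallest, largest = conn.execute(
    """
    SELECT SUM(amount), COUNT(*), COUNT(DISTINCT customer),
           MIN(amount), MAX(amount)
    FROM sales
    WHERE store = 'A'
    """
).fetchone()

# A hypothetical KPI derived from two basic measures.
avg_sales_per_customer = total / customers
```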

 

Third, determine the granularity of the fact data

After determining the measures, we have to consider how each measure is aggregated and summarized under the different dimensions. Because a measure may need to be aggregated to different degrees, we apply the "minimum granularity principle": set the granularity of the measure to the finest level available.

For example: suppose the source data is currently recorded down to the day, that is, the database records the daily turnover.

If we can confirm that future analysis will only ever need turnover accurate to the month, we can aggregate the sales data by month during the ETL process, and the granularity of the measure in the data warehouse is then "month". Conversely, if we cannot confirm that month-level accuracy will be enough, we need to follow the "minimum granularity principle" and keep the daily sales data in the fact table of the data warehouse, so that analysis by "day" remains possible later.

For example: in the automotive industry KPI analysis system, a change request later required store sales to be analyzed by day.
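The trade-off behind the minimum granularity principle can be sketched as follows: daily rows can always be rolled up to months, but monthly totals cannot be split back into days. The data and the function name `roll_up_to_month` are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales kept at the finest granularity.
DAILY_SALES = [
    (date(2020, 1, 5), 100.0),
    (date(2020, 1, 20), 40.0),
    (date(2020, 2, 3), 60.0),
]


def roll_up_to_month(daily_rows):
    """Aggregate day-level sales to (year, month) totals."""
    monthly = defaultdict(float)
    for day, amount in daily_rows:
        monthly[(day.year, day.month)] += amount
    return dict(monthly)


monthly_sales = roll_up_to_month(DAILY_SALES)
```

If only `monthly_sales` had been loaded into the warehouse, a later requirement to analyze by day could not be met without reloading from the source.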

 

Fourth, determine the dimensions

1. How should dimensions be understood? Dimension hierarchies (Hierarchy) and levels (Level)

a, each dimension is an analytical perspective. For example, we may want to analyze by time, by region, or by store; time, region, and store are then the corresponding dimensions. Along different dimensions we can see a summary of each measure, and we can also cross-analyze across all dimensions.

b, dimension hierarchies (Hierarchy) and levels (Level)

Hierarchy: in the time dimension table structure, for example, year - half-year - quarter - month - half-month - ten-day period - week - day

Level: in the store dimension table structure, for example, store group classification, classification, location type, etc.
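As a rough sketch of how the levels of a time hierarchy might be stored, the function below derives one time dimension row from a calendar date. The column names and the rules for half-month and ten-day period are assumptions chosen for illustration; a real system would follow its own calendar definitions.

```python
from datetime import date


def time_dimension_row(d):
    """Derive the hierarchy levels of the time dimension for one date."""
    return {
        "year": d.year,
        "half_year": 1 if d.month <= 6 else 2,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "half_month": 1 if d.day <= 15 else 2,
        # Days 1-10, 11-20, and 21 to month end (assumed convention).
        "ten_day_period": min((d.day - 1) // 10 + 1, 3),
        "iso_week": d.isocalendar()[1],
        "day": d.day,
    }


row = time_dimension_row(date(2020, 8, 31))
```

Storing every level as a column lets a query group by any level without recomputing it. Note that weeks do not nest cleanly inside months, so the week level usually forms a separate roll-up path.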

 

2. What are slowly changing dimensions?

The problem of dimension table data changing over time is what we call slowly changing dimensions.

Take the customer dimension table structure as an example: fields such as the month, the company, customer code, customer identifier, customer identifier source, customer full name, branch number, branch name, and history flag (T/F) may change over time.
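The text does not say how the changes are handled. One common approach, often called "Type 2", keeps the old row, marks it as history, and adds a new row with a new surrogate key. The sketch below assumes that approach; the field names and the function `apply_change` are hypothetical, with `is_history` standing in for the history flag (T/F) mentioned above.

```python
def apply_change(dim_rows, customer_code, branch_name):
    """Apply a Type 2 style change and return the current surrogate key."""
    current = next(
        (r for r in dim_rows
         if r["customer_code"] == customer_code and not r["is_history"]),
        None,
    )
    if current is not None and current["branch_name"] == branch_name:
        return current["customer_key"]  # nothing changed, keep the row
    if current is not None:
        current["is_history"] = True  # retire the old version
    new_key = max((r["customer_key"] for r in dim_rows), default=0) + 1
    dim_rows.append({
        "customer_key": new_key,  # surrogate key, not the business code
        "customer_code": customer_code,
        "branch_name": branch_name,
        "is_history": False,
    })
    return new_key


customer_dim = []
k1 = apply_change(customer_dim, "C001", "North Branch")
k2 = apply_change(customer_dim, "C001", "South Branch")  # customer moved
k3 = apply_change(customer_dim, "C001", "South Branch")  # no change
```

Fact rows loaded before the move keep pointing at the old surrogate key, so historical reports still show the customer under the old branch.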

 

Fifth, create a fact table

a, understanding the fact table

After determining the fact data and the dimensions, we consider loading the fact table.

The fact table holds the company's measures, and these measures are what users ultimately want to see. Dimension tables are the entry point into the fact table data: a fact only becomes meaningful when it is explained by its dimensions.

 

b, how to create it?

Practice: join the source (OLTP) tables with the dimension tables to generate the fact table.

 

Notes: when a join may produce empty values (dirty source data), use outer joins. After each join, put the surrogate key of the dimension into the fact table. Apart from the surrogate key of each dimension and the measures, which come from the source tables, the fact table should hold no descriptive information. This is the "tall and thin" principle: the fact table should have as many rows as possible (minimum granularity) and as little descriptive information as possible.
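A minimal sketch of this load step, using Python's built-in `sqlite3` module: the source table is outer-joined to the dimension, and only the surrogate key and the measure are written to the fact table. All table and column names are made up for illustration, and mapping unmatched rows to an "Unknown" dimension member with key -1 is an assumed convention, not something the text prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY,
                        store_code TEXT, store_name TEXT);
CREATE TABLE src_sales (sale_date TEXT, store_code TEXT, amount REAL);
CREATE TABLE fact_sales (sale_date TEXT, store_key INTEGER, amount REAL);

-- Key -1 is an assumed placeholder member for unmatched source rows.
INSERT INTO dim_store VALUES (-1, NULL, 'Unknown'), (1, 'S01', 'Store A');

-- 'S99' is dirty data: no such store exists in the dimension.
INSERT INTO src_sales VALUES
    ('2020-01-05', 'S01', 100.0),
    ('2020-01-05', 'S99', 25.0);

-- The outer join keeps every source row; descriptive columns stay
-- in the dimension table and only the surrogate key is copied.
INSERT INTO fact_sales (sale_date, store_key, amount)
SELECT s.sale_date, COALESCE(d.store_key, -1), s.amount
FROM src_sales AS s
LEFT JOIN dim_store AS d ON d.store_code = s.store_code;
""")

fact_rows = conn.execute(
    "SELECT sale_date, store_key, amount FROM fact_sales ORDER BY amount DESC"
).fetchall()
```

With an inner join the `S99` row would silently disappear and the totals in the warehouse would no longer match the source system.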

 

The fact table is the core of the data warehouse and requires careful maintenance. The fact table obtained from the JOINs generally contains a large number of records, so we need to define a composite primary key and indexes on it to ensure data integrity and to optimize query performance in the data warehouse. The fact table and the dimension tables are stored together in the data warehouse. If front-end queries need to connect to the data warehouse, we also need to build some related intermediate summary tables or materialized views to make querying easier.
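These maintenance steps can be sketched with `sqlite3`: a composite primary key over the dimension keys, an extra index, and a pre-aggregated summary table. SQLite has no materialized views, so a plain summary table stands in for one here; the schema and names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (
    date_key  INTEGER NOT NULL,
    store_key INTEGER NOT NULL,
    amount    REAL    NOT NULL,
    PRIMARY KEY (date_key, store_key)   -- composite primary key
);
-- Extra index for queries that filter by store only.
CREATE INDEX idx_fact_sales_store ON fact_sales (store_key);

INSERT INTO fact_sales VALUES
    (20200105, 1, 100.0), (20200106, 1, 40.0), (20200105, 2, 60.0);

-- Intermediate summary table for front-end queries.
CREATE TABLE agg_sales_by_store AS
SELECT store_key, SUM(amount) AS total_amount
FROM fact_sales
GROUP BY store_key;
""")

# The composite key rejects a second row for the same date and store.
try:
    conn.execute("INSERT INTO fact_sales VALUES (20200105, 1, 999.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

summary = conn.execute(
    "SELECT store_key, total_amount FROM agg_sales_by_store ORDER BY store_key"
).fetchall()
```

A summary table like this has to be refreshed after each fact load, which is the job a materialized view would take over in databases that support one.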

Origin www.cnblogs.com/zkteam/p/12175511.html