Kimball Dimensional Modeling Techniques Overview - "The Data Warehouse Toolkit"

1. Basic Concepts

1.1 Gather Business Requirements and Data Realities

Before starting dimensional modeling work, the project team needs to understand both the business requirements and the realities of the underlying source data. Requirements are uncovered in sessions with business representatives, where the team learns their objectives in terms of key performance indicators, compelling business issues, decision-making processes, and supporting analytic needs. At the same time, data realities are uncovered by meeting with source system experts and by doing high-level data profiling to assess data feasibility.

1.2 Collaborative Dimensional Modeling Workshops

Dimensional models should be designed in collaboration with subject matter experts and data governance representatives from the business. The data modeler is in charge, but the model should unfold through a series of highly interactive workshops with business representatives. These workshops also provide another opportunity to flesh out the business requirements. Dimensional models should not be designed in isolation by people who do not understand the business and its needs; collaboration is the key to success.

1.3 Four-Step Dimensional Design Process

Four key decisions are made during the design of a dimensional model:
1. Select the business process
2. Declare the grain
3. Identify the dimensions
4. Identify the facts

1.4 Business Processes

Business processes are the operational activities performed by an organization, such as taking an order, processing an insurance claim, registering students for a class, or taking a monthly snapshot of every account. Business process events generate or capture performance metrics that translate into facts in a fact table. Most fact tables focus on the results of a single business process. Choosing the process is important because it defines a specific design target and allows the grain, dimensions, and facts to be declared. Each business process corresponds to a row in the enterprise data warehouse bus matrix.

1.5 Grain

Declaring the grain is the pivotal step in a dimensional design. The grain establishes exactly what a single fact table row represents, and the grain declaration becomes a binding contract on the design. The grain must be declared before choosing dimensions or facts, because every candidate dimension or fact must be consistent with the grain. This enforced consistency across all dimensional designs is critical to BI application performance and ease of use. Atomic grain is the lowest level at which data is captured by a given business process. We strongly recommend starting the design with atomic-grained data, because atomic data withstands unpredictable user queries. Rolled-up summary grains are important for performance tuning, but they presuppose the business's common questions. Each distinct fact table grain results in a separate physical table; different grains must not be mixed in the same fact table.

1.6 Dimensions for Descriptive Context

Dimensions provide the "who, what, where, when, why, and how" context surrounding a business process event. Dimension tables contain the descriptive attributes that BI applications use to filter and group the facts. With the grain of a fact table firmly in mind, all the possible dimensions can be identified. Whenever possible, a dimension should be single valued when associated with a given fact row.

Dimension tables are sometimes called the "soul" of the data warehouse because they contain the entry points and descriptive labels that allow the DW/BI system to be used for business analysis. A disproportionate amount of effort goes into the development and data governance of dimension tables because they are the drivers of the user's BI experience.

1.7 Facts for Measurements

Facts are the measurements that result from a business process event and are almost always numeric. A single fact table row has a one-to-one relationship to a measurement event as described by the fact table's grain, so a fact table corresponds to a physically observable event. Within a fact table, only facts consistent with the declared grain are allowed. For example, in a retail sales transaction, the quantity of a product sold is a good fact, whereas the store manager's salary is not allowed.

1.8 Graceful Extensions to Dimensional Models

Dimensional models are resilient when data relationships change. All of the following changes can be made without altering any existing BI query or application, and without any change in query results.

  • Facts consistent with the grain of an existing fact table can be added by creating new columns.
  • Dimensions can be added to an existing fact table by creating new foreign key columns, provided they are consistent with the fact table's grain.
  • Attributes can be added to an existing dimension table by creating new columns.
  • The grain of a fact table can be made more atomic by adding attributes to an existing dimension table and then restating the fact table at the finer grain, taking care to preserve the existing column names in the fact and dimension tables.

2. Basic Fact Table Techniques

2.1 Fact Table Structure

A fact table contains the numeric measures produced by an operational measurement event in the real world. At the lowest grain, a fact table row corresponds to a measurement event and vice versa. The design of a fact table is therefore based entirely on a physical activity and is not influenced by the reports that may eventually be produced. In addition to numeric measures, a fact table always contains foreign keys for each of its associated dimensions, and optionally contains degenerate dimension keys and date/time stamps. Fact tables are the primary target of the computations and aggregations that arise from queries.

2.2 Additive, Semi-Additive, and Non-Additive Facts

The numeric measures in a fact table fall into three categories. The most flexible and useful facts are fully additive: additive measures can be summed across any of the dimensions associated with the fact table. Semi-additive measures can be summed across some dimensions, but not all. Balance amounts are common semi-additive facts because they are additive across all dimensions except time. Finally, some measures are completely non-additive, such as ratios. A good approach for non-additive facts is to store the fully additive components of the measure and sum those components into the final answer set before calculating the non-additive fact. This final calculation usually happens in the BI layer or the OLAP cube.
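The three categories can be illustrated with a small sketch. The rows, column layout, and variable names below are made up for illustration and are not from the book.

```python
# Hypothetical daily account snapshot rows: (date, account, balance, deposits)
rows = [
    ("2024-01-01", "A", 100.0, 100.0),
    ("2024-01-01", "B", 50.0, 50.0),
    ("2024-01-02", "A", 120.0, 20.0),
    ("2024-01-02", "B", 50.0, 0.0),
]

# Fully additive: deposits can be summed across every dimension.
total_deposits = sum(r[3] for r in rows)

# Semi-additive: balances can be summed across accounts within one date...
balance_by_date = {}
for day, _account, balance, _deposits in rows:
    balance_by_date[day] = balance_by_date.get(day, 0.0) + balance

# ...but across dates they are averaged here instead of summed.
average_daily_balance = sum(balance_by_date.values()) / len(balance_by_date)

# Non-additive: keep the additive components and compute the ratio last.
sales = [(200.0, 50.0), (100.0, 40.0)]  # hypothetical (revenue, margin) per row
total_revenue = sum(revenue for revenue, _margin in sales)
total_margin = sum(margin for _revenue, margin in sales)
margin_ratio = total_margin / total_revenue  # ratio of the sums

# Averaging the per-row ratios gives a different, misleading number.
naive_avg_ratio = sum(m / r for r, m in sales) / len(sales)
```

Summing the components first weights each row by its revenue, which is why the ratio of sums differs from the average of the row-level ratios.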

2.3 Nulls in Fact Tables

Measures in a fact table may be null, but foreign keys in a fact table must never be null, because a null foreign key causes a referential integrity violation. Instead, the associated dimension table must have a default row (with its own surrogate key) that represents the unknown or not applicable condition.
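A minimal sketch of the default-row idea using Python's built-in sqlite3 module. The table names, column names, and the choice of key 0 for the default row are assumptions for illustration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE dim_promotion (
    promotion_key  INTEGER PRIMARY KEY,
    promotion_name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    promotion_key INTEGER NOT NULL REFERENCES dim_promotion(promotion_key),
    sales_amount  REAL              -- a null measure is allowed
);
-- Default row standing in for "no promotion applies".
INSERT INTO dim_promotion VALUES (0, 'Not Applicable');
INSERT INTO dim_promotion VALUES (1, 'Spring Sale');
""")

def promotion_key_for(source_value):
    """Map a missing source promotion to the default row instead of NULL."""
    return 0 if source_value is None else source_value

for promo, amount in [(1, 9.99), (None, 5.00), (None, None)]:
    con.execute("INSERT INTO fact_sales VALUES (?, ?)",
                (promotion_key_for(promo), amount))

# A null foreign key is rejected by the NOT NULL constraint.
try:
    con.execute("INSERT INTO fact_sales VALUES (NULL, 1.0)")
    null_key_rejected = False
except sqlite3.IntegrityError:
    null_key_rejected = True

result = con.execute("""
    SELECT p.promotion_name, COUNT(*), SUM(f.sales_amount)
    FROM fact_sales f JOIN dim_promotion p USING (promotion_key)
    GROUP BY p.promotion_name ORDER BY p.promotion_name
""").fetchall()
```

Every fact row joins to a dimension row, so the inner join loses nothing, and SUM simply skips the null measure.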

2.4 Conformed Facts

If the same measurement appears in separate fact tables, or if facts from separate fact tables will be compared or computed together, care must be taken to ensure that the technical definitions of the facts are identical. If the separate fact definitions are consistent, the conformed facts should be identically named; if they are incompatible, they should be named differently to alert business users and BI applications.

2.5 Transaction Fact Tables

A row in a transaction fact table corresponds to a measurement event at a point in space and time. Atomic transaction grain fact tables are the most dimensional and expressive fact tables; this robust dimensionality enables the maximum slicing and dicing of transaction data. Transaction fact tables may be dense or sparse because rows exist only when a measurement takes place. These fact tables always contain a foreign key for each associated dimension, and optionally contain precise time stamps and degenerate dimension keys. The numeric facts must be consistent with the transaction grain.

2.6 Periodic Snapshot Fact Tables

A row in a periodic snapshot fact table summarizes many measurement events occurring over a standard period, such as a day, a week, or a month. The grain is the period, not the individual transaction. Periodic snapshot fact tables often contain many facts because any measurement event consistent with the fact table grain is permissible. These fact tables are uniformly dense in their foreign keys because, even if no activity takes place during the period, a row is inserted containing a zero or null for each fact.

2.7 Accumulating Snapshot Fact Tables

A row in an accumulating snapshot fact table summarizes the measurement events occurring at predictable steps between the beginning and the end of a process. Pipeline or workflow processes (for example, order fulfillment or claim processing) that have a defined start point, standard intermediate steps, and a defined end point can be modeled with this type of fact table. The fact table usually contains a date foreign key for each critical milestone in the process. A row in an accumulating snapshot fact table, corresponding for instance to a particular order, is inserted when the order is created. As the pipeline progresses, the row is revisited and updated. This consistent updating of fact rows is unique to accumulating snapshots among the three types of fact tables. In addition to the date foreign keys associated with each critical process step, accumulating snapshot fact tables contain foreign keys for other dimensions and, optionally, degenerate dimensions. They typically include numeric lag measurements consistent with the grain, along with milestone completion counters.
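The insert-then-update life cycle can be sketched as follows. The column names and the `record_milestone` helper are hypothetical, and real date columns would normally be foreign keys into a date dimension, not raw dates.

```python
from datetime import date

# One row per order, inserted when the order is created.
order_row = {
    "order_number": "SO-1001",        # degenerate dimension
    "order_date": date(2024, 3, 1),
    "ship_date": None,                # milestone not reached yet
    "delivery_date": None,
    "order_to_ship_lag_days": None,
    "ship_to_delivery_lag_days": None,
}

def record_milestone(row, milestone, when):
    """Revisit the existing row, fill in a milestone date, and refresh the lags."""
    row[milestone] = when
    if row["ship_date"] is not None:
        row["order_to_ship_lag_days"] = (row["ship_date"] - row["order_date"]).days
    if row["ship_date"] is not None and row["delivery_date"] is not None:
        row["ship_to_delivery_lag_days"] = (
            row["delivery_date"] - row["ship_date"]
        ).days
    return row

# The same row is updated as the order moves through the pipeline.
record_milestone(order_row, "ship_date", date(2024, 3, 4))
record_milestone(order_row, "delivery_date", date(2024, 3, 9))
```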

2.8 Factless Fact Tables

Some events record no numeric facts at all. For example, the event of a student attending a class on a given day may have no recorded numeric fact, but a fact row with foreign keys for calendar day, student, teacher, location, and class is well defined. Similarly, customer communications are events, but there may be no associated metrics. Factless fact tables can also be used to analyze what did not happen. These queries always have two parts: a factless coverage table that contains all the events that might happen, and an activity table that contains the events that did happen. When the activity is subtracted from the coverage, the result is the set of events that did not happen.
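The coverage-minus-activity query amounts to a set difference. A minimal sketch with made-up keys, where each tuple stands for one factless fact row of (date key, student, class):

```python
# Coverage: every attendance event that could have happened (students enrolled).
coverage = {
    (20240101, "student_1", "MATH101"),
    (20240101, "student_2", "MATH101"),
    (20240101, "student_3", "MATH101"),
}

# Activity: the attendance events that actually happened.
activity = {
    (20240101, "student_1", "MATH101"),
    (20240101, "student_3", "MATH101"),
}

# Subtracting activity from coverage leaves the events that did not happen.
absences = coverage - activity
```

In SQL the same result is typically obtained with EXCEPT or an anti-join between the two tables.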

3. Basic Dimension Table Techniques

3.1 Dimension Table Structure

Every dimension table has a single primary key column. This primary key is embedded as a foreign key in any associated fact table, where the dimension row's descriptive context must be exactly correct for that fact table row.

3.2 Dimension Surrogate Keys

A dimension table is designed with one column serving as a unique primary key. This primary key cannot be the operational system's natural key, because changes must be tracked over time and there would then be multiple dimension rows for the same natural key. In addition, natural keys for a dimension may be created by more than one source system, and these natural keys may be incompatible or poorly administered. The dimension therefore uses a surrogate key, a meaningless integer assigned by the DW/BI system.

3.3 Natural, Durable, and Supernatural Keys

Natural keys created by operational source systems are subject to business rules outside the control of the DW/BI system. For example, an employee number (natural key) may change if the employee resigns and is later rehired. When the data warehouse wants a single key for that employee, a new durable key must be created that is persistent and does not change in this situation. This key is sometimes called a durable supernatural key. The best durable keys have a format that is independent of the original business process.

3.4 Drilling Down

3.5 Degenerate Dimensions

Sometimes a dimension has no content except for its primary key. For example, when an invoice has multiple line items, the line item fact rows inherit all the descriptive dimension foreign keys of the invoice, and the invoice is left with no unique content of its own. The invoice number nevertheless remains a valid dimension key for fact tables at the line item level. It is stored in the fact table as a degenerate dimension, with no associated dimension table. Degenerate dimensions are most common in transaction and accumulating snapshot fact tables.

3.6 Denormalized Flattened Dimensions

3.7 Multiple Hierarchies in Dimensions

Many dimensions contain more than one natural hierarchy. For example, a calendar date dimension may have a day-to-week-to-fiscal-period hierarchy as well as a day-to-month-to-year hierarchy.

3.8 Null Attributes in Dimensions

Null-valued dimension attributes result when a given dimension row has not been fully populated, or when there are attributes that are not applicable to all of the dimension's rows. In both cases, we recommend substituting a descriptive string, such as Unknown or Not Applicable, in place of the null value. Nulls in dimension attributes should be avoided because different databases handle grouping and constraining on nulls inconsistently.
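A small sketch of the substitution during ETL. The `clean_attribute` helper and the customer rows are hypothetical.

```python
def clean_attribute(value, applicable=True):
    """Return a descriptive string in place of a null dimension attribute."""
    if not applicable:
        return "Not Applicable"
    if value is None or str(value).strip() == "":
        return "Unknown"
    return value

customers = [
    {"name": "Acme Ltd", "type": "Business", "birth_date": None},
    {"name": "J. Smith", "type": "Person", "birth_date": None},
    {"name": "A. Jones", "type": "Person", "birth_date": "1990-05-17"},
]

# A birth date does not apply to a business; for a person it is merely unknown.
for customer in customers:
    customer["birth_date"] = clean_attribute(
        customer["birth_date"], applicable=(customer["type"] == "Person")
    )
```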

3.9 Calendar Date Dimensions

3.10 Role-Playing Dimensions

A single physical dimension can be referenced multiple times in a fact table, with each reference linking to a logically distinct role for the dimension. For example, a fact table can have several dates, each represented by a different foreign key to the date dimension. Each foreign key should refer to a separate view of the date dimension so that the references are independent and carry their own meaning. These separate dimension views (with unique attribute column names) are called roles.
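A sketch of two roles built as views over one physical date dimension, using sqlite3. The table, view, and column names are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month_name TEXT);
INSERT INTO dim_date VALUES (20240105, '2024-01-05', 'January'),
                            (20240202, '2024-02-02', 'February');

-- Each role is a view with its own uniquely named columns.
CREATE VIEW dim_order_date AS
    SELECT date_key AS order_date_key, full_date AS order_date,
           month_name AS order_month
    FROM dim_date;
CREATE VIEW dim_ship_date AS
    SELECT date_key AS ship_date_key, full_date AS ship_date,
           month_name AS ship_month
    FROM dim_date;

CREATE TABLE fact_orders (order_date_key INTEGER, ship_date_key INTEGER, quantity INTEGER);
INSERT INTO fact_orders VALUES (20240105, 20240202, 3);
""")

# The two foreign keys join to different roles of the same physical table.
row = con.execute("""
    SELECT o.order_month, s.ship_month, f.quantity
    FROM fact_orders f
    JOIN dim_order_date o ON o.order_date_key = f.order_date_key
    JOIN dim_ship_date  s ON s.ship_date_key  = f.ship_date_key
""").fetchone()
```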

3.11 Junk Dimensions

Transactional business processes typically produce a number of miscellaneous, low-cardinality flags and indicators. Rather than building a separate dimension for each flag or attribute, they can be combined into a single junk dimension. This dimension, often labeled a transaction profile dimension in a schema, does not need to be the Cartesian product of all the attributes' possible values; it should contain only the combinations of values that actually occur in the source data.
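One way to build such a dimension is to assign a key the first time each combination is seen. The flag names and source rows below are invented for the sketch.

```python
# Hypothetical source transactions carrying three low-cardinality flags.
source_rows = [
    {"payment_type": "Cash", "gift_wrap": "N", "order_channel": "Store"},
    {"payment_type": "Card", "gift_wrap": "Y", "order_channel": "Web"},
    {"payment_type": "Cash", "gift_wrap": "N", "order_channel": "Store"},
]

junk_dim = {}    # observed combination -> surrogate key
fact_keys = []   # junk dimension key stored on each fact row

for row in source_rows:
    combo = (row["payment_type"], row["gift_wrap"], row["order_channel"])
    if combo not in junk_dim:
        # Only combinations that actually occur get a dimension row.
        junk_dim[combo] = len(junk_dim) + 1
    fact_keys.append(junk_dim[combo])
```

With two possible values per flag, the full Cartesian product would have eight rows; only the two observed combinations are stored here.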

3.12 Snowflaked Dimensions

3.13 Outrigger Dimensions

A dimension can contain a reference to another dimension table. For example, a bank account dimension can reference a separate dimension representing the date the account was opened. These secondary dimension references are called outrigger dimensions. Outrigger dimensions are permissible, but should be used sparingly. In most cases, the correlation between dimensions should be expressed through a fact table, where the two dimensions are represented by separate foreign keys.

4. Integration via Conformed Dimensions

4.1 Conformed Dimensions

Dimension tables conform when attributes in separate dimension tables have the same column names and domain contents.

4.2 Shrunken Dimensions

Shrunken dimensions are conformed dimensions that are a subset of the rows and/or columns of a base dimension. Shrunken rollup dimensions are required when constructing aggregate fact tables. They are also needed for business processes that naturally capture data at a higher level of granularity, such as a forecast by month and brand (as opposed to the more atomic date and product associated with sales data). A further case of conformed dimension subsetting occurs when two dimensions are at the same level of detail, but one represents only a subset of the rows.

4.3 Drilling Across

Drilling across means making separate queries against two or more fact tables, where the row headers of each query consist of identical conformed attributes. The answer sets from the queries are then aligned on those common row headers.
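A sketch of the merge step, assuming the two answer sets have already been fetched by separate queries and are keyed by the same conformed product attribute. The data and names are illustrative.

```python
# Answer set of a query against a sales fact table.
sales_by_product = {"Widget": 500.0, "Gadget": 300.0}

# Answer set of a separate query against an inventory fact table.
inventory_by_product = {"Widget": 40, "Sprocket": 15}

# Align the two answer sets on the conformed row header (an outer merge).
products = sorted(set(sales_by_product) | set(inventory_by_product))
report = [
    (product, sales_by_product.get(product), inventory_by_product.get(product))
    for product in products
]
```

Keeping the queries separate and merging afterwards avoids joining the two fact tables to each other directly.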

4.4 Value Chain

A value chain identifies the natural flow of an organization's primary business processes. For example, a retailer's value chain may consist of purchasing, inventory, and sales.

4.5 Enterprise Data Warehouse Bus Architecture

The enterprise data warehouse bus architecture provides an incremental approach to building the enterprise DW/BI system. It decomposes the DW/BI planning process into manageable pieces by focusing on business processes, while delivering integration through standardized conformed dimensions that are reused across processes.

4.6 Enterprise Data Warehouse Bus Matrix

The enterprise data warehouse bus matrix is the essential tool for designing and communicating the enterprise data warehouse bus architecture. The rows of the matrix are business processes and the columns are dimensions. The marked cells of the matrix indicate whether a dimension is associated with a given business process.

4.7 Detailed Implementation Bus Matrix

The detailed implementation bus matrix is a more granular bus matrix in which each business process row is expanded to show specific fact tables or OLAP cubes. At this level of detail, the precise grain statement and the list of facts can be documented.

4.8 Opportunity/Stakeholder Matrix

After the rows of the enterprise data warehouse bus matrix have been identified, a different matrix can be drafted by replacing the dimension columns with business functions (such as marketing, sales, and finance). The cells are then marked to indicate which business functions are interested in which business process rows.

5. Dealing with Slowly Changing Dimension (SCD) Attributes

5.1 Type 0: Retain Original

5.2 Type 1: Overwrite

With type 1, the old attribute value in the dimension row is overwritten with the new value, so only the most recent state is kept. This approach is easy to implement and does not create additional dimension rows, but any aggregate fact tables and OLAP cubes affected by the change must be recomputed.

5.3 Type 2: Add New Row

A type 2 change adds a new row to the dimension with the updated attribute values. At a minimum, three additional columns should be added to the dimension row:

  1. Row effective date or date/time stamp
  2. Row expiration date or date/time stamp
  3. Current row indicator
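A minimal in-memory sketch of a type 2 change that maintains the three columns. The function name, column names, the far-future `HIGH_DATE`, and the convention that the old row expires on the change date are all assumptions; real implementations differ in how they close out the old row.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed "no expiration yet" marker

def apply_type2_change(dim_rows, natural_key, new_attrs, change_date):
    """Expire the current row for natural_key and append a new current row."""
    current = next(
        r for r in dim_rows
        if r["natural_key"] == natural_key and r["is_current"]
    )
    current["row_expiration_date"] = change_date
    current["is_current"] = False

    new_row = {
        "surrogate_key": max(r["surrogate_key"] for r in dim_rows) + 1,
        "natural_key": natural_key,
        **new_attrs,
        "row_effective_date": change_date,
        "row_expiration_date": HIGH_DATE,
        "is_current": True,
    }
    dim_rows.append(new_row)
    return new_row

dim_customer = [{
    "surrogate_key": 1, "natural_key": "CUST-7", "city": "Boston",
    "row_effective_date": date(2020, 1, 1),
    "row_expiration_date": HIGH_DATE, "is_current": True,
}]
new_row = apply_type2_change(dim_customer, "CUST-7", {"city": "Denver"}, date(2024, 6, 1))
```

Fact rows loaded before the change keep pointing at surrogate key 1, so history is preserved; new fact rows pick up surrogate key 2.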

5.4 Type 3: Add New Attribute

This technique is used relatively infrequently.

5.5 Type 4: Add Mini-Dimension

The type 4 technique is used when a group of attributes in a dimension changes rapidly and is split off into a mini-dimension. This situation is often called a rapidly changing monster dimension. Frequently used attributes in dimension tables with millions of rows are also candidates for a mini-dimension, even if they do not change often. The type 4 mini-dimension requires its own unique primary key; the primary keys of both the base dimension and the mini-dimension are captured in the associated fact tables.

5.6 Type 5: Add Mini-Dimension and Type 1 Outrigger

5.7 Type 6: Add Type 1 Attributes to Type 2 Dimension

5.8 Type 7: Dual Type 1 and Type 2 Dimensions

6. Dealing with Dimension Hierarchies

6.1 Fixed Depth Positional Hierarchies

A fixed depth hierarchy is a series of many-to-one relationships, such as product to brand to category to department.

6.2 Slightly Ragged/Variable Depth Hierarchies

Slightly ragged hierarchies do not have a fixed number of levels, but the range in depth is small. Geographic hierarchies typically range in depth from three to six levels.

6.3 Ragged/Variable Depth Hierarchies with Hierarchy Bridge Tables

Ragged hierarchies of indeterminate depth are difficult to model in a relational database. The problem can be resolved by modeling the ragged hierarchy with a specially constructed bridge table. This bridge table contains a row for every possible path in the hierarchy, which enables all forms of hierarchy traversal.
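A sketch of how such a bridge table could be generated from parent-child links. The organization data and the (ancestor, descendant, depth) row layout are assumptions; production bridge tables often carry extra columns such as top-of-hierarchy and bottom-of-hierarchy flags.

```python
# Hypothetical ragged organization hierarchy: node -> parent (None for the root).
parent_of = {"Corp": None, "East": "Corp", "West": "Corp", "Boston": "East"}

def build_bridge(parent_of):
    """Return one (ancestor, descendant, depth) row per path, including depth 0."""
    rows = []
    for node in parent_of:
        ancestor, depth = node, 0
        while ancestor is not None:
            rows.append((ancestor, node, depth))
            ancestor = parent_of[ancestor]
            depth += 1
    return rows

bridge = build_bridge(parent_of)

# Rolling up facts for "East" means constraining the bridge on the ancestor.
descendants_of_east = sorted(desc for anc, desc, _depth in bridge if anc == "East")
```

Because every node is paired with itself and with each of its ancestors, a single join through the bridge can roll facts up to any level without recursive SQL.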

6.4 Ragged/Variable Depth Hierarchies with Pathstring Attributes

A pathstring attribute in the dimension can be used to avoid a bridge table when representing a ragged, variable depth hierarchy. For each row in the dimension, the pathstring attribute contains a specially encoded text string describing the complete path from the highest node of the hierarchy down to the node described by that dimension row.

7. Advanced Fact Table Techniques

7.1 Fact Table Surrogate Keys

Surrogate keys serve as the primary keys of dimension tables. In addition, a single-column surrogate key can be used on a fact table, although it is needed less often. A fact table surrogate key is not associated with any dimension and is assigned sequentially during the ETL load process. It can be used to:
1. Serve as the single-column primary key of the fact table
2. Act as an immediate identifier of a fact table row in the ETL, without navigating multiple dimensions
3. Allow a fact table update to be decomposed into less risky inserts and deletes

7.2 Centipede Fact Tables

Some designers create a separate normalized dimension for each level of a many-to-one hierarchy, such as a date dimension, month dimension, quarter dimension, and year dimension, and then include all of these foreign keys in a fact table. The result is a centipede fact table with many hierarchically related dimensions, and it should be avoided.

7.3 Numeric Values as Attributes or Facts

Designers sometimes encounter numeric values that do not clearly belong in either a dimension table or a fact table. A classic example is a product's standard list price. If the numeric value is used primarily for calculation purposes, it likely belongs in the fact table. If the value is used predominantly for filtering and grouping, it should be treated as a dimension attribute, with the discrete numeric values supplemented by value band attributes. In some cases it is useful to model the numeric value as both a fact and a dimension attribute, such as a quantitative on-time delivery metric and a qualitative textual descriptor.

7.4 Lag/Duration Facts

7.5 Header/Line Fact Tables

7.6 Allocated Facts

7.7 Profit and Loss Fact Tables Using Allocations

7.8 Multiple Currency Facts

7.9 Multiple Units of Measure Facts

Some business processes require facts to be stated simultaneously in several units of measure. For example, depending on the perspective of the business user, a supply chain may need to report the same facts as pallets, ship cases, retail cases, or individual scan units. If the fact table contains a large number of facts, each of which must be expressed in all units of measure, a convenient technique is to store the facts once at an agreed standard unit of measure, together with the conversion factors between the standard measure and all the others. The fact table can then be deployed through views to each user constituency, using the appropriate conversion factor. The conversion factors must reside in the underlying fact table row to ensure that the calculation is simple and correct, while minimizing query complexity.
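A sketch of a fact row that carries its own conversion factors. The column names, factor values, and the `quantity_in` helper are hypothetical; in a database the same arithmetic would sit in a view per user constituency.

```python
# Fact stored once in the standard unit (individual scan units),
# with conversion factors kept on the same row.
fact_row = {
    "product_key": 42,
    "quantity_shipped_units": 1200,
    "units_per_retail_case": 12,
    "units_per_ship_case": 48,
    "units_per_pallet": 600,
}

# Which conversion-factor column applies to each reporting unit.
CONVERSION_COLUMN = {
    "scan_unit": None,
    "retail_case": "units_per_retail_case",
    "ship_case": "units_per_ship_case",
    "pallet": "units_per_pallet",
}

def quantity_in(row, unit):
    """Express the stored quantity in the requested unit of measure."""
    column = CONVERSION_COLUMN[unit]
    factor = 1 if column is None else row[column]
    return row["quantity_shipped_units"] / factor

pallets = quantity_in(fact_row, "pallet")
retail_cases = quantity_in(fact_row, "retail_case")
```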

7.10 Year-to-Date Facts

7.11 Multipass SQL to Avoid Fact-to-Fact Table Joins

7.12 Timespan Tracking in Fact Tables

There are three basic fact table grains: transaction, periodic snapshot, and accumulating snapshot. In isolated cases, it is useful to add a row effective date, a row expiration date, and a current row indicator to the fact table, much as is done with type 2 slowly changing dimensions, to capture the timespan during which the fact row was effective.

7.13 Late Arriving Facts

8. Advanced Dimension Techniques

8.1 Dimension-to-Dimension Table Joins

8.2 Multivalued Dimensions and Bridge Tables

In a classic dimensional schema, each dimension attached to a fact table has a single value consistent with the fact table's grain. In some situations, however, a dimension is legitimately multivalued. For example, a patient receiving a medical examination may have multiple simultaneous diagnoses. In these cases, the multivalued dimension must be attached to the fact table through a group dimension key to a bridge table, with one row for each diagnosis in the group.
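A sketch of the group key and bridge table in sqlite3. The table names, keys, and sample data are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_diagnosis (diagnosis_key INTEGER PRIMARY KEY, diagnosis_name TEXT);
CREATE TABLE bridge_diagnosis_group (diagnosis_group_key INTEGER, diagnosis_key INTEGER);
CREATE TABLE fact_encounter (
    encounter_id INTEGER, diagnosis_group_key INTEGER, billed_amount REAL
);

INSERT INTO dim_diagnosis VALUES (1, 'Hypertension'), (2, 'Diabetes');
-- Group 10 holds two simultaneous diagnoses; group 11 holds one.
INSERT INTO bridge_diagnosis_group VALUES (10, 1), (10, 2), (11, 1);
INSERT INTO fact_encounter VALUES (100, 10, 250.0), (101, 11, 80.0);
""")

# Each fact row reaches its diagnoses through the group key and the bridge.
rows = con.execute("""
    SELECT d.diagnosis_name, COUNT(*) AS encounters
    FROM fact_encounter f
    JOIN bridge_diagnosis_group b ON b.diagnosis_group_key = f.diagnosis_group_key
    JOIN dim_diagnosis d ON d.diagnosis_key = b.diagnosis_key
    GROUP BY d.diagnosis_name
    ORDER BY d.diagnosis_name
""").fetchall()
```

Note that summing `billed_amount` through the bridge would count encounter 100 once per diagnosis, so additive facts need a weighting factor or careful interpretation when queried this way.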

8.3 Time Varying Multivalued Bridge Tables

8.4 Behavior Tag Time Series

Almost all text in a data warehouse is descriptive text in dimension tables. Data mining customer cluster analyses typically produce textual behavior tags, often identified on a periodic basis. In this case, the customer's behavior measurements over time become a sequence of these tags. This time series should be stored as positional attributes in the customer dimension, along with an optional text string holding the complete sequence of tags. Behavior tags are modeled in a positional design because they are the target of complex simultaneous queries, not of numeric computations.

8.5 Behavior Study Groups

8.6 Aggregated Facts as Dimension Attributes

Business users are often interested in constraining the customer dimension based on aggregated performance metrics, such as filtering on all customers who spent more than a certain amount during the last year or over their entire lifetime. Selected aggregated facts can be placed in the target dimension as attributes, to be used as constraints and as row labels in reports. The metrics are often presented as banded ranges in the dimension table. Dimension attributes that represent aggregated performance metrics add a burden to the ETL processing, but they ease the analytic burden in the BI layer.
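A sketch of the ETL step that turns an aggregated fact into a banded dimension attribute. The band boundaries, labels, and sample data are arbitrary assumptions.

```python
def spend_band(total_spend):
    """Translate an aggregated spend amount into a banded attribute value."""
    if total_spend >= 1000:
        return "1000 and over"
    if total_spend >= 100:
        return "100 to 999.99"
    return "Under 100"

# Hypothetical fact rows: (customer, sales amount)
fact_sales = [("C1", 700.0), ("C1", 450.0), ("C2", 80.0), ("C3", 150.0)]

# ETL step: aggregate the facts per customer...
totals = {}
for customer, amount in fact_sales:
    totals[customer] = totals.get(customer, 0.0) + amount

# ...and store the banded result as an attribute on the customer dimension.
customer_dim = {
    customer: {"lifetime_spend_band": spend_band(total)}
    for customer, total in totals.items()
}

# BI layer: the band is now an ordinary attribute to constrain on.
high_spenders = sorted(
    c for c, attrs in customer_dim.items()
    if attrs["lifetime_spend_band"] == "1000 and over"
)
```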

8.7 Dynamic Value Bands

8.8 Text Comments Dimension

8.9 Multiple Time Zones

8.10 Measure Type Dimensions

8.11 Step Dimensions

8.12 Hot Swappable Dimensions

8.13 Abstract Generic Dimensions

8.14 Audit Dimensions

8.15 Late Arriving Dimensions

Origin www.cnblogs.com/bystander/p/kimball-wei-du-jian-mo-ji-shu-gai-shu-shu-ju-cang-.html