[Data warehouse] Data warehouse design

I. Overview

 A data warehouse is subject-oriented, integrated, time-variant, and stable (non-volatile), and it exists to support management decisions. Its value is that it aggregates all of an enterprise's data and gives every department a unified, standardized data outlet. A data warehouse usually needs to be layered as it is built, and different businesses use different layering techniques. The main reasons for layering a data warehouse are:

  • Clear data structure
     Each data layer has its own scope, which makes tables easier to locate and understand when they are used.
  • Data lineage tracking
     What is finally presented to the business is a table that can be used directly, but it is built from many source tables. If a source table has a problem, we want to locate it quickly and accurately and know exactly how far the damage extends.
  • Less duplicated development
     With standardized layering, a common intermediate data layer can be developed once, which greatly reduces repeated computation.
  • Simpler handling of complex problems
     A complex task is broken into several steps, and each layer handles only one step, which is simple and easy to understand. It also makes data accuracy easier to maintain: when data goes wrong, there is no need to repair all of it; the repair only needs to start from the step where the problem occurred.
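
 The lineage-tracking point can be illustrated with a small sketch. The table names and the `LINEAGE` mapping are hypothetical and not taken from any real warehouse. The idea is only that, once each table records which upstream tables it is built from, the range affected by a faulty source table can be found by walking the graph downstream.

```python
from collections import deque

# Hypothetical lineage: each table maps to the upstream tables it is built from.
LINEAGE = {
    "dwd_order_detail": ["ods_order", "ods_order_item"],
    "dws_user_order_d": ["dwd_order_detail"],
    "dm_sales_report": ["dws_user_order_d"],
    "dm_user_profile": ["dws_user_order_d", "ods_user"],
}


def affected_tables(source: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every table that depends, directly or indirectly, on `source`."""
    # Invert the mapping: upstream table -> tables that read from it.
    downstream: dict[str, list[str]] = {}
    for table, parents in lineage.items():
        for parent in parents:
            downstream.setdefault(parent, []).append(table)

    # Breadth-first walk from the faulty table through its consumers.
    seen: set[str] = set()
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)
```

 Calling `affected_tables("ods_user", LINEAGE)` returns only the tables built on `ods_user`, so a repair can start from the problematic step instead of rebuilding everything.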

II. Data warehouse layered design

 A data warehouse is commonly divided into three layers: the ODS (operational data store) layer, the DW (data warehouse) layer, and the DM (data mart) layer. The DW layer is further divided into the DWD layer and the DWS layer. The layered structure of the data warehouse is shown below:

2.1 ODS layer

 The data in the ODS layer comes from the tables of the business databases. ODS tables correspond one-to-one with the business database tables: each underlying business table is re-created in the data warehouse with exactly the same data and structure.
 Because business databases (OLTP) are essentially modeled with ER (entity-relationship) modeling, the ODS layer is also an ER model.
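
 As a minimal sketch of this one-to-one mirroring, the snippet below copies a business table into an `ods_`-prefixed table using Python's built-in `sqlite3`. The `orders` table, its columns, and the helper `mirror_to_ods` are hypothetical, and keeping the business table and the ODS table in the same in-memory database is a simplification made only so the example runs on its own.

```python
import sqlite3


def mirror_to_ods(conn: sqlite3.Connection, table: str) -> str:
    """Create ods_<table> with the same columns as <table> and copy all rows."""
    ods_table = f"ods_{table}"
    # CREATE TABLE ... AS SELECT keeps the column names and the row data;
    # it does not copy constraints or indexes.
    conn.execute(f'CREATE TABLE "{ods_table}" AS SELECT * FROM "{table}"')
    return ods_table


# Hypothetical business (OLTP) table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 25.0), (2, 11, 40.5)],
)

ods_name = mirror_to_ods(conn, "orders")
```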

2.2 DW layer

 The job of the DWD layer is to clean, integrate, and standardize the data: dirty data, junk data, inconsistent specifications, inconsistent status definitions, and non-standard naming are all handled here. The DWD layer should be a complete, clean, and consistent data layer that covers all systems. The DWD layer is built on a dimensional model, with fact tables and dimension tables designed in it; in other words, the DWD layer is a highly standardized, high-quality, and reliable layer of detail data.
 The DWS layer is the common summary layer. It performs light summarization on top of the DWD layer's detail data, at a coarser granularity than the detail, and aggregates data from across the business by subject area for analysis. Its tables are generally wide tables. The DWS layer should cover 80% of the usage scenarios.
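
 The two DW steps can be sketched in plain Python. The rows, column names, and cleaning rules below are hypothetical; they only illustrate the division of labor described above: DWD removes dirty and duplicate rows and standardizes values, and DWS summarizes the cleaned detail at a coarser grain.

```python
from collections import defaultdict

# Hypothetical ODS rows: inconsistent status spelling, a duplicate, a missing amount.
ods_orders = [
    {"order_id": 1, "user_id": 10, "status": "PAID", "amount": "25.00"},
    {"order_id": 2, "user_id": 10, "status": "paid ", "amount": "40.50"},
    {"order_id": 2, "user_id": 10, "status": "paid ", "amount": "40.50"},  # duplicate
    {"order_id": 3, "user_id": 11, "status": "Cancelled", "amount": None},  # dirty row
    {"order_id": 4, "user_id": 11, "status": "paid", "amount": "10.00"},
]


def build_dwd(rows):
    """DWD step: drop dirty and duplicate rows, standardize status and amount."""
    seen, clean = set(), []
    for row in rows:
        if row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        clean.append({
            "order_id": row["order_id"],
            "user_id": row["user_id"],
            "status": row["status"].strip().lower(),
            "amount_cents": round(float(row["amount"]) * 100),
        })
    return clean


def build_dws(dwd_rows):
    """DWS step: summarize paid orders per user (coarser grain than DWD)."""
    summary = defaultdict(lambda: {"order_count": 0, "amount_cents": 0})
    for row in dwd_rows:
        if row["status"] == "paid":
            summary[row["user_id"]]["order_count"] += 1
            summary[row["user_id"]]["amount_cents"] += row["amount_cents"]
    return dict(summary)


dwd = build_dwd(ods_orders)
dws = build_dws(dwd)
```

 Because each function handles a single step, a problem found in the summary can be traced back to the DWD output without touching the ODS data.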

III. Dimensional model

 Dimensional modeling theory was proposed by Ralph Kimball, who divided the tables in a data warehouse into two types: fact tables and dimension tables. Dimensional modeling starts from the data mart and is aimed mainly at analysis scenarios. It was born for analysis: the model is built for analytical use, with the focus on meeting analysis needs quickly and flexibly while keeping responses fast on large volumes of data. It is targeted in purpose and is mainly used as the underlying data model for data warehouse builds and OLAP engines.
 A "fact table" stores the measures of a fact and the foreign keys that point to each dimension. A "dimension table" stores the metadata of a dimension, that is, its descriptive information, including the dimension's hierarchies and the categories of its members.
 Simply put, a dimension table is the angle from which you look at things (the dimension), and a fact table holds the content you care about. For example, when a user hails a taxi, the ride itself can be turned into a fact table, the taxi order fact table; the user then corresponds to a user dimension table, and the driver corresponds to a driver dimension table.
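
 The taxi example can be sketched as a small star schema using Python's built-in `sqlite3`. All table names, column names, and sample rows are hypothetical; the point is only that the fact table carries the measures plus foreign keys, and a dimension table supplies the angle of analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_user   (user_id   INTEGER PRIMARY KEY, user_name   TEXT, city      TEXT);
CREATE TABLE dim_driver (driver_id INTEGER PRIMARY KEY, driver_name TEXT, car_model TEXT);
-- Fact table: one row per taxi order, measures plus foreign keys to the dimensions.
CREATE TABLE fact_taxi_order (
    order_id    INTEGER PRIMARY KEY,
    user_id     INTEGER REFERENCES dim_user(user_id),
    driver_id   INTEGER REFERENCES dim_driver(driver_id),
    distance_km REAL,
    fare        REAL
);
INSERT INTO dim_user VALUES (1, 'Alice', 'Beijing'), (2, 'Bob', 'Shanghai');
INSERT INTO dim_driver VALUES (100, 'Chen', 'Sedan'), (101, 'Li', 'SUV');
INSERT INTO fact_taxi_order VALUES
    (1, 1, 100, 5.0, 20.0),
    (2, 1, 101, 8.0, 30.0),
    (3, 2, 100, 2.5, 12.0);
""")

# Look at the fact (fare) from the user's angle by joining the user dimension.
fare_by_city = conn.execute("""
    SELECT u.city, COUNT(*) AS orders, SUM(f.fare) AS total_fare
    FROM fact_taxi_order AS f
    JOIN dim_user AS u ON u.user_id = f.user_id
    GROUP BY u.city
    ORDER BY u.city
""").fetchall()
```

 Swapping the join to `dim_driver` would analyze the same facts from the driver's angle without changing the fact table.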

3.1 Fact table

 In the real world, every operational event essentially takes place between entities. As such an event occurs, it produces measurable values, and this process yields a fact table that stores each measurable event.
 The fact table stores the measurements generated by real-world operational events. At the lowest level of granularity, one row in the fact table corresponds to one measurement event, and vice versa. The design of a fact table therefore depends entirely on the physical activity and is not influenced by the reports that may eventually be produced. Besides numeric measures, a fact table always contains foreign keys that link it to its dimensions, and optionally degenerate dimension keys and date/time stamps. Fact tables are the main target of the computation and aggregation performed by queries.
 The fact table usually contains three important elements:

  • Dimension table foreign keys
  • Metrics
  • Event Description Information

 For example, in an e-commerce purchase event, the entities involved include the customer, the product, and the merchant, and the measures that can be generated include the quantity of goods, the amount, the number of pieces, and so on.
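
 The three elements of a fact table can be sketched for the purchase event as a Python dataclass. The field names, the sample rows, and the helper `total_by` are hypothetical; treating the order number as a degenerate dimension is one possible choice, not something stated in the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PurchaseFact:
    # Dimension table foreign keys
    customer_id: int
    product_id: int
    merchant_id: int
    # Measures
    quantity: int
    amount_cents: int
    # Event description information
    order_no: str    # kept on the fact row (a degenerate dimension)
    order_time: str  # date/time stamp of the event


def total_by(facts, key):
    """Aggregate the measures of the fact rows by one foreign key."""
    totals = defaultdict(lambda: {"quantity": 0, "amount_cents": 0})
    for fact in facts:
        bucket = totals[getattr(fact, key)]
        bucket["quantity"] += fact.quantity
        bucket["amount_cents"] += fact.amount_cents
    return dict(totals)


# Hypothetical purchase events: one row per measurable event.
facts = [
    PurchaseFact(customer_id=1, product_id=500, merchant_id=9, quantity=2,
                 amount_cents=3998, order_no="SO-0001", order_time="2019-06-11 10:00:00"),
    PurchaseFact(customer_id=2, product_id=500, merchant_id=9, quantity=1,
                 amount_cents=1999, order_no="SO-0002", order_time="2019-06-11 10:05:00"),
    PurchaseFact(customer_id=1, product_id=501, merchant_id=8, quantity=3,
                 amount_cents=1500, order_no="SO-0003", order_time="2019-06-11 11:30:00"),
]

by_merchant = total_by(facts, "merchant_id")
by_customer = total_by(facts, "customer_id")
```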

3.2 Dimension table

 Each dimension table has a single primary key column. That primary key can serve as the foreign key in any fact table it is associated with, and the context described by a dimension table row should correspond exactly to the fact table rows that reference it. Dimension tables are usually wide, flat, denormalized tables that contain many low-granularity text attributes.
 Take a product dimension as an example: the single primary key is the product ID, and the attributes include origin, color, material, size, unit price, and so on. Attributes do not have to be text, however; price and size, for instance, are numeric descriptive attributes. Commonly used dimension tables include the time dimension table, the geographic region dimension table, and others.

In summary, if dimensional modeling is applied to a user's order-placing behavior (for a single product), the model can be obtained as follows:

IV. Data Warehouse Specification

4.1 Table naming specification

 The goal is to give everyone involved a common understanding of the information a table holds. For example: which layer does it belong to (ODS, DW detail, DW summary, DM)? Which business or department? Which dimension (user, in-vehicle computer equipment)? What time span (day, month, year, real time)? Incremental or full data?
Naming format: layer_business/department_modifier/description_range/cycle

The abbreviations used in data warehouse table names are listed in the following table:

 Warehouse layer     Abbreviation    Cycle / data range            Abbreviation
 Public dimension    dim             Daily snapshot                d
 DM layer            dm              Incremental                   i
 ODS layer           ods             Weekly                        w
 DWD layer           dwd             Zipper table                  l
 DWS layer           dws             Non-partitioned full table    a
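
 The naming format can be checked mechanically. The sketch below is a hypothetical helper, not part of any existing tool: it builds a table name from the four parts and reads the layer and cycle back from a name. The layer and cycle abbreviations follow the table above, with `dws` for the DWS layer assumed by analogy with the other layers; the business and description values in the usage note are made up.

```python
import re

LAYERS = {"dim", "dm", "ods", "dwd", "dws"}  # "dws" is assumed by analogy
CYCLES = {
    "d": "daily snapshot",
    "i": "incremental",
    "w": "weekly",
    "l": "zipper table",
    "a": "non-partitioned full table",
}
# Lowercase words joined by underscores, e.g. "order_detail".
_PART = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def build_table_name(layer: str, business: str, description: str, cycle: str) -> str:
    """Join the parts as layer_business_description_cycle after validating them."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    if cycle not in CYCLES:
        raise ValueError(f"unknown cycle/range: {cycle!r}")
    for part in (business, description):
        if not _PART.fullmatch(part):
            raise ValueError(f"invalid name part: {part!r}")
    return "_".join((layer, business, description, cycle))


def describe_table_name(name: str) -> tuple[str, str]:
    """Read the layer (first token) and the cycle meaning (last token) from a name."""
    tokens = name.split("_")
    if len(tokens) < 3 or tokens[0] not in LAYERS or tokens[-1] not in CYCLES:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return tokens[0], CYCLES[tokens[-1]]
```

 For instance, `build_table_name("dwd", "trade", "order_detail", "i")` yields `dwd_trade_order_detail_i`, a name that tells a reader the layer, the business, the content, and that the table is incremental.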

Origin www.cnblogs.com/skyell/p/11005666.html