A brief overview of data warehousing

Data warehouse overview

A data warehouse, abbreviated DW or DWH, is built to provide better business reporting, analysis, and decision support.

Data warehouse features

  • A data warehouse is subject-oriented
  • A data warehouse is integrated
  • A data warehouse is non-volatile (its data is not updated in place)
  • A data warehouse is time-variant (it keeps accumulating data over time)

OLTP & OLAP contrast

OLTP (online transaction processing), also known as transaction-oriented processing, serves the day-to-day online operations of a specific business. A typical operation queries or modifies a small number of records. The main concerns are response time for user operations, data security, data integrity, and the number of concurrent users supported. Traditional database systems are the primary means of managing this data.

OLAP (online analytical processing) generally analyzes historical data on particular subjects in order to support management decisions.

  • The comparison is as follows:

OLTP (operational processing)                         | OLAP (analytical processing)
------------------------------------------------------|------------------------------------------------------
Detailed data                                         | Summarized or aggregated data
Entity-relationship (ER) model                        | Star or snowflake model
Current, up-to-the-moment data                        | Historical data; may not include the most recent data
Updatable                                             | Read-only, append-only
Operates on one record at a time                      | Operates on a set of records at a time
High performance requirements, short response time    | Relaxed performance requirements
Transaction-oriented                                  | Analysis-oriented
Small amount of data per operation                    | Large amount of data per operation
Supports daily operations                             | Supports decision-making
Small data volume                                     | Large data volume
Customer orders, inventory levels, bank account lookups | Customer profitability analysis, market segmentation, etc.
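The contrast can be made concrete with a small sketch. The table name `orders`, its columns, and the sample rows below are hypothetical and chosen only for illustration. Both workloads run against the same in-memory SQLite database here for brevity, whereas a real warehouse would normally be a separate system from the transactional one.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, region TEXT, "
    "amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "alice", "north", 120.0, "paid"),
        (2, "bob", "north", 80.0, "paid"),
        (3, "carol", "south", 200.0, "new"),
    ],
)

# OLTP-style work: a short transaction that modifies one record by key.
with conn:
    conn.execute("UPDATE orders SET status = 'paid' WHERE order_id = ?", (3,))

# OLAP-style work: a read-only query that scans many records and aggregates.
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders "
    "WHERE status = 'paid' GROUP BY region ORDER BY region"
).fetchall()
print(sales_by_region)
```

The update touches a single row and must return quickly, while the aggregate reads every paid order and changes nothing, which matches the rows of the comparison above.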

Data warehouse layered architecture

  • ODS layer (operational data store): a temporary staging area for interface data, which simplifies subsequent data processing. It typically holds two categories of data:

    1) data that is currently waiting to be loaded

    2) historical data retained after processing

  • DW layer: data in the data warehouse layer should be consistent, accurate, and clean, i.e., source system data after cleansing.

  • DM layer: the data mart layer organizes data by subject, typically in a star or snowflake structure.

  • APP layer: the application layer holds data built to meet specific analysis needs, also in a star or snowflake structure.
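As a rough sketch of how data might move through these layers, the following uses plain Python lists and dictionaries in place of real tables. The record fields, the cleansing rules, and the function names (`build_dw`, `build_dm`, `build_app`) are assumptions made for this example and do not come from any particular tool.

```python
# Hypothetical raw records as they might land in the ODS (staging) layer.
ods_orders = [
    {"order_id": "1", "region": " North ", "amount": "120.0"},
    {"order_id": "2", "region": "north", "amount": "80.0"},
    {"order_id": "2", "region": "north", "amount": "80.0"},  # duplicate
    {"order_id": "3", "region": "South", "amount": None},    # missing amount
    {"order_id": "4", "region": "south", "amount": "150.0"},
]


def build_dw(ods_rows):
    """DW layer: drop incomplete and duplicate rows, normalize types."""
    seen, clean = set(), []
    for row in ods_rows:
        if row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        clean.append(
            {
                "order_id": int(row["order_id"]),
                "region": row["region"].strip().lower(),
                "amount": float(row["amount"]),
            }
        )
    return clean


def build_dm(dw_rows):
    """DM layer: organize by subject, here total sales per region."""
    totals = {}
    for row in dw_rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def build_app(dm_sales):
    """APP layer: shape the mart for one report, regions ranked by sales."""
    return sorted(dm_sales.items(), key=lambda item: item[1], reverse=True)


dw_orders = build_dw(ods_orders)
dm_sales_by_region = build_dm(dw_orders)
app_report = build_app(dm_sales_by_region)
print(app_report)
```

Each function reads only the output of the layer before it, so a change in how source data is cleansed stays inside `build_dw`.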

Why layer the data warehouse?

  • Trading space for time: extensive preprocessing improves the user experience (query efficiency), so a data warehouse holds a lot of redundant data;
  • Without layering, a change to the business rules of a source system would affect the entire data cleansing process, which means a huge amount of rework;
  • Layered data management simplifies the cleansing process by splitting one complex job into several smaller tasks.

Metadata

1. Metadata overview

Metadata is data about the data in the data warehouse. Its role is similar to that of the data dictionary in a database management system: it stores logical data structures, files, addresses, and index information. Broadly speaking, in a data warehouse, metadata describes the structure of the warehouse's data and how that data was built.

  • One of the major steps in building a data warehouse is ETL. Metadata plays an important role here: it defines the mapping from source systems to the data warehouse, the data conversion rules, the logical structure of the warehouse, the data update rules, the data import history, the load period, and so on. Data extraction and conversion specialists and data warehouse administrators rely on metadata to build the warehouse efficiently.

  • When using the data warehouse, users access data through metadata, which clarifies the meaning of data items and supports custom reports.

  • Given its scale and complexity, a data warehouse cannot do without proper metadata management, which covers adding or removing external data sources, changing data cleansing methods, controlling erroneous queries, and scheduling backups.

    Metadata can be divided into technical metadata and business metadata. Technical metadata serves the IT staff who develop and manage the data warehouse. It describes data related to warehouse development, management, and maintenance, including data source information, data conversion descriptions, the warehouse model, data cleansing and update rules, data mappings, and access permissions. Business metadata serves management and business analysts. It describes data from a business perspective, including business terms, what data the warehouse contains, where the data is located, and its availability, helping business staff understand what data is available in the warehouse and how to use it.

    As seen above, metadata not only defines the schema of the data in the warehouse, its sources, and the extraction and conversion rules; it is also the basis on which the whole data warehouse system runs. Metadata links the loosely coupled components of the data warehouse system into an organic whole.
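To illustrate how technical metadata can drive ETL, here is a minimal sketch in which the source-to-target mapping and the conversion rules are kept as data rather than written into the transformation code. The field names, the `FIELD_MAPPINGS` structure, and the `transform` helper are all hypothetical.

```python
# Hypothetical technical metadata: for each field, the source column,
# the target column in the warehouse, and the conversion rule to apply.
FIELD_MAPPINGS = [
    {"source": "cust_nm", "target": "customer_name", "convert": str.strip},
    {"source": "ord_amt", "target": "order_amount", "convert": float},
    {
        "source": "ord_dt",
        "target": "order_date",
        "convert": lambda value: value.replace("/", "-"),
    },
]


def transform(source_row, mappings):
    """Apply every mapping to one source row and return the warehouse row."""
    return {
        m["target"]: m["convert"](source_row[m["source"]]) for m in mappings
    }


source_row = {"cust_nm": "  Alice ", "ord_amt": "120.50", "ord_dt": "2019/03/28"}
warehouse_row = transform(source_row, FIELD_MAPPINGS)
print(warehouse_row)
```

With this arrangement, adding a field or changing a conversion rule means editing the metadata, while `transform` itself stays unchanged.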

2. Metadata storage
  • Data-set based: each data set has its own corresponding metadata file.

    Advantages: 1) when the data is retrieved, the corresponding metadata is transferred along with it as an independent file

    2) relatively strong independence from the database

    3) metadata retrieval can be implemented using database functionality

    Disadvantages: 1) because each data set has its own metadata file, a large number of data sets means a large number of metadata files, which is inconvenient to manage.

  • Database based, i.e., a metadata database.

    Advantages: 1) the metadata database has only one metadata file, which is easy to manage

    2) when data is added or deleted, only the corresponding records need to be added to or deleted from that file.

    Because the metadata database exists to store metadata, it is best to run it on a mainstream database management system.
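The database-based approach might look like the following sketch, which keeps one metadata record per data set in a single SQLite table. The table name `dataset_metadata`, its columns, and the two helper functions are assumptions made for illustration.

```python
import sqlite3

# One metadata store for the whole warehouse (in memory for this sketch).
meta = sqlite3.connect(":memory:")
meta.execute(
    "CREATE TABLE dataset_metadata ("
    "dataset_name TEXT PRIMARY KEY, source_system TEXT, "
    "load_period TEXT, description TEXT)"
)


def register_dataset(name, source_system, load_period, description):
    """Adding a data set only requires adding its metadata record."""
    with meta:
        meta.execute(
            "INSERT INTO dataset_metadata VALUES (?, ?, ?, ?)",
            (name, source_system, load_period, description),
        )


def unregister_dataset(name):
    """Removing a data set only requires deleting its metadata record."""
    with meta:
        meta.execute(
            "DELETE FROM dataset_metadata WHERE dataset_name = ?", (name,)
        )


register_dataset("dw_orders", "order_system", "daily", "Cleansed customer orders")
register_dataset("dm_sales_by_region", "dw_orders", "daily", "Sales by region")
unregister_dataset("dw_orders")

remaining = [
    row[0]
    for row in meta.execute(
        "SELECT dataset_name FROM dataset_metadata ORDER BY dataset_name"
    )
]
print(remaining)
```

Compared with one metadata file per data set, everything lives in one place and can be searched with ordinary queries.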

3. The role of metadata
  • Describes what data the data warehouse contains, making it easier to locate content
  • Defines how data enters the data warehouse, serving as a guide for data collection, mapping, and cleansing
  • Records and checks data consistency requirements and how well they are met
  • Measures data quality

Star schema & snowflake schema

The star schema is a denormalized structure: every dimension of the cube is connected directly to the fact table, dimensions are not broken out into further sub-dimension tables, and so the data contains some redundancy.

The snowflake schema is an extension of the star schema in which each original dimension table may be expanded into smaller tables. It aims to improve query performance by minimizing the amount of stored data and by joining smaller dimension tables. The snowflake schema removes data redundancy.

Because the star schema stores redundant data, many statistical queries need no additional joins, so under normal circumstances it is more efficient than the snowflake schema. Therefore, as long as the redundancy is acceptable, the star schema is used more often in practice because it is more efficient.
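The difference in join count can be seen in a small sketch. The tables below (`fact_sales`, `star_dim_product`, `snow_dim_product`, `snow_dim_category`) and their rows are hypothetical. The star version stores the category name redundantly in the product dimension, while the snowflake version normalizes it into its own table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);

    -- Star: the category name is repeated on every product row.
    CREATE TABLE star_dim_product (
        product_id INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);

    -- Snowflake: the category is normalized into a separate table.
    CREATE TABLE snow_dim_category (
        category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE snow_dim_product (
        product_id INTEGER PRIMARY KEY, product_name TEXT,
        category_id INTEGER REFERENCES snow_dim_category(category_id));

    INSERT INTO fact_sales VALUES (1, 1, 5.0), (2, 2, 15.0), (3, 3, 2.0), (4, 1, 5.0);
    INSERT INTO star_dim_product VALUES
        (1, 'pen', 'office'), (2, 'stapler', 'office'), (3, 'apple', 'food');
    INSERT INTO snow_dim_category VALUES (10, 'office'), (20, 'food');
    INSERT INTO snow_dim_product VALUES
        (1, 'pen', 10), (2, 'stapler', 10), (3, 'apple', 20);
    """
)

# Star schema: one join from the fact table to the dimension table.
star_result = db.execute(
    "SELECT p.category_name, SUM(f.amount) "
    "FROM fact_sales f "
    "JOIN star_dim_product p ON p.product_id = f.product_id "
    "GROUP BY p.category_name ORDER BY p.category_name"
).fetchall()

# Snowflake schema: the same question needs a second join.
snowflake_result = db.execute(
    "SELECT c.category_name, SUM(f.amount) "
    "FROM fact_sales f "
    "JOIN snow_dim_product p ON p.product_id = f.product_id "
    "JOIN snow_dim_category c ON c.category_id = p.category_id "
    "GROUP BY c.category_name ORDER BY c.category_name"
).fetchall()

print(star_result)
print(snowflake_result)
```

Both queries return the same totals. The snowflake version stores each category name once but pays for it with an extra join.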

  • The comparison is as follows:

                  | Star schema                           | Snowflake schema
------------------|---------------------------------------|------------------------------------------------
Data optimization | Denormalized data                     | Normalized data
Business model    | Represented by primary keys only      | Represented by primary key-foreign key relationships
Performance       | Fewer joins, higher efficiency        | More joins, lower efficiency
ETL               | Simple design, highly parallelizable  | Complex design, cannot be parallelized

The snowflake schema is suited to dimension analysis.

The star schema is suited to metric (indicator) analysis.

Origin blog.csdn.net/aubekpan/article/details/88881412