Data Warehouse Concepts

1. Data Warehouse

There is still no single accepted definition of the term data warehouse. The renowned data warehousing expert W. H. Inmon, in his book "Building the Data Warehouse", gives the following description: a data warehouse is a subject-oriented, integrated, non-volatile (relatively stable), and time-variant (reflecting historical changes) collection of data used to support management decisions.

A data warehouse is a central repository of information. Typically, data flows into the data warehouse on a regular basis from transactional systems, relational databases, and other sources. Business analysts, data scientists, and decision makers then access the data through business intelligence (BI) tools, SQL clients, and other analytics applications.
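The regular flow of data from a transactional system into the warehouse can be sketched as a small batch load. This is a minimal illustration only, assuming two in-memory SQLite databases stand in for the source system and the warehouse; the table names (`orders`, `fact_orders`), the function `load_batch`, and the date-based watermark are all hypothetical.

```python
import sqlite3

def make_source() -> sqlite3.Connection:
    """Hypothetical transactional (OLTP) source with an orders table."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, created TEXT)")
    src.executemany(
        "INSERT INTO orders (amount, created) VALUES (?, ?)",
        [(19.99, "2024-01-01"), (5.00, "2024-01-02"), (42.50, "2024-01-03")],
    )
    return src

def load_batch(src: sqlite3.Connection, dwh: sqlite3.Connection, watermark: str) -> int:
    """Extract rows created after `watermark`, transform them, and load them into the warehouse."""
    rows = src.execute(
        "SELECT id, amount, created FROM orders WHERE created > ? ORDER BY id",
        (watermark,),
    ).fetchall()
    # Illustrative transform step: store amounts as integer cents.
    dwh.executemany(
        "INSERT INTO fact_orders (order_id, amount_cents, order_date) VALUES (?, ?, ?)",
        [(order_id, round(amount * 100), day) for order_id, amount, day in rows],
    )
    dwh.commit()
    return len(rows)

src = make_source()
dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE fact_orders (order_id INTEGER, amount_cents INTEGER, order_date TEXT)")
loaded = load_batch(src, dwh, "2024-01-01")  # only rows after the last load are copied
print(loaded)  # 2
```

In a real deployment the watermark would be persisted between runs and the job scheduled; here it is passed in by hand to keep the sketch self-contained.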

The concept of a data warehouse can be understood at two levels. First, a data warehouse supports decision-making and is used for analytical data processing, which distinguishes it from a company's existing operational databases. Second, a data warehouse effectively integrates multiple heterogeneous data sources: after integration, the data is reorganized by subject and includes historical data, and once stored in the data warehouse it is generally no longer modified.
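The "historical and no longer modified" property can be illustrated with an append-only store: a change is recorded as a new version with its effective date, and queries ask what was true at a given point in time. This is a minimal sketch under that assumption; the classes `CustomerVersion` and `CustomerHistory` and the city example are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)  # frozen: a stored version cannot be modified
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date

class CustomerHistory:
    """Append-only store: a change is recorded as a new version, never as an in-place update."""

    def __init__(self) -> None:
        self._rows: List[CustomerVersion] = []

    def record(self, customer_id: int, city: str, valid_from: date) -> None:
        self._rows.append(CustomerVersion(customer_id, city, valid_from))

    def as_of(self, customer_id: int, day: date) -> Optional[str]:
        """Return the city that was valid for the customer on `day`, or None if unknown."""
        versions = [r for r in self._rows
                    if r.customer_id == customer_id and r.valid_from <= day]
        if not versions:
            return None
        return max(versions, key=lambda r: r.valid_from).city

history = CustomerHistory()
history.record(1, "Shanghai", date(2020, 1, 1))
history.record(1, "Beijing", date(2023, 6, 1))  # the customer moved: append, don't overwrite
print(history.as_of(1, date(2022, 1, 1)))  # Shanghai
print(history.as_of(1, date(2024, 1, 1)))  # Beijing
```

Because old versions are kept, the warehouse can answer questions about past states, which an operational database that overwrites the customer's address could not.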

 

2. Data Marts

For maximum flexibility and ease of data integration, a data warehouse should be stored in a standard RDBMS using a normalized database design, with some summary information and denormalized structures added to improve performance. This type of data warehouse is called an atomic data warehouse. A subset of an atomic data warehouse is called a data mart.
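The relationship between normalized atomic tables, a performance-oriented summary, and a data mart can be sketched in SQLite. This is an illustration under stated assumptions: the tables `department`, `sale`, and `monthly_dept_sales` and the view `sales_mart` are hypothetical, and a real data mart is often a separate physical store rather than a view.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Normalized ("atomic") tables: each fact is stored once, at the finest grain.
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sale (
    sale_id   INTEGER PRIMARY KEY,
    dept_id   INTEGER NOT NULL REFERENCES department(dept_id),
    sale_date TEXT NOT NULL,
    amount    REAL NOT NULL
);
INSERT INTO department VALUES (1, 'Sales'), (2, 'Finance');
INSERT INTO sale VALUES
    (1, 1, '2024-01-05', 100.0),
    (2, 1, '2024-01-20', 50.0),
    (3, 2, '2024-01-07', 30.0),
    (4, 1, '2024-02-02', 70.0);

-- Denormalized summary added for query performance.
CREATE TABLE monthly_dept_sales AS
SELECT d.name AS dept, substr(s.sale_date, 1, 7) AS month, SUM(s.amount) AS total
FROM sale s JOIN department d ON d.dept_id = s.dept_id
GROUP BY d.name, substr(s.sale_date, 1, 7);

-- A data mart: the subset of the warehouse that one team (here, Sales) needs.
CREATE VIEW sales_mart AS
SELECT month, total FROM monthly_dept_sales WHERE dept = 'Sales';
""")

rows = con.execute("SELECT month, total FROM sales_mart ORDER BY month").fetchall()
print(rows)  # [('2024-01', 150.0), ('2024-02', 70.0)]
```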

 

3. Fact Tables and Dimension Tables

A dimension table contains descriptions of the categories that the analyzed subject belongs to, such as a business, an organization, or an enterprise. The columns of a dimension table usually hold textual descriptions, though they may also hold numeric descriptive information (e.g., product weight or customer income level). A fact table contains the measures of the subject being analyzed, together with foreign keys that link it to the dimension tables. Dimension tables and fact tables are the tables used in dimensional modeling: in addition to the usual relational concepts (primary keys, foreign keys, integrity constraints, and so on), a dimensional model consists of these two types of tables.
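A minimal star schema makes the split concrete: dimension tables hold descriptive attributes (including numeric ones such as weight), while the fact table holds measures plus foreign keys to the dimensions. The sketch below uses SQLite; all table and column names (`dim_product`, `dim_customer`, `fact_sales`, and so on) are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
con.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT,
    category    TEXT,
    weight_kg   REAL  -- numeric descriptive attribute
);
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name         TEXT,
    income_level TEXT
);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity     INTEGER,  -- measure
    revenue      REAL      -- measure
);
INSERT INTO dim_product VALUES (1, 'Kettle', 'Kitchen', 1.2), (2, 'Lamp', 'Lighting', 0.8);
INSERT INTO dim_customer VALUES (1, 'Alice', 'high'), (2, 'Bob', 'medium');
INSERT INTO fact_sales VALUES (1, 1, 2, 60.0), (2, 1, 1, 25.0), (1, 2, 1, 30.0);
""")

# Typical analysis: aggregate the fact table's measures, sliced by dimension attributes.
query = """
SELECT p.category, c.income_level, SUM(f.quantity), SUM(f.revenue)
FROM fact_sales f
JOIN dim_product  p ON p.product_key  = f.product_key
JOIN dim_customer c ON c.customer_key = f.customer_key
GROUP BY p.category, c.income_level
ORDER BY p.category, c.income_level
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```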

 

5. Metadata

Metadata is data about data. When people describe real-world phenomena, they produce abstract information, and this abstract information can be regarded as metadata; metadata mainly describes the context of data. Put simply, if each book in a library is the data, then the index used to find each book is the metadata. Metadata offers advantages that other methods cannot match: it helps people better understand data, discover data, and trace where data comes from and where it goes. This is especially valuable for enterprises building a DW/BI system on top of their OLTP systems, where metadata helps them form a clear, intuitive picture of how data flows. Metadata is a fundamental means of data management and control. Depending on what it describes, metadata can be divided into three types: technical metadata, business metadata, and management metadata, described as follows:

1) Technical metadata describes the concepts, relationships, and rules of data in the technical domain of a system, including data structures and data processing. It covers every data processing stage: data source interfaces, data warehouse and data mart storage, ETL, OLAP, data encapsulation, and front-end presentation;

2) Business metadata describes the concepts, relationships, and rules of data in the business domain, including business terms, classification information, metric definitions, and business rules;

3) Management metadata describes the concepts, relationships, and rules of data in the management domain, including personnel roles, job responsibilities, and management processes.
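One way to see how the three kinds of metadata fit together is a catalog entry that records all three for a single warehouse table. This is a hypothetical sketch, not the structure of any particular metadata tool; every class name, field name, and value below is illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TechnicalMetadata:
    source_system: str          # where the data comes from
    table_name: str             # where it is stored in the warehouse
    columns: Dict[str, str]     # column name -> data type
    etl_job: str                # hypothetical name of the job that loads the table

@dataclass
class BusinessMetadata:
    business_term: str
    definition: str             # metric / indicator definition
    business_rules: List[str] = field(default_factory=list)

@dataclass
class ManagementMetadata:
    owner_role: str             # role accountable for the data
    steward: str                # team responsible for day-to-day care
    review_process: str

@dataclass
class CatalogEntry:
    technical: TechnicalMetadata
    business: BusinessMetadata
    management: ManagementMetadata

entry = CatalogEntry(
    technical=TechnicalMetadata("crm_oltp", "fact_sales", {"revenue": "REAL"}, "nightly_sales_load"),
    business=BusinessMetadata("Revenue", "Sum of invoiced amounts excluding tax",
                              ["Exclude cancelled orders"]),
    management=ManagementMetadata("Sales data owner", "BI team", "Quarterly review"),
)
print(entry.business.business_term, "->", entry.technical.table_name)  # Revenue -> fact_sales
```

Linking the business term to the physical table is what lets a user trace a report figure back to its source, which is the "clear data flow" benefit described above.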

 

6. Data Warehouse vs. Database

A data warehouse is designed specifically for data analysis, which involves reading large amounts of data to understand the relationships and trends within it. A database, by contrast, is used to capture and store data, such as the details of individual transactions.
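The difference is mostly in the access pattern. The sketch below runs both patterns against one in-memory SQLite table purely for illustration (in practice they would usually live in separate systems); the `orders` table and its rows are hypothetical. The transactional side writes and reads individual records by key, while the analytical side scans many rows to summarize them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL, day TEXT)")

# Transactional (database) workload: record individual transactions, look them up by key.
with con:  # commits the transaction if no exception is raised
    con.execute("INSERT INTO orders VALUES (1, 'alice', 20.0, '2024-03-01')")
    con.execute("INSERT INTO orders VALUES (2, 'bob', 35.0, '2024-03-01')")
    con.execute("INSERT INTO orders VALUES (3, 'alice', 15.0, '2024-03-02')")
one = con.execute("SELECT amount FROM orders WHERE id = ?", (2,)).fetchone()

# Analytical (warehouse) workload: read many rows at once to find trends.
daily = con.execute(
    "SELECT day, COUNT(*), SUM(amount) FROM orders GROUP BY day ORDER BY day"
).fetchall()
print(one)    # (35.0,)
print(daily)  # [('2024-03-01', 2, 55.0), ('2024-03-02', 1, 15.0)]
```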

 

7. Data Warehouse vs. Data Lake

Unlike a data warehouse, a data lake stores all data, both structured and unstructured, in a central repository. A data warehouse is optimized for analysis using a predefined schema. Data in a data lake has no predefined schema, which makes it suitable for other types of analysis, such as big data analytics, full-text search, real-time analytics, and machine learning.
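The contrast is often described as schema-on-write versus schema-on-read. In this plain-Python sketch (a simplification, with hypothetical names such as `WAREHOUSE_SCHEMA` and `store_in_lake`), the warehouse rejects records that do not match its predefined schema at load time, while the lake accepts any raw data and a schema is applied only when a particular analysis reads it.

```python
import json

# Predefined schema the warehouse enforces when data is loaded (schema-on-write).
WAREHOUSE_SCHEMA = {"user_id": int, "event": str}

def load_into_warehouse(table: list, record: dict) -> None:
    """Reject records that do not match the predefined schema before storing them."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        raise ValueError(f"unexpected columns: {sorted(record)}")
    for col, typ in WAREHOUSE_SCHEMA.items():
        if not isinstance(record[col], typ):
            raise TypeError(f"{col} must be {typ.__name__}")
    table.append(record)

def store_in_lake(lake: list, raw: str) -> None:
    """Store raw data as-is; no schema is checked when writing."""
    lake.append(raw)

def read_events_from_lake(lake: list) -> list:
    """Apply a schema only at read time (schema-on-read), skipping data that does not fit."""
    events = []
    for raw in lake:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # e.g. free text, still useful to other analyses such as full-text search
        if isinstance(obj, dict) and "event" in obj:
            events.append(obj["event"])
    return events

warehouse, lake = [], []
load_into_warehouse(warehouse, {"user_id": 7, "event": "login"})
for raw in ['{"user_id": 7, "event": "login"}',
            "customer called about a late delivery",
            '{"event": "logout", "extra": 1}']:
    store_in_lake(lake, raw)
print(len(warehouse), len(lake), read_events_from_lake(lake))  # 1 3 ['login', 'logout']
```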

 

8. Data Warehouse vs. Data Mart

A data mart is a data warehouse that serves the needs of a specific team or business unit, such as finance, marketing, or sales. It is smaller and more focused, and may contain summarized data that best serves its user community.

 


Source: www.cnblogs.com/SAPBI/p/11221778.html