Note: These are study notes from the Heima Programmer ("dark horse programmers") Hive tutorial series, which covers big data Hive 3.x from data warehouse development through enterprise-level practical applications.
Concept
A data warehouse (DW) is a data system for storage, analysis, and reporting.
The purpose of a data warehouse is to build an analysis-oriented, integrated data environment whose analysis results provide decision support for the enterprise.
The data warehouse itself does not produce data; its data comes from different external systems.
Likewise, the data warehouse itself does not consume data; the results it generates are made available to external applications.
Therefore, it is called a warehouse, not a factory.
OLTP
1) Storing operational records
The basic feature of an OLTP (online transaction processing) system is that user data received by the front end is passed immediately to the back end for processing. Relational databases such as Oracle, MySQL, and SQL Server are typical examples.
2) Making analytical decisions
The sound approach is to carry out data analysis based on business data and to let the results of that analysis support decision-making. This is known as data-driven decision making.
Analysis can be carried out in the OLTP environment, but there is no need to do so. The core of OLTP is to be business-oriented and to support business operations and transactions. Running analysis there raises several problems: read pressure in an OLTP system is generally already greater than write pressure, and analysis adds to it; data is retained for only several weeks or months; and data is scattered across different tables whose field types and attributes are not uniform.
Data Warehouse Construction
A system that is analysis-oriented and analysis-supporting is called an OLAP (Online Analytical Processing) system. A data warehouse is one kind of OLAP system.
This is why the data warehouse exists: to analyze data within the enterprise.
Data Warehouse System Diagram
Features
After the subject is determined, the relevant data is usually distributed across multiple operational systems that are scattered, independent, and heterogeneous. Before entering the data warehouse, this data must be unified and integrated through extract, transform, and load (ETL) processing. This is the most critical step: it resolves all contradictions in the source data and performs data synthesis and calculation.
The time variance of the data warehouse is manifested in:
- The data retention period of a data warehouse is much longer than that of an operational system;
- The operational system stores current data, while the data warehouse stores historical data;
- Data warehouse data is appended in chronological order, and every record carries a time attribute.
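The integration and time-variance points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems, their field names, the gender codes, and the helper functions are all hypothetical and do not come from the course material.

```python
from datetime import date

# Hypothetical records from two operational systems that describe the same
# kind of entity with different field names, units, and gender codes.
crm_rows = [{"cust_id": 1, "gender": "M", "height_cm": 180}]
shop_rows = [{"customer": 2, "sex": "female", "height_m": 1.65}]

GENDER_CODES = {"M": "male", "F": "female", "male": "male", "female": "female"}


def from_crm(row):
    """Map a CRM row onto the unified warehouse schema."""
    return {
        "customer_id": row["cust_id"],
        "gender": GENDER_CODES[row["gender"]],
        "height_cm": float(row["height_cm"]),
    }


def from_shop(row):
    """Map a shop row onto the same schema, converting metres to centimetres."""
    return {
        "customer_id": row["customer"],
        "gender": GENDER_CODES[row["sex"]],
        "height_cm": round(row["height_m"] * 100, 1),
    }


def append_snapshot(warehouse, rows, load_date):
    """Append rows with a time attribute; existing history is never overwritten."""
    for row in rows:
        warehouse.append({**row, "load_date": load_date.isoformat()})


warehouse = []
unified = [from_crm(r) for r in crm_rows] + [from_shop(r) for r in shop_rows]
append_snapshot(warehouse, unified, date(2023, 1, 1))
append_snapshot(warehouse, unified, date(2023, 1, 2))
```

After two loads the warehouse list holds four rows: each customer appears once per load date, which is the "appended in chronological order" behaviour, while the operational systems would only hold the current row.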
OLTP VS OLAP
OLTP (On-Line Transaction Processing): typically a relational database (RDBMS).
OLAP (On-Line Analytical Processing): complex, multi-dimensional analysis of historical data on particular subjects, in support of management decisions. The typical example is the data warehouse (DW), which is mainly used for data analysis.
The difference between a database and a data warehouse is in fact the difference between OLTP and OLAP.
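To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are assumptions made up for illustration; a single in-memory SQLite database stands in for both kinds of workload and is not how a real warehouse such as Hive is deployed.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL, order_date TEXT)"
)

# OLTP-style work: a short transaction that writes one business record,
# followed by a lookup of that record by primary key.
with conn:
    conn.execute(
        "INSERT INTO orders (id, region, amount, order_date) VALUES (?, ?, ?, ?)",
        (1, "north", 120.0, "2023-01-05"),
    )
single_order = conn.execute("SELECT amount FROM orders WHERE id = ?", (1,)).fetchone()

# Load a little more history so the analytical query has something to scan.
with conn:
    conn.executemany(
        "INSERT INTO orders (id, region, amount, order_date) VALUES (?, ?, ?, ?)",
        [
            (2, "north", 80.0, "2023-02-11"),
            (3, "south", 200.0, "2023-02-17"),
        ],
    )

# OLAP-style work: scan the whole history and aggregate along a dimension.
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

The OLTP statements touch one row at a time, while the OLAP query reads every row to produce a summary per region.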
Database VS Data Warehouse
In short, the data warehouse is a data analysis platform and an integrated data environment.
Data Warehouse VS Data Mart
In short: a data mart is a subset of a data warehouse.
Data from various sources is loaded into the data warehouse through ETL. The data warehouse holds data on many different subjects, while a data mart targets designated subjects according to the needs of a particular department, such as procurement, sales, or inventory. Users then carry out data analysis, data reporting, data mining, and so on, based on that subject data.
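As a rough sketch of the subset relationship, the snippet below treats the warehouse as a list of rows tagged by subject and builds a mart by filtering on one subject. The row layout and the build_mart helper are hypothetical.

```python
# Hypothetical warehouse rows, each tagged with the subject it belongs to.
warehouse = [
    {"subject": "sales", "item": "laptop", "amount": 900.0},
    {"subject": "procurement", "item": "laptop", "amount": 700.0},
    {"subject": "sales", "item": "mouse", "amount": 20.0},
    {"subject": "inventory", "item": "mouse", "amount": 35.0},
]


def build_mart(rows, subject):
    """Return the subset of warehouse rows that belongs to one subject."""
    return [row for row in rows if row["subject"] == subject]


sales_mart = build_mart(warehouse, "sales")
total_sales = sum(row["amount"] for row in sales_mart)
```

Every row in sales_mart also exists in warehouse; the mart adds no data of its own, it only narrows the view to what the sales department needs.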
Data Warehouse Hierarchical Architecture
The most basic layers are the operational data layer (ODS), the data warehouse layer (DW), and the data application layer (DA).
The three-layer structure of Alibaba Data Warehouse is as follows:
ODS layer
The operational data layer stores unprocessed raw data. It is consistent with the source system in structure and serves as the data preparation area of the data warehouse. Its main job is to import basic data into the data warehouse, decouple the warehouse from the source systems, and record the historical changes of the basic data.
DW layer
The data warehouse layer mainly completes data processing and integration, builds reusable detailed fact tables for analysis and statistics, and aggregates metrics at common granularities.
The internal division is as follows:
DA layer
The data application layer is oriented to end use and to business customization. It provides the data used by products and by data analysis,
including front-end reports, analysis charts, OLAP subjects, data mining, and other analysis.
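The flow through the three layers can be sketched as follows. This is a toy illustration, not the Alibaba design: the order records, the cleaning rules, and the function names build_dw and build_da are assumptions.

```python
from collections import defaultdict

# ODS: raw rows kept in the same structure as the (hypothetical) source system.
ods_orders = [
    {"order_id": "1", "city": " Beijing", "amount": "100.50"},
    {"order_id": "2", "city": "beijing ", "amount": "49.50"},
    {"order_id": "3", "city": "Shanghai", "amount": "80.00"},
]


def build_dw(ods_rows):
    """DW: clean and type the raw rows into a reusable detailed fact table."""
    return [
        {
            "order_id": int(row["order_id"]),
            "city": row["city"].strip().title(),
            "amount": float(row["amount"]),
        }
        for row in ods_rows
    ]


def build_da(dw_rows):
    """DA: aggregate the fact table into a report-ready result per city."""
    totals = defaultdict(float)
    for row in dw_rows:
        totals[row["city"]] += row["amount"]
    return dict(totals)


dw_orders = build_dw(ods_orders)
da_sales_by_city = build_da(dw_orders)
```

Because the ODS rows are left untouched, the DW and DA results can be rebuilt at any time if the cleaning or aggregation rules change, which is one practical reason for keeping the layers separate.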
Layered Benefits
ETL VS ELT
ETL: Extract, Transform, and Load
ETL first extracts data from the data source pool and stores it in a temporary staging database (ODS). Transform operations are then performed to structure the data and convert it into a form suitable for the target data warehouse system. Finally, the structured data is loaded into the warehouse.
ELT: Extract, Load, and Transform
With ELT, data is loaded immediately after it is extracted from the source data pool. There is no dedicated staging database (ODS), so the data goes straight into a single centralized repository. It is then transformed inside the data warehouse system for use with business intelligence (BI) tools. This characteristic is especially evident in data warehouses of the big data era.
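The difference in ordering can be sketched with Python's built-in sqlite3 module. An in-memory SQLite database stands in for the warehouse, and a plain Python list stands in for the staging area; the table names, sample rows, and function names are assumptions for illustration only.

```python
import sqlite3

# Hypothetical raw extract: amounts arrive as text with stray whitespace.
raw_rows = [("1", " 10.5 "), ("2", "4.5")]


def run_etl(rows):
    """ETL: transform outside the warehouse, then load the cleaned rows."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    cleaned = [(int(i), float(a.strip())) for i, a in rows]  # transform
    warehouse.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)  # load
    return warehouse


def run_elt(rows):
    """ELT: load the raw rows first, then transform with SQL inside the warehouse."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE raw_sales (id TEXT, amount TEXT)")
    warehouse.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)  # load
    warehouse.execute(
        "CREATE TABLE sales AS "
        "SELECT CAST(id AS INTEGER) AS id, CAST(TRIM(amount) AS REAL) AS amount "
        "FROM raw_sales"
    )  # transform
    return warehouse


def total(warehouse):
    """Sum the cleaned amounts, whichever way they were produced."""
    return warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]


etl_total = total(run_etl(raw_rows))
elt_total = total(run_elt(raw_rows))
```

Both paths end with the same cleaned sales table. What differs is where the transform runs: in ETL it happens before the data reaches the warehouse, while in ELT the raw table is kept in the warehouse and the warehouse engine does the transformation.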