[Big Data Hive3.x Data Warehouse Development] Data Warehouse Basic Theory

Note: These are study notes from the Heima Programmer (黑马程序员) complete Hive tutorial, which covers Hive 3.x data warehouse development through enterprise-level practical applications.

Concept

A data warehouse (DW) is a data system used for storage, analysis, and reporting. Its purpose is to build an analysis-oriented, integrated data environment whose analysis results provide decision support for the enterprise.

A data warehouse does not produce data itself; its data comes from various external systems. Nor does it need to consume data itself; the results it generates are made available to various external applications.

Therefore, it is called a warehouse, not a factory.

OLTP

1) Storing operational records
The defining feature of an OLTP (Online Transaction Processing) system is that user data received by the front end is passed immediately to the back end for processing. Relational databases such as Oracle, MySQL, and SQL Server are the typical example.
2) Making analysis-based decisions
The sound approach is to analyze business data and let the results of that analysis support decisions. This is known as data-driven decision making.
Analysis can be carried out in the OLTP environment, but it is not necessary to do it there. At its core, OLTP is business-oriented: it supports the business and its transactions. Generally speaking, read pressure on a business system is already greater than write pressure. Running analysis there runs into these problems: the read load is high; data is kept for only a few weeks or months; and data is scattered across different tables with inconsistent field types and attributes.

Data Warehouse Construction

A system that is oriented toward analysis and supports analysis is called an OLAP (Online Analytical Processing) system. A data warehouse is one kind of OLAP system.
That is why the data warehouse exists: to support data analysis within the enterprise.

Data Warehouse System Diagram

[Figure: data warehouse system diagram]

Features

[Figure: features of a data warehouse]
Once the subject is determined, the relevant data is usually spread across multiple operational systems that are scattered, independent, and heterogeneous. Before entering the data warehouse, this data must be unified and integrated through Extract, Transform, and Load (ETL). This is the most critical step: it resolves all contradictions in the source data and performs data consolidation and calculation.
The time-variant nature of the data warehouse shows up in the following ways:

  • The data warehouse retains data for much longer than operational systems do;
  • Operational systems store current data, while the data warehouse stores historical data;
  • Data warehouse data is appended in chronological order, and every record carries a time attribute.
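
As a rough illustration of the integration step and the time-variant property described above, the sketch below unifies records from two hypothetical source systems whose field names and types disagree, then appends them to a warehouse table with a load date. The system names, field names, codes, and values are all made up for illustration; a real pipeline would do this in Hive or an ETL tool rather than in plain Python.

```python
from datetime import date
from decimal import Decimal

# Two hypothetical operational systems that describe the same kind of order
# with different field names, types, and codes (assumed for illustration).
crm_orders = [{"order_id": "1001", "amount": "19.90", "gender": "M"}]
shop_orders = [{"id": 1002, "total_cents": 4550, "sex": 1}]


def from_crm(row):
    """Map a CRM row onto the unified warehouse schema."""
    return {
        "order_id": int(row["order_id"]),
        "amount_cents": int(Decimal(row["amount"]) * 100),
        "gender": {"M": "male", "F": "female"}[row["gender"]],
    }


def from_shop(row):
    """Map a shop row onto the unified warehouse schema."""
    return {
        "order_id": row["id"],
        "amount_cents": row["total_cents"],
        "gender": {1: "male", 2: "female"}[row["sex"]],
    }


def load(warehouse, rows, load_date):
    """Append rows with a time attribute; earlier rows are never overwritten."""
    for row in rows:
        warehouse.append({**row, "load_date": load_date.isoformat()})


warehouse = []
load(warehouse, [from_crm(r) for r in crm_orders], date(2023, 3, 26))
load(warehouse, [from_shop(r) for r in shop_orders], date(2023, 3, 27))

for row in warehouse:
    print(row)
```

After loading, both rows share one schema (integer IDs, amounts in cents, spelled-out gender values), and each carries the date on which it entered the warehouse.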

OLTP VS OLAP

OLTP (On-Line Transaction Processing): typically a relational database (RDBMS).
OLAP (On-Line Analytical Processing): performs complex multi-dimensional analysis of historical data on specific subjects to support management decisions. The typical example is the data warehouse (DW), which is mainly used for data analysis. The difference between a database and a data warehouse is essentially the difference between OLTP and OLAP.
[Figure: OLTP vs. OLAP comparison]
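
To make the contrast concrete, here is a minimal sketch of the two workload styles. It uses Python's built-in `sqlite3` module only to stay self-contained; the `orders` table and its values are hypothetical, and in practice the analytical query would run in a warehouse such as Hive rather than in the transactional database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount INTEGER)"
)

# OLTP-style work: a short transaction that inserts and updates single rows.
with conn:
    conn.execute(
        "INSERT INTO orders (region, year, amount) VALUES ('north', 2022, 100)"
    )
    conn.execute(
        "INSERT INTO orders (region, year, amount) VALUES ('north', 2023, 150)"
    )
    conn.execute(
        "INSERT INTO orders (region, year, amount) VALUES ('south', 2023, 80)"
    )
    conn.execute("UPDATE orders SET amount = 120 WHERE id = 1")

# OLAP-style work: a read-only aggregate over historical rows,
# grouped along two dimensions (region and year).
totals = conn.execute(
    "SELECT region, year, SUM(amount) FROM orders "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()

for region, year, total in totals:
    print(region, year, total)
```

The first half touches individual rows and must commit quickly; the second half scans the whole table and only reads, which is the access pattern a data warehouse is built for.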

Database VS Data Warehouse

[Figure: database vs. data warehouse comparison]
In short, the data warehouse is a data analysis platform and an integrated data environment.

Data Warehouse VS Data Mart

In short, a data mart is a subset of the data warehouse.
Data from various sources is loaded into the data warehouse through ETL. The data warehouse holds data on different subjects, while each data mart serves a designated subject according to the needs of a particular department, such as procurement, sales, or inventory. Users then carry out data analysis, reporting, data mining, and so on based on that subject data.
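
The subset relationship can be sketched in a few lines. The rows, the `subject` tag, and the `build_mart` helper below are hypothetical and only illustrate the idea that a mart is carved out of the warehouse by subject.

```python
# Hypothetical warehouse rows, each tagged with the subject it belongs to.
warehouse = [
    {"subject": "sales", "item": "laptop", "qty": 3},
    {"subject": "inventory", "item": "laptop", "qty": 40},
    {"subject": "procurement", "item": "laptop", "qty": 50},
    {"subject": "sales", "item": "phone", "qty": 7},
]


def build_mart(rows, subject):
    """Return only the rows for one subject: the data mart for a department."""
    return [row for row in rows if row["subject"] == subject]


sales_mart = build_mart(warehouse, "sales")
total_sold = sum(row["qty"] for row in sales_mart)

print(len(sales_mart), "sales rows, total quantity", total_sold)
```

Every row in `sales_mart` also exists in `warehouse`, which is what "subset" means here; the sales department queries only its own mart.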


Data Warehouse Hierarchical Architecture

The most basic layers are the operational data layer (ODS), the data warehouse layer (DW), and the data application layer (DA).

The three-layer structure of the Alibaba data warehouse is as follows:
[Figure: three-layer structure of the Alibaba data warehouse]

ODS layer

The operational data layer stores unprocessed raw data whose structure is consistent with the source system; it is the data preparation area of the data warehouse. Its main jobs are to import the base data into the warehouse, decouple the warehouse from the source systems, and record historical changes to the base data.

DW layer

This layer mainly handles data processing and integration: it builds reusable detail fact tables for analysis and statistics, and aggregates metrics at common granularities.
Internally it is divided as follows:
[Figure: internal division of the DW layer]

DA layer

The data application layer is oriented toward end use and customized to business needs. It provides data for use in products and data analysis, including front-end reports, analysis charts, OLAP topics, and data mining.
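
The flow through the three layers can be sketched as a small pipeline. This is a simplified illustration under assumed data: the order records, the function names, and the choice of "amount per city" as the shared metric are all hypothetical.

```python
from collections import defaultdict

# ODS: raw records kept in the same shape as the source system (made-up data).
ods_orders = [
    {"order_id": "1", "city": " Beijing", "amount": "100"},
    {"order_id": "2", "city": "beijing", "amount": "50"},
    {"order_id": "3", "city": "Shanghai ", "amount": "70"},
]


def build_dw_detail(ods_rows):
    """DW: clean and integrate raw rows into a reusable detail (fact) table."""
    return [
        {
            "order_id": int(row["order_id"]),
            "city": row["city"].strip().lower(),
            "amount": int(row["amount"]),
        }
        for row in ods_rows
    ]


def build_dw_summary(detail_rows):
    """DW: aggregate a metric at a common granularity (amount per city)."""
    totals = defaultdict(int)
    for row in detail_rows:
        totals[row["city"]] += row["amount"]
    return dict(totals)


def build_da_report(summary):
    """DA: shape the data for one consumer, here a report ranked by amount."""
    return sorted(summary.items(), key=lambda item: item[1], reverse=True)


dw_detail = build_dw_detail(ods_orders)
dw_summary = build_dw_summary(dw_detail)
da_report = build_da_report(dw_summary)

for city, amount in da_report:
    print(city, amount)
```

Each layer reads only from the one beneath it, so the raw ODS data stays untouched while the DW tables can be reused by several different DA outputs.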

Layered Benefits

[Figure: benefits of layering]

ETL VS ELT

ETL (Extract, Transform, and Load) first extracts data from the source data pool and stores it in a temporary staging database (ODS). Transform operations are then performed to structure the data and convert it into a form suitable for the target data warehouse system. The structured data is then loaded into the warehouse.
With ELT (Extract, Load, and Transform), data is loaded immediately after it is extracted from the source data pool. There is no dedicated staging database (ODS), so the data goes straight into a single centralized repository. The data is then transformed inside the data warehouse system for use with business intelligence (BI) tools. This approach is especially characteristic of data warehouses in the big data era.
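
The difference between the two approaches is mainly where the transform step runs. The sketch below uses the same hypothetical `extract` and `transform` helpers in both orders; the in-memory lists and dictionary stand in for a staging database and a warehouse and are assumptions made for illustration.

```python
def extract(source):
    """Copy rows out of the source system unchanged."""
    return [dict(row) for row in source]


def transform(rows):
    """Normalize names and convert ages to integers."""
    return [
        {"name": row["name"].strip().title(), "age": int(row["age"])}
        for row in rows
    ]


source = [{"name": " alice ", "age": "30"}, {"name": "BOB", "age": "25"}]

# ETL: extract into a staging area, transform there, then load the result.
staging = extract(source)
etl_warehouse = transform(staging)

# ELT: load the raw extract straight into the warehouse,
# then transform it inside the warehouse.
elt_warehouse = {"raw": extract(source)}
elt_warehouse["clean"] = transform(elt_warehouse["raw"])

print(etl_warehouse)
print(elt_warehouse["clean"])
```

Both paths end with the same cleaned rows. With ELT the raw copy also remains in the warehouse, so a different transformation can be applied later without going back to the source.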


Origin blog.csdn.net/weixin_43629813/article/details/129777310