1. Basic theory of data warehouse

1. Data Warehouse

Concept:
A data warehouse is a data system for storage, analysis, and reporting. Its
purpose is to build an integrated, analysis-oriented data environment whose analysis results support enterprise decision-making.

Features:
The data warehouse itself does not "produce" any data; its data comes from various external systems.
Likewise, the data warehouse itself does not "consume" any data; its results are made available to various external applications.

1.1 The main characteristics of the data warehouse

Subject-oriented: A data warehouse is organized around subjects. A subject is an abstract concept: a higher-level grouping under which the data in enterprise information systems is consolidated, classified, analyzed, and used.
Integrated: Data is usually spread across multiple operational systems that are scattered, independent, and heterogeneous. It therefore has to be extracted, cleaned, transformed, and summarized into a unified whole before it enters the warehouse.
Non-volatile: The data in the warehouse reflects historical content over a long period. The warehouse typically serves a large number of queries but very few modifications or deletions.
Time-variant: The data in the warehouse has to be refreshed over time so that it keeps meeting the needs of decision-making.

2. OLTP and OLAP

Concept
OLTP (On-Line Transaction Processing): the traditional relational database system (RDBMS) is the typical OLTP system, mainly used for day-to-day transaction processing.
OLAP (On-Line Analytical Processing): the data warehouse is a typical OLAP system, mainly used for data analysis.

The difference between data warehouse and database

  • A data warehouse is not simply a large database, even though it stores data at large scale
  • The data warehouse did not emerge to replace the database
  • A database is designed around transactions; a data warehouse is designed around subjects
  • Databases generally store current business data; data warehouses generally store historical data
  • Databases are designed to capture data; data warehouses are designed to analyze it
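
The contrast between the two workloads can be sketched in a few lines. This is only an illustration under stated assumptions: a single in-memory SQLite database stands in for both systems, and the `orders` table, its columns, and the sample rows are hypothetical. In practice the OLTP and OLAP sides usually run on separate systems.

```python
import sqlite3

# One in-memory SQLite database stands in for both workloads (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, amount REAL, status TEXT)"
)

# OLTP-style work: short transactions that insert or update individual rows.
with conn:
    conn.executemany(
        "INSERT INTO orders (order_id, region, amount, status) VALUES (?, ?, ?, ?)",
        [
            (1, "north", 120.0, "paid"),
            (2, "south", 80.0, "paid"),
            (3, "north", 50.0, "new"),
        ],
    )
with conn:
    conn.execute("UPDATE orders SET status = 'paid' WHERE order_id = ?", (3,))

# OLAP-style work: a read-only query that scans many rows and aggregates them.
revenue_by_region = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM orders "
        "WHERE status = 'paid' GROUP BY region"
    ).fetchall()
)
print(revenue_by_region)
```

The inserts and the update touch one row at a time and must be committed reliably, while the final query changes nothing and only summarizes what has accumulated.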

The difference between data warehouse and data mart

  • A data warehouse covers the data of the entire organization; a data mart serves a single department
  • A data mart can be regarded as a subset of the data warehouse, and some people call it a small data warehouse
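
One way to picture "a data mart is a subset of the data warehouse" is a department-specific view over a warehouse table. The sketch below assumes SQLite and uses hypothetical names (`dw_sales`, `mart_retail_sales`, the `department` column); real data marts are often separate physical tables or databases rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Warehouse-level table: data for the whole organization (hypothetical schema).
conn.execute("CREATE TABLE dw_sales (department TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?)",
    [
        ("retail", "shoes", 100.0),
        ("retail", "hats", 30.0),
        ("wholesale", "shoes", 900.0),
    ],
)

# Data mart: only the slice that a single department (here, retail) needs.
conn.execute(
    "CREATE VIEW mart_retail_sales AS "
    "SELECT product, amount FROM dw_sales WHERE department = 'retail'"
)

mart_rows = conn.execute(
    "SELECT product, amount FROM mart_retail_sales ORDER BY product"
).fetchall()
print(mart_rows)
```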

3. Data warehouse layered architecture

Layers are defined by following the data as it flows into and out of the data warehouse.
Each enterprise can define its own layers according to its business needs, but the most basic scheme has three layers: the operational data layer (ODS), the data warehouse layer (DW), and the data application layer (DA).

ODS layer

The operational data layer, also known as the source data layer, data ingestion layer, data staging layer, or temporary cache layer.
This layer holds the raw, unprocessed data brought into the data warehouse system, with the same structure as in the source system.

DW layer

The data warehouse layer, built by processing the data from the ODS layer. It mainly performs data processing and integration, establishes conformed dimensions, builds reusable detail fact tables for analysis and statistics, and summarizes indicators at common granularities.
Internally it is divided into:
the common dimension layer, the common summary-granularity fact layer, and the detail-granularity fact layer.

DA layer (or ADS layer)

The data application layer faces end users and the business: it provides data tailored to products and to data analysis.
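
The three layers can be sketched as three tables, each built from the one before it. This is a minimal illustration, assuming SQLite as a stand-in warehouse; the table names (`ods_orders`, `dw_fact_orders`, `da_daily_sales`), columns, and sample rows are hypothetical, and real projects usually split the DW layer further.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ODS layer: raw rows stored exactly as the (hypothetical) source system sends them.
conn.execute(
    "CREATE TABLE ods_orders (order_id TEXT, customer TEXT, amount TEXT, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO ods_orders VALUES (?, ?, ?, ?)",
    [
        ("1", " Alice ", "120.50", "2023-06-01"),
        ("2", "bob", "80", "2023-06-01"),
        ("3", "ALICE", "", "2023-06-02"),  # dirty row: missing amount
        ("4", "Bob", "40.25", "2023-06-02"),
    ],
)

# DW layer: cleaned, typed detail fact table derived from the ODS layer.
conn.execute(
    """
    CREATE TABLE dw_fact_orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           LOWER(TRIM(customer))     AS customer,
           CAST(amount AS REAL)      AS amount,
           order_date
    FROM ods_orders
    WHERE TRIM(amount) <> ''
    """
)

# DA layer: a small summary table shaped for reports and applications.
conn.execute(
    """
    CREATE TABLE da_daily_sales AS
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM dw_fact_orders
    GROUP BY order_date
    """
)

daily_sales = conn.execute(
    "SELECT order_date, order_count, total_amount "
    "FROM da_daily_sales ORDER BY order_date"
).fetchall()
print(daily_sales)
```

The dirty row stays in `ods_orders` untouched, which keeps the raw data available for reprocessing, while the downstream layers never see it.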

The benefits of data warehouse layering
The main reason for layering is to keep clearer control over the data when managing it. More specifically, layering brings:

  • a clear data structure
  • data lineage tracking
  • less repetitive development
  • simplification of complex problems
  • shielding from anomalies in the raw data

4. ETL and ELT

Acquiring data from each data source, and converting and moving it within the data warehouse, can together be regarded as an ETL process (Extract, Transform, Load). In practice, however, there are two different approaches to loading data into the warehouse: ETL and ELT.

ETL concept
ETL begins by extracting data from a pool of data sources, which are usually transactional databases. The extracted data is held in a temporary staging database (ODS). Transformation operations then restructure it into a form suitable for the target data warehouse system, and the structured data is loaded into the warehouse, ready for analysis.
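
The extract, transform, load order can be sketched as three small functions. This is a minimal sketch, not a production pipeline: two in-memory SQLite databases stand in for the source and the warehouse, and the function names (`extract`, `transform`, `load`), the tables (`sales`, `fact_sales`), and the cleaning rules are all hypothetical.

```python
import sqlite3


def extract(source_conn):
    """Extract: read raw rows from the (hypothetical) transactional source."""
    return source_conn.execute("SELECT id, name, amount FROM sales").fetchall()


def transform(rows):
    """Transform: clean and reshape the rows before they reach the warehouse."""
    cleaned = []
    for row_id, name, amount in rows:
        if amount is None:  # drop rows with a missing amount
            continue
        cleaned.append((row_id, name.strip().title(), float(amount)))
    return cleaned


def load(warehouse_conn, rows):
    """Load: write the already-transformed rows into the warehouse table."""
    warehouse_conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    with warehouse_conn:
        warehouse_conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)


# Hypothetical source system with a few raw rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (id INTEGER, name TEXT, amount TEXT)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, " alice ", "10.5"), (2, "BOB", None), (3, "carol", "7")],
)

warehouse = sqlite3.connect(":memory:")
staged = extract(source)             # staging copy, playing the role of the ODS
load(warehouse, transform(staged))   # transform first, then load

loaded = warehouse.execute(
    "SELECT id, customer, amount FROM fact_sales ORDER BY id"
).fetchall()
print(loaded)
```

Only cleaned rows ever reach `fact_sales`; the raw form exists just in the staging step.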

ELT concept
With ELT, data is loaded as soon as it is extracted from the source data pool. There is no dedicated staging database (ODS); the data goes straight into a single centralized repository and is transformed inside the data warehouse system for use with business intelligence (BI) tools. This approach is characteristic of data warehouses in the big data era.
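
The load-first order can be sketched as follows. Again this is only an illustration under assumptions: an in-memory SQLite database stands in for the warehouse, and the table names (`raw_sales`, `fact_sales`) and sample rows are hypothetical. The point is that the transformation runs as SQL inside the warehouse, after the raw data has already landed there.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")

# Rows as they arrive from a hypothetical source, not yet cleaned.
raw_rows = [(1, " alice ", "10.5"), (2, "BOB", None), (3, "carol", "7")]

# Extract + Load: the raw rows land in the warehouse unchanged.
warehouse.execute("CREATE TABLE raw_sales (id INTEGER, name TEXT, amount TEXT)")
with warehouse:
    warehouse.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)

# Transform: performed afterwards, inside the warehouse, by its own SQL engine.
with warehouse:
    warehouse.execute(
        """
        CREATE TABLE fact_sales AS
        SELECT id,
               LOWER(TRIM(name))    AS customer,
               CAST(amount AS REAL) AS amount
        FROM raw_sales
        WHERE amount IS NOT NULL
        """
    )

transformed = warehouse.execute(
    "SELECT id, customer, amount FROM fact_sales ORDER BY id"
).fetchall()
raw_count = warehouse.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
print(transformed, raw_count)
```

Because `raw_sales` is kept, the transformation can be rerun or changed later without going back to the source.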

Reposted from: blog.csdn.net/hutc_Alan/article/details/131481049