Basic concepts of data warehousing

0. The definition of a data warehouse

A data warehouse is a subject-oriented, integrated, time-variant, and relatively stable collection of data used to support the management decision-making process.
Subject-oriented: A data warehouse is organized around clearly defined subjects. Only data related to a subject is kept, and irrelevant details are excluded.
Integrated: The data in the warehouse is extracted from the source business systems, cleaned, and then systematically processed, summarized, and organized. Inconsistencies in the source data must be eliminated so that the warehouse holds consistent, global information about the entire enterprise.
Time-variant: The data in the warehouse usually contains historical information. The system records the company's information from a certain point in the past (such as the time the data warehouse went into use) up to the present. This information supports quantitative analysis and prediction of the company's development history and future trends.
Relatively stable: The data in the warehouse is mainly used for corporate decision-making analysis, so the operations involved are mostly queries. Once data enters the warehouse, it is generally retained for a long time. In other words, a warehouse sees a large number of queries but few modifications and deletions, and it usually requires only periodic loading and refreshing.
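
The "time-variant" and "relatively stable" properties can be illustrated with a minimal Python sketch. It assumes an in-memory list standing in for a warehouse table. The names `warehouse`, `load_snapshot`, and `balance_as_of`, as well as the `customer_id` and `balance` fields, are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical in-memory "warehouse table": rows are appended, never updated in place.
warehouse = []

def load_snapshot(snapshot_date, source_rows):
    """Periodic load: append the source state tagged with its snapshot date."""
    for row in source_rows:
        warehouse.append({"snapshot_date": snapshot_date, **row})

def balance_as_of(customer_id, as_of):
    """Return the latest balance recorded on or before `as_of`, or None."""
    candidates = [
        r for r in warehouse
        if r["customer_id"] == customer_id and r["snapshot_date"] <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["snapshot_date"])["balance"]

# Two periodic loads; the January row is kept when February arrives.
load_snapshot(date(2021, 1, 31), [{"customer_id": 1, "balance": 100}])
load_snapshot(date(2021, 2, 28), [{"customer_id": 1, "balance": 250}])
```

Because earlier rows are retained rather than overwritten, a query such as `balance_as_of(1, date(2021, 2, 15))` can still answer questions about the past.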

1. The concept and classification of OLAP

OLAP (online analytical processing): a subset of detailed data is extracted from the data warehouse and, after the necessary aggregation, stored in an OLAP store for front-end analysis tools to read.
ROLAP stores the multidimensional data used for analysis in a relational database. Depending on the needs of the application, a set of materialized views is selectively defined and also stored in the relational database.
MOLAP physically stores the multidimensional data used for OLAP analysis as multidimensional arrays, forming a "cube" structure.
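
The difference between the two storage styles can be sketched in Python. This is only an illustration under assumptions: the `detail` rows are made up, a dictionary keyed by dimension values stands in for a relational aggregate table, and a nested list stands in for a multidimensional array. Real ROLAP and MOLAP engines are far more elaborate.

```python
# Hypothetical detail rows: (region, month, sales amount).
detail = [
    ("north", "2021-01", 10),
    ("north", "2021-02", 20),
    ("south", "2021-01", 5),
    ("north", "2021-01", 7),
]

# ROLAP-style: the aggregate is kept as rows keyed by dimension values,
# much like a pre-computed view in a relational database.
rolap_view = {}
for region, month, amount in detail:
    key = (region, month)
    rolap_view[key] = rolap_view.get(key, 0) + amount

# MOLAP-style: the same aggregate in a dense array addressed by dimension index.
regions = sorted({r for r, _, _ in detail})
months = sorted({m for _, m, _ in detail})
cube = [[0] * len(months) for _ in regions]
for region, month, amount in detail:
    cube[regions.index(region)][months.index(month)] += amount
```

The row form stores only the combinations that occur, while the array form reserves a cell for every combination, including the empty `("south", "2021-02")` cell.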

2. Data modeling process

1). Balance the trade-offs among requirements, data, and technology.
2). Abstract the business into a logical model: build the bus matrix and divide the subject areas.
3). Formulate specifications: development specifications, process specifications, and naming specifications.
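
The bus matrix from step 2 can be sketched as a simple mapping, assuming it is used in the usual sense of business processes set against the dimensions they share. The processes, dimensions, and helper names below are hypothetical.

```python
# Hypothetical business processes and the dimensions each one uses.
bus_matrix = {
    "orders":    {"date", "customer", "product"},
    "shipments": {"date", "customer", "product", "warehouse"},
    "payments":  {"date", "customer"},
}

def shared_dimensions(matrix):
    """Dimensions used by every process: candidates for shared, consistent definitions."""
    return set.intersection(*matrix.values())

def processes_using(matrix, dimension):
    """List the business processes that reference a given dimension."""
    return sorted(p for p, dims in matrix.items() if dimension in dims)
```

Reading the matrix by column, as `processes_using` does, shows which subject areas depend on a dimension before its definition is changed.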

3. Basic principles

a. High cohesion and low coupling
The records and fields that make up a logical or physical model should follow the principle of high cohesion and low coupling, one of the most basic principles of software design.
Consider this mainly from two perspectives, the business characteristics of the data and its access characteristics:
design data that belongs to similar or related businesses and has the same granularity as one logical or physical model;
store together data that is likely to be accessed at the same time, and store separately data that is unlikely to be accessed at the same time.
b. Separate the core model from the extended model
Establish a system of core and extended models. The fields in the core model support commonly used core business, while the fields in the extended model support personalized needs or the needs of a small number of applications. Fields of the extended model must not intrude excessively into the core model, so as not to undermine its simplicity and maintainability.
c. Push common processing logic down and keep it in one place
The more fundamental and widely shared a piece of processing logic is, the lower in the data scheduling dependency chain it should be encapsulated and implemented. Do not leave common processing logic to be implemented in the application layer, and do not let multiple copies of the same common logic exist at the same time.
d. Balance cost and performance
Appropriate data redundancy can be traded for better query and refresh performance, but excessive redundancy and data replication should be avoided.
e. Data can be rolled back
As long as the processing logic is unchanged, running it multiple times at different times must produce the same, deterministic result.
f. Consistency
Fields with the same meaning must have the same name in different tables, and that name must be the one defined in the specification.
g. Clear and understandable naming
Table names must be clear and consistent, and easy for users to understand and use.
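
Principle e (data can be rolled back) comes down to making each job safe to rerun. One common way to get there, shown here as a minimal Python sketch and not as the only approach, is to recompute a whole date partition and overwrite it. The function `run_daily_job`, the `source_rows` data, and the `date` and `amount` field names are hypothetical.

```python
# Hypothetical source rows; "date" and "amount" are illustrative field names.
source_rows = [
    {"date": "2021-03-01", "amount": 10},
    {"date": "2021-03-01", "amount": 15},
    {"date": "2021-03-02", "amount": 7},
]

def run_daily_job(target, rows, partition_date):
    """Recompute one date partition from the source and overwrite it."""
    total = sum(r["amount"] for r in rows if r["date"] == partition_date)
    # Overwrite the partition; appending or using "+=" would double count on a rerun.
    target[partition_date] = total
    return target

daily_totals = {}
run_daily_job(daily_totals, source_rows, "2021-03-01")
first_run = dict(daily_totals)

# Rerun later with the same logic and the same input: the result is unchanged.
run_daily_job(daily_totals, source_rows, "2021-03-01")
```

Because the job replaces the partition instead of adding to it, a failed or repeated run can simply be executed again without leaving duplicate data behind.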

Origin blog.csdn.net/hardyer/article/details/114262897