Data warehouse structure and characteristics

W. H. Inmon, the father of the data warehouse, defines it as follows: "A data warehouse is a subject-oriented, integrated, relatively stable, and time-variant collection of data used to support management decision-making and business intelligence." Put simply, data warehouse technology integrates, cleans, and organizes an enterprise's internal and external data, removes purely transactional data, stores the enterprise data in a "warehouse" organized by subject, and then builds various kinds of decision support on that foundation so that data serves the business. The basic structure is shown in Figure 1:

The concept of a data warehouse can be understood on two levels. First, a data warehouse supports decision-making and is oriented toward analytical data processing, which distinguishes it from an enterprise's existing operational databases. Second, a data warehouse effectively integrates its data sources; after integration the data is reorganized by subject, includes historical data, and is generally no longer modified once stored. A data warehouse has the following four characteristics:

    ① Subject-oriented. Data in an operational database is organized around transaction processing tasks, and the individual business systems are separate from one another. Data in a data warehouse is instead organized by subject area, in contrast to the application-oriented organization of a traditional database. A subject is an abstract concept: a key aspect that users care about when using the data warehouse to make decisions. A subject usually relates to multiple operational information systems.

    ② Integrated. Transaction-oriented operational databases are usually tied to specific applications, and these databases are independent of one another and often heterogeneous. The data in a data warehouse is obtained by extracting and cleaning the originally scattered database data and then systematically processing, summarizing, and organizing it. Inconsistencies in the source data must be eliminated so that the information in the data warehouse is consistent, global information about the entire enterprise. Integration means that data must be processed and integrated before it enters the data warehouse, which is a key step in building one: it unifies inconsistencies in the original data and transforms the data structure from application-oriented to subject-oriented.

    ③ Relatively stable. Data in an operational database is usually updated in real time and changes as needed. Data in a data warehouse is mainly used for enterprise decision analysis, and the operations on it are mainly queries. Once data enters the data warehouse, it is generally retained for a long time. In other words, a data warehouse sees a large number of query operations but very few modifications and deletions, and usually needs only periodic loading and refreshing.

    ④ Reflects historical changes. Operational databases are mainly concerned with data from the current period, whereas a data warehouse usually contains historical information, which makes it possible to analyze quantitatively and predict the enterprise's development and future trends.
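The integration step described in ② can be illustrated with a small sketch. It assumes two hypothetical source systems, `crm_rows` and `billing_rows`, that disagree on key names, gender encoding, and currency unit; all field names and mapping tables here are invented for illustration and do not come from any particular product.

```python
# Hypothetical extracts from two independent operational systems.
crm_rows = [
    {"cust_id": "C001", "sex": "M", "revenue_usd": 1200.0},
    {"cust_id": "C002", "sex": "F", "revenue_usd": 800.0},
]
billing_rows = [
    {"customer": "c001", "gender": 1, "revenue_cents": 50000},
    {"customer": "c003", "gender": 0, "revenue_cents": 25000},
]

# Assumed mappings from each system's encoding to one shared encoding.
GENDER_FROM_CRM = {"M": "male", "F": "female"}
GENDER_FROM_BILLING = {1: "male", 0: "female"}


def normalize_crm(row):
    """Convert a CRM row to the unified names and units."""
    return {
        "customer_id": row["cust_id"].upper(),
        "gender": GENDER_FROM_CRM[row["sex"]],
        "revenue": row["revenue_usd"],
        "source": "crm",
    }


def normalize_billing(row):
    """Convert a billing row to the unified names and units (cents to dollars)."""
    return {
        "customer_id": row["customer"].upper(),
        "gender": GENDER_FROM_BILLING[row["gender"]],
        "revenue": row["revenue_cents"] / 100,
        "source": "billing",
    }


def integrate(crm, billing):
    """Merge both sources into one record per customer (the 'customer' subject)."""
    subject = {}
    unified = [normalize_crm(r) for r in crm] + [normalize_billing(r) for r in billing]
    for rec in unified:
        # The first source seen for a customer supplies the gender value.
        entry = subject.setdefault(
            rec["customer_id"],
            {"customer_id": rec["customer_id"], "gender": rec["gender"],
             "revenue": 0.0, "sources": []},
        )
        entry["revenue"] += rec["revenue"]
        entry["sources"].append(rec["source"])
    return subject


customer_subject = integrate(crm_rows, billing_rows)
```

After integration, each customer appears once under a single identifier format, regardless of how many application-oriented systems contributed data.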

    In summary, the data in a data warehouse forms a coherent whole, produced by extracting and cleaning data from different sources and then systematically processing, aggregating, and organizing it. The information it provides concerns particular subjects rather than the day-to-day operations of the company, so a data warehouse is built around a clear subject, that is, a defined scope of decisions and problems to be solved. All data in the data warehouse is identified with a specific time period. The data is also relatively stable: it is used mainly for enterprise decision analysis, is generally retained for a long time once loaded, and is rarely modified or deleted, usually requiring only periodic loading and refreshing. This allows managers to obtain a consistent picture of the business.
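As a rough illustration of the stability and time-period properties, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The table `sales_snapshot`, its columns, and the sample figures are assumptions made up for this example; a real warehouse would use its own schema and storage engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Every row carries the period it describes, so history is never overwritten.
conn.execute(
    "CREATE TABLE sales_snapshot ("
    " snapshot_date TEXT NOT NULL,"
    " region TEXT NOT NULL,"
    " total_sales REAL NOT NULL,"
    " PRIMARY KEY (snapshot_date, region))"
)


def load_snapshot(conn, snapshot_date, totals_by_region):
    """Periodic load: append new rows, never update or delete earlier ones."""
    conn.executemany(
        "INSERT INTO sales_snapshot VALUES (?, ?, ?)",
        [(snapshot_date, region, total) for region, total in totals_by_region.items()],
    )
    conn.commit()


def history(conn, region):
    """Query the full time series for one region, oldest first."""
    return conn.execute(
        "SELECT snapshot_date, total_sales FROM sales_snapshot"
        " WHERE region = ? ORDER BY snapshot_date",
        (region,),
    ).fetchall()


# Two hypothetical month-end loads.
load_snapshot(conn, "2023-01-31", {"north": 100.0, "south": 80.0})
load_snapshot(conn, "2023-02-28", {"north": 120.0, "south": 75.0})

north_history = history(conn, "north")
```

Because loads only append, a trend query such as `history(conn, "north")` sees every earlier period unchanged, which is what makes trend analysis possible.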

    An important role of the data warehouse is to give decision makers the intelligence they need to better understand business crises, business opportunities, and operating conditions. In decision support, the data warehouse mainly involves four processes: integration, execution, intelligence, and innovation, as shown in Figure 3.

 

 

(2) Data warehouse and data mart

    A data mart is a subset of data separated out from the data warehouse for a specific application purpose or scope. It is also called departmental data or subject data, and it usually serves a single department or a particular group of users in the enterprise. Data marts can be divided by business function, for example into finance, sales, and marketing marts, and each contains data for only one specific area. A comparison of data warehouses and data marts is shown in Table 1.
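The relationship between a warehouse and its marts can be sketched in a few lines. The `warehouse_facts` rows and the `build_data_mart` helper are hypothetical, used only to show a mart as a subject-specific subset of the warehouse.

```python
# Hypothetical warehouse contents covering several subject areas.
warehouse_facts = [
    {"subject": "sales", "region": "north", "amount": 120.0},
    {"subject": "sales", "region": "south", "amount": 75.0},
    {"subject": "finance", "account": "payroll", "amount": 300.0},
    {"subject": "marketing", "campaign": "spring", "amount": 40.0},
]


def build_data_mart(facts, subject):
    """Return copies of the warehouse rows that belong to one subject area."""
    return [dict(row) for row in facts if row["subject"] == subject]


sales_mart = build_data_mart(warehouse_facts, "sales")
finance_mart = build_data_mart(warehouse_facts, "finance")
```

Each mart holds copies of the rows, so a department working on `sales_mart` does not alter the warehouse itself.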

Table 1 Comparison of data warehouse and data mart

 

There are two distinct schools of thought on the order in which a data warehouse and data marts should be built. Ralph Kimball holds that "a data warehouse is simply the union of the data marts that make it up", whereas Inmon holds that a centralized data warehouse should be built first, a few subject areas at a time, and that data marts are then created from it. In practice, the choice of approach depends on the project's main business driver. If the organization suffers from poor data management and inconsistent data, or wants to lay a solid foundation for the future, Inmon's approach is the better fit.

    If the organization urgently needs to deliver information to users, Kimball's approach will meet that need. Once the urgent information needs are met, a plan for evolving the data architecture to include a stand-alone data warehouse should be considered. Particular care is needed to keep individual departments from misusing Kimball's approach outside of centralized control.

(3) Data extraction, transformation, and loading

    Data extraction, transformation, and loading tools (Extract, Transform, Load; ETL) are an important component of the data warehouse. They extract data from the source databases, perform the necessary transformation and cleanup, and store the result in the data warehouse in a uniformly defined format. The process first filters the data to remove the portions that are not meaningful for decision-making, then converts the data to uniform names and definitions, calculates statistics and derived data, and estimates default values for missing data. The purpose of ETL is to merge data from various platforms into a standard format for the data warehouse, in support of business intelligence goals in a decision support environment. Data extraction tools should be able to access data held in different storage formats and to generate the different programs, job control language, scripts, and statements needed to access it.
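The ETL steps described above (filter, rename, derive, fill defaults, load) can be sketched with the Python standard library. Everything specific here is an assumption for illustration: the `source_rows` extract, the `RENAME` mapping, the `fact_orders` table, and the choice of the rounded mean as the default for a missing quantity.

```python
import sqlite3
from statistics import mean

# Hypothetical extract from an operational order system.
source_rows = [
    {"ord_no": 1, "cust": "C001", "qty": 2, "unit_price": 10.0, "memo": "call back"},
    {"ord_no": 2, "cust": "C002", "qty": None, "unit_price": 5.0, "memo": ""},
    {"ord_no": 3, "cust": "C001", "qty": 4, "unit_price": 2.5, "memo": "gift"},
]

# Assumed mapping from source field names to unified warehouse names.
# Fields missing from this mapping (here "memo") are filtered out.
RENAME = {"ord_no": "order_id", "cust": "customer_id",
          "qty": "quantity", "unit_price": "unit_price"}


def transform(rows):
    """Filter and rename fields, fill missing quantities, add a derived total."""
    cleaned = [{RENAME[k]: v for k, v in row.items() if k in RENAME} for row in rows]
    known = [r["quantity"] for r in cleaned if r["quantity"] is not None]
    # Assumed default rule: rounded mean of the known quantities.
    default_qty = round(mean(known)) if known else 0
    for r in cleaned:
        if r["quantity"] is None:
            r["quantity"] = default_qty
        r["line_total"] = r["quantity"] * r["unit_price"]  # derived data
    return cleaned


def load(conn, rows):
    """Store the transformed rows in the warehouse table in one uniform format."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        " order_id INTEGER PRIMARY KEY, customer_id TEXT,"
        " quantity INTEGER, unit_price REAL, line_total REAL)"
    )
    conn.executemany(
        "INSERT INTO fact_orders VALUES"
        " (:order_id, :customer_id, :quantity, :unit_price, :line_total)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
warehouse_rows = transform(source_rows)
load(conn, warehouse_rows)
total_sales = conn.execute("SELECT SUM(line_total) FROM fact_orders").fetchone()[0]
```

Keeping `transform` separate from `load` means the same cleaning rules can be reused when a new source platform is added; only the extraction side changes.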

 

 
