The Inmon vs Kimball Architecture Debate in Data Warehouses

Opinions have always differed on what the best data warehouse architecture is. Some people even call the dispute between Inmon and Kimball a "religious war" in the data warehouse industry. This article briefly describes and compares their two architectures, together with a third approach that is popular in the market. The goal is not to decide which is good and which is bad, but to help beginners understand how the two founders of data warehousing each view the data warehouse system.

First, let's talk about Inmon's enterprise information factory (known in English as the Corporate Information Factory). In May 2000, W. H. Inmon published an article in the journal DM Review in which he wrote: "...if I had to design a data mart tomorrow, I would not consider using other methods", a remark that hints at the characteristics of his enterprise information factory. The following figure is the architecture diagram of his enterprise information factory:

[Figure: architecture of Inmon's enterprise information factory]

Let's walk through the architecture. On the left are the operational or transaction systems, which come in many kinds, including online database systems, text file systems, and so on. The data from these systems is loaded into the enterprise data warehouse through the ETL process. The ETL process consolidates, cleans, and standardizes the data from the different systems, so we can call it data integration. The enterprise data warehouse is the hub of the enterprise information factory and an integrated repository of atomic data. However, because the enterprise data warehouse is not in a multidimensional format, it is not suited to analytical applications, and BI tools do not query it directly. Its purpose is to feed the additional data stores that the various analytical systems use. A data mart converts the information obtained from the enterprise data warehouse into a multidimensional format for a particular subject area, then aggregates and calculates it in various ways, and finally makes it available to end users for analysis. Inmon therefore describes the process of moving information from the enterprise data warehouse to the data marts as "data delivery".

Next, let's look at Kimball's dimensional data warehouse. It is an enterprise-level data warehouse based on a dimensional model, and it has much in common with the enterprise information factory: both are integrated repositories of atomic data. Let's analyze his point of view using the following architecture:
[Figure: architecture of Kimball's dimensional data warehouse]

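The two storage styles discussed so far can be made concrete with a small sketch. The example below is only an illustration and is not taken from Inmon's or Kimball's own material. It uses Python's built-in sqlite3 module, and all table and column names (customer, region, orders, order_line, dim_customer, fact_sales) are hypothetical. It first creates a few third-normal-form tables of the kind an Inmon-style enterprise data warehouse might hold, then performs a "data delivery" step that reshapes them into a star schema for a sales data mart. In a Kimball-style warehouse, a star schema at atomic grain would itself be the warehouse rather than a downstream copy.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """Build a toy 3NF warehouse and deliver a star-schema mart from it."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Hypothetical 3NF tables standing in for the enterprise data warehouse
        CREATE TABLE region     (region_id INTEGER PRIMARY KEY, region_name TEXT);
        CREATE TABLE customer   (customer_id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
        CREATE TABLE orders     (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
        CREATE TABLE order_line (order_id INTEGER, product TEXT, amount REAL);

        INSERT INTO region VALUES (1, 'North'), (2, 'South');
        INSERT INTO customer VALUES (10, 'Acme', 1), (11, 'Globex', 2);
        INSERT INTO orders VALUES (100, 10, '2024-01-05'), (101, 11, '2024-01-06');
        INSERT INTO order_line VALUES
            (100, 'widget', 20.0), (100, 'gadget', 5.0), (101, 'widget', 30.0);

        -- "Data delivery": reshape the normalized tables into a star schema.
        -- The dimension flattens customer and region into one table.
        CREATE TABLE dim_customer AS
            SELECT c.customer_id, c.name, r.region_name
            FROM customer c JOIN region r ON r.region_id = c.region_id;

        -- The fact table keeps one row per order line (the atomic grain).
        CREATE TABLE fact_sales AS
            SELECT o.customer_id, o.order_date, l.product, l.amount
            FROM orders o JOIN order_line l ON l.order_id = o.order_id;
    """)
    return con


def sales_by_region(con: sqlite3.Connection) -> list:
    """A typical analytical query: one join from the fact to a dimension."""
    return con.execute("""
        SELECT d.region_name, SUM(f.amount)
        FROM fact_sales f JOIN dim_customer d ON d.customer_id = f.customer_id
        GROUP BY d.region_name
        ORDER BY d.region_name
    """).fetchall()
```

With this toy data, sales_by_region(build_warehouse()) returns [('North', 25.0), ('South', 30.0)]. The same question asked of the normalized tables would need joins across all four of them, which is one reason analytical tools are usually pointed at the dimensional form.
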
Although the two diagrams look quite different at first glance, the two architectures have a lot in common: first, both assume that operational systems and analytical systems are separate; second, both draw on many data sources (operational systems); third, in both, ETL integrates the information from the various operational systems into an enterprise data warehouse.

As for the differences, the biggest one is how the enterprise data warehouse is modeled. Inmon uses third normal form, while Kimball uses a dimensional model (the star schema), which also stores data at the lowest granularity. Second, the dimensional data warehouse can be accessed directly by analytical systems, although in practice this kind of access is rarely used for analysis. Finally, the data mart is a logical concept rather than a separate store: in the Kimball architecture, a data mart is a subset of the tables in the dimensional data warehouse, shown highlighted in the diagram.

Kimball's architecture also allows some flexibility. An ODS layer can be added to the ETL process, so that a set of third-normal-form tables is kept in the ODS layer as a staging step for ETL; Kimball, however, seems to treat this only as an aid to the ETL process. It is also possible to separate the data marts from the enterprise dimensional data warehouse, which adds an extra layer, the so-called presentation layer. These alternative designs are acceptable as long as they meet the enterprise's own analysis needs.

The last architecture is the stand-alone data mart, which is widely used in practice. It is very simple, easy to implement, and quick to deliver, but its biggest problem is that this fast, cheap implementation leads to higher costs and inefficiency in the long run. The following is the architecture of the stand-alone data mart:
[Figure: architecture of the stand-alone data mart]

Developing a stand-alone data mart is the quickest way to get visible results. Because there is no need for cross-departmental, cross-functional analysis, the data mart can be put into production quickly and results can be obtained fast and cheaply, which is why so many organizations use this approach. Many ERP integrators also build similar functionality into their systems as a selling point to attract customers. Despite these advantages, the approach has a fatal flaw: short-term success brings long-term problems. In particular, when independent data marts support multiple subject areas, different departments end up with inconsistent data, so that their figures contradict one another. Each data mart also becomes an isolated island of information that is incompatible with the others. For these reasons, this solution is often unacceptable.

I hope this brief introduction to the three architectures helps you understand data warehouse architectures and how they are implemented.
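
The data inconsistency between stand-alone data marts can be illustrated with a small sketch. The code below is a hypothetical example, not a description of any real system: the Order class, the sample orders, and the two department functions are all invented for illustration. It shows how two marts that load the same source orders but each apply their own definition of "revenue" end up reporting different totals, and how a single shared definition (the kind an integrated warehouse would provide) removes the ambiguity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    amount: float
    refunded: bool


# Hypothetical source data that both departments load independently.
ORDERS = [
    Order(1, 100.0, False),
    Order(2, 50.0, True),
    Order(3, 25.0, False),
]


def sales_mart_revenue(orders: list) -> float:
    """Sales department's mart: counts every order, refunded or not."""
    return sum(o.amount for o in orders)


def finance_mart_revenue(orders: list) -> float:
    """Finance department's mart: leaves out refunded orders."""
    return sum(o.amount for o in orders if not o.refunded)


def shared_revenue_measures(orders: list) -> dict:
    """One agreed definition with explicitly named measures for everyone."""
    gross = sum(o.amount for o in orders)
    refunds = sum(o.amount for o in orders if o.refunded)
    return {"gross": gross, "refunds": refunds, "net": gross - refunds}
```

Here sales_mart_revenue(ORDERS) gives 175.0 while finance_mart_revenue(ORDERS) gives 125.0, so two reports both labeled "revenue" disagree. shared_revenue_measures(ORDERS) returns {'gross': 175.0, 'refunds': 50.0, 'net': 125.0}, which makes clear that both departments were right about different measures.
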
     

