Introduction to Data Warehouses and Data Marts

Link to the full text http://click.aliyun.com/m/22751/

Readers who are new to data warehouses have probably come across a similar concept: the data mart. Many wonder how the two relate to each other, and this article addresses that question.
First, consider a typical data warehouse architecture diagram found online, in which dependent data marts sit in the layer above the data warehouse.
[Figure: data warehouse architecture diagram]
1. The concepts of data warehouse and data mart
A data warehouse is a subject-oriented, integrated, non-volatile (relatively stable), time-variant (reflecting historical changes) collection of data used to support management decisions.
       First, the data warehouse is used to support decision-making and is oriented toward analytical data processing, which distinguishes it from the enterprise's existing operational databases.
       Second, the data warehouse effectively integrates multiple heterogeneous data sources and reorganizes the integrated data by subject. It also contains historical data, and data stored in the data warehouse is generally not modified afterwards.
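The "non-volatile" and "time-variant" properties can be illustrated with a minimal sketch. It assumes a hypothetical operational store that overwrites values in place and a warehouse that only appends dated snapshots; the names `operational`, `warehouse_history`, and `city_as_of` are invented for illustration and do not come from any particular product.

```python
from datetime import date

# Hypothetical operational store: keeps only the current value per customer.
operational = {}
# Hypothetical warehouse table: append-only rows of (snapshot_date, customer_id, city).
warehouse_history = []

def update_operational(customer_id, city):
    """An OLTP-style update overwrites the previous value."""
    operational[customer_id] = city

def load_snapshot(snapshot_date):
    """Copy the current operational state into the warehouse without modifying old rows."""
    for customer_id, city in sorted(operational.items()):
        warehouse_history.append((snapshot_date, customer_id, city))

def city_as_of(customer_id, as_of):
    """Return the city recorded in the latest snapshot on or before `as_of`."""
    rows = [r for r in warehouse_history if r[1] == customer_id and r[0] <= as_of]
    return max(rows)[2] if rows else None

update_operational(1, "Hangzhou")
load_snapshot(date(2023, 1, 1))
update_operational(1, "Beijing")
load_snapshot(date(2023, 2, 1))
```

After these calls the operational store only knows the current city, while the warehouse can still answer what the value was on an earlier date.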
Data marts are data warehouses built to meet the needs of a particular group of users, and they may still be hundreds of gigabytes in size. What makes something a data mart is its purpose and scope, not its size. A data mart can be understood as a small, department-level or workgroup-level data warehouse. There are two types of data marts:
       Independent, or standalone (obtaining data directly from the operational environment): These data marts are controlled by a specific workgroup, department, or line of business and are built entirely to meet its needs. In fact, they have no connectivity at all to data marts in other workgroups, departments, or lines of business.
       Dependent (obtaining data from an enterprise data warehouse, as in the architecture diagram at the beginning): Such data marts are often implemented in a distributed fashion. Although different data marts are implemented within specific workgroups, departments, or lines of business, they can be integrated and interconnected to provide a more global, business-wide view of the data. In fact, at the highest level of integration, they can become a business-wide data warehouse. This means that end users in one department can access and use data in another department's data mart.
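As a rough illustration of a dependent data mart, the sketch below uses Python's built-in sqlite3 module to derive a department-scoped, pre-aggregated table from a warehouse table. The table names `warehouse_sales` and `retail_mart_daily_sales`, the columns, and the sample rows are all assumptions made for this example; a real deployment would normally use separate databases and a scheduled load.

```python
import sqlite3

# In-memory database standing in for the enterprise warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales "
    "(sale_date TEXT, department TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("2023-01-05", "retail", "north", 120.0),
        ("2023-01-05", "retail", "south", 80.0),
        ("2023-01-06", "wholesale", "north", 900.0),
        ("2023-01-06", "retail", "north", 50.0),
    ],
)

# Dependent data mart: populated from the warehouse, not from the
# operational systems, and limited to what one department needs.
conn.execute(
    """
    CREATE TABLE retail_mart_daily_sales AS
    SELECT sale_date, region, SUM(amount) AS total_amount
    FROM warehouse_sales
    WHERE department = 'retail'
    GROUP BY sale_date, region
    """
)

mart_rows = conn.execute(
    "SELECT sale_date, region, total_amount "
    "FROM retail_mart_daily_sales ORDER BY sale_date, region"
).fetchall()
```

An independent data mart would instead run its own extraction against the operational sources, which is why it tends to end up disconnected from the marts of other departments.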

2. How does the data mart come into being?
An enterprise often has many existing systems, and with them many existing OLTP databases. Although these databases hold a great deal of information, it is difficult and slow for analysts to extract meaningful information from them. These systems generally support predefined operational reports, but they often cannot meet an organization's need for historical, combined, or easily accessible information. The data is spread across many tables on different systems and platforms and is often "dirty", containing inconsistent and invalid values, which makes it hard to analyze.
Data warehouses emerged in response to this problem of scattered data that is difficult to analyze centrally. In a large enterprise, the data is aggregated into the warehouse after ETL. However, departmental requirements are complex, and performance is often unsatisfactory when every department extracts and analyzes data directly from the data warehouse. This is where the data mart comes in. According to the different requirements of different departments, the data warehouse is extended into various data marts, each serving a specific department or group of people. This greatly improves the execution efficiency of the different analysis workloads.
A well-designed data mart has the following characteristics (some are shared with data warehouses, and some are stated relative to data warehouses):
       (1) Provides the information required by a specific user group, usually a department or a specific organization, without exposing these users to the heavy demands and operational crises of the source systems (here, think of the data warehouse).
       (2) Supports access to non-volatile business information. (Non-volatile information is updated at predetermined intervals and is not affected by ongoing updates to the OLTP systems.)
       (3) Reconciles information from multiple operational systems within the organization, such as accounting, sales, inventory, and customer management, as well as industry data from outside the organization.
       (4) Provides cleansed data by replacing invalid values with valid defaults, making values consistent across systems, and adding descriptions that explain cryptic codes.
       (5) Provides reasonable query response times for ad hoc analysis and predefined reports (because the data mart works at the department level, query and analysis response times are much shorter than against the huge data warehouse).
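Characteristic (4) can be sketched in a few lines of Python. The lookup tables, field names, and the choice of "U" as the default value are assumptions for illustration only; real cleansing rules depend on the source systems involved.

```python
# Hypothetical code table: adds a readable description to each status code.
STATUS_DESCRIPTIONS = {"A": "active", "I": "inactive", "U": "unknown"}
# Hypothetical mapping that makes gender values consistent across systems.
GENDER_SYNONYMS = {"m": "M", "male": "M", "f": "F", "female": "F"}

def clean_customer(record):
    """Return a cleansed copy of one raw customer record."""
    raw_gender = str(record.get("gender") or "").strip().lower()
    gender = GENDER_SYNONYMS.get(raw_gender, "U")  # default for missing or invalid values
    status = record.get("status")
    if status not in STATUS_DESCRIPTIONS:
        status = "U"  # default for invalid codes
    return {
        "customer_id": record["customer_id"],
        "gender": gender,
        "status": status,
        "status_description": STATUS_DESCRIPTIONS[status],
    }

raw = [
    {"customer_id": 1, "gender": "Male", "status": "A"},
    {"customer_id": 2, "gender": "f", "status": "X"},
    {"customer_id": 3, "gender": None, "status": "I"},
]
cleaned = [clean_customer(r) for r in raw]
```

Each cleaned record carries a consistent gender code, a valid status code, and a human-readable description, regardless of how the source system spelled the original values.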

3. Data Warehouse Design Methodology
Before a data warehouse is built, its implementation approach has to be chosen. There are usually three options: top-down, bottom-up, and a combination of the two. They are briefly described below:
       (1) Top-down
       The top-down approach implements the data warehouse in a single project phase. It requires more planning and design work at the beginning of the project, involving people from every workgroup, department, or line of business that takes part in the data warehouse implementation. Decisions about which data sources to use, security, data structures, data quality, data standards, and the overall data model generally need to be made before the actual implementation begins.
       (2) Bottom-up
       The bottom-up approach covers the planning and design of an initial data warehouse without waiting for a data warehouse design of larger business scope to be in place. This does not mean that a design of larger business scope will never be developed; it is built incrementally as the initial data warehouse implementation expands. This approach is now more widely accepted than the top-down approach, because the data warehouse delivers results quickly and those results can serve as evidence to justify scaling up to a larger business scope.
       (3) Combined approach
       Each implementation approach has pros and cons. In many cases, the best approach may be a combination of the two. One key to this approach is deciding how much planning and design of the business-wide architecture is needed up front to support integration, given that the data warehouse itself is built bottom-up. When a bottom-up or staged project model is used to build a series of data marts within a business-wide architecture, data marts for different business subject areas can be integrated one after another, resulting in a well-designed business data warehouse. This approach fits business needs very well. Here a data mart can be understood as a logical subset of the entire data warehouse system; in other words, a data warehouse is a collection of consistent data marts.
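One way to read "a collection of consistent data marts" is that the marts share the same keys and dimensions, so their results can be combined. The sketch below assumes a hypothetical shared customer dimension and two small marts; the names and figures are invented for illustration.

```python
# Hypothetical shared dimension used by every mart: customer key -> customer name.
customer_dim = {101: "Acme Ltd", 102: "Globex"}

# Two hypothetical departmental marts that both reference the shared customer key.
sales_mart = [(101, 500.0), (102, 300.0), (101, 200.0)]  # (customer_key, sale amount)
support_mart = [(101, 3), (102, 1)]                      # (customer_key, ticket count)

def total_by_customer(rows):
    """Sum the measure in each (customer_key, value) row per customer key."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals

sales_totals = total_by_customer(sales_mart)
ticket_totals = total_by_customer(support_mart)

# Because both marts use the same keys, a business-wide view is a simple join.
combined = {
    name: {"sales": sales_totals.get(key, 0), "tickets": ticket_totals.get(key, 0)}
    for key, name in customer_dim.items()
}
```

If each mart had defined its own customer identifiers, this join would need a reconciliation step first, which is the integration cost that up-front planning in the combined approach is meant to avoid.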

4. The difference between data warehouses and data marts
Full text link http://click.aliyun.com/m/22751/
