Four common data models

Why perform data warehouse modeling?

        Performance: A good data model helps us quickly query the data we need and reduces data I/O

        Cost: Reduce data redundancy and reuse calculation results, thereby reducing storage and computing costs

        Efficiency: Improve the experience of working with data and make data use more efficient

        Quality: Unify inconsistent statistical definitions (metric calibers) and reduce the possibility of data calculation errors

Dimensional model

        Depending on how the data is organized, dimensional modeling is divided into the star model, the snowflake model, and the constellation model.

        (The four steps of dimensional modeling: select the business process -> define the granularity -> select the dimensions -> determine the facts)

    Star model: It consists mainly of dimension tables and a fact table. The fact table sits at the center and every dimension table is related to it directly, giving a star-shaped layout.
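        As a rough illustration of the star layout, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (fact_sales, dim_date, dim_product) and the sample rows are assumptions made up for this example, not part of any particular warehouse.

```python
import sqlite3

# Hypothetical star schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      REAL
);
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO fact_sales VALUES
    (20240101, 1, 10, 15.0),
    (20240101, 2, 1, 40.0),
    (20240201, 1, 4, 6.0);
""")

# Every dimension is exactly one join away from the fact table.
star_rows = conn.execute("""
SELECT d.month, p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_date    d ON d.date_key    = f.date_key
JOIN dim_product p ON p.product_key = f.product_key
GROUP BY d.month, p.category
ORDER BY d.month, p.category
""").fetchall()
print(star_rows)
```

        Each dimension attribute used in the query is reached with a single join from the fact table, which is the property that gives the star model its name.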

    Snowflake model: It builds on the star model by relating dimension tables to other dimension tables. Because of the extra joins, this model has higher maintenance costs and worse query performance, so it is generally not recommended. This is especially true when building a data warehouse on the Hadoop ecosystem, where fewer joins mean fewer shuffles and the performance gap can be large.

    In short, the star model is a fact table associated with multiple dimension tables, while the snowflake model is a fact table associated with multiple dimension tables that are in turn associated with other dimension tables.
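        The difference in join depth can be sketched with Python's built-in sqlite3 module. All table and column names below are hypothetical; the flattened dim_product_flat table stands in for the star-style alternative.

```python
import sqlite3

# Hypothetical tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflake: the product dimension points at a separate category dimension.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Home');
INSERT INTO dim_product VALUES (1, 'Pen', 1), (2, 'Lamp', 2);
INSERT INTO fact_sales VALUES (1, 15.0), (2, 40.0), (1, 6.0);

-- Star-style alternative: fold the category into one wide dimension table.
CREATE TABLE dim_product_flat AS
SELECT p.product_key, p.name, c.category_name
FROM dim_product p
JOIN dim_category c ON c.category_key = p.category_key;
""")

# Snowflake query: two joins to reach the category name.
snowflake_rows = conn.execute("""
SELECT c.category_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product  p ON p.product_key  = f.product_key
JOIN dim_category c ON c.category_key = p.category_key
GROUP BY c.category_name
ORDER BY c.category_name
""").fetchall()

# Star query: one join gives the same answer.
star_rows = conn.execute("""
SELECT p.category_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product_flat p ON p.product_key = f.product_key
GROUP BY p.category_name
ORDER BY p.category_name
""").fetchall()
print(snowflake_rows == star_rows)
```

        Both queries return the same totals; the snowflake version simply needs one more join, which on a distributed engine can translate into an additional shuffle.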

    Constellation model: It is an extension of the star model in which multiple fact tables share dimension tables. Since most data warehouses contain more than one fact table, the constellation model is the norm in practice; the term simply indicates that there are multiple fact tables and that they share some dimension tables.
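        A small sketch of two fact tables sharing one dimension, again with Python's sqlite3 module. The names fact_sales, fact_inventory, and dim_product are assumptions for illustration. Each fact table is aggregated separately and the results are combined through the shared dimension.

```python
import sqlite3

# Hypothetical constellation: two fact tables share dim_product.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER
);
CREATE TABLE fact_inventory (
    product_key INTEGER REFERENCES dim_product(product_key),
    on_hand     INTEGER
);
INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Lamp');
INSERT INTO fact_sales VALUES (1, 10), (1, 4), (2, 1);
INSERT INTO fact_inventory VALUES (1, 100), (2, 20);
""")

# Aggregate each fact table on its own, then join on the shared dimension key.
shared_rows = conn.execute("""
SELECT p.name, s.sold, i.on_hand
FROM dim_product p
JOIN (SELECT product_key, SUM(quantity) AS sold
      FROM fact_sales GROUP BY product_key) s ON s.product_key = p.product_key
JOIN (SELECT product_key, SUM(on_hand) AS on_hand
      FROM fact_inventory GROUP BY product_key) i ON i.product_key = p.product_key
ORDER BY p.name
""").fetchall()
print(shared_rows)
```

        Aggregating before joining avoids multiplying rows when the two fact tables have different grains.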

Normal form model

        This is the entity-relationship (ER) model. A 3NF model is designed from the perspective of the entire enterprise: the enterprise business architecture is described with a data model of entities and relationships that conforms to third normal form (3NF) in normalization theory.

        Features: The design approach is top-down and suits upstream storage of basic data. Each piece of data is stored only once, so there is no redundancy, which makes decoupling convenient and the data itself easy to maintain. The disadvantages are a generally long development cycle and a high cost of maintaining the model.
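        To make "only one copy of the same data" concrete, here is a minimal sketch of a normalized order schema in Python's sqlite3 module. The tables (customer, product, orders, order_item) are hypothetical examples, not a prescribed enterprise model.

```python
import sqlite3

# Hypothetical 3NF-style schema: each fact about a product lives in one row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, name TEXT, unit_price REAL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id)
);
CREATE TABLE order_item (
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES product(product_id),
    quantity   INTEGER,
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO customer VALUES (1, 'Ann');
INSERT INTO product VALUES (1, 'Pen', 1.5);
INSERT INTO orders VALUES (100, 1), (101, 1);
INSERT INTO order_item VALUES (100, 1, 2), (101, 1, 5);
""")

# The product name is stored once, so a rename is a single-row update.
conn.execute("UPDATE product SET name = 'Gel pen' WHERE product_id = 1")

item_names = conn.execute("""
SELECT i.order_id, p.name
FROM order_item i
JOIN product p ON p.product_id = i.product_id
ORDER BY i.order_id
""").fetchall()
print(item_names)
```

        Every order line picks up the new name through the join, at the price of needing that join whenever the name is read.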

Data Vault model

        It consists of three parts: Hub (the keys of core business entities), Link (relationships), and Satellite (entity attributes). It is derived from the ER model, and its design goal is data integration rather than direct use in decision-making analysis.
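        The three parts can be sketched as tables with Python's sqlite3 module. The table names, the key values such as 'c1', and the load_ts and record_source columns are assumptions for this example; real Data Vault implementations typically use hashed keys and stricter loading conventions.

```python
import sqlite3

# Hypothetical Data Vault sketch: two hubs, one link, one satellite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hk TEXT PRIMARY KEY, customer_id TEXT UNIQUE,
    load_ts TEXT, record_source TEXT
);
CREATE TABLE hub_order (
    order_hk TEXT PRIMARY KEY, order_id TEXT UNIQUE,
    load_ts TEXT, record_source TEXT
);
CREATE TABLE link_customer_order (
    link_hk     TEXT PRIMARY KEY,
    customer_hk TEXT REFERENCES hub_customer(customer_hk),
    order_hk    TEXT REFERENCES hub_order(order_hk),
    load_ts TEXT, record_source TEXT
);
CREATE TABLE sat_customer (
    customer_hk TEXT REFERENCES hub_customer(customer_hk),
    load_ts TEXT, name TEXT, city TEXT,
    PRIMARY KEY (customer_hk, load_ts)
);
INSERT INTO hub_customer VALUES ('c1', 'CUST-1', '2024-01-01', 'crm');
INSERT INTO hub_order VALUES ('o1', 'ORD-1', '2024-01-02', 'shop');
INSERT INTO link_customer_order VALUES ('l1', 'c1', 'o1', '2024-01-02', 'shop');
-- Attribute changes are appended as new satellite rows, never updated in place.
INSERT INTO sat_customer VALUES
    ('c1', '2024-01-01', 'Ann', 'Oslo'),
    ('c1', '2024-03-01', 'Ann', 'Bergen');
""")

# Current attributes: the latest satellite row per hub key.
current_rows = conn.execute("""
SELECT h.customer_id, s.city
FROM hub_customer h
JOIN sat_customer s ON s.customer_hk = h.customer_hk
WHERE s.load_ts = (SELECT MAX(load_ts) FROM sat_customer
                   WHERE customer_hk = h.customer_hk)
""").fetchall()
history_count = conn.execute("SELECT COUNT(*) FROM sat_customer").fetchone()[0]
print(current_rows, history_count)
```

        Reading even a single current attribute requires a join and a latest-row filter, which illustrates why this model targets integration and history rather than direct analysis.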

Anchor model

        This is a highly scalable model in which every extension is an addition rather than a modification. To achieve this, it normalizes the model to 6NF, so it essentially becomes a key-value (KV) structure. Enterprises rarely use it.
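        The "additions rather than modifications" idea can be sketched with Python's sqlite3 module: one anchor table holds identities and each attribute gets its own narrow table. The names customer_anchor, customer_name, and customer_email are hypothetical, and the sketch leaves out other constructs a full anchor model would define.

```python
import sqlite3

# Hypothetical anchor-style layout: one table per attribute (6NF-like).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_anchor (customer_id INTEGER PRIMARY KEY);
CREATE TABLE customer_name (
    customer_id INTEGER REFERENCES customer_anchor(customer_id),
    value TEXT, changed_at TEXT,
    PRIMARY KEY (customer_id, changed_at)
);
INSERT INTO customer_anchor VALUES (1), (2);
INSERT INTO customer_name VALUES (1, 'Ann', '2024-01-01'), (2, 'Bo', '2024-01-01');
""")

# Extending the model: a new attribute is a new table, with no ALTER TABLE
# on anything that already exists.
conn.executescript("""
CREATE TABLE customer_email (
    customer_id INTEGER REFERENCES customer_anchor(customer_id),
    value TEXT, changed_at TEXT,
    PRIMARY KEY (customer_id, changed_at)
);
INSERT INTO customer_email VALUES (1, 'ann@example.com', '2024-02-01');
""")

anchor_rows = conn.execute("""
SELECT a.customer_id, n.value, e.value
FROM customer_anchor a
LEFT JOIN customer_name  n ON n.customer_id = a.customer_id
LEFT JOIN customer_email e ON e.customer_id = a.customer_id
ORDER BY a.customer_id
""").fetchall()
print(anchor_rows)
```

        Each attribute table is effectively a (key, value) store, so reassembling a full record costs one join per attribute. That join cost is one plausible reason the model sees little use in practice.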

Data model evaluation criteria

        Clear business processes: ODS holds the original data without modification; DWD is oriented to basic business processes; DIM describes dimension information; DWS computes metrics for small scenarios; ADS should also be layered, separating cross-domain construction from application-oriented construction;

        Understandable metrics: Business processes are divided along a particular transaction flow, the granularity of the detail layer is clear, and historical data can be retrieved. In the summary layer, dimensions and metrics with the same name have the same meaning, so they objectively quantify the business from different perspectives;

        A relatively stable core model: If a business process has been running for a long time and is relatively fixed, it should be consolidated into the public layer as soon as possible to form a reusable core model;

        High cohesion and low coupling: The data models within each subject area must be highly cohesive in business terms. Avoid coupling the metrics of other businesses into one model, which blurs the subject and makes the model less cost-effective.
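        The layer responsibilities listed under the first criterion can be sketched as a tiny pipeline with Python's sqlite3 module. The table names follow the ODS/DIM/DWD/DWS prefixes from the text, but the columns, sample rows, and cleaning rule (dropping rows whose amount is not numeric) are assumptions for illustration.

```python
import sqlite3

# Hypothetical layered pipeline: ODS -> DWD (+ DIM) -> DWS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- ODS: raw records, loaded as-is and never modified.
CREATE TABLE ods_orders (order_id TEXT, product TEXT, amount TEXT, order_date TEXT);
INSERT INTO ods_orders VALUES
    ('1', 'pen',  '15.0', '2024-01-01'),
    ('2', 'lamp', '40.0', '2024-01-01'),
    ('3', 'pen',  'N/A',  '2024-01-02'),
    ('4', 'pen',  '6.0',  '2024-01-02');

-- DIM: descriptive dimension information.
CREATE TABLE dim_product (product TEXT PRIMARY KEY, category TEXT);
INSERT INTO dim_product VALUES ('pen', 'Stationery'), ('lamp', 'Home');

-- DWD: cleaned, typed detail rows for the order business process.
CREATE TABLE dwd_order_detail AS
SELECT CAST(order_id AS INTEGER) AS order_id, product,
       CAST(amount AS REAL) AS amount, order_date
FROM ods_orders
WHERE amount GLOB '[0-9]*';

-- DWS: a small-scenario summary, daily sales per category.
CREATE TABLE dws_daily_category_sales AS
SELECT d.order_date, p.category, SUM(d.amount) AS total_amount
FROM dwd_order_detail d
JOIN dim_product p ON p.product = d.product
GROUP BY d.order_date, p.category;
""")

ods_count = conn.execute("SELECT COUNT(*) FROM ods_orders").fetchone()[0]
dws_rows = conn.execute("""
SELECT order_date, category, total_amount
FROM dws_daily_category_sales
ORDER BY order_date, category
""").fetchall()
print(ods_count, dws_rows)
```

        The raw layer still contains all four source rows, including the malformed one, while the cleaning and aggregation happen only in the downstream layers.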

Origin blog.csdn.net/GX_0824/article/details/132540075