How to Judge Whether a Data Model Is Good or Bad

|0x00 Data Model Selection

Four modeling approaches come up most often: normalized (normal-form) modeling, dimensional modeling, Data Vault, and Anchor. Normalized modeling is popular in traditional industries and dimensional modeling in the Internet industry, while the other two are mostly "heard of but never seen in practice."

Judged purely on design ideas, each of the four has its own merits. But if you ask which is the most mature, normalized and dimensional modeling win, and the Internet industry can almost only choose dimensional modeling, because that is where the most practical experience has accumulated.

This is a bit like comparing software or frameworks. Is Hadoop necessarily the best? Is Java necessarily better than Python? Not really. But Hadoop is arguably the most mature, and Java has the most jobs. Once an ecosystem is established and more people use it, the methodology matures and everything becomes easier to use.

Before discussing the data model, however, we first need to look at what makes a data architecture good.

|0x01 Evaluation Criteria for Data Architecture

Strictly speaking, a data architecture is also a system, just a "data system". So the standards applied to systems in general, such as response speed, reusability, stability, and robustness, can also be used to evaluate a data architecture. Unlike an application system, however, a data system serves decision-making rather than individual requirements, so response speed and reusability carry more weight.

  • Response speed: The main scenarios for a data architecture are business development, data products, and operations analysis. In every one of them, the architecture should respond to demand in the shortest possible time;
  • Reusability: Response speed can only improve once reuse improves. Reuse shows up in indicators such as the number of downstream dependencies, the number of calls, and core field coverage;
  • Stability: Beyond daily tasks running without problems, what matters is how quickly a problem can be located and recovered from once it is discovered;
  • Robustness: Apart from e-commerce and other fields that have been refined over many years, most business models change rapidly. How well the architecture adapts to that change is a test of architectural skill.
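
The reusability indicators can be made concrete with a small calculation. The following is only a minimal sketch: the lineage mapping, the table names, and the definition of a "public layer reuse ratio" are all assumptions made for illustration, not metrics defined by the original article.

```python
from collections import defaultdict

# Hypothetical lineage: each table maps to the upstream tables it reads from.
LINEAGE = {
    "dwd_order": ["ods_order"],
    "dws_order_daily": ["dwd_order", "dim_user"],
    "ads_gmv_report": ["dws_order_daily"],
    "ads_user_report": ["dws_order_daily", "dim_user"],
    "ads_adhoc_export": ["ods_order"],  # skips the public layers entirely
}


def downstream_counts(lineage):
    """Count how many tables read directly from each upstream table."""
    counts = defaultdict(int)
    for upstreams in lineage.values():
        for upstream in set(upstreams):
            counts[upstream] += 1
    return dict(counts)


def public_layer_reuse_ratio(lineage, app_prefix="ads_",
                             public_prefixes=("dwd_", "dws_", "dim_")):
    """Share of application-layer dependencies that point at public layers."""
    total = 0
    reused = 0
    for table, upstreams in lineage.items():
        if not table.startswith(app_prefix):
            continue
        for upstream in upstreams:
            total += 1
            if upstream.startswith(public_prefixes):
                reused += 1
    return reused / total if total else 0.0


print(downstream_counts(LINEAGE))
print(public_layer_reuse_ratio(LINEAGE))
```

In this toy lineage, three of the four application-layer dependencies go through the public layers. The one that reads `ods_order` directly is the kind of table that drags the ratio down.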

|0x02 Data Model Evaluation Criteria

How a data model gets built depends heavily on standards. If everyone writes code in their own style, the business system will probably be unreadable within half a year. No system values "rule of law" more than a data system. A standardized system not only keeps data construction consistent, it also makes business handovers manageable and lays the foundation for automation.

  • Clear business processes: ODS holds the raw data and is not modified; DWD is organized around basic business processes; DIM describes dimension information; DWS computes metrics for the smallest scenarios; ADS should itself be layered, into cross-domain construction and application-oriented construction;
  • Understandable metrics: The business is divided by business transaction process, the granularity of the detail layer is explicit, and historical data can be retrieved. Dimensions and metrics with the same name in the summary layer have the same meaning, so they objectively quantify the business from different perspectives;
  • A relatively stable core model: If a business process has been running for a long time and is relatively fixed, it should be pushed down into the public layer as early as possible to form a reusable core model;
  • High cohesion and low coupling: The data model within each subject area should be highly cohesive. Avoid coupling metrics from other businesses into the same model, which blurs the subject and lowers cost-effectiveness.
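
If table names carry their layer as a prefix, the layering rules above can be checked mechanically. The following is a rough sketch under stated assumptions: the `ALLOWED_UPSTREAM` policy and all table names are hypothetical, since the article does not spell out which layers may read from which.

```python
# Assumed policy (not from the article): the layers each layer may read from.
ALLOWED_UPSTREAM = {
    "ods": set(),
    "dim": {"ods", "dim"},
    "dwd": {"ods", "dim"},
    "dws": {"dwd", "dim", "dws"},
    "ads": {"dws", "dwd", "dim", "ads"},
}


def layer_of(table):
    """Infer the layer from the table-name prefix, e.g. 'dwd_order' -> 'dwd'."""
    layer = table.split("_", 1)[0].lower()
    if layer not in ALLOWED_UPSTREAM:
        raise ValueError(f"unknown layer prefix in table name: {table}")
    return layer


def find_violations(lineage):
    """Return sorted (table, upstream) pairs that break the layering policy."""
    violations = []
    for table, upstreams in lineage.items():
        allowed = ALLOWED_UPSTREAM[layer_of(table)]
        for upstream in upstreams:
            if layer_of(upstream) not in allowed:
                violations.append((table, upstream))
    return sorted(violations)


# Hypothetical lineage: each table maps to the tables it reads from.
lineage = {
    "dwd_order": ["ods_order", "dim_user"],
    "dws_order_daily": ["dwd_order"],
    "ads_gmv_report": ["dws_order_daily"],
    "ads_adhoc_export": ["ods_order"],  # application layer reading raw data
    "dwd_refund": ["dws_order_daily"],  # detail layer reading a summary layer
}

for table, upstream in find_violations(lineage):
    print(f"{table} should not read from {upstream}")
```

A check like this is one small example of the automation that a standardized system makes possible. The policy table is the part a team would need to agree on.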

|0xFF Staying Advanced Through Continuous Construction

Even the dimensional model struggles to fit every application scenario when businesses change this quickly. The debate over data solutions for logistics, finance, and enterprise scenarios, for example, has never stopped. The dimensional model was once the state of the art and comes with a large body of mature methodology and supporting tools. But as business complexity keeps growing, its core concepts (data domain, business process, granularity, dimension, measure, fact, and so on) have gradually faded from the actual model design process, and pragmatic ideas such as large wide tables and deliberate redundancy have gradually become the main design approach.
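
To show what "wide table with redundancy" means in practice, here is a minimal sketch in plain Python. The dimension tables, fact rows, and column names are all hypothetical. It copies dimension attributes onto every fact row, trading storage and update cost for queries that need no joins.

```python
# Hypothetical dimension tables keyed by their ids.
DIM_USER = {1: {"user_name": "alice", "user_city": "Hangzhou"}}
DIM_PRODUCT = {10: {"product_name": "keyboard", "category": "peripherals"}}

# Hypothetical fact rows that reference the dimensions by id.
FACT_ORDERS = [
    {"order_id": 100, "user_id": 1, "product_id": 10, "amount": 59.0},
    {"order_id": 101, "user_id": 2, "product_id": 10, "amount": 59.0},  # unknown user
]


def build_wide_rows(facts, dim_user, dim_product):
    """Copy dimension attributes onto every fact row (a left join)."""
    missing_user = {"user_name": None, "user_city": None}
    missing_product = {"product_name": None, "category": None}
    wide = []
    for fact in facts:
        row = dict(fact)  # keep the original fact columns
        row.update(dim_user.get(fact["user_id"], missing_user))
        row.update(dim_product.get(fact["product_id"], missing_product))
        wide.append(row)
    return wide


WIDE_ORDERS = build_wide_rows(FACT_ORDERS, DIM_USER, DIM_PRODUCT)
for row in WIDE_ORDERS:
    print(row)
```

The redundancy is visible in the output: `product_name` and `category` are repeated on every order row. If a dimension attribute changes, the wide table has to be rebuilt or backfilled.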

So methodology alone is not enough. System products built on advanced technology should also be established step by step, and these may even become the next generation of core products once cloud computing is broadly mature.

In that case, data development itself may be replaced by automated tools in the future, which is a scary thought...


Origin blog.csdn.net/gaixiaoyang123/article/details/108011212