Some thoughts on building a data warehouse


At present, most industries build their data warehouses following Kimball's methodology. In the early stage, to support the business quickly and to let leadership feel the value of the data warehouse (and so win more investment and support), most of the construction is done in a "chimney" (siloed) style. As the business grows rapidly, however, this development model exposes various deficiencies: tight resources, inconsistent metric definitions, difficulty finding the data you need, non-standard development, and immature processes. Any one of these problems takes a lot of time and manpower to solve, and the value gained by solving them is not directly visible to top management. Many things can also change while they are being solved: team members resign, departments are dissolved, or the business itself fails. How do we break out of this awkward situation? Note that this article does not discuss construction details (how to formulate specifications, how to layer, how to model; these will be covered separately later) but rather the thinking needed to break the deadlock.
  
The saying "before the troops move, the provisions go first", recorded in "Nanpi County Chronicles · Fengtuzhixia · Ballads", has been widely circulated, and I believe it applies to data warehouse construction as well. Whether you are building a data warehouse from 0 to 1 or improving one from 1 to 2, you should think systematically before and during construction. This keeps the overall direction clear and also helps every team member grow. For example, think from the perspective of data flow (this thinking also covers the specification, layering, modeling and governance mentioned above):
   1. How does data enter the warehouse? How is ingestion controlled? How do you measure how standardized ingestion is? How is quality guaranteed?
   2. How is data integrated inside the warehouse? Is the process standardized? How is quality guaranteed? How do you measure the effect of integration? How is cost controlled?
   3. How is data leaving the warehouse controlled? Is the data experience smooth for users? How is application value demonstrated?
During construction, this whole series of questions must be thought through, and then a certain amount of manpower must be invested in continuous improvement and optimization. That is the path of healthy development. Anyone working on data warehouses knows that one cannot be built overnight. In fact, we can regard the data warehouse as the "product" of the data team: data is the soul of the product, the model is its form, and the data team are its operators. We can then build the data warehouse using operational strategies and methods. The discussion below covers the following aspects (borrowing the OSM approach to indicators: Objective, Strategy, Measurement):
   a. How to define the scope of operation?
   b. How are the objectives of the operation formulated?
   c. How to implement the operation strategy?
   d. How to evaluate the results of implementation?

Operating scope

An enterprise holds a very large amount of data of many types, but not all of it is maintained by data personnel. Nor should all data be integrated without boundaries: that only increases operating costs, when more energy should go into empowering the business. So the first thing to do is establish data ownership, which means that operating a piece of data is the responsibility of whoever produces it. For example, data generated by a business system should be maintained by the team that owns that system. For data personnel, the scope of operation is the whole closed loop from data ingestion through data processing to data application. In other words, from the moment data enters the warehouse to the moment it leaves, the derived data produced at each step should be operated and maintained by data personnel.

Operational goals

For data personnel, the greatest sense of accomplishment, or sense of mission, comes from "revitalizing" the enterprise's data: letting the data truly show its value, guide the business, drive business innovation, and bring positive returns to the enterprise. To get as close to this as possible, we discuss goals at two levels:
   1. Business level: In the DT (data technology) era, everyone has already felt the convenience that big data applications bring. Everyone in the data industry knows that data is important and that the business should be empowered through data, but whether that empowerment ultimately succeeds is often unclear. Data does act on the business, yet its value cannot be measured directly by a handful of indicators. For example, the business side requests some indicators and statistics in the hope of improving operational efficiency. The efficiency gain they bring sometimes cannot be measured (and sometimes it is not worth developing a second set of indicators just to measure the value of the first, since that would greatly extend the overall development cost and cycle; for value that cannot be measured, what we need to consider is how to use automation and intelligent tooling to meet business needs at lower development cost). So what data developers can aim for is complete, fast and accurate: full coverage of the business, rapid support for the business, and accurate judgment of the business.
   2. User level: In daily work, data developers usually do not act on the business directly. They collaborate with front-line business staff, data analysts and product managers. But data serves the whole enterprise, so anyone who uses data is connected to the data team to some degree. For data developers, data is a "product" that everyone maintains together. From a product perspective, anyone who uses data is our user. We want users, throughout their data journey, to find data without hassle, query it comfortably, and use it with confidence. This is our product design goal.

Operation strategy

Guidelines

Before formulating specific strategies, we usually set a few principles as our guidelines for action, to avoid detours during implementation. Of course, everyone understands these baseline principles differently, and guidelines that fit your current situation can be adopted instead. I believe that throughout the operation process, asset grading should be combined with a metadata-driven approach, each supporting the other.
For asset grading, we need to understand what counts as an asset and how to grade it. After we delineate the scope of operations and establish data ownership, we treat each type of data as an asset and assign it a level (for example, rated by business impact, frequency of use, and so on). Once all data assets are graded, we have a basis for differentiating later operations, because each level calls for a different operation strategy.
Many articles already explain the concept of metadata in detail, and I previously compiled 25 metadata management solutions (interested readers can take a look). Now that assets have been graded, why combine them with metadata? The point is to measure the effect of our operations quickly, and also to let senior management see results, which strengthens their confidence and commitment. If a strategy has no effect or has a negative impact, it must be adjusted in time rather than followed down a dead end.
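The grading idea in this section can be made concrete with a small sketch. Everything here is an assumption for illustration: the `DataAsset` fields, the 1 to 5 impact score, the query-count thresholds, and the convention that level 1 is the most important level are not taken from the article, and the table names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataAsset:
    name: str
    business_impact: int  # hypothetical score from 1 to 5, where 5 means core business
    weekly_queries: int   # hypothetical measure of frequency of use


def grade_asset(asset: DataAsset) -> int:
    """Return an asset level, where 1 is the most important.

    The thresholds are illustrative assumptions and should be tuned per company.
    """
    if asset.business_impact >= 4 or asset.weekly_queries >= 1000:
        return 1
    if asset.business_impact >= 3 or asset.weekly_queries >= 100:
        return 2
    if asset.weekly_queries >= 10:
        return 3
    return 4


# Hypothetical assets, graded by business impact and usage frequency.
assets = [
    DataAsset("dwd_order_detail", business_impact=5, weekly_queries=4200),
    DataAsset("dws_user_active_1d", business_impact=3, weekly_queries=350),
    DataAsset("ads_campaign_tmp", business_impact=1, weekly_queries=2),
]
grades = {asset.name: grade_asset(asset) for asset in assets}
```

In practice the inputs would come from metadata (query logs, lineage, business tags) rather than hand-entered values, which is one reason grading and metadata work well together.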

Business level

Once the overall direction is clear and the guiding principles are set, we need a different strategy for each goal. Next we discuss them one by one.

Full coverage

Full coverage of the business is one way to evaluate how well a data warehouse has been built. It does not necessarily mean that every business must be included (this depends on each company's characteristics), but at least the company's core business, the foundation it lives on, must be there, with more added to the warehouse bit by bit as the business expands. I think it is hard to define indicators for business completeness. It is usually judged by the subject domains built in the warehouse, but that brings us back to a standards problem (some models may have been built without a subject-domain label). This is why the bus matrix must be drawn up in the early stage of the data warehouse, and why specifications must be formulated: this preparation is also what makes it possible to measure value later. Generally speaking, we should focus on the completeness of assets at the second level and above. Most assets at the third level and below are built on top of those at the second level and above, which is similar to the concept of layering.
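As a rough illustration of judging coverage by subject domain, the sketch below computes two numbers for assets at the second level and above: the share of models that carry a subject-domain label, and the required domains that no labelled model covers yet. The record layout (`grade`, `subject_domain`), the domain names and the table names are assumptions, not something the article defines.

```python
def subject_domain_coverage(models, required_domains, max_grade=2):
    """Summarise subject-domain coverage for assets at level `max_grade` and above.

    Level 1 is assumed to be the most important, so "and above" means grade <= max_grade.
    """
    core = [m for m in models if m["grade"] <= max_grade]
    labelled = [m for m in core if m.get("subject_domain")]
    covered = {m["subject_domain"] for m in labelled}
    return {
        "label_rate": len(labelled) / len(core) if core else 0.0,
        "missing_domains": sorted(set(required_domains) - covered),
    }


# Hypothetical model inventory, e.g. exported from a metadata system.
models = [
    {"name": "dwd_order_detail", "grade": 1, "subject_domain": "trade"},
    {"name": "dwd_user_profile", "grade": 2, "subject_domain": "user"},
    {"name": "dws_pay_summary", "grade": 2, "subject_domain": None},
    {"name": "ads_tmp_report", "grade": 4, "subject_domain": None},
]
report = subject_domain_coverage(models, ["trade", "user", "logistics"])
```

A low label rate points at the standards problem described above, while a non-empty list of missing domains points at a real gap in business coverage.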

Rapid support

For Internet companies, the business changes rapidly. We cannot stop the changes, but we must keep up with them, and this tests the data warehouse team's modeling ability. If a new business is launched and needs data analysis quickly, but the team needs a week to rebuild models and redo the aggregations, then the data team is just for show. The data team must therefore support the business quickly. This is why the "middle platform" has been talked about so much in recent years: the aim is to improve reusability, thicken the common layer, and achieve agile development. This is also the main concern for assets at the second level and above. When the call to battle comes, your equipment must be ready. I believe you can assess whether you have achieved rapid support by combining engineering indicators with the extensibility of your models: for example, whether lead time and cost from receiving a requirement to delivery have improved, whether your models change frequently, and how wide the impact of each change is.
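Two of the engineering indicators mentioned here can be sketched in a few lines: the lead time from receiving a requirement to delivery, and how many models had to change in a period. The record fields (`received`, `delivered`, `model`), the dates and the table names are hypothetical, and using the median as the summary is a choice made for this sketch.

```python
from datetime import date
from statistics import median


def median_lead_time_days(requests):
    """Median number of days from receiving a requirement to delivering it."""
    return median((r["delivered"] - r["received"]).days for r in requests)


def changed_model_share(change_log, model_count):
    """Share of all models that were changed at least once in the period."""
    changed = {entry["model"] for entry in change_log}
    return len(changed) / model_count


# Hypothetical requirement tickets and model change records.
requests = [
    {"received": date(2024, 1, 2), "delivered": date(2024, 1, 5)},
    {"received": date(2024, 1, 8), "delivered": date(2024, 1, 10)},
    {"received": date(2024, 1, 15), "delivered": date(2024, 1, 22)},
]
change_log = [
    {"model": "dws_trade_1d"},
    {"model": "dws_trade_1d"},
    {"model": "dwd_order_detail"},
]

lead_time = median_lead_time_days(requests)
change_share = changed_model_share(change_log, model_count=10)
```

Tracking both over time shows whether a thicker common layer is paying off: lead time should fall while the share of models touched per requirement stays small.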

Accurate judgment

Everyone working in the data industry shares one important awareness, which strictly speaking is a matter of professional ethics: data security. Making sure that corporate assets are not lost is the moral bottom line for every professional. Quality awareness matters too: do your best to make sure the data you provide is accurate. Why only "do your best"? Because quality cannot be achieved by one person alone. Many links are involved, communication costs are high, and if you push the issue it may not align with other teams' goals. This is why the whole organization needs a standard system and process to drive it. But that does not mean we do nothing. At the very least, we need to build an indicator system so that rules are defined in advance, monitoring happens during processing, and after-the-fact reports feed back into how we operate. We also need to make sure the entire process from warehouse entry onward is strictly controlled, so that the business side trusts us and relies on us more. Accuracy does not depend on asset level: every link must be guaranteed. This is the bottom line.
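The "define rules in advance, monitor during processing, report afterwards" loop can be sketched as a list of named checks applied to each row, with a per-rule failure count as the report. The rule names, the row fields (`order_id`, `amount`) and the in-memory rows are assumptions for illustration. A real warehouse would more likely run such rules as SQL or inside a data quality tool.

```python
# Rules are defined in advance as (name, check) pairs; each check returns True when a row passes.
RULES = [
    ("order_id is present", lambda row: row.get("order_id") is not None),
    ("amount is non-negative", lambda row: row.get("amount") is not None and row["amount"] >= 0),
]


def run_checks(rows, rules=RULES):
    """Return the number of failing rows for each rule."""
    return {
        name: sum(1 for row in rows if not check(row))
        for name, check in rules
    }


# Hypothetical batch of rows being loaded.
rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": None, "amount": 5.0},
    {"order_id": 3, "amount": -2.0},
]
failures = run_checks(rows)
```

Because accuracy applies to every asset level, the same rule set would run on every layer, and only the alerting urgency would differ by level.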

User level

Easy to find

Anyone who uses data is a user of our product. We want users to find the functions they need in the product quickly, because that convenience increases user stickiness. Applied to data: when you see a table or a field, can you immediately understand what it represents? There is a lot to do here, including the metadata-driven approach mentioned above. This convenience can be built on metadata, but it also depends on standards and process, because business metadata ultimately has to be filled in by people, and technical tools can only assist with verification. There are many ways to measure this convenience. If your environment is relatively complete and one-stop, you can collect user behavior for analysis, such as star ratings, number of interactions, number of complaints, number of favorites, number of clicks, and so on. If your environment and tooling are relatively simple, you can measure it by manually counting how often users have to ask questions, or by collecting user feedback. This kind of convenience mainly concerns assets at the second level and above.
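One simple metadata-based proxy for "can you understand a table or field at a glance" is the share of tables and columns that have a non-empty description. The sketch below assumes a hypothetical catalog export where each table has a `comment` and a list of `columns` with their own `comment`. This is an illustrative metric, not one the article prescribes.

```python
def description_coverage(tables):
    """Share of tables and columns that carry a non-empty description."""
    total = 0
    described = 0
    for table in tables:
        comments = [table.get("comment")] + [c.get("comment") for c in table["columns"]]
        total += len(comments)
        described += sum(1 for text in comments if text and text.strip())
    return described / total if total else 0.0


# Hypothetical catalog export.
tables = [
    {
        "name": "dwd_order_detail",
        "comment": "Order detail, one row per order line",
        "columns": [
            {"name": "order_id", "comment": "Order ID"},
            {"name": "amt", "comment": ""},
        ],
    },
    {
        "name": "dwd_user_profile",
        "comment": None,
        "columns": [{"name": "uid", "comment": "User ID"}],
    },
]
coverage = description_coverage(tables)
```

A technical check like this only verifies that a description exists. Whether the description is actually helpful still needs the manual review and user feedback described above.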

Comfortable to query

Real-time data is the goal the whole industry is currently pursuing. We need to deliver data as fast as possible, so that users get a smooth experience without any lag. In practice, this means weighing performance against cost. Generally speaking, assets at the second level and above must guarantee performance, while assets at the third level and below should focus on cost control. That is the balance between performance and cost. Measuring this goal involves whether output is timely, whether it is stable, how many resources it consumes, and so on.
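Timeliness can be tracked per asset level so that the performance guarantee for important assets is visible separately from the rest. The sketch below assumes each job run records its asset level, its finish time and its deadline as minutes since midnight. These field names and values are hypothetical.

```python
def on_time_rate_by_grade(runs):
    """Return, for each asset level, the share of runs that finished by their deadline."""
    stats = {}
    for run in runs:
        total, on_time = stats.get(run["grade"], (0, 0))
        met_deadline = run["finished"] <= run["deadline"]
        stats[run["grade"]] = (total + 1, on_time + int(met_deadline))
    return {grade: on_time / total for grade, (total, on_time) in stats.items()}


# Hypothetical job runs; times are minutes since midnight (330 means 05:30).
runs = [
    {"grade": 1, "finished": 330, "deadline": 360},
    {"grade": 1, "finished": 380, "deadline": 360},
    {"grade": 3, "finished": 540, "deadline": 720},
]
rates = on_time_rate_by_grade(runs)
```

The same grouping can be applied to resource consumption, which gives the cost side of the balance for assets at the third level and below.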

Confident to use

Users should be able to trust the data completely when they use it, just as people trust one another. If you can ensure that the data you deliver never leaves the receiving side with doubts, you are doing very well. This quality requirement is the same as the accurate judgment discussed at the business level, and it applies to assets at every level.

Goal evaluation

The strategies discussed above also involve indicators for measuring each goal. Here is a summary. If anything is wrong or missing, corrections are welcome.


Origin blog.csdn.net/qq_28680977/article/details/121894350