"Small but Beautiful" Data Governance Practice

About the author

@edan

Former business data analyst, current TMD data product manager.

I look forward to doing something interesting together with fellow data practitioners~



1    Background and scope of application


Approaching data governance from the service experience: Data serves the business, and because the business keeps developing, various problems emerge once data construction reaches a certain stage. The underlying data models become too redundant and the data links too complicated, and the business side also runs into experience problems when using the data.


When people talk about data governance, they often want to solve the problem at its root: define a series of indicator specifications and modeling specifications, then enforce them with the help of metadata management tools. The development side takes many actions, but the business side often notices little of it, so the development team ends up pleased with work that only it can see.


In practice, the author drives data governance from the perspective of user experience, so that business colleagues can directly feel the upgrade in data services that governance brings.


Since this article mostly draws on experience accumulated while serving business lines (data BP), it is most relevant to data developers and data product managers who serve specific business lines.


2    Methodology


2.1   Status analysis



Project background: 90% of the scenarios in the business the author serves rely on self-service data retrieval, but the current indicators and dimensions are hard to use. This makes the data costly for the business to use and even leads to wrong numbers being pulled because indicators are confused with one another.


There are two root causes. On the one hand, the core indicator system was never clearly defined during construction, so many stage-specific indicators and dimensions were added over time. On the other hand, over years of business development the data warehouse supported many short-lived "flip-flop" projects, and the indicators and dimensions were not cleaned up after those businesses went offline.


Project goal: Govern the indicators and dimensions of the business-related data sets (a data set here is the set of indicators and dimensions grouped by business line) to make the data sets easier to use, and distill a prototype data set governance SOP that applies to the business service system.


2.2   Project process method

2.2.1 Systematization of problems

Based on how the business actually uses the data, the problems found while sorting out the indicators and dimensions fall into four types:

1) Many indicators have confusingly similar names (middle-platform content volume / content volume / middle-platform content volume, deduplicated), and no specific difference is visible on the surface. This is partly due to the coexistence of old and new models, and partly because the corresponding metadata systems retrieve the numbers in different ways in different scenarios;


2) Compound indicators are redundant: for example, there are many indicators such as "new middle-platform content in the published state". Consider expressing them as a core indicator combined with dimensions (the "middle-platform published content volume" indicator restricted by the "new content of the day" dimension);


3) Indicator naming is not standardized (for example, "information flow_intrusive exposure_7-day retention rate": the name does not say which action is being retained into which action);


4) Indicator comments do not explain the true meaning (for example, middle-platform content volume is annotated as "middle-platform content volume (status not limited)", but the underlying logic actually restricts it to the published state).
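The first problem type, names that look alike on the surface, can be surfaced mechanically before anyone reviews the catalog by hand. The following is only a minimal sketch: the indicator names are hypothetical identifiers loosely modeled on the examples above, and the similarity threshold of 0.5 is an assumption that would need tuning on a real catalog.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical indicator names; they are illustrative, not the author's catalog.
INDICATOR_NAMES = [
    "middle_platform_content_volume",
    "content_volume",
    "middle_platform_content_volume_dedup",
    "feed_click_uv",
]


def similar_name_pairs(names, threshold=0.5):
    """Return (name_a, name_b) pairs whose character similarity reaches the threshold."""
    pairs = []
    for a, b in combinations(sorted(names), 2):
        # ratio() is 2 * matched_characters / total_characters, in [0, 1].
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            pairs.append((a, b))
    return pairs


flagged = similar_name_pairs(INDICATOR_NAMES)
for a, b in flagged:
    print(f"review together: {a} / {b}")
```

A check like this only finds candidates. Whether two similar names really mean the same thing still has to be confirmed against the underlying logic, which is exactly what the fourth problem type warns about.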


2.2.2 Multi-party collaboration to formulate a plan

1) Business analysts summarize the core indicator and dimension system based on their understanding of the business;


2) The data product manager flags the indicators and dimensions that show the four types of problems above;


3) The business side confirms whether an indicator can be taken offline, based on how heavily the indicator or dimension is used and on business needs;


4) Data development suggests indicators whose calculation can be simplified by combining dimensions (for example: combining the "added on the same day" dimension with the "middle-platform published content volume" indicator produces a new-content funnel, so the original new-content indicators can be dropped).


The whole process is initiated and led by the data product manager, and colleagues in other roles contribute suggestions from their own areas of expertise.
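The suggestion in step 4, replacing a family of compound indicators with one core indicator plus dimensions, can be illustrated with a small sketch. The record fields (`status`, `is_new_today`) and the function name are hypothetical and are not taken from the author's data model.

```python
# Hypothetical content records; field names are illustrative assumptions.
RECORDS = [
    {"content_id": 1, "status": "published", "is_new_today": True},
    {"content_id": 2, "status": "published", "is_new_today": False},
    {"content_id": 3, "status": "draft", "is_new_today": True},
    {"content_id": 4, "status": "published", "is_new_today": True},
]


def published_content_volume(records, **dimension_filters):
    """Core indicator: count published content, optionally restricted by dimension values."""
    return sum(
        1
        for record in records
        if record["status"] == "published"
        and all(record[dim] == value for dim, value in dimension_filters.items())
    )


# One core indicator answers both questions; no separate
# "new published content" indicator has to be maintained.
total_published = published_content_volume(RECORDS)
new_published_today = published_content_volume(RECORDS, is_new_today=True)
print(total_published, new_published_today)
```

The design choice is that every additional cut of the data becomes a dimension filter rather than a new named indicator, which is what keeps the indicator list short.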


2.2.3 From shallow to deep

Prioritize surface-level governance of the indicators and dimensions to improve the data experience that the business side can perceive, taking three types of action at the application layer depending on the situation:

1) Rename non-standard indicators (indicators and dimensions whose names are hard to understand or non-standard are renamed according to company standards);


2) Take useless indicators and data models offline (indicators and dimensions that are no longer used are taken offline; a model is taken offline once all of its indicators have been confirmed for retirement);


3) Merge dimensions and indicators that have the same meaning but different names (in the author's team, this mainly results from irregular data construction in the early days. For the city name dimension, for example, different fact table models join different city dimension tables, and those dimension tables name the field differently, so the dimension table used by each fact table model needs to be unified).
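For the third action, a first step is simply to list which fact tables expose the same dimension under a different name. The sketch below covers only that naming side; the alias map, table names, and column names are all hypothetical, and actually switching each fact table to a single dimension table is a separate modeling task.

```python
# Hypothetical alias map: each variant column name points to one canonical dimension name.
DIMENSION_ALIASES = {
    "city_name": "city_name",
    "city": "city_name",
    "cityname": "city_name",
    "city_nm": "city_name",
}

# Hypothetical fact tables and the columns they expose.
FACT_TABLE_COLUMNS = {
    "fact_content_publish": ["dt", "city", "content_cnt"],
    "fact_feed_exposure": ["dt", "city_nm", "exposure_cnt"],
    "fact_user_active": ["dt", "city_name", "active_uv"],
}


def plan_renames(table_columns, aliases):
    """Return {table: {old_column: canonical_column}} for columns that need renaming."""
    plan = {}
    for table, columns in table_columns.items():
        renames = {
            column: aliases[column]
            for column in columns
            if column in aliases and aliases[column] != column
        }
        if renames:
            plan[table] = renames
    return plan


rename_plan = plan_renames(FACT_TABLE_COLUMNS, DIMENSION_ALIASES)
for table, renames in rename_plan.items():
    print(table, renames)
```

Tables that already use the canonical name are left out of the plan, so the output doubles as a work list for the unification.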


After the surface-level governance of the indicators and dimensions is complete, the data warehouse team further optimizes the data links and carries out deeper data governance to improve the efficiency of underlying data construction and data production.


2.3   Project effect

By the end of the project, business indicators had been reduced from 200+ to about 80, and dimensions from 150+ to about 70. After the governance results went live, the business side gave this feedback: "The efficiency has indeed improved! We no longer have to worry about clicking the wrong indicator. It also reduces the communication cost when other collaborating business teams use the data set."


3   Summary


Takeaways from the project worth summarizing:

 Governance is implemented in a "customer-centric" way. It proceeds from the surface inward: first streamline the indicator and dimension layer, which the business perceives most directly; then improve the speed the business experiences when using the data; and finally govern the data warehouse links from the bottom up to achieve a long-term reduction in data development costs.


 In collaboration, we follow the principle of "cooperation and win-win" and work out optimization solutions together with the BAs and the business side. This ensures from multiple perspectives that the final results are reliable, and it also lets the related parties see the otherwise invisible foundational work of data governance. As a result, the governance project achieved tangible results.







Origin blog.51cto.com/13526224/2665353