Enterprise Data Governance - Experience Sharing

This article has been synced to the public Yuque knowledge base "Big Data Technology Architecture Manual-1". Reply "mini program registration code" to the official account to access the interview-question mini program for free.

Foreword

As a data practitioner, I often jokingly call myself an "SQL Boy". One day I realized that an SQL Boy can take on more advanced work: data governance. Over the past two years, many experts have shared a wealth of practical insights on data governance and digital transformation, and I have learned a lot from them. But mastering this material is very different from learning to program. With programming, a few simple demos are enough to unite knowledge and practice; governance, by contrast, has to combine organization, process, culture, and systems, and requires thinking from a higher level. Only when timing, circumstances, and people all line up can knowledge and practice truly come together.

Interestingly, I recently chatted with a few good friends about what they were working on, and found that almost all of us were doing the same thing: reducing costs and increasing efficiency. It is not hard to see why. Under the influence of major trends such as the pandemic and the macroeconomic slowdown, enterprises have made transformation, cost reduction, and efficiency gains their primary goals. Right now, many enterprises' business goals coincide with the goals of governance work, which gives data practitioners good opportunities for growth. Last year I was fortunate to lead our company's data governance work. This article is partly a retrospective of that work, and partly an attempt to offer something useful to readers who are involved in governance themselves.

My knowledge and experience are limited, so if you spot any mistakes, please point them out.

Before governance

Looking back, the team had long approached governance in an ad hoc, patch-the-holes manner; truly systematic governance only began in the last quarter of 2021. Given the company's situation, the team did a lot of preparatory work before starting, centered on five questions:
1. What are our current internal pain points?
2. What are the boundary and depth of governance?
3. How do we plan the execution path for governance work?
4. How do we establish a governance evaluation system that fits our own situation?
5. How do we balance governance work against day-to-day business support?

As the saying goes: with the Way but without technique, technique can still be acquired; with technique but without the Way, one stops at technique. Readers with some background in data governance will find the points above covered in the relevant books.

Each of these points is discussed in detail below.

Pain points

Pain point 1: In the early days of the data warehouse, the priority was to support the business quickly. Combined with fairly high staff turnover and weak documentation habits, this left many historical tasks whose originating business requirements and application value can no longer be traced, and we ended up in the awkward position of not daring to take them offline (missing business metadata).

Pain point 2: The group's business changes very quickly. Some of the subject areas and domains defined in the early data warehouse design have become unclear or no longer fit, and incomplete tooling cannot enforce standards. As a result, later maintenance of the data warehouse is very costly for the team (unclear, non-standard model design).

Pain point 3: There are currently more than 7,000 tasks on the platform, and much of their calculation logic is duplicated. However, because table-level reference (lineage) information is lacking and there is no unified metric management, it is impossible to pinpoint which tasks perform duplicate computation (duplicated computation, lack of unified metric management).
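To illustrate why table-level reference information matters here, the following is a minimal sketch, assuming a hypothetical mapping from each task to the set of tables it reads. Tasks that read exactly the same inputs are grouped as candidates for duplicated computation. The task and table names are made up, and grouping by inputs is only a first-pass filter, not a complete duplicate detector.

```python
from collections import defaultdict

# Hypothetical table-level lineage: task name -> set of tables the task reads.
# In practice this would be parsed from task SQL or scheduler metadata.
TASK_INPUTS = {
    "dws_order_daily": {"ods_order", "dim_shop"},
    "ads_shop_gmv": {"dim_shop", "ods_order"},
    "ads_user_active": {"ods_login"},
}


def duplicate_candidates(task_inputs):
    """Group tasks that read exactly the same set of input tables.

    A group with more than one task is only a *candidate* for duplicated
    computation; someone still has to compare the actual SQL logic.
    """
    groups = defaultdict(list)
    for task, inputs in task_inputs.items():
        groups[frozenset(inputs)].append(task)
    return [sorted(tasks) for tasks in groups.values() if len(tasks) > 1]


print(duplicate_candidates(TASK_INPUTS))
# [['ads_shop_gmv', 'dws_order_daily']]
```

Using a `frozenset` as the key makes the grouping independent of the order in which input tables were listed.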

Pain point 4: Midnight to 10 a.m. every day is the cluster's peak-load window. As the number of tasks keeps growing, cluster resources become increasingly tight, and task stability and timeliness can no longer be guaranteed (poor task stability and timeliness).

Pain point 5: Across the whole data pipeline, quality has never been taken seriously. Source data is dirty, the processing chain lacks monitoring, and nothing is done after an incident. Currently, quality improves only when problems are reported from the business side (poor data quality).

Pain point 6: The data warehouse is built on the principle of "business-oriented, data-centric", but monitoring across the full link (data entering the warehouse, processing within it, and data leaving it) is weak. Indicators such as task stability, timeliness, model quality, and delivered value have never been effectively measured (weak monitoring, value not visible).

Governance Boundary and Depth

As shown in the figure above, data governance covers data architecture, data modeling and design, data storage and operations, data quality, metadata management, data security, master and reference data, document and content management, data integration and interoperability, and more. In other words, governance spans the entire data life cycle: definition, production, storage, processing, use, and sharing.

Weighing internal conditions, available manpower, and other factors, and building on the six pain points above, I organized the work in phases around the data life cycle, data quality, metadata management, data modeling, and benefit evaluation. Phase goals of differing depth were then set according to how urgent each pain point was.

Given our internal situation, pain points 1, 4, and 6 were both urgent and important, so they were prioritized; this article covers only these three.

Note that governance is not a one-off phase but a long-term, continuous effort. Nor is it rigid: the goals of each phase keep changing and being adjusted to the actual environment and shifting priorities. The best approach is to build governance into the standard daily process.

Governance Execution Path

The DAMA Guide to the Data Management Body of Knowledge (DAMA-DMBOK) defines data governance as follows: Data Governance (DG) is the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets. Its purpose is to ensure that data is managed properly according to data management policies and best practices, and the overall driver of data management is to ensure that an organization gets value from its data.

Guidelines

I see governance as a form of managing data assets, and metadata is indispensable to managing data assets. So before starting the work, we set out a guiding principle: "metadata-driven, assets graded and operated by level, implemented step by step".

Asset grading

We worked from our internal realities and followed this guiding principle. Our metadata management was already fairly mature, since a lot of internal work had gone into it, but our management of data assets was relatively weak. I therefore used the "Alibaba MaxCompute Platform Data Asset Development Standard" as a reference to inventory and grade our internal data assets, applying different operating methods to different grades in pursuit of the ultimate goal.

Classifying and defining data assets lets governance and operations proceed in an orderly way. The ultimate goal is to ensure data quality, accuracy, integrity, consistency, and timeliness, rather than rushing around like a headless fly. Asset grading standards are not fixed and can be agreed according to each enterprise's situation. Generally, data is graded by its importance and its impact on the business (the levels below follow Alibaba's MaxCompute standard):
● Destructive: an error in the data would cause major asset losses and a significant loss of revenue. Marked A1.
● Global: the data is used, directly or indirectly, for enterprise-level business, effect evaluation, and important decision-making. Marked A2.
● Partial: the data is used, directly or indirectly, for the operations and reporting of particular business lines; a problem would affect that business line or reduce its efficiency. Marked A3.
● General: the data is mainly used for day-to-day analysis, and problems have minimal impact. Marked A4.
● Unknown: the data's application scenarios cannot be determined. Marked Ax.

These categories decrease in importance in order: A1 > A2 > A3 > A4 > Ax. If a piece of data serves multiple application scenarios, it is graded by the most important one.
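The "take the most important scenario" rule can be sketched in a few lines. This is a minimal illustration, assuming each application scenario of a piece of data has already been assigned one of the grades above; the function name is hypothetical.

```python
# Grades from most to least important, following the list above.
GRADE_ORDER = ["A1", "A2", "A3", "A4", "Ax"]


def asset_grade(scenario_grades):
    """Grade a piece of data by the most important scenario that uses it.

    Data with no known application scenario is graded Ax (unknown nature).
    """
    if not scenario_grades:
        return "Ax"
    # The lowest index in GRADE_ORDER is the most important grade.
    return min(scenario_grades, key=GRADE_ORDER.index)


print(asset_grade(["A3", "A2", "A4"]))  # A2
print(asset_grade([]))  # Ax
```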

Asset grades should be marked along the entire link, from data entering the warehouse to data leaving it, so that the grade of a given asset can be derived in reverse from its downstream uses. When grading our internal data assets, I used four levels. The figure below shows an excerpt of the share of tasks at each level:

Note: When grading data assets, you can choose whatever measurement criteria suit your actual situation. The ultimate goal is to make management easier.
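Reverse derivation along the link can be sketched as propagating grades upstream through lineage: a table inherits the most important grade of anything that (directly or transitively) consumes it. This is a minimal sketch under assumptions: the lineage map and table names are hypothetical, grades are known only for some end-use tables, and the lineage graph is acyclic.

```python
GRADE_ORDER = ["A1", "A2", "A3", "A4", "Ax"]

# Hypothetical lineage: table -> tables that read from it (must be acyclic).
DOWNSTREAM = {
    "ods_order": ["dwd_order"],
    "dwd_order": ["ads_finance_report", "ads_ops_dashboard"],
    "ads_finance_report": [],
    "ads_ops_dashboard": [],
}


def derive_grades(downstream, known_grades):
    """Reverse-derive grades: every table gets the most important grade among
    its own known grade (Ax if none) and all of its transitive consumers."""
    memo = {}

    def grade(table):
        if table not in memo:
            candidates = [known_grades.get(table, "Ax")]
            candidates += [grade(child) for child in downstream.get(table, [])]
            memo[table] = min(candidates, key=GRADE_ORDER.index)
        return memo[table]

    return {table: grade(table) for table in downstream}


grades = derive_grades(DOWNSTREAM, {"ads_finance_report": "A1", "ads_ops_dashboard": "A3"})
print(grades["ods_order"])  # A1: it feeds the A1 finance report
```

Memoization keeps each table's grade computed once even when many consumers share the same upstream.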

Cost management

As mentioned earlier, based on our actual situation and the urgency of the problems, I made timeliness, stability, knowledge-base accumulation, full-link monitoring, and demonstrating value the focus of the first phase of governance.

Guaranteeing timeliness and stability falls under cost management. Many readers already have a fairly clear picture of cost management techniques; below I describe the measures I took in this area.

Full-link monitoring provides a reliable foundation for follow-up governance and daily operations. It covers every link: data entering the warehouse, storage, standardization, reusability, output latency and resource usage, and the types and frequency of data exports. Monitoring every link depends on basic data: scheduling data, platform audit logs, resource allocation data, configuration data, and many other supporting datasets all have to be collected for analysis and monitoring. (The figure has been anonymized and is low-resolution.)
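As one example of what scheduling data makes possible, the following sketch counts how many tasks are running in each hour, which is how a peak-load window like the one in pain point 4 shows up. It assumes a hypothetical run log with start and end hours within a single day; real monitoring would use finer-grained timestamps.

```python
from collections import Counter

# Hypothetical scheduling log: (task, start_hour, end_hour) within one day,
# with end_hour exclusive. Real data would come from the scheduler's run records.
RUNS = [
    ("t1", 0, 3),
    ("t2", 1, 4),
    ("t3", 2, 9),
    ("t4", 14, 15),
]


def hourly_concurrency(runs):
    """Count how many tasks are running during each hour of the day."""
    load = Counter()
    for _task, start, end in runs:
        for hour in range(start, end):
            load[hour] += 1
    return load


load = hourly_concurrency(RUNS)
peak_hour, peak_count = max(load.items(), key=lambda item: item[1])
print(peak_hour, peak_count)  # 2 3
```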

Data governance is often thankless work, and it needs strong backing from senior leadership to keep going. Establishing and refining a system of value metrics is therefore the best way to demonstrate governance results.

Next, I describe the measures I took for timeliness and stability. They should be combined with the asset inventory and grading above: which grade to prioritize and which measures to apply can be decided according to your own situation.

Task optimization

Task optimization here covers small-file optimization, partition optimization, over- or under-provisioned resources, and other problems that can be fixed at the code level.
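For small-file optimization, a first step is finding partitions made up of many files much smaller than an HDFS block. The sketch below assumes 128 MB blocks (the HDFS default) and hypothetical thresholds for what counts as "small"; the file sizes would come from a storage listing, and the partition names are made up.

```python
MB = 1024 * 1024
# Assumption: 128 MB is the HDFS default block size; "small" means well below it.
BLOCK_SIZE = 128 * MB


def small_file_partitions(partitions, min_ratio=0.5, min_files=10):
    """Return partitions holding many files whose average size is below
    min_ratio * BLOCK_SIZE. partitions maps partition name -> file sizes in bytes."""
    flagged = []
    for name, sizes in partitions.items():
        if len(sizes) >= min_files and sum(sizes) / len(sizes) < min_ratio * BLOCK_SIZE:
            flagged.append(name)
    return flagged


def target_file_count(total_bytes, target_bytes=BLOCK_SIZE):
    """Number of output files so that each is at most about target_bytes."""
    return max(1, -(-total_bytes // target_bytes))  # ceiling division


partitions = {
    "dt=2022-05-01": [1 * MB] * 200,  # 200 tiny files
    "dt=2022-05-02": [120 * MB] * 10,  # already close to block size
}
print(small_file_partitions(partitions))  # ['dt=2022-05-01']
print(target_file_count(200 * MB))  # 2
```

The target file count is what a rewrite of the flagged partition would aim for; how the merge itself is done depends on the engine.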

Taking tasks offline

Worthless tasks, such as zombie tasks, idle tasks, and tasks past their normal life cycle, are taken offline and cleaned up.
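One way to shortlist zombie or idle tasks is to combine audit-log reads with scheduler dependencies. This is a minimal sketch under assumptions: the metadata fields, task names, and the 90-day idle threshold are all hypothetical, and the output is a list of candidates for owners to confirm, not an automatic deletion list.

```python
from datetime import date

# Hypothetical per-task metadata: last date the output table was read
# (e.g. from platform audit logs; None = never read) and the number of
# downstream tasks that depend on it in the scheduler.
TASKS = {
    "tmp_campaign_2020": {"last_read": date(2021, 3, 1), "downstream": 0},
    "dws_order_daily": {"last_read": date(2022, 5, 20), "downstream": 4},
    "ads_old_report": {"last_read": None, "downstream": 0},
}


def offline_candidates(tasks, today, idle_days=90):
    """Tasks with no downstream dependents whose output has not been read
    for more than idle_days. Owners still confirm before anything is removed."""
    result = []
    for name, meta in tasks.items():
        last = meta["last_read"]
        idle = last is None or (today - last).days > idle_days
        if idle and meta["downstream"] == 0:
            result.append(name)
    return result


print(offline_candidates(TASKS, today=date(2022, 5, 28)))
# ['tmp_campaign_2020', 'ads_old_report']
```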

Computation lapse/task degradation

The cluster runs under heavy load during the early-morning peak every day, and many tasks fail in this window because of resource contention. Unimportant tasks can be degraded, freeing resources for high-priority tasks so that their output is delivered on time.
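Degradation can be pictured as admitting tasks into the peak window in grade order and deferring whatever does not fit. The sketch below is a simplified greedy model, assuming each task has a known grade and a fixed core requirement and the peak window has a fixed core budget; real schedulers and resource queues are more nuanced, and all names and numbers are hypothetical.

```python
GRADE_ORDER = ["A1", "A2", "A3", "A4", "Ax"]


def plan_peak_window(tasks, capacity):
    """Greedily admit tasks in grade order until capacity runs out.

    tasks: list of (name, grade, cores). Tasks that do not fit are degraded,
    i.e. deferred until after the peak. Note that a smaller, lower-grade task
    can still fill leftover capacity after a larger task was deferred.
    """
    admitted, deferred = [], []
    used = 0
    for name, grade, cores in sorted(tasks, key=lambda t: GRADE_ORDER.index(t[1])):
        if used + cores <= capacity:
            admitted.append(name)
            used += cores
        else:
            deferred.append(name)
    return admitted, deferred


tasks = [
    ("report_a4", "A4", 30),
    ("finance_a1", "A1", 50),
    ("ops_a3", "A3", 40),
    ("ads_a2", "A2", 30),
]
print(plan_peak_window(tasks, capacity=100))
# (['finance_a1', 'ads_a2'], ['ops_a3', 'report_a4'])
```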

Engine switching

We currently use two computing engines internally: Hive on MapReduce and Spark. Spark supports in-memory iterative computation, but in the early days there were no strict rules for requesting resources, and team members picked engines for scheduled tasks arbitrarily. As the number of tasks grew, component stability declined and task output slowed. To fix this, we set strict standards for resource allocation and engine selection: the Spark engine goes first to high-priority tasks, so that their output is guaranteed as far as possible.
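A standard like this can be encoded as a small rule table that task submissions are checked against. The sketch below is illustrative only: it assumes Spark is reserved for grades A1 and A2, and the per-grade memory ceilings are hypothetical placeholders rather than values from our actual standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStandard:
    engines: frozenset  # engines a task of this grade may use
    max_memory_gb: int  # hypothetical per-task memory ceiling


# Illustrative standard only: Spark is reserved for high-priority grades.
STANDARDS = {
    "A1": RunStandard(frozenset({"spark", "hive-on-mr"}), 64),
    "A2": RunStandard(frozenset({"spark", "hive-on-mr"}), 32),
    "A3": RunStandard(frozenset({"hive-on-mr"}), 16),
    "A4": RunStandard(frozenset({"hive-on-mr"}), 8),
    "Ax": RunStandard(frozenset({"hive-on-mr"}), 4),
}


def check_submission(grade, engine, memory_gb):
    """Return a list of violations of the grade's standard (empty if compliant)."""
    standard = STANDARDS[grade]
    problems = []
    if engine not in standard.engines:
        problems.append(f"{grade} tasks may not use {engine}")
    if memory_gb > standard.max_memory_gb:
        problems.append(f"{memory_gb} GB exceeds the {standard.max_memory_gb} GB limit for {grade}")
    return problems


print(check_submission("A1", "spark", 48))  # []
print(check_submission("A4", "spark", 32))  # two violations
```

Returning a list of violations, rather than a single pass/fail flag, makes it easy to show the submitter everything that needs fixing at once.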

Model optimization

Some high-priority tasks are wide-table models that cannot be degraded, or that have no room left for optimization after all the other measures have been applied. In these cases we turn to model optimization, splitting or merging models where necessary. Model governance proper belongs to the next phase; in the cost governance phase it is only a stopgap.

Exception push

Early on, data warehouse task alerts were sent only by email; SMS and WeCom (Qiwei) notifications were added later. Because data warehouse tasks are complex and monitored along many dimensions (retries, failures, missing dependencies, quality checks, enumeration management, and so on), a flood of alert emails was common. Over time, team members became numb to these notifications and stopped handling abnormal tasks promptly, and the sheer volume of alert emails easily buried important messages, which then went unread.

To cure the alert fatigue caused by multi-channel pushes and make sure problems are resolved promptly, we simplified the push methods and combined several measures, such as making alerts open and transparent and escalating unresolved alerts.
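The two ideas of collapsing repeated alerts and escalating unresolved ones can be sketched as follows. This is a minimal illustration, not our actual alerting system: the channel names and escalation thresholds are hypothetical, and the ladder is assumed to be sorted by threshold.

```python
from collections import Counter

# Hypothetical escalation ladder: (minutes unresolved, channel to notify),
# sorted by threshold.
ESCALATION = [(0, "wecom_group"), (30, "sms_owner"), (120, "phone_team_lead")]


def aggregate(alerts):
    """Collapse raw alerts, given as (task, alert_type) pairs, into one
    entry per pair with a count, instead of one message per occurrence."""
    return Counter(alerts)


def channel_for(minutes_unresolved):
    """Pick the highest escalation level whose threshold has been reached."""
    channel = ESCALATION[0][1]
    for threshold, name in ESCALATION:
        if minutes_unresolved >= threshold:
            channel = name
    return channel


raw = [("t1", "retry"), ("t1", "retry"), ("t2", "failure")]
print(aggregate(raw))  # Counter({('t1', 'retry'): 2, ('t2', 'failure'): 1})
print(channel_for(45))  # sms_owner
```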

Driving governance

Throughout the governance process, team members need regular communication and training so that everyone clearly understands the purpose and significance of their work, follows the standards consciously, raises their own awareness, and becomes self-driven. Some references describe using scorecards as a measurement standard to drive each responsible party's governance. The first phase of governance I took part in did not introduce scorecards, but in the long run they are necessary, especially for cross-team collaboration, cost settlement, and performance appraisal.

Governance Evaluation System

Goals

During the preparation phase, the team adopted four principles: "standardize problems, strategize processes, make governance quantifiable, and keep operations under control". The measurement criteria and target values for each goal can be set according to your own situation, so I won't go into detail here. Briefly, my team set goals along four dimensions: cost benefit, quality benefit, labor-efficiency benefit, and value benefit, with metrics such as storage saved, manpower saved, reduction in the number of problems, timeliness compliance rate, and quality pass rate.
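Two of these metrics are simple ratios once the underlying records are collected. The sketch below assumes hypothetical inputs: each output is recorded as a (finished_at, deadline) pair on the same day, and each quality check as a boolean. How deadlines and checks are defined is up to each team.

```python
from datetime import time


def compliance_rate(records):
    """Timeliness compliance rate: share of (finished_at, deadline) pairs
    where the output finished no later than its deadline. None if no data."""
    records = list(records)
    if not records:
        return None
    on_time = sum(1 for finished, deadline in records if finished <= deadline)
    return on_time / len(records)


def pass_rate(check_results):
    """Quality pass rate: share of quality checks (booleans) that passed."""
    check_results = list(check_results)
    if not check_results:
        return None
    return sum(check_results) / len(check_results)


outputs = [
    (time(6, 30), time(7, 0)),
    (time(8, 15), time(8, 0)),  # late
    (time(5, 0), time(9, 0)),
    (time(9, 0), time(9, 0)),  # exactly on the deadline counts as on time
]
print(compliance_rate(outputs))  # 0.75
print(pass_rate([True, True, False, True, True]))  # 0.8
```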

Balancing Governance Efforts and Business Support

To carry out governance, you need to find where it fits with the business and win leadership support; otherwise you will be like a fish out of water. Leadership support does not guarantee that governance will succeed, nor does it mean daily business support can be dropped. A balance has to be struck in staffing and work scheduling. As mentioned earlier, the data warehouse is built on the principle of "business-oriented, data-centric"; with that in mind, everyone needs to distinguish primary work from secondary work when doing governance. Ideally, of course, the team would be able to go all in on governance.

Governance results

The governance results are not the focus of this article, so I will only summarize what the team achieved after about a quarter of hard work. Timeliness and stability in particular improved dramatically: output across the overall pipeline now lands 2 to 3 hours earlier, stability is up 80% compared with the same period last year, and problems are resolved within one day. A knowledge base was also built up along the way. Following our plan, the next phase will focus on model governance and metric management, and I will share that work in another article once the second phase is complete.


Origin blog.csdn.net/qq_28680977/article/details/125035139