NetEase Shufan's Yu Lihua: DataOps rewrites the rules of data governance, and the "iPhone moment" for data assets is coming | Interview with Data Ape...


Data intelligence industry innovation service media

Focus on digital intelligence and change business


One weekday morning, Xiao Li, head of the data development team at an e-commerce company, received a call from a colleague in a business department: "Mr. Li, we need a data sheet on the sales of a certain product. We have been looking for it for a long time and still cannot get it. The promotion launches in a few days, we need accurate data before then, and we dare not make decisions blindly without it..."

Xiao Li immediately called an emergency meeting of the data development team to discuss how to get accurate data to the business department quickly. In the end, he found that team members used different data models and specifications with no unified standard, which badly hurt data quality and output efficiency. At the same time, without effective data governance measures, data quality could not be guaranteed, so the business departments' data needs went unmet...

Many companies' data teams face this scenario. As digitalization advances, many companies have begun to accumulate their own "data assets", and their systems have improved data analysis capability and efficiency as far as they can. However, a gap that is hard to bridge remains between data development and data consumption: the data development process lacks quality assurance from a business perspective, which greatly affects enterprises' operating efficiency, management, and decision-making.

The most common response is to build a separate data governance system and set up cross-departmental collaboration mechanisms and processes to strengthen the connection between data development and data governance. On the surface this solves the problem, but in practice it raises communication costs and slows the team down.

Massive data has not only failed to improve team efficiency but has also caused "data poverty". According to a Gartner report, data quality problems cost companies 1 billion US dollars every year. How can this be solved? Data Ape interviewed Yu Lihua, general manager of NetEase Shufan's big data product line, who shared NetEase Shufan's innovative practice of integrating data development and governance.

Massive data does not equal data assets

With the rapid development of big data technology and the continuous deepening of enterprise digitization, data volumes keep growing. In the process of digitalization, enterprises should use the potential of their data to adapt to external change. Gartner believes enterprises should consider turning data into assets: data assets should be deliverable, reusable products with high shared value. However, massive amounts of data often trap enterprises in "data poverty", where data is hard to share and reuse. Yu Lihua believes data poverty is concentrated in four problems: data cannot be found, cannot be understood, cannot be trusted, and cannot be controlled.

"Cannot be found" means that in modern enterprises, data is often stored in scattered databases maintained by different departments or teams. As a result, some data is never registered in the catalog and is poorly standardized, so for business personnel, finding the data they want is like looking for a needle in a haystack. The e-commerce company at the beginning of this article is an example: it has accumulated a large amount of data and has a professional data team, yet business personnel still cannot find the data they need. Such problems are not limited to e-commerce companies. Large Internet companies like NetEase have had them too. Within NetEase, for example, NetEase Yanxuan had more than 100,000 tables for a single business, and Cloud Music had more than 80,000 tables, which made it very difficult for business personnel to find the data they needed. This wastes their time and energy and also limits the accuracy and efficiency of corporate decision-making.

"Cannot be understood" is caused by missing metadata and poor metadata management. In one business within NetEase, for example, 78% of the metadata was missing, so even when business personnel found the data, they could not understand it. Data is often described in technical terms that are hard for non-experts to follow, and the data itself can be complex enough to require specialized knowledge. If business people cannot understand data, they cannot use it to make good decisions.

"Cannot be trusted" stems mainly from problems with data quality and credibility. In one business within NetEase, for example, more than 10 data quality complaints were raised every week, and there was even a supplier data leak... Data quality problems include missing, erroneous, and duplicated data; they lead business personnel to doubt the accuracy of the data and undermine their trust in it. Data leaks expose an enterprise's business secrets and customer privacy, seriously damaging its reputation and credibility.

"Cannot be controlled" mainly means that data cannot be effectively managed. In the data center of one business unit, for example, 78.39% of the tables, occupying 21.63% of the storage space, had not been accessed once within 30 days. Yet this data had consumed substantial development manpower, storage, and computing resources, so much of that investment was wasted. If data is not managed and controlled effectively, it becomes duplicated, redundant, and useless, wasting the enterprise's resources and money.

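The kind of measurement behind the "cannot be controlled" figures can be sketched in a few lines. This is only an illustrative sketch, not NetEase Shufan's implementation: the `TableStats` record, the table names, the sizes, and the access dates are all hypothetical, and a real platform would read them from its metadata and audit logs.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class TableStats:
    """Hypothetical per-table metadata record."""
    name: str
    size_gb: float
    last_accessed: Optional[date]  # None means no recorded access at all


def cold_table_report(
    tables: List[TableStats], today: date, window_days: int = 30
) -> Tuple[float, float, List[str]]:
    """Return (share of tables, share of storage, names) not accessed in the window."""
    cutoff = today - timedelta(days=window_days)
    cold = [t for t in tables if t.last_accessed is None or t.last_accessed < cutoff]
    total_size = sum(t.size_gb for t in tables)
    cold_size = sum(t.size_gb for t in cold)
    table_share = len(cold) / len(tables) if tables else 0.0
    storage_share = cold_size / total_size if total_size else 0.0
    return table_share, storage_share, [t.name for t in cold]


# Made-up sample data for illustration only.
tables = [
    TableStats("dwd_orders", 600.0, date(2023, 7, 30)),
    TableStats("tmp_campaign_2021", 120.0, date(2023, 3, 1)),
    TableStats("ads_old_report", 80.0, None),
    TableStats("dim_user", 200.0, date(2023, 7, 15)),
]
table_share, storage_share, cold_names = cold_table_report(tables, today=date(2023, 8, 1))
print(f"{table_share:.0%} of tables, {storage_share:.0%} of storage unused: {cold_names}")
```

A report like this only identifies candidates; whether a cold table can actually be archived or taken offline still depends on its lineage and owners.
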
In short, the poor quality of data assets is a serious problem, and DataOps has drawn wide attention as a new approach to using data efficiently and improving data-driven decision-making. The common DataOps practice on the market today is a data development pipeline with integrated CI/CD capabilities. Although this standardizes the overall data development process, it still lacks the constraints needed to meet data consumption needs and does not fully solve the four problems above. Solutions therefore have to be sought at a higher level.

NetEase pioneers the integration of data development and governance

To solve "data poverty" at its root, NetEase Shufan built on the data development pipeline and proposed the concept of integrated data development and governance, that is, an end-to-end DataOps practice. As the name suggests, NetEase Shufan's approach differs from the conventional one in that it fully connects data development with data governance. Yu Lihua told Data Ape that when many companies build a data middle platform or a data platform, the data development process and the data governance process are separate and may even be provided by different suppliers. This separation easily leads to inconsistent data standards, metadata, and so on, which in turn leaves business personnel unable to find or understand data when they consume it, resulting in "data poverty". The core of the end-to-end DataOps proposed by NetEase Shufan is 12 characters in the original Chinese: design first, then develop; standardize first, then model. In other words, before data development begins, enterprises need to think through what data is needed and how it should be designed... and then develop according to that overall design. The process closely resembles software project development: start from the end, and first determine the product's goals and requirements to ensure that development is correct and efficient. It can also help enterprises reduce the cost of building data platforms.


These 12 characters show that NetEase Shufan takes a more global view of the problem. Traditional solutions work piece by piece, from a partial perspective, while end-to-end DataOps considers the whole. In the early stage of platform construction, the business side, data architects, and data product developers carry out an overall design and distill business specifications into data standards. With the data standards at the core, data quality audit rules, table classification and grading policies, data masking policies, data security management policies, and so on are then generated automatically. All of this ensures that enterprise data becomes a core asset that can be better shared and reused. To return to the e-commerce company at the beginning of the article: with a complete design before data development, and with standards and specifications sorted out first, the resulting data platform would not leave users unable to find data.
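
To make the idea of generating governance policies from data standards more concrete, here is a minimal sketch. It is an assumption-laden illustration, not the EasyData API: the `DataElement` fields, the 1-to-4 security scale, and the masking policy names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DataElement:
    """One entry in a hypothetical data standard."""
    name: str
    dtype: str
    nullable: bool = True
    allowed_values: Optional[List[str]] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    security_level: int = 1  # assumed scale: 1 = public ... 4 = highly sensitive


def quality_rules(e: DataElement) -> List[str]:
    """Derive SQL-style audit predicates from the element's constraints."""
    rules = []
    if not e.nullable:
        rules.append(f"{e.name} IS NOT NULL")
    if e.allowed_values is not None:
        values = ", ".join(f"'{v}'" for v in e.allowed_values)
        rules.append(f"{e.name} IN ({values})")
    if e.min_value is not None:
        rules.append(f"{e.name} >= {e.min_value}")
    if e.max_value is not None:
        rules.append(f"{e.name} <= {e.max_value}")
    return rules


def masking_policy(e: DataElement) -> str:
    """Map the security level to a masking policy name (names are illustrative)."""
    if e.security_level >= 4:
        return "hash"
    if e.security_level == 3:
        return "partial_mask"
    return "none"


def generate_policies(elements: List[DataElement]) -> Dict[str, dict]:
    """Generate audit rules and masking policies for every element in the standard."""
    return {
        e.name: {"quality_rules": quality_rules(e), "masking": masking_policy(e)}
        for e in elements
    }


standard = [
    DataElement("order_status", "STRING", nullable=False, allowed_values=["paid", "shipped"]),
    DataElement("order_amount", "DECIMAL", nullable=False, min_value=0),
    DataElement("phone", "STRING", security_level=3),
]
policies = generate_policies(standard)
for name, policy in policies.items():
    print(name, policy)
```

The point of the design is that the standard is the single source: changing an element's value range or security level changes the derived audit rule and masking policy together, so they cannot drift apart.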

By this logic, once the design and the standards exist, subsequent development and modeling go smoothly: models built according to them comply with the specifications by construction and do not need to be reviewed, remodeled, and refactored afterward.
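
One way to picture "standardize first, then model" is a check that runs on a proposed table model before any development starts. The sketch below is hypothetical: the `STANDARD` dictionary, the column names, and the types are invented for illustration, and a real platform would enforce this inside its modeling tool.

```python
from typing import Dict, List

# Hypothetical data standard: approved column names and their types.
STANDARD: Dict[str, str] = {
    "user_id": "BIGINT",
    "order_id": "BIGINT",
    "order_amount": "DECIMAL(18,2)",
    "order_status": "STRING",
}


def check_model(columns: Dict[str, str], standard: Dict[str, str] = STANDARD) -> List[str]:
    """Return the violations of a proposed table model (column name -> type)."""
    violations = []
    for name, dtype in columns.items():
        if name not in standard:
            violations.append(f"{name}: not defined in the data standard")
        elif dtype.upper() != standard[name]:
            violations.append(f"{name}: type {dtype} differs from standard {standard[name]}")
    return violations


# A proposed model with one wrong type and one non-standard column.
proposed = {"user_id": "BIGINT", "order_amount": "DOUBLE", "amt_total": "DOUBLE"}
violations = check_model(proposed)
for v in violations:
    print(v)
```

A model that returns an empty list can move on to development; anything else is corrected at design time, which is cheaper than refactoring after launch.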

For enterprises, once the overall design and standards of the data platform are in place, there is no need to worry about coding in the subsequent development and modeling process, and coding can even be outsourced to third parties with confidence. Under the constraints set up front by the standards, the "vicious circle" of endless revisions after going live largely disappears, and so do deliverables that fail to meet specifications and standards, since these were fixed in advance. Enterprises no longer need to worry about "data poverty": data usage and business operations become much more efficient, and development costs fall sharply.

Thinking with the end in mind: R&D team efficiency increases 10 times

Yu Lihua believes that, in a sense, the value of end-to-end DataOps for data assets is comparable to the "iPhone moment" in mobile phones: it will fundamentally rewrite the rules of data governance. Compared with traditional solutions, integrated data development and governance amounts to a "dimensionality-reduction strike", that is, an overwhelming advantage, because it takes a holistic view and works backward from the end goal when building a data center or data platform instead of getting stuck on a single link. Integrating development and governance brings several advantages: it makes data findable, understandable, trustworthy, and controllable; it greatly improves the quality and efficiency of data asset construction; it reduces rework caused by data errors; and it makes development and management collaboration across teams easier, so applications are delivered more efficiently. Gartner predicts that by 2025, R&D teams adopting DataOps methods can be 10 times more efficient than teams using traditional methods.


Take NetEase Cloud Music as an example: adopting the DataOps integrated development and governance model improved model reuse, specification building, and rule coverage. Pushing logic down into lower-level models ("logic sinking") greatly reduced the number of indicators; Cloud Music's model reuse increased 4 times, and 34,000 models were taken offline. In specification building, there was previously no security specification, and now security levels have been set for 100% of fields and indicators. In quality, rule coverage has risen sharply, business metadata has been filled in, and data is being put to better use.

From the perspective of data applications, Yu Lihua believes that integrating development and governance also improves the self-service data capabilities of managers and business teams. One NetEase Shufan customer built an integrated data development and governance platform from scratch and turned its data into usable assets. Currently, there are 200 data analysis teams that can perform self-service analysis, including 32 executives. In addition, integration reduces the probability of data incidents and improves business compliance. NetEase Shufan has so far implemented 180 standards for a financial customer, helping it effectively reduce the risk of regulatory penalties.

Yu Lihua also introduced the case of a certain telecom operator to Data Ape to further analyze the advantages that the integration of data development and governance can bring.

The telecom operator is a state-owned enterprise with a large amount of user and operational data. To manage this data better, it built multiple data systems and carried out a data governance project, yet it still could not get its standards implemented. The main problem was that data standards, data quality, and data development specifications existed only at the dictionary level and could not be built into the data production process. In addition, the data quality audit rules were not connected to the value range constraints of the data elements in the data standards; the data elements in the data standards were not linked to the data modeling tools; and the data security levels in metadata management were not linked to data masking in the security center.

To solve these problems, the operator introduced NetEase Shufan's EasyData platform to implement end-to-end DataOps. Its successful application gave the operator a sound data governance solution and showed that integrating data development and governance can effectively improve data quality and development efficiency. With EasyData, the operator has built more than 100 data quality audits covering 8,000+ online jobs and has supported a total of 60,000+ self-service analyses. Obtaining high-quality data in time is no longer a problem for data consumers. EasyData has strengthened data governance and management, reduced the manual effort involved in governance, and improved the efficiency of data output.

Of course, the advantages of integrated data development and governance are not limited to "cost reduction and efficiency gains" on the user side and to data security specifications; they also show in R&D efficiency, data quality, self-service, and fewer data incidents.

Data Ape Observation: The integration of data development and governance has become a new trend

Since the advantages of data development and governance integration are so obvious, will it become the future trend of the industry? Data Ape believes that it can be considered from the following four aspects.

First, integrating data development and governance can improve data quality and reliability. According to Gartner's report, an integrated data development and governance solution can minimize the losses caused by data quality problems, thereby improving enterprises' decision-making efficiency and business results.

Second, the integration of data development and governance complies with laws, regulations and market demands. According to market research firm IDC, by 2025, the global market for data governance and privacy solutions will reach $15.2 billion. The integration of data development and governance can help enterprises meet the requirements of laws and regulations, such as protecting personal information, collecting and using data in compliance, and improving the competitiveness and innovation capabilities of enterprises.

Third, the integration of data development and governance can improve the efficiency and innovation capabilities of data production. According to Forrester's survey report, more than 50% of enterprises consider data development efficiency and innovation capability their biggest concerns. By managing data in a unified way, integration raises both, and with them the competitiveness of enterprises.

Fourth, the integration of data development and governance makes it easier to adopt new technologies. Since the beginning of this year, the popularity of ChatGPT has brought artificial intelligence a great deal of attention. Data Ape believes that neither general-purpose large models nor vertical large models can develop without the support of enterprise big data. How well an enterprise uses the data assets it has accumulated, and how it trains large-model products that assist its own business, depend most directly on data development and data governance. Opening up an enterprise's data assets is precisely the strength of integrated data development and governance.

To sum up the four points in one sentence: the integrated model of data development and governance fits the future direction of the data industry and is a powerful tool for making enterprises more competitive. In fact, the integration of data development and governance has been written into the "DataOps Practice Guide (1.0)" led by the Institute of Cloud Computing and Big Data, China Academy of Information and Communications Technology. By building a global data observation view and moving data quality control upstream, integrated DataOps can effectively solve long-standing problems: the disconnect between development and governance, poorly communicated data requirements, low product delivery efficiency, difficult cross-domain collaboration, and hard-to-control development costs.


Speaking of future plans for DataOps, Yu Lihua mentioned two directions. The first is to keep strengthening the fundamentals: continuously improving the user experience, connecting to more data foundations, building real-time DataOps, and so on. The second is to explore combining integrated data development and governance with new technologies such as low-code and AIGC. Continuously integrating new technologies can enable intelligent governance capabilities, such as security level recommendation, automatic standard matching, automatic generation and correction of computing tasks, and automatic error diagnosis, lowering the barrier for users. Data Ape sees these two directions as NetEase Shufan's strategy to "build high walls and store up grain", that is, to strengthen its foundations for the long term.

In a digital age of ongoing digital transformation and accelerating technological innovation, what enterprises ultimately compete on is not how much data they have accumulated or how fast their technology iterates, but how efficiently they use data.

To provide new ideas and directions for enterprise digital intelligence, on August 10, NetEase Shufan joined hands with the China Academy of Information and Communications Technology, ecosystem partners, and enterprises in finance, manufacturing, and other industries to hold the industry summit "Intensive Farming for Digital Intelligence, Innovation Acceleration: NetEase Shufan City Tour (Beijing)" at the JW Marriott Hotel in Beijing, sharing the latest progress in digital intelligence technology and practical industry experience.

Text: Winner  /  Data Ape


Origin blog.csdn.net/YMPzUELX3AIAp7Q/article/details/132073995