ByteDance's DataOps practice based on DataLeap

This article is based on a talk given by Wang Yang, head of Douyin data R&D, at the ArchSummit Global Architect Summit (Shenzhen). The talk is organized into five parts: the modes and challenges of ByteDance's data R&D, how the DataOps concept is embodied at ByteDance, DataOps productization and implementation, best practices, and the future outlook. All of the content comes from ByteDance's practical experience in its businesses.

Modes and Challenges of ByteDance Data R&D

Middle-platform tools + data BP model

In implementing DataOps, ByteDance built on the organizational model our data support already uses: middle-platform tools + data BP. The middle-platform tool team is responsible for building the functional base: it implements the basic capabilities of data development, provides an open platform, gives technical support to the data BP teams, and also exports these capabilities through Volcano Engine in an internal-external integration mode. "Integration of internal and external" means that data tools such as DataLeap behave consistently for internal and external users.

For the data BP teams, landing DataOps has focused on three things. The first is formulating specifications: through long-term practice inside ByteDance, we believe the practicing teams are the best source of specifications. The second is developing plugins on top of the middle-platform tools' open platform. A data BP is not a pure data warehouse team; it also includes engineering teams, and an engineering team working alongside the data warehouse can turn everyday data warehouse pain points into plugins. Different data BP teams can develop different plugins according to their own characteristics. The last is evaluating the benefits: after DataOps is rolled out, the benefits are evaluated by the BPs rather than by the platform. This way the middle-platform tool team can focus on the capabilities themselves, the data BP teams can focus on the specifications and their value, and in the end external customers can enjoy both our platform capabilities and the accumulated BP model. This is the collaboration model the ByteDance teams use to implement DataOps.

Core indicators of data BP: 0987


How do we evaluate whether a data BP team is doing well? We use an easy-to-remember set of indicators called 0987:

0 means zero data incidents. Incidents here include timeliness and quality issues. Since we support many online and asset-loss scenarios, incidents are the lifeline of our evaluation system;

9 refers to the requirement satisfaction rate. We take on data requirements from many parties, and our goal is to deliver more than 90% of them on time;

8 refers to the analysis coverage rate. This means that 80% of external teams' queries against the data warehouse can use the tables we modeled and aggregated rather than the raw tables;

7 refers to the NPS indicator. Every quarter we send questionnaires to all users and data consumers to collect feedback. 70% means that most colleagues rate us positively and negative ratings are close to zero.
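To make the 0987 targets concrete, here is a minimal sketch that scores a quarter against the four thresholds (field names and data are hypothetical illustrations, not DataLeap's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class QuarterStats:
    incidents: int                 # data incidents (timeliness + quality)
    delivered_on_time: int         # requirements delivered on schedule
    total_requirements: int
    queries_on_built_tables: int   # external queries hitting our modeled tables
    total_queries: int
    nps: float                     # net promoter score, 0..1

def score_0987(s: QuarterStats) -> dict:
    """Evaluate a data BP team's quarter against the 0987 targets."""
    return {
        "0 - zero incidents":          s.incidents == 0,
        "9 - >=90% on-time delivery":  s.delivered_on_time / s.total_requirements >= 0.9,
        "8 - >=80% analysis coverage": s.queries_on_built_tables / s.total_queries >= 0.8,
        "7 - NPS >= 70%":              s.nps >= 0.7,
    }

print(score_0987(QuarterStats(0, 46, 50, 850, 1000, 0.73)))
```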

Challenges from data quality

Under the ByteDance data team's current support model, which spans many different support modes and covers a variety of core decision-making and online scenarios, the primary challenge we encounter comes from data quality:

  • Complex links: the longest pipeline has thousands of full-link nodes, and a single task can have on the order of a thousand downstream tasks

  • Frequent changes: the live-streaming data team alone makes thousands of data link changes per week, involving hundreds of risk scenarios

  • Incidents are easy to trigger: quality incidents occur from time to time, and 56% of the data R&D incidents in 2022 involved R&D specifications

Challenges from hardware cost

In the context of cutting costs and improving efficiency, hardware cost has gradually become a core challenge for the data team. In the past we controlled costs the way most companies do, mainly by budget: first set an annual target for compute and storage resources, then run governance actions against that budget, such as cleaning up invalid tasks or reducing TTLs. Now we need to move toward fine-grained, per-requirement control: before taking on a requirement, we want to see how much hardware cost it will incur, so that hardware costs are controlled at the level of individual requirements.
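As an illustration of what requirement-level cost control could look like, here is a hedged sketch (all task names, requirement IDs, and numbers are hypothetical) that rolls compute and storage costs up from tasks to the requirement each task is bound to:

```python
from collections import defaultdict

# Hypothetical per-task cost records: (task_id, requirement_id, compute_cost, storage_cost)
task_costs = [
    ("dwd_live_user_di", "REQ-1024", 320.0, 45.0),
    ("dws_live_gmv_df",  "REQ-1024", 180.0, 30.0),
    ("ads_anchor_rt",    "REQ-2048",  95.0, 12.0),
]

def cost_by_requirement(rows):
    """Aggregate hardware cost to the requirement level, so each data
    requirement can be budgeted and reviewed on its own."""
    totals = defaultdict(float)
    for _task_id, req_id, compute, storage in rows:
        totals[req_id] += compute + storage
    return dict(totals)

print(cost_by_requirement(task_costs))  # {'REQ-1024': 575.0, 'REQ-2048': 107.0}
```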

Challenges from Human Efficiency

Besides hardware, our other major cost is labor. I lead a data R&D team, and every headcount (HC) review confronts me with two soul-searching questions:

  • How to prove that the current state of the team is efficient?

  • How to create greater business value with fewer people?

This is actually a very real challenge. How do we prove the value of a data team?

How the DataOps concept is embodied at ByteDance

Facing so many challenges, we had to think about how to break through them, and we looked to the industry for ideas. We found that DataOps is a solution that can effectively address the problems above.

CAICT's definition of DataOps

  • Data R&D and operations integration (DataOps) is a new paradigm of data development. It brings agile, lean, and related concepts into the data development process. By reorganizing data-related people, tools, and processes, it breaks down collaboration barriers and builds an automated data pipeline that integrates development, governance, and operations, continuously improving the delivery efficiency and quality of data products and enabling high-quality digital development.

our understanding

  • DataOps is a methodology that acts on people + processes + tools. Its goal is to improve data quality and development efficiency, mainly through agile collaboration, automation/intelligence, and clear measurement and monitoring, so that data pipelines achieve continuous integration, continuous deployment, and continuous delivery (CI/CD). Within the DataLeap system, DataOps mainly aims at a standardized R&D process: it covers integrating existing capabilities into the standardized process to form a one-stop development experience, as well as the key work of building and integrating new capabilities; iteration of other basic data development capabilities is not part of DataOps


We believe the core of DataOps includes the following parts:

The first is linkage. Linkage means connecting the entire chain of data, from requirements through development, assets, and consumers, into one bound relationship. Functionally it is fairly simple: it solves the problem of relating requirements to code. Business R&D has long had this capability; for every piece of code submitted, you can tell which requirement it belongs to. In data development, however, this received little attention in the past, so the first thing to do is to connect requirements with the whole data process.

The second is specifications. In the past, the data R&D process lacked productized standardization; specifications lived mainly in team documentation, covering requirements review, model development and testing, and launch acceptance. We believe the most important part of DataOps standardization is to productize these scattered specifications and embed them into the daily development chain across all data development processes.

DataOps Productization and Implementation: DataLeap

This picture shows the capabilities of the DataLeap suite built by ByteDance's data teams, covering compute engines, full-link development, global governance, assets, and other tools. This one-stop big data development suite helps users quickly complete the full range of data R&D work, including data integration, development, operations, governance, assets, and security; it helps data teams effectively reduce work and data maintenance costs, mine data value, and provide data support for corporate decision-making. DataLeap is not a single product but a suite, analogous to Office: multiple products cooperate with one another to solve one large problem or deliver one solution.

DataOps Agile Specification R&D Platform


This is the overall framework diagram of ByteDance's DataOps productization; at its core is a DataOps agile, standardized R&D platform. In the past there was a model in which the platform team did everything itself: it formulated all the specifications and pushed them onto the data development teams. That model does not suit us, because the platform team is far from the business.

We believe the platform should instead prioritize providing open capabilities, including open data and interfaces, open processes, and so on. With these open capabilities, every data development team can arrange its own processes and set its own rules.

In addition, we found that once built, this DataOps agile, standardized R&D platform is common to all data development teams, while individual business lines may still have special requirements. In the live-streaming scenario, for example, data must be monitored after release because it changes in real time. Relying on the data support provided by the open platform, some real-time data, such as user data and user profiles, is delivered to hosts during live broadcasts to help them make timely decisions; hosts can adjust their talking points based on these user profiles. The data BP team built its entire task-release capability on top of the open platform, and we then found that this set of capabilities could be reused universally.

Requirements management


Let me briefly show the functions of the version already launched inside ByteDance, which covers the various dimensions of requirements management. The core idea is to let requirements flow through the entire data R&D process: we handle requirement intake, bind requirements to the development process and to delivery, and track requirement progress and evaluate value. This is the standard requirements pipeline on ByteDance's requirements management platform: it starts from the requirement and proceeds through preliminary evaluation, detailed evaluation, scheduling, development and acceptance, and finally value feedback.

This is the requirement binding page. When developing a task, the relevant requirements need to be bound. The diagram only shows binding a requirement to a development link; we also bind other kinds of links to requirements, such as asset links, task links, and various change links. The function itself is very simple, but stringing requirements through the full link has brought great benefits to ByteDance: it solves the first problem, making the entire process measurable.
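A minimal sketch of what such a binding might look like as a data model (all names are hypothetical illustrations; in DataLeap the binding is done through the product UI):

```python
from dataclasses import dataclass, field

@dataclass
class Requirement:
    req_id: str
    title: str
    tasks: list = field(default_factory=list)    # bound task links
    assets: list = field(default_factory=list)   # bound asset links (tables, dashboards)
    changes: list = field(default_factory=list)  # bound change/modification links

    def bind(self, kind: str, link: str):
        """Attach a development artifact to this requirement, so the whole
        chain from demand to delivery stays traceable and measurable."""
        getattr(self, kind).append(link)

req = Requirement("REQ-1024", "Real-time GMV board for live streaming")
req.bind("tasks", "task://dws_live_gmv_rt")
req.bind("assets", "table://ads.live_gmv_board")
print(req)
```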

Pipeline management


The second is pipeline management. ByteDance's pipeline management includes capabilities for test pipelines, releases, offline and real-time task management, task priority management, and more. Shown here is a task currently running online and the state of its pipeline after each run: registration, testing, checks, and release review, followed by routine actions such as scheduled releases and monitoring.

Let me focus here on the release and testing links. Many companies maintain separate test environments for these two stages, but for scenarios with particularly large data volumes, or for more complex scenarios, the test environment simply has no comparable data; unlike in business R&D, a test environment cannot cover all the issues. In a banking scenario, test and production environments must be isolated, but in an Internet scenario like ByteDance's, our choice is not to separate them: release and testing run on the same data in the same environment. So how do we isolate testing from production? The core rule is that any task that has not passed the release pipeline may read production tables but may not write to any production table. The advantage is that testing and production are completely consistent, and a passing test can be pushed straight to production, so the cost of subsequent testing and QA involvement is extremely low. This is the method ByteDance adopts.
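That isolation rule can be expressed as a simple guard: reads of production tables are always allowed (so tests run on real data), while writes are rejected until the task has passed the release pipeline. Below is a minimal sketch under that assumption; the function and task structure are hypothetical, not DataLeap's actual API:

```python
class ProductionWriteDenied(Exception):
    pass

def check_table_access(task: dict, table: str, mode: str) -> None:
    """Enforce the rule: unreleased tasks may READ production tables,
    but may never WRITE to them."""
    if table.startswith("prod.") and mode == "write" and not task["passed_release_pipeline"]:
        raise ProductionWriteDenied(
            f"task {task['id']} has not passed the release pipeline; "
            f"write to {table} rejected"
        )

task = {"id": "dws_live_gmv_rt", "passed_release_pipeline": False}
check_table_access(task, "prod.dwd_live_user_di", "read")   # OK: test reads real data
try:
    check_table_access(task, "prod.dws_live_gmv_df", "write")
except ProductionWriteDenied as e:
    print(e)
```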

Best Practices

Promotion and operation: How to implement DataOps on a large scale within the company?

How do we promote these tools once they are built? This is the problem ByteDance faced at the beginning of this year: how to implement DataOps capabilities at scale within the company. It was hard at the start and we ran into many challenges, but I have distilled some experience from it.

catfish effect

The first is what we call the catfish effect. Because the data BPs are leading this effort, the leading team can adopt it first. In the live-streaming scenario, for example, we tried it first, collected many indicators, and summarized our experience. We can then use those indicators and that experience to talk with other teams, approaching them from the angle of improving human efficiency; in that case, some teams are willing to learn and give it a try.

out of the box

The second is out-of-the-box readiness. When we offer it to other BP teams, they don't need to do anything extra; flipping the process switch on is enough, so the cost of switching is very low.

top down

The third is top-down. Tools and capabilities like DataOps must be driven from the top, or at least approved by the higher-level business side, before they can be pushed down continuously. Things like specifications cannot be pushed bottom-up.

Indicator traction

Any R&D leader pays attention to R&D efficiency. Here I will share ByteDance's indicator-traction system for data R&D efficiency. The system measures four dimensions: efficiency, quality, resource input, and returns. Modeled on business R&D, these indicators form a data R&D indicator system: we track the delivery cycle of data requirements, the on-schedule delivery rate, the number of deliveries, defect repair time, online incidents, the ratio of data R&D to business R&D, and finally matters related to key projects. Except for the last item, which requires manual input, everything can now be counted online, which is very convenient.
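Most of these indicators can be derived directly from requirement records once requirements are bound through the full link. A hedged sketch (hypothetical schema and data) for two of them, delivery cycle and on-schedule delivery rate:

```python
from datetime import date

# Hypothetical requirement records: (created, delivered, due)
records = [
    (date(2023, 3, 1), date(2023, 3, 9),  date(2023, 3, 10)),
    (date(2023, 3, 2), date(2023, 3, 20), date(2023, 3, 15)),
    (date(2023, 3, 5), date(2023, 3, 12), date(2023, 3, 14)),
]

# Delivery cycle: days from requirement creation to delivery
cycle_days = [(delivered - created).days for created, delivered, _ in records]
# On-schedule delivery rate: share of requirements delivered by their due date
on_schedule = sum(delivered <= due for _, delivered, due in records) / len(records)

print(f"avg delivery cycle: {sum(cycle_days) / len(cycle_days):.1f} days")
print(f"on-schedule delivery rate: {on_schedule:.0%}")
```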

Manager's Perspective

The manager's perspective centers on the value and future of the data development team, and on giving the team exportable professional value through openness. A data team has two kinds of value: business value and professional value. Business value is easy to articulate: how many business requirements we delivered, which key projects we participated in, how much efficiency we ultimately gained for the business, and how much benefit the business obtained through data. Professional value is a thorny problem for many data teams: what makes the data team irreplaceable in the industry and within the company? What is distinctly professional about it? In our DataOps practice we found it very important that the data team gains exportable professional value through openness, and this is what lets the data team participate fully in the effort.

developer perspective

From the developer's perspective, the core question is how to gain a sense of accomplishment at work, which is the key to retaining people:

  • Recognition & execution: specifications are inherently against human nature. Implementing DataOps within a team requires full communication, aligning the rollout with team adjustments and personal development, explaining the why, and avoiding a heavy-handed rollout

  • Participation & contribution: build a development environment everyone can take part in, so that data developers are deeply involved in formulating and implementing the process, which also grows their personal influence

Benefit metrics

The benefits of implementing DataOps fall into three parts: standardization, quality, and efficiency. Specifically:

  • Specifications: specifications are formulated once and reused across directions, ensuring 100% implementation of the process

  • Quality: systematically solve R&D process problems in risk scenarios, driving the number of data quality incidents caused by the R&D process back to zero

  • Efficiency: avoid rework through more reliable delivery, and layer on efficiency-improving capabilities, which is expected to improve R&D efficiency in meeting business needs by 10%+

future outlook

business value

Finally, I want to talk about the future of data R&D, starting with business value:

  • Measuring the value of data requirements

  • A scheduling strategy based on maximizing requirement value

Measuring the value of a data requirement is more complicated than measuring a functional requirement, so in the next stage we hope to quantify the specific value of each data requirement and then implement a scheduling strategy that maximizes requirement value, thereby achieving our control goals for human efficiency and cost.
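One simple way to picture such a strategy, sketched below under stated assumptions (the value and cost figures are hypothetical; the real strategy is future work): given a measured value and an estimated cost per requirement, schedule greedily by value density until team capacity is exhausted.

```python
# Hypothetical backlog: (requirement, estimated value, estimated cost in person-days)
backlog = [
    ("REQ-1", 100.0, 5.0),
    ("REQ-2",  80.0, 2.0),
    ("REQ-3",  60.0, 4.0),
    ("REQ-4",  30.0, 3.0),
]
capacity = 9.0  # person-days available this cycle

def schedule_by_value_density(items, capacity):
    """Greedy approximation: take requirements in order of value per unit
    cost until the capacity budget runs out."""
    plan, used = [], 0.0
    for req, value, cost in sorted(items, key=lambda r: r[1] / r[2], reverse=True):
        if used + cost <= capacity:
            plan.append(req)
            used += cost
    return plan

print(schedule_by_value_density(backlog, capacity))  # ['REQ-2', 'REQ-1']
```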

quality and efficiency

Regarding quality and efficiency, we will focus on the following three points in the future:

  • Large-model-based requirement intake

  • Large-model-assisted development

  • Low-cost data testing and verification

Large models are very popular lately. We believe having large models participate in data R&D is both very practical and very challenging. Whether for requirement intake or assisted development, large models can provide more automated solutions to problems that in the past could only be solved by accumulated experience. At the same time, we found that at ByteDance's data scale the cost of data testing is very high, so we also hope to explore low-cost data testing and verification solutions in the future.

open to the outside world


The achievements of the DataOps concept at ByteDance will also be exported through Volcano Engine DataLeap. Volcano Engine DataLeap is a one-stop data middle-platform suite that helps users quickly complete the full build-out of a data middle platform, covering data integration, development, operations, governance, assets, and security, helping data teams effectively reduce work and data maintenance costs, mine data value, and provide data support for enterprise decision-making.
