Review of the TechDay "Zhishu" (Data Governance) Training Camp live broadcast | Enterprise-level labeling system construction in practice

Labels (tags) are one of the most common types of data assets today, and they play an important role in helping enterprises gain insight into user profiles and run refined operations. An enterprise labeling system cannot be built overnight: it requires overall planning from a business perspective and involves complex data governance and data asset management.

This article reviews the fourth session of Getui TechDay's "Data Governance Training Camp" and shares the methodology, process and practical experience of building an enterprise-level labeling system.

Course review

Many enterprises have invested, or are investing, in building their own labeling systems. However, because they lack effective data governance and overall planning for the labeling system, and do not continuously operate and manage their label data assets, some of them still run into problems during construction, such as label data that cannot actually be used and data experience that cannot be accumulated.

In response, Getui thoroughly reviewed its own experience building a labeling system and distilled it into a methodology for enterprise-level labeling system construction, which builds a high-quality labeling system in five simple steps.

1. Define the goals

First, the goals of the labeling system should be determined from business needs. Depending on how the labels will be used, these goals fall into two types: business goals and system goals.

① Business goals are the goals that ultimately bring tangible benefits to the business. For example, building and applying the labeling system should raise users' next-day retention rate by 20%.

② System goals are requirements on functionality and system performance. For example, the finished labeling system or platform must support visual creation and management of labels, and, on the performance side, must complete label computation and target-audience selection for tens of millions of users within one hour.
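
To make goals like these checkable, they can be written down as measurable thresholds. The sketch below is an illustration only: the class and field names are hypothetical, and it assumes the "20%" retention goal means a relative lift over the baseline rather than 20 percentage points.

```python
from dataclasses import dataclass


@dataclass
class LabelingGoals:
    """Hypothetical container for one business goal and one system goal."""
    baseline_retention: float    # next-day retention before labels are applied
    observed_retention: float    # next-day retention after labels are applied
    target_relative_lift: float  # 0.20 means "+20% relative to baseline" (assumption)
    users: int                   # number of users to label and select from
    max_runtime_s: float         # time budget, e.g. 3600 for "within one hour"

    def relative_lift(self) -> float:
        return (self.observed_retention - self.baseline_retention) / self.baseline_retention

    def business_goal_met(self) -> bool:
        return self.relative_lift() >= self.target_relative_lift

    def system_goal_met(self, measured_runtime_s: float) -> bool:
        return measured_runtime_s <= self.max_runtime_s

    def required_throughput(self) -> float:
        """Users per second needed to finish inside the time budget."""
        return self.users / self.max_runtime_s


goals = LabelingGoals(0.30, 0.37, 0.20, 10_000_000, 3600)
print(f"lift={goals.relative_lift():.1%}, met={goals.business_goal_met()}")
print(f"throughput needed: {goals.required_throughput():,.0f} users/s")
```

With these made-up numbers the second line reports about 2,778 users per second (10,000,000 users / 3,600 s), a quick sizing figure for the label computation engine.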

2. Label system design

Once the goals are set, the next step is the top-level design of the labeling system, which must address the following core questions:

2.1 How should the label catalog be determined?

We recommend that enterprises determine the label catalog by fully combining business needs with their data assets, so that label production is jointly driven by "business + data".

Business-driven production is easy to understand: the labels to produce are determined by business requirements. For example, to improve risk identification, an enterprise needs labels such as "Risk Account" and "Blacklist"; to improve payment conversion, it can create labels such as "Product Preference" and "Price Sensitivity".

Data-driven production means proposing labels based on the data assets. Business staff generally focus on the business and often know little about the underlying data, so data development engineers, data analysts and others need to get deeply involved to mine valuable labels from the data assets. For example, labels such as permanent residence and tourist destination preference can be extracted from scene preference data, and labels such as consumption level and consumption preference can be extracted from internal data.

It is worth noting that, when actually designing the label catalog and labeling system, enterprises cannot rely on a purely business-driven or a purely data-driven approach alone. Business requirements need to be matched against the available data assets before the label catalog design is finalized.
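
One way to picture this matching step: treat every proposed label, whether it came from a business request or from mining the data, as a candidate that lists the data fields it needs and the business scenarios that would use it. The sketch keeps only candidates that are both backed by available data and tied to a business use, and reports business needs the data cannot yet support. All label names, field names and the matching rule itself are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LabelCandidate:
    name: str
    origin: str                      # "business" or "data": which driver proposed it
    required_fields: set[str]        # data fields the label would be computed from
    use_cases: list[str] = field(default_factory=list)  # business scenarios that need it


def build_catalog(candidates: list[LabelCandidate], available_fields: set[str]):
    """Return (catalog, data_gaps) after matching candidates against data assets."""
    catalog, data_gaps = [], []
    for c in candidates:
        missing = c.required_fields - available_fields
        if missing:
            # Needed, but the current data assets cannot support it yet.
            data_gaps.append((c.name, sorted(missing)))
        elif c.use_cases:
            catalog.append(c.name)
        # Candidates backed by data but with no business use are left out for now.
    return catalog, data_gaps


candidates = [
    LabelCandidate("Risk Account", "business", {"login_logs", "device_id"}, ["risk control"]),
    LabelCandidate("Product Preference", "business", {"order_items"}, ["payment conversion"]),
    LabelCandidate("Permanent Residence", "data", {"location_points"}, ["regional campaigns"]),
    LabelCandidate("Tourist Destination Preference", "data", {"location_points"}),
]
catalog, gaps = build_catalog(candidates, {"login_logs", "device_id", "location_points"})
print(catalog)  # ['Risk Account', 'Permanent Residence']
print(gaps)     # [('Product Preference', ['order_items'])]
```

The data gaps list doubles as input for data governance: it shows which business needs are blocked by missing data rather than by missing label logic.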

Experience summary

Two common misconceptions come up repeatedly when determining the label catalog.
Myth 1: The more labels, the better. In practice, the business side does not need that many labels; as a rule of thumb, 20% of the labels meet 80% of the business side's needs.
Myth 2: The more advanced the label, the better. Some algorithm engineers and other technical staff spend a lot of time tuning models and building complex model labels, yet well-built basic labels and rule labels can meet most of the business side's needs.

Therefore, when building a labeling system, enterprises need to weigh the input-output ratio: assess how strongly each label is actually demanded and measure how much different labels improve the business.
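
A simple way to put numbers behind this judgment is to log how often each label is actually requested (for audience selection, reports and so on) and find the smallest set of labels that covers most of the demand. The sketch assumes request counts are a reasonable proxy for demand; the label names and counts are made up to illustrate the 20/80 rule of thumb, not measured data.

```python
def labels_covering(usage: dict[str, int], share: float = 0.8) -> list[str]:
    """Smallest set of most-requested labels whose requests reach `share` of the total."""
    total = sum(usage.values())
    chosen, covered = [], 0
    for name, count in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= share * total:
            break
        chosen.append(name)
        covered += count
    return chosen


# Hypothetical monthly request counts per label.
usage = {
    "gender": 450, "age_stage": 360, "city": 60, "risk_account": 40,
    "college_student": 30, "churn_risk": 20, "brand_preference": 15,
    "travel_preference": 10, "pet_owner": 10, "gamer": 5,
}
core = labels_covering(usage)
print(core, f"{len(core) / len(usage):.0%} of labels")  # ['gender', 'age_stage'] 20% of labels
```

Labels outside the core set are not necessarily useless, but they are the ones whose maintenance cost should be justified against their measured effect.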

2.2 Can the current data foundation support the labeling system?

Data is the cornerstone of the labeling system: only on a solid data foundation can an enterprise build a high-quality labeling system. Before building one, therefore, enterprises need to govern their data comprehensively to improve its quality and usability.
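
Data governance covers much more than can be shown here, but a first health check often starts with simple profiling: how complete each field is and whether the primary key is duplicated. The sketch below assumes records arrive as Python dictionaries; the field names and the 0.9 completeness threshold are hypothetical.

```python
def profile(records: list[dict], fields: list[str], key: str):
    """Per-field completeness and duplicate-key rate for a batch of records."""
    n = len(records)
    if n == 0:
        return {f: 0.0 for f in fields}, 0.0
    completeness = {
        # A value counts as filled unless it is missing, None or an empty string.
        f: sum(1 for r in records if r.get(f) not in (None, "")) / n for f in fields
    }
    distinct_keys = len({r.get(key) for r in records})
    duplicate_rate = (n - distinct_keys) / n
    return completeness, duplicate_rate


def failing_fields(completeness: dict[str, float], min_completeness: float = 0.9) -> list[str]:
    return [f for f, c in completeness.items() if c < min_completeness]


rows = [
    {"user_id": 1, "gender": "M", "birth_date": "2001-03-02"},
    {"user_id": 2, "gender": "", "birth_date": "1993-07-19"},
    {"user_id": 2, "gender": "F", "birth_date": None},
    {"user_id": 3, "gender": "F", "birth_date": "1979-11-30"},
]
completeness, dup = profile(rows, ["gender", "birth_date"], key="user_id")
print(completeness, dup)             # {'gender': 0.75, 'birth_date': 0.75} 0.25
print(failing_fields(completeness))  # ['gender', 'birth_date']
```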

2.3 How should labeling rules be defined?

By production method, labels can be divided into fact labels, rule labels, model labels and other types.

Fact labels have relatively simple rules: once data analysts understand the business data, they can extract these labels directly from the business side's raw data. For example, labels such as user source channel, gender and age stage can be extracted from user registration information.
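
Fact labels are close to direct field mappings. The sketch derives source channel, gender and age stage from one registration record; the record layout, field names and age-stage boundaries are hypothetical choices for illustration.

```python
from datetime import date

# Hypothetical age-stage buckets: (min age, max age, label value).
AGE_STAGES = [(0, 17, "under 18"), (18, 24, "18-24"), (25, 34, "25-34"),
              (35, 44, "35-44"), (45, 150, "45+")]


def age_on(birth: date, today: date) -> int:
    """Whole years between birth and today."""
    years = today.year - birth.year
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1  # birthday not reached yet this year
    return years


def fact_labels(registration: dict, today: date) -> dict:
    """Derive fact labels directly from one registration record."""
    age = age_on(registration["birth_date"], today)
    stage = next((name for lo, hi, name in AGE_STAGES if lo <= age <= hi), "unknown")
    return {
        "source_channel": registration["channel"],
        "gender": registration["gender"],
        "age_stage": stage,
    }


reg = {"channel": "app_store", "gender": "F", "birth_date": date(2000, 6, 15)}
print(fact_labels(reg, date(2024, 6, 14)))
# {'source_channel': 'app_store', 'gender': 'F', 'age_stage': '18-24'}
```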

Rule labels are strongly business-specific: business staff and data analysts need to analyze and explore together, then build and combine label rules on top of the raw data. For example, to create a "college student" label, the characteristics of a college student must be defined across several dimensions: age is generally between 18 and 25, and, in terms of app preferences, the user has installed some college course management apps, and so on.
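
Rule labels can be thought of as small predicates stitched together. The sketch below encodes the "college student" example as an AND of an age condition and an app-installation condition; the app identifiers, the user-record layout and the helper names are hypothetical, and a real rule would typically combine more dimensions.

```python
from typing import Callable

User = dict
Rule = Callable[[User], bool]


def age_between(lo: int, hi: int) -> Rule:
    return lambda u: lo <= u["age"] <= hi


def installed_any(apps: set[str]) -> Rule:
    apps = set(apps)
    return lambda u: bool(apps & set(u["installed_apps"]))


def all_of(*rules: Rule) -> Rule:
    """Stitch conditions into one rule label (logical AND)."""
    return lambda u: all(rule(u) for rule in rules)


def any_of(*rules: Rule) -> Rule:
    """Alternative conditions (logical OR)."""
    return lambda u: any(rule(u) for rule in rules)


# Hypothetical definition of the "college student" rule label.
COURSE_APPS = {"campus_timetable", "course_helper"}
college_student = all_of(age_between(18, 25), installed_any(COURSE_APPS))

print(college_student({"age": 20, "installed_apps": ["course_helper", "music"]}))  # True
print(college_student({"age": 31, "installed_apps": ["course_helper"]}))           # False
```

Keeping each condition as a separate, named predicate makes it easier for business staff and analysts to review and adjust one dimension at a time.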

Getui's DIOS data governance platform: intelligent feature insight

Drawing on its own labeling practice and its experience serving customers across industries, Getui has found that defining labeling rules is a common pain point for enterprises building labeling systems. Even professional data analyst teams and business experts spend a lot of time and effort understanding how data relates to users before they can extract labeling rules from it.

To make label production more efficient for enterprise customers, Getui's intelligent data operating system provides intelligent feature insight into target groups and intelligent data recommendation, helping customers quickly locate the data they need, automatically refine labeling rules and produce labels more efficiently.

Model labels use existing factual data to predict the preferences, characteristics and classification of groups; for example, the churn probability of other users can be predicted from the characteristics of users who have already churned. Model labels are generally needed when the coverage, saturation or accuracy of fact labels and rule labels is insufficient for the business need.
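
As a toy illustration of a model label, the sketch below fits a logistic regression on users whose outcome is known (churned or stayed) and scores other users with a churn probability. Logistic regression is a swapped-in example technique, not necessarily what any particular platform uses; the two features, the data and the 0.5 threshold are hypothetical, and a real project would use a proper ML library, far more data and held-out evaluation.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias with batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def churn_probability(w, b, x) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features, scaled to [0, 1]:
# [days since last login / 30, sessions in the last week / 10]
X = [[0.9, 0.1], [0.8, 0.0], [0.7, 0.2], [1.0, 0.1],   # users who churned
     [0.1, 0.9], [0.0, 0.8], [0.2, 0.7], [0.1, 1.0]]   # users who stayed
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_logistic(X, y)

# Score users outside the training set and assign the model label.
for user in ([0.85, 0.05], [0.05, 0.9]):
    p = churn_probability(w, b, user)
    print(user, round(p, 3), "high churn risk" if p >= 0.5 else "low churn risk")
```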

Getui's DIOS data governance platform: building machine learning models with zero code

The traditional process for creating model labels is relatively complicated, involving algorithm development, model building, model tuning and other complex tasks. DIOS productizes Getui's own "five-step" modeling methodology and provides zero-code modeling, so business staff without programming experience can quickly build machine learning models by dragging and dropping in the DIOS visual interface.

3. Label development

Once the overall label system design is complete, label development begins. Label development can generally be broken into three stages: engineering development, engineering testing and engineering launch. In the development stage, R&D engineers implement the labels according to the label rules and data sources; test engineers then test and accept label quality against the business requirements and the engineering output to ensure the labels are accurate, after which the labels go online.

A key issue here is how to verify the accuracy of newly built labels.

There are three common verification methods:

① Check logical consistency with TGI. For example, suppose a new "male" label is built and TGI analysis shows that a large share of the users labeled male are also labeled female; this is clearly unreasonable. (Note: TGI is the ratio of a feature's proportion in the target group to its proportion in the control group, conventionally multiplied by 100 so that 100 means no difference; it is used to compare how groups differ on that feature.)

② Verify with a third-party platform. For example, use third-party data from Guangdiantong (Tencent's advertising platform) to check accuracy.

③ Ad delivery testing. Select target audiences under different labels, run A/B tests, and verify label accuracy from the delivery results.
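
The first and third checks reduce to simple calculations. The sketch computes TGI as defined in the note above (scaled by 100) and applies it to the male/female example, then runs a standard two-sided two-proportion z-test for an A/B delivery test. User IDs, group sizes and conversion counts are made up, and the z-test is one common choice for comparing conversion rates, not a method the course prescribes.

```python
import math


def tgi(target: set, control: set, feature: set) -> float:
    """Share of `feature` in the target group vs. the control group, times 100."""
    if not target or not control:
        raise ValueError("target and control groups must be non-empty")
    target_share = len(target & feature) / len(target)
    control_share = len(control & feature) / len(control)
    if control_share == 0:
        raise ValueError("feature absent from control group; TGI undefined")
    return target_share / control_share * 100


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rates between groups A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in conversions; test undefined")
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Check ①: everyone is the control group; users 0-499 carry the "female" label.
everyone = set(range(1000))
female = set(range(500))
good_male = set(range(500, 1000))
bad_male = set(range(250, 1000))                  # overlaps 250 "female" users
print(tgi(good_male, everyone, female))           # 0.0: no contradiction
print(round(tgi(bad_male, everyone, female), 1))  # 66.7: a third of "males" are also "female"

# Check ③: A = audience selected by the new label, B = comparable audience without it.
z, p = two_proportion_z_test(300, 10_000, 200, 10_000)
print(f"z={z:.2f}, p={p:.2g}")
```

For a well-built gender label the female TGI inside the male group should be close to 0; any sizeable value signals a contradiction worth investigating before launch.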

After engineering testing and label accuracy verification are complete, the labeling system can officially go into operation. We suggest that enterprises first run several small-scale trials in real business scenarios before rolling the labeling system out at scale, to avoid large adjustments and changes later.

4. Label life cycle management

Because labels are an important data asset, enterprises also need to manage them carefully across their whole life cycle after the labeling system goes live.

We also recommend that enterprises build a label quality assurance system: assign a responsible owner to each label so that the primary owner can respond promptly to related issues; document the processes and lessons of label development and launch so that subsequent development, testing and launch are standardized; and monitor label quality more systematically, for example with scheduled jobs that track label computation jobs, label volume (magnitude), saturation and similar metrics.
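
The monitoring part can be sketched as a daily check run after each scheduled computation job. The sketch assumes each run records whether the job succeeded, how many users carry the label (magnitude) and how many could carry it (for saturation); the thresholds, label name and owner are hypothetical, and alerts are addressed to the label's owner to match the responsible-person idea.

```python
from dataclasses import dataclass


@dataclass
class LabelRun:
    label: str
    owner: str            # responsible person who receives alerts
    job_succeeded: bool   # did the scheduled computation job finish?
    labeled_users: int    # magnitude: users carrying the label
    total_users: int      # users the label could apply to

    @property
    def saturation(self) -> float:
        return self.labeled_users / self.total_users if self.total_users else 0.0


def check_run(today: LabelRun, yesterday: LabelRun,
              min_saturation: float = 0.05, max_change: float = 0.3) -> list[str]:
    """Compare today's run with yesterday's and return alert messages for the owner."""
    alerts = []
    if not today.job_succeeded:
        alerts.append(f"[{today.owner}] {today.label}: computation job failed")
        return alerts  # volume numbers are meaningless for a failed run
    if today.saturation < min_saturation:
        alerts.append(f"[{today.owner}] {today.label}: saturation {today.saturation:.1%} "
                      f"below {min_saturation:.0%}")
    if yesterday.labeled_users:
        change = abs(today.labeled_users - yesterday.labeled_users) / yesterday.labeled_users
        if change > max_change:
            alerts.append(f"[{today.owner}] {today.label}: volume changed {change:.0%} day over day")
    return alerts


yesterday = LabelRun("college_student", "alice", True, 100_000, 1_000_000)
today = LabelRun("college_student", "alice", True, 55_000, 1_000_000)
for alert in check_run(today, yesterday):
    print(alert)  # [alice] college_student: volume changed 45% day over day
```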

5. Application and feedback

The ultimate purpose of an enterprise's labeling system is to serve the business. Common business applications of labels include:

① Developing data products. For example, an intelligent recommendation system can be built on label big data and algorithms.

② Feature insight and target audience selection. For example, Getui helps brands and apps analyze the profiles of segmented groups using its thousands of labels and hundred-million-scale feature data, and combines different labels to intelligently select audiences that match the target characteristics, supporting customers' ad delivery and user reach.

③ Refined operations. Once the target group's profile is understood, operations can be refined further. For example, brands can develop differentiated ad creatives for consumer groups with different interests and preferences, and choose different media platforms to improve advertising effectiveness.
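
The audience selection in the second scenario comes down to set operations over a label index that maps each label to the users who carry it. The sketch below is a minimal in-memory version; the label names, user IDs and function name are hypothetical, and at the scale of hundreds of millions of users this would more likely run on compressed bitmaps or a distributed query engine than on Python sets.

```python
from typing import Sequence


def select_audience(label_index: dict[str, set[int]],
                    include_all: Sequence[str],
                    include_any: Sequence[str] = (),
                    exclude: Sequence[str] = ()) -> set[int]:
    """Users with every label in include_all, at least one in include_any, none in exclude."""
    if not include_all:
        raise ValueError("include_all needs at least one label")
    # set.intersection returns a new set, so the index itself is never modified.
    audience = set.intersection(*(label_index.get(name, set()) for name in include_all))
    if include_any:
        audience &= set().union(*(label_index.get(name, set()) for name in include_any))
    for name in exclude:
        audience -= label_index.get(name, set())
    return audience


label_index = {
    "college_student": {1, 2, 3, 4},
    "gamer": {2, 3, 5},
    "sports_fan": {4, 6},
    "risk_account": {3},
}
# College students who are gamers or sports fans, excluding risk accounts.
print(select_audience(label_index, ["college_student"], ["gamer", "sports_fan"], ["risk_account"]))
# {2, 4}
```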

For label applications in different scenarios, enterprises also need to analyze the results afterwards, evaluate label quality and coverage rigorously, and store the newly generated downstream data for further processing. This brings the entire life cycle of the labeling system under systematic management and control, so that the data assets keep growing in value.

The above is a review of the fourth live broadcast of Getui TechDay's "Data Governance Training Camp". You can watch the replay video to learn more about building a labeling system.


Source: blog.csdn.net/Androilly/article/details/128478836