How to manage a massive number of tags

Tag scoring is an important tool for tag governance. Scoring lets you evaluate tags clearly and intuitively across several dimensions, see how they are really used, and keep optimizing them in support of business operations. It also helps the data team decide which tags deserve computing and storage resources, and plan cluster capacity accordingly.

1. Why score tags

After the early stages of tag system design and tag processing, tags can finally go online, where business staff can use them and they can start delivering value.

Once the tags had been online for a while, we started to care about the computing resources and storage space they consumed every day. Of the hundreds of tags being computed, how many were actually used by our business colleagues, and did the business revenue cover the data cost? How good is a tag's quality after launch? Are there cases where the old rules no longer apply and the tag needs continuous optimization?

With these questions in mind, we needed a way to evaluate how tags are used after they go online and to identify the value of each tag. Taking inspiration from movie ratings, Huabei ratings, and similar schemes, we decided to score and rank tags as well, since a score is simple and easy to understand.

2. Tag scoring model

After some deliberation, we chose five dimensions as the inputs to the tag scoring model:

Total tag score = a * usage score + b * attention score + c * quality score + d * continuous optimization score + e * security score

Usage, attention, quality, and continuous optimization are the core dimensions; security can be included depending on your situation. a, b, c, d, and e are weights that sum to 100%.
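
The total score is a plain weighted sum. The Python sketch below shows it under assumed inputs: the weight values and the 0-100 scale for each dimension score are hypothetical, not taken from the product; only the rule that the weights sum to 100% comes from the text.

```python
import math

# Hypothetical weights a..e; the model only requires that they sum to 100%.
WEIGHTS = {
    "usage": 0.30,         # a
    "attention": 0.25,     # b
    "quality": 0.20,       # c
    "optimization": 0.15,  # d
    "security": 0.10,      # e
}


def total_score(scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the per-dimension scores (each assumed to be 0-100)."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 100%")
    return sum(weight * scores[dim] for dim, weight in weights.items())


example = {"usage": 80, "attention": 60, "quality": 90, "optimization": 40, "security": 100}
print(round(total_score(example), 2))
```

If security is left out, set e to 0 and redistribute its share among a through d so that the sum check still passes.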

2.1 Tag usage score

Tag usage: evaluates how tags are used in analysis and by external systems.

In Kangaroo Cloud's tag product, tags are used in the following scenarios:

(1) Tag references: for example, an atomic tag referenced by a derived tag, or a derived tag referenced by a composite tag. From this scenario we compute the "tag reference count" indicator.

(2) Tag analysis: for tags used in portrait-analysis features such as audience selection by tags, group portraits, group comparison, and significance analysis, we compute the "tag analysis count" indicator.

(3) Tag calls: the number of times external applications query a tag through the data API, giving the "tag call count" indicator.

From these three indicators, we first convert each indicator into a score with the Sigmoid function, then take a weighted sum of the indicator scores to obtain the tag usage score.
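
A plain Sigmoid of a raw count gives 50 points to a tag that is never used, so the sketch below assumes a shifted and scaled logistic curve. The midpoint, scale, and per-indicator weights are hypothetical tuning parameters, not values from the product; only "Sigmoid per indicator, then weighted sum" comes from the text.

```python
import math


def sigmoid_score(value: float, midpoint: float, scale: float) -> float:
    """Map a non-negative count to a 0-100 score with a logistic (Sigmoid) curve.

    midpoint: the count that earns 50 points; scale: how fast the curve saturates.
    Both are hypothetical tuning knobs.
    """
    return 100.0 / (1.0 + math.exp(-(value - midpoint) / scale))


# Hypothetical settings per indicator: (midpoint, scale, weight inside the usage dimension).
USAGE_INDICATORS = {
    "reference_count": (5, 2, 0.3),   # references by derived/composite tags
    "analysis_count": (50, 15, 0.4),  # uses in portrait-analysis features
    "call_count": (1000, 300, 0.3),   # data API queries from external apps
}


def usage_score(counts: dict) -> float:
    """Weighted sum of the per-indicator Sigmoid scores; missing counts are treated as 0."""
    return sum(
        weight * sigmoid_score(counts.get(name, 0), midpoint, scale)
        for name, (midpoint, scale, weight) in USAGE_INDICATORS.items()
    )


print(round(usage_score({"reference_count": 12, "analysis_count": 30, "call_count": 5000}), 1))
```

Because the curve saturates, a tag called a million times cannot drown out the other indicators; the midpoint decides what "normal" usage means for each indicator.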

2.2 Tag attention score

Tag attention: evaluates how often a tag is searched for, viewed, and bookmarked.

In Kangaroo Cloud's tag product, tag attention relates to the following scenarios:

(1) Tag search: when users search for tags in the tag market, we compute the "tag search count" indicator.

(2) Tag views: the number of times a tag is clicked to view its basic information, analysis page, and so on, giving the "tag view count" indicator.

(3) Tag bookmarks: the number of users who bookmark the tag, giving the "bookmarking user count" indicator.

These three indicators reflect a tag's popularity. Again, we use the Sigmoid function to convert each indicator into a score and then aggregate the indicator scores into the tag attention score.
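
A minimal sketch of the attention score, assuming hypothetical per-tag counters collected over one scoring window. Note that the bookmark indicator counts distinct users, so it is modeled as a set of user IDs; the curve parameters and weights are again illustrative assumptions.

```python
import math
from dataclasses import dataclass, field


def sigmoid_score(value: float, midpoint: float, scale: float) -> float:
    """Logistic curve mapping a count to 0-100 (the midpoint earns 50 points)."""
    return 100.0 / (1.0 + math.exp(-(value - midpoint) / scale))


@dataclass
class AttentionStats:
    """Hypothetical per-tag counters for one scoring window."""
    search_count: int = 0                             # tag-market searches that hit the tag
    view_count: int = 0                               # clicks into basic info / analysis pages
    bookmark_users: set = field(default_factory=set)  # distinct user IDs who bookmarked the tag


def attention_score(stats: AttentionStats, weights=(0.3, 0.4, 0.3)) -> float:
    """Weighted sum of the three attention indicators after Sigmoid scoring."""
    indicator_scores = (
        sigmoid_score(stats.search_count, midpoint=20, scale=8),
        sigmoid_score(stats.view_count, midpoint=50, scale=20),
        # Bookmarks count users, not clicks, so one user bookmarking twice counts once.
        sigmoid_score(len(stats.bookmark_users), midpoint=5, scale=2),
    )
    return sum(w * s for w, s in zip(weights, indicator_scores))


stats = AttentionStats(search_count=35, view_count=120, bookmark_users={"u1", "u2", "u3"})
print(round(attention_score(stats), 1))
```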

2.3 Tag quality score

Tag quality: evaluates how well users are actually tagged, which reflects whether the tagging rules are reasonable.

If, after we define a tag and its values and run the computation, those values rarely land on any user, the rules behind the tag are unreasonable. For example, suppose we define an "activity" tag with the values "high activity", "medium activity", and "low activity". If fewer than 70% of users actually receive a value and a large share are null, meaning they were never tagged, then the value rules we wrote have gaps and need to be improved.

The system computes each tag's "tag coverage" (the share of users who receive a value for the tag) and normalizes it into a score.
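
Coverage is a simple ratio. A minimal sketch, assuming tag results are available as a hypothetical mapping from user ID to tag value and that "normalizing" means scaling the ratio to 0-100. Users with no row at all also count as untagged, which is why the total user count is passed in separately.

```python
from typing import Optional


def coverage_score(tag_values: dict, total_users: int) -> float:
    """Share of all users holding a non-empty value for this tag, scaled to 0-100."""
    if total_users <= 0:
        return 0.0
    covered = sum(1 for value in tag_values.values() if value not in (None, ""))
    return 100.0 * covered / total_users


# Hypothetical results for the "activity" tag; None and "" mean the user was not tagged.
activity: dict = {"u1": "high", "u2": "medium", "u3": None, "u4": "low", "u5": ""}
print(coverage_score(activity, total_users=5))
```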

 

2.4 Continuous optimization score

The continuous optimization score evaluates whether a tag keeps being optimized after launch.

Over the customer lifecycle, new users keep arriving and silent users churn, while changes in corporate strategy, product releases, and similar events affect customer behavior. We need to reflect these changes in data, so tags must be adjusted continually as the business and the customers change. Our tagging strategy aims to reflect each customer's situation directly and quickly through tags and to guide business operations.

We measure continuous optimization with the "tag optimization count" indicator, which is the number of times a tag has been edited and re-released after launch. Again, the Sigmoid function converts the indicator into a score.
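
A minimal sketch of this indicator, assuming the product keeps a hypothetical list of release timestamps per tag. The first launch itself is not counted, and the Sigmoid midpoint and scale are illustrative assumptions.

```python
import math
from datetime import datetime


def optimization_count(release_times: list, launch_time: datetime) -> int:
    """Number of edit-and-re-release events after the tag's first launch."""
    return sum(1 for t in release_times if t > launch_time)


def optimization_score(count: int, midpoint: float = 3, scale: float = 1.0) -> float:
    """Sigmoid score for the optimization count; midpoint and scale are hypothetical."""
    return 100.0 / (1.0 + math.exp(-(count - midpoint) / scale))


launch = datetime(2022, 1, 10)
releases = [launch, datetime(2022, 3, 2), datetime(2022, 6, 15)]  # hypothetical history
count = optimization_count(releases, launch)
print(count, round(optimization_score(count), 1))
```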

 

2.5 Security score

Tag security does not reflect a tag's popularity, but it can still serve as a scoring dimension, depending on the enterprise's needs.

In Kangaroo Cloud's tag product, the security-related policies for tags include:

(1) Tag visibility: the range of users who can view and edit the tag.

(2) Whether use requires authorization: after the tag is released, whether other people must apply for approval before using it.

(3) Whether the tag has row-level permission control: the policies above control access to the tag as a column; row-level permission indicates whether access is also restricted row by row.

(4) Whether the tag is desensitized: whether the tag's values are masked.

We then score each tag according to its security policy configuration.
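
The article does not say how the policy configuration maps to points. One simple possibility, sketched below under that assumption, is additive scoring: each enabled policy contributes a fixed, hypothetical number of points. A real deployment might weight policies differently, for example giving desensitization more weight for sensitive tags.

```python
from dataclasses import dataclass


@dataclass
class SecurityPolicy:
    """Hypothetical flags mirroring the four policies listed above."""
    visibility_restricted: bool   # (1) only a limited range of users can view/edit the tag
    requires_authorization: bool  # (2) others must apply for approval before using it
    row_level_permission: bool    # (3) row-level access control is configured
    desensitized: bool            # (4) tag values are masked


# Hypothetical points per policy (they add up to 100); tune to your enterprise's needs.
POLICY_POINTS = {
    "visibility_restricted": 25,
    "requires_authorization": 25,
    "row_level_permission": 25,
    "desensitized": 25,
}


def security_score(policy: SecurityPolicy) -> int:
    """Add up the points of every policy that is switched on."""
    return sum(points for name, points in POLICY_POINTS.items() if getattr(policy, name))


print(security_score(SecurityPolicy(True, True, False, False)))
```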

Finally, we take the weighted sum of the five dimension scores, using the formula above, to obtain the total score.

3. Applying tag scores

To let tag administrators and business staff see popular tags, silent tags, and so on more intuitively, the tag scores are presented as leaderboards.

3.1 Popular tag leaderboard

A popularity score is computed from three angles (tag usage, attention, and continuous optimization), and the top n popular tags are displayed.
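
A minimal sketch of the popular tag leaderboard, assuming dimension scores are already on a 0-100 scale. The popularity weights and the sample tag names are hypothetical; `heapq.nlargest` picks the top n without sorting the whole tag set.

```python
import heapq

# Hypothetical weights for the popularity score (usage, attention, continuous optimization).
POPULARITY_WEIGHTS = {"usage": 0.5, "attention": 0.35, "optimization": 0.15}


def popularity(scores: dict) -> float:
    """Weighted popularity from the three dimensions named in the text."""
    return sum(w * scores[dim] for dim, w in POPULARITY_WEIGHTS.items())


def hot_tags(tag_scores: dict, n: int) -> list:
    """Return the top-n (tag, popularity) pairs, most popular first."""
    pairs = ((tag, popularity(s)) for tag, s in tag_scores.items())
    return heapq.nlargest(n, pairs, key=lambda pair: pair[1])


tag_scores = {
    "activity_level": {"usage": 90, "attention": 80, "optimization": 60},
    "purchase_intent": {"usage": 70, "attention": 95, "optimization": 40},
    "legacy_region_code": {"usage": 5, "attention": 10, "optimization": 0},
}
for tag, score in hot_tags(tag_scores, n=2):
    print(f"{tag}: {score:.1f}")
```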

Popular tags are used frequently, so we need to keep checking that they run normally and stay at high quality so that the business can rely on them.

3.2 Silent tag leaderboard

The silent tag leaderboard is the popular tag leaderboard in reverse order. A silent tag has a very low usage rate, so you can consider taking such tags offline periodically to save cluster resources.
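
A minimal sketch of the reverse ranking, assuming popularity scores have already been computed into a hypothetical tag-to-score mapping. The offline threshold is also a hypothetical cutoff; running the check on a schedule is left to the deployment.

```python
def silent_tags(popularity: dict, threshold: float, n: int) -> list:
    """Lowest-popularity tags first, flagged as offline candidates when below `threshold`.

    `popularity` maps tag name -> popularity score.
    """
    ascending = sorted(popularity.items(), key=lambda pair: pair[1])[:n]
    return [(tag, score, score < threshold) for tag, score in ascending]


popularity = {
    "activity_level": 82.0,
    "purchase_intent": 74.25,
    "legacy_region_code": 6.0,
    "old_campaign_flag": 12.5,
}
for tag, score, offline_candidate in silent_tags(popularity, threshold=10, n=3):
    print(tag, score, "offline candidate" if offline_candidate else "")
```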

3.3 Comprehensive leaderboard

The comprehensive leaderboard ranks tags by their total score, evaluating them across all dimensions: usage, attention, continuous optimization, quality, and security.

3.4 Leaderboards by usage, attention, continuous optimization, quality, and security

Users can view a separate leaderboard for each dimension (usage, attention, continuous optimization, quality, and security), depending on which they care about most. They can also drill into each tag's underlying indicators; in the usage dimension, for example, they can see each tag's current reference count, analysis count, and call count. Analyzing these specific indicators supports different tag analysis scenarios.
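
A minimal sketch of a per-dimension leaderboard with drill-down, assuming each tag is a hypothetical record holding its dimension scores and the raw indicators behind them. The comprehensive leaderboard is the same ranking run on a total-score key.

```python
def dimension_leaderboard(tags: list, dimension: str, n: int = 10) -> list:
    """Rank tags by one dimension's score and keep that dimension's raw indicators."""
    ranked = sorted(tags, key=lambda t: t["scores"][dimension], reverse=True)
    return [
        (t["name"], t["scores"][dimension], t["indicators"].get(dimension, {}))
        for t in ranked[:n]
    ]


# Hypothetical records; indicator names follow the usage scenarios in section 2.1.
tags = [
    {"name": "activity_level",
     "scores": {"usage": 88.0, "quality": 65.0},
     "indicators": {"usage": {"references": 12, "analyses": 30, "calls": 5000}}},
    {"name": "purchase_intent",
     "scores": {"usage": 72.0, "quality": 91.0},
     "indicators": {"usage": {"references": 4, "analyses": 55, "calls": 800}}},
]
for name, score, detail in dimension_leaderboard(tags, "usage"):
    print(name, score, detail)
```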

After the tag scoring model goes live, the weights of the dimensions should be tuned to fit your own situation. Once the model has been in use for a while and everyone agrees with its evaluation logic, the static score display can evolve into dynamic alarms and automated governance: for example, tag quality alarms and score alarms that automatically notify tag administrators, tag owners, and other responsible people.
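
A minimal sketch of threshold-based score alarms, assuming hypothetical rules of the form "alert when a dimension score falls below X" and a hypothetical tag-to-owner mapping. Actual notification delivery (email, IM, and so on) is out of scope, so the example only prints the messages.

```python
from dataclasses import dataclass


@dataclass
class AlarmRule:
    dimension: str    # e.g. "quality" or "total"
    threshold: float  # fire when the score falls below this value


def check_alarms(tag_scores: dict, rules: list, owners: dict) -> list:
    """Return (recipient, message) pairs for every rule a tag violates."""
    alerts = []
    for tag, scores in tag_scores.items():
        for rule in rules:
            value = scores.get(rule.dimension)
            if value is not None and value < rule.threshold:
                recipient = owners.get(tag, "tag administrator")  # fall back to the admin
                alerts.append((recipient, f"{tag}: {rule.dimension} score {value:.1f} is below {rule.threshold}"))
    return alerts


rules = [AlarmRule("quality", 70), AlarmRule("total", 40)]
tag_scores = {
    "activity_level": {"quality": 65.0, "total": 73.0},
    "legacy_region_code": {"quality": 90.0, "total": 21.5},
}
owners = {"activity_level": "alice"}  # hypothetical owner
for recipient, message in check_alarms(tag_scores, rules, owners):
    print(f"notify {recipient}: {message}")
```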

 

 


Source: blog.csdn.net/u011487470/article/details/127517381