Tag Scoring: How to Manage a Large Number of Tags Systematically?

This article is the fourth in the "Label Portrait Series". The previous articles covered the methodology for building a tag portrait system, tag system design and processing, and tag processing and storage. This one introduces tag scoring.

Tag scoring is an important part of tag governance. Scoring lets you evaluate tags clearly and intuitively along several dimensions, understand how tags are actually used, and keep optimizing them in support of business operations. It also helps the data team decide which tags deserve computing and storage resources, so that cluster resources can be planned sensibly.

1. Why use tag scoring?

After the tag system design and tag processing described in the previous articles, the tags can finally go live, so that business personnel can use them and put them to work.

Once the tags have been live for a while, questions start to come up. Hundreds of tags consume computing resources and storage space every day: how many of them do business users actually use, and does the business benefit cover the data cost? How good is the quality of the tags after launch? Are there cases where the old rules no longer apply and need to be optimized continuously?

To answer these questions, we need a way to evaluate how tags are used after they go live and to identify the value of each tag. Taking inspiration from movie ratings, Huabei ratings, and similar systems, we decided to give each tag a score and a rank, which is simple and clear.

2. Tag scoring model

For the tag scoring model, we selected five dimensions as scoring parameters:

Total tag score = a * tag usage score + b * tag attention score + c * tag quality score + d * tag continuous optimization score + e * tag security score

Of these, tag usage, tag attention, tag quality, and tag continuous optimization are the core dimensions; tag security can be included depending on the actual situation. a, b, c, d, and e are the weights, and they sum to 100%.
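The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the weight values, the dimension names, and the assumption that every dimension score is on a 0 to 100 scale are all hypothetical. The article only requires that the weights sum to 100%.

```python
# Hypothetical values for the weights a, b, c, d, e (as fractions of 1).
WEIGHTS = {
    "usage": 0.30,
    "attention": 0.20,
    "quality": 0.20,
    "optimization": 0.20,
    "security": 0.10,
}


def total_tag_score(dimension_scores, weights=WEIGHTS):
    """Weighted sum of the dimension scores (each assumed to be 0 to 100)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1 (100%)")
    missing = set(weights) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(weights[name] * dimension_scores[name] for name in weights)


# Made-up dimension scores for a single tag.
example = {"usage": 80, "attention": 60, "quality": 90,
           "optimization": 40, "security": 100}
print(total_tag_score(example))
```

To leave the security dimension out, set its weight to 0 and redistribute the remainder across the four core dimensions so the weights still sum to 1.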

01 Tag Usage Score

Tag usage evaluates how much a tag is used in analysis and by external systems.

In Kangaroo Cloud's tag product, a tag has the following usage scenarios:

• Tag reference: for example, atomic tags are referenced by derived tags, and derived tags are referenced by combined tags. The "tag reference times" indicator is calculated from this scenario.

• Tag analysis: when a tag is used in profile analysis functions such as tag-based group selection, group portrait, group comparison, and significance analysis, the "tag analysis times" indicator is calculated.

• Tag invocation: the number of times the tag is queried by external applications through the data API. The "tag invocation times" indicator is calculated from this.

Based on these three indicators, we first use the sigmoid function to convert each indicator into a score, and then combine the indicator scores by weight into the tag usage score.
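As a rough sketch of this two-step calculation, the snippet below maps each count to a score with a shifted logistic (sigmoid) curve and then takes a weighted sum. The article does not give the exact form of the sigmoid, so the `midpoint` and `steepness` parameters, the 0 to 100 scale, and the per-indicator weights are all assumptions made for illustration.

```python
import math


def sigmoid_score(count, midpoint, steepness):
    """Map a non-negative count to a score in (0, 100).

    midpoint and steepness are assumed tuning parameters: a count equal to
    midpoint scores exactly 50, and a larger steepness makes the score
    saturate faster on either side of the midpoint.
    """
    return 100.0 / (1.0 + math.exp(-steepness * (count - midpoint)))


# Hypothetical parameters and weights for each indicator (weights sum to 1).
USAGE_INDICATORS = {
    "reference_times": {"midpoint": 5, "steepness": 0.5, "weight": 0.3},
    "analysis_times": {"midpoint": 20, "steepness": 0.1, "weight": 0.3},
    "invocation_times": {"midpoint": 1000, "steepness": 0.005, "weight": 0.4},
}


def usage_score(counts, indicators=USAGE_INDICATORS):
    """Weighted sum of the sigmoid scores; a missing indicator counts as 0."""
    return sum(
        cfg["weight"]
        * sigmoid_score(counts.get(name, 0), cfg["midpoint"], cfg["steepness"])
        for name, cfg in indicators.items()
    )


# Made-up counts for one tag.
print(usage_score({"reference_times": 8, "analysis_times": 35,
                   "invocation_times": 400}))
```

The shift by `midpoint` matters because a plain sigmoid of a non-negative count never drops below 0.5. The sigmoid also compresses very large counts, so one heavily invoked tag does not dwarf every other tag on the scale.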

02 Tag Attention Score

Tag attention evaluates how often a tag is searched, viewed, and favorited.

In Kangaroo Cloud's tag product, tag attention is related to the following scenarios:

• Tag search: the number of times the tag is searched by users in the tag market. The "number of tag searches" indicator is calculated from this.

• Tag views: the number of times the tag is clicked to view its basic information, analysis pages, and so on. The "tag views" indicator is calculated from this.

• Tag favorites: the number of users who have favorited the tag. The "number of favorite users" indicator is calculated from this.

These three indicators reflect the attention a tag receives. As before, we use the sigmoid function to convert the indicators into scores, and then aggregate the indicator scores into the tag attention score.

03 Tag Quality Score

Tag quality evaluates how well a tag actually covers users and reflects whether its tagging rules are reasonable.

If, after we define a tag and its values and run the calculation, the tag values rarely hit any users, the rules are not well designed. For example, we define an "activity" tag with the values "high activity", "medium activity", and "low activity", but fewer than 70% of users actually receive one of these values and the rest are left empty. That means the value rules we defined have gaps and need to be improved.

The system calculates the "tag coverage" of each tag and normalizes the coverage into a score.
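A minimal sketch of the coverage calculation follows. It assumes that coverage is the share of users who received a non-empty value for the tag, and that the normalization is a simple linear mapping onto 0 to 100. The article does not specify the normalization, so both the mapping and the function names are illustrative.

```python
def tag_coverage(tagged_users, total_users):
    """Fraction of users who received a non-empty value for the tag."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    if not 0 <= tagged_users <= total_users:
        raise ValueError("tagged_users must be between 0 and total_users")
    return tagged_users / total_users


def quality_score(tagged_users, total_users):
    """Assumed linear normalization of coverage onto a 0 to 100 scale."""
    return 100.0 * tag_coverage(tagged_users, total_users)


# Made-up example: 650,000 of 1,000,000 users carry a value for the tag.
print(quality_score(650_000, 1_000_000))
```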

04 Continuous Optimization Score

The continuous optimization degree evaluates whether a tag continues to be optimized after it goes online.

Over the customer life cycle, new users flow in and silent users churn. Company strategy adjustments, product releases, and similar events also affect customer behavior. These changes need to be reflected in the data, so we have to keep adjusting our tagging strategy as the business and the customers change, so that tags reflect the customer situation directly and quickly and can guide business operations.

The continuous optimization degree is evaluated by the "number of tag optimizations" indicator, which is the number of times the tag is edited and republished after it goes online. Here too, we use the sigmoid function to convert the indicator into a score.

05 Tag Security Score

Tag security does not reflect the popularity of a tag, but it can still serve as a scoring dimension, to be included depending on the enterprise's situation.

In Kangaroo Cloud's tag product, the policies related to tag security are:

• Visibility of the tag: the range of users who can edit and view the tag.

• Whether use of the tag requires authorization: after the tag is released, whether other people must apply for approval before using it.

• Whether the tag has row-level permission control: the two items above control access to the tag as a whole, that is, at the column level; this item reflects whether row-level permissions have also been set for the tag.

• Whether the tag is desensitized: whether the tag's values are masked.

Based on the tag's security policy configuration, we again evaluate it with a score.
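The article does not say how the security configuration is turned into a number. One simple possibility, shown here purely as an assumption, is to award a fixed number of points for each policy that is enabled. The class, the field names, and the equal 25-point split are hypothetical and do not come from the product.

```python
from dataclasses import dataclass


@dataclass
class TagSecurityPolicy:
    """Hypothetical flags mirroring the four policies listed above."""
    restricted_visibility: bool   # view/edit scope is limited to certain users
    requires_authorization: bool  # others must apply for approval to use the tag
    row_level_permission: bool    # row-level permissions are configured
    desensitized: bool            # tag values are masked


# Assumed equal split of 100 points across the four policies.
POLICY_POINTS = {
    "restricted_visibility": 25,
    "requires_authorization": 25,
    "row_level_permission": 25,
    "desensitized": 25,
}


def security_score(policy):
    """Sum the points of every policy that is switched on."""
    return sum(
        points for field, points in POLICY_POINTS.items()
        if getattr(policy, field)
    )


policy = TagSecurityPolicy(
    restricted_visibility=True,
    requires_authorization=True,
    row_level_permission=False,
    desensitized=True,
)
print(security_score(policy))  # prints 75
```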

Based on the scores of the five dimensions above, we compute the weighted sum according to the formula given earlier to obtain the total score.

3. Application of tag scoring

To let tag administrators and business personnel see popular tags, silent tags, and so on more intuitively, the tag scores are presented as ranking lists:

01 Top Tag List

A popularity score is calculated for each tag from its usage, attention, and continuous optimization, and the top N popular tags are displayed.

02 Silent Tag Leaderboard

Silent tags are the popular tag ranking in reverse order. A silent tag has a very low usage rate; consider taking such tags offline on a regular basis to save cluster resources.
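The popular and silent lists can be sketched as two views of the same sort. The popularity weights and the tag names below are made up for illustration; the article only says that the popularity score is derived from usage, attention, and continuous optimization.

```python
# Hypothetical weights for combining the three dimensions into a popularity score.
POPULARITY_WEIGHTS = {"usage": 0.5, "attention": 0.3, "optimization": 0.2}


def popularity_score(scores, weights=POPULARITY_WEIGHTS):
    """Weighted sum of the usage, attention, and optimization scores."""
    return sum(weights[name] * scores[name] for name in weights)


def leaderboards(tags, n):
    """Return (top N popular tags, top N silent tags) as lists of tag names."""
    ranked = sorted(tags, key=lambda name: popularity_score(tags[name]),
                    reverse=True)
    popular = ranked[:n]
    silent = ranked[::-1][:n]  # the same ranking read from the bottom
    return popular, silent


# Made-up dimension scores for four tags.
tags = {
    "activity_level": {"usage": 90, "attention": 80, "optimization": 70},
    "member_tier": {"usage": 60, "attention": 50, "optimization": 40},
    "preferred_channel": {"usage": 20, "attention": 30, "optimization": 10},
    "legacy_segment": {"usage": 5, "attention": 2, "optimization": 0},
}

popular, silent = leaderboards(tags, n=2)
print(popular)  # ['activity_level', 'member_tier']
print(silent)   # ['legacy_segment', 'preferred_channel']
```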

03 Comprehensive Ranking

The comprehensive ranking sorts tags by their total score, which evaluates each tag across the dimensions of usage, attention, continuous optimization, quality, and security.

04 Rankings by usage, attention, continuous optimization, quality, and security

Users can view the ranking for each individual dimension (usage, attention, continuous optimization, quality, and security) according to what they care about most. They can also view the specific indicators of each tag. For the usage dimension, for example, they can view each tag's current reference, analysis, and invocation counts, and analyze these indicators to suit different tag analysis scenarios.

After the tag scoring model is launched, the weights of the different dimensions need to be adjusted to fit our actual situation. After a period of use, once everyone agrees with the evaluation logic, the static score display can be extended to dynamic alarms, automatic governance, and so on: tag quality alarms and score alarms can be set up, and tag administrators, owners, and other responsible people can be notified automatically.
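As an illustration of what a quality alarm and a score alarm might look like, the sketch below checks one tag against two thresholds and returns the alert messages to send. The function name, the threshold values, and the message format are assumptions; how the notifications reach administrators and owners is left out.

```python
def check_alerts(tag_name, quality_score, total_score,
                 min_quality=60.0, min_total=40.0):
    """Return alert messages for a tag whose scores fall below the thresholds.

    min_quality and min_total are hypothetical thresholds that would be
    tuned per deployment.
    """
    alerts = []
    if quality_score < min_quality:
        alerts.append(
            f"[quality] {tag_name}: quality score {quality_score:.1f} "
            f"is below {min_quality:.1f}"
        )
    if total_score < min_total:
        alerts.append(
            f"[score] {tag_name}: total score {total_score:.1f} "
            f"is below {min_total:.1f}"
        )
    return alerts


# Made-up scores: low quality, acceptable total score.
for message in check_alerts("activity_level", quality_score=55.0, total_score=72.0):
    print(message)
```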

The above is the scoring logic applied in the product. We hope it is helpful, and you are welcome to propose other ideas for optimizing the scoring model and achieving better tag management.

Kangaroo Cloud Open Source Framework DingTalk Technology Exchange Group (30537511): anyone interested in big data open source projects is welcome to join and exchange the latest technical information. Open source project repository: https://github.com/DTStack/Taier

Origin my.oschina.net/u/3869098/blog/5584290