One, label introduction

Label concept

Labels were originally used to classify and mark physical objects, giving brief information such as an object's name, weight, volume, and purpose. The practice later spread to the data industry, where data is labeled so that it can be classified and analyzed quickly.

Label features

Labels describe data precisely enough to support positioning and search, have a life cycle, and can be computed, configured, and processed by rules. They can describe both structured and unstructured data (documents, pictures, videos, and so on), so that this content can be managed efficiently.

Describing features: label [phone color], features [red, white];
Describing rules: label [active user], rules [logs in daily, generates transactions];
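
The two kinds of description above can be sketched as simple data structures. This is only an illustration: the class names `FeatureLabel` and `RuleLabel`, and the record fields `logins_today` and `transactions`, are assumptions made for the example and do not come from any particular system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FeatureLabel:
    """A label whose value is picked from a fixed set of feature values."""
    name: str
    allowed_values: List[str]

    def accepts(self, value: str) -> bool:
        return value in self.allowed_values


@dataclass
class RuleLabel:
    """A label that applies only when every rule holds for a record."""
    name: str
    rules: List[Callable[[Dict], bool]] = field(default_factory=list)

    def matches(self, record: Dict) -> bool:
        return all(rule(record) for rule in self.rules)


# label [phone color], features [red, white]
phone_color = FeatureLabel("phone color", ["red", "white"])

# label [active user], rules [logs in daily, generates transactions]
active_user = RuleLabel(
    "active user",
    rules=[
        lambda r: r.get("logins_today", 0) >= 1,  # hypothetical field
        lambda r: r.get("transactions", 0) >= 1,  # hypothetical field
    ],
)
```

A feature label only checks membership in a value set, while a rule label has to be evaluated against a record, which is why the two are modeled separately here.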

Label value

Serves as the foundation of refined operations, improving the accuracy and efficiency with which traffic is used;
Helps product teams quickly locate the data they need and analyze it accurately;
Helps customers enter the market cycle faster;
Supports in-depth predictive analysis of data and timely response;
Enables intelligent recommendation systems built on labels;
Gives insight into industry characteristics through data analysis within a given category;

The core value of labels, and their most common scenarios, are real-time intelligent recommendation and precise digital marketing.

Two, label definition

Attribute label

Attribute labels describe basic characteristics. They do not depend on user behavior and are not produced by rule-engine analysis. For example, a user's real-name authentication information yields characteristics such as gender and date of birth. Such labels change very rarely and are highly accurate.

Behavior label

Event tracking (instrumentation points) in different business channels captures user behavior data, and analyzing that data produces labels that describe the results. For example, analyzing a user's "online shopping platform" might yield Pinduoduo, Taobao, Jingdong, Tmall, and so on. These are labels that can only be determined from behavioral data.
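
As a rough sketch of how a behavior label might be derived from tracked events, the example below counts a user's shopping events per platform and keeps the platforms above a share threshold. The event fields (`type`, `platform`), the function name, and the 20% threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def shopping_platform_label(events, min_share=0.2):
    """Return the platforms that account for at least `min_share`
    of a user's shopping events, sorted alphabetically."""
    counts = Counter(e["platform"] for e in events if e["type"] == "shopping")
    total = sum(counts.values())
    if total == 0:
        return []  # no shopping behavior, so no label value
    return sorted(p for p, c in counts.items() if c / total >= min_share)


# Hypothetical tracked events for one user.
events = [
    {"type": "shopping", "platform": "Taobao"},
    {"type": "shopping", "platform": "Taobao"},
    {"type": "shopping", "platform": "Pinduoduo"},
    {"type": "browse", "platform": "Tmall"},  # not a shopping event, ignored
]

label_values = shopping_platform_label(events)
```

The point is that the label value is never entered by anyone; it falls out of the behavior data and changes as new events arrive.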

Rule label

Rule labels are derived by applying rules and mostly reflect a product or operations perspective. For example, an e-commerce platform wants to grant a benefit to members whose membership level is above 5 and who have been active in the past 7 days. Two labels are involved here: 1. "membership level", and which rules determine it; 2. "active in the last 7 days", and whether that is judged by login or by transaction behavior. Both must be dynamically configurable, with results produced by a rules engine. A descriptive label generated by calculation and analysis under such dynamic rule configuration is a rule label.
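
The membership example can be sketched as a rule held in configuration and evaluated by a tiny engine. This is a minimal illustration, not a real rules engine: the configuration layout, the field names, and the `activity` switch between login and transaction are all assumptions.

```python
import operator
from datetime import date

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}

# Hypothetical rule configuration; in practice it would be loaded from
# a configuration store so it can change without a code release.
BENEFIT_RULE = {
    "label": "eligible_for_benefit",
    "conditions": [
        {"field": "member_level", "op": ">", "value": 5},
        {"field": "days_since_last_active", "op": "<=", "value": 7},
    ],
}


def days_since_last_active(user, today, activity="login"):
    """'Active' is configurable: judged by last login or last transaction."""
    last = user["last_login"] if activity == "login" else user["last_transaction"]
    return (today - last).days


def evaluate(rule, user, today, activity="login"):
    """Return True when the user satisfies every configured condition."""
    facts = {
        "member_level": user["member_level"],
        "days_since_last_active": days_since_last_active(user, today, activity),
    }
    return all(OPS[c["op"]](facts[c["field"]], c["value"])
               for c in rule["conditions"])


today = date(2024, 1, 10)
user = {
    "member_level": 6,
    "last_login": date(2024, 1, 8),
    "last_transaction": date(2023, 12, 1),
}

by_login = evaluate(BENEFIT_RULE, user, today, activity="login")
by_transaction = evaluate(BENEFIT_RULE, user, today, activity="transaction")
```

The same user qualifies when "active" means login but not when it means transaction, which shows why the definition has to be an explicit, configurable part of the rule.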

Fit label

Fit labels are the most complex. They give predictive descriptions, or even higher-level definitions, through intelligent combined analysis of multiple labels. So-called mind reading is a comparison: it judges a person's state of mind from multiple characteristics and eye-contact cues. There is a saying in machine learning: after learning from a user's behavior over a long period, the machine may know the user better than the user does.

Three, label management system

Hierarchical classification

Labels are usually managed first by industry (finance, education, entertainment, and so on), and then refined through multi-level classification.
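
One simple way to picture multi-level classification is a category tree that is flattened into paths. The tree contents below are invented for illustration; only the top-level industries come from the text.

```python
# Hypothetical category tree: industry -> sub-category -> labels.
CATEGORY_TREE = {
    "finance": {
        "wealth management": ["risk appetite"],
        "payment": ["preferred payment method"],
    },
    "education": {
        "online courses": ["preferred subject"],
    },
}


def label_paths(tree, prefix=()):
    """Flatten the tree into 'level1/level2/label' path strings."""
    paths = []
    for key, child in tree.items():
        if isinstance(child, dict):
            # Intermediate level: descend with the extended prefix.
            paths.extend(label_paths(child, prefix + (key,)))
        else:
            # Leaf level: `child` is the list of labels in this category.
            paths.extend("/".join(prefix + (key, label)) for label in child)
    return paths


all_paths = label_paths(CATEGORY_TREE)
```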

Base label

A base label is a key label of the data. It is precise, flat, and cannot be subdivided further, and it is used to describe data accurately, much like metadata. When multiple labels describe the characteristics of the same data, they form a structured table.

Tag value type

Value types include number, dictionary, Boolean, date, text, custom, and so on. The value type governs the concrete values a label can take. For example, the label "gender" with the values "male, female, unknown" is a typical case of a dictionary (enumerated) type.
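
A minimal sketch of value-type checking is shown below. The `ValueType` enum and the `is_valid` function are hypothetical names; the custom type is left out because its rules would depend on the system.

```python
from datetime import date
from enum import Enum


class ValueType(Enum):
    NUMBER = "number"
    DICTIONARY = "dictionary"
    BOOLEAN = "boolean"
    DATE = "date"
    TEXT = "text"


def is_valid(value_type, value, options=None):
    """Check a label value against its declared value type."""
    if value_type is ValueType.NUMBER:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if value_type is ValueType.DICTIONARY:
        return value in (options or [])
    if value_type is ValueType.BOOLEAN:
        return isinstance(value, bool)
    if value_type is ValueType.DATE:
        return isinstance(value, date)
    if value_type is ValueType.TEXT:
        return isinstance(value, str)
    return False


# The "gender" label from the text, described by a listed dictionary.
GENDER_OPTIONS = ["male", "female", "unknown"]
```

For example, `is_valid(ValueType.DICTIONARY, "male", GENDER_OPTIONS)` accepts the value, while any value outside the listed dictionary is rejected.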

Four, label production process

1. Basic process

Data collection

Data is collected through many channels, for example the business lines within a single app: shopping, payment, wealth management, takeaway, information browsing, and so on. It flows through data channels to a unified data aggregation platform. This mass of log data provides the basic conditions for data analysis. Data intelligence, deep learning, and other algorithms all depend on large volumes of data to produce valuable results.

Data processing

Building on the businesses above, massive data is processed, analyzed, and extracted to obtain reasonably accurate user labels. A key step here is continuously verifying and repairing existing user labels, especially rule labels and fit labels.

Tag library

The tag library manages complex tag results, including how tags change along a timeline. At this point the tag data already has considerable value, and paid services can be offered around the tag library. A common example: after you browse certain products in an e-commerce app, recommendations for those products appear on an information-flow (feed) platform. The big-data era is smart to an almost suffocating degree.
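
Timeline-based tag changes could be kept as a dated history per tag, so that the value at any point in time can be looked up. The `TagHistory` class below is a hypothetical sketch of that idea using the standard `bisect` module, not a description of any specific tag library.

```python
import bisect
from datetime import date


class TagHistory:
    """Dated values of one tag for one user, kept sorted by date."""

    def __init__(self):
        self._dates = []
        self._values = []

    def record(self, day, value):
        """Store the value that takes effect on `day`."""
        i = bisect.bisect_right(self._dates, day)
        self._dates.insert(i, day)
        self._values.insert(i, value)

    def value_at(self, day):
        """Return the value in effect on `day`, or None if none was set yet."""
        i = bisect.bisect_right(self._dates, day)
        return self._values[i - 1] if i else None


# Hypothetical history of a "membership level" tag.
level = TagHistory()
level.record(date(2024, 1, 1), 3)
level.record(date(2024, 3, 1), 5)
```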

Label business

After this long journey from raw data to labels, the data naturally returns to the business level. Analyzing users through their label data enables precise marketing and intelligent recommendation: e-commerce applications can increase transaction volume, and information-flow applications can attract users more effectively.

Application layer

The businesses above are developed into services and integrated into the existing application layer, continuously improving service quality and attracting and serving users. User data is in turn generated continuously at the application layer and passed to the data collection service, which completes the closed loop.

2. Data aggregation pool

Based on ID-mapping technology, replace channel-specific identifiers with a unique identifier [uid];
Associate labels with the uid and put them into the computing pool;
Labels carried by the same uid accumulate, growing longer like a snake;
Continuously enrich the label content carried under the uid;

In this way the labeling scenarios are enriched and generate greater data value.
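
The aggregation steps above can be sketched as follows. The ID-mapping table, the identifier types, and the record layout are all assumptions made for the example; a real ID-mapping service would be far more involved.

```python
from collections import defaultdict

# Hypothetical ID-mapping table: channel-specific identifier -> unified uid.
ID_MAP = {
    ("phone", "138-0000-0000"): "u1",
    ("device", "dev-abc"): "u1",
    ("email", "a@example.com"): "u2",
}


def aggregate(tagged_records, id_map):
    """Merge labels from different channels under their unified uid."""
    pool = defaultdict(dict)
    for rec in tagged_records:
        uid = id_map.get((rec["id_type"], rec["id_value"]))
        if uid is None:
            continue  # unmapped identifier; simply skipped in this sketch
        pool[uid].update(rec["tags"])  # labels under the same uid accumulate
    return dict(pool)


# Hypothetical labeled records arriving from different business channels.
records = [
    {"id_type": "phone", "id_value": "138-0000-0000",
     "tags": {"gender": "female"}},
    {"id_type": "device", "id_value": "dev-abc",
     "tags": {"shopping platform": "Taobao"}},
    {"id_type": "email", "id_value": "a@example.com",
     "tags": {"active user": True}},
]

pool = aggregate(records, ID_MAP)
```

Here the phone number and the device resolve to the same uid, so labels from two channels end up describing one user.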

Five, source code address

GitHub address
https://github.com/cicadasmile
GitEE address
https://gitee.com/cicadasmile

Data insight series articles

Serial number	title
01	Data analysis: based on smart tags, accurately manage data
02	Data analysis: data visualization chart, BI tool construction logic
03	Data analysis: quantitative evaluation process in complex business scenarios

