One, label introduction
Label concept
Labels were originally used to classify and mark physical objects, carrying brief information such as an object's name, weight, volume, and purpose. The practice was later adopted in the data industry, where labels are attached to data so that it can be classified and analyzed quickly.
Label features
Labels give a precise description that supports positioning and search, have life cycle characteristics, and can be computed, configured, and processed by rules. Tags can describe both structured and unstructured data [documents, pictures, videos, etc.], so that this content can be managed efficiently.
- Describe features: label [phone color], features [red, white];
- Describe rules: label [active users], rules [log in daily, generate transactions];
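The two ways of describing a label above can be sketched as a small data structure. This is only an illustration: the `Label` class and the user fields `logs_in_daily` and `transaction_count` are hypothetical names, not part of any particular system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Label:
    """A label described either by enumerated feature values or by a rule."""
    name: str
    features: List[str] = field(default_factory=list)   # allowed values, e.g. colors
    rule: Optional[Callable[[Dict], bool]] = None        # predicate over a user record


# Feature-style label: the value is one of the listed features.
phone_color = Label(name="phone color", features=["red", "white"])

# Rule-style label: a user is "active" if they log in daily and have transactions.
# The field names are assumptions made for this sketch.
active_user = Label(
    name="active user",
    rule=lambda u: u["logs_in_daily"] and u["transaction_count"] > 0,
)

user = {"logs_in_daily": True, "transaction_count": 3}
print("red" in phone_color.features)   # True
print(active_user.rule(user))          # True
```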
Label value
- Provide the foundation for refined operations, effectively improving traffic accuracy and efficiency;
- Help product teams quickly locate the data they need and analyze it accurately;
- Help customers enter the market cycle faster;
- Support in-depth predictive analysis of data and timely response;
- Enable intelligent recommendation systems built on tags;
- Give insight into industry characteristics through data analysis within a given category.
The core value of labels, that is, their most common scenarios: real-time intelligent recommendation and precise digital marketing.
Two, label definition
Attribute label
Attribute tags describe basic characteristics. They do not need to be generated from behavior and are not produced by rule engine analysis. For example, a user's real-name authentication information yields characteristics such as gender, birthday, and date of birth. These values change very rarely and are highly accurate.
Behavior label
Event tracking (instrumentation points) in different business channels captures user behavior data, and analyzing this data produces labels that describe the result. For example, analyzing a user's "online shopping platform" may yield Pinduoduo, Taobao, Jingdong, Tmall, etc. These are tags that have to be inferred from behavioral data.
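As a rough illustration of deriving a behavior tag, the sketch below counts tracked visit events per platform and keeps the platforms seen often enough. The event shape (`type`, `platform`) and the `min_visits` threshold are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List


def shopping_platform_tags(events: List[Dict], min_visits: int = 2) -> List[str]:
    """Return platforms visited at least `min_visits` times, most frequent first."""
    counts = Counter(e["platform"] for e in events if e["type"] == "visit")
    return [p for p, n in counts.most_common() if n >= min_visits]


events = [
    {"type": "visit", "platform": "Taobao"},
    {"type": "visit", "platform": "Taobao"},
    {"type": "visit", "platform": "Taobao"},
    {"type": "visit", "platform": "Pinduoduo"},
    {"type": "visit", "platform": "Pinduoduo"},
    {"type": "visit", "platform": "Tmall"},
    {"type": "click", "platform": "Jingdong"},
]
print(shopping_platform_tags(events))  # ['Taobao', 'Pinduoduo']
```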
Rule label
Rule labels are derived from rules and mostly reflect a product or operations perspective. For example, an e-commerce platform wants to give a benefit to members whose membership level is above 5 and who have been active in the past 7 days. Two labels are involved here: 1. by what rules "membership level" is judged; 2. how "active in the last 7 days" is judged, whether by login or by transaction behavior. These must be dynamically configurable, and the results are then generated by the rules engine. The descriptive label produced by calculation and analysis under this dynamic rule configuration is the rule label.
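The membership example can be sketched as a configurable rule check. This is a minimal illustration, not a real rules engine: `RULE_CONFIG`, its keys, and the user record fields are all hypothetical.

```python
from datetime import date, timedelta
from typing import Dict

# Rule configuration that operations staff could change without touching code.
# All keys and field names are assumptions made for this sketch.
RULE_CONFIG = {
    "min_member_level": 5,                       # level must be strictly above this
    "active_window_days": 7,
    "active_events": {"login", "transaction"},   # what counts as "active"
}


def is_eligible(user: Dict, today: date, config: Dict = RULE_CONFIG) -> bool:
    """Evaluate 'membership level above N and active in the last K days'."""
    if user["member_level"] <= config["min_member_level"]:
        return False
    window_start = today - timedelta(days=config["active_window_days"])
    return any(
        ev["type"] in config["active_events"] and window_start <= ev["day"] <= today
        for ev in user["events"]
    )


today = date(2024, 1, 10)
member = {"member_level": 6, "events": [{"type": "login", "day": date(2024, 1, 8)}]}
print(is_eligible(member, today))  # True
```

Switching "active" from login-based to transaction-based is then a configuration change (for example, setting `active_events` to `{"transaction"}`) rather than a code change.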
Fitting label
Fitting tags are the most complex. They combine and analyze multiple tags to give a predictive description, or directly give a higher-level definition, in the manner of so-called mind reading, which judges a person's psychological state from multiple characteristics and eye activity. There is a saying in machine learning: through long-term observation and learning of user behavior, the machine may come to know the user better than the user does.
Three, label management system
Hierarchical classification
Labels are usually managed by dividing them first by industry (finance, education, entertainment, etc.) and then refining the management through multi-level classification.
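One simple way to picture multi-level classification is a nested category tree. The sketch below is only illustrative, and the category names under each industry are made up for the example.

```python
from typing import Dict, List

Tree = Dict[str, "Tree"]


def add_category(tree: Tree, path: List[str]) -> None:
    """Insert a multi-level category path, creating missing levels."""
    node = tree
    for level in path:
        node = node.setdefault(level, {})


def leaf_paths(tree: Tree, prefix: str = "") -> List[str]:
    """List every leaf category as a '/'-joined path, in insertion order."""
    paths = []
    for name, child in tree.items():
        full = f"{prefix}/{name}" if prefix else name
        paths.extend(leaf_paths(child, full) if child else [full])
    return paths


catalog: Tree = {}
add_category(catalog, ["finance", "wealth management", "risk preference"])
add_category(catalog, ["finance", "payment"])
add_category(catalog, ["education", "online courses"])
print(leaf_paths(catalog))
# ['finance/wealth management/risk preference', 'finance/payment', 'education/online courses']
```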
Base label
A base label is a key label of the data. It is precise and flat, cannot be subdivided further, and is used to describe the data exactly, much like metadata. When multiple tags are used to describe the characteristics of the data, they form a structured table.
Tag value type
Value types include number, dictionary, Boolean, date, text box, custom, etc. The value type governs the specific values a label can take. For example, the tag "gender" with the tag values "male, female, unknown" is a typical case described by an enumerated dictionary.
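A value type can be enforced with a small validation step. In the sketch below, the `TAG_DEFS` table and the tags other than "gender" are hypothetical and exist only to show one check per value type.

```python
from datetime import date
from typing import Any, Dict

# Hypothetical tag definitions: each tag declares a value type, and
# dictionary-typed tags also list their allowed values.
TAG_DEFS: Dict[str, Dict[str, Any]] = {
    "gender": {"type": "dictionary", "values": {"male", "female", "unknown"}},
    "age": {"type": "number"},
    "is_member": {"type": "boolean"},
    "birthday": {"type": "date"},
}


def is_valid(tag: str, value: Any) -> bool:
    """Check a value against the declared value type of a tag."""
    definition = TAG_DEFS[tag]
    kind = definition["type"]
    if kind == "dictionary":
        return value in definition["values"]
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "date":
        return isinstance(value, date)
    return False


print(is_valid("gender", "male"))    # True
print(is_valid("gender", "other"))   # False
```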
Four, the label production process
1. Basic process
Data collection
There are many channels for data collection, for example the business lines within a single app: shopping, payment, wealth management, takeaway, information browsing, etc. The data is transmitted to a unified data aggregation platform through data channels. This massive log data provides the basic conditions for data analysis. Data intelligence, deep learning, and algorithms all depend on massive data to produce valuable analysis results.
Data processing
Building on the business data above, massive data is processed, analyzed, and extracted to obtain relatively accurate user tags. A key step here is to continuously verify and repair existing user tags, especially rule tags and fitting tags.
Tag library
The tag library manages the complex tag results, including complex tags and timeline-based tag changes. At this point the tag data is already of considerable value, and paid services can be offered around the tag library. A common example: after you browse certain products in an e-commerce app, you see recommendations for those products on an information feed platform. The era of big data is so smart that it is almost suffocating.
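Timeline-based tag changes can be pictured as a store that keeps every change instead of overwriting the latest value. The `TagLibrary` class below is a hypothetical in-memory sketch, not a description of any real tag library product.

```python
from collections import defaultdict
from datetime import date
from typing import Any, Dict, List, Optional, Tuple


class TagLibrary:
    """Keeps every (day, value) change per uid and tag, so history is not lost."""

    def __init__(self) -> None:
        self._history: Dict[Tuple[str, str], List[Tuple[date, Any]]] = defaultdict(list)

    def set_tag(self, uid: str, tag: str, value: Any, day: date) -> None:
        changes = self._history[(uid, tag)]
        changes.append((day, value))
        changes.sort(key=lambda change: change[0])   # keep the timeline ordered

    def value_at(self, uid: str, tag: str, day: date) -> Optional[Any]:
        """Return the most recent value on or before `day`, or None."""
        current = None
        for changed_on, value in self._history.get((uid, tag), []):
            if changed_on > day:
                break
            current = value
        return current


library = TagLibrary()
library.set_tag("uid-1", "member_level", 3, date(2024, 1, 1))
library.set_tag("uid-1", "member_level", 6, date(2024, 3, 1))
print(library.value_at("uid-1", "member_level", date(2024, 2, 1)))  # 3
print(library.value_at("uid-1", "member_level", date(2024, 3, 1)))  # 6
```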
Label business
After the data has gone full circle and been converted into labels, it naturally returns to the business level. Analyzing users through their label data enables precise marketing and intelligent recommendation: in e-commerce applications this can increase transaction volume, and in information feeds it can better attract users.
Application layer
Develop the above businesses into services and integrate them into the existing application layer, continuously improving the quality of application services, attracting users, and serving them. Users in turn keep generating data at the application layer, which is passed on to the data collection service, finally forming a complete closed loop.
2. Data aggregation pool
- Based on ID-mapping technology, map each source identifier to a unique identifier [uid];
- Associate labels with the uid and put them into the computing pool;
- The tags carried by the same uid keep accumulating, like the growing snake in the game Snake;
- Continuously enrich the label content carried under the uid;
Enriching the labeling scenarios in this way generates greater data value.
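The aggregation steps above can be sketched as follows. The `ID_MAP` table, the identifier formats, and the tag names are all invented for illustration. A real ID-mapping service would be far more involved.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical ID-mapping table: channel-specific identifiers -> unified uid.
ID_MAP: Dict[str, str] = {
    "phone:13800000000": "uid-1",
    "device:abc123": "uid-1",
    "email:lee@example.com": "uid-2",
}


def aggregate(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Resolve each source id to its uid and accumulate tags under that uid."""
    pool: Dict[str, Set[str]] = defaultdict(set)
    for source_id, tag in records:
        uid = ID_MAP.get(source_id)
        if uid is None:
            continue   # unmapped identifiers are skipped in this sketch
        pool[uid].add(tag)
    return dict(pool)


records = [
    ("phone:13800000000", "online shopper"),
    ("device:abc123", "active user"),
    ("email:lee@example.com", "wealth management user"),
    ("device:unknown", "ignored"),
]
pool = aggregate(records)
print(sorted(pool["uid-1"]))  # ['active user', 'online shopper']
```

Tags collected through different channels end up under the same uid, so the tag set for that uid grows as more data arrives.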
Five, source code address
GitHub address
https://github.com/cicadasmile
GitEE address
https://gitee.com/cicadasmile