In the era of big data: the application of labeling system

【1】Necessity

Projects generally rely on a big data analysis platform together with user labels (user portraits). From the data query perspective, querying has gradually deepened: from traditional queries on associated key fields, to custom metadata queries, and finally to label-based queries. Labels divide the data at the finest granularity in advance, so the complex logic of multi-table joins and combined queries is handled once during label processing rather than at query time. As a result, that logic does not interfere with judgments on business data, and the performance cost of running complex logic on every query disappears.
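A minimal Python sketch of the difference, using made-up `customers` and `orders` lists and a hypothetical `high_value` label: the join-and-aggregate work runs once in a batch step, after which a query only filters on the precomputed label.

```python
from collections import defaultdict

# Made-up source data: customers and their orders (customer_id, amount).
customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Carol"}]
orders = [(1, 600.0), (1, 700.0), (2, 50.0), (3, 1200.0)]

def query_by_join(threshold: float) -> list[int]:
    """Ad hoc query: associate customers with orders and aggregate at query time."""
    totals = defaultdict(float)
    for customer_id, amount in orders:
        totals[customer_id] += amount
    return [c["id"] for c in customers if totals[c["id"]] >= threshold]

def build_labels(threshold: float) -> dict[int, set[str]]:
    """Batch step: precompute the hypothetical 'high_value' label once."""
    labels: dict[int, set[str]] = {c["id"]: set() for c in customers}
    for customer_id in query_by_join(threshold):
        labels[customer_id].add("high_value")
    return labels

LABELS = build_labels(1000.0)

def query_by_label(label: str) -> list[int]:
    """Label query: a plain filter over precomputed labels, no join needed."""
    return [cid for cid, tags in LABELS.items() if label in tags]

print(query_by_label("high_value"))  # [1, 3]
```

In a real platform the batch step would run on the big data platform and the labels would sit in the label store; the point here is only that query time no longer pays for the join.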

【2】Two typical label management approaches

Raw labels: managed directly in the database as database tables (suits small businesses).

Productized labels: terminal-oriented, with little impact on the business, fast response and a good interface experience (suits large businesses).

【3】Label classification methods

  1. By source
  2. By business scenario (layered by business scenario and complexity)
  3. By data type (numeric/single-value/multi-value/text)
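The three classification axes can be recorded on each label definition. The sketch below is one possible shape, assuming hypothetical field names (`source`, `scenario`, `value_type`, `allowed`); it also shows how the declared data type can drive value validation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any

class ValueType(Enum):
    NUMERIC = "numeric"      # e.g. spend in the last 90 days
    SINGLE_VALUE = "single"  # exactly one value from a fixed set
    MULTI_VALUE = "multi"    # several values from a fixed set
    TEXT = "text"            # free text

@dataclass(frozen=True)
class LabelDefinition:
    name: str
    source: str              # axis 1: where the data comes from (hypothetical field)
    scenario: str            # axis 2: business scenario it serves (hypothetical field)
    value_type: ValueType    # axis 3: data type
    allowed: frozenset = frozenset()  # permitted values for single/multi-value labels

    def validate(self, value: Any) -> bool:
        """Check that a value matches the label's declared data type."""
        if self.value_type is ValueType.NUMERIC:
            return isinstance(value, (int, float)) and not isinstance(value, bool)
        if self.value_type is ValueType.SINGLE_VALUE:
            return value in self.allowed
        if self.value_type is ValueType.MULTI_VALUE:
            return isinstance(value, (set, frozenset)) and value <= self.allowed
        return isinstance(value, str)

level = LabelDefinition("customer_level", "core_banking", "marketing",
                        ValueType.SINGLE_VALUE, frozenset({"gold", "silver"}))
print(level.validate("gold"), level.validate("bronze"))  # True False
```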

Classification of data labels:

(1) Property label

(2) Statistical label

(3) Algorithm label
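As an illustration of where each kind gets its value, here is a sketch with made-up customer data. The algorithm label uses a hand-written logistic function with invented weights as a stand-in for a trained model; it is not a real scoring model.

```python
import math
from statistics import mean

# Made-up records for one customer.
customer = {"id": 42, "gender": "F"}
transactions = [120.0, 80.0, 300.0, 40.0]

def property_label(c: dict) -> str:
    """Property label: copied straight from a stored attribute."""
    return c["gender"]

def statistical_label(txns: list[float]) -> float:
    """Statistical label: aggregated from behavioural records (average spend)."""
    return mean(txns)

def algorithm_label(avg_spend: float, txn_count: int) -> float:
    """Algorithm label: a model output. The logistic weights are invented
    stand-ins for a trained model."""
    z = -3.0 + 0.01 * avg_spend + 0.2 * txn_count
    return 1.0 / (1.0 + math.exp(-z))

avg = statistical_label(transactions)
print(property_label(customer), avg)  # F 135.0
print(round(algorithm_label(avg, len(transactions)), 3))
```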

Huawei's classification of data labels:

  1. Fact label
  2. Rule label
  3. Model label
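Rule labels are typically derived from fact labels by configurable business rules. A minimal sketch, assuming hypothetical fact label names and thresholds; a model label would replace the hand-written rule with a trained model's prediction.

```python
from typing import Callable

# Fact labels: values observed directly in the data (hypothetical names).
facts = {
    "c1": {"age": 34, "deposit_balance": 250_000, "app_logins_30d": 12},
    "c2": {"age": 67, "deposit_balance": 8_000, "app_logins_30d": 0},
}

# Rule labels: business rules evaluated on fact labels (hypothetical thresholds).
RULES: dict[str, Callable[[dict], bool]] = {
    "wealthy": lambda f: f["deposit_balance"] >= 100_000,
    "senior": lambda f: f["age"] >= 60,
    "dormant_app_user": lambda f: f["app_logins_30d"] == 0,
}

def apply_rules(fact_row: dict) -> set[str]:
    """Return the names of all rule labels whose rule holds for this object."""
    return {name for name, rule in RULES.items() if rule(fact_row)}

rule_labels = {cid: apply_rules(row) for cid, row in facts.items()}
print(rule_labels["c1"])  # {'wealthy'}
```

Keeping rules as data rather than code is one way to let the management side configure derived labels without redeploying the processing jobs.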

 

【4】Contents of label information

Label information mainly includes:

【Basic label information】

【Label data table information】

【Label processing information】

【Label quality information】

【Label application information】
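The five groups of information map naturally onto a metadata record. The field names and the coverage threshold below are assumptions chosen for illustration, not a standard schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class LabelMetadata:
    # Basic label information
    label_id: str
    name: str
    category: str                  # e.g. property / statistical / algorithm
    owner: str
    # Label data table information
    table: str
    column: str
    # Label processing information
    source_tables: list[str]
    update_cycle: str              # e.g. "daily"
    # Label quality information
    coverage_rate: float = 0.0     # share of label objects with a non-null value
    last_checked: Optional[date] = None
    # Label application information
    consumers: list[str] = field(default_factory=list)

    def quality_ok(self, min_coverage: float = 0.8) -> bool:
        """Hypothetical quality gate: enough objects must carry a value."""
        return self.coverage_rate >= min_coverage

meta = LabelMetadata(
    "L001", "high_value_customer", "statistical", "marketing-data",
    "label_wide", "high_value", ["dw.orders"], "daily",
    coverage_rate=0.93, consumers=["mobile_banking_api"],
)
print(meta.quality_ok())  # True
```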

【5】Overall architecture of label construction

The label construction project includes four stages: label processing, label loading, label management and label service.

Label processing: Many customers ask me whether label processing should be done on the big data platform or on the traditional data platform. In fact, it can be done on either. If your base data already lands in HDFS, doing it on the big data platform is recommended: the distributed architecture gives faster processing and batch jobs, and the platform also suits algorithmic mining and text mining for labels, which traditional data warehouse technology cannot support.
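Distributed platforms speed up label processing by aggregating each data partition independently and then merging the partial results. The single-process sketch below only mimics that partition/map/merge pattern on hypothetical event data; on a real cluster a framework such as Spark or MapReduce would run the map step on many nodes in parallel.

```python
from collections import Counter
from functools import reduce

# Hypothetical raw events (customer_id, event_type), already split into
# partitions the way HDFS blocks would be spread across nodes.
partitions = [
    [("c1", "login"), ("c2", "purchase"), ("c1", "purchase")],
    [("c1", "purchase"), ("c3", "login")],
    [("c2", "purchase"), ("c2", "purchase")],
]

def map_partition(events) -> Counter:
    """Map step: count purchases per customer inside one partition."""
    return Counter(cid for cid, kind in events if kind == "purchase")

def merge(a: Counter, b: Counter) -> Counter:
    """Reduce step: combine partial counts from two partitions."""
    return a + b

purchase_counts = reduce(merge, map(map_partition, partitions), Counter())

# Turn the aggregate into a label: frequent buyers have at least 2 purchases.
frequent_buyers = sorted(cid for cid, n in purchase_counts.items() if n >= 2)
print(frequent_buyers)  # ['c1', 'c2']
```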

 

Label loading: The label loading layer is the physical storage layer from which labels are served externally. Database selection is particularly important here and will be analyzed in a dedicated chapter later. In terms of data modeling, the wide table is the gold standard for label data models. Many customers ask me whether there is a limit to the number of fields in a wide table, and whether a table with too many fields can be split by topic and joined back together. All label scenarios require response times at the second level, and any join between database tables greatly reduces query efficiency, so these factors must be fully considered when selecting the database.
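A small illustration of why a wide table keeps label queries fast: with one row per customer and one column per label, combining several labels is a single-table filter with no join. SQLite and the column names are used here only to keep the example self-contained; they are not a recommendation for the production label store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Wide table: one row per customer, one column per label (hypothetical labels).
conn.execute("""
    CREATE TABLE label_wide (
        customer_id TEXT PRIMARY KEY,
        gender      TEXT,
        age_band    TEXT,
        spend_90d   REAL,
        is_wealthy  INTEGER
    )
""")
conn.executemany(
    "INSERT INTO label_wide VALUES (?, ?, ?, ?, ?)",
    [
        ("c1", "F", "30-39", 1300.0, 1),
        ("c2", "M", "60+", 50.0, 0),
        ("c3", "F", "30-39", 1200.0, 1),
    ],
)

# Selecting an audience on three labels touches one table, no JOIN.
rows = conn.execute(
    "SELECT customer_id FROM label_wide "
    "WHERE gender = ? AND age_band = ? AND is_wealthy = 1 ORDER BY customer_id",
    ("F", "30-39"),
).fetchall()
print([r[0] for r in rows])  # ['c1', 'c3']
```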

 

Label management: Label management is the management side of the platform application. The management side serves internal users, while the service side serves customers. The management side covers label lifecycle functions such as data agency, label library management, label metadata management, label approval, label listing and delisting, label application effect evaluation, derived label configuration, customer group extraction and customer group insight.
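Approval and listing/delisting form a simple lifecycle. A minimal state-machine sketch, assuming a hypothetical workflow (draft, pending approval, published, retired); real platforms may define more states or different transitions.

```python
from enum import Enum

class LabelState(Enum):
    DRAFT = "draft"
    PENDING_APPROVAL = "pending_approval"
    PUBLISHED = "published"  # listed: available to the service side
    RETIRED = "retired"      # delisted: no longer served

# Allowed transitions in this assumed workflow.
TRANSITIONS = {
    LabelState.DRAFT: {LabelState.PENDING_APPROVAL},
    LabelState.PENDING_APPROVAL: {LabelState.PUBLISHED, LabelState.DRAFT},
    LabelState.PUBLISHED: {LabelState.RETIRED},
    LabelState.RETIRED: {LabelState.PUBLISHED},
}

class ManagedLabel:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = LabelState.DRAFT
        self.history = [LabelState.DRAFT]

    def move_to(self, target: LabelState) -> None:
        """Apply a lifecycle transition, rejecting ones the workflow forbids."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: cannot go from {self.state.value} to {target.value}")
        self.state = target
        self.history.append(target)

label = ManagedLabel("high_value_customer")
label.move_to(LabelState.PENDING_APPROVAL)  # submitted for approval
label.move_to(LabelState.PUBLISHED)         # approved and listed
label.move_to(LabelState.RETIRED)           # delisted
print(label.state.value)  # retired
```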

 

Label service: The label service is the server side of the platform application. The management side and the service side should be split into separate microservices to decouple them, and deploying them separately is recommended so that calls from different channels use isolated resources. For example, when internal customer-group screening generates too much load, the label API calls from mobile banking must not be affected, so the services must be isolated. The server-side design should fully account for concurrent load at the gateway level and use distributed deployment, so that the concurrency bottleneck does not end up in the Java processes.
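The isolation idea can be shown inside a single process with the bulkhead pattern: each calling channel gets its own bounded worker pool, so a flood of slow screening jobs cannot starve the mobile banking lookups. The channel names, pool sizes and sleep times are hypothetical; the separate deployments recommended above give the same isolation at the process and host level.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One bounded pool per calling channel (bulkhead pattern).
# Channel names and pool sizes are hypothetical.
POOLS = {
    "mobile_banking_api": ThreadPoolExecutor(max_workers=4, thread_name_prefix="mobile"),
    "internal_screening": ThreadPoolExecutor(max_workers=2, thread_name_prefix="screening"),
}

def lookup_labels(customer_id: str) -> dict:
    """Stand-in for a fast point lookup in the label store."""
    return {"customer_id": customer_id, "is_wealthy": 1}

def screen_audience(seconds: float) -> str:
    """Stand-in for a slow customer-group extraction job."""
    time.sleep(seconds)
    return "done"

def submit(channel: str, fn, *args):
    """Route a call to the pool reserved for its channel."""
    return POOLS[channel].submit(fn, *args)

# Flood the screening pool: 6 jobs of 0.3 s on 2 workers keep it busy for about 0.9 s.
slow_jobs = [submit("internal_screening", screen_audience, 0.3) for _ in range(6)]

# A mobile lookup still runs immediately on its own pool.
start = time.perf_counter()
result = submit("mobile_banking_api", lookup_labels, "c1").result(timeout=1)
elapsed = time.perf_counter() - start
print(result["is_wealthy"], f"{elapsed:.3f}s")

for pool in POOLS.values():
    pool.shutdown(wait=True)  # wait for the screening jobs to finish
```

Had both channels shared one two-worker pool, the mobile lookup would have queued behind the screening jobs.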

【6】Label classification, design and implementation methods

In the "data center" planning system, data labels sit between the data warehouse and the data marts, preparing data for the data marts.

【7】Design of data labels

Design steps:

1. Determine the label object

2. Link up the relationships between objects

3. Design the label categories

4. Implement the data labels (label fusion table)
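Step 2 often involves linking the different identifiers of the same object (account number, phone number, card number and so on) to a single label object, a task commonly called ID mapping. Reading the step this way is an assumption; the sketch below uses a union-find structure over hypothetical identifier pairs.

```python
from collections import defaultdict

# Hypothetical identifier pairs seen in source systems; each pair means
# "these two identifiers belong to the same customer".
links = [
    ("acct:1001", "phone:555-0101"),
    ("phone:555-0101", "cust:C01"),
    ("card:6222-0001", "acct:1001"),
    ("acct:2002", "cust:C02"),
]

parent: dict[str, str] = {}

def find(x: str) -> str:
    """Return the representative of x's group (with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a: str, b: str) -> None:
    """Merge the groups containing a and b."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[rb] = ra

for a, b in links:
    union(a, b)

# Collect each group, then map every identifier to the group's customer ID,
# which serves as the label object.
groups: dict[str, set] = defaultdict(set)
for node in list(parent):
    groups[find(node)].add(node)

id_to_customer = {}
for members in groups.values():
    customer_ids = sorted(m for m in members if m.startswith("cust:"))
    if customer_ids:
        for m in members:
            id_to_customer[m] = customer_ids[0]

print(id_to_customer["card:6222-0001"])  # cust:C01
```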

A label fusion table takes one of two forms: a vertical fusion table or a horizontal fusion table.
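The two forms hold the same data. A vertical fusion table stores one row per (object, label, value) and can take new labels without schema changes, while a horizontal fusion table stores one wide row per object, which suits fast multi-label filtering. A minimal sketch of converting between them, with hypothetical label names:

```python
# Vertical fusion table: one row per (object, label, value).
vertical = [
    ("c1", "gender", "F"),
    ("c1", "age_band", "30-39"),
    ("c2", "gender", "M"),
    ("c2", "is_wealthy", "0"),
]

def to_horizontal(rows):
    """Pivot vertical rows into one wide record per object; missing labels become None."""
    label_names = sorted({label for _, label, _ in rows})
    wide = {}
    for obj, label, value in rows:
        record = wide.setdefault(obj, dict.fromkeys(label_names))
        record[label] = value
    return label_names, wide

def to_vertical(wide):
    """Unpivot wide records back into vertical rows, skipping empty cells."""
    return [
        (obj, label, value)
        for obj, record in sorted(wide.items())
        for label, value in record.items()
        if value is not None
    ]

labels, wide = to_horizontal(vertical)
print(labels)      # ['age_band', 'gender', 'is_wealthy']
print(wide["c2"])  # {'age_band': None, 'gender': 'M', 'is_wealthy': '0'}
```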

 

Source: blog.csdn.net/weixin_29403917/article/details/127982972