The Iterative Road of Autohome's (Qichezhijia's) Recommendation-System Ranking Algorithms

About the author:

Lichen Xu, senior algorithm engineer at Autohome. He joined Autohome in 2014 and has worked on search intent analysis, text mining, and recommendation ranking. On Autohome's platform of massive resources and hundreds of millions of users, he actively experiments with and deploys the latest algorithm research, innovates on top of it, and has achieved solid results.

Team introduction:

The Autohome recommendation team is responsible for personalized recommendation on the homepage and other scenarios and for improving their effectiveness. By building an efficient, general-purpose recommendation platform, the team supports internal recommendation scenarios and has begun exporting its technology externally, improving the experience of millions of users. An industry-leading large-scale data platform, GPU cluster, and machine-learning platform provide the team's foundation. The team also follows up on the latest research results, encourages bold innovation in practice, and has produced effective, leading models and architectures.

 

Overview: The Autohome recommendation system tracks cutting-edge technology while supporting multiple internal recommendation scenarios, with a degree of external output as well. We hope that the Autohome recommendation system will be not only an applier of cutting-edge technology, but even more a promoter and innovator. This article shares the iterative road of ranking algorithms in the Autohome recommendation system, covering:

  • The Autohome recommendation system

  • Ranking models

  • Features and training samples

  • Future optimization directions

01. The Autohome recommendation system

1. Overview

Autohome's recommendation system has been online for nearly five years, mainly recommending personalized automotive resources to homepage users. Its launch was an important milestone in transforming content distribution in the Autohome APP from category-based to personalized. The recommended resources include articles, videos, and images produced by professional editors, automotive influencers, and forum users, as well as items such as car models, totaling on the order of hundreds of millions.

The goal of recommendation is to show each user the content they like, which can be decomposed into three sub-goals:

  • First, understanding the user

  • Second, characterizing the resources

  • Third, optimally matching users and resources

Decomposing these three goals further: understanding the user includes representing user attributes and collected behavior; characterizing resources includes their own attributes and externally assigned features; and matching users to resources via rules or models is the core work of a recommendation system. The pursuit of ever-better matching is what drives a recommendation system's evolution.

Matching can be split into two stages, recall and ranking: recall finds as many resources the user may like as possible, and ranking orders the recalled resources by preference. Both recall and ranking can be subdivided further.

2. Architecture

The recommendation system must quickly find resources of potential interest to a user in a massive resource library. There are four main steps:

First, gather the resources;

Second, find the resources the user is interested in among all resources;

Third, order the resources of interest by the user's preferences;

Fourth, output the top n resources the user is most interested in.

The recommendation-system architecture is designed around these four steps and generally consists of the following modules.

Among them:

  • Resource pool: mass storage of all types of resources, typically in databases such as MySQL, Hive, or Redis.

  • Label generation: labels are a structured, multi-dimensional characterization of a resource, e.g. category tags, keywords, and quality grades.

  • Index: builds keyword inverted indexes and, similar to a search engine, vector indexes over the resources' labels.

  • Filter: filters out the user's exposure history and negative feedback.

  • Recall: uses user-profile tags or user embeddings to query the index and vector store for relevant candidate resources.

  • User profile: tags the user based on user attributes and historical behavior.

  • Ranking: scores each recalled candidate with a predictive model.

  • Features/models: the user features, resource features, and models that ranking depends on.

  • Operations: business strategies, including boosting and demoting, exposure-share control, and diversification rules.

  • Output: returns the top-N resources, after ranking and operational adjustments, to the client.

02. Ranking models

1.  Model introduction

Autohome's homepage ranking model has mainly evolved through LR, XGBoost, FM, DeepFM, and DeepFM with online learning; along the way we also experimented with models such as Wide & Deep, DCN, LSTM, and GRU.

LR was the most successful early model in the CTR-prediction field; most early industrial ranking systems adopted this "manual feature combination to introduce non-linearity + linear model" approach. LR trains fast, deploys fast, is highly interpretable, and scales easily, and many practical systems still use it. LR was also the initial ranking model of the Autohome homepage recommendation system and the baseline against which later models were validated.

The XGBoost model used the same features as LR and achieved fairly significant gains after launch; adding real-time user and item features further improved the experiment group's CTR, roughly a 6% relative lift. After three more weeks of comparison and observation, XGBoost's improvement was verified, and it fully replaced LR as the online and baseline model. We later also tried XGBoost + LR, but it showed no significant improvement over XGBoost alone.

FM achieves second-order feature combination simply and elegantly. The most obvious way to add second-order combinations on top of LR is to cross any two features, treat each cross as a new feature, and add it to the LR model, learning the weight of each combination during training. But this explicit kind of feature combination generalizes poorly, especially in scenes with massive sparse features. The FM model likewise introduces the second-order combination of any two features, but instead of a scalar weight per combination, it learns a k-dimensional vector for each feature; the combination of features Xi and Xj is represented by the inner product <Vi, Vj> of their vectors Vi and Vj. This is essentially a feature embedding, the same core idea behind the entity embeddings common today. Feature combination is very important for ranking; DNN models cannot do without it either, but an MLP is an inefficient structure for capturing feature crosses, which is why deep ranking models mostly contain an FM-like feature-combination component. In our FM practice the main change was increasing the training samples from tens of millions to 300 million; online CTR improved about 2% relative to XGBoost. Because DeepFM went online soon afterward, FM was never fully rolled out.
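The pairwise term described above does not have to be summed over all feature pairs explicitly: an algebraic identity reduces it to O(n·k). A minimal scoring sketch in plain Python (illustrative only, not Autohome's production model):

```python
# Each feature i has a linear weight w[i] and a k-dim vector v[i].
# The pairwise term  sum_{i<j} <v_i, v_j> x_i x_j  is computed in O(n*k)
# via  0.5 * sum_f [ (sum_i v[i][f] x_i)^2 - sum_i (v[i][f] x_i)^2 ].

def fm_score(x, w0, w, v):
    """x: feature values; w0: global bias; w: linear weights;
    v: one k-dimensional vector per feature."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(v[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise
```

The O(n·k) form gives the same score as brute-force enumeration of all feature pairs, which is what makes FM tractable on massive sparse features.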

Wide & Deep, introduced by Google in 2016, was the earliest deep model to achieve major success in recommendation. It consists of a Wide part and a Deep part. The Wide part is an LR whose inputs are one-hot discrete features and frequency-bucketized continuous features; it memorizes the more significant correlations between features and the target that appear in the samples. The Deep part is an MLP whose inputs are embeddings of discrete features plus normalized continuous features; it generalizes to potential correlations between feature combinations and the target not seen in the samples. Another advantage of Wide & Deep is that the Wide part can inherit earlier shallow-learning results, especially the feature-engineering work. Online, Wide & Deep lifted CTR by 3% relative, 0.5% lower than DeepFM over the same period, so DeepFM became the main online model.

DeepFM replaces the LR in Wide & Deep's Wide part with FM, avoiding manual feature engineering, and captures low-order feature information better than Wide & Deep. Whereas Wide & Deep needs a separately designed embedding layer for the Deep part, in DeepFM the FM and Deep parts share the embedding layer: the parameters FM learns serve both as the Wide-part output and as the MLP input. DeepFM supports end-to-end training, learning embeddings and network weights jointly without pre-training. After launch, DeepFM lifted CTR by 3.49% relative, slightly better than Wide & Deep over the same period; once the effect was verified, it was fully rolled out as the online and baseline model. The gain came with increased online prediction latency, so to stay below the latency ceiling without significantly hurting model loss, we optimized the Deep part's parameters, reducing the embedding dimension and the number of neurons and hidden layers. After deploying the slimmed-down model, prediction latency dropped while CTR showed no significant fluctuation and remained significantly higher, indicating the strong generalization of deep learning: even with a reduced network structure, the model still fits the samples well.

DCN resembles Wide & Deep and DeepFM but upgrades the Wide part to a Cross network. The Cross network explicitly crosses higher-order features, and by fitting the residual between layers it can dig deeper into the non-linear relationship between feature combinations and the target and reach a stable fit faster. With the same features, our DCN experiments were roughly flat against DeepFM on CTR. Based on feature-iteration experience and business characteristics, we tried optimizing DCN's structure. In the original DCN, the Embedding and Stacking layer embeds discrete features, concatenates them directly with continuous features, and feeds the result into the network. The shortcoming this causes: although the network can effectively learn explicit and implicit crosses between features, it lacks a way to mine the information inside each individual feature, pushing that burden back onto feature engineering and raising labor costs. We therefore tried having the ranking model expand every feature, adaptively learning to exploit each feature's information more fully. Between DCN's Embedding and Stacking layer and the network, we inserted an experimental feature-expansion layer that expands each feature from its original dimension to n dimensions; unlike embedding, which applies only to discrete features, the expansion also widens continuous features. The expanded features carry more of their own information into the network layers for higher-order computation; compared with using only the original features, the model can exploit existing features more deeply, saving feature-engineering labor.
In offline experiments with the same features, the feature-expanded DCN improved online CTR by about 1% relative to the original DCN.
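The explicit crossing and residual fitting described above happen one layer at a time. As a rough illustration, a single DCN cross layer can be sketched in plain Python (function name and vector representation are illustrative, not the production implementation):

```python
def cross_layer(x0, xl, w, b):
    """One DCN cross layer: x_{l+1} = x0 * (xl . w) + b + xl.
    x0 is the cross network's input vector and xl the previous layer's
    output; the residual term (+ xl) lets each layer fit a correction
    on top of the last. Illustrative sketch only."""
    s = sum(a * c for a, c in zip(xl, w))                  # scalar xl^T w
    return [x0i * s + bi + xli for x0i, bi, xli in zip(x0, b, xl)]
```

Stacking L such layers yields explicit feature crosses up to degree L+1 while each layer adds only O(d) parameters (one weight vector and one bias vector).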

Alongside the model iterations above, we also experimented with LSTM, GRU, and other models. LSTM and GRU are sequence-based next-click models with a very simple structure: for a session's click sequence X = [X1, X2, X3, ..., Xn-1, Xn], feed X1, X2, ..., Xn-1 into the model in turn to predict which item is clicked next. Each item Xt in the sequence is first converted to a one-hot vector, then to its corresponding embedding, passed through N hidden units, and finally through a fully connected layer to obtain the click probability of each item. These models have not yet beaten DeepFM significantly on CTR, so they remain in the experimental stage and have not become the main online model.

When practicing DeepFM and other deep models, it is easy to make the mistake of increasing model complexity by brute force without considering the feature composition and sample size, causing training time to surge, model files to balloon, prediction latency to grow, and ultimately the recommendation service to time out. With a small feature set, solid feature engineering, and reasonable sample selection, even a simple deep-learning model can achieve good results.

2.  Online learning

Online learning uses user feedback collected in real time to update model parameters immediately, so predictions react in real time to changes in user behavior. Relative to offline learning, online learning can be understood as an infinite data set over an infinite time series, updating the model sample by sample from a data stream. We implemented online learning as an optimization of model updating after DeepFM went online: the update cycle went from weekly to daily, and then, to learn real-time user behavior faster, to minute-level.

The work here has two parts: obtaining labels and features in real time, and updating the model in real time.

Real-time labels and features are obtained by joining the server-side feature dump with the client-side labels (impressions and clicks) on a unique id generated for each request. The key point is that labels must be joined with the features as of the request time: if the feature data is updated after the label event, feature-leakage problems arise.
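The request-time join described above can be sketched roughly as follows; the `pvid` field name follows the text, but the data shapes and helper function are hypothetical:

```python
# Hypothetical sketch of the request-time join. Field names (pvid etc.)
# follow the article; shapes and the helper are illustrative.

def join_samples(feature_dump, event_log):
    """feature_dump: {pvid: features snapshotted when the request was served};
    event_log: iterable of (pvid, item_id, clicked) from the client.
    Joining on the request-time snapshot ensures features updated after
    the impression cannot leak into the label."""
    samples = []
    for pvid, item_id, clicked in event_log:
        features = feature_dump.get(pvid)
        if features is None:      # no snapshot for this request: drop it
            continue
        samples.append({"pvid": pvid, "item": item_id,
                        "label": int(clicked), "features": features})
    return samples
```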

For real-time model updating, the real-time samples are accumulated into batches and used for iterative updates, and the updated model is pushed online every 10 minutes. While still capturing real-time user behavior effectively, the 10-minute cadence reduces engineering difficulty as well as sample jitter, and sample imbalance can be handled with a sampling strategy.
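Assuming for illustration a simple logistic-regression scorer (the production model is DeepFM), one mini-batch update step of the kind accumulated between 10-minute pushes might look like:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_batch_update(w, batch, lr=0.1):
    """One mini-batch gradient step for a logistic-regression scorer:
    average the log-loss gradient over the batch, then step the weights.
    batch: list of (feature_vector, label) pairs with label in {0, 1}."""
    grad = [0.0] * len(w)
    for x, y in batch:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i, xi in enumerate(x):
            grad[i] += (p - y) * xi
    n = len(batch)
    return [wi - lr * gi / n for wi, gi in zip(w, grad)]
```

In the online-learning loop, each joined real-time sample is appended to the current batch, and whenever the batch fills (or the 10-minute timer fires) the step above runs and the refreshed weights are pushed to serving.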

3.  Ranking service

The ranking service is provided as an API, where:

Input:

Deviceid: uniquely identifies the user; internal services use this id to fetch the user's attributes and behavioral features.

Itemid: the ids of the candidate resources to be ranked for the user; internal services fetch these resources' attributes, popularity, and labels.

Pvid: uniquely identifies the request; used to join client and server logs.

Model-name: specifies which model to use; the ranking service offers several models to choose from.

Model-version: used together with Model-name to specify which version of the model to use; mainly intended for iterative model optimization.

Debug: used to output intermediate results of the ranking process.

Output: each Itemid with its score.
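For illustration only, here is roughly what a request and a stubbed response to such a ranking API might look like in Python; all field values and the `mock_rank` stub are made up, and only the parameter names come from the list above:

```python
# Illustrative payloads for the ranking API described above; values are
# invented and mock_rank is a stub standing in for the real model server.

request = {
    "deviceid": "d-8f3a21",          # unique user id
    "itemids": [1001, 1002, 1003],   # candidate resource ids to rank
    "pvid": "pv-000123",             # per-request id for log joining
    "model-name": "deepfm",
    "model-version": "v3",
    "debug": False,
}

def mock_rank(req):
    """Stub: score each candidate (here, simply by list position) and
    return (itemid, score) pairs best-first, matching the API output."""
    scored = [(item, 1.0 / (i + 1)) for i, item in enumerate(req["itemids"])]
    return sorted(scored, key=lambda pair: -pair[1])
```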

The ranking service hosts multiple models with different update strategies, and the update cycle is configurable. The service also depends on the feature service to fetch user and resource features, with different feature-engineering pipelines for different models.

4.  Model updates

Small-traffic experimental ranking models, once validated offline, can be hot-updated online directly on a schedule. For the fully rolled-out online ranking model, besides offline validation, a pre-launch experiment is run before the full push; only after confirming that the pre-launch experiment group's CTR and other metrics show no problems is the model fully updated.

5.  AB experiments

In machine learning, AB experiments are the primary means of final validation of a model's results. The main method is user bucketing: users are split into an experiment group served by the new model and a control group served by the old model. When bucketing, pay attention to sample independence and unbiased sampling: the same user must always be assigned to the same bucket, and DeviceIds must be selected completely at random during bucketing so that the bucket samples are unbiased. The experiment and control groups must be drawn under the same random-selection constraints on DeviceId. In the figure, divisions a, b, and c are incorrect: in a, the experiment group exceeds the constrained user population; in b, the experiment group is correct but taking all remaining users as the control group is wrong; in c, the experiment group is correct but the control group has been wrongly expanded; only d is correct.

Online AB experiments have three main parts: experiment bucket configuration, the code logic corresponding to each bucket, and per-bucket performance data. Traffic is orthogonal between experiments, and the multiple buckets within a single experiment are mutually exclusive. For example, for a ranking-model experiment we split off three buckets of 2%, 5%, and 10% of traffic as controls; the model launch starts at 2% and gradually expands, with the effect verified by comparison at each step.
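Consistent, unbiased assignment of a DeviceId to a bucket is commonly done by hashing; a minimal sketch (the hash function and slot layout are illustrative assumptions, not the actual experiment platform):

```python
import hashlib

def ab_bucket(device_id, buckets=100):
    """Deterministically map a DeviceId to one of `buckets` slots, so the
    same user always lands in the same bucket while the split stays
    effectively random and unbiased across users. Illustrative only."""
    digest = hashlib.md5(device_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets

def assign_group(device_id, treatment_slots):
    """treatment_slots: set of bucket indices serving the new model."""
    return "treatment" if ab_bucket(device_id) in treatment_slots else "control"
```

Because assignment depends only on the id, re-requests by the same user stay in the same group, and growing the rollout (say from 2% to 5%) just means widening `treatment_slots`.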

6.  Model Training

Ranking-model training is mainly based on Autohome's machine-learning platform AutoPAI. AutoPAI supports visual drag-and-drop modeling with hundreds of machine-learning algorithm components, supports multiple deep-learning frameworks, integrates with Hadoop and Spark, supports distributed GPU training of deep models, and provides online development and automated deployment capabilities.

For simple models such as LR and XGBoost, we can directly drag data-source, feature-processing, model, and validation components onto the interface to train and save a model, then deploy it as an online service with one click. For deep models, the platform supports online development and debugging; once debugged, the code is submitted through the deep-learning component, the number of GPU cards for training is selected, and after training completes, the service can be deployed with one click.

7.  Visual debugging

To verify the effect of a model or strategy on a single request, validation is generally done online via a whitelist, but that still requires shipping the code or configuration: going online carries risk, and a release usually bundles multiple experimental features together, so even a whitelisted experiment may be affected by other experiments. Borrowing the idea of debugging a program before its official release, we built a recommendation Debug system in which an experiment's effect and its intermediate stages can be verified before going online.

Our recommendation Debug platform verifies effects from two main inputs: the recommendation interface and the experiment configuration. By changing the parameters of these two parts, you can simulate the real online rendering of results and output the intermediate results of every stage of the request. The Debug platform also supports directly validating sub-modules such as indexing, recall, and ranking, and can query resource features, user profiles, and user click and exposure behavior. The platform has greatly improved our launch efficiency.

03. Features and training samples

1. Feature introduction

Model inputs generally include: user-profile features, item features, context features, cross features, item position features, and sequence features. Among them:

User-profile features include: the user's own attributes, such as gender, age, occupation, and region; user behavior, such as dwell time, clicks, searches, posts, favorites, and likes over different time windows; interest preferences derived from behavior, such as car-model and tag preferences; and behavior-derived statistics, such as the user's CTR and activity level.

Item features include: the item's own attributes, such as title, text length, and number of images; mined features, such as content category, keywords, sentiment, professionalism, content richness, and influence; and features given by user behavior, such as UV, PV, CTR, favorites, likes, and replies.

Cross features include: the match between user tags and item tags.

2. Feature processing

Raw features are not easily fit directly by the model, so they need further processing before being fed in, including outlier handling, normalization, and frequency bucketing.

Outlier handling:

Training samples generally contain outlier feature values. A discrete feature can be given its own one-hot position, so an anomalous value does not zero everything out; but a continuous feature is usually assigned a default of 0, which simply excludes it from the computation, or the feature's mean, which may not match the feature's physical meaning. To obtain a reasonable default, in the ranking model's feature-processing stage we give each continuous feature a weight and a non-zero bias, computing it as weight × feature_value + bias, with weight and bias learned by the model; when an outlier occurs, the transformed feature value equals the bias alone. In offline experiments the test-set loss dropped significantly, and online CTR was also better than using a default of 0 or the mean.
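The learned-default scheme above amounts to a per-feature affine transform that falls back to its bias for anomalous values. Schematically (names are illustrative, and `weight` and `bias` are assumed to come from model training):

```python
def transform_continuous(value, weight, bias, is_valid):
    """Per-feature affine transform with a learned default: a valid value
    passes through weight * value + bias, while an anomalous value falls
    back to the learned bias instead of a hard-coded 0 or the mean."""
    return weight * value + bias if is_valid else bias
```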

Normalization:

Continuous feature values are generally unevenly distributed; for example, exposure counts are greater than 1 and can reach several million, while CTR lies in the interval 0 to 1. Fed directly into the ranking model, such unevenly distributed statistical features cause training fluctuations, affecting convergence speed and possibly preventing the model from fitting at all. Common normalization methods include min-max, log, and standardization; observing the offline test-set loss, min-max worked best for us.
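A minimal sketch of two of these transforms (the clamping at the boundaries is our assumption; production boundary handling may differ):

```python
import math

def min_max(x, lo, hi):
    """Min-max scale x into [0, 1]; out-of-range values are clamped."""
    if hi == lo:
        return 0.0
    return (min(max(x, lo), hi) - lo) / (hi - lo)

def log_scale(x):
    """log(1 + x): compresses long-tailed counts such as exposures."""
    return math.log1p(max(x, 0.0))
```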

Frequency bucketing:

For continuous features, anomalous value shifts online can cause poor generalization or insufficient robustness. We therefore introduced frequency bucketing: based on a feature's sample frequency distribution, a set of boundary values is predetermined for each feature, raw feature values are mapped into buckets, and the bucket index is one-hot encoded by bucket count. Since the NN part of the deep model caps long-tailed continuous features at a maximum anyway, we experimented with discretized features in the NN part, found them to work better, and adopted them.
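Equal-frequency bucketing as described, with boundaries learned from a sample and out-of-range values clamped into the edge buckets, can be sketched as (helper names are illustrative):

```python
def freq_boundaries(values, n_buckets):
    """Equal-frequency bucket boundaries from a training sample."""
    s = sorted(values)
    return [s[len(s) * i // n_buckets] for i in range(1, n_buckets)]

def bucketize(x, boundaries):
    """Index of the bucket x falls into; values beyond the learned range
    clamp to the first/last bucket, which bounds the effect of outliers."""
    b = 0
    for bound in boundaries:
        if x >= bound:
            b += 1
    return b

def one_hot(idx, n):
    v = [0] * n
    v[idx] = 1
    return v
```

Because an extreme online value can at worst land in the last bucket, the encoded feature stays in a fixed, bounded range regardless of how the raw distribution drifts.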

3.  Feature expression

Our ranking model not only introduces large-scale sparse features but also various forms of vector expression.

For example: item content-category embeddings based on BERT, item embeddings based on images and video, behavior-based graph embeddings, LSTM embeddings, and so on.

4. Feature production

User and resource features are produced both offline and in real time: the offline feature store holds the last three months of user behavior and resources, while real-time features are updated at second-level latency. The feature-production architecture is as follows:

5.  Feature service

The feature service supports the ranking service, mainly outputting the offline and real-time features of users and resources.

6.  Training sample generation

On each user request, the ranking service calls the feature service to obtain real-time and historical features, feeds them into the model, and also dumps them to a queue, where they are joined with the client's exposure and click samples to generate real-time training samples for model updating, following the process described above.

04. Future optimization directions

Model objectives: subsequent optimization will not be limited to CTR but will optimize multiple targets simultaneously, combining clicks, interactions, dwell time, and so on; this is the trend in objective optimization. Multi-objective models can model and optimize each objective independently and then fuse them, or share parameters through a multi-objective network.

Model expression: upgrading the network structure, for example using Transformers for better feature extraction, automated feature engineering, AutoML for automatically designing better networks, and reinforcement learning, which fits the recommendation scenario well.

Feature expansion and information fusion: more accurate embedding expression of users' short- and long-term interests, and multimodal fusion of text, images, video, interaction behavior, and so on.

The Autohome recommendation system tracks cutting-edge technology while supporting multiple internal recommendation scenarios, with a degree of external output as well. We hope it will be not only an applier of cutting-edge technology, but even more a promoter and innovator.


Origin blog.csdn.net/sinat_26811377/article/details/104559739