iQIYI Recommendation System

Original article (Chinese): https://blog.csdn.net/jxq0816/article/details/102993212

Overview

In the mobile Internet era, alongside rich professional content, UGC content has grown explosively: every user is both a consumer of content and a content creator. This vast amount of content meets our needs, but it also makes it harder to find what we actually want. Personalized recommendation emerged to address exactly this situation.

Personalized recommendation builds on big data analysis and artificial intelligence: by studying users' interests and preferences, it computes personalized results and delivers high-quality content to each user, alleviating information overload and better meeting user needs.

I. Introduction to the iQIYI Recommendation System

Our recommendation system has two phases: a recall stage and a ranking stage.

The recall stage uses the user's interests and behavior history to pick a small candidate set (hundreds to thousands of videos) out of a library of tens of millions. These candidates are content the user is interested in; the ranking stage then performs more precise computation on this basis, scoring each video exactly, and selects from the hundreds to thousands of candidates the small amount of highest-quality content the user is most interested in (a dozen or so videos).
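
A minimal sketch of this two-stage flow (the `recall_channels` and `ranker` objects and their methods are illustrative, not iQIYI's actual API):

```python
def recommend(user, recall_channels, ranker, k=20):
    """Two-stage recommendation: cheap multi-channel recall, then precise ranking."""
    # Recall: each channel picks hundreds of candidates out of tens of millions.
    candidates = set()
    for channel in recall_channels:  # e.g. collaborative filtering, topic model, SNS
        candidates.update(channel.recall(user))

    # Ranking: score every candidate precisely and keep only the top k.
    scored = [(video, ranker.score(user, video)) for video in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [video for video, _ in scored[:k]]
```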

The overall architecture of the recommendation system is shown in the figure; the functions of its modules are as follows:

1. User profile: multi-dimensional analysis of user attributes, covering the user's behavior history, interests, and content-preference tendencies; it is the cornerstone of personalization.

2. Item features: the attributes of the video content, its statistical characteristics, and the preferences of its audience; this all-around description and measurement is the basis of video content and quality analysis.

3. Recall algorithms: multiple recall channels and models, such as collaborative filtering, topic models, content-based recall, and SNS channels, which can select diverse preferred content from the video library.

4. Ranking model: scores and sorts the content recalled from the multiple channels and selects a small number of optimal results.

Beyond this, the recommendation system also takes into account the diversity, freshness, and serendipity of the results, among other dimensions, so as to better meet the needs of diverse users.

II. Recommendation Ranking System Architecture

In the recall stage, the content recalled from the multiple channels is not directly comparable, and because the data volume is too large, precise preference and quality evaluation is hard to perform there. The ranking stage therefore needs to score the recall results uniformly and precisely and sort them.

Whether a user is satisfied with a video is determined by factors along many dimensions; these factors matter to satisfaction to different degrees, and there are even multi-layer dependencies among them. Hand-written rules struggle to achieve good results and are unmaintainable, so machine learning is needed: a machine-learning model can synthesize this wide range of factors into the ranking.

The ranking system architecture is shown in the figure; it mainly comprises modules for user behavior collection, feature filling, training-sample screening, model training, online ranking prediction, and so on.

The main flow of the machine-learning process is fairly generic, and the architecture design does not require complex theory; what it does require is careful scrutiny of the details, the data flow, and the architecture logic.

The architecture draws on earlier experience and lessons and, on that basis, solves two problems of machine-learning architectures:

Training/prediction consistency

Differences between training and prediction have a great impact on the accuracy of a machine-learning model, especially when the features at model training time are inconsistent with those at online serving time. For example, users' feedback on recommendation results immediately affects their preference features, so by training time the state of the user features has already changed; if the model is trained on the user features as they are at that point, it will have a very large error.

Our solution is to snapshot the features at online serving time and then fill them into the samples collected from user behavior, which guarantees consistency between training and prediction features.
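
As a minimal sketch of this feature-snapshot join (record layouts and field names are hypothetical):

```python
import json

def log_serving_features(request_id, features, feature_log):
    """At serving time, persist the exact feature values the model scored with."""
    feature_log.write(json.dumps({"request_id": request_id,
                                  "features": features}) + "\n")

def build_training_samples(behavior_events, feature_snapshots):
    """Join collected user behavior (labels) with the logged serving features,
    so training sees the features exactly as they were at prediction time."""
    samples = []
    for event in behavior_events:  # clicks, watch time, ...
        snapshot = feature_snapshots.get(event["request_id"])
        if snapshot is not None:
            samples.append({"features": snapshot["features"],
                            "label": event["label"]})
    return samples
```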

Continuous iteration

Continuous online iteration is the norm for Internet products, so when designing the architecture, data preparation, model training, and online serving must all give good support to continuous iteration.

Our solution is to decouple the stages of data preparation and model training and to make the architecture policy-configurable, which makes model experiments very simple and lets multiple iterations be tested quickly in parallel.
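
For illustration only, a policy-configured set of parallel experiments might look like this (the keys and values are hypothetical, not iQIYI's configuration format):

```python
# Each entry is an independent iteration; because data preparation, training,
# and serving are decoupled, adding an experiment is just adding a config.
EXPERIMENTS = {
    "baseline_lr": {"model": "lr",      "features": ["dense"]},
    "gbdt_lr":     {"model": "gbdt+lr", "features": ["dense"]},
    "gbdt_fm":     {"model": "gbdt+fm", "features": ["dense", "sparse"]},
}
```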

III. Evolution of the Machine-Learned Ranking Algorithms

1. Ancient times

When we first put a machine-learned ranking model online, we chose the relatively simple Logistic Regression and put our focus on the architecture design, trying to ensure the architecture's correctness. Besides, the LR model is highly interpretable and easy to debug, and its feature weights can explain why content was recommended, helping us find the model's deficiencies.

Before training the model, we first had to settle the evaluation metrics and the optimization objective.

  • Evaluation metrics

Online evaluation metrics need to match long-term goals, for example usage duration and user activity level. In our experiments, the industry-popular CTR is not a good evaluation metric: it is biased toward short videos, clickbait titles, and vulgar content.

Offline evaluation metrics are customized to the business so as to match the online evaluation metrics; this lets ineffective strategies be eliminated in the offline stage, avoiding wasted online traffic.

  • Optimization objective

Machine learning solves for the optimum of the given optimization objective. If the objective deviates, the resulting model deviates as well, and since the model keeps learning in the deviated direction across iterations, the deviation grows more serious.

Our approach is to weight the samples and to add the sample weights into the loss function, with the goal of making the optimization objective as consistent as possible with the evaluation metrics, thereby keeping the model under control.
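
A minimal sketch of sample weighting on synthetic data (the rule "longer watch time gets a larger weight" is an illustrative assumption, not the actual business weighting):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                # dense features
y = rng.integers(0, 2, size=1000)              # click labels
watch_time = rng.exponential(60.0, size=1000)  # seconds watched per sample

# Illustrative weighting: heavily watched samples count more, pulling the
# optimization objective toward engagement rather than raw clicks.
weights = 1.0 + np.log1p(watch_time)

model = LogisticRegression()
model.fit(X, y, sample_weight=weights)  # weights enter the log-loss directly
```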

LR is a linear classification model and requires its input features to be mutually independent and linear. The dense features we use (tens to hundreds of dimensions) are often nonlinear and carry dependencies among themselves, so they have to be transformed first.

Feature transformation requires analyzing the distribution of the features and their relationship with the label, and then choosing an appropriate transformation method. We mainly used the following kinds: Polynomial Transformation, Logarithmic or Exponential Transformation, Interaction Transformation, Cumulative Distribution Function, and so on.
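
Minimal examples of these four transformation families on synthetic features (standard textbook forms, not the exact recipes used in production):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.exponential(5.0, size=1000)  # a skewed dense feature
x2 = rng.normal(0.0, 1.0, size=1000)  # another dense feature

poly = x1 ** 2                        # polynomial transformation
log_x1 = np.log1p(x1)                 # logarithmic transformation (tames skew)
inter = x1 * x2                       # interaction transformation
# cumulative distribution function: map each value to its empirical quantile
cdf_x1 = np.argsort(np.argsort(x1)) / (len(x1) - 1)
```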

Although the LR model is simple and highly interpretable, its weaknesses become obvious as features gradually increase:

① Features have to be transformed into linear form by hand, which consumes a great deal of effort, and the quality is not guaranteed.

② With pairwise feature interactions, the model's prediction complexity is O(N²). With 100-dimensional dense features, roughly 10,000 combined feature dimensions result; the complexity is high and adding features becomes difficult.

③ Interactions of three or more features are almost infeasible.

2. The Middle Ages

To solve the above problems with LR, we upgraded the model to Facebook's GBDT + LR model; its structure is shown in the figure.

GBDT is an ensemble model based on the Boosting idea, composed of multiple decision trees, with the following advantages:

① It places no requirements on the distribution of the input features.

② Based on entropy gain, it automatically performs feature transformation, feature combination, feature selection, and discretization, yielding high-dimensional combined features; this removes the manual transformation process and supports interactions among multiple features.

③ Prediction complexity is independent of the number of features.

Assume the number of features n = 160, the number of decision trees k = 50, and the tree depth d = 6. Comparing prediction complexity before and after the upgrade, the model complexity drops to 2.72% of the original.
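
One accounting that reproduces the 2.72% figure (our assumption about how the operations were counted: all pairwise interaction terms for LR, versus tree traversals plus a linear layer over the k leaf features for GBDT + LR):

```latex
\text{LR with pairwise interactions:}\quad n + \tbinom{n}{2} = 160 + \tfrac{160 \cdot 159}{2} = 12880
\text{GBDT + LR:}\quad k \cdot d + k = 50 \cdot 6 + 50 = 350
\frac{350}{12880} \approx 2.72\%
```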

Stacking GBDT with LR improves only slightly on GBDT alone; the bigger benefit is preventing GBDT from over-fitting. After upgrading to GBDT + LR, the online effect improved by about 5%, and because the manual feature-transformation step is no longer needed when iterating, adding new features also became easier.
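
A minimal sketch of the GBDT + LR stacking on synthetic data, following the recipe from reference [3] (scikit-learn is used here for brevity and is not necessarily the production stack):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 160))                   # n = 160 dense features
y = (X[:, 0] * X[:, 1] + X[:, 2] > 0).astype(int)  # synthetic nonlinear label

# Stage 1: GBDT learns feature transformation and combination automatically.
gbdt = GradientBoostingClassifier(n_estimators=50, max_depth=6)  # k = 50, d = 6
gbdt.fit(X, y)

# Each sample falls into one leaf per tree; one-hot leaf indices become the
# new (automatically combined, discretized) features.
leaves = gbdt.apply(X)[:, :, 0]                    # shape: (samples, trees)
leaf_features = OneHotEncoder().fit_transform(leaves)

# Stage 2: LR over the leaf indicators, which also curbs GBDT over-fitting.
lr = LogisticRegression(max_iter=1000)
lr.fit(leaf_features, y)
```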

3. Recent history

The input features of the GBDT + LR ranking model number a few hundred dimensions, all of them dense features.

Such features generalize well but memorize relatively poorly, so high-dimensional (a million dimensions and above) content features need to be added to strengthen memorization, including video ID, tags, topics, and so on.

GBDT does not support high-dimensional sparse features. If the high-dimensional features were added to the LR instead, they would on the one hand need to be combined by hand, and on the other the model's dimensionality and computational complexity would grow at the O(N²) level. We therefore designed the GBDT + FM model shown in the figure, using a Factorization Machines model in place of LR.

The Factorization Machines (FM) model, shown below, has the following advantages:

1. Model formula (the standard FM form from reference [5]):

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j$$

2. The first two terms form a linear model, equivalent in role to LR.

3. The third term consists of second-order cross terms, which combine features automatically.

4. By introducing the latent vectors $\mathbf{v}_i$, the computational complexity of training and prediction is reduced to O(N) (linear in the number of features, for a fixed latent dimension).

5. Sparse features are supported.

These advantages give GBDT + FM good support for sparse features. We use the GBDT leaf nodes together with the sparse features (content features) as the input to FM; the model structure is diagrammed below. After going online, the GBDT + FM model improved the metrics by 4% to 6% compared with GBDT + LR.

A typical FM model uses the user id as a user feature, but this makes the model's dimensionality grow rapidly, covers only the more popular users, and generalizes relatively poorly. Here we use the user's viewing history and interest tags in place of the user id, which reduces the feature dimensionality, and because user interests can be reused across users, it also improves the generalization of the corresponding features.

To solve the model we mainly tried three optimization algorithms: L-BFGS, SGD, and FTRL (Follow-the-Regularized-Leader):

1. SGD and L-BFGS performed about the same; the effect of L-BFGS depends closely on parameter initialization.

2. FTRL, compared with SGD, has the following advantages:

(1) With L1 regularization, the learned features are more sparse.

(2) It uses accumulated gradients, which accelerates convergence.

(3) It sets each feature's learning rate according to how frequently the feature appears in the samples, ensuring every feature is sufficiently learned.

The frequencies at which features appear in the FM model vary widely, and FTRL ensures that each one is fully learned, so it is better suited to sparse features. Online tests showed that on sparse features FTRL improved on SGD by 4.5%.
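
For reference, the standard FTRL-Proximal per-coordinate update (McMahan et al.; not necessarily the exact variant used here). With accumulated gradients $z_i$ and accumulated squared gradients $n_i$ for coordinate $i$:

```latex
w_i =
\begin{cases}
0 & \text{if } |z_i| \le \lambda_1 \\[6pt]
-\left( \dfrac{\beta + \sqrt{n_i}}{\alpha} + \lambda_2 \right)^{-1}
\left( z_i - \operatorname{sgn}(z_i)\,\lambda_1 \right) & \text{otherwise}
\end{cases}
```

The L1 term $\lambda_1$ zeroes out rarely useful weights (advantage 1), and the per-coordinate learning rate $\alpha / (\beta + \sqrt{n_i})$ shrinks only for frequently seen features (advantage 3).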

4. The contemporary model

The GBDT + FM model does not make sufficient use of deep features with structural information, such as embeddings, whereas deep learning (Deep Neural Networks) can learn from embedding features and ordinary dense features, extract deep information, and improve model accuracy; it has already been applied successfully to many areas of machine learning. We therefore introduced a DNN into the ranking model to raise the overall quality of the ranking.

The architecture of the DNN + GBDT + FM ensemble model is shown in the figure. The FM layer serves as the last layer of the model, i.e., the fusion layer; its input consists of three parts: the last hidden layer of the DNN, the GBDT leaf-node outputs, and the high-dimensional sparse features. The ensemble is described below. After going online, this model improved the effect by 4% relative to GBDT + FM.

4.1 DNN model

(a) A fully connected network with three hidden layers in total.

(b) The hidden layers have 1024, 512, and 256 nodes respectively (a minimal sketch follows this list).

(c) It uses pre-trained user and video embedding vectors, covering two kinds of embeddings: one based on user behavior and one based on semantic content.

(d) The DNN extracts deep information well from features with a well-behaved mathematical distribution, such as embedding features and normalized statistical features.

(e) Although the DNN does not strictly require normalized features, testing found that some features fluctuate too widely because of outliers, which degrades the DNN's performance.
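
A minimal Keras sketch of the DNN tower described in (a) and (b); the input width of 320 and the simple concatenated-embedding input are illustrative assumptions:

```python
import tensorflow as tf

# Illustrative input: pre-trained user/video embeddings concatenated with
# normalized dense statistical features.
inputs = tf.keras.Input(shape=(320,))
h = tf.keras.layers.Dense(1024, activation="relu")(inputs)  # hidden layer 1
h = tf.keras.layers.Dense(512, activation="relu")(h)        # hidden layer 2
h = tf.keras.layers.Dense(256, activation="relu")(h)        # hidden layer 3

# The 256-dim last hidden layer is the part fed into the FM fusion layer.
dnn_tower = tf.keras.Model(inputs, h)
```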

4.2 GBDT model

(a) Trained separately; its input includes both normalized and non-normalized dense features.

(b) It can process continuous and discrete features that have not been normalized.

(c) It automatically discretizes and combines the input features according to entropy gain.

4.3 FM fusion layer

(a) The FM model is trained simultaneously with the DNN model as a single network.

(b) It fuses and crosses the DNN features, the GBDT output features, and the sparse features.

4.4 Distributed training with TensorFlow

4.5 Online prediction with TensorFlow Serving microservices
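
For illustration, a prediction call through TensorFlow Serving's standard REST API might look like this (the model name, host, and feature vector are hypothetical):

```python
import requests

# TensorFlow Serving exposes REST prediction on port 8501 by default.
url = "http://tf-serving-host:8501/v1/models/ranker:predict"
payload = {"instances": [{"features": [0.12, 0.87, 0.05]}]}  # hypothetical input

response = requests.post(url, json=payload)
scores = response.json()["predictions"]  # one ranking score per instance
```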

The DNN + GBDT + FM ensemble model uses the Adam optimizer. Adam combines the Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp): it has a good convergence rate, gives each variable its own step size, adjusts the overall step size according to the current gradient, and can accommodate noisy data. We experimented with a variety of optimizers, and Adam's effect was the best.
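
For reference, the standard Adam update (Kingma & Ba), with gradient $g_t$, decay rates $\beta_1, \beta_2$, and step size $\alpha$:

```latex
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}
\hat{m}_t = \frac{m_t}{1-\beta_1^{t}}, \qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^{t}}, \qquad
\theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```

The per-coordinate denominator $\sqrt{\hat{v}_t}$ is what gives each variable its own step size, and the momentum term $\hat{m}_t$ adapts the overall step to the current gradient.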

5. The current state of DNN ranking in the industry

① YouTube launched a DNN ranking algorithm in 2016.

② Shanghai Jiao Tong University and UCL introduced the Product-based Neural Network (PNN) in 2016 for predicting user clicks. PNN effectively does the feature crossing inside the DNN layers; our approach is to hand the feature crossing to the FM and let the DNN focus on extracting deep information.

③ Google introduced the Wide & Deep model in 2016; it is the basis of our current model. On top of it, we replaced the LR over cross features with FM, which simplifies the computational complexity and improves the generalization of the crosses.

④ Alibaba this year introduced the Deep Interest Network (DIN), which uses an attention mechanism for commodity CTR estimation; it improves the accuracy of the embedding vectors and is worth learning from.

IV. Summary

Ranking is a classic machine-learning scenario in recommendation systems, and its impact on recommendation results is also very significant. Beyond polishing the model and algorithm, it demands even more careful scrutiny and deep optimization of the business features, the engineering architecture, the data-processing details, and the pipeline flow.

Introducing DNNs into ranking is only a beginning. Going forward we need to try more in model architecture, embedding features, diversity, cold start, and multi-objective learning, in order to provide more accurate and more humane recommendations and optimize the user experience.

References

[1] Cheng, Heng-Tze, et al. "Wide & deep learning for recommender systems." Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 2016.

[2] Qu, Yanru, et al. "Product-based neural networks for user response prediction." Data Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 2016.

[3] He, Xinran, et al. "Practical lessons from predicting clicks on ads at facebook." Proceedings of the Eighth International Workshop on Data Mining for Online Advertising. ACM, 2014.

[4] Zhou, Guorui, et al. "Deep Interest Network for Click-Through Rate Prediction." arXiv preprint arXiv:1706.06978 (2017).

[5] Rendle, Steffen. "Factorization machines." Data Mining (ICDM), 2010 IEEE 10th International Conference on. IEEE, 2010.
