YouTube's Deep Learning Recommendation System

This is Wang Zhe's machine learning notes. Every one to two weeks I will explain, from the perspective of an algorithm engineer, a few papers related to computational advertising and recommendation systems. The papers I choose must meet three conditions:

  • First, engineering-oriented;
  • Second, produced by first-tier Internet companies such as Alibaba, Facebook, and Google;
  • Third, classic or cutting-edge.

This week we discuss YouTube's deep recommendation system paper, "Deep Neural Networks for YouTube Recommendations". It is a 2016 paper, and by today's standards there is nothing novel in it. I first read it two years ago and then set it aside, but a few days ago I reread it and found highlights almost everywhere: practical tricks and hard-won experience throughout. It is, frankly, a remarkable paper, and it left two impressions on me:

  1. It is without doubt a model industry paper, and a must-read for the engineering-oriented algorithm engineers I respect so much;
  2. Even the seemingly unremarkable passages hide valuable engineering experience from YouTube's engineers. Compared with Alibaba's Deep Interest Network (DIN) introduced last week, whose main value lies in its attention mechanism, this paper deserves to be savored sentence by sentence. That is exactly why I call it remarkable.

 

Without further ado, let me share the different takeaways from my two readings of this paper.

On the first pass, I suspect everyone reads the paper for its algorithm architecture. Deep learning recommendation systems have become "basic operations" at major companies today, so YouTube's architecture is no longer surprising, but let's quickly walk through the deep learning recommendation architecture described in the paper.

 

YouTube's recommendation scenario needs little introduction: it is the world's largest UGC video site and must deliver personalized recommendations over an enormous video corpus. Because the candidate set is so large, and online latency rules out scoring everything with a complex network, YouTube uses two deep networks to complete the recommendation process (a rough sketch of this funnel follows the list below):

  1. The first stage is the Candidate Generation Model, which rapidly screens candidate videos, reducing the candidate set from millions to the order of hundreds.
  2. The second stage is the Ranking Model, which performs fine-grained ranking of those hundreds of candidates.
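
To make this two-stage funnel concrete, here is a minimal NumPy sketch of my own (not code from the paper; the corpus size, embedding dimension, and the stand-in ranking score are all illustrative assumptions). Candidate generation is reduced to a brute-force nearest neighbor lookup between a user vector and the video corpus, and the ranking stage then re-scores only the retrieved few hundred candidates:

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 64
CORPUS_SIZE = 100_000     # scaled down from YouTube's millions for this demo
N_CANDIDATES = 300        # the "hundreds" of candidates handed to the ranking stage

# Hypothetical learned vectors: one per video, one for the current user.
video_vectors = rng.normal(size=(CORPUS_SIZE, EMB_DIM)).astype(np.float32)
user_vector = rng.normal(size=EMB_DIM).astype(np.float32)

def generate_candidates(user_vec, video_vecs, k=N_CANDIDATES):
    """Stage 1: narrow the full corpus down to a few hundred candidates
    via a (brute-force) nearest neighbor search on dot-product scores."""
    scores = video_vecs @ user_vec
    return np.argpartition(-scores, k)[:k]

def rank(candidate_ids, user_vec, video_vecs):
    """Stage 2: re-score only the retrieved candidates.
    A real ranking model uses many more features; a dot product is a placeholder."""
    scores = video_vecs[candidate_ids] @ user_vec
    return candidate_ids[np.argsort(-scores)]

candidates = generate_candidates(user_vector, video_vectors)
ranked = rank(candidates, user_vector, video_vectors)
print("top 5 recommended video ids:", ranked[:5])
```

In production the first stage would use an approximate nearest neighbor index rather than a brute-force scan, and the second stage a full feature-rich model; the only point here is the millions-to-hundreds-to-a-handful funnel.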

Let's first look at the architecture of the candidate generation model.

YouTube Candidate Generation Model

 

Looking at the network from the bottom up, the lowest-level inputs are the embedding vectors of the videos a user has watched and the embedding vectors of their search tokens. As for how these embedding vectors are generated, in the authors' original words:

Inspired by continuous bag of words language models, we learn high dimensional embeddings for each video in a fixed vocabulary and feed these embeddings into a feedforward neural network

In other words, the authors pre-embed the videos and search tokens with a word2vec-style method and feed those embeddings in as inputs. This is a "basic operation" of embedding and needs no lengthy introduction. Of course, there is another approach we may be even more familiar with: adding an embedding layer and training it jointly with the DNN. Which of the two works better, and which situations each suits, is something we can discuss separately (a small sketch contrasting the two follows).
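
To make the contrast concrete, here is a minimal PyTorch sketch of my own (sizes and names are illustrative assumptions, not from the paper): option (a) loads pre-trained word2vec-style video vectors and keeps them frozen, while option (b) creates an embedding layer trained jointly with the rest of the DNN. In both cases the watched-video embeddings are average-pooled into a single watch vector, as in the paper's figure:

```python
import torch
import torch.nn as nn

NUM_VIDEOS, EMB_DIM = 10_000, 64                       # illustrative sizes

# (a) Pre-trained, frozen embeddings (learned offline, word2vec-style).
pretrained_vectors = torch.randn(NUM_VIDEOS, EMB_DIM)  # stand-in for word2vec output
frozen_emb = nn.Embedding.from_pretrained(pretrained_vectors, freeze=True)

# (b) An embedding layer trained jointly with the rest of the DNN.
joint_emb = nn.Embedding(NUM_VIDEOS, EMB_DIM)

def watch_vector(emb_layer, watched_ids):
    """Average-pool the embeddings of the videos a user has watched."""
    return emb_layer(watched_ids).mean(dim=1)           # (batch, EMB_DIM)

watched_ids = torch.randint(0, NUM_VIDEOS, (4, 50))     # 4 users, 50 watches each
print(watch_vector(frozen_emb, watched_ids).shape)      # torch.Size([4, 64])
print(watch_vector(joint_emb, watched_ids).shape)       # torch.Size([4, 64])
```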

 

The feature vector also includes an embedding of the user's geographic location, plus age and gender. All of these features are then concatenated and fed into the upper ReLU layers.

After three ReLU layers comes a softmax. The YouTube engineers frame the recommendation problem as predicting the user's next watch, so the output should be a probability distribution over all candidate videos, which naturally makes this an extreme multi-class classification problem. (A minimal sketch of such a candidate generation tower follows.)
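
Putting these "basic operations" together, below is a minimal PyTorch sketch of a candidate generation tower in the spirit of the figure (my own simplification; the layer widths, corpus size, and user feature dimension are assumptions): average-pooled watch and search embeddings are concatenated with user features, pushed through three ReLU layers, and projected to a softmax over the whole video corpus:

```python
import torch
import torch.nn as nn

class CandidateGenerationModel(nn.Module):
    def __init__(self, num_videos=10_000, num_tokens=5_000,
                 emb_dim=64, user_feat_dim=8):
        super().__init__()
        self.video_emb = nn.Embedding(num_videos, emb_dim)
        self.token_emb = nn.Embedding(num_tokens, emb_dim)
        # Three ReLU layers as in the figure; the widths here are guesses.
        self.tower = nn.Sequential(
            nn.Linear(2 * emb_dim + user_feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # One logit per video: softmax over "which video is watched next".
        self.output = nn.Linear(128, num_videos)

    def forward(self, watched_ids, search_ids, user_feats):
        watch_vec = self.video_emb(watched_ids).mean(dim=1)    # average pooling
        search_vec = self.token_emb(search_ids).mean(dim=1)
        x = torch.cat([watch_vec, search_vec, user_feats], dim=1)
        return self.output(self.tower(x))                      # raw logits

model = CandidateGenerationModel()
logits = model(torch.randint(0, 10_000, (4, 50)),   # watched video ids
               torch.randint(0, 5_000, (4, 20)),    # search token ids
               torch.randn(4, 8))                   # geo / age / gender features
probs = torch.softmax(logits, dim=1)                # distribution over all videos
print(probs.shape)                                  # torch.Size([4, 10000])
```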

So a set of deep learning "basic operations" makes up YouTube's candidate generation network. It looks innocuous, yet it still hides some problems, such as:

  1. In the upper-left corner of the architecture diagram, why does online serving not use this network to make predictions directly, but instead use a nearest neighbor search?
  2. For the multi-class classification problem, YouTube's candidate videos number in the millions, which means millions of classes. That is bound to hurt training quality and speed; how can it be improved? (A rough sketch of one common remedy follows this list.)
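
On the second question, a standard remedy for a softmax over millions of classes is candidate sampling: compute the loss over the true class plus a small set of sampled negatives instead of over the full corpus. The sketch below is a simplified negative-sampling illustration of that idea (uniform negatives, no sampling-probability correction), not YouTube's exact implementation; what the paper actually does is discussed in the follow-up article:

```python
import torch
import torch.nn.functional as F

NUM_VIDEOS, HIDDEN, NUM_NEG = 100_000, 128, 100     # corpus scaled down for the demo

# Output-layer weights: one row per video (the expensive part of a full softmax).
class_weights = torch.randn(NUM_VIDEOS, HIDDEN, requires_grad=True)

def sampled_softmax_loss(user_repr, positive_ids, num_neg=NUM_NEG):
    """Cross-entropy over [positive + sampled negatives] instead of all classes.
    Collisions between negatives and the positive are ignored in this sketch."""
    batch = user_repr.size(0)
    neg_ids = torch.randint(0, NUM_VIDEOS, (batch, num_neg))           # uniform negatives
    cand_ids = torch.cat([positive_ids.unsqueeze(1), neg_ids], dim=1)  # (batch, 1 + num_neg)
    cand_w = class_weights[cand_ids]                                   # (batch, 1 + num_neg, HIDDEN)
    logits = torch.bmm(cand_w, user_repr.unsqueeze(2)).squeeze(2)      # (batch, 1 + num_neg)
    labels = torch.zeros(batch, dtype=torch.long)                      # the positive sits at index 0
    return F.cross_entropy(logits, labels)

user_repr = torch.randn(4, HIDDEN)                  # last hidden layer of the tower
positive_ids = torch.randint(0, NUM_VIDEOS, (4,))   # the video actually watched next
print(sampled_softmax_loss(user_repr, positive_ids))
```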

On my first read I did not think deeply about these questions, but they are bound to come up during engineering implementation. We will dig into the paper's solutions later on.

Once the set of several hundred candidates has been obtained, the next step is fine-grained sorting with the ranking model. Below is a schematic of the deep ranking network.

YouTube Ranking Model

 

At first glance, the ranking model above seems no different from the candidate generation model: the architecture is the same set of deep learning "basic operations". The only difference is the feature engineering, so let's talk about that.

Indeed, the paper clearly states that the reason for feeding the DNN a different set of features is to give the ranking model a richer description of the video and of the user's relationship to the video, so that the candidate set can be sorted accurately:

During ranking, we have access to many more features describing the video and the user's relationship to the video because only a few hundred videos are being scored rather than the millions scored in candidate generation.

Concretely, the features from left to right are:

  1. impression video ID embedding: the embedding of the candidate video currently being scored
  2. watched video IDs average embedding: the average pooling of the embeddings of the last N videos the user watched
  3. language embedding: the embedding of the user's language and the embedding of the current video's language
  4. time since last watch: the time elapsed since the user last watched a video on the same channel
  5. #previous impressions: the number of times this video has already been exposed to the user

Of the five features above, I want to focus on the fourth and fifth, because both were introduced based on sharp observation of user behavior.

The thinking behind the fourth feature:

We observe that the most important signals are those that describe a user's previous interaction with the item itself and other similar items.

There is a hint of attention here: time since last watch captures the interval since the user watched a video on the same channel, reflecting their behavior toward similar videos. Think from the user's perspective: if we have just watched a video from the "DOTA classic review" channel, there is a high probability we will keep watching videos from that channel, and this feature captures that behavior well.

The fifth feature, #previous impressions, introduces a degree of exploration: it avoids ineffectively exposing the same video to the same user over and over, and increases the chance of exposing videos the user has never seen. (A small sketch of how these two features might be assembled follows.)
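
As an illustration of how these two features might be assembled on the user side (my own sketch; the log structure and field names are assumptions, not from the paper), the following computes, for one impression, the time since the user's last watch on the same channel and the number of previous impressions of the same video:

```python
from datetime import datetime

# Hypothetical per-user logs; the field names are illustrative only.
watch_log = [
    {"video_id": "v1", "channel": "dota_classics", "ts": datetime(2019, 7, 1, 20, 0)},
    {"video_id": "v2", "channel": "cooking",       "ts": datetime(2019, 7, 2, 21, 0)},
    {"video_id": "v3", "channel": "dota_classics", "ts": datetime(2019, 7, 3, 22, 0)},
]
impression_log = ["v9", "v9", "v4"]       # videos already shown to this user

def time_since_last_watch(channel, now, log):
    """Seconds since the user last watched a video on this channel (None if never)."""
    times = [e["ts"] for e in log if e["channel"] == channel]
    return (now - max(times)).total_seconds() if times else None

def previous_impressions(video_id, impressions):
    """How many times this video has already been exposed to the user."""
    return sum(1 for v in impressions if v == video_id)

now = datetime(2019, 7, 4, 9, 0)
print(time_since_last_watch("dota_classics", now, watch_log))   # 39600.0 (11 hours)
print(previous_impressions("v9", impression_log))                # 2
```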

 

At this point my first pass through the paper was over. I had a picture of YouTube's algorithm architecture, but the overall feeling was: not much more than this, nothing particularly new.

But thinking that way is too naive. Unlike Alibaba's Deep Interest Network (DIN) from before, where understanding the attention mechanism captures 70% of that paper's value, with this paper, if you only read YouTube's recommendation system architecture, you have grabbed only 30% of the value. So where is the remaining 70%?

When I reread the article from an engineer's perspective, constantly holding on to the question "how would I actually implement this", I found that the paper's engineering value was exactly what I had largely ignored before. Below I list ten very valuable questions the paper addresses:

  1. The paper converts the recommendation problem into a multi-class classification problem in the "next watch" setting, where every candidate video is a class, giving millions of classes in total. Training this with a full softmax is undoubtedly inefficient; how does YouTube solve it?
  2. When serving the candidate generation model, why does YouTube not simply use the trained model to make predictions directly, but instead use a nearest neighbor search?
  3. YouTube users have a clear preference for fresh videos; how is this preference introduced when building the model?
  4. When preprocessing the training set, YouTube did not use the raw user logs, but instead extracted an equal number of training samples for every user. Why?
  5. Why does YouTube not adopt an RNN-style sequence model, but instead completely discard the temporal ordering of the user's watch history, treating recently watched videos as interchangeable? Doesn't this lose useful information?
  6. When building the test set, why does YouTube not use the classic random holdout, but instead take each user's most recent watch as the test example?
  7. When choosing the optimization objective, why does YouTube not use the classic CTR or play rate, but instead expected watch time per impression?
  8. When building video embeddings, why are the large number of long-tail videos simply replaced with the zero vector?
  9. For certain features such as #previous impressions, why also apply square-root and square transformations, feeding the feature into the model in three forms?
  10. Why does the ranking model not use classic logistic regression as its output layer, but a weighted logistic regression instead? (A minimal sketch of what "weighted" can mean here follows this list.)
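
To clarify what the last question is asking (the answer itself is in the follow-up article): a weighted logistic regression is simply logistic regression in which training examples do not all count equally, i.e. each example carries its own weight in the cross-entropy loss. A minimal PyTorch sketch of my own, with positives hypothetically weighted by watch time:

```python
import torch
import torch.nn.functional as F

hidden = torch.randn(8, 128)                         # last hidden layer of the ranking tower
logits = torch.nn.Linear(128, 1)(hidden).squeeze(1)  # one score per impression

labels = torch.tensor([1., 0., 1., 0., 0., 1., 0., 0.])           # positive / negative impression
watch_time = torch.tensor([120., 0., 30., 0., 0., 300., 0., 0.])  # hypothetical seconds watched

# Classic logistic regression: every example weighted equally.
plain_loss = F.binary_cross_entropy_with_logits(logits, labels)

# Weighted logistic regression: each example carries its own weight
# (here, positives weighted by watch time, negatives by 1).
weights = torch.where(labels > 0, watch_time, torch.ones_like(watch_time))
weighted_loss = F.binary_cross_entropy_with_logits(logits, labels, weight=weights)

print(plain_loss.item(), weighted_loss.item())
```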

 

Because I also work in video recommendation, I can responsibly say that these ten questions are all very valuable. But I have been writing in one breath today and am running a bit out of steam. If you are interested, give this a like, and I will analyze the answers to these ten questions in detail tomorrow.

The answers to the questions above have now been written up; see my next article:

Wang Zhe: Ten Engineering Problems of YouTube's Deep Learning Recommendation System (zhuanlan.zhihu.com)

Well, this is the second article in Wang Zhe's machine learning notes. My knowledge is limited, so comments, criticism, and corrections are welcome.

You are also welcome to follow my WeChat official account of the same name, Wang Zhe's Machine Learning Notes (wangzhenotes), or to add my personal WeChat through the account for further discussion. Thank you.

 

References:

  1. Deep Neural Networks for YouTube Recommendations
  2. Recommender System Paper List
  3. Attention Mechanisms in Recommender Systems: Alibaba's Deep Interest Network (DIN)