Recommender System Series: Overview of Recommender Systems (Part 2)

In the first part of this overview of recommendation systems, we introduced the common concepts of recommendation systems, commonly used evaluation metrics, and the general recall strategies for the homepage recommendation scenario. In this article, we cover the rest of the overview: the general recall strategies for the detail page recommendation scenario, the ranking models commonly used in the ranking stage, the cold start problem, and recommendation system architecture. For more details, refer to my GitHub repo.

General Recall Strategy in Detail Page Recommendation Scenario

General recall strategies in the detail page recommendation scenario include (the first two are the most commonly used): recall based on the similarity of item representation vectors; recall based on item association rules; recall based on clustering of item representation vectors.

  • Recall based on the similarity of item representation vectors. Common ways to represent an item are: a vector built from the item's explicit profile; the item's learned embedding vector as a whole; the column vector corresponding to the item in the user-item interaction matrix (assuming users are rows and items are columns).
  • Recall based on item association rules (commonly used for shopping cart page or purchase page recommendation in e-commerce). Find the item sets that frequently appear together in the purchase data of all users (frequent itemset mining), and keep the associated items whose support (that is, the probability that the two products are purchased together) meets a threshold. The key concepts in association rule analysis are: support, the fraction of all N transactions that contain both A and B, that is, the probability that A and B are purchased together; confidence, the conditional probability of purchasing B given that A was purchased; and lift, which measures how much purchasing A raises the probability of purchasing B. Lift is used to judge whether a rule has practical value, that is, whether B appears in the shopping cart more often under the rule than it does on its own.
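The three association-rule metrics above can be sketched in a few lines of Python. This is a minimal illustration restricted to item pairs, not a full frequent itemset miner such as Apriori or FP-growth; the function name `pair_rules`, the `min_support` parameter, and the toy baskets are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.2):
    """Return {(A, B): (support, confidence, lift)} for ordered item pairs A -> B."""
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = sorted(set(basket))                  # de-duplicate within a basket
        item_count.update(items)
        pair_count.update(combinations(items, 2))    # unordered pairs, each counted once

    rules = {}
    for (a, b), both in pair_count.items():
        support = both / n                           # P(A and B)
        if support < min_support:
            continue                                 # drop infrequent pairs
        for x, y in ((a, b), (b, a)):
            confidence = both / item_count[x]        # P(Y | X)
            lift = confidence / (item_count[y] / n)  # P(Y | X) / P(Y)
            rules[(x, y)] = (support, confidence, lift)
    return rules

# Toy purchase data (hypothetical).
baskets = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["milk", "eggs"],
    ["bread"],
    ["milk", "bread", "butter"],
]
rules = pair_rules(baskets, min_support=0.4)
for (a, b), (support, confidence, lift) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests that buying A makes B more likely than its baseline rate; a lift below 1 suggests the rule adds nothing over recommending B on its own popularity.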

Recall often requires building an index. Indexing all users is storage-intensive and time-consuming, so it may be appropriate to build the index only for monthly active users. For real-time recall, the user's behavior sequence features need not be limited to behavior in the recommendation business itself; they can also include behavior in other parts of the same application, such as the user's behavior in search. For example, the features of the YoutubeDNN recall model include not only the video ID sequence/video embeddings of recently watched videos, but also the word sequence or word embeddings of the user's recent searches. The authors mention that adding user behavior from the search business greatly improved the overall effect. Interestingly, the YoutubeDNN ranking model does not model the word sequences or embeddings of the user's recent searches.
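As one illustration of what such an index might look like in the detail page scenario, the sketch below precomputes, for each item, its most similar items using the column vectors of a binary user-item interaction matrix and cosine similarity. The brute-force loop, the function name `build_item_index`, and the toy data are assumptions for the example; a production system would more likely use an approximate nearest neighbor index.

```python
import math

def build_item_index(interactions, top_k=2):
    """interactions: dict user -> set of items. Returns item -> [(similar_item, score), ...]."""
    # The column vector of an item is the set of users who interacted with it.
    users_of = {}
    for user, items in interactions.items():
        for item in items:
            users_of.setdefault(item, set()).add(user)

    index = {}
    for item, users in users_of.items():
        scored = []
        for other, other_users in users_of.items():
            if other == item:
                continue
            overlap = len(users & other_users)
            if overlap == 0:
                continue
            # Cosine similarity of two binary vectors.
            score = overlap / math.sqrt(len(users) * len(other_users))
            scored.append((other, score))
        # Highest score first; break ties by item name for determinism.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        index[item] = scored[:top_k]
    return index

# Toy interaction data (hypothetical).
interactions = {
    "u1": {"phone", "case"},
    "u2": {"phone", "case", "charger"},
    "u3": {"phone", "charger"},
    "u4": {"laptop"},
}
index = build_item_index(interactions)
print(index["case"])
```

At serving time, the detail page for an item only needs a single lookup in `index`, which is why the similarity work is done ahead of time.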

Ranking Models Commonly Used in the Ranking Stage

At present, the mainstream models in the ranking stage are traditional machine learning models or deep learning models. Research on ranking models has always been a hot topic in the recommendation system field, and major companies in China and abroad are investing heavily in it. Current ranking models show the following trends: introducing behavior sequence features; introducing attention mechanisms (such as DIN and DIEN); introducing multi-task/multi-objective learning (such as ESMM, MMOE, ESMM2, and PLE); and introducing multi-modality. Below we introduce several common, simple ranking models.

  • Logistic regression (LR), the most widely used model in the early days of CTR prediction for ranking. The prediction function of LR is as follows:
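In its standard form, with x the feature vector, w the weight vector, and b the bias (these symbol names are chosen here for illustration), the predicted click probability is:

```latex
\hat{y} = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}}
```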

The advantage of the LR model is that it is simple, convenient, and easy to interpret. Its disadvantage is that discrete features are generally converted into one-hot vectors, which easily turns the whole feature vector into a high-dimensional sparse vector and makes learning harder. LR is linear in nature: to model a nonlinear relationship with the target variable, feature crosses have to be introduced manually, so LR requires more manual feature engineering than other models. Today LR has two main uses in the ranking stage: as the first model deployed in the ranking stage, and as a baseline in the ranking stage or as one bucket in an A/B test.
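To make the dimensionality point concrete, the following sketch one-hot encodes two categorical features and adds a manually built cross feature. The feature names and vocabularies are hypothetical; the point is only that the vector length is the sum of the vocabulary sizes plus their product for each cross, which grows quickly once ID features are involved.

```python
def one_hot(value, vocabulary):
    """Return a one-hot list for `value` over an ordered vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def crossed(value_a, value_b, vocab_a, vocab_b):
    """One-hot over all (a, b) combinations: a manually built cross feature."""
    return [1.0 if (value_a, value_b) == (a, b) else 0.0
            for a in vocab_a for b in vocab_b]

GENDERS = ["female", "male"]               # hypothetical user feature
CATEGORIES = ["books", "games", "shoes"]   # hypothetical item feature

def featurize(gender, category):
    """Concatenate the two one-hot features and their cross."""
    return (one_hot(gender, GENDERS)
            + one_hot(category, CATEGORIES)
            + crossed(gender, category, GENDERS, CATEGORIES))

x = featurize("male", "games")
print(len(x), x)   # 2 + 3 + 2*3 = 11 dimensions, only 3 of them non-zero
```

With LR, the weight learned for the active cross dimension is what lets the model express "male users click on games" as something other than the sum of the two separate effects.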

  • GBDT+LR cascade model (for details, refer to Facebook's paper). The idea is to use GBDT to encode the original features and then feed the encoded result into a cascaded LR for classification. In essence, GBDT is used to perform feature selection and feature combination automatically. A variant is the GBDT+FM model, which replaces LR with FM.

  • Factorization machine (FM) model (refer to blog), a ranking model that was widely used before deep ranking models became popular. FM generally needs to convert categorical features, including ID-type features, into one-hot vectors, so the dimensionality becomes very high (the example in the figure below is for 3 users and 3 items). iQIYI replaces the user ID with the user's viewing history and interest tags, which reduces the feature dimension and, because interests are shared across users, also improves the generalization ability of the corresponding features.

FM can be regarded as a further extension of matrix factorization (MF). Besides the two feature types User ID and Item ID, many other types of features can be introduced into FM. FM computes second-order feature crosses automatically: it maps every feature to a low-dimensional embedding vector and uses the inner product of any two feature embeddings as the weight of the combination of those two features.
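The second-order computation described above can be sketched as follows. The pairwise term is computed with the well-known linear-time reformulation and checked against a naive double loop; the function names, parameter values, and the 3-user/3-item one-hot layout are illustrative assumptions, not trained weights.

```python
def fm_predict(x, w0, w, v):
    """FM score: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.

    x: feature values (length n); w: linear weights (length n);
    v: n embedding vectors, each of length k.
    """
    n, k = len(x), len(v[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    # Identity: sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise

def fm_predict_naive(x, w0, w, v):
    """Same score computed with an explicit loop over all feature pairs."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            score += dot * x[i] * x[j]
    return score

# One-hot layout: 3 users followed by 3 items; here user 0 with item 1.
x = [1, 0, 0, 0, 1, 0]
w0 = 0.1
w = [0.2, 0.0, -0.1, 0.3, 0.05, 0.0]
v = [[0.5, -0.2], [0.1, 0.3], [0.0, 0.2],
     [0.3, 0.3], [0.4, 0.1], [-0.1, 0.2]]

score = fm_predict(x, w0, w, v)
print(round(score, 4))
```

With only a one-hot user ID and a one-hot item ID active, the pairwise term reduces to the inner product of that user's and that item's embeddings, which is exactly the matrix factorization case; extra features simply add more pairwise terms.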

  • Wide & Deep Learning (WDL) model, which essentially combines LR and an MLP and is currently widely used in industry. The Wide part (LR) provides memorization and the Deep part (MLP) provides generalization; the two complement each other to give better performance. Unlike GBDT+LR/FM (where GBDT and LR/FM are trained independently), WDL is trained jointly, end to end. WDL can conveniently model the user's behavior sequence as a single feature. WDL started the wave of jointly modeling wide and deep parts in deep ranking models. Its shortcoming is that the wide part still requires manually crafted cross features (many variants followed WDL, such as DeepFM and Deep & Cross Network, whose core purpose is to design network structures that perform feature crossing automatically). Its network structure is as follows:

  • The DeepFM model, which combines an FM part and an MLP part and does not require manual second-order feature crossing (this model is widely used among domestic customers). Its network structure is shown in the figure below:

Re-ranking Stage

The re-ranking stage is mainly where business operators intervene with various strategies and rules. In terms of the recommendation quality seen by end users, I think this stage is more important than the ranking stage. Re-ranking mainly intervenes in the following aspects (refer to blog):

The Cold Start Problem of Recommender Systems

The cold start problem of a recommendation system can be divided into the following three categories:

For the cold start problem, it may be better to use a dedicated recommendation pipeline. The specific method is as follows (refer to Zhihu blog):

Recommender System Architecture

A good recommendation system architecture should have the following characteristics: real-time response to requests; timely, accurate, and comprehensive recording of user feedback (including explicit feedback and implicit feedback); graceful degradation; rapid experimentation with multiple strategies and multiple models.

There are two modes of online recommendation architecture: all-in-one-process mode, in which all logic, including recall, ranking, and re-ranking, is processed inside a single Recommendation server; and decoupled mode, in which recall and ranking are each handled by a separate service and the Recommendation server interacts with the two services separately.
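The decoupled mode, combined with the graceful degradation requirement mentioned earlier, might be sketched as follows. The two service functions are hypothetical in-process stand-ins for remote calls, and `POPULAR_ITEMS` is an assumed precomputed fallback list; none of these names come from a real framework.

```python
def recall_service(user_id, limit=100):
    """Hypothetical stand-in for a remote recall service: returns candidate item ids."""
    catalog = {"u1": ["a", "b", "c", "d"]}
    return catalog.get(user_id, [])[:limit]

def ranking_service(user_id, candidates):
    """Hypothetical stand-in for a remote ranking service: returns (item, score) pairs."""
    scores = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.1}
    return [(item, scores.get(item, 0.0)) for item in candidates]

POPULAR_ITEMS = ["p1", "p2", "p3"]   # assumed precomputed fallback list

def recommend(user_id, k=3, recall=recall_service, rank=ranking_service):
    """Recommendation server in decoupled mode: orchestrates the two services."""
    try:
        candidates = recall(user_id)
        if not candidates:
            return POPULAR_ITEMS[:k]          # nothing recalled: fall back
        scored = rank(user_id, candidates)
    except Exception:
        # Graceful degradation: if either service fails, serve the fallback list.
        return POPULAR_ITEMS[:k]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored[:k]]

def broken_ranker(user_id, candidates):
    """Simulates a ranking service outage."""
    raise TimeoutError("ranking service unavailable")

print(recommend("u1"))                       # ranked candidates
print(recommend("u1", rank=broken_ranker))   # fallback list
```

Passing the services in as parameters is only a convenience for the example; it also hints at how the decoupled mode lets recall and ranking be swapped or scaled independently, which the all-in-one-process mode does not.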

Industrial-grade recommendation system architecture (refer to Zhihu blog):

The conventional model in the figure above refers to a model that is trained offline periodically and then pushed to the online service. The real-time model refers to collecting user behavior feedback in real time, selecting training instances, extracting and joining features in real time, and updating the online recommendation model in near real time, at the minute level. The advantage is that the user's latest interests are reflected in the recommendation results in near real time. The conventional model and the real-time model may coexist here because a given recall or ranking model cannot be trained incrementally, because the two models are deployed side by side in an A/B test, or because the conventional model serves as a fallback option.

Netflix's personalized recommendation system architecture (2013) is shown below (refer to Netflix's official blog):

Netflix's recommendation system is divided into three parts: offline, near-online, and online. The online part must meet a low-latency SLA as far as possible in order to respond to client requests in real time; online recall, prediction in the ranking stage, and business policy processing all belong to it. The offline part is a fallback for the online part (that is, a means of graceful degradation). It can supply part of the final or intermediate recommendation results (such as the results of one recall channel, or one partition of a blended recommendation), and it can precompute some fields (such as user profiles and item profiles). Offline model training also belongs to this part. The near-online part, besides incrementally training and updating the online model in near real time (for example, at the minute level), can also supplement offline recall results based on the latest events, and can add interest tags extracted from users' latest browsing records to their user profiles.

Summary

This concludes the overview portion of the recommender system series. The overview introduced the common concepts of recommendation systems, commonly used evaluation metrics, general recall strategies for the homepage and detail page recommendation scenarios, commonly used models in the ranking stage, the re-ranking stage, the cold start problem, and recommendation system architecture. I hope it has given you a deeper understanding of recommendation systems. Next, we will discuss the recall stage of the recommendation system in depth. Thank you for reading.

The author of this article

Liang Yuhui

Machine learning product technical expert at Amazon cloud technology, responsible for consulting on and designing machine learning solutions based on Amazon cloud technology. He focuses on the promotion and application of machine learning and has been deeply involved in building and optimizing machine learning projects for many real customers. He has extensive experience in distributed training of deep learning models, recommendation systems, and computational advertising.

Article source: https://dev.amazoncloud.cn/column/article/630a21b18a1013112795045a?sc_medium=regulartraffic&sc_campaign=crossplatform&sc_channel=CSDN

Origin blog.csdn.net/u012365585/article/details/130775074