Recommendation System (8): Evolution of the Multi-Objective Fine-Ranking Model for the Category TAB Product Flow

1. Overview

The category TAB product flow is the product recommendation feed in every TAB of the Dewu App purchase page other than the "Recommendation" page, such as "Shoes" and "Luggage". When a user enters a category TAB, the problem can be simplified to recommending a product flow given the triplet <userId, tabId, itemId>. The biggest difference between the category TAB scenario and other "open" recommendation scenarios is that it is a recommendation under a constraint (the category), which makes it somewhat similar to search: the category TAB represents the user's category intent. At our current stage of iteration we mainly focus on modeling the pair <userId, itemId>. The other two pairs, <userId, tabId> (the correlation between user behavior and the TAB) and <tabId, itemId> (the correlation between the TAB and the product), are what we need to consider in later, scenario-specific modeling. The progress described below mainly covers iterations of the more general product recommendation model. We use a multi-objective ranking model as the fine-ranking strategy.


2. Model

2.1 Base ESMM

In terms of the multi-objective learning paradigm, we adopt ESMM as the paradigm for our fine-ranking model. We will not introduce ESMM in detail here; see the paper "Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate". According to the paper, the architecture of ESMM is as follows:

The business baseline model replaces the MLP layer in the figure above with the DeepFM structure, adding an FM component to learn feature-cross information. However, the model is still relatively shallow overall and does not extract richer representations of the user. For this reason, we upgraded the model structure.
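For readers unfamiliar with the paradigm, the sketch below illustrates the ESMM training objective as described in the paper: pCTCVR is the product of pCTR and pCVR, and both losses are computed over all impressions. The function names and the scalar logits are illustrative assumptions and are not taken from the production model.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def bce(label: int, p: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one sample, with clipping for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def esmm_loss(ctr_logit: float, cvr_logit: float, click: int, conversion: int) -> float:
    """ESMM loss for one impression (hypothetical helper).

    The CVR tower is supervised only indirectly, through pCTCVR = pCTR * pCVR.
    """
    p_ctr = sigmoid(ctr_logit)
    p_cvr = sigmoid(cvr_logit)
    p_ctcvr = p_ctr * p_cvr
    ctcvr_label = click * conversion  # 1 only if the impression was clicked and converted
    return bce(click, p_ctr) + bce(ctcvr_label, p_ctcvr)
```

In the model described in this article, only the way the two logits are produced changes; the loss structure stays the same.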

2.2 Overall structure of the model

Our current modeling target is <userId, itemId>. In terms of sample representation, the item side is relatively dense and stable: with a large number of samples, most item information can be expressed by the ID embedding. The user side, by contrast, is relatively sparse, and describing a user requires many generalizable features. Introducing and modeling user behavior sequences can greatly increase the separability of samples and thereby improve the classification performance of the model. We therefore introduce a user modeling module to characterize user interests.

In terms of overall structure, we do not change the ESMM learning paradigm. We only improve the structures that generate the CTR logits and the CVR logits. The overall model structure is as follows:

The structure of Deep Interest Transformer is as follows:

Overall, the CTR task and the CVR task each have their own main net and bias net, while the user vector learned by the user interest modeling module is shared by both tasks. In other words, the interest transformer acts as an information extraction module, and its output is concatenated with other representations, such as cross features, to serve as the shared bottom-level information.

2.2.1 User behavior sequence modeling

We believe that a user's past behavior is related to the product he or she clicks next, for example through important shared attributes such as category, brand, or series. To break down and analyze how the correlation between the user behavior sequence and the recommended product affects recommendation efficiency, we use the product's third-level category as the analysis dimension, as follows:

We analyzed the online user behavior logs of the TAB product flow and plotted how the relationship between the third-level category of the target item and the third-level categories of the products in the user behavior sequence affects recommendation efficiency, as shown below:

The horizontal axis is the number of products in the user behavior sequence that share the third-level category of the recommended product; the vertical axis is pvctr. The overall trend is that the more relevant the category of the target item is to the product categories in the user behavior sequence, the higher the pvctr.

Of course, intuitively, the longer the click sequence (that is, the more active the user), the higher the click-through rate. We therefore also analyzed the relationship between sequence length and pvctr, as shown below:

User activity clearly also affects pvctr. By comparison, however, the trend (slope) of pvctr with respect to user activity is small, whereas the trend with respect to the category correlation between the recommended product and the user behavior sequence is much more pronounced.

As for how to model the correlation between the products to be scored and the user behavior sequence, attention is a reliable choice. Self-attention models the correlation among the products within the user behavior sequence, while target-attention models the correlation between the candidate product and the user behavior sequence.

The Deep Interest Transformer in the model is a structure for learning user representations. We select the following user behavior sequences: real-time purchases, clicks on "buy now", collections, and product clicks, together with the same four behaviors (purchases, clicks on "buy now", collections, and product clicks) within the last 7 days. These are fused into a single user behavior sequence with a maximum length of 120 items; anything beyond that length is truncated, and shorter sequences are padded with a default value.
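The truncation and padding step can be sketched as follows. The article does not say how the individual sequences are ordered when they are fused, so placing the real-time behaviors before the 7-day behaviors, the padding id 0, and the function name are all assumptions made for illustration.

```python
MAX_LEN = 120  # maximum sequence length stated in the article
PAD_ID = 0     # assumed default padding id

def build_behavior_sequence(realtime_items, offline_7d_items,
                            max_len=MAX_LEN, pad_id=PAD_ID):
    """Fuse two item-id lists into one fixed-length sequence plus a validity mask.

    Assumption: real-time behaviors come first, followed by 7-day behaviors.
    """
    merged = list(realtime_items) + list(offline_7d_items)
    kept = merged[:max_len]              # truncate anything beyond max_len
    n_pad = max_len - len(kept)
    sequence = kept + [pad_id] * n_pad   # pad short sequences with the default id
    mask = [1] * len(kept) + [0] * n_pad  # 1 = real behavior, 0 = padding
    return sequence, mask
```

The mask is what later allows attention to ignore padded positions.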

The Deep Interest Transformer first applies multi-head self-attention to the user behavior sequence to learn the correlation among the elements in the sequence; this encodes the user sequence. We call the item to be scored the target item. When decoding, the embedding of the target item is used as Q in attention, and the encoded user sequence is used as K and V. This is target-attention: for each item to be scored, it computes the correlation between that item and the elements of the user behavior sequence. Different target items therefore activate different elements of the behavior sequence and produce different user interest vectors, which ultimately helps differentiate the scores of different target items.
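A minimal single-head sketch of the target-attention step is given below. It assumes the behavior sequence has already been encoded (the self-attention encoder is not shown), omits the learned Q/K/V projections and the multi-head split, and uses scaled dot-product scoring; all names are illustrative.

```python
import math

def target_attention(target, encoded_seq, mask):
    """Return (user_interest_vector, attention_weights).

    target:      embedding of the target item, used as Q
    encoded_seq: encoded behavior vectors, used as both K and V
    mask:        1 for real behaviors, 0 for padded positions
    """
    d = len(target)
    scores = []
    for vec, valid in zip(encoded_seq, mask):
        if valid:
            dot = sum(q * k for q, k in zip(target, vec))
            scores.append(dot / math.sqrt(d))
        else:
            scores.append(float("-inf"))  # padded positions get zero weight
    max_score = max(scores)
    if max_score == float("-inf"):        # no valid behavior at all
        return [0.0] * d, [0.0] * len(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    user_vec = [sum(w * vec[i] for w, vec in zip(weights, encoded_seq))
                for i in range(d)]
    return user_vec, weights
```

Calling the function with two different target embeddings against the same sequence yields different weights, which is the behavior the paragraph above describes.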

2.2.2 Bias network

There are various biases in the recommendation system. For tab product flow, we analyze the user's own bias from several dimensions, such as the user's gender, device used, and registration location.

The comparison chart is omitted. It shows that different user groups have different biases. For example, male users have higher click-through rates on the wine and watch categories, while female users have higher click-through rates on the beauty and women's clothing categories; the click-through rate for digital categories is high among Android users but low among iOS users. We also selected four representative regions of the country (Shanghai, Guangdong, Sichuan, and Beijing) for analysis and found that categories such as drinks, fashionable toys, and accessories show an obvious regional bias.

To model this bias explicitly, we add a bias net to each task: a separate network whose output logits are added to those of the main network. At this stage we use only user features as the input of the bias net, so this separate network models the user's own bias. For example, some users tend to browse without clicking while others have a high click rate, which is a difference in activity. Using a separate bias net to model user bias that is unrelated to the recommendation results is more effective than simply adding bias features to the main network.

More generally, the bias net is not limited to user embeddings as input. A recommendation system has many biases, and feeding them to the main network as ordinary features does not work as well as learning them in a dedicated bias net. The bias net can model not only the user's click bias with respect to position but also the user's bias with respect to time features. More generally still, the user vector can be concatenated with the representations of various bias features and fed into the bias net.
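The way a bias net combines with the main net can be sketched as below. Both networks are reduced to a single linear layer purely for brevity (the article's networks are deeper), and the parameter layout and function names are hypothetical.

```python
import math

def linear(features, weights, bias):
    """Stand-in for a network: one linear layer producing a scalar logit."""
    return sum(x * w for x, w in zip(features, weights)) + bias

def task_logit(main_features, user_features, main_params, bias_params):
    """Final logit of one task = main-net logit + bias-net logit.

    The bias net sees only user features, so it can absorb user-level
    tendencies (e.g. a user who browses a lot but rarely clicks).
    """
    main_logit = linear(main_features, *main_params)
    bias_logit = linear(user_features, *bias_params)
    return main_logit + bias_logit

def predict(main_features, user_features, main_params, bias_params):
    logit = task_logit(main_features, user_features, main_params, bias_params)
    return 1.0 / (1.0 + math.exp(-logit))
```

Each task (CTR and CVR) would hold its own `main_params` and `bias_params`; to extend the bias net beyond user features, position or time representations would simply be appended to `user_features`.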

In summary, the model changes above constitute the second version of our multi-objective model for the category TAB product flow. After long-running experiments it achieved good online gains and has been fully rolled out.


3. User long-term behavior sequence modeling

3.1 Long-term interest

In the version described above, the user behavior we use is limited to the product sequence in the user's real-time profile and the product sequence from the user's offline profile within 7 days. We use 120 as the maximum sequence length, but sample analysis shows that the mean effective sequence length is only 61 and the median effective sequence length is only 65, as shown in the following table:

In other words, a large share of sequence positions is filled with invalid default values. Once these positions are masked in attention, the expression of user interest is greatly weakened. Extending the user's effective behavior length therefore enriches the user's behavioral features and allows recommendations for less active users to draw on long-term behavior. In fact, even for users with rich behavior, long-term purchases, collections, and other behaviors are also useful for current recommendations.

In the analysis above, we used the third-level category as a bridge to establish the trend of pvctr with respect to the correlation between the user behavior sequence and the candidate product. To check whether the same trend holds for the long-term user sequence, we analyzed the long-term sequence after excluding behaviors within the last 7 days, as shown below:

The recommended product category is still correlated with the categories the user has interacted with over the long term: the more relevant the categories in the user's long-term sequence are to the candidate product category, the higher the pvctr of the candidate product.

The intuitive approach is therefore to introduce the user's long-term behavior without regard to the time span of the behavior. We take the user's 160 most recent behaviors, fill them into the sequence constructed earlier with deduplication, order the result by time, and keep the most recent 120 behaviors. As the table above shows, after filling with the long sequence the user's effective sequence length has a median of 120 and a mean of 101, which greatly enriches the user's feature representation.
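The filling procedure can be sketched as follows. The sketch assumes each behavior is an (item_id, timestamp) pair and that deduplication is by item id, keeping the entry already present in the existing sequence; neither detail is specified in the article, and the function name is hypothetical.

```python
def extend_with_long_term(existing_seq, long_term_behaviors, max_len=120):
    """Merge long-term behaviors into the existing sequence.

    existing_seq:        list of (item_id, timestamp) already in the sequence
    long_term_behaviors: list of (item_id, timestamp), e.g. the 160 most recent
    Returns at most max_len behaviors, most recent first.
    """
    seen = {item_id for item_id, _ in existing_seq}
    merged = list(existing_seq)
    for item_id, ts in long_term_behaviors:
        if item_id not in seen:          # deduplicate by item id (assumption)
            seen.add(item_id)
            merged.append((item_id, ts))
    merged.sort(key=lambda behavior: behavior[1], reverse=True)  # newest first
    return merged[:max_len]              # keep the most recent max_len behaviors
```

Only users whose merged sequence is still shorter than `max_len` need padding afterwards, which is consistent with the effective length rising after the change.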

In offline evaluation, CTR AUC improved by 0.3% and CVR AUC improved by 0.1%.

3.2 Long-term and short-term interest modeling

In the above versions, our modeling method is to integrate all user behaviors into a large sequence to generate user interest vectors. In fact, users' different behavioral time spans reflect different interests. We hope to model users' behaviors at different time spans in the model to describe users' interests at different granularities.

Moreover, based on the analysis of the clicked products, the categories in the user's short-term behavior are more relevant to the clicked product categories, while the categories in the long-term behavior are less relevant. As shown below:

The category overlap between the product the user clicks and the products in the user's 10 most recent behaviors is the highest, and it decreases markedly as we move back toward the 50th most recent behavior.

To account for the different effects of short-term and long-term behavior sequences on candidate products, we divide user behavior into short-term and long-term behavior and model both in the user interest module. We currently split the two sequences by source: the real-time profile provides the short-term sequence and the offline profile provides the long-term sequence. We experimented with the following two modeling methods.

3.2.1 Separate modeling of long-term and short-term interests

We model short-term and long-term interest separately to obtain the user vectors Sv and Lv, then concatenate [Sv, Lv] to obtain the user interest vector Uv, which is passed to the upper-layer network of each task for learning, as shown in the figure:

That is, Uv = concat([Sv, Lv])

3.2.2 Fusion of long-term and short-term interests through a gate network

The short-term and long-term interest vectors, Sv and Lv, are fused through a gate network, as shown in the figure:

That is, Uv = a * Sv + (1 - a) * Lv

The input of the gate net is the user feature vector together with the long-term and short-term interest vectors learned by attention. It passes through an MLP with a sigmoid activation to produce the gate a, which fuses the short-term and long-term vectors into a new user interest representation Uv. For choosing the input features of the gate network we referred to the structure of MMoE; we also believe that the user's own features are what can best distinguish between a user's long-term and short-term interests.
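The two fusion variants (3.2.1 and 3.2.2) can be compared side by side in a small sketch. The gate MLP is reduced to a single linear layer and the gate a is assumed to be a scalar; the article does not state whether a is a scalar or a vector, and the function names are hypothetical.

```python
import math

def concat_fusion(sv, lv):
    """Variant 3.2.1: Uv = concat([Sv, Lv])."""
    return list(sv) + list(lv)

def gate_fusion(user_features, sv, lv, gate_weights, gate_bias):
    """Variant 3.2.2: Uv = a * Sv + (1 - a) * Lv.

    The gate input is the user feature vector concatenated with Sv and Lv;
    a single linear layer plus sigmoid stands in for the gate MLP.
    """
    gate_input = list(user_features) + list(sv) + list(lv)
    z = sum(x * w for x, w in zip(gate_input, gate_weights)) + gate_bias
    a = 1.0 / (1.0 + math.exp(-z))       # scalar gate in (0, 1), an assumption
    uv = [a * s + (1.0 - a) * l for s, l in zip(sv, lv)]
    return uv, a
```

Note the difference in output width: concatenation doubles the size of Uv, so the upper-layer task networks need more parameters, while the gated version keeps the original dimension and shifts the burden onto learning a.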

In summary, both long-term and short-term interest modeling methods improve offline AUC by about 0.1%, and both were combined with the long-term interest modeling above for online experiments in the category TAB scenario. The two methods have essentially the same final effect. The first method concatenates the long-term and short-term interests and leaves it to the upper-layer tasks to learn from them, which also increases the task-specific parameters of each upper-layer network. The second method depends on how well the gate network learns.


4. Outlook

As mentioned above, the category TAB recommendation scenario is really a product flow recommendation over the triplet <userId, tabId, itemId>. Our current work focuses on modeling the pair <userId, itemId>, mainly on the user side, to improve generalization. However, the information of the TAB itself also matters in this scenario. For example, in different TABs, some users tend to click a lot while others prefer to browse. How to model the differences between TABs will be a follow-up direction.

The correlation between the item and the TAB is another direction to consider. From a category perspective, a TAB is similar to a category query in search and carries category intent. The correlation between an item and a TAB can be strong or weak, similar to the way category relevance in search is divided into strong and weak bins. We believe that items that are strongly related to the TAB and also match the user's interests are more likely to be clicked and converted. Of course, this requires further analysis.

5. References

1. Evolution of the multi-objective ranking model for the category TAB product flow, Algorithm, Dewu Technology, InfoQ Writing Community


Origin blog.csdn.net/Jin_Kwok/article/details/131846881