User Diverse Preference Modeling by Multimodal Attentive Metric Learning

BACKGROUND

Existing models usually represent a user's preferences with a single fixed vector p_u. This is questionable: each dimension of the vector is supposed to capture one characteristic or aspect of the user, yet a user's preferences differ from item to item. For example, a user may like one movie for its actors and another for its special effects. This paper therefore combines attention with metric learning so that the user's feature vector changes with the item under consideration.

METHOD

p_u and q_i are the latent feature vectors of user u and item i, respectively; both are randomly initialized.

Note: the core of a recommender system is learning good feature vectors for users and items.
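As a minimal sketch (all sizes and names here are illustrative assumptions, not from the paper), the latent vectors are just rows of randomly initialized embedding tables:

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, dim = 1000, 5000, 100   # illustrative sizes

# Randomly initialized latent feature tables; row u is p_u, row i is q_i.
P = rng.normal(scale=0.1, size=(num_users, dim))   # user embeddings p_u
Q = rng.normal(scale=0.1, size=(num_items, dim))   # item embeddings q_i

p_u = P[42]   # latent vector for user 42
q_i = Q[7]    # latent vector for item 7
print(p_u.shape, q_i.shape)   # (100,) (100,)
```

During training these tables are updated by gradient descent like any other parameters.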

As the overall framework diagram shows, each training sample is a triplet (u, i, k), where i is a positive item (one the user has interacted with) and k is a negative item (one the user has not interacted with). Positive and negative items are processed identically. The core is the attention module: the attention weights it computes modulate p_u and q_i. Because the module's inputs are the user ID, the item ID, and the item's features, the computed attention differs for every (user, item) pair, so the user's feature representation also differs from item to item.
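A minimal sketch of this idea (the interaction form inside `attention` and all names are assumptions for illustration, not the paper's exact module): one attention pass per (user, item) pair yields an item-dependent user representation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 100

def attention(p_u, q_i, z_i):
    """Hypothetical attention module: produce a per-dimension weight
    vector that depends on both the user and the item, so the effective
    user representation varies from item to item."""
    h = np.tanh(p_u + q_i + z_i)          # simple interaction (illustrative)
    w = np.exp(h) / np.exp(h).sum()       # softmax over dimensions
    return dim * w                        # rescale so weights average to 1

p_u = rng.normal(size=dim)                             # user embedding
q_i, z_i = rng.normal(size=dim), rng.normal(size=dim)  # positive item i
q_k, z_k = rng.normal(size=dim), rng.normal(size=dim)  # negative item k

a_ui = attention(p_u, q_i, z_i)
a_uk = attention(p_u, q_k, z_k)
# The same user gets a different weighted representation per item:
p_ui = a_ui * p_u
p_uk = a_uk * p_u
```

The same weights are applied to both elements of the pair before distances are computed, so the metric itself becomes pair-specific.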

In addition, this paper replaces the dot product used in MF with Euclidean distance (the dot product does not satisfy the triangle inequality, which limits performance).
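A small numeric illustration (not from the paper) of the problem: with the dot product, item a can score highly with both b and c while b and c score zero with each other, so "similarity" does not propagate. Euclidean distance, being a metric, obeys the triangle inequality.

```python
import numpy as np

a = np.array([1.0, 1.0])
b = np.array([2.0, 0.0])
c = np.array([0.0, 3.0])

# Dot product: a is "similar" to both b and c, yet b and c are orthogonal.
print(a @ b, a @ c, b @ c)   # 2.0 3.0 0.0

# Euclidean distance satisfies d(b, c) <= d(b, a) + d(a, c).
d = lambda x, y: np.linalg.norm(x - y)
print(d(b, c) <= d(b, a) + d(a, c))   # True
```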

Item Features

For each item i, text features F_{t,i} and visual features F_{v,i} are extracted from its reviews and images, respectively. This paper fuses the two through a multi-layer neural network, as follows:

The final output z_i serves as the feature representation of item i: F_{tv,i} = z_i.
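A sketch of such a fusion network (layer sizes, activation, and initialization are assumptions; the paper's exact architecture may differ): concatenate the two modalities and pass them through a small MLP.

```python
import numpy as np

rng = np.random.default_rng(0)
d_t, d_v, dim = 300, 512, 100   # illustrative feature sizes

F_t = rng.normal(size=d_t)      # text features F_{t,i} (from reviews)
F_v = rng.normal(size=d_v)      # visual features F_{v,i} (from images)

# Two-layer fusion network over the concatenated modalities.
W1 = rng.normal(scale=0.05, size=(256, d_t + d_v))
W2 = rng.normal(scale=0.05, size=(dim, 256))

h = np.maximum(0.0, W1 @ np.concatenate([F_t, F_v]))   # ReLU hidden layer
z_i = W2 @ h                    # fused feature: F_{tv,i} = z_i
print(z_i.shape)                # (100,)
```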

Attention Mechanism

[;] denotes concatenation. The highlight of the attention section is the conversion from attention scores to attention weights.

The attention vector \alpha has the same dimensionality as the feature vector f.

Why not use softmax directly?

As shown above, the output of this step acts on the feature vectors of the user and item. In other words, each dimension of the initial id_embedding is multiplied by a weight, so the input and output dimensionality of the attention module equals that of the id_embedding, i.e. the dimension of f. If f = 100, a plain softmax yields an average weight of only 0.01; such small, weakly differentiated weights hurt the model during training.
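The argument can be checked numerically. Below, plain softmax over d = 100 dimensions gives weights averaging 1/d = 0.01; rescaling by d (one plausible remedy, an assumption rather than necessarily the paper's exact formula) restores an average weight of 1 and preserves the scale of the embedding being modulated.

```python
import numpy as np

d = 100
scores = np.random.default_rng(0).normal(size=d)   # raw attention scores
w = np.exp(scores) / np.exp(scores).sum()          # plain softmax over d dims

# Softmax weights sum to 1, so their mean is 1/d = 0.01 for d = 100:
print(round(w.mean(), 4))   # 0.01

# Rescale by d so the weights average to 1 instead of 1/d.
alpha = d * w
print(round(alpha.mean(), 4))   # 1.0
```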

Optimization 

As can be seen above, the loss function consists of three parts. Let us examine each part and its role in turn.

Metric Learning

m is the margin and [x]_+ = \max\{0, x\}. This is the main part of the loss function: the paper replaces the dot product of MF with Euclidean distance.
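A sketch of this margin-based triplet term (the rank-based weighting discussed below is omitted, and the squared-distance form is an assumption): the loss is zero once the positive item is closer to the user than the negative item by at least the margin m.

```python
import numpy as np

def hinge_triplet_loss(p_u, q_i, q_k, m=1.0):
    """Metric-learning hinge loss [m + d(u,i)^2 - d(u,k)^2]_+ :
    push the positive item i closer to user u than negative k by margin m."""
    d_pos = np.sum((p_u - q_i) ** 2)   # squared distance to positive item
    d_neg = np.sum((p_u - q_k) ** 2)   # squared distance to negative item
    return max(0.0, m + d_pos - d_neg)

p_u = np.zeros(3)
q_i = np.array([0.1, 0.0, 0.0])   # positive item near the user
q_k = np.array([3.0, 0.0, 0.0])   # negative item far away
print(hinge_triplet_loss(p_u, q_i, q_k))   # 0.0 -- constraint satisfied
```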

rank_d(u, i): in top-k recommendation, the position of item i in the list recommended to user u under metric d.

0 \leq rank_d(u,i) \leq k

The rank_d weighting follows: Cheng-Kang Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge Belongie, and Deborah Estrin. 2017. Collaborative Metric Learning. In WWW. IW3C2, 193–201.
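For context, a sketch of how CML turns the rank into a loss weight (the WARP-style rank approximation from sampled negatives is as I recall it from CML; the concrete numbers here are illustrative):

```python
import math

def estimate_rank(num_sampled, num_violators, num_items):
    """Approximate rank_d(u, i) from a small sample of negatives:
    rank ~= floor(num_violators * num_items / num_sampled)."""
    return (num_violators * num_items) // num_sampled

def rank_weight(rank):
    # Weight w = log(rank + 1): positives buried deep in the ranking
    # contribute more to the loss, approximating a ranking objective.
    return math.log(rank + 1)

# 2 of 10 sampled negatives violate the margin, out of 5000 items:
rank = estimate_rank(num_sampled=10, num_violators=2, num_items=5000)
print(rank, round(rank_weight(rank), 2))   # 1000 6.91
```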

Regularization

1. Items with more similar text and visual features are pulled closer in the latent space.

2. Linear correlations between dimensions of the feature space are eliminated through a covariance penalty.
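A sketch of both regularizers on a batch of items (the squared-distance form of the feature term and the exact normalization of the covariance term are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 64, 100
Q = rng.normal(size=(n, dim))   # item embeddings q_i (a batch)
Z = rng.normal(size=(n, dim))   # fused multimodal features z_i

# (1) Feature regularizer: pull each item embedding toward its fused
# multimodal feature, so items with similar text/visual content end up
# close in the latent space.
loss_feat = np.mean(np.sum((Q - Z) ** 2, axis=1))

# (2) Covariance regularizer: penalize off-diagonal covariance between
# embedding dimensions so the dimensions are linearly decorrelated.
C = np.cov(Q, rowvar=False)                               # dim x dim
loss_cov = (np.sum(C ** 2) - np.sum(np.diag(C) ** 2)) / dim

print(round(loss_feat, 2), round(loss_cov, 4))
```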


Origin: blog.csdn.net/qq_42018521/article/details/130324905