#Reading Paper# [Sequence Recommendation] Session-based Recommendation with Graph Neural Networks

#Paper title: [Sequence Recommendation] SR-GNN: Session-based Recommendation with Graph Neural Networks
#Paper address: https://arxiv.org/abs/1811.00855
#Source code: https://github.com/CRIPAC-DIG/SR-GNN
#Conference: AAAI 2019
#Affiliation: Chinese Academy of Sciences

1. Introduction

SR-GNN is a recommendation model based on session sequence modeling, proposed by researchers at the Chinese Academy of Sciences. A session here is a user's sequence of interactions over a recent period of time (each session records one user's behavior, so each session is constructed into a graph). Session-based recommendation is a widely used approach, and common models for it include recurrent neural networks and Markov chains. However, these methods have the following two disadvantages:

  • When the number of user actions in a session is very limited, it is difficult for these methods to capture an accurate representation of the user's behavior. For example, when an RNN is used for session-based recommendation and the sequence contains only a few items, the recommendations generated from the last output will not be very accurate.
  • According to previous work, the transition patterns between items are a very important feature in session-based recommendation, but RNNs and Markov processes only model the one-way transition between two adjacent items and ignore the other items in the session. [That is, these methods lack a global view of the session; modeling only single adjacent transitions limits how much information they can express.]

First of all, to interpret the paper you need to understand the input and output of sequential recall (the retrieval stage). Generally speaking, the input is the user's behavior sequence (the list of item ids that the user has interacted with), and what needs to be predicted is the top-k items the user will click at the next moment. In practice, we usually encode the user's behavior sequence into a user representation vector, and then use an approximate nearest neighbor (ANN) method to quickly search the item vectors and retrieve the top-k items most similar to the user vector. The SR-GNN model introduced below covers both of these steps.

1.1 Model structure

The model extracts the user's vector representation from the input behavior sequence as follows:

  • 1. Construct the user's behavior sequence into a Session Graph
  • 2. Use a GNN to extract features from the Session Graph and obtain the vector representation of each Item
  • 3. After the GNN step, fuse the vector representations of all Items to obtain the vector representation of the User

After obtaining the vector representation of the user, we can train and validate the model in the usual sequential recall manner. Let's discuss how these three points unfold.

1.2 Build Session Graph

First we need to build a graph from the user's behavior sequence; one graph is constructed for each user's behavior sequence. The construction is very simple. We treat it as a directed graph: if $v_1$ and $v_2$ are adjacent in the user's behavior sequence and $v_2$ comes right after $v_1$, we add an edge from $v_1$ to $v_2$. Following this rule we can construct the graph. (The paper gives a sample graph as an illustration.)

Each user's behavior is packaged into its own graph because putting all user-item interactions into a single graph would blur each individual user's interests. A further advantage of building the graphs separately is that, during training, each sample row can be turned into a graph directly, which is quite convenient.

After building the graph, we need a variable to store it. Here we use $A_s$ to represent the result: it is an $(n, 2n)$ matrix, where $n$ is the number of nodes in the session graph, made up of an $(n, n)$ Outgoing matrix and an $(n, n)$ Incoming matrix. For the Outgoing matrix, we count the edges going out from each node to every other node; if a node has more than one outgoing edge, we normalize (for example, if node $v_2$ points to two nodes $v_3, v_4$, then the entries from $v_2$ to $v_3$ and to $v_4$ are both 0.5). The Incoming matrix is built in the same way from the incoming edges.
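The graph construction and normalization described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' preprocessing code: the function name `build_session_graph` and the toy session are made up for this example, and the two matrices are returned separately rather than concatenated into $A_s$.

```python
def build_session_graph(session):
    """Build the normalized Outgoing / Incoming matrices for one session.

    session: list of item ids in click order (hypothetical toy input).
    Returns (nodes, a_out, a_in), where nodes lists the distinct items.
    """
    # Distinct items, in order of first appearance, become the graph nodes.
    nodes = list(dict.fromkeys(session))
    index = {item: k for k, item in enumerate(nodes)}
    n = len(nodes)

    # counts[i][j] = how many times item j was clicked right after item i.
    counts = [[0.0] * n for _ in range(n)]
    for prev, nxt in zip(session, session[1:]):
        counts[index[prev]][index[nxt]] += 1.0

    def row_normalize(matrix):
        # Divide each row by its sum; rows without edges stay all zero.
        result = []
        for row in matrix:
            total = sum(row)
            result.append([x / total if total else 0.0 for x in row])
        return result

    a_out = row_normalize(counts)
    # Incoming edges are the outgoing edges of the transposed graph.
    a_in = row_normalize([list(col) for col in zip(*counts)])
    return nodes, a_out, a_in


nodes, a_out, a_in = build_session_graph(["v1", "v2", "v3", "v2", "v4"])
print(a_out[1])  # row of v2: [0.0, 0.0, 0.5, 0.5]
print(a_in[1])   # row of v2: [0.5, 0.0, 0.5, 0.0]
```

In this toy session v2 is followed once by v3 and once by v4, so its outgoing row holds 0.5 for each, matching the normalization example in the text.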

1.3 Learning the vector representation of Item through GNN

In this part, we mainly focus on how to learn the vector representation of each Item from the graph. Let $v_{i}^{t}$ denote the vector representation of item $i$ after the $t$-th GNN iteration, and let $A_{s,i} \in R^{1 \times 2n}$ be the $i$-th row of the matrix $A_{s}$, which holds the neighbor information related to item $i$. Formula (1) aggregates this neighbor information by multiplying $A_{s,i}$ with the item sequence $[v_{1}^{t-1},...,v_{n}^{t-1}]^{T} \in R^{n \times d}$. Note, however, that the formula is not written rigorously: an $R^{1 \times 2n}$ matrix and an $R^{n \times d}$ matrix cannot be multiplied directly. In the code implementation, the matrix $A$ is split into two matrices, in and out, and each of them is multiplied with the item embeddings of the user's behavior sequence.

$$a_{s,i}^{t}=A_{s,i}[v_{1}^{t-1},...,v_{n}^{t-1}]^{T}\textbf{H}+b \tag{1}$$

'''
A : [batch, n, 2n] adjacency matrix of the session graph
hidden : [batch, n, d] embeddings of the items in the user sequence
in matrix : A[:, :, :A.shape[1]]
out matrix : A[:, :, A.shape[1]:2 * A.shape[1]]
inputs : the a in formula (1)
'''
# aggregate over incoming edges, then over outgoing edges
input_in = paddle.matmul(A[:, :, :A.shape[1]], self.linear_edge_in(hidden)) + self.b_iah
input_out = paddle.matmul(A[:, :, A.shape[1]:], self.linear_edge_out(hidden)) + self.b_ioh
# [batch_size, max_session_len, embedding_size * 2]
inputs = paddle.concat([input_in, input_out], 2)
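To make the two matrix products concrete, here is a minimal pure-Python sketch for a single session. It is a simplification under stated assumptions: the learned linear layers and biases from the snippet above (`linear_edge_in`, `linear_edge_out`, `b_iah`, `b_ioh`) are left out, the helper names `matmul` and `aggregate_neighbors` are made up for this example, and the matrices are toy values for a four-node graph.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def aggregate_neighbors(a_in, a_out, hidden):
    """Aggregate neighbor embeddings along incoming and outgoing edges.

    a_in, a_out: [n][n] normalized adjacency matrices of one session graph.
    hidden:      [n][d] item embeddings from the previous GNN step.
    Returns an [n][2d] matrix: incoming and outgoing messages side by side.
    """
    msg_in = matmul(a_in, hidden)
    msg_out = matmul(a_out, hidden)
    return [row_in + row_out for row_in, row_out in zip(msg_in, msg_out)]


# Toy graph with edges v1->v2, v2->v3, v3->v2, v2->v4 (illustrative values).
a_out = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
a_in = [[0.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
hidden = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]  # n = 4 items, d = 2

inputs = aggregate_neighbors(a_in, a_out, hidden)
print(inputs[1])  # v2: [1.5, 1.0, 3.0, 1.0]
```

Each row of the result has length 2d, which is why the gated update that follows expects an input of size `embedding_size * 2`.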

After obtaining $a_{s,i}^{t}$ from formula (1), two intermediate variables $z_{s,i}^{t}, r_{s,i}^{t}$ are calculated according to formulas (2) and (3). They can be loosely compared to the gates of an LSTM or, more closely, a GRU: as formulas (4) and (5) show, $z_{s,i}^{t}$ acts as the update gate and $r_{s,i}^{t}$ as the reset gate.

$$z_{s,i}^{t}=\sigma(W_{z}a_{s,i}^{t}+U_{z}v_{i}^{t-1}) \in R^{d} \tag{2}$$

$$r_{s,i}^{t}=\sigma(W_{r}a_{s,i}^{t}+U_{r}v_{i}^{t-1}) \in R^{d} \tag{3}$$

Note that $z_{s,i}^{t}$ and $r_{s,i}^{t}$ are calculated with exactly the same logic; the only difference is that different weight parameters are used.
After obtaining the intermediate variables from formulas (2) and (3), we compute the candidate state for this update through formula (4), and then get the final result through formula (5).

$$\tilde{v}_{i}^{t}=\tanh(W_{o}a_{s,i}^{t}+U_{o}(r_{s,i}^{t} \odot v_{i}^{t-1})) \in R^{d} \tag{4}$$

$$v_{i}^{t}=(1-z_{s,i}^{t}) \odot v_{i}^{t-1} + z_{s,i}^{t} \odot \tilde{v}_{i}^{t} \in R^{d} \tag{5}$$

Here we can see that formula (4) computes the candidate update at the $t$-th GNN layer, that is, $\tilde{v}_{i}^{t}$, and the gate $z_{s,i}^{t}$ in formula (5) controls the proportions of $v_{i}^{t-1}$ and $\tilde{v}_{i}^{t}$ in the $t$-th update. This completes the representation learning of the items in the GNN part.

When writing the code, look carefully at formulas (2)(3)(4): the two variables $a_{s,i}^{t}$ and $v_{i}^{t-1}$ are each multiplied by three matrices, and the calculation logic is the same every time. So we treat $Wa$ and $Uv$ as computation units. Each of formulas (2)(3)(4) involves one such operation, so we can perform the three operations together as one matrix multiplication, split the result into 3 parts, and then recover the three formulas. The relevant code is as follows:

'''
inputs : the a in formula (1)
hidden : item embeddings of the user sequence, i.e. v^{t-1}
gi is Wa and gh is Uv; note that both gi and gh contain the three parts used in formulas (2)-(4)
'''
# assumes `import paddle` and `import paddle.nn.functional as F`

# gi and gh have the same shape: [batch_size, max_session_len, embedding_size * 3]

gi = paddle.matmul(inputs, self.w_ih) + self.b_ih
gh = paddle.matmul(hidden, self.w_hh) + self.b_hh
# each chunk: [batch_size, max_session_len, embedding_size]
i_r, i_i, i_n = gi.chunk(3, 2)   # the three W*a terms
h_r, h_i, h_n = gh.chunk(3, 2)   # the three U*v terms
reset_gate = F.sigmoid(i_r + h_r)  # formula (3), r
input_gate = F.sigmoid(i_i + h_i)  # formula (2), z
new_gate = paddle.tanh(i_n + reset_gate * h_n)  # formula (4)
hy = (1 - input_gate) * hidden + input_gate * new_gate  # formula (5)
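The "multiply once, then split into three" trick can be checked on a single node with plain Python lists. The sketch below is illustrative only: the helper names (`matvec`, `chunk3`, `gated_update`) are made up, biases are omitted, and the weights are stored with one row per output dimension, which is the transpose of the layout used by `paddle.matmul(inputs, self.w_ih)` above. Like the snippet above, it applies the reset gate to $U_{o}v$ after the multiplication, which differs slightly from the ordering written in formula (4).

```python
import math


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def chunk3(v):
    """Split a vector of length 3d into three vectors of length d."""
    d = len(v) // 3
    return v[:d], v[d:2 * d], v[2 * d:]


def gated_update(a, v_prev, w_ih, w_hh):
    """One gated update for a single node (biases omitted).

    a:      aggregated neighbor vector, length 2d (the a in formula (1)).
    v_prev: previous node state, length d.
    w_ih:   [3d][2d] stacked W matrices, rows ordered reset, update, candidate.
    w_hh:   [3d][d] stacked U matrices in the same order.
    """
    i_r, i_z, i_n = chunk3(matvec(w_ih, a))       # three W*a terms at once
    h_r, h_z, h_n = chunk3(matvec(w_hh, v_prev))  # three U*v terms at once
    r = [sigmoid(x + y) for x, y in zip(i_r, h_r)]                 # formula (3)
    z = [sigmoid(x + y) for x, y in zip(i_z, h_z)]                 # formula (2)
    cand = [math.tanh(x + rj * y) for x, rj, y in zip(i_n, r, h_n)]  # formula (4)
    return [(1 - zj) * vj + zj * cj for zj, vj, cj in zip(z, v_prev, cand)]  # formula (5)


# One stacked product gives the same three vectors as three separate products.
w = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 3.0]]
print(chunk3(matvec(w, [1.0, 2.0])))  # ([1.0, 2.0], [2.0, 4.0], [3.0, 6.0])
```

With all-zero weights both gates equal 0.5 and the candidate is 0, so the update simply halves the previous state; this is a quick sanity check for the interpolation in formula (5).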

1.4 Generating User Vector Representation (Generating Session Embedding)

After obtaining the embedded representations of the Items through the GNN, more than half of the work is done. What remains is to fuse the embeddings of the Items in the user sequence into a single embedding for the whole sequence.

Here SR-GNN first uses an attention mechanism to compute, for each Item in the sequence, an attention score with respect to the last Item $v_{n}$ (also written $s_1$), and then takes the weighted sum. The calculation is as follows:

$$
\begin{aligned}
a_{i} &= \textbf{q}^{T} \sigma(W_{1}v_{n}+W_{2}v_{i}+c) \in R^{1} \\
s_{g} &= \sum_{i=1}^{n}a_{i}v_{i} \in R^{d}
\end{aligned} \tag{6}
$$

After obtaining $s_g$, we combine it with the information of the last Item in the sequence, $s_1$, to get the final embedded representation of the sequence:

$$s_h = W_{3}[ s_1 ;s_g] \in R^{d} \tag{7}$$

'''
seq_hidden : embedding of every item in the sequence
ht : embedding of the last item in the sequence, i.e. v_n (s_1) in formulas (6)-(7)
q1 : W_1 v_n in formula (6)
q2 : W_2 v_i in formula (6)
alpha : the attention score a_i in formula (6)
a : s_g in formula (6)
'''
seq_hidden = paddle.take_along_axis(hidden,alias_inputs,1)
# fetch the last hidden state of last timestamp
item_seq_len = paddle.sum(mask,axis=1)
ht = self.gather_indexes(seq_hidden, item_seq_len - 1)
q1 = self.linear_one(ht).reshape([ht.shape[0], 1, ht.shape[1]])
q2 = self.linear_two(seq_hidden)

alpha = self.linear_three(F.sigmoid(q1 + q2))
# weighted sum over the non-padded positions only
a = paddle.sum(alpha * seq_hidden * mask.reshape([mask.shape[0], -1, 1]), 1)
# formula (7): combine s_g with the last item's embedding
user_emb = self.linear_transform(paddle.concat([a, ht], axis=1))
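For readers who want to trace formulas (6) and (7) by hand, here is a minimal single-session sketch in plain Python. It is not the Paddle implementation: there is no batching or padding mask, the function name `session_embedding` and all parameter values are made up for illustration, and weights are stored with one row per output dimension.

```python
import math


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def session_embedding(items, w1, w2, c, q, w3):
    """Fuse item vectors into one session vector, following formulas (6)-(7).

    items:  [n][d] item vectors in click order; the last one is v_n (s_1).
    w1, w2: [d][d] matrices; c, q: vectors of length d; w3: [d][2d] matrix.
    """
    v_n = items[-1]
    proj_last = matvec(w1, v_n)
    scores = []
    for v_i in items:
        proj_i = matvec(w2, v_i)
        act = [sigmoid(a + b + cj) for a, b, cj in zip(proj_last, proj_i, c)]
        scores.append(sum(qj * hj for qj, hj in zip(q, act)))  # a_i in formula (6)
    d = len(v_n)
    # s_g: attention-weighted sum of all item vectors.
    s_g = [sum(scores[i] * items[i][k] for i in range(len(items))) for k in range(d)]
    # s_h = W_3 [s_1 ; s_g], formula (7).
    return matvec(w3, v_n + s_g)


# Toy parameters chosen so that every attention score equals exactly 1.
zeros = [[0.0, 0.0], [0.0, 0.0]]
w3 = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
s_h = session_embedding([[1.0, 2.0], [3.0, 4.0]], zeros, zeros, [0.0, 0.0], [1.0, 1.0], w3)
print(s_h)  # [3.0, 6.0]
```

Note that formula (6) applies no softmax, so the scores are not forced to sum to 1; the sketch keeps that behavior.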

So far we have completed SR-GNN's generation of the user vector; the rest can be carried out with the traditional sequential recall method.
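As a rough illustration of that last retrieval step, the sketch below scores every candidate item by its inner product with the user vector and keeps the k best. This is a brute-force stand-in for the ANN search a production system would use; the function name `top_k_items` and the toy embeddings are made up for this example.

```python
def top_k_items(user_vec, item_embs, k):
    """Return the k item ids whose embeddings score highest against user_vec.

    user_vec:  user (session) vector, length d.
    item_embs: dict mapping item id -> embedding of length d.
    """
    scores = {
        item: sum(u * e for u, e in zip(user_vec, emb))
        for item, emb in item_embs.items()
    }
    # Sort item ids by descending inner-product score and keep the first k.
    return sorted(scores, key=scores.get, reverse=True)[:k]


item_embs = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.5, 0.5]}
print(top_k_items([1.0, 0.0], item_embs, 2))  # ['a', 'c']
```

Applying a softmax over the scores, as is common during training, would not change this ranking, so it is omitted here.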

Reference: https://aistudio.baidu.com/aistudio/projectdetail/5313491


Origin blog.csdn.net/CRW__DREAM/article/details/128609712