Summary of matching models for recall in search and recommendation (Part 3): Deep learning methods based on match function learning

Part0 Deep learning methods based on match function learning

Compared with methods based on representation learning, the biggest characteristic of match function learning is that it does not directly learn embeddings for the user and the item. Instead, it takes various existing inputs and, through a neural network framework, directly fits the match score between user and item.

Figure 4.1 Framework of deep matching models based on match function learning

Briefly, the first class of methods, based on representation learning, is not an end-to-end process: it learns user and item embeddings as intermediate products, from which the match score between the two can easily be computed. The second class, match function learning, is an end-to-end method that directly fits the final match score. This chapter focuses on the deep learning methods based on match function learning.


Part1 Methods based on collaborative filtering

The traditional matching models and representation-learning models discussed earlier are all inseparable from the collaborative filtering model at their base; they can also be called matrix-factorization-based models. Models based on match function learning are no exception.

Methods based on the NCF framework

Neural-network-based collaborative filtering (NCF) was proposed by Dr. He Xiangnan in 2017. Compared with traditional CF, after obtaining the user vector and item vector, an MLP network is attached and outputs the final fitted result, giving an end-to-end model. The advantage of this framework is its flexibility: side-info features can be added to either the user side or the item side of the two-tower design, and the MLP network itself can be designed flexibly, as shown in Figure 4.2.

Figure 4.2 Neural-network-based collaborative filtering framework

Compared with the CF methods described in the previous chapter, the NCF framework mainly introduces an MLP to fit the nonlinear relationship between user and item, instead of computing their relationship directly via inner product or cosine, which improves the network's fitting capacity. However, the MLP's ability to directly learn and capture the kind of user and item interactions extracted by MF is actually not strong. A WSDM 2018 paper questioned the MLP's ability to fit feature combinations.

Figure 4.3 Experiment on a DNN's ability to fit the data

The paper ran a set of experiments using a 1-hidden-layer MLP to fit the data. The experiments show that even for two-dimensional, first-order data, about 100 hidden nodes are needed for a good fit; once the order exceeds two, the performance of the whole MLP becomes very poor. The article therefore argues that a DNN's ability to capture high-order information is not strong; it mainly captures low-order information.
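To get a feel for this kind of experiment, here is a minimal sketch (my reconstruction, not the paper's code or exact setup) that fits a simple multiplicative second-order target with a 1-hidden-layer MLP of varying width; all settings are illustrative assumptions:

```python
# Sketch: how many hidden units does a 1-hidden-layer MLP need to fit
# a simple multiplicative cross feature y = x1 * x2? (Illustrative only;
# the paper's data and settings differ.)
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(10000, 2) * 2 - 1          # 2-D inputs in [-1, 1]
y = (x[:, 0] * x[:, 1]).unsqueeze(1)      # multiplicative (cross) target

for hidden in [2, 8, 32, 100]:
    mlp = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    opt = torch.optim.Adam(mlp.parameters(), lr=1e-2)
    for _ in range(2000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(mlp(x), y)
        loss.backward()
        opt.step()
    print(f"hidden={hidden:4d}  mse={loss.item():.5f}")
```

Narrow networks leave a clearly higher residual error on even this simple cross term, which is the intuition behind the paper's critique.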

The models discussed below all make various changes either to the model structure or in the feature dimension.


NeuMF model (Neural Matrix Factorization)

Neural MF, as the name implies, leverages both MF and the MLP neural network's capacity to fit the matching score: MF uses the vector inner product to learn the correlation between user and item, while the MLP part captures other, higher-order information about the two. This model actually comes from the same paper as the NCF framework. The model can be viewed in two parts, GMF and MLP, as shown in Figure 4.4.

Figure 4.4 NeuMF model framework

(1) GMF (Generalized Matrix Factorization) part

The user and item inputs are both one-hot-encoded sparse vectors, which are mapped by an embedding layer into a user vector and an item vector. Having obtained the latent user and item vectors, their interaction is usually computed by a vector dot product or a Hadamard product (element-wise product); in NeuMF, however, this is followed by a fully connected layer, namely the GMF layer.

(2) MLP part

Like the GMF part, the inputs are sparse one-hot encodings, which are mapped by an embedding layer into a user vector and an item vector. Note that the user and item vectors here are not the same as those of the GMF part, because GMF and MLP, being different network structures, have different requirements on the hidden dimension, and the MLP part's is usually higher. (Personally, I feel that sharing the embedding would let it be trained more fully.)

After the embedding layer come several conventional MLP layers, which need no further explanation; the output of the last layer serves as the MLP's output.
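Putting the two parts together, here is a minimal NeuMF sketch following the structure just described; the dimensions and layer sizes are illustrative assumptions, not the paper's exact settings. Note that GMF and MLP use separate embedding tables, as discussed above:

```python
# Minimal NeuMF sketch (Figure 4.4): GMF branch (Hadamard product) plus
# MLP branch (concatenated embeddings), fused by a final prediction layer.
import torch
import torch.nn as nn

class NeuMF(nn.Module):
    def __init__(self, n_users, n_items, gmf_dim=16, mlp_dim=32):
        super().__init__()
        self.u_gmf = nn.Embedding(n_users, gmf_dim)   # GMF-side embeddings
        self.i_gmf = nn.Embedding(n_items, gmf_dim)
        self.u_mlp = nn.Embedding(n_users, mlp_dim)   # separate MLP-side embeddings
        self.i_mlp = nn.Embedding(n_items, mlp_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * mlp_dim, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
        )
        # final layer over [GMF vector ; MLP output] plays the role of the GMF layer
        self.out = nn.Linear(gmf_dim + 16, 1)

    def forward(self, u, i):
        gmf = self.u_gmf(u) * self.i_gmf(i)           # element-wise product
        mlp = self.mlp(torch.cat([self.u_mlp(u), self.i_mlp(i)], dim=-1))
        return torch.sigmoid(self.out(torch.cat([gmf, mlp], dim=-1)))

model = NeuMF(n_users=1000, n_items=5000)
score = model(torch.tensor([3]), torch.tensor([42]))  # match score in (0, 1)
```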


NNCF model (Neighbor-based NCF)

NNCF is a neighbor-based NCF method proposed at CIKM 2017. Its biggest difference from NCF is that, in addition to the user and item information, the input also introduces the neighbor information of the user and of the item respectively.

Figure 4.5 NNCF model framework

The input shown in Figure 4.5 consists of two parts. The middle part is the original one-hot inputs xu and yi of the user and item, which are mapped by the embedding layer into vectors pu and qi; their Hadamard product then serves as the MLP input. The nu and ni on the two sides of the input layer are the neighbor information of the user and item respectively, where this neighbor information can be mined by various means, for example user-CF and item-CF as in the figure.

For the neighbor information, since the number of neighbors varies per user and per item, the input is variable-length; a fixed-length representation is extracted from the neighbor embeddings via convolution and pooling, then concatenated with the user and item vectors themselves and fed into the model.
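A minimal sketch of that variable-length neighbor encoding might look like the following; the module name, padding scheme, and all dimensions are my assumptions, not the paper's specification:

```python
# Sketch: embed a 0-padded neighbor-id list, run a 1-D convolution over it,
# then max-pool to a fixed-length vector, as described above.
import torch
import torch.nn as nn

class NeighborEncoder(nn.Module):
    def __init__(self, n_entities, emb_dim=32, n_filters=16, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(n_entities, emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel, padding=1)

    def forward(self, neighbor_ids):                  # (batch, max_neighbors), 0-padded
        e = self.emb(neighbor_ids).transpose(1, 2)    # (batch, emb_dim, L)
        h = torch.relu(self.conv(e))                  # (batch, n_filters, L)
        return h.max(dim=2).values                    # fixed-length (batch, n_filters)

enc = NeighborEncoder(n_entities=10000)
nu = enc(torch.tensor([[5, 17, 0, 0], [3, 8, 21, 9]]))  # two users, padded neighbor lists
```

The resulting fixed-length vector is what gets concatenated with pu (or qi) before the MLP.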


ONCF model (Outer-Product-based NCF)

ONCF, the outer-product-based NCF proposed by Dr. He Xiangnan in 2018, builds on the original NCF framework by introducing the concept of the outer product, as shown in Figure 4.6.

Figure 4.6 ONCF model framework

After the embedding layer, the ONCF model introduces an interaction map in the feature-crossing layer: for user u's vector pu and item i's vector qi, it takes the outer product of the two, E = pu qi^T.

E is a k×k matrix whose every element is a pairwise product, giving a two-dimensional matrix. This two-dimensional matrix can then be flattened into a k²-dimensional vector to serve as the MLP's input. Suppose k = 64; then E flattens into a 4096-dimensional vector. If each hidden layer has half as many units as the one before, the first layer alone has roughly 4096 × 2048 ≈ 8.4 million network parameters to train, a huge amount.
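A quick check of that arithmetic (the snippet itself is just an illustration):

```python
# With k = 64, the flattened outer product has k^2 = 4096 dims, and a first
# dense layer that halves the width to 2048 already needs 4096 * 2048 weights.
import torch

k = 64
p_u, q_i = torch.randn(k), torch.randn(k)
E = torch.outer(p_u, q_i)                  # (64, 64) interaction map
flat = E.reshape(-1)                       # 4096-dim MLP input
first_layer_params = flat.numel() * (flat.numel() // 2)
print(flat.numel(), first_layer_params)    # 4096, 8388608 (~8.4M)
```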

Therefore, the paper proposes exploiting the local connectivity and parameter sharing of a CNN to reduce the number of parameters between the embedding layer and the hidden layers, as shown in Figure 4.7.

Figure 4.7 ConvNCF model framework

Suppose the embedding dimension is k = 64 and there are 6 convolutional hidden layers, each with 32 convolution kernels (feature maps) and stride = 2; then after each convolution the feature map becomes 1/4 of its original size (length and width each halved). Take the first convolutional layer as an example.

The output of the first convolutional layer is then a 32×32×32 3-D tensor, where the final 32 is the number of feature maps. How does this embody the idea of feature crossing? Each element e_{i,j,c} of a feature map represents a second-order crossing of the i-th and j-th units of the previous layer. Each unit in the first layer's feature maps extracts locally connected information from the layer below, the third layer in turn extracts information originating from the first layer, and by the last layer of the network the feature maps can capture globally connected information from the original interaction map, thereby achieving high-order feature extraction.

In summary, with the original outer-product idea there are nearly 10 million parameters to learn; using a CNN, on the one hand, reduces the parameter count at the first network layer, and on the other hand extracts low-order and high-order feature combinations at the same time. Personally, I feel that while introducing a CNN can certainly save memory, it will also increase training and inference time; it is a trade of time for space. Moreover, whether the CNN can fit feature combinations more efficiently than the original MLP depends on the data distribution.
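Here is a minimal ConvNCF-style sketch of the structure described above: the 64×64 outer-product map is treated as a 1-channel "image" and passed through 6 stride-2 convolutional layers with 32 feature maps each, shrinking 64 → 32 → … → 1. The 2×2 kernel size and class name are my assumptions:

```python
# ConvNCF-style tower: batched outer product, then a stack of stride-2 convs
# whose final 32-dim feature vector is mapped to a match score.
import torch
import torch.nn as nn

class ConvNCFTower(nn.Module):
    def __init__(self, channels=32, n_layers=6):
        super().__init__()
        layers, in_ch = [], 1
        for _ in range(n_layers):                  # each layer halves H and W
            layers += [nn.Conv2d(in_ch, channels, kernel_size=2, stride=2), nn.ReLU()]
            in_ch = channels
        self.convs = nn.Sequential(*layers)
        self.out = nn.Linear(channels, 1)

    def forward(self, p_u, q_i):                   # (batch, 64) each
        E = p_u.unsqueeze(2) * q_i.unsqueeze(1)    # batched outer product (batch, 64, 64)
        h = self.convs(E.unsqueeze(1))             # (batch, 32, 1, 1)
        return self.out(h.flatten(1))              # final match score

score = ConvNCFTower()(torch.randn(8, 64), torch.randn(8, 64))
```

Counting its weights confirms the savings: each 2×2 conv layer has at most 32×32×2×2 ≈ 4K parameters, versus the ~8.4M of the first dense layer in the flattened-MLP version.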


Summary of CF-based methods

Methods based on the NCF framework are fundamentally grounded in collaborative filtering, and collaborative filtering is essentially matrix factorization over users and items; therefore, NCF-framework-based methods are also MF-based at heart. Matrix factorization essentially maps users and items into a vector space as vectors and, through various means, makes matched user and item vectors as close as possible (measuring the closeness of vectors directly by dot product or cosine).

Another line of thought, the translation-based approach (also known as translation-based models), holds that the user and item vectors mapped into the new space may have a gap between them, and this gap is expressed by a relation vector: the goal is to make the user vector plus the relation vector as close as possible to the item vector. The difference between the two approaches is vividly illustrated in Figure 4.8.

Figure 4.8 Difference between matrix-factorization-based and translation-based models


Part2 Methods based on the translation framework

TransRec model

TransRec is a translation-based recommendation method proposed at RecSys 2017 to solve the next-item recommendation problem. The basic idea is that the user's own vector, plus the vector of the item the user just interacted with, should be close to the vector of the next item the user interacts with. The input is a triple (user, prev item, next item), and the model predicts the probability of recommending the next item.

Figure 4.9 TransRec model framework

The translation relationship for the user is expressed as follows:

r_i + t_u ≈ r_j

Here r_i and r_j denote the item i the user just interacted with and the next item j the user interacts with, and t_u is the translation vector of the user itself. In a real recommendation system, data sparsity and user cold-start problems are common, so the user vector t_u is further decomposed into two vectors:

t_u = t + t̂_u

Here t can be viewed as a global vector representing the average behavior of all users, while t̂_u represents user u's own bias; for a cold-start user, for example, t̂_u can be set to 0, so that the global vector t serves as the cold-start user's representation.

Because popular items occur so many times, the vectors of the vast majority of users plus their previous item vectors would all end up very close to a popular item's vector; the paper therefore penalizes popular items. Ultimately, given user u and the previous item i, the matching score for item j is expressed as:

score(u, i, j) = β_j − d(r_i + t_u, r_j)

Here the first term β_j represents the global popularity of item j; the second term d is the distance between the user vector plus item i's vector and item j's vector: the closer the distance, the greater the likelihood that j is recommended.
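To make the scoring concrete, here is a minimal sketch of this score with the decomposition above; the function name, the Euclidean choice for d, and all sizes are illustrative assumptions rather than the paper's exact formulation:

```python
# Sketch: TransRec-style score beta_j - d(r_i + t_u, r_j), with
# t_u = t_global + t_hat_u so a cold-start user (t_hat_u = 0) falls
# back to the global translation vector.
import torch

def transrec_score(r_i, r_j, t_global, t_hat_u, beta_j):
    t_u = t_global + t_hat_u                      # user translation vector
    dist = torch.norm(r_i + t_u - r_j, dim=-1)    # d(.,.): Euclidean here
    return beta_j - dist                          # higher = more likely next item

k = 32
r_i, r_j, t = torch.randn(k), torch.randn(k), torch.randn(k)
print(transrec_score(r_i, r_j, t, torch.zeros(k), beta_j=0.7))  # cold-start user
```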


LRML model (Latent Relational Metric Learning)

As mentioned earlier, the biggest difference from CF-framework-based methods is the relation vector: in translation-based methods, the user vector plus the relation vector should be close to the item vector. The LRML model, proposed at WWW 2018, introduces a memory network to learn the distance metric. It can be divided into three layers: the embedding layer, the memory layer, and the relation layer.

Figure 4.10 LRML model framework

(1) Embedding layer

The bottom is a conventional two-tower embedding structure, consisting of a user embedding matrix and an item embedding matrix; the one-hot user input and one-hot item input pass through the embedding layers to obtain the user vector p and the item vector q.

(2) Memory layer

The memory layer is the core module of the paper; through it the authors introduce a prior module. Its computation can be divided into three steps:

a) Fusing the user and item embeddings

The user vector p and item vector q obtained from the embedding layer need to be crossed into a joint vector s that is fed to the next layer; the paper notes that using the Hadamard product here works better than an MLP and is simpler: s = p ⊙ q.

b) User-item key addressing

Using the vector s obtained in the first step, compute its similarity with each memory vector of the memory network one by one; the similarity can be expressed by the inner product, followed by normalization (e.g., a softmax).

The resulting a_i represents the similarity of the current user-item input (p, q) to the i-th vector of the memory network.

c) Weighted final representation

The resulting relation vector is the weighted sum of the memory network's different memory vectors, using the weights obtained in the second step: r = Σ_i a_i m_i.

(3) Relation layer

The vector r obtained from the memory layer can be regarded as the relation vector between the user vector p and the item vector q; the loss is measured by the squared distance ||p + r − q||², as shown in Figure 4.11.

Figure 4.11 LRML relation layer structure and loss

Since the problem being solved is ranked item recommendation, a pairwise loss is used: in the last layer of the network, negative samples p′ and q′ are obtained by sampling for the user and item respectively, and a pairwise hinge loss is then optimized.
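The three layers fit together as in the following minimal sketch; the separate key/memory matrices, slot count, margin, and all dimensions are illustrative assumptions under my reading of the description above:

```python
# LRML sketch: (a) Hadamard fusion, (b) key addressing over memory slots,
# (c) weighted relation vector r, then squared distance ||p + r - q||^2
# inside a pairwise hinge loss.
import torch
import torch.nn as nn

class LRML(nn.Module):
    def __init__(self, n_users, n_items, dim=32, n_slots=20):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.keys = nn.Parameter(torch.randn(n_slots, dim))     # addressing keys
        self.memory = nn.Parameter(torch.randn(n_slots, dim))   # memory slots

    def distance(self, u, i):
        p, q = self.user_emb(u), self.item_emb(i)
        s = p * q                                     # (a) Hadamard fusion
        a = torch.softmax(s @ self.keys.T, dim=-1)    # (b) normalized addressing
        r = a @ self.memory                           # (c) weighted relation vector
        return ((p + r - q) ** 2).sum(-1)             # squared translation distance

def hinge_loss(model, u, i_pos, i_neg, margin=0.2):
    # pairwise ranking: the positive pair should be closer than the negative pair
    return torch.relu(model.distance(u, i_pos) - model.distance(u, i_neg) + margin).mean()

model = LRML(n_users=1000, n_items=5000)
loss = hinge_loss(model, torch.tensor([3]), torch.tensor([42]), torch.tensor([7]))
```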


Part3 Summary of match function learning for deep matching

Deep models for match function learning likewise divide into collaborative-filtering-based models and feature-based models. The former are like conventional CF models, except that an MLP is attached afterward to enhance nonlinear expressiveness, the goal still being to make the user and item vectors as close as possible; these are the NCF-framework-based models. Others introduce a relation vector, making the user vector plus the relation vector close to the item vector; these are the translation-framework-based models.

This review was organized mainly from the original slides, with some of the papers skimmed and some read intensively; I learned a lot from them. Throughout the text, I have tried as much as possible to string the various methods together around the question of how to do matching in recommendation, since the same ideas guide them behind the scenes. There are surely mistakes; criticism and corrections are welcome.



Source: blog.csdn.net/hellozhxy/article/details/103960154