IJCAI 2019: Convolutional Factorization Machines for Context-Aware Recommendation (translated notes). A 2019 paper from Xiangnan He's group, combining FM, an attention mechanism, and convolution.

CFM: Convolutional Factorization Machines for Context-Aware Recommendation

Summary

Factorization Machines (FM) are an inner-product-based recommendation model that captures pairwise (second-order) feature interactions. However, the inner product is not expressive enough to capture high-order, nonlinear signals. Much recent work enhances FM with neural networks, but these approaches assume the embedding dimensions are independent of one another and model high-order interactions only implicitly. This paper proposes the Convolutional Factorization Machine (CFM) to address these problems. CFM models second-order interactions with the outer product, producing image-like data that captures correlations between embedding dimensions. All of the generated interaction matrices are then stacked into an interaction cube, on top of which convolution learns high-order interaction signals. In addition, the paper uses a self-attention mechanism to pool features, reducing time complexity. Experiments on three datasets demonstrate that the method yields significant improvements.

Introduction

User behavior data, in addition to user/item IDs, provides rich contextual information, including but not limited to demographic attributes, item properties, time and location information, and recent transaction history. The common practice is to convert these into a high-dimensional feature vector and feed it into the model.

Unlike images and audio, whose features are naturally continuous and real-valued, the inputs of a context-aware recommender are mostly discrete and categorical, which leads to high-dimensional and sparse feature vectors.

Existing approaches to feature-interaction modeling fall into two categories:

1. Manually constructing cross features, e.g., combining "gender=male" and "category=sports" into a single feature. 2. Automatically learning feature interactions, where each feature is embedded and interactions are expressed as functions over the embeddings.

To reduce the workload of constructing cross features, the paper proposes CFM to learn feature interactions automatically. Specifically, the feature embeddings first go through a self-attention pooling layer, which greatly reduces computation and weakens the influence of noisy features. Second-order interactions are then modeled with the outer product, each yielding a two-dimensional matrix; compared to the inner product, the outer product more easily captures correlations between embedding dimensions. The resulting matrices are stacked into a three-dimensional cube that encodes all pairwise interactions. To learn high-order signals explicitly, the paper applies convolution over this 3D cube, stacking multiple convolutional layers.

The self-attention feature pooling operation also reduces the amount of computation and the time complexity.

Convolutional Factorization Machine

The original FM models two-way interactions only linearly, via the inner product of feature embeddings:

$$\hat{y}_{FM}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j$$

CFM keeps the first two terms and replaces the inner-product interaction term with a learned component:

$$\hat{y}_{CFM}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + g_\theta(\mathbf{x})$$

The last term $g_\theta(\mathbf{x})$ is the core interaction component; the following explains how it is learned.
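For contrast, here is a minimal sketch of the inner-product term that $g_\theta(\mathbf{x})$ replaces (the feature count and dimension below are illustrative):

```python
import torch

# Embedding vectors v_i for the non-zero features of one input x.
V = torch.randn(5, 8)  # 5 active features, d = 8

# FM's second-order term: inner products summed over all feature pairs.
pairwise = sum(V[i] @ V[j]
               for i in range(len(V)) for j in range(i + 1, len(V)))
print(pairwise)  # a scalar score; CFM learns g_theta(x) instead
```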


Input and Embedding Layer

The input is a sparse feature vector x containing, for example, the user ID and multi-hot historical-item features; the embedding layer maps each non-zero feature to a dense d-dimensional vector representation:

$$\mathbf{e}_i = x_i \mathbf{v}_i, \qquad \mathbf{v}_i \in \mathbb{R}^d$$
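A minimal sketch of this lookup, assuming one shared embedding table over all features (the table size, dimension, and indices are illustrative):

```python
import torch
import torch.nn as nn

n_features, d = 10000, 64           # illustrative vocabulary size and dim
embedding = nn.Embedding(n_features, d)

# A sparse input is represented by the indices of its non-zero features,
# e.g. [user_id, item_id, time_bucket, recent_item, ...] (hypothetical).
x_indices = torch.tensor([42, 307, 998, 4001])
E = embedding(x_indices)            # (4, d): one dense vector per feature
```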

Self-Attention Pooling Layer

Feature pooling is applied to the embeddings of each field individually. Conventional pooling operations include max pooling and mean pooling, but these cannot effectively capture the differing importance of features. The paper therefore proposes an attention mechanism, implemented as an MLP mapping:

$$a'_i = \mathbf{h}^\top \mathrm{ReLU}(\mathbf{W}\mathbf{e}_i + \mathbf{b})$$

The hidden layer is then mapped to an attention score.

The importance scores are obtained by normalizing the attention values:

$$a_i = \frac{\exp(a'_i)}{\sum_{j \in f} \exp(a'_j)}$$

These normalized scores then serve as the pooling kernel (the kernel weights represent feature importance):

$$\mathbf{e}_f = \sum_{i \in f} a_i\, \mathbf{e}_i$$

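A minimal sketch of the attention pooling above for one multi-hot field, assuming an MLP with ReLU as the score function (the hidden size is illustrative, not the paper's hyper-parameter):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPooling(nn.Module):
    """Pools a field's feature embeddings into one vector via attention."""
    def __init__(self, d, hidden=32):
        super().__init__()
        self.mlp = nn.Linear(d, hidden)            # W, b
        self.h = nn.Linear(hidden, 1, bias=False)  # projection vector h

    def forward(self, E):                      # E: (m, d) field embeddings
        scores = self.h(F.relu(self.mlp(E)))   # a'_i, shape (m, 1)
        a = torch.softmax(scores, dim=0)       # normalized importance a_i
        return (a * E).sum(dim=0)              # weighted sum, shape (d,)

pool = AttentionPooling(d=64)
field = torch.randn(7, 64)   # e.g. 7 recently played songs (illustrative)
e_f = pool(field)            # one d-dimensional vector for the field
```

Unlike max or mean pooling, the learned weights let informative features dominate while noisy ones are down-weighted.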

Interaction Cube

After the pooling layer, the paper represents feature interactions with a set of interaction matrices (a 3D cube), where each pairwise interaction is modeled by the outer product:

$$\mathbf{M}_{i,j} = \mathbf{e}_i \otimes \mathbf{e}_j = \mathbf{e}_i \mathbf{e}_j^\top \in \mathbb{R}^{d \times d}, \qquad \mathcal{C} = \big[\mathbf{M}_{1,2};\, \mathbf{M}_{1,3};\, \dots;\, \mathbf{M}_{m-1,m}\big]$$

To sum up:

1. An MLP produces the importance scores (the first step of attention pooling).

2. Attention pooling condenses each field's embeddings into a single vector.

3. Pairwise outer products of the pooled vectors (the feature interactions) are stacked into a multi-channel matrix that serves as input to the convolution, as sketched below.

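A minimal sketch of building the cube from the pooled field embeddings (m and d are illustrative):

```python
import torch

m, d = 4, 64                 # number of fields and embedding dimension
e = torch.randn(m, d)        # one pooled embedding per field

# Each pair (i, j) yields a d x d "image" e_i e_j^T; stacking all
# m(m-1)/2 of them forms the interaction cube.
slices = [torch.outer(e[i], e[j])
          for i in range(m) for j in range(i + 1, m)]
cube = torch.stack(slices, dim=0)   # (m*(m-1)//2, d, d)
print(cube.shape)                   # torch.Size([6, 64, 64])
```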

Advantages of the multi-dimensional interaction cube:

1. The outer product better captures correlations between embedding dimensions.

2. It allows explicit modeling of high-order interactions, namely interactions across different slices of the cube: the slice index identifies the feature pair, while the elements inside a slice represent dimension-level correlations.

3. The convolution kernel is multi-dimensional, and stacking convolution layers progressively reduces the dimensionality.


Convolution:

1. 3D convolution (three-dimensional convolution kernels).

2. Bias followed by a ReLU activation function.

3. The first convolution layer uses a kernel of shape [2,2,14] with stride [2,2,1] (width, height, depth, where depth runs along the interaction slices); subsequent layers use kernels and strides of shape [2,2,2].

The output of the final convolution layer is flattened into a one-dimensional vector and mapped by a fully connected layer:

$$g_\theta(\mathbf{x}) = \mathbf{w}^\top \mathbf{c}_L + b, \qquad \text{where } \mathbf{c}_L \text{ is the flattened output of the last convolution layer.}$$
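A minimal sketch of the 3D CNN with the kernel/stride shapes quoted above, assuming d = 64 and 10 fields (45 pairwise slices, matching Frappe); the channel counts are illustrative:

```python
import torch
import torch.nn as nn

cube = torch.randn(1, 1, 64, 64, 45)  # (batch, channel, d, d, #slices)

# First layer: kernel [2,2,14], stride [2,2,1] -> 64x64x45 becomes 32x32x32.
layers = [nn.Conv3d(1, 32, kernel_size=(2, 2, 14), stride=(2, 2, 1)),
          nn.ReLU()]
# Each following layer (kernel and stride [2,2,2]) halves every dimension:
# 32 -> 16 -> 8 -> 4 -> 2 -> 1.
for _ in range(5):
    layers += [nn.Conv3d(32, 32, kernel_size=2, stride=2), nn.ReLU()]
cnn = nn.Sequential(*layers)

out = cnn(cube)                           # (1, 32, 1, 1, 1)
g = nn.Linear(32, 1)(out.flatten(1))      # fully connected layer -> g(x)
print(out.shape, g.shape)
```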

Loss function:

For top-k recommendation, the model is trained with a pairwise (BPR-style) ranking loss:

$$L = -\sum_{(\mathbf{x}^+,\, \mathbf{x}^-)} \ln \sigma\big(\hat{y}(\mathbf{x}^+) - \hat{y}(\mathbf{x}^-)\big)$$

where $\mathbf{x}^-$ is the negative sample corresponding to the positive sample $\mathbf{x}^+$, and $\sigma$ is the sigmoid activation function.

Training proceeds as follows: first, positive samples are collected (in mini-batches), each comprising the user attributes, item, and context. Then, for each user, negative items are randomly sampled and combined with that user's features to construct negative interactions. Finally, the positive and negative interactions are fed into the model to compute the loss.
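A minimal sketch of the pairwise objective over one mini-batch of matched positive/negative scores (the batch size is illustrative; in training these scores would come from the CFM forward pass):

```python
import torch
import torch.nn.functional as F

def bpr_loss(y_pos, y_neg):
    """Pairwise log-sigmoid loss: push y(x+) above y(x-)."""
    return -F.logsigmoid(y_pos - y_neg).mean()

# Stand-in scores for 256 positive interactions and their sampled negatives.
y_pos = torch.randn(256, requires_grad=True)
y_neg = torch.randn(256)
loss = bpr_loss(y_pos, y_neg)
loss.backward()   # in real training this updates the CFM parameters
```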


Data

Dataset    #users  #items  #transactions  #fields
Frappe     957     4,082   96,203         10
Last.fm    1,000   20,301  214,574        4
MovieLens  6,040   3,665   939,809        4

Experiments

RQ1: Does the CFM model outperform state-of-the-art methods for CARS top-k recommendation?
RQ2: How do the special designs of CFM (i.e., the interaction cube and 3D CNN) affect model performance?
RQ3: What is the effect of the attention-based feature pooling?

Experimental Settings

Frappe

An app recommendation dataset containing 96,203 app-usage logs; each log has 10 feature fields, including user ID, item ID, time, and other context information.

Last.fm

A music recommendation dataset built from the most recent listening records of 1,000 users. The user context consists of the user ID and the IDs of music the user listened to within the last 90 minutes; the item attributes include the music ID and the artist ID.

MovieLens

 

Evaluation

leave-one-out

For the Last.fm and MovieLens data, each user's most recent interaction is held out for testing and the remainder is used for training. For Frappe, which has no timestamp information, one interaction per user is randomly selected for testing. Recommendation quality is measured with HR and NDCG (which gives higher scores to hits ranked nearer the top).
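A minimal sketch of the per-user HR@k and NDCG@k under this protocol, where rank is the 0-based position of the held-out item in the ranked candidate list:

```python
import math

def hr_ndcg_at_k(rank, k=10):
    """Leave-one-out metrics for one user and one held-out item."""
    hit = rank < k
    hr = 1.0 if hit else 0.0
    ndcg = 1.0 / math.log2(rank + 2) if hit else 0.0  # position discount
    return hr, ndcg

# Example: the held-out item is ranked 3rd (index 2) -> a hit, NDCG = 0.5.
print(hr_ndcg_at_k(2, k=10))   # (1.0, 0.5)
```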

Parameter Settings

Performance Comparison

Interaction Cube Performance Evaluation (outer product)

To verify the effectiveness of the cube, the paper flattens the 3D interaction cube into a 2D matrix (tiling) or applies max pooling along the depth dimension; both variants perform worse than CFM.

The ordering CFM > tile > max pooling > FM demonstrates the effectiveness of learning from the outer product with a CNN.

Feature Pooling Study (attention)

Pooling is implemented with a self-attention mechanism. Compared with the max-pooling and mean-pooling alternatives, attention pooling assigns different importance to different features, retaining richer feature information and weakening the influence of noise.


Comparing CFM with and without feature pooling shows that pooling reduces training time while delivering good recommendation quality.

 

Conclusion

CFM is a new context-aware recommendation algorithm that combines FM and CNN; its key design is the interaction cube based on outer products.


Source: www.cnblogs.com/liheyong/p/12111512.html