CFM: Convolutional Factorization Machines for Context-Aware Recommendation
Summary
Factorization machines (FM) are inner-product-based recommendation models that capture feature interactions through second-order terms. However, the inner product is not expressive enough to capture high-order nonlinear signals. Much recent work enhances FM with neural networks, but these methods treat embedding dimensions as independent of each other and model higher-order interactions only implicitly. This paper proposes the Convolutional Factorization Machine (CFM) to address these problems. CFM models second-order interactions with the outer product, generating image-like interaction maps that capture correlations between embedding dimensions. All generated maps are stacked into a set of interaction matrices, on top of which convolution learns higher-order interaction signals. In addition, the paper uses a self-attention mechanism to perform feature pooling, which reduces time complexity. Experiments on three datasets demonstrate significant improvements over existing methods.
Introduction
Data describing user behavior provides, in addition to user/item IDs, a wealth of contextual information, including but not limited to demographic descriptions, item attributes, time/location information, and recent transaction history. The common approach is to convert these into a high-dimensional feature vector and feed it to the model.
Unlike image and audio features, which are naturally continuous and real-valued, recommender system inputs are mostly discrete, leading to high-dimensional and sparse feature vectors.
Existing feature modeling falls into two categories:
1. Manually constructed cross features.
2. Automatically learned feature interactions, where each feature is embedded and interactions are modeled over the embeddings.
To reduce the workload of manually constructing cross features, the paper proposes CFM, which learns feature interactions automatically. Specifically, feature embeddings are first passed through a self-attention pooling layer, which greatly reduces the amount of computation and weakens the influence of noisy features. Second-order interactions are then modeled with the outer product, generating a set of 2D interaction maps; compared with the inner product, the outer product more easily captures correlations between embedding dimensions. The resulting maps are stacked into a 3D matrix encoding all pairwise interactions. To explicitly learn higher-order signals, the paper applies convolution to this 3D matrix, stacking multiple convolutional layers.
The self-attention feature pooling also reduces computation and time complexity.
Convolutional Factorization Machine
The original FM models two-way interactions only in a linear way, via the inner product: y_FM(x) = w0 + Σ_i w_i x_i + Σ_i Σ_{j>i} <v_i, v_j> x_i x_j. CFM keeps the bias and linear terms but replaces the pairwise inner-product term with a learned interaction component.
This last component is the core of CFM; the following explains how it is learned.
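As a reference point, the second-order FM score can be sketched in a few lines of NumPy (a minimal sketch; the variable names are mine, and the O(n·d) rewrite of the pairwise sum is the standard FM identity, not code from the paper):

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Second-order FM score: w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j.

    x: (n,) feature vector, w0: global bias, w: (n,) linear weights,
    V: (n, d) embedding matrix (one d-dim embedding per feature).
    Uses the standard O(n*d) identity instead of the naive O(n^2) sum.
    """
    linear = w0 + w @ x
    xv = V * x[:, None]                      # weight each embedding by its feature value
    pairwise = 0.5 * np.sum(np.sum(xv, 0) ** 2 - np.sum(xv ** 2, 0))
    return linear + pairwise
```

The reshuffled pairwise term is exactly the sum over all i < j of <v_i, v_j> x_i x_j, which is what CFM's interaction component replaces.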
Input and Embedding Layer
The input is a sparse vector x containing one-hot features such as the user ID and multi-hot features such as history items. An embedding layer then maps each feature to a dense D-dimensional vector representation.
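A minimal sketch of the embedding lookup (the vocabulary size and dimension D = 8 are hypothetical, and `embed_field` is an illustrative helper, not the paper's code):

```python
import numpy as np

# Hypothetical sizes: a vocabulary of 1000 discrete features, D = 8 dims.
rng = np.random.default_rng(42)
emb_table = rng.normal(scale=0.01, size=(1000, 8))

def embed_field(feature_ids):
    """Look up the dense embedding of each active feature in one field.

    A one-hot field yields a single vector; a multi-hot field (e.g.
    history items) yields several, which the next (pooling) layer
    compresses into one.
    """
    return emb_table[np.asarray(feature_ids)]   # (num_active, 8)

user_field = embed_field([17])           # one-hot field  -> shape (1, 8)
history_field = embed_field([3, 42, 7])  # multi-hot field -> shape (3, 8)
```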
Self-Attention Pooling Layer
Feature pooling compresses each field's multiple embeddings into a single vector. Conventional pooling operations include max pooling and mean pooling, but these cannot effectively capture the varying importance of different features. The paper therefore uses an attention mechanism (implemented by an MLP):
each embedding is mapped through a hidden layer to an attention score;
the importance weights are obtained by normalizing the attention scores;
the pooled representation is the attention-weighted sum of the field's embeddings, so the weights act as a measure of feature importance.
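The pooling step above can be sketched as follows (a hedged sketch: the one-hidden-layer MLP scoring with softmax normalization follows the description, but the parameter names W, b, h are my own, not the paper's notation):

```python
import numpy as np

def attention_pool(E, W, b, h):
    """Pool a multi-hot field's embeddings (m, d) into one d-dim vector.

    Scores come from a one-hidden-layer MLP, a_i = h^T relu(W e_i + b),
    normalized with softmax; the output is the attention-weighted sum.
    W: (k, d), b: (k,), h: (k,) are hypothetical attention parameters.
    """
    hidden = np.maximum(E @ W.T + b, 0.0)          # (m, k)
    scores = hidden @ h                            # (m,) raw attention scores
    scores = scores - scores.max()                 # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax importance weights
    return alpha @ E                               # (d,) weighted sum
```

Unlike max or mean pooling, the weights alpha differ per feature, so informative embeddings dominate the pooled vector while noisy ones are down-weighted.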
Interaction Cube
After the pooling layer, the paper represents feature interactions with a set of interaction matrices (a 3D cube), where each matrix is the outer product of two pooled embeddings.
In summary:
1. An MLP computes the importance scores (the first step of the pooling).
2. Pooling compresses each field into a single vector; pairs of these vectors yield the multi-channel matrices.
3. The outer product of each pair (the feature interaction) serves as the input to the convolution.
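The cube construction itself is just stacked outer products; a minimal sketch, assuming m pooled field vectors of dimension d:

```python
import numpy as np

def interaction_cube(fields):
    """Stack the outer products of every pair of pooled field embeddings.

    fields: (m, d) matrix, one pooled d-dim vector per feature field.
    Returns an (m*(m-1)/2, d, d) cube: each channel is one pairwise
    interaction map e_i e_j^T, which preserves dimension-level
    correlations that a scalar inner product would collapse.
    """
    m, d = fields.shape
    maps = [np.outer(fields[i], fields[j])
            for i in range(m) for j in range(i + 1, m)]
    return np.stack(maps)
```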
Advantages of the multi-dimensional interaction cube:
1. The outer product better captures correlations between embedding dimensions.
2. Higher-order interactions are modeled explicitly, via interactions between different layers of the cube (the outer index selects a layer, the inner indices select elements within it).
3. Since the convolution kernel is itself a multi-dimensional matrix, it can reduce the number of layers (dimensionality reduction along the channel axis).
Convolution:
1. 3D convolution kernels are applied to the interaction cube.
2. Each layer adds a bias term and applies the ReLU activation function.
3. The first layer uses a kernel of shape [2,2,14] with stride [2,2,1] (width, height, channel).
4. Subsequent layers use kernels of shape [2,2,2].
The output of the final convolutional layer is flattened into a one-dimensional vector and mapped through a fully connected layer.
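A quick shape check for the convolution stack described above (assuming, hypothetically, d = 64 embedding dimensions and 8 fields, hence 8*7/2 = 28 pairwise channels; the first-layer kernel and stride values come from the text):

```python
def conv3d_out_shape(shape, kernel, stride):
    """Per-axis output size of a VALID (unpadded) convolution:
    floor((n - k) / s) + 1."""
    return tuple((n - k) // s + 1 for n, k, s in zip(shape, kernel, stride))

# Hypothetical input cube: 64 x 64 spatial, 28 pairwise channels.
shape = (64, 64, 28)
shape = conv3d_out_shape(shape, (2, 2, 14), (2, 2, 1))  # first layer
first = shape                                           # (32, 32, 15)
shape = conv3d_out_shape(shape, (2, 2, 2), (2, 2, 2))   # a later layer
later = shape                                           # (16, 16, 7)
```

This shows how the [2,2,14] kernel collapses most of the channel axis in one step, while the width/height axes halve per layer until the result is small enough to flatten.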
Loss function:
Top-k recommendation
Each positive sample is paired with a sampled negative sample, and σ denotes the sigmoid activation function.
First, a mini-batch of positive samples is collected, each comprising a user, context features, and an item. Then, for each user, negative items are randomly sampled and combined with that user's features to build negative interactions. Finally, both the positive and the negative interactions are fed to the model for training, and the loss is computed.
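The training signal can be sketched as a pairwise BPR-style log loss over positive/negative score differences (an assumption about the exact form based on the σ mentioned above; the stable `logaddexp` formulation is a choice of this sketch):

```python
import numpy as np

def pairwise_loss(pos_scores, neg_scores):
    """BPR-style pairwise loss: mean of -log sigma(y_pos - y_neg).

    Pushes each positive interaction's score above its paired negative.
    -log sigma(x) = log(1 + e^-x), computed stably via logaddexp.
    """
    diff = np.asarray(pos_scores) - np.asarray(neg_scores)
    return float(np.mean(np.logaddexp(0.0, -diff)))
```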
Data
Dataset   | #users | #items | #transactions | #fields
----------|--------|--------|---------------|--------
Frappe    | 957    | 4,082  | 96,203        | 10
Last.fm   | 1,000  | 20,301 | 214,574       | 4
MovieLens | 6,040  | 3,665  | 939,809       | 4
Experiments
RQ1 Does the CFM model outperform state-of-the-art methods for CARS top-k recommendation?
RQ2 How do the special designs of CFM (i.e., interaction cube and 3D CNN) affect the model performance?
RQ3 What’s the effect of the attention-based feature pooling?
Experimental Settings
Frappe
An app recommendation dataset.
It contains 96,203 app-usage logs; each log contains 10 context fields, including the user ID, item ID, time, and additional information.
Last.fm
A music recommendation dataset.
The most recent listening records of 1,000 users were extracted.
The user context is described by the user ID and the IDs of music the user listened to within the last 90 minutes; item attributes include the music ID and the artist ID.
MovieLens
Evaluation
Leave-one-out: for Last.fm and MovieLens, each user's most recent interaction is held out for testing and the remainder is used for training. For Frappe, which has no timestamp information, one interaction per user is selected at random for testing. Recommendation performance is measured with HR and NDCG (NDCG assigns higher scores to items ranked near the top).
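The per-user HR/NDCG computation for one leave-one-out test case can be sketched as (a minimal sketch; `hr_ndcg_at_k` is an illustrative helper name):

```python
import numpy as np

def hr_ndcg_at_k(scores, pos_index, k=10):
    """Leave-one-out metrics for a single user.

    scores: model scores over the candidate items (the held-out
    positive plus sampled negatives); pos_index: position of the
    positive among them. HR@k is 1 if the positive ranks in the
    top k; NDCG@k additionally rewards higher ranks with a
    1/log2(rank + 1) discount (rank counted from 1).
    """
    ranking = np.argsort(-np.asarray(scores)).tolist()
    rank = ranking.index(pos_index)                # 0-based rank
    hr = 1.0 if rank < k else 0.0
    ndcg = 1.0 / np.log2(rank + 2) if rank < k else 0.0
    return hr, ndcg
```

Averaging these two numbers over all test users yields the reported HR@k and NDCG@k.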
Parameter Settings
Performance Comparison
Interaction Cube Performance Evaluation (the outer product)
To demonstrate the effectiveness of the cube, the paper either flattens the 3D interaction cube into a 2D matrix ("tile") or max-pools it along the depth direction; both variants perform worse than CFM.
The ordering CFM > tile > max pooling > FM demonstrates the effectiveness of the outer product and of learning higher-order signals with a CNN.
Pooling of the Feature Study (attention section)
Self-focus mechanism to implement pooling. Alternatively compared to the maximum of the pool and the pool mean. Pool attention will assign different importance for different features. It retains a wealth of characteristic information, to weaken the influence of noise.
Compare features model CMF and no pool. It found: pool features can reduce training time, bringing good recommendation quality.
Conclusion
The paper presents a new context-aware recommendation algorithm combining FM and CNN; the key design is the interaction cube built from outer products of feature embeddings.