Price-aware Recommendation with Graph Convolutional Networks

Overview of price recommendation papers based on graph convolution

An excellent ICDE 2020 paper: Price-aware Recommendation with Graph Convolutional Networks

Paper link: https://arxiv.org/pdf/2003.03975v1.pdf

The original author of this article is Mengjiang (still Mengmeng); the text has been lightly modified for study purposes, in an effort to be a qualified knowledge porter.


1. Introduction

Very few recommender-system papers take the producer's perspective (there is one at RecSys 2018), and few consider the realistic factors that matter in an actual recommendation scenario. Most studies focus on user characteristics and user-item interactions in order to optimize metrics such as click-through rate. In e-commerce, however, real and sometimes decisive factors such as price rarely appear in research papers. This may be because public data sets often lack this feature.

A good model should recommend items that the user really likes, and the user does click on these recommended items. But if the price of the product does not match the user's expectations, the user will simply close the page and leave. Even if the click-through rate goes up, the conversion rate will not.

Hence this paper on making recommender systems price-aware.

2. Key issues

According to the paper, the difficulties of using price in a recommender system are as follows:

  • 1) The user's price awareness is unknown. The user's preference for and sensitivity to product prices are not stated anywhere; they are only implicitly reflected in the products the user has purchased. In other words, whether a price is acceptable can only be judged by whether the user makes the purchase. Because users seldom declare their price preference and sensitivity, a data-driven method has to infer each user's personalized price awareness from the purchase history. More challenging still, the collaborative filtering (CF) effect reflected in the histories of similar users has to be considered to improve the accuracy of this inference.
  • 2) Product categories strongly influence users' price awareness. How the price of a product affects the user's intention depends largely on the product's category. Users' perception of prices and what they can afford may vary significantly across product categories. For example, a woman may not want to spend 1,000 yuan on a watch, but may be willing to spend 2,000 yuan on a beautiful dress, and a man may be the opposite. Therefore, taking product category information into account allows the user's price preference to be inferred more accurately.

For the first difficulty, the authors model the relationship between users and items and between items and prices. The method draws on graph convolutional networks (GCN). The core idea is to use items as a bridge that propagates the influence of price to users, so that the learned user representations become price-aware.

For the second difficulty, the authors further integrate item categories into the propagation process and model the possible pairwise interactions to predict the interaction between user and item.

Further analysis shows that modeling price awareness is particularly useful for predicting user preferences in categories the user has not yet explored.

Consequently, the authors propose an effective method for predicting the user's purchase intention that focuses on the price factor in recommender systems, named the PUP model (Price-aware User Preference-modeling).

3. The main contributions of this work

[Figure: list of the paper's main contributions]

4. Preliminary Study

WTP & CWTP


To understand how price awareness differs across categories, the paper extends the widely used willingness to pay (WTP) to category willingness to pay (CWTP). As an indicator of the user's price awareness, WTP is defined as the highest price a user is willing to pay for a product. CWTP is then defined as the highest price that a given user is willing to pay for products of a given category. Users who interact with multiple categories therefore have multiple CWTP values. The entropy of the CWTP values is then computed for each user (see Fig. 1 below): a smaller entropy indicates that the user has the same price sensitivity across categories, while a larger entropy indicates that the user weighs price differently in different categories.

The skewed distribution in Fig. 1 confirms this: price awareness is highly correlated with product category, and price sensitivity is inconsistent across categories.
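The CWTP entropy described above can be sketched as follows. This is only an illustration under stated assumptions: prices are already discretized into levels, CWTP for a category is taken as the highest price level the user bought in that category, and the entropy is the Shannon entropy of the distribution of CWTP levels across the user's categories. The function names `cwtp` and `cwtp_entropy` are hypothetical, and the paper's exact computation may differ.

```python
import math
from collections import Counter

def cwtp(purchases):
    """Map each category to the highest price level the user bought in it.

    purchases: list of (category, price_level) pairs for ONE user.
    Assumption: CWTP is approximated by the maximum purchased price level.
    """
    result = {}
    for category, level in purchases:
        result[category] = max(level, result.get(category, level))
    return result

def cwtp_entropy(purchases):
    """Shannon entropy of the user's CWTP levels across categories."""
    counts = Counter(cwtp(purchases).values())
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Same CWTP level in every category: entropy is zero (consistent sensitivity).
consistent_user = [("watch", 3), ("dress", 3), ("dress", 1)]
# Different CWTP levels per category: entropy is positive.
inconsistent_user = [("watch", 1), ("dress", 7)]
```

Under these assumptions, a user whose CWTP level is identical in every category gets entropy 0, and the entropy grows as the CWTP levels spread out over more distinct values.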

[Fig. 1: distribution of the entropy of users' CWTP values]

Three users are randomly sampled from the data set, and their price acceptance across different product categories is shown as heat maps (Fig. 2):
[Fig. 2: price acceptance heat maps of three sampled users across categories]

5. Problem Definition

The paper emphasizes that the focus of this work is to use item prices to improve recommendation accuracy. Since the user's price awareness is closely related to the product category, the category must be taken into account when designing a price-aware recommender system.

We formulate this recommendation task as follows:

Here, U and I denote the user set and the item set, R_{M×N} is the user-item interaction matrix, and M and N are the numbers of users and items respectively. R_ui = 1 means that user u has purchased item i. p = {p_1, p_2, ..., p_N} and c = {c_1, c_2, ..., c_N} denote the prices and the categories of the items.

To simplify modeling, price is treated as a categorical variable, and uniform quantization is used to discretize price values into separate levels. For example, suppose the price range of the category "mobile phone" is [200, 3000] and it is discretized into 10 price levels. If a mobile phone costs 1000 yuan, its price level is:
$$\left\lfloor \frac{1000-200}{3000-200} \times 10 \right\rfloor = 2$$
Finally, the price-aware product recommendation problem is stated as follows:

  • Input: interaction matrix R, item prices p, and item categories c.
  • Output: the estimated probability of a purchase for a given user-item pair (u, i).
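The uniform quantization above is easy to check in code. The following is a minimal sketch; the function name `price_level` is hypothetical, and clamping the category's maximum price into the top level is an assumption, since the text does not say how the upper boundary is handled.

```python
import math

def price_level(price, low, high, num_levels=10):
    """Discretize a price into one of num_levels levels within [low, high].

    low/high are the minimum and maximum price of the item's category.
    """
    level = math.floor((price - low) / (high - low) * num_levels)
    # Assumption: price == high would give num_levels, so clamp it
    # into the top level (num_levels - 1).
    return min(level, num_levels - 1)

# The worked example from the text: a 1000-yuan phone in the range [200, 3000].
phone_level = price_level(1000, 200, 3000)  # floor(2.857...) = 2
```

Because the range is taken per category, the same absolute price can land in different levels for different categories, which is what lets the model compare "expensive for a phone" with "expensive for a dress".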

6. The proposed PUP model

The model diagram is as follows:
[Figure: architecture of the PUP model]

The PUP model consists of three components: a unified heterogeneous graph, a graph convolutional encoder, and a decoder based on pairwise interaction.

The unified heterogeneous graph consists of four types of nodes: user nodes are connected to item nodes, and item nodes are connected to price nodes and category nodes.

Next, let's look at each component of the model in turn.

6.1 Unified heterogeneous graph

To explicitly model user behavior and item attributes, the authors discretize the price variable and build a heterogeneous graph with four types of nodes. To address the problem of unstated price awareness, price is explicitly introduced as a price node on the graph rather than as an input feature of the item node. To address the category-dependent influence, category nodes are also added to the graph.

For price-aware product recommendation, both the user-item interaction data and the price attribute of each item are available. Capturing the user's price awareness explicitly is a challenge, because the user has no direct relationship with the price. In other words, the relationship between users and prices is a transitive one, built from the user-item and item-price relationships. The item thus acts as a bridge connecting users and prices.

To handle these complex relationships in one unified model, the price variable is discretized and a heterogeneous graph is constructed with four types of nodes: users, items, prices, and categories.

The input interaction data and attributes (category and price) can be represented by an undirected graph G = (V, E). The nodes in V include user nodes u∈U, item nodes i∈I, category nodes c∈C, and price nodes p∈P. The edges in E consist of three parts:

  • Interaction edges: (u, i) with R_ui = 1

  • Category edges: (i, c_i)

  • Price edges: (i, p_i)

By introducing four types of nodes, we represent all entities, features, and relationships as a unified graph, and capture all pairwise relationships in an explicit way. As shown below:
[Figure: the unified heterogeneous graph with user, item, price, and category nodes]
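The three edge types can be sketched as a small adjacency structure. This is an illustration only: the function name `build_graph`, the `(type, id)` node keys, and the dictionary inputs are assumptions chosen for readability, not the paper's implementation.

```python
from collections import defaultdict

def build_graph(interactions, item_category, item_price):
    """Build the undirected heterogeneous graph as {node: set(neighbors)}.

    interactions:  iterable of (user, item) pairs with R_ui = 1
    item_category: dict item -> category id (c_i)
    item_price:    dict item -> discretized price level (p_i)
    Nodes are (type, id) tuples so the four node types never collide.
    """
    neighbors = defaultdict(set)

    def add_edge(a, b):
        neighbors[a].add(b)
        neighbors[b].add(a)

    for user, item in interactions:                 # interaction edges
        add_edge(("user", user), ("item", item))
    for item, category in item_category.items():    # category edges
        add_edge(("item", item), ("category", category))
    for item, price in item_price.items():          # price edges
        add_edge(("item", item), ("price", price))
    return dict(neighbors)

graph = build_graph(
    interactions=[(0, 0), (1, 0)],
    item_category={0: "phone"},
    item_price={0: 2},
)
```

In this toy graph, users 0 and 1 are not connected to the price node directly; they reach it only through item 0, which is the "bridge" role described in the text.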

In graph convolutional networks, high-level feature vectors, such as word embeddings extracted by word2vec, are commonly used as input features of nodes. Similarly, it would seem reasonable to encode price and category information into the input features of user and item nodes, which would keep the graph a simple bipartite graph.

In this paper, however, the authors explicitly treat the two important attributes (price and category) as entity nodes, in order to capture category-dependent price awareness in a more expressive way. Separate node types are used for category and price, rather than a single node type for the crossed feature (category, price), to avoid redundant parameters. Intuitively, items of the same category with different prices are functionally similar, while items at the same price level from different categories reflect similar price awareness. A single crossed-feature node type would lose the connection between these two situations. Using separate node types for category and price captures both levels of semantic similarity.

By assigning separate nodes, price and category are captured directly and explicitly, and the two difficulties of price-aware recommendation described above are alleviated. Specifically, unstated price awareness is transformed into high-order neighbor proximity on the heterogeneous graph, which graph convolutional networks capture well. Connecting each item node to both a price node and a category node alleviates the problem of category-dependent influence.

6.2 Graph convolutional encoder

To capture the collaborative filtering (CF) effect and price awareness at the same time, a graph convolutional network is used as an encoder to learn semantic representations of users, items, prices, and categories. By propagating embeddings over the heterogeneous graph, price-related information is aggregated into the user nodes, which gives the user's price sensitivity.

The traditional latent factor model (LFM), a widely used mechanism in recommender systems, encodes entities in a low-dimensional latent space. But when a user u buys an item i at a price p, there is also a latent user-price interaction (u, p). Traditional LFM only learns representations of users and items.

The authors therefore extend it and learn representations of all four types of entities in the same latent space. The main motivation is that message passing on the graph can produce semantic and robust node representations for multiple tasks (such as node classification and link prediction). Graph neural networks are a class of algorithms that have achieved state-of-the-art results in network representation learning. The encoding module consists of an embedding layer that converts the one-hot input into a low-dimensional vector, an embedding propagation layer that captures the CF effect and price awareness, and a neighbor aggregation layer that models neighbor similarity.

Encoding module:

  • Embedding layer: Since the price and category attributes are pulled out as nodes in the PUP model, the ID is the only remaining feature of a node. An embedding layer is therefore introduced to compress the one-hot ID encoding into a dense real-valued vector. Each node is assigned a separate embedding e' ∈ R^d, where d is the embedding size.

  • Embedding propagation layer: In a GCN, node embeddings are propagated to first-order neighbors, and further if more than one convolutional layer is applied. In other words, a node's receptive field grows with the number of graph convolution layers. In the encoder of this paper, the embedding propagation layer captures the message passed between two directly connected nodes, which can be user-item, item-price, or item-category. The embedding propagated from node j to node i can be expressed as:
    $$t_{ji} = \frac{1}{|N_i|} e'_j$$

    N_i denotes the set of neighbors of node i, and e'_j is the embedding of node j from the embedding layer. As in the original GCN, a self-loop is added to every node of the heterogeneous graph, so node i also appears in N_i. Self-loops are important for GCN because they shrink the spectrum of the normalized Laplacian.

  • Neighbor aggregation layer: From the perspective of network representation learning, two nodes that are adjacent in the graph should also have representations that are close in the transformed latent space. A node's representation is updated by aggregating the representations of its neighbors. The most commonly used aggregation operations [24], [28] are summation, averaging, and LSTM. In the proposed encoder, the authors use average pooling followed by a nonlinear activation function to perform message passing on the graph.
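Embedding propagation and neighbor aggregation can be sketched together in a few lines. This is a minimal sketch, assuming one propagation round, the message t_ji = e'_j / |N_i| from the text, self-loops, and tanh as the nonlinear activation (the text only says "a nonlinear activation function"). The function name `propagate` and the dictionary-based inputs are hypothetical.

```python
import math

def propagate(neighbors, embeddings):
    """One round of embedding propagation plus average-pooling aggregation.

    neighbors:  dict node -> set of adjacent nodes (undirected graph)
    embeddings: dict node -> list of floats (output of the embedding layer)
    Returns a dict node -> aggregated representation.
    """
    output = {}
    for node, adjacent in neighbors.items():
        n_i = set(adjacent) | {node}           # self-loop: node i is in N_i
        dim = len(embeddings[node])
        # Sum of messages t_ji = e'_j / |N_i| over all j in N_i.
        pooled = [
            sum(embeddings[j][k] for j in n_i) / len(n_i)
            for k in range(dim)
        ]
        # Assumption: tanh is used as the nonlinearity.
        output[node] = [math.tanh(x) for x in pooled]
    return output

toy_neighbors = {"u": {"i"}, "i": {"u"}}
toy_embeddings = {"u": [1.0, 0.0], "i": [0.0, 1.0]}
toy_output = propagate(toy_neighbors, toy_embeddings)
```

After one round, the two connected nodes in the toy graph end up with identical representations, which illustrates how adjacency pulls representations together in the latent space.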


Thanks to the expressive power of embedding propagation and neighbor aggregation, the representations learned by the graph convolutional encoder can effectively model the relationship between nodes and their higher-order neighbors. Intuitively, items at the same price level are likely to be more similar than items at different price levels. In the constructed heterogeneous graph, a price node is linked to all items at that price level, and the encoder ensures that the output representations of these items absorb the price embedding through embedding propagation and neighbor aggregation. The encoder therefore produces item representations with price-aware similarity. Because a category node is connected to all item nodes belonging to that category, category-aware similarity is captured in the same way.

In addition, a user's price awareness is largely reflected in the items the user has interacted with and, through collaborative filtering, in the purchase history of other users. It is therefore important to use items as a bridge between users and price awareness. In the PUP model, the representation of a user is explicitly aggregated from the items the user interacted with, and these items are directly linked to categories and prices. Category nodes and price nodes are thus higher-order neighbors of user nodes, and price awareness is propagated to users through the intermediate item nodes.

From a recommendation point of view, the proposed graph convolutional encoder can capture similarity between any two nodes that are connected by a path. In the classic matrix factorization algorithm, the collaborative filtering effect is captured only implicitly, by optimizing the estimated user-item interactions. In the graph convolutional encoder, by contrast, the collaborative filtering effect is incorporated explicitly by aggregating a node's neighbors. Specifically, similar users who have interacted with the same item are second-order neighbors on the heterogeneous graph.

6.3 Pairwise-interaction based decoder

The heterogeneous graph contains four kinds of nodes, all embedded in a shared latent space. Inspired by the factorization machine (FM), a decoder based on pairwise interaction is adopted to estimate the interaction probability.

A two-branch design is used to estimate the user-item interaction, with the focus on incorporating price into the recommendation. For each branch, a decoder based on pairwise interaction estimates the interaction probability, and the two prediction scores are merged into the final result.

  • 1) The global branch models the price effect on a large scale, focusing on the overall purchasing power of users.
  • 2) The category branch works at a "local" level, where category factors affect the price sensitivity of users.

In the unified heterogeneous graph, users, items, categories, and prices are represented as four types of nodes, so the learned representations of the different node types share the same latent space. The factorization machine factorizes all features in a shared latent space and estimates the interaction by taking the inner product of each pair of feature vectors. Inspired by this, an FM-style decoder is used. Formally, using the same notation as for the encoder in the previous section, the estimated purchase probability between user u and item i of price p and category c can be expressed as:

[Figure: decoder equations for the final score and the two branch scores]

Here, the final prediction combines the results of the two branches, with a hyperparameter balancing the two terms. Note that each branch has its own graph convolutional encoder, so the embeddings used to compute s_global and s_category are different and independent.

In the global branch, the three features user, item, and price are fed into a two-way FM decoder based on pairwise interaction. In this branch, the three inner products capture the user's interest, the user's overall price effect, and the item's price bias. The interaction probability is estimated without the category embedding, so category nodes act only as a regularizer on the graph, pulling items of the same category close to each other. Since category information is hidden during decoding in the global branch, the local, category-related effect of price is pushed out of the learned latent space. The global price effect, which reflects the overall purchasing power and affordability of users, is retained in the latent space by the powerful graph convolutional encoder.
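The two-branch decoder can be sketched as follows. The global branch uses the three inner products named in the text (user-item, user-price, item-price). The form of the category branch (user-category, user-price, category-price), the name `alpha` for the balancing hyperparameter, and its default value are assumptions made for illustration; the function names are hypothetical as well.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def pup_score(global_emb, category_emb, alpha=0.5):
    """Combine the global-branch and category-branch scores.

    global_emb:   dict with keys "user", "item", "price"
    category_emb: dict with keys "user", "category", "price"
    Each branch has its own encoder, so the two dicts hold
    independent embeddings even for the same user and price.
    """
    s_global = (
        dot(global_emb["user"], global_emb["item"])       # user interest
        + dot(global_emb["user"], global_emb["price"])    # overall price effect
        + dot(global_emb["item"], global_emb["price"])    # item price bias
    )
    # Assumed form of the category branch: the item is replaced by its category.
    s_category = (
        dot(category_emb["user"], category_emb["category"])
        + dot(category_emb["user"], category_emb["price"])
        + dot(category_emb["category"], category_emb["price"])
    )
    return s_global + alpha * s_category

toy_global = {"user": [1.0, 0.0], "item": [1.0, 1.0], "price": [0.0, 1.0]}
toy_category = {"user": [1.0, 1.0], "category": [1.0, 0.0], "price": [0.0, 2.0]}
toy_score = pup_score(toy_global, toy_category, alpha=0.5)
```

Setting `alpha` to 0 in this sketch reduces the model to the global branch alone, which is a convenient way to see how much the category-dependent term contributes.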

7. Model training

To train the proposed PUP model, GCN is used in the encoding stage to learn expressive and robust representations of all four types of nodes. In the decoding stage, since predicting user-item interaction is the main task of recommendation, only the user-item edges of the heterogeneous graph are reconstructed, and the item-price and item-category edges are omitted. To learn the user's preference over different items, Bayesian Personalized Ranking (BPR) is used as the loss function. The BPR loss pushes the model to rank positive samples (items the user interacted with) higher than negative samples (items with no observed interaction).
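The BPR objective can be written in a few lines. This is a minimal sketch of the standard pairwise form, -log σ(s_pos - s_neg) averaged over sampled pairs; regularization and the batching used in the paper are left out, and the function name `bpr_loss` is hypothetical.

```python
import math

def bpr_loss(pos_scores, neg_scores):
    """Mean BPR loss over (positive, negative) score pairs.

    Uses -log(sigmoid(x)) = log(1 + exp(-x)) with x = s_pos - s_neg.
    This simple form can overflow for very large negative x; a
    production implementation would use a numerically stable softplus.
    """
    losses = [
        math.log1p(math.exp(-(pos - neg)))
        for pos, neg in zip(pos_scores, neg_scores)
    ]
    return sum(losses) / len(losses)

well_ranked = bpr_loss([2.0], [0.0])    # positive scored above negative
badly_ranked = bpr_loss([0.0], [2.0])   # positive scored below negative
```

The loss is small when the purchased item outscores the sampled negative item and large when the order is reversed, so minimizing it optimizes the ranking rather than the absolute scores.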

Implementation:

  1. The graph convolutional encoder can be implemented efficiently with sparse matrix multiplication;
  2. Since the decoder based on pairwise interaction takes the inner product of each pair of features, its computational complexity is relatively high. However, the trick first introduced for FM in [12] can be used to reduce the computation to linear complexity;
  3. Dropout is an effective way to prevent neural models from overfitting. Dropout is applied at the feature level, meaning that entries of the output representation are randomly dropped with probability p, which is a hyperparameter of the method. With dropout, the PUP method learns more robust node representations on the unified heterogeneous graph;
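The linear-time trick mentioned in item 2 rests on the identity that the sum of all pairwise inner products equals half of (the squared norm of the summed vectors minus the sum of the squared norms). The sketch below compares the naive quadratic computation with the linear one; the function names are hypothetical, and this shows the general FM identity rather than the paper's code.

```python
def pairwise_naive(vectors):
    """Sum of <v_a, v_b> over all pairs a < b: O(n^2 * d)."""
    total = 0.0
    for a in range(len(vectors)):
        for b in range(a + 1, len(vectors)):
            total += sum(x * y for x, y in zip(vectors[a], vectors[b]))
    return total

def pairwise_linear(vectors):
    """Same quantity in O(n * d) via 0.5 * ((sum v)^2 - sum v^2)."""
    total = 0.0
    for k in range(len(vectors[0])):
        column = [v[k] for v in vectors]
        s = sum(column)
        total += 0.5 * (s * s - sum(x * x for x in column))
    return total

# Three feature embeddings, e.g. user, item, and price in the global branch.
features = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
```

With only three features per branch the saving is small, but the identity matters once more feature types are added to the decoder.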

8. Experiment

Data sets:

Two real-world data sets are used for comparison: Yelp and Beibei. The above table summarizes the statistics of these two data sets.

  1. Yelp: The Yelp2018 open data set is used, in which restaurants and shopping centers are treated as items. All sub-categories under the top-level category "restaurant" are selected. In this data set, the price of each restaurant is shown as a number of dollar signs, ranging from 1 to 4, so the number of dollar signs is used directly as the price level. Finally, the 10-core setting is applied, meaning that only users and items with at least 10 interactions are retained;
  2. Beibei: This data set was collected from one of the largest e-commerce platforms in China. All items in it have specific category and price information. Since the price of each item is continuous, it is discretized into 10 price levels using uniform quantization, and the 10-core setting is applied to ensure data quality;
  3. For each data set, the records are first sorted by timestamp; the first 60% are used as the training set, the next 20% as the validation set, and the last 20% as the test set. For each user, items the user has not interacted with are treated as negative samples. Negative sampling is performed to form positive-negative sample pairs for training. To evaluate top-K recommendation, the metrics used are Recall and NDCG.
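The two evaluation metrics can be sketched as below. This is a minimal per-user version with binary relevance, assuming the common definitions of Recall@K and NDCG@K; the function names are hypothetical, and the paper's evaluation script may differ in details such as how users without test items are handled.

```python
import math

def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the user's test items that appear in the top-k list."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)

def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG@k: DCG of the list over DCG of an ideal list."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    ideal_hits = min(k, len(relevant_items))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: the model ranks items 3, 1, 2; the user's test items are 1 and 5.
toy_ranking = [3, 1, 2]
toy_relevant = {1, 5}
```

Recall only counts how many test items are retrieved, while NDCG also rewards placing them near the top of the list, which is why the two are usually reported together.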

Compared methods:

ItemPop, BPR-MF, PaDQ, FM, DeepFM, GC-MC, NGCF

9. Conclusion

In this work, the author emphasized the importance of including prices in recommendations.

To address the two difficulties of incorporating price, namely unstated price awareness and category-dependent influence, the authors propose PUP, a GCN-based method with a two-branch structure specifically designed to separate the global and local effects of price awareness. Extensive experiments on real-world data sets show that PUP improves recommendation performance over existing methods. Further analysis shows how capturing price awareness helps alleviate the cold-start problem. Although the model is designed specifically to model price sensitivity, it generalizes well in terms of feature engineering, and other features can easily be integrated into the proposed method.

As more and more research focuses on price factors from the perspective of service providers, how to extend price-aware recommendation to value-aware recommendation is an interesting and important research topic. In addition, modeling price dynamics is also a very promising direction.


Origin blog.csdn.net/weixin_43901214/article/details/108712626