Application of Large-Scale Heterogeneous Graph Recall in Meituan-to-Store Recommendation Ads

Over a long period of applying graph neural networks, the Meituan recommendation advertising team analyzed the characteristics and challenges of its scenarios, designed models accordingly, and, with the help of large-scale training tools and online deployment optimization, deployed them to production multiple times with measurable gains in online revenue. This article introduces our practical experience with large-scale graph recall technology in Meituan's in-store advertising scenario, covering model design ideas, the model iteration process, large-scale training tools, and online serving performance optimization, in the hope of offering some inspiration to readers working in related areas.

1 Introduction

Meituan's in-store recommendation advertising technology department serves many local life-service businesses such as in-store dining, leisure and parent-child entertainment, and beauty care. Within the system, the recall stage, as the first stage of the recommendation advertising pipeline, is responsible for finding high-quality candidates from a massive pool of items, and is one of the core problems of algorithm optimization.

Recommender systems have two classic recall paradigms: explicit recall based on label-built inverted indexes, and implicit recall based on end-to-end models of user interest. In implicit recall, modeling historical interaction behavior is critical to accurately characterizing user interests. In e-commerce scenarios, the interactions among users, merchants, and items are naturally expressed as a graph. Compared with traditional models, a graph neural network can represent the many kinds of interactions between users and items, exploit the transitivity of high-order graph structure to reasonably enrich user behavior, and fuse heterogeneous information such as user behavior, user profile attributes, and item content attributes in a unified framework, opening up a larger space for improvement.

Meituan's recommendation advertising algorithm team and the knowledge computing team of the NLP center have collaborated closely on applying graph technology to recommendation advertising, achieving significant improvements in online metrics. This article introduces the exploration process and related practical experience.

2. Introduction to Graph Neural Networks

A graph, as a collection of nodes and the edges connecting them, exists widely in real-world scenarios, such as social graphs between people in social networks and interaction graphs between users and items in recommender systems. Graph neural networks can capture node and edge features together with the topology connecting them, and so model graph-structured data well. The graph neural network models commonly used in recommender systems fall into two categories: methods based on graph walks and methods based on graph convolution.

Methods based on graph walks : Traditional neural network models are good at processing data in Euclidean space but struggle with the complex topology contained in graph structure. Early researchers therefore proposed an indirect scheme: sample sequences from graph-structured data with a walking strategy, then apply traditional sequence models; DeepWalk[1] and Node2vec[2] are representative works. As shown in Figure 1 below, this type of method first generates node sequences with a predefined walking strategy, then trains a Skip-Gram model from the NLP field on those sequences to obtain a vector representation of each node.


Figure 1 Walking and training process of DeepWalk model
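The walk-then-train recipe can be sketched in a few lines. Below is a minimal illustration of DeepWalk-style truncated random walks; the Skip-Gram training step is left to an external trainer (e.g., gensim's Word2Vec), and the toy adjacency dict is invented for illustration:

```python
import random

def random_walk(adj, start, walk_length, rng):
    """Generate one truncated random walk from `start` (DeepWalk-style)."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = adj.get(walk[-1], [])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

def generate_walks(adj, walks_per_node, walk_length, seed=0):
    """Sample walks starting from every node; the resulting sequences can be
    fed to a Skip-Gram trainer (e.g. gensim Word2Vec) to learn node vectors."""
    rng = random.Random(seed)
    walks = []
    nodes = list(adj)
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(adj, node, walk_length, rng))
    return walks

# toy user-item click graph flattened into an adjacency dict
adj = {"u1": ["i1", "i2"], "u2": ["i2"], "i1": ["u1"], "i2": ["u1", "u2"]}
walks = generate_walks(adj, walks_per_node=2, walk_length=4)
```

Each walk is a sequence of graph neighbors; treating the walks as "sentences" lets standard Skip-Gram training produce an embedding per node.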

Graph convolution-based methods : Modeling sequences drawn from a graph is simple and direct, but the conversion from the original graph structure to sequences loses information, which limits its effectiveness. How to model the graph structure directly with neural networks therefore became a key question in graph neural network research. Researchers defined a convolution operation on graphs via the Fourier transform of signals in the spectral domain, and through a series of simplifications connected spectral graph convolution with neural networks.

The GCN [3] proposed by Kipf and Welling in 2017 is one of the representative works. Figure 2 shows the evolution from the graph structure to the single-layer GCN formula H' = σ(D̃^(-1/2) Ã D̃^(-1/2) H W), where Ã and D̃ are respectively the adjacency matrix and node degree matrix with self-loops added, H is the feature matrix of the graph nodes, W holds the trainable parameters of the GCN model, σ is the activation function (such as ReLU), and H' is the output of the graph node features after passing through the single-layer GCN network.


Figure 2 Evolution of the formula of the single-layer GCN model
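As a concrete reading of the propagation rule, a single GCN layer can be written in a few lines of NumPy. This is a minimal dense-matrix sketch for a toy graph, not an efficient implementation (a real system would use sparse matrices):

```python
import numpy as np

def gcn_layer(A, H, W):
    """Single GCN layer: H' = ReLU(D^-1/2 · A~ · D^-1/2 · H · W),
    where A~ = A + I adds self-loops and D is A~'s degree matrix."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    # symmetric normalization: entry (i,j) becomes A~_ij / sqrt(d_i * d_j)
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ H @ W, 0.0)  # ReLU activation

# 3-node path graph: 0-1 and 1-2 connected
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
H = np.eye(3)                       # one-hot node features
W = np.random.default_rng(0).normal(size=(3, 2))
H_out = gcn_layer(A, H, W)          # shape (3, 2)
```

Stacking several such layers lets each node's output depend on correspondingly higher-order neighborhoods.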

From the whole-graph perspective, GCN breaks down the barrier between the original graph structure and neural networks, but its huge computational cost makes it hard to apply at large scale. In contrast, GraphSAGE [4] proposes a sampling-based message-passing paradigm from the perspective of individual nodes, making efficient graph neural network computation on large-scale graphs feasible. The name SAGE stands for SAmple and aggreGatE, i.e., sampling and aggregation. Figure 3 below shows GraphSAGE's sample-and-aggregate process: on the left, two layers of samplers sample node A's first-order and second-order neighbors; on the right, the sampled neighbors' features are aggregated through the corresponding aggregation functions to obtain the representation of node A, which can then be used for graph-related tasks such as node classification, link prediction, and graph classification.


Figure 3 Sampling and aggregation process of GraphSage model
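The sample-and-aggregate idea can be sketched as follows. This toy version uses mean aggregation and omits the learned weight matrices and nonlinearity that GraphSAGE applies at each hop, so it only illustrates the recursive sampling structure:

```python
import random
import numpy as np

def sample_neighbors(adj, node, k, rng):
    """Sample k neighbors with replacement (a fixed fan-out per layer)."""
    nbrs = adj[node]
    return [rng.choice(nbrs) for _ in range(k)]

def sage_embed(adj, feats, node, fanouts, rng):
    """Recursive sample-and-aggregate: mean-pool the sampled neighbors'
    representations and concatenate them with the node's own features."""
    if not fanouts:
        return feats[node]
    nbr_reps = [sage_embed(adj, feats, n, fanouts[1:], rng)
                for n in sample_neighbors(adj, node, fanouts[0], rng)]
    return np.concatenate([feats[node], np.mean(nbr_reps, axis=0)])

# toy graph with one-hot features; fanouts [2, 2] samples
# 2 first-order and 2 second-order neighbors per node
adj = {0: [1, 2], 1: [0], 2: [0, 1]}
feats = {i: np.eye(3)[i] for i in range(3)}
emb = sage_embed(adj, feats, 0, fanouts=[2, 2], rng=random.Random(42))
```

Because only a fixed number of neighbors is touched per node, the per-node cost is bounded regardless of the full graph's size, which is what makes the paradigm scale.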

For graph neural network methods based on the message-passing paradigm, such as GraphSAGE, the range of features a central node can aggregate depends on the order of the neighbors it samples. When training this type of graph neural network, besides using the inherent features of nodes as model input, we can also attach an independent trainable embedding to each node, so as to better learn correlations among high-order neighbors.

Beyond the methods mentioned above, the graph neural network field remains a research hotspot. In recent years excellent algorithms such as GAT[5], FastGCN[6], and GIN[7] have emerged continuously, and companies such as Pinterest[8], Alibaba[9], and Tencent[10] have achieved good results with them in large-scale recommendation scenarios.

3. Business scenarios and challenges

On the traffic side, the in-store recommendation advertising business mainly covers scenarios such as feed ads and detail-page ads on both the Meituan and Dianping apps (as shown in Figure 4 below). On the supply side it covers advertiser categories including dining, beauty and medical aesthetics, leisure and entertainment, wedding, and parent-child services, and each category contains candidate types such as merchants, deals, and general commodities.


Figure 4 The main business scenarios of Meituan’s in-store recommendation advertisements: information flow advertisements (left), details page advertisements (right)

The recall model modeling in the business faces the following two challenges:

a. Sparse feedback data within the scene : Traditional sequential behavior modeling relies on user feedback within the same scene to construct positive and negative training samples, but user interactions in recommendation advertising scenes are relatively sparse. By our statistics, more than half of active users had no ad-click behavior in the past 90 days, and more than 40% of advertised items received no clicks in the past month. How to resolve the inaccurate characterization of user interest and the insufficient learning of long-tail items caused by sparse feedback is a major challenge we face.

b. Capturing interests across spatio-temporal scenarios in an LBS business : In the in-store business, users often show completely different preferences at different times and places. For example, a user near the office on a weekday may want a convenient work lunch, while at home on a holiday the same user may look for a fun place to take the kids. However, traditional graph neural networks lack real-time awareness of the request's time and location. How to mine, from the rich information contained in the graph, a candidate set that matches the current spatio-temporal scenario is therefore another big challenge.

In view of these business characteristics and challenges, we designed large-scale heterogeneous graph modeling based on high-order relationships over full-scenario data, using rich cross-scene behavior data to alleviate the sparsity problem, and enhanced the model's awareness of spatio-temporal information to capture users' interests in different spatio-temporal scenarios.

4. Evolution of graph recall technology in recommendation advertising

4.1 Large-Scale Heterogeneous Graph Modeling Based on High-Order Relationships of Full-Scenario Data

The team's previous recall models were trained only on positive and negative samples constructed from user behavior within the advertising scene. This improved the consistency between training data and the prediction scene, but inevitably led to problems such as inaccurate characterization of user interest and poor recommendation of long-tail items. Moreover, recall, as the most upstream stage of the recommendation system, determines the upper bound of what downstream optimization can achieve. We therefore expect to use the expressive power of graph neural networks to comprehensively characterize user interest and item information based on user behavior across the full scenario.

As shown in Figure 5, the graph network produces embeddings for users (User) and items (Item), and measures a user's potential interest in a candidate ad by embedding similarity. For the graph neural network we chose GAT[5] with its attention structure, so that each neighbor's contribution can be adaptively adjusted according to its importance to the source node, suppressing noise caused by mis-clicks; we also use the Jumping Knowledge Network [11], which self-adjusts each node's aggregation range according to its connectivity, preventing popular nodes from losing personalized information due to an overly broad aggregation range.


Figure 5 Graph modeling based on multi-order relationships of full-scenario data
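The attention-weighted aggregation that GAT performs for one node can be sketched in NumPy. This is a minimal single-head version with toy parameter shapes, only to show how attention lets low-importance (e.g., mis-clicked) neighbors receive small weights:

```python
import numpy as np

def gat_aggregate(h_src, h_nbrs, W, a):
    """Weight each neighbor's contribution by an attention score
    e_ij = LeakyReLU(a · [W h_i || W h_j]), softmax-normalized over the
    neighborhood, so noisy neighbors can be down-weighted."""
    z_src = W @ h_src                            # (out_dim,)
    z_nbrs = h_nbrs @ W.T                        # (n_nbrs, out_dim)
    scores = np.array([a @ np.concatenate([z_src, z]) for z in z_nbrs])
    scores = np.where(scores > 0, scores, 0.2 * scores)   # LeakyReLU
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                         # softmax over neighbors
    return alpha @ z_nbrs                        # attention-weighted sum

rng = np.random.default_rng(0)
h_src, h_nbrs = rng.normal(size=4), rng.normal(size=(3, 4))
W, a = rng.normal(size=(2, 4)), rng.normal(size=4)
out = gat_aggregate(h_src, h_nbrs, W, a)         # shape (2,)
```

A production GAT uses multiple heads and learned parameters; the point here is only the adaptive neighbor weighting.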

Full-scenario data modeling : To fully mine users' interests and preferences, we built a very large-scale heterogeneous graph from full-scenario behavior data. "Full scenario" here covers all businesses (search, recommendation, advertising), all placements (home page, item detail page, deal detail page), and all item types (merchants, deals, general commodities, etc.). The heterogeneous graph contains two node types, User and Item, connected by three edge types: User-clicks-Item, Item-co-clicked-with-Item, and Item-in-the-same-store-as-Item.

To promote effective transfer of the rich full-scenario information across scenes while still distinguishing users' unique interests in the advertising scene, we model the same item in the advertising and non-advertising scenes as different nodes during graph construction: both share the same non-advertising features, while the node carrying the advertising tag gets additional ad-specific features. In this way, during training the model can both transfer information from non-advertising scenes through the shared features and learn the user's unique preferences in the advertising scene. The constructed graph contains hundreds of millions of nodes and tens of billions of edges.


Figure 6 The whole scene graph construction process
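The per-scene node scheme can be illustrated with a small sketch. The node-key format and feature names below are our own illustrative assumptions, not the production schema:

```python
def build_item_nodes(item_id, base_feats, ad_feats=None):
    """Model the same item as separate graph nodes per scene: both nodes
    share the non-ad base features, while the ad-scene node carries extra
    ad-specific features (key prefixes and feature names are hypothetical)."""
    nodes = {f"item:{item_id}": dict(base_feats)}
    if ad_feats is not None:
        nodes[f"ad_item:{item_id}"] = {**base_feats, **ad_feats}
    return nodes

nodes = build_item_nodes(
    "i42",
    base_feats={"category": "hotpot", "city": "beijing"},
    ad_feats={"bid": 2.5, "ad_quality": 0.8},
)
```

Because the base features are shared, gradients flowing through either node update the same underlying representations, which is how non-ad behavior informs the ad-scene node.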

Graph cropping and noise suppression : The heterogeneous graph above covers users' full-scenario behavior, and its sheer scale poses serious compute and performance challenges for deployment. We found the degree distribution of the graph to be extremely uneven, with some popular nodes having hundreds of thousands of neighbors. Since each node samples only a fixed number of neighbors during training, excessive neighbors introduce a lot of noisy data as well as unnecessary resource overhead. Based on the business semantics behind the graph data, we therefore crop the original topology appropriately.

Specifically: for User-clicks-Item edges, we keep the top-N outbound edges with the most recent behavior time; for Item-co-clicked edges, we keep the top-N outbound edges with the highest edge weight. After cropping, the number of nodes is unchanged, the number of edges drops by 46%, training memory overhead drops by 30%, and offline Hitrate improves by about 0.68%.


Figure 7 Graph cropping example (a > b > c in the design)
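The per-node top-N cropping rule is simple to express. The edge record layout below is an illustrative assumption:

```python
def crop_edges(edges, top_n, key):
    """Keep only the top_n outbound edges per source node, ranked by `key`
    (recency for user->item click edges, weight for item co-click edges)."""
    by_src = {}
    for e in edges:
        by_src.setdefault(e["src"], []).append(e)
    kept = []
    for outs in by_src.values():
        kept.extend(sorted(outs, key=key, reverse=True)[:top_n])
    return kept

click_edges = [
    {"src": "u1", "dst": "i1", "ts": 100},
    {"src": "u1", "dst": "i2", "ts": 300},
    {"src": "u1", "dst": "i3", "ts": 200},
]
# keep the 2 most recent clicks per user
pruned = crop_edges(click_edges, top_n=2, key=lambda e: e["ts"])
```

For co-click edges the same function applies with `key=lambda e: e["weight"]`.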

Dynamic negative sampling : Because advertisers account for only a small fraction of all merchants, introducing full-scenario behavior data enlarges the training sample space by an order of magnitude and further exacerbates the SSB (Sample Selection Bias) problem, making the negative sampling strategy a key factor in model effectiveness. Plain random negative sampling yields too few hard negatives, so the model generalizes poorly at prediction time. Static negative sampling strategies, such as constructing negatives by distance and category as is common in LBS scenarios, can bring some improvement, but they generalize poorly, require cumbersome policy configuration, and cannot migrate or adapt iteratively to user interest.

Taking cities of different tiers as an example, users have different distance and category preferences, so different thresholds would have to be set for each. We therefore propose an iterative training paradigm based on semi-supervised learning: the merchant embeddings output by the previous round of the model are clustered with KMeans, hard negatives are sampled from the cluster containing each positive sample and added to the next round's training samples, and the cycle repeats, guiding the model to continuously improve itself.

Experiments show that as iteration rounds increase, the marginal gain in offline metrics narrows; balancing training speed against revenue, we use two iterations online. Compared with random negative sampling, this optimization improves offline Hitrate by about 4.66%; compared with the static negative sampling strategy (e.g., sampling by distance and category), it improves offline Hitrate by about 1.63%.


Figure 8 Dynamic negative sample sampling process
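One round of the iterative hard-negative mining can be sketched as follows, with a tiny hand-rolled KMeans standing in for a production clustering library; all sizes are toy values:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Tiny Lloyd's algorithm; a production system would use a library."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def sample_hard_negatives(item_emb, pos_idx, k, n_neg, seed=0):
    """Cluster the previous round's item embeddings, then draw negatives
    from the cluster containing the positive item (excluding the positive
    itself), so each round's negatives get harder as the model improves."""
    labels = kmeans(item_emb, k, seed=seed)
    pool = np.where(labels == labels[pos_idx])[0]
    pool = pool[pool != pos_idx]
    if len(pool) == 0:
        return np.array([], dtype=int)
    rng = np.random.default_rng(seed)
    return rng.choice(pool, size=min(n_neg, len(pool)), replace=False)

# toy embeddings: two well-separated groups of items
rng = np.random.default_rng(1)
emb = np.vstack([rng.normal(0, 0.1, (10, 4)), rng.normal(5, 0.1, (10, 4))])
negs = sample_hard_negatives(emb, pos_idx=0, k=2, n_neg=3)
```

The sampled indices feed the next training round; repeating the cluster-and-sample step implements the "self-improvement" loop described above.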

The iterations of the above three optimizations have been deployed on multiple main ad placements, improving the RPS (Revenue Per Search) metric, which measures advertising revenue, by about 5% to 10%.

4.2 End-to-end Heterogeneous Graph Modeling for Enhanced Spatiotemporal Information Awareness

In an LBS business, spatio-temporal information is an important factor shaping user interest. Users usually hold stable long-term interests but also exhibit changeable short-term interests driven by the current time and place. We therefore upgraded the full-scenario heterogeneous graph modeling introduced in Section 4.1: for stable long-term interest and changeable short-term interest, we take targeted measures to model the influence of spatio-temporal information on each.

As shown in Figure 9 below, we describe users' long-term interest preferences across spatio-temporal scenarios through spatio-temporal subgraphs, and describe the evolution of users' short-term interest through sequence modeling with multi-factor collaborative activation. Notably, unlike two-stage schemes that introduce pre-trained heterogeneous graph embeddings as static features, we train every part of the model end-to-end in one stage under the same optimization objective, avoiding the effect loss caused by inconsistent objectives.


Figure 9 End-to-end heterogeneous graph modeling for enhanced spatio-temporal information perception

Spatio-temporal subgraph construction and multi-view fusion : Users show different interests at different times and places; for example, a user may order coffee at the office on weekdays and work out at the gym on days off. Extracting only a global interest with a graph model easily loses these differences: the traditional graph model scheme yields one unified interest representation from global information and cannot accurately describe a user's interest across spatio-temporal scenarios.

There is existing research on graph representation learning combined with spatio-temporal information, such as STGCN [12]. Building on such work and starting from our recommendation advertising business, we use the time and location attached to user behavior to construct subgraphs from four views: time, space, time & space, and global, and obtain users' long-term interest through a multi-view fusion module. Notably, all subgraphs share the Item2Item edges, because relationships between items (same store, co-clicks, etc.) are relatively stable and not easily affected by temporal and spatial changes.

As shown in Figure 10 below, when a user request arrives, we obtain the user's interest at the current location from the spatial subgraph, the user's interest across time periods from the temporal subgraph, and the user's multi-period interest at the current location from the time & space subgraph, then combine them with global interest and the current time in multi-view fusion. In practice we divide the day into four periods (morning, afternoon, evening, late night) and use Geohash to divide locations into geographic regions. By our statistics, each user's historical behavior is concentrated in a few time periods and regions, so this does not put excessive pressure on storage. Spatio-temporal subgraph construction and fusion brought about a 3.65% offline Hitrate improvement.


Figure 10 Multi-view Fusion
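The bucketing and fusion steps can be sketched as follows. The grid quantization standing in for Geohash, the fixed fusion weights, and the toy interest vectors are illustrative assumptions (the production model learns the fusion):

```python
import numpy as np

def time_period(hour):
    """Bucket the request hour into the four periods used by the time subgraph."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 23:
        return "evening"
    return "late_night"

def geo_cell(lat, lng, precision=1):
    """Stand-in for Geohash bucketing: quantize lat/lng to a coarse grid."""
    return (round(lat, precision), round(lng, precision))

def fuse_views(view_embs, weights):
    """Fuse per-view user interest vectors (global / time / space / time&space)
    with softmax-normalized weights; a learned attention over the current
    request context would replace the fixed weights in the real model."""
    w = np.exp(weights - np.max(weights))
    w /= w.sum()
    return sum(wi * e for wi, e in zip(w, view_embs))

# interest vector looked up under each view for the current request
views = {
    "global": np.array([1.0, 0.0]),
    "time": np.array([0.0, 1.0]),        # e.g. keyed by time_period(20)
    "space": np.array([0.5, 0.5]),       # e.g. keyed by geo_cell(39.98, 116.32)
    "time_space": np.array([0.2, 0.8]),
}
fused = fuse_views(list(views.values()), weights=np.array([1.0, 2.0, 1.0, 2.0]))
```

Because the softmax weights are convex, the fused vector stays within the span of the per-view interests while emphasizing the views that match the current request.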

User sequence modeling with multi-factor collaborative activation : We use time information (the difference between the current time and each behavior's time) and location information (the distance between the current location and each behavior's location) as activation factors over the short-term behavior sequence, capturing how user interest migrates over time and space. In addition, the user's long-term interest vector output by the graph neural network reflects relatively stable preferences over time, location, and other dimensions, and also helps extract from the short-term sequence the real-time interests that match the current spatio-temporal scenario. Activating the short-term behavior sequence with both spatio-temporal information and long-term interest involves collaborative activation by multiple factors; common industry solutions are shown in Figure 11 below:


Figure 11 Synergistic activation of multiple factors

In Meituan's LBS business scenario, the activation factors can interact with one another; for example, time and location place different emphases on activating the behavior sequence. To get the best multi-factor activation effect, we chose the "multi-factor fusion activation" mode based on offline metrics. User sequence modeling with multi-factor collaborative activation brought about a 6.90% offline Hitrate improvement.
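A minimal sketch of fused multi-factor activation over a short-term sequence is below, with simple additive score fusion standing in for the learned fusion of the production model; the weights and deltas are toy values:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def activate_sequence(seq_embs, time_deltas, dist_deltas, long_term_emb,
                      w_time=1.0, w_dist=1.0):
    """Activate the short-term behavior sequence with multiple factors:
    recency (time delta), proximity (distance delta), and similarity to the
    long-term interest vector from the graph model. Scores are fused
    additively here; the production model learns the fusion."""
    recency = -w_time * np.asarray(time_deltas)      # more recent => higher
    proximity = -w_dist * np.asarray(dist_deltas)    # closer => higher
    affinity = seq_embs @ long_term_emb              # match long-term interest
    alpha = softmax(recency + proximity + affinity)
    return alpha @ seq_embs                          # activated sequence summary

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 8))                        # 5 recent behaviors
summary = activate_sequence(seq,
                            time_deltas=[0.1, 2.0, 5.0, 9.0, 24.0],  # hours
                            dist_deltas=[0.2, 1.0, 3.0, 0.5, 10.0],  # km
                            long_term_emb=rng.normal(size=8))
```

Behaviors that are recent, nearby, and consistent with long-term interest dominate the summary, which is the migration-of-interest effect the factors are meant to capture.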

It is worth mentioning that the multi-order relationships mined by the graph neural network can enrich the expression of user sequences. These relationships exist not only between coarse-grained nodes such as item-item and user-item, but also between fine-grained features such as time, location, and category. We therefore upgraded the feature pipeline so that item nodes in the graph neural network share the embedding dictionary with the user behavior sequence at the feature level; end-to-end training under a unified optimization objective helps this fine-grained multi-order information transfer better between the graph neural network and the user sequence.

The iterations of the above two optimizations have been deployed on multiple main ad placements, improving the RPS (Revenue Per Search) metric, which measures advertising revenue, by about 5%.

5. Performance optimization and application

To go live in large-scale scenarios and perform real-time recall, we optimized both the offline training and the online deployment of the model.


Figure 12 Performance optimization and application

A large-scale graph neural network training framework adapted to LBS scenarios : As graph neural networks have spread through industry, many excellent open-source training frameworks have emerged, such as Euler and DGL. Building on open-source frameworks and integrating with the company's internal big-data and machine-learning platforms, we developed a large-scale graph neural network training framework suited to LBS scenarios. The framework supports graph-building operations such as large-scale graph construction and feature extraction, and additionally supports common LBS graph neural network operations such as dynamic sampling of location information. With this framework we have deployed models in multiple business scenarios; the largest is a graph neural network with hundreds of millions of nodes, tens of billions of edges, and side information.

Low-latency online computing : The recall stage is the first funnel of the advertising recommendation system and must select a high-quality subset from the full candidate ad pool within a limited time budget. Given the serious latency challenges of online operations such as subgraph lookup and graph convolution, we propose a low-latency online computing scheme. In the model introduced in Section 4.2, the graph model part mainly represents users' long-term interests and is not affected by real-time behavior or request information; we therefore compute graph node embeddings offline and store them in a KV table, preventing online inference of the graph model from becoming a latency bottleneck. Meanwhile, graph node embedding lookup is performed in parallel with the extraction of other features during online requests. In practice, after these optimizations the recall stage's added online latency is under 2%.
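The offline-embedding-plus-KV serving split can be sketched as follows. A Python dict stands in for the production KV store, and brute-force dot-product scoring stands in for the approximate-nearest-neighbor index a real serving system would use:

```python
import numpy as np

# Offline: graph node embeddings are computed in batch and written to a KV
# store; a plain dict stands in for the production KV service here.
def build_kv(user_ids, item_ids, user_embs, item_embs):
    return ({u: e for u, e in zip(user_ids, user_embs)},
            {i: e for i, e in zip(item_ids, item_embs)})

# Online: a request only does a KV lookup plus a similarity top-k,
# avoiding any graph convolution in the serving path.
def recall_topk(user_id, user_kv, item_kv, k):
    u = user_kv[user_id]
    items = list(item_kv)
    scores = np.array([item_kv[i] @ u for i in items])  # dot-product similarity
    top = np.argsort(-scores)[:k]
    return [items[j] for j in top]

rng = np.random.default_rng(0)
user_kv, item_kv = build_kv(["u1"], [f"i{j}" for j in range(100)],
                            rng.normal(size=(1, 16)),
                            rng.normal(size=(100, 16)))
candidates = recall_topk("u1", user_kv, item_kv, k=10)
```

The split works precisely because the graph part encodes long-term interest: its embeddings change slowly, so precomputing them offline loses little freshness.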

6. Summary and Outlook

Graph neural networks model graph-structured data well, can make full use of the high-order neighbor information of graph nodes, and show great potential in the recall module of large-scale recommendation systems. Leading companies in the industry have practiced graph models adapted to their respective businesses [8][9][10].

This article introduced the application of large-scale graph recall technology in Meituan's in-store recommendation advertising. Based on an analysis of the characteristics of the in-store recommendation advertising scenario, we made targeted optimizations when deploying graph recall. On the model side, to address sparse advertising feedback data, we incorporated full-scenario data into the graph model to enrich the expression of user interest; combined with graph cropping and dynamic negative sampling, this improved Hitrate by a cumulative 5.34% or so. For awareness of dynamic LBS information such as time and location, we use the spatio-temporal subgraph module to describe users' interests across times and places, with multi-view fusion and long- and short-term sequence fusion, for a cumulative improvement of about 10.55%. Together with offline training and online computing performance optimization, we successfully deployed on multiple main ad placements, with online RPS up 10%~15%.

In the future, we will continue to explore in the following technical directions:

1. Multi-scenario knowledge transfer

There are many in-store advertising scenarios, and maintaining a separate graph recall model for each ad placement is costly. Multi-scenario joint training can enrich the graph data and improve the characterization of user interest, while allowing a single graph recall model to serve different placements and reducing maintenance cost. However, user behavior differs across placements, and improper data fusion may introduce noise and harm training. How to model both the commonalities and the differences of user behavior across placements is a key design consideration.

2. Dynamic graph technology

User interests are constantly changing over time and space. The dynamic graph model can build dynamic information such as time and space into the graph structure. Compared with artificially dividing long-term interests and short-term interests, dynamic graphs can more flexibly perceive changes in user interests and better suit the characteristics of LBS services.

7. About the author

  • Qi Yu, Li Gen, Shaohua, Zhang Teng, Cheng Jia, and Lei Jun are from Meituan Daodian Business Group/Advertising Platform Technology Department.

  • Xiangzhou, Mengdi, and Wuwei are from the NLP Center of the Meituan Platform/Search Recommendation Algorithm Department.

8. References

[1] Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. "Deepwalk: Online learning of social representations." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014.

[2] Grover, Aditya, and Jure Leskovec. "node2vec: Scalable feature learning for networks." Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 2016.

[3] Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." International Conference on Learning Representations. ICLR, 2017.

[4] Hamilton, Will, Zhitao Ying, and Jure Leskovec. "Inductive representation learning on large graphs." Advances in neural information processing systems 30 (2017).

[5] Velickovic, Petar, et al. "Graph attention networks." International Conference on Learning Representations. 2018.

[6] Chen, Jie, Tengfei Ma, and Cao Xiao. "FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling." International Conference on Learning Representations. 2018.

[7] Xu, Keyulu, et al. "How powerful are graph neural networks?" International Conference on Learning Representations. ICLR, 2019.

[8] Ying, Rex, et al. "Graph convolutional neural networks for web-scale recommender systems." Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 2018.

[9] Wang, Menghan, et al. "M2GRL: A multi-task multi-view graph representation learning framework for web-scale recommender systems." Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining. 2020.

[10] Xie, Ruobing, et al. "Improving accuracy and diversity in matching of recommendation with diversified preference network." IEEE Transactions on Big Data (2021).

[11] Xu, Keyulu, et al. "Representation learning on graphs with jumping knowledge networks." International conference on machine learning. PMLR, 2018.

[12] Han, Haoyu, et al. "STGCN: a spatial-temporal aware graph learning method for POI recommendation." 2020 IEEE International Conference on Data Mining (ICDM). IEEE, 2020.

----------  END  ----------

Team Profile

Meituan's in-store advertising algorithm team is responsible for the advertising algorithms of in-store businesses, continuously improving the monetization efficiency of commercial traffic while safeguarding user experience and advertisers' ROI. Its main technical directions include trigger strategy, quality estimation, mechanism design, creative generation, creative optimization, anti-cheating, and merchant strategy. The team has a strong technical culture, driving business growth through continuous breakthroughs in cutting-edge technology; it values talent development and has a complete, mature training mechanism to help members grow quickly.

Meituan scientific research cooperation

Meituan's scientific research cooperation is committed to building a bridge and platform between Meituan's technical teams and universities, research institutions, and think tanks. Relying on Meituan's rich business scenarios, data resources, and real industrial problems, it pursues open innovation in fields including robotics, artificial intelligence, big data, the Internet of Things, autonomous driving, and operations optimization, jointly exploring cutting-edge technologies and macro industry issues, promoting industry-academia-research cooperation, exchange, and the transformation of research results, and fostering outstanding talent. We look forward to cooperating with more teachers and students from universities and research institutes; interested teachers and students are welcome to email [email protected].


Origin blog.csdn.net/MeituanTech/article/details/128030368