KDD Cup 2020 automated graph learning competition: champion technical solution and its practice in Meituan advertising

ACM SIGKDD (the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD for short) is the top international conference in the field of data mining.

Based on its own business scenarios, the search advertising algorithm team of Meituan's to-store advertising platform has been continuously optimizing and innovating with cutting-edge technologies. Team members Hu Ke, Qi Yi, Qu Tan, Ming Jian, Bo Hang, and Lei Jun, together with Tang Xingyuan from the University of Chinese Academy of Sciences, formed the team Aister and entered three tracks: Debiasing, AutoGraph, and Multimodalities Recall. The team won the championship in the Debiasing track (1/1895) and the AutoGraph track (1/149), and took third place in the Multimodalities Recall track (3/1433).

This article introduces the technical solution for the AutoGraph track, as well as the team's application and research of graph representation learning in the advertising system. We hope it is helpful or inspiring.

Background

ACM SIGKDD (the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD for short) is the top international conference in the field of data mining. The KDD Cup is a top international event in data mining research hosted by SIGKDD. Held annually since 1997, it is currently the most influential competition in the field. The competition is oriented to both industry and academia, gathering the world's top data mining experts, scholars, engineers, and students, and providing practitioners with a platform for academic exchange and for showcasing research results. KDD Cup 2020 featured five competition problems (four tracks), covering data bias (Debiasing), multimodalities recall (Multimodalities Recall), automated graph learning (AutoGraph), adversarial learning, and reinforcement learning.

Figure 1 KDD 2020 conference

 

In recent years, Graph Neural Networks (GNNs) have been widely applied in fields such as advertising systems, social networks, knowledge graphs, and even the life sciences. An advertising system contains rich structural relationships such as User-Ad, Query-Ad, Ad-Ad, and Query-Query. The search advertising algorithm team has successfully applied graph representation learning to the advertising system, with clear gains in business metrics. Building on this accumulated experience with graph learning in the advertising system, the team won first place in this year's KDD Cup AutoGraph track. This article introduces the technical solution for the AutoGraph track, as well as the team's application and research of graph representation learning in the advertising system. We hope it is helpful or enlightening for those engaged in related work.

Figure 2 KDD Cup 2020 AutoGraph competition TOP 5 list

 

Competition introduction and problem analysis

Overview of the AutoGraph problem

The Automated Graph Representation Learning Challenge (AutoGraph) is the first AutoML challenge ever applied to graph-structured data, combining the two cutting-edge fields of AutoML and graph learning. The AutoML track of KDD Cup 2020 is provided by 4Paradigm, ChaLearn, Stanford University, and Google.

Graph-structured data is ubiquitous in the real world, for example in social networks, paper networks, and knowledge graphs. Graph representation learning has long been a very popular topic. Its goal is to learn a low-dimensional representation of each node in the graph, which can then be used for downstream tasks such as friend recommendation in social networks, or classifying academic papers in a citation network into different topics. Traditional methods generally use heuristics to extract features for each node from the graph, such as degree statistics or random-walk-based similarities. In recent years, many sophisticated models have been proposed for graph representation learning, such as graph neural networks (GNN) [1], which have driven new state-of-the-art results on many tasks (such as node classification and link prediction).

However, both traditional heuristic methods and recent GNN-based methods require substantial computational resources and domain expertise to obtain satisfactory performance. For example, DeepWalk [2] and Node2Vec [3], two well-known random-walk-based methods, require careful tuning of hyperparameters such as the walk length, the number of walks per node, and the window size. When using GNN models such as GraphSAGE [4] or GAT [5], we must spend considerable time selecting the best aggregation function in GraphSAGE or the number of attention heads in GAT. Human experts therefore spend a great deal of time and effort on parameter tuning, which limits the application of existing graph representation models.

AutoML [6] is an effective way to reduce the human cost of applying machine learning and has achieved encouraging results in hyperparameter tuning, model selection, neural architecture search, and feature engineering. To enable more people and organizations to make full use of their graph-structured data, the KDD Cup 2020 AutoML track hosted the AutoGraph competition. Participants were asked to design a solution that solves the graph learning problem automatically (without any human intervention): it should effectively and efficiently learn a high-quality representation of each node from the graph's features, neighborhood, and structural information, and should automatically extract and exploit any useful signal in the graph.

Targeting the cutting-edge field of automated graph learning, the AutoGraph competition uses graph node multi-class classification tasks to evaluate the quality of representation learning. The organizers prepared 15 graph datasets: 5 were available for download so that participants could develop their solutions offline; another 5 feedback datasets were used to compute the public leaderboard score of each AutoGraph solution; and finally, without any manual intervention, each final submission was evaluated on the remaining 5 datasets, which stayed invisible to participants, with this evaluation determining the final ranking. The datasets were collected from real businesses and randomly split into training and test sets. Each dataset provides node ids and node features, edges and edge weights, and a time budget. Participants had to design an automated graph learning solution that classifies the nodes of every dataset within the given time budget and memory constraints. Each dataset is scored by classification accuracy, which determines the per-dataset ranking; the final ranking is the average ranking over the last 5 datasets.

Data analysis and problem understanding

We analyzed the five offline graph datasets and found that the graph types are diverse, as shown in Table 1 below. From the average degree, offline graphs 3 and 4 are denser, while graphs 1, 2, and 5 are sparser. From the number of features, graphs 1, 2, 3, and 4 have node features, while graph 5 does not. We also found that graph 4 is directed while graphs 1, 2, 3, and 5 are undirected. We therefore categorize the graphs as directed/undirected, dense/sparse, and with/without node features.

From Table 1, we can also see that most of the datasets have a time budget of about 100 seconds, which is very short. Most neural architecture and hyperparameter search schemes [7,8,9,10] require long search times, on the order of tens of hours or even days. Therefore, unlike conventional neural architecture search, we need a fast search scheme for both structures and hyperparameters.

Table 1 Overview of five offline graph data sets

As shown in Figure 3, we found that model training is unstable on dataset 5: the validation accuracy drops sharply at a certain epoch. We believe this is mainly because dataset 5 is easy to fit, so overfitting occurs. We therefore need to ensure model robustness throughout the automated modeling process.

Figure 3 Model instability during training

At the same time, Figure 4 below shows that, unlike traditional data mining competitions with a fixed dataset, ensuring ranking stability across multiple datasets with large distribution differences matters more than squeezing out accuracy on any single dataset. For example, a model accuracy gap of only 0.15% on dataset 5 results in a ten-place ranking difference, while an accuracy gap of 1.6% on dataset 3 leads to only a seven-place difference. Therefore, we need a robust modeling approach that stabilizes our ranking across datasets.

Figure 4 Accuracy and ranking of different participating teams on different data sets

Problem challenges

Based on the above data analysis, this competition presents the following three challenges:

  • Diversity of graph data : the solution should perform well on many different graph structures. Graph types are numerous, including directed/undirected graphs, dense/sparse graphs, and graphs with/without node features.

  • Ultra-short time budget : most datasets have a time limit of around 100s, so searching graph structures and hyperparameters must be fast.

  • Robustness : in the field of AutoML, robustness is critical. The final submission must perform automated modeling on datasets the contestants have never seen before.

Competition technical solution

To address these three challenges, we designed an automated graph learning framework, shown in Figure 5 below. We preprocess the input graph and construct graph features. To handle graph diversity, we combine multiple graph neural networks, each with its own strengths on different types of graphs. To handle the ultra-short time budget, we adopt a fast search method for GNN structures and hyperparameters, using a smaller search space and fewer training epochs for faster search. To handle the robustness challenge, we designed a multi-level robust model fusion strategy. As a result, our automated graph learning solution can perform node classification on graphs with different structures within a short time while remaining robust. We introduce the full solution in detail below.

Figure 5 Automated graph learning framework

Data preprocessing and feature construction

Directed graph processing : most spectral-domain GNN methods cannot handle directed graphs well. Their theory relies on the spectral decomposition of the graph Laplacian, but the adjacency matrix of a directed graph is generally asymmetric, so the Laplacian and its spectral decomposition cannot be defined directly. In particular, when a node has only in-degree and no out-degree, methods such as GCN cannot obtain its neighbors' information. Since the competition focuses on node classification rather than tasks such as link prediction, what matters most is extracting neighbor information effectively. We therefore convert each directed edge into an undirected one, with the new undirected edge taking the weight of the original (reversed) directed edge.
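For illustration, one way to implement this kind of symmetrization with SciPy (function and variable names are ours, not the competition code; when both directions of an edge already exist with different weights, we arbitrarily keep the larger one):

```python
import scipy.sparse as sp

def to_undirected(edge_src, edge_dst, edge_weight, num_nodes):
    """Symmetrize a directed graph: every directed edge (u, v, w) also
    contributes a reverse edge (v, u, w), so spectral GNNs such as GCN
    can propagate information to nodes that only have in-degree."""
    a = sp.coo_matrix((edge_weight, (edge_src, edge_dst)),
                      shape=(num_nodes, num_nodes))
    # A + A^T would double weights on edges present in both directions;
    # the element-wise maximum leaves those weights unchanged.
    a_sym = a.maximum(a.T).tocoo()
    return a_sym.row, a_sym.col, a_sym.data
```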

Feature extraction : to enable more effective node representation learning, we extract some hand-crafted graph features for the GNNs to build on, such as node degree and the feature averages of first- and second-order neighbors. Features with a wide numeric range are bucketized and then embedded, which keeps the values stable while avoiding overfitting.
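A hypothetical sketch of this kind of feature construction (names and the bucketing scheme are ours, not the competition code):

```python
import numpy as np
import scipy.sparse as sp

def build_graph_features(adj, x, n_buckets=20):
    """adj: (N, N) sparse adjacency; x: (N, F) raw node features.
    Returns degree buckets plus 1st/2nd-order neighbor feature means."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    # Row-normalizing the adjacency turns A @ X into a neighbor mean.
    inv_deg = 1.0 / np.maximum(deg, 1.0)
    a_norm = sp.diags(inv_deg) @ adj
    first_order = a_norm @ x              # mean of 1-hop neighbor features
    second_order = a_norm @ first_order   # mean over the 2-hop neighborhood
    # Bucketize the wide-range degree values so they can be embedded
    # instead of fed in as raw magnitudes (guards against overfitting).
    log_deg = np.log1p(deg)
    bins = np.linspace(0.0, log_deg.max() + 1e-6, n_buckets)
    deg_bucket = np.digitize(log_deg, bins)
    return deg_bucket, first_order, second_order
```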

Graph neural network model

To address graph diversity, we combined spectral-domain and spatial-domain graph neural networks, adopting four models: GCN [11], TAGConv [12], GraphSAGE [4], and GAT [5]. Each model has its own strengths on different types of graph data, allowing better representation learning across datasets.

Graphs are non-Euclidean data: each node has a variable, unordered set of neighbors, so a convolution kernel is hard to define directly. Spectral-domain methods use the spectral decomposition of the graph Laplacian and apply a Fourier transform on the graph to derive the graph convolution. GCN is the classic spectral method; one layer computes

H^(l+1) = σ( D̂^(-1/2) Â D̂^(-1/2) H^(l) W^(l) ),  with Â = A + I and D̂_ii = Σ_j Â_ij,

where A is the graph's adjacency matrix and the self-loop added to every node lets the convolution see the node's own information; Â and D̂ are the adjacency and diagonal degree matrices after the self-loops are added. After the Fourier transform, a first-order Chebyshev expansion approximates the spectral convolution, so each convolutional layer only processes first-order neighborhood information, and multi-hop propagation is achieved by stacking layers. GCN is simple and effective; we applied it to all datasets and achieved good results on most of them.
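As an illustration (not the competition code), a single GCN layer is only a few lines in PyTorch, assuming the normalized matrix D̂^(-1/2) Â D̂^(-1/2) has been precomputed as a sparse tensor:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN convolution: H' = ReLU(A_hat @ H @ W), where a_hat is the
    precomputed sparse tensor D^-1/2 (A + I) D^-1/2."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, h):
        # Linear transform first, then propagate over the
        # self-loop-augmented first-order neighborhood.
        return torch.relu(torch.sparse.mm(a_hat, self.linear(h)))
```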

Compared with GCN, which stacks multiple layers to obtain multi-hop neighborhood information, TAGConv obtains it through a polynomial over the adjacency matrix:

H' = Σ_{k=0..K} ( D^(-1/2) A D^(-1/2) )^k X W_k.

By precomputing the k-th powers of the normalized adjacency matrix, TAGConv computes the convolutions over all neighborhood orders in parallel during training, and the higher-order results do not depend on the lower-order ones, which accelerates learning on high-order neighborhoods. In our experiments it converged quickly on sparse graphs and achieved better results than GCN there.
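The parallelism is easy to see in code. A hypothetical sketch of the TAGConv-style precomputation, where a_norm stands for the normalized adjacency D^(-1/2) A D^(-1/2):

```python
import torch

def tag_features(a_norm, x, k=3):
    """Precompute [X, A X, A^2 X, ..., A^k X] once; all neighborhood
    orders are then available in parallel, instead of waiting for
    stacked layers to propagate them step by step."""
    feats = [x]
    for _ in range(k):
        feats.append(torch.sparse.mm(a_norm, feats[-1]))
    return torch.cat(feats, dim=1)   # shape (N, (k+1) * F)

# A single linear layer over this concatenation implements
# sum_k (A^k X) W_k, i.e. one TAGConv-style layer.
```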

In contrast to spectral methods, which design the convolution kernel via the Fourier transform, spatial-domain methods directly aggregate the information of neighbor nodes; the difficulty lies in designing a parameterized, learnable convolution kernel. GraphSAGE proposes a classic spatial learning framework that introduces learnable kernels through graph sampling and aggregation. Its core idea is to sample a fixed number of neighbors for each node, which makes many aggregation functions possible. With the mean aggregator, one layer can be written as

h_v^(l+1) = σ( W^(l) · CONCAT( h_v^(l), MEAN_{u∈N(v)} h_u^(l) ) ),

where the mean can be replaced by max pooling or even a parameterized network such as an LSTM. Because GraphSAGE has a neighbor sampling operator, we introduced it to greatly accelerate computation on dense graphs: in our experiments its running time on dense graphs was far shorter than the other graph neural networks, while still achieving good results.
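A minimal sketch of the mean-aggregation layer, assuming neighbors have already been sampled down to a fixed size S per node (names are ours, not the competition code):

```python
import torch
import torch.nn as nn

class SageMeanLayer(nn.Module):
    """GraphSAGE layer with mean aggregation over sampled neighbors:
    h_v' = ReLU(W · [h_v || mean_{u in S(v)} h_u])."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, h, sampled_idx):
        # sampled_idx: (N, S) long tensor of S sampled neighbors per node;
        # capping S is what keeps per-node cost bounded on dense graphs.
        neigh_mean = h[sampled_idx].mean(dim=1)               # (N, D)
        return torch.relu(self.linear(torch.cat([h, neigh_mean], dim=1)))
```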

The GAT method introduces the attention mechanism into graph neural networks:

e_ij = LeakyReLU( a^T [ W h_i ‖ W h_j ] ),  α_ij = softmax_j( e_ij ),  h_i' = σ( Σ_{j∈N(i)} α_ij W h_j ).

It computes a weight between each node and each of its neighbors by attention over their features, then aggregates the node and its neighbors with these weights to form the node's next-level representation. Masked attention lets GAT handle a variable number of neighbors, and because the aggregation weights are learned from node features, it exploits feature information effectively in the graph convolution and generalizes well; following the Transformer, it also uses multi-head attention to improve the model's fitting ability. Since GAT computes pairwise weights from node features, it performs well on datasets with node features, but when the feature dimension is large it becomes slow and may even run out of memory. We therefore restrict GAT's hyperparameter search to a smaller-parameter region when there are many feature dimensions.
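Below is a minimal single-head sketch of such a masked-attention layer in PyTorch (our own illustration, not the competition code). Note that it materializes a dense N×N score matrix, which is exactly the kind of cost that makes GAT memory-hungry on large graphs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head GAT: attention weights computed from node features,
    masked so each node only attends to its graph neighbors."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w = nn.Linear(in_dim, out_dim, bias=False)
        self.a_src = nn.Linear(out_dim, 1, bias=False)
        self.a_dst = nn.Linear(out_dim, 1, bias=False)

    def forward(self, h, adj_mask):
        # adj_mask: (N, N) bool adjacency with self-loops included
        z = self.w(h)                             # (N, D')
        # a^T [z_i || z_j] decomposes into a_src(z_i) + a_dst(z_j),
        # which broadcasts to an (N, N) score matrix.
        e = F.leaky_relu(self.a_src(z) + self.a_dst(z).T, negative_slope=0.2)
        e = e.masked_fill(~adj_mask, float('-inf'))   # masked attention
        alpha = torch.softmax(e, dim=1)
        return F.elu(alpha @ z)
```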

Fast hyperparameter search

Given the ultra-short time budget, we need a fast hyperparameter search method that spends as little time as possible searching for each graph model, leaving time to train and predict with as many graph models as possible on each dataset. As shown in Figure 6 below, we split the search into two parts: offline search and online search.

Figure 6 Fast hyperparameter search

During offline search, we use a large search space for each graph model across multiple datasets to determine the boundaries of graph structures and hyperparameters, ensuring good results within those boundaries on every dataset. Specifically, we searched most parameters of the different models for the different graph types (directed/undirected, sparse/dense, with/without node features) and identified several important hyperparameters. For example, on sparse graphs we adjust the number of GCN layers and the order of the TAGConv polynomial to enlarge the convolution's receptive field so the model fits the dataset and converges quickly; on graphs with very many features we adjust GAT's number of convolutional layers, number of attention heads, and number of hidden units so that training stays within budget while producing good results; and on dense graphs we adjust GraphSAGE's neighbor sampling to speed up training. In the end we fixed boundaries for three key hyperparameters of each graph model: the learning rate, the number of convolutional layers, and the number of hidden units.

Constrained by the online time budget, we search within a small hyperparameter subspace derived from the offline boundaries. Since the budget is too small for a full train-and-validate search over the parameters, we designed a fast search method: for each model's hyperparameter space, we compare validation accuracy after only a few training epochs. As shown in Figure 7 below, after 16 epochs of training we select the learning rate of 0.003, which gives the best validation accuracy. The goal is to find hyperparameters that let the model fit the dataset quickly, not the theoretically optimal ones; this reduces both the search time and the subsequent training time. Fast hyperparameter search ensures that every model can fix its hyperparameters within a short time on every dataset and then be trained with them.

Figure 7 Validation accuracy of different learning rates after a few epochs of training
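A schematic of this screening loop (train_one_epoch and validate are hypothetical helpers standing in for the actual training code):

```python
def fast_search(candidates, build_model, screen_epochs=16):
    """Cheap hyperparameter screening: train each candidate config for
    only a few epochs and keep the one with the best validation accuracy.
    The goal is a config that fits the data quickly, not the globally
    optimal one, so the screening budget stays tiny."""
    best_cfg, best_acc = None, -1.0
    for cfg in candidates:
        model = build_model(cfg)
        for _ in range(screen_epochs):
            model.train_one_epoch()   # hypothetical helper
        acc = model.validate()        # hypothetical helper
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg
```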

Multi-level robust model fusion

Since the final result in this competition is the average ranking over datasets, robustness is particularly important, so we adopt a multi-level robust model fusion strategy. As shown in Figure 8 below, we split the data to train multiple groups of models; each group has its own training and validation split, and early stopping on validation accuracy keeps each individual model robust. Each group contains several different graph models, and each graph model is trained with n-fold bagging before fusion to achieve stable results. Because validation accuracy differs considerably across model types, we fuse the different graph models with density-adaptive weighting to exploit their complementary strengths on different datasets. Finally, we average-fuse across the groups to exploit the differences between data splits.

Figure 8 Multi-level robust model fusion

Density-adaptive weighted fusion: as Figure 4 shows, some graph datasets are sparse, featureless, and easy to fit, so accuracy differences between teams are small while ranking differences are large. For example, a 0.15% accuracy gap on dataset 5 translates into ten ranking places, whereas a 1.6% gap on dataset 3 costs only seven places. We therefore fuse the multiple graph models with a density-adaptive weighting method.

The fusion weight is computed from the following quantities: #edges is the number of edges and #nodes the number of nodes, so #edges/#nodes is the graph's density; acc is the model's validation accuracy; and alpha, beta, and gamma are hyperparameters that together determine each model's weight. If the graph is dense enough, the model weights are derived from the accuracy differences alone, with no density-based adjustment; alpha is the density threshold that decides whether density-adaptive weighting is applied. If the graph is sparse, a model's weight depends on both its validation accuracy and the dataset's density, and the sparser the graph, the larger the differences between model weights. This is because on sparser graphs the accuracy differences between teams are smaller but the ranking differences are larger, so better models must receive more weight to keep the ranking stable.
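The exact functional form is not given here, so the sketch below is only one plausible instantiation of the described behavior (the default values for alpha, beta, and gamma are illustrative): alpha acts as the density threshold, and on sparse graphs the exponent grows as density shrinks, amplifying accuracy differences into larger weight differences.

```python
import numpy as np

def fusion_weights(val_accs, n_edges, n_nodes, alpha=2.0, beta=1.0, gamma=4.0):
    """One plausible instantiation of density-adaptive weighting:
    dense graph  -> weights follow validation accuracy alone;
    sparse graph -> accuracy differences are amplified, so the sparser
    the graph, the more extra weight the better models receive."""
    accs = np.asarray(val_accs, dtype=float)
    density = n_edges / n_nodes
    if density >= alpha:                    # alpha: density threshold
        exponent = beta
    else:
        exponent = beta + gamma / density   # sparser -> larger exponent
    w = accs ** exponent
    return w / w.sum()
```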

Evaluation results

Table 2 shows the test accuracy of the different graph models on the five offline graph datasets, consistent with the characteristics described in the graph neural network model section: GCN performs well on every dataset; TAGConv does better on the sparse datasets 1, 2, and 5; GraphSAGE achieves the best result on the dense dataset 4; and GAT performs better on the datasets with node features, 1, 2, and 4. Model fusion achieves more stable and better results on every dataset.

Table 2 Test accuracy of different graph models on five offline graph data sets

As shown in Table 3 below, our solution is robust on every graph dataset: its per-dataset ranking stays near the top and avoids overfitting, giving it the best average ranking. In the end, our team Aister won the championship of the KDD Cup 2020 AutoGraph track.

Table 3 Average ranking of the top 5 teams over the final 5 datasets, together with their ranking on each individual dataset

Advertising business application

The search advertising algorithm team is responsible for search advertising and screened-list advertising on both the Meituan and Dianping platforms, covering business types such as catering, leisure and entertainment, beauty, and hotels; this breadth brings great opportunities and challenges for algorithm optimization. In Meituan's rich search advertising scenarios, node types are abundant, including users, Queries, Ads, geographic locations, and even finer-grained combination nodes, and the edge relations between nodes are equally diverse, which makes the data well suited to graph learning. We applied graph learning in depth in the trigger module and the click-through rate (CTR) estimation module of search advertising, and both brought improvements in business results.

Beyond rich edge relations, each node also carries rich attribute information. For example, an Ad (store) node includes structured information such as the store name, category, address and location, star rating, sales volume, average spend per customer, and click and purchase counts. Our graph is therefore a typical heterogeneous attribute graph. At present, in the search advertising scenario, we mainly focus on a heterogeneous attribute graph with two node types: Query and Ad.

As shown in Figure 9 below, we build a graph containing Query nodes and Ad nodes and apply it to the trigger module and the CTR estimation module. The edge relations currently used in this graph mainly include the following (a construction sketch follows the list):

  • Query-Query Session : multiple Query submissions by the user in one session;

  • Query-Query Similarity Mining : Query-Query relevance pairs mined from user browse-and-click logs;

  • Query-Ad Click : an Ad clicked under a Query;

  • Ad-Ad CoClick : two Ads clicked within the same request or the same user behavior sequence.
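For illustration only, these edge sets can be pictured as typed edge lists (the node ids and weights below are made-up placeholders, not real data):

```python
from collections import defaultdict

edges = defaultdict(list)   # edge_type -> list of (src, dst, weight)

def add_edge(edge_type, src, dst, weight=1.0):
    edges[edge_type].append((src, dst, weight))

# Query-Query Session: consecutive queries issued in one session
add_edge("q_q_session", "query:hotpot", "query:sichuan hotpot")
# Query-Query similarity mined from browse/click logs
add_edge("q_q_sim", "query:hotpot", "query:spicy pot", weight=0.8)
# Query-Ad click: an ad clicked under a query
add_edge("q_ad_click", "query:hotpot", "ad:store_123")
# Ad-Ad co-click within one request or behavior sequence
add_edge("ad_ad_coclick", "ad:store_123", "ad:store_456")
```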

Figure 9 Construction of heterogeneous graph

The graph model is mainly used for vector recall of Ads in the trigger module: Ad vectors are indexed offline, the Query vector is estimated online in real time, and highly relevant Ads are recalled via ANN retrieval. Compared with the traditional Bidword-based trigger method, graph-based vectorized recall has clear advantages in semantic relevance and on long-tail traffic, and it significantly improves advertising monetization efficiency by raising the recall rate.

Figure 10 shows the trigger network based on multi-task graph representation learning. We generate positive examples with MetaPath-based Node2Vec walks and obtain negative examples by global sampling. When sampling negatives, we constrain their category to match the positive example's; otherwise, because category features are part of the input, the model easily learns to separate positives from negatives using the category feature alone, weakening the other features and leaving the model unable to discriminate among nodes of the same category. During negative sampling we also use alias sampling weighted by node weight, so that the negative distribution matches the positive one. To strengthen generalization and address cold start, we use each node's attribute features instead of its node id feature; these generalized features effectively mitigate the problems of unpopular nodes and of nodes absent from the heterogeneous graph, allowing the vector of a new online Query or Ad to be estimated in real time from its attributes.
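A simplified sketch of such category-constrained, weight-proportional negative sampling (our illustration; the alias method mentioned above is the efficient way to draw weighted samples, and numpy's weighted choice is a simple stand-in for it here):

```python
import numpy as np

def build_negative_sampler(node_ids, node_category, node_weight, seed=0):
    """Negative sampling restricted to the positive example's category,
    drawing nodes proportionally to their weight so the negative
    distribution matches the positive one."""
    rng = np.random.default_rng(seed)
    by_cat = {}
    for nid, cat, w in zip(node_ids, node_category, node_weight):
        ids, ws = by_cat.setdefault(cat, ([], []))
        ids.append(nid)
        ws.append(w)
    # Normalize weights once per category.
    for cat, (ids, ws) in by_cat.items():
        p = np.asarray(ws, dtype=float)
        by_cat[cat] = (np.asarray(ids), p / p.sum())

    def sample(pos_category, k):
        ids, p = by_cat[pos_category]
        return rng.choice(ids, size=k, p=p)
    return sample
```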

At the same time, different node types use different deep network structures: Query nodes use an LSTM-RNN network at character and word granularity, while Ad nodes use a SparseEmbedding+MLP network. For the heterogeneous edge types, we want the model to capture the influence of each edge type, so for the same node there is a separate deep network per edge type, and the embeddings produced by these per-edge networks are fused by attention into the node's final embedding. To make full use of the graph's structural information, we mainly adopt the node aggregation method proposed in GraphSAGE: when generating a node's vector, we use not only its attribute features but also its aggregated neighbor vectors as input, which improves the model's generalization ability.

In addition, in Meituan's O2O scenario, context information such as the user's visit time and geographic location is very important, so we tried joint multi-objective training of the graph model and a two-tower deep model. The two-tower model is trained on user browse-and-click data, which carries rich context information. The Query first obtains a context-independent static vector from the graph model, then concatenates it with the context feature embeddings and passes through fully connected layers to produce a context-aware dynamic Query vector.

Figure 10 Trigger network based on graph representation learning

In the CTR estimation module, unlike the trigger module that focuses on relevance modeling, the focus is on personalized user representation. Graph-structured data can supplement and extend the user behavior sequence, mining the user's diverse latent interests and thereby raising the click-through rate. We introduced the graph neural network into the DSIN (Deep Session Interest Network) model, adding more divergent user-interest expansion to its session-structured modeling. The global graph structure not only effectively extends the user's latent interest points; the GNN attention mechanism can also combine the target Ad with latent-interest Ads in the graph to dig deeper into the user's interest in the target.

As shown in Figure 11, for any user behavior sequence, each Ad in the sequence can be expanded to its neighbors in the Ad graph, yielding other Ads with similar interest. The original behavior sequence, i.e., the user's click sequence, can be regarded as an explicit expression of interest; the sequence obtained by expansion through the Ad graph consists of the most similar Ads in the graph and can be regarded as an expression of the user's latent interest. The original behavior sequence is modeled by the current DSIN baseline; the expanded sequence is modeled with graph neural network methods, where GNN attention produces an interest vector that is crossed with the target Ad. Our experiments show that, on top of the DSIN baseline, the expanded sequence further improves accuracy.

In the future, we will further explore the application of graph models in the click-through rate module, including graph models based on user intent.

Figure 11 Personalized prediction network based on graph neural network

Summary and outlook

The KDD Cup is an international competition closely connected with industry. Each year's problems track hot, practical industrial issues, and the winning solutions produced over the years have had a large impact on the industry. For example, KDD Cup 2012 produced the prototypes of FFM (Field-aware Factorization Machine) and XGBoost, which have since been widely used in industry.

This year's KDD Cup focused on automated graph representation learning and recommender systems. Graph representation learning has been an academic hot spot in recent years and is also widely used in industry, while the AutoML field is dedicated to end-to-end automation of machine learning. Combining these two research hotspots saves much of the labor cost of exploration on graphs and helps solve the more complex tuning problems of graph networks.

This article introduced the search advertising algorithm team's solution to the KDD Cup 2020 AutoGraph problem. Through data analysis of the given offline datasets, we identified the three main challenges of the competition and built an automated graph learning framework: combining multiple graph neural networks addresses the diversity of graph data; fast hyperparameter search keeps the automated modeling solution within the time budget; and the multi-level robust model fusion strategy ensures robustness on unseen datasets.

At the same time, this article also introduced our business applications of graph learning in the Meituan search advertising trigger module and CTR estimation module. The competition deepened our understanding of the research direction of automated graph representation learning. In future work, we will further optimize our graph models based on the experience gained here, and try to use AutoML techniques to optimize the advertising system, addressing model and feature optimization problems that are difficult to traverse manually.

References

[1] Wu Z, Pan S, Chen F, et al. A comprehensive survey on graph neural networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020.

[2] Perozzi B, Al-Rfou R, Skiena S. Deepwalk: Online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014: 701-710.

[3] Grover A, Leskovec J. node2vec: Scalable feature learning for networks[C]//Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 2016: 855-864.

[4] Hamilton W, Ying Z, Leskovec J. Inductive representation learning on large graphs[C]//Advances in neural information processing systems. 2017: 1024-1034.

[5] Veličković P, Cucurull G, Casanova A, et al. Graph attention networks[J]. arXiv preprint arXiv:1710.10903, 2017.

[6] He X, Zhao K, Chu X. AutoML: A Survey of the State-of-the-Art[J]. arXiv preprint arXiv:1908.00709, 2019.

[7] Elsken T, Metzen J H, Hutter F. Neural architecture search: A survey[J]. arXiv preprint arXiv:1808.05377, 2018.

[8] Zhou K, Song Q, Huang X, et al. Auto-gnn: Neural architecture search of graph neural networks[J]. arXiv preprint arXiv:1909.03184, 2019.

[9] Gao Y, Yang H, Zhang P, et al. Graphnas: Graph neural architecture search with reinforcement learning[J]. arXiv preprint arXiv:1904.09981, 2019.

[10] Zhang C, Ren M, Urtasun R. Graph hypernetworks for neural architecture search[J]. arXiv preprint arXiv:1810.05749, 2018.

[11] Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks[J]. arXiv preprint arXiv:1609.02907, 2016.

[12] Du J, Zhang S, Wu G, et al. Topology adaptive graph convolutional networks[J]. arXiv preprint arXiv:1710.10370, 2017.

About the Author

Strong, Hu Ke, Jin Peng, and Lei Jun are all from the search advertising algorithm team of the Meituan advertising platform.

Tang Xingyuan, from the University of Chinese Academy of Sciences.

About Meituan AI

Meituan AI takes "helping people eat better and live better" as its core goal, and is committed to exploring cutting-edge artificial intelligence technology in real business scenarios and rapidly deploying it in real-life service scenarios, completing the digitization of the offline economy.

Meituan AI was born out of Meituan's rich life-service scenarios and demands, giving it the distinctive advantage of scenario-driven technology. Based on business scenarios and rich data, it applies image recognition, voice interaction, natural language processing, and delivery scheduling technologies to real scenarios such as unmanned delivery, unmanned micro-warehouses, and smart stores, covering all aspects of daily life, using technology to help users improve their quality of life, to upgrade the industry's intelligence, and ultimately to build new life-service infrastructure for society as a whole.

For more information, please visit: https://ai.meituan.com/ 

----------  END  ----------

Job Offers

The search advertising algorithm team of the Meituan advertising platform, based on the search advertising scenario, explores the cutting edge of deep learning, reinforcement learning, artificial intelligence, big data, knowledge graphs, NLP, and computer vision, and explores the value of local life-service e-commerce. The main work directions include:

Triggering strategy : user intent recognition, advertising business data understanding, Query rewriting, deep matching, relevance modeling.

Quality estimation : advertising quality modeling; estimating click-through rate, conversion rate, average spend per customer, and transaction volume.

Mechanism design : advertising ranking mechanism, bidding mechanism, bid suggestion, traffic estimation, budget allocation.

Creative optimization : intelligent creative design; optimizing the display creatives of advertising images, text, deal bundles, discount information, and more.

Job requirements:

  • Have more than three years of relevant work experience, with applied experience in at least one of CTR/CVR estimation, NLP, image understanding, or mechanism design.

  • Familiar with commonly used machine learning, deep learning, and reinforcement learning models.

  • Excellent logical thinking ability, passion for solving challenging problems, sensitive to data, and good at analyzing/solving problems.

  • Master degree or above in computer and mathematics related majors.

The following conditions are preferred:

  • Have relevant business experience in advertising/search/recommendation.

  • Have experience in large-scale machine learning.

Interested students can submit their resumes to: [email protected] (please indicate the title of the email: Guangping Search Team).

