Practice of multi-scenario and multi-task learning in Meituan's restaurant recommendation

With the continuous development of recommendation algorithm technology, cross-scenario learning has attracted the attention of more and more researchers. Inspired by related work in the industry, the Meituan in-store dining algorithm team has continued to explore multi-scenario recommendation optimization and has accumulated substantial application experience in multi-scenario, multi-task learning for recommendation. The team trained a unified multi-scenario, multi-task model on data from all in-store dining recommendation scenarios, reducing duplicated development, and deployed it across multiple in-store dining recommendation scenarios with remarkable results.

This article describes the multi-scenario, multi-task learning solution used in Meituan's in-store dining business. The academic paper based on this solution, "HiNet: Novel Multi-Scenario & Multi-Task Learning with Hierarchical Information Extraction", has been accepted by the International Conference on Data Engineering (ICDE) 2023.

1. Background

With the explosive growth of online information and services, recommender systems have become a key component for providing users with high-quality personalized suggestions and experiences. In traditional recommender systems, model services usually have to be developed separately for each specific scenario to accommodate differences in data distribution and feature space across scenarios. However, on industrial Internet platforms such as Meituan, there are usually multiple recommendation scenarios (e.g., the homepage feed and vertical sub-channels) along a user's decision path, and each scenario uses its own personalized recommendation model to rank the items that are finally presented to the user.

On Meituan's in-store dining platform (hereinafter "dine-in"), with the trend toward business refinement, more and more scenarios require customized recommendation systems to meet users' personalized dining needs. As shown in Figure 1 below, users in practice often browse and click across multiple different scenarios before finally placing an order.

Figure 1 User interaction process on Meituan's in-store dining platform

However, as the number of recommendation scenarios grows, the traditional approach of independently developing a recommendation model for each single scenario often leads to the following problems:

  1. Modeling based only on a single scenario's own data cannot exploit users' rich cross-scenario behavior and ignores the information shared across scenarios, especially since the same item may be displayed in multiple scenarios (in Figure 1 above, the items in the red rectangles are in fact the same item).
  2. In some long-tail business scenarios, traffic is small and user behavior is sparse, so there is not enough data for the model to learn effectively.
  3. Since feature mining, model training, and online deployment are developed independently for each scenario and isolated from one another, computing costs and the maintenance burden increase greatly.

Generally speaking, modeling each scenario separately has many limitations. On the other hand, simply merging the datasets of multiple scenarios to train a single ranking model cannot effectively capture the information unique to each scenario.

In addition to the multi-scenario recommendation problem, each scenario usually has multiple metrics reflecting user satisfaction and engagement that need to be jointly optimized, such as click-through rate (CTR) and click-through conversion rate (CTCVR). It is therefore necessary to develop an effective, unified framework for optimizing multiple metrics across multiple scenarios (i.e., the multi-scenario multi-task optimization problem).

In recent studies, related methods often formulate multi-scenario recommendation as a multi-task learning (MTL) problem, and most of them build on the Multi-gate Mixture-of-Experts (MMoE) framework to learn the commonalities and differences between scenarios. However, these MTL-based methods project the data of all scenarios into the same feature space for optimization, which makes it difficult to fully capture the complex relationships among many scenarios with multiple tasks, and thus limits further improvement of multi-scenario multi-task models.

Intuitively, multi-scenario and multi-task information belong to different levels of optimization and should be modeled hierarchically. Therefore, in this work we propose the Hierarchical information extraction Network (HiNet). Specifically, we design an end-to-end two-layer information extraction framework to jointly model information sharing and collaboration across scenarios and tasks.

First, at the Scenario Extraction Layer, HiNet extracts scenario-shared and scenario-specific information through separate expert modules. To further strengthen the representation of the current scenario, we design a Scenario-aware Attentive Network (SAN) to explicitly learn the contribution of other scenarios to the current scenario's representation.

Then, in the Task Extraction Layer, a customized gating network composed of task-shared and task-specific expert networks is used to effectively alleviate parameter interference between shared and task-specific information in multi-task learning.

By separating scenario-level and task-level information extraction in the model structure, multiple tasks in different scenarios are clearly optimized in separate feature spaces, which helps improve model performance.

The main innovations of the whole paper are as follows:

  1. We propose HiNet, a novel multi-scenario multi-task learning model for optimizing multiple task metrics in multiple scenarios, in which a hierarchical information extraction architecture is innovatively applied.
  2. In the scene information extraction layer, we propose the scene-aware attention network SAN module, which further enhances the ability of scene information modeling.
  3. Offline evaluation and online A/B tests demonstrate that HiNet outperforms current leading methods. HiNet has been fully deployed in two recommendation scenarios of Meituan's in-store dining business.

2. Hierarchical information extraction network

2.1 Problem Definition

As mentioned above, we focus on the optimization problem of multi-scenario, multi-task recommendation. We formulate it as $\hat{y}_i^j = f_i^j\left(x, s_i\right)$, where $s_i$ is the indicator of the $i$-th scenario, $\hat{y}_i^j$ is the predicted value of task $j$ in scenario $i$, and $x$ denotes the dense input features.

The raw input features mainly include user profile features, user behavior features, features specific to the current scenario, and item features. Numerical features are first discretized into categorical features, and all categorical features are then mapped into a low-dimensional vector space to obtain $x$. Considering the optimization goals on Meituan's in-store dining platform, we set two tasks, CTR and CTCVR, for each scenario.
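As an illustration of this feature-processing step, here is a minimal sketch; the field choices, bucket boundaries, vocabulary sizes, and embedding dimension are hypothetical placeholders, not the production configuration:

```python
import torch
import torch.nn as nn

class FeatureEmbedding(nn.Module):
    """Builds the dense input x from raw features (illustrative sketch only)."""

    def __init__(self, vocab_sizes=(10000, 50000, 6), emb_dim=8,
                 price_buckets=(10.0, 30.0, 60.0, 100.0)):
        super().__init__()
        # One embedding table per categorical field (e.g. user id, item id, scenario id).
        self.cat_embeddings = nn.ModuleList([nn.Embedding(v, emb_dim) for v in vocab_sizes])
        # A numerical feature (here: price) is discretized into buckets, then embedded.
        self.register_buffer("price_buckets", torch.tensor(price_buckets))
        self.price_embedding = nn.Embedding(len(price_buckets) + 1, emb_dim)

    def forward(self, categorical_ids, price):
        # categorical_ids: LongTensor [batch, num_fields]; price: FloatTensor [batch]
        fields = [emb(categorical_ids[:, i]) for i, emb in enumerate(self.cat_embeddings)]
        fields.append(self.price_embedding(torch.bucketize(price, self.price_buckets)))
        return torch.cat(fields, dim=-1)   # dense input x
```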

2.2 Method introduction

This section introduces HiNet, the hierarchical information extraction network. As shown in Figure 2-(A) below, HiNet consists of two core modules: the scenario extraction layer and the task extraction layer. The scenario extraction layer contains a scenario-shared expert module, a scenario-specific expert module for the current scenario, and a scenario-aware attention network; the information extracted by these three parts forms the scenario-level representation. In the task extraction layer, we use a Customized Gate Control (CGC) module to model multi-task learning within the current scenario. The key parts of HiNet are described in detail below.
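To make the composition of the two layers concrete, here is a deliberately simplified sketch of how they could be wired together. The module internals, dimensions, and names (e.g. `HiNetSketch`, `hidden`) are assumptions for illustration rather than the production implementation, and the task extraction layer is reduced to one expert and one tower per task:

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    # Small feed-forward block standing in for an expert / tower network.
    return nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

class HiNetSketch(nn.Module):
    """Two-level structure: scenario extraction layer -> task extraction layer -> towers."""

    def __init__(self, in_dim, hidden=64, num_scenarios=6, num_tasks=2):
        super().__init__()
        # Scenario extraction layer: shared expert, per-scenario experts, SAN attention.
        self.shared_expert = mlp(in_dim, hidden)
        self.scenario_experts = nn.ModuleList([mlp(in_dim, hidden) for _ in range(num_scenarios)])
        self.san = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        # Task extraction layer, collapsed here to one expert plus one tower per task (CTR, CTCVR).
        self.task_experts = nn.ModuleList([mlp(3 * hidden, hidden) for _ in range(num_tasks)])
        self.towers = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(num_tasks)])

    def forward(self, x, scenario_id):
        # x: [batch, in_dim]; scenario_id: int index of the current scenario.
        shared = self.shared_expert(x)
        specific = self.scenario_experts[scenario_id](x)
        # SAN: attend from the current scenario's representation to all scenarios' outputs.
        all_scen = torch.stack([e(x) for e in self.scenario_experts], dim=1)
        attended, _ = self.san(specific.unsqueeze(1), all_scen, all_scen)
        scenario_repr = torch.cat([shared, specific, attended.squeeze(1)], dim=-1)
        # Task extraction layer produces one estimate per task.
        return [torch.sigmoid(tower(expert(scenario_repr))).squeeze(-1)
                for expert, tower in zip(self.task_experts, self.towers)]
```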

Figure 2 Structure of the Hierarchical Information Extraction Network (HiNet) model

2.2.1 Scenario Extraction Layer

The role of the scenario extraction layer is to extract scenario-specific representations and valuable representations shared across scenarios, which forms the basis for improving task-level representations. The scenario extraction layer mainly consists of three parts: the scenario-shared expert network, the scenario-specific expert networks, and the scenario-aware attention network, which are introduced in turn below.

  1. Scenario-shared / scenario-specific expert networks

Considering users' interleaved behavior across scenarios and the overlap of items among scenarios, valuable shared information exists in the data of the multiple scenarios of the dining business. We therefore design a scenario-shared expert network. Influenced by the Mixture-of-Experts (MoE) architecture, the scenario-shared expert network is built from a Sub-Expert Integration (SEI) module, as shown in Figure 2-(C).

Specifically, the final output of the scenario-shared expert network is $G$, which the SEI module computes as a gated combination of its sub-experts.
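As a sketch under the standard mixture-of-experts formulation that the SEI module follows (the exact gating details in the paper may differ), the output of the scenario-shared expert network can be written as

$$
G = \sum_{k=1}^{K} g_k \, E_k(x), \qquad g = \mathrm{Softmax}\left(W_g\, x\right),
$$

where $E_k(\cdot)$ is the $k$-th sub-expert network, $K$ is the number of sub-experts, and $W_g$ parameterizes the gating network.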

  2. Scenario-aware attention network

As mentioned above, there is a certain degree of correlation between scenarios, so information from other scenarios can also contribute to the representation of the current scenario and enhance its expressive ability. Considering that different scenarios contribute to each other's representations to different degrees, we design a Scenario-aware Attentive Network (SAN) to measure the importance of other scenarios' information to the current scenario's representation. SAN takes two parts as input and uses the resulting attention weights to aggregate the scenario-level information.
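As a hedged sketch of how such an attention weighting can be formalized (the exact construction of the query and keys in the paper may differ), the current scenario $i$ can aggregate the representations $A_m$ produced by the other scenarios' specific experts with learned weights:

$$
\alpha_{m}=\frac{\exp\left(h_i^{\top} W A_{m}\right)}{\sum_{m^{\prime} \neq i} \exp\left(h_i^{\top} W A_{m^{\prime}}\right)}, \qquad A_i^{\mathrm{SAN}}=\sum_{m \neq i} \alpha_{m} A_{m},
$$

where $h_i$ is an embedding of the current scenario indicator $s_i$ and $W$ is a learnable matrix, so that scenarios more relevant to scenario $i$ receive larger weights.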

2.2.2 Task Extraction Layer

To address the negative transfer problem in multi-task learning, the task extraction layer draws inspiration from the PLE (Progressive Layered Extraction) model and uses a Customized Gate Control (CGC) module.

Custom Gating Network

The customized gating network consists of two kinds of experts: task-shared expert networks and task-specific expert networks. The former learn the information shared by all tasks in the current scenario, while the latter extract the information specific to each task in the current scenario.
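Below is a minimal sketch of such a CGC-style block in the spirit of PLE; the expert counts, dimensions, and names are chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn

class CGCBlock(nn.Module):
    """CGC-style gate: each task fuses task-shared and its own task-specific experts."""

    def __init__(self, in_dim, expert_dim=64, num_shared=2, num_specific=2, num_tasks=2):
        super().__init__()
        make = lambda: nn.Sequential(nn.Linear(in_dim, expert_dim), nn.ReLU())
        self.shared_experts = nn.ModuleList([make() for _ in range(num_shared)])
        self.task_experts = nn.ModuleList(
            [nn.ModuleList([make() for _ in range(num_specific)]) for _ in range(num_tasks)]
        )
        # One gate per task over its (shared + task-specific) experts.
        self.gates = nn.ModuleList(
            [nn.Linear(in_dim, num_shared + num_specific) for _ in range(num_tasks)]
        )

    def forward(self, x):
        shared = [e(x) for e in self.shared_experts]
        outputs = []
        for task_id, gate in enumerate(self.gates):
            experts = shared + [e(x) for e in self.task_experts[task_id]]
            stacked = torch.stack(experts, dim=1)            # [batch, num_experts, expert_dim]
            weights = torch.softmax(gate(x), dim=-1).unsqueeze(-1)
            outputs.append((weights * stacked).sum(dim=1))   # fused representation per task
        return outputs
```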

2.3 Training Objectives

The final loss function of HiNet jointly optimizes all tasks across all scenarios.
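As a sketch, assuming the common formulation of summing binary cross-entropy over all scenarios and tasks with per-task weights (the exact weighting scheme in the paper may differ), the objective can be written as

$$
\mathcal{L}=\sum_{i=1}^{S} \sum_{j=1}^{T} \lambda_{j}\, \mathrm{CE}\!\left(y_i^{j}, \hat{y}_i^{j}\right),
$$

where $S$ is the number of scenarios, $T$ is the number of tasks (here CTR and CTCVR), $\lambda_j$ is the weight of task $j$, and $\mathrm{CE}$ denotes the cross-entropy loss.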

3. Experiment

3.1 Experimental setup

  1. Data collection: We collected user log data from six scenarios (numbered a to f) on Meituan's in-store dining platform as the multi-scenario multi-task training and evaluation dataset; scenarios a and b are large-scale datasets, while c to f are treated as small-scale datasets.

Table 1 Statistics of the sample datasets for each scenario

  2. Evaluation metrics: We evaluate the CTR and CTCVR tasks of each scenario separately and use AUC (Area Under the ROC Curve) as the evaluation metric on the multi-scenario multi-task dataset.
  3. Baselines: To compare our proposed HiNet fairly with the industry's state-of-the-art (SOTA) models, we use the same experimental environment and model parameters and fully tune and optimize each model over multiple runs. The compared models are as follows:

Multi-task learning model :

  • Shared Bottom: A neural network model with hard parameter sharing at the bottom layers.
  • MMoE: This method uses gating networks to flexibly combine the representations of the expert networks, and each task fuses all expert outputs through its own tower unit.
  • PLE: Building on MMoE, this model explicitly divides the expert networks into task-shared and task-specific experts, which effectively alleviates the negative transfer and "seesaw" problems.

Multi-scenario learning model :

  • HMoE: Improved from MMoE, this method models the predicted values of multiple scenarios and optimizes the task predictions for the current scenario.
  • STAR: This method builds shared and scenario-specific networks through a star topology to learn the representation of the current scenario.

It should be pointed out that the above models were originally proposed to solve either multi-task learning or multi-scenario learning alone. For a fair experimental comparison, we adapted and extended them to support multi-scenario multi-task modeling.

3.2 Performance comparison

Table 2 Performance comparison of all models across all scenarios

Table 2 shows the performance comparison of the models across the six scenarios on Meituan's in-store dining platform. The results show that our proposed HiNet outperforms the other models on the CTR and CTCVR metrics in all scenarios, demonstrating the advantage of HiNet in multi-scenario multi-task modeling.

3.3 Ablation study

To investigate the effect of each key component of HiNet, we design two variants for ablation analysis, as follows:

  • HiNet (w/o hierarchy): the hierarchical information extraction structure is removed and a CGC network is used directly for multi-scenario multi-task modeling.
  • HiNet (w/o SAN): the HiNet model with the SAN module removed from the scenario extraction layer.

Table 3 Comparison of ablation experiment results of HiNet model

From the experimental results in Table 3, we observe that the variant HiNet (w/o hierarchy) suffers severe performance degradation on all metrics, indicating that the hierarchical information extraction architecture effectively captures the commonalities and differences across scenarios and thereby improves model performance. Similarly, after the SAN module is removed from the scenario extraction layer, the performance of HiNet (w/o SAN) also drops significantly in multiple scenarios, indicating that the weights learned by the SAN module effectively enhance the representation ability of the scenario extraction layer.

3.4 Online A/B Testing

To further verify the online performance of the proposed HiNet model, we deployed it in scenarios a and b on Meituan's in-store dining platform and conducted a one-month online A/B test against the baseline model.

Table 4 Online A/B test gains in scenarios a and b

As shown in Table 4, HiNet outperforms the baseline model on the CTR and CTCVR metrics in both scenarios and brings a significant improvement in order volume, which further demonstrates its effectiveness. HiNet has now been fully deployed in the above two businesses and has contributed to their growth.

4. Summary and Outlook

Multi-scenario multi-task modeling is one of the most critical and challenging problems in recommender systems today. Previous models mainly optimize multiple tasks in different scenarios by projecting all information into the same feature space, which limits model performance.

In this work, we propose the Hierarchical Information Extraction Network (HiNet), which uses a hierarchical optimization architecture to model multi-scenario multi-task problems. On this basis, we design a scenario-aware attention network (SAN) module in the scenario extraction layer to enhance scenario representation learning. Both offline experiments and online A/B tests verify the superiority of HiNet.

It is worth mentioning that graph neural networks have already been widely applied in industrial recommendation models. Inspired by this, in future work the Meituan in-store dining algorithm team will incorporate the information propagation capability of graph neural networks into the multi-scenario multi-task learning scheme, continue to iterate on our method, and design more complete models to solve the complex multi-scenario multi-task modeling problems on Meituan's in-store dining platform.

About the Author

Zhou Jie, Xianshuai, Wen Hao, Bo Lin, Zhang Kun, and others, all from the Meituan In-Store / Platform Technology Department.

