Causal Inference 16--Direct Heterogeneous Causal Learning for Resource Allocation Problems in Marketing (Meituan)

Title: Direct Heterogeneous Causal Learning for Resource Allocation Problems in Marketing

Paper link: https://export.arxiv.org/pdf/2211.15728v2.pdf

Affiliation: Meituan

Abstract: Resource allocation is an important decision-making problem in marketing. The most common solution is a two-stage approach that combines Machine Learning (ML) with Operations Research (OR): the ML stage estimates the factors that affect the decision, and the estimates are fed into the OR stage, which solves the optimization problem. This decoupled design introduces two problems. First, the ML model focuses only on the accuracy of parameter estimation and ignores how the downstream OR stage uses the estimated parameters and what the final optimization goal is. Second, the OR stage ignores the estimation error of the ML model, and the complex mathematical operations in the optimization algorithm further amplify that error. To address these problems, the paper connects ML and OR by analyzing and deriving the OR algorithm and then introducing decision factors, that is, factors from which the decision can be obtained directly through simple comparison. In the ML stage, a custom loss function is designed so that the individual decision factors are learned directly and estimated without bias. These estimates are passed to the OR stage, where only a sorting or comparison operation is needed to obtain the final decision, so the OR stage introduces no additional error. The paper applies this framework to two marketing problems: the selection of a binary marketing action and budget allocation across multiple marketing actions. Experiments on public datasets and in real food delivery marketing scenarios show the effectiveness of the method.

Introduction

Marketing is one of the most effective mechanisms for increasing user stickiness and platform revenue, so marketing campaigns are widely used on many online platforms. For example, markdowns on perishable products at Freshippo are used to boost sales (Hua et al. 2021), coupons on Taobao stimulate user activity (Zhang et al. 2021), and incentives on the Kuaishou video platform increase the user retention rate (Ai et al. 2022).

Although they increase revenue, marketing campaigns also consume significant marketing resources (e.g., budget). Because these resources are limited, only some individuals (such as stores or products) can be assigned a marketing treatment. In marketing, such a decision problem can be formulated as a resource allocation problem, which has been studied for decades.

Most existing studies adopt a two-stage approach to address these problems (Ai et al. 2022; Zhao et al. 2019; Du, Lee, and Ghaffarizadeh 2019). As shown in Figure 1(a), the first stage is ML, which predicts the (incremental) responses of individuals under different treatments with predictive/uplift models. The second stage is OR, which takes the ML predictions as input to a combinatorial optimization algorithm. Existing works therefore mainly optimize the predictive/uplift model and the combinatorial optimization algorithm separately.

Although widely used, the two-stage approach has two major drawbacks. First, the solution is obtained only after multiple intermediate computations on the ML predictions, such as multi-factor combinations or complex mathematical operations in OR. An improvement in the accuracy of the predicted parameters therefore does not necessarily translate into a better final solution. Second, the prediction error of the model is not taken into account, and the complex operations that OR applies to the predictions increase the cumulative error. Because of this accumulated error, the theoretically optimal OR algorithm does not always achieve the best result in practice, and in some cases it is even inferior to a heuristic strategy. Optimizing ML and OR separately therefore cannot reach a global optimum for the original problem.
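
To make the error amplification concrete, here is a small arithmetic sketch in Python. The numbers are invented for illustration and do not come from the paper: a revenue uplift and a cost uplift are each mis-estimated by 10%, and the ratio computed from them in a downstream step ends up off by roughly 22%.

```python
def relative_error(estimate, truth):
    """Absolute relative error of an estimate with respect to the true value."""
    return abs(estimate - truth) / abs(truth)

# Hypothetical true incremental revenue and incremental cost of one individual.
true_rev, true_cost = 1.0, 0.1

# Hypothetical ML estimates: revenue 10% too high, cost 10% too low.
est_rev, est_cost = 1.1, 0.09

# A downstream step that divides the two estimates (an ROI-style ratio).
true_roi = true_rev / true_cost
est_roi = est_rev / est_cost

print(relative_error(est_rev, true_rev))    # about 0.10
print(relative_error(est_cost, true_cost))  # about 0.10
print(relative_error(est_roi, true_roi))    # about 0.22
```

The two input errors point in opposite directions here, which is the worst case for a ratio; the point is only that a downstream operation can make the combined error larger than either input error.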

Instead of a two-stage approach, we propose a new approach to the resource allocation problem that alleviates these drawbacks. First, we define the decision factor of an algorithm as a factor from which the solution can be obtained directly using only sorting or comparison operations. As shown in Figure 1(b), we use the decision factors derived from OR as the learning objectives for direct heterogeneous causal learning in ML. By this definition, OR performs no additional mathematical operation on the predicted outcome. The ranking performance of the model on the decision factors therefore directly determines the quality of the solution, and improving the model guarantees a better solution. In particular, the model error is reflected in the ranking performance, so it is accounted for and not amplified in OR. The new challenge is how to identify such a decision factor in OR and how to predict it directly in ML.
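
As a minimal sketch of what "only sorting or comparison" means, assume each individual already has a predicted decision factor and that at most `k` individuals can be treated. The function name `select_by_decision_factor` and the example scores are hypothetical and are not taken from the paper.

```python
def select_by_decision_factor(scores, k):
    """Return the indices of the k individuals with the largest decision factor.

    The only operation applied to the predictions is a sort, so the result
    depends on the ranking of the scores and not on their absolute values.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Hypothetical predicted decision factors for four individuals.
scores = [0.2, 1.5, -0.3, 0.9]
chosen = select_by_decision_factor(scores, 2)
print(chosen)

# Any strictly increasing transformation of the scores yields the same decision.
rescaled = [2 * s + 1 for s in scores]
print(select_by_decision_factor(rescaled, 2) == chosen)
```

Because the decision is invariant to strictly increasing transformations of the scores, the model only needs to rank individuals correctly, which is why ranking performance is the relevant measure of model quality in this setting.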

Following this line of thought, we examine two key problems in marketing. The first is the binary treatment assignment problem. When the cost incurred by the treatment is ignored (the cost-unaware version), the conditional average treatment effect (CATE) can be regarded as the decision factor. Common uplift models for predicting CATE include meta-learners (Kunzel et al. 2019; Nie and Wager 2021) and causal forests (Wager and Athey 2018; Athey, Tibshirani, and Wager 2019). The former consist of multiple base models, and the latter usually combine generalized random forests (GRF) with double machine learning (DML). In contrast, we propose a new neural network-based uplift model for direct prediction, which achieves promising results both theoretically and practically. Treatments increase revenue, but they also incur varying costs. In this cost-aware version, the individual ROI (Return on Investment), computed by dividing incremental revenue by incremental cost, can be regarded as the decision factor. However, most existing causal inference work does not consider treatment costs and cannot be applied to such direct prediction. Although some works (Du, Lee, and Ghaffarizadeh 2019) have studied similar problems, their loss functions are not theoretically guaranteed to converge to a stable extreme point. In this paper, we design a convex loss function that guarantees an unbiased estimate of the individual ROI when the loss converges.
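
The two decision factors described above can be written out as follows. The notation is an assumption made here for illustration and may differ from the paper's: X denotes the features of an individual, Y(1) and Y(0) the potential responses with and without treatment, and R and C the potential revenue and cost.

```latex
% Cost-unaware version: CATE as the decision factor
\tau(x) = \mathbb{E}\left[ Y(1) - Y(0) \mid X = x \right]

% Cost-aware version: individual ROI as the decision factor
\mathrm{ROI}(x) =
  \frac{\mathbb{E}\left[ R(1) - R(0) \mid X = x \right]}
       {\mathbb{E}\left[ C(1) - C(0) \mid X = x \right]}
```

In both versions the assignment is obtained by ranking individuals on the decision factor, so no further arithmetic is applied to the predictions.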

As a second case study, we apply our method to the budget allocation problem with multiple treatments and propose a new evaluation metric. Lagrangian duality is an effective way to solve the budget allocation problem. However, the decision factors of that algorithm contain Lagrangian multipliers, which are not deterministic and vary widely with the budget. Directly predicting such decision factors for all possible Lagrangian multipliers is difficult and unrealistic. This paper proposes an algorithm equivalent to the Lagrangian dual method in which the decision factor is deterministic and independent of the Lagrangian multipliers, and it establishes a corresponding causal learning model. When the custom loss function converges, a direct prediction of the decision factor is obtained. Finally, we propose a new evaluation metric, MT-AUCC, to evaluate the prediction results. It is similar to the Area Under Uplift Curve (AUUC) (Rzepakowski and Jaroszewicz 2010) but involves multiple treatments and incremental cost.
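
The dependence on the multiplier can be seen in a minimal Python sketch of the conventional Lagrangian dual baseline. This is not the equivalent algorithm the paper proposes. It assumes that treatment 0 means "no treatment" with zero reward and zero cost, that each individual receives exactly one treatment, and that the reward and cost of every treatment are already known; all function names and numbers are hypothetical.

```python
def allocate(rewards, costs, lam):
    """For each individual, pick the treatment maximizing reward - lam * cost."""
    choice = []
    for r_i, c_i in zip(rewards, costs):
        best = max(range(len(r_i)), key=lambda j: r_i[j] - lam * c_i[j])
        choice.append(best)
    return choice

def total_cost(costs, choice):
    """Total cost of an allocation (one chosen treatment index per individual)."""
    return sum(c_i[j] for c_i, j in zip(costs, choice))

def solve_budget(rewards, costs, budget, lo=0.0, hi=100.0, iters=50):
    """Bisection on lam: a larger lam penalizes cost more, so spend shrinks.

    Assumes the allocation at the initial `hi` is within budget.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total_cost(costs, allocate(rewards, costs, mid)) > budget:
            lo = mid  # over budget: the multiplier must grow
        else:
            hi = mid  # within budget: try a smaller multiplier
    return hi, allocate(rewards, costs, hi)

# Hypothetical incremental rewards and costs: 3 individuals, 3 treatments each.
rewards = [[0, 3, 4], [0, 1, 5], [0, 2, 2.5]]
costs = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]

lam, choice = solve_budget(rewards, costs, budget=3)
print(lam, choice, total_cost(costs, choice))
```

In this sketch the per-individual comparison uses `reward - lam * cost`, and `lam` has to be searched again whenever the budget changes. That is the dependence the paper sets out to remove by deriving a decision factor that does not involve the multiplier.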

Large-scale offline simulations and online A/B testing validate the effectiveness of our approach. In the offline simulations, we use two real datasets collected from randomized controlled trials (RCTs) on online advertising and food delivery platforms. Multiple evaluation metrics and online A/B tests show that our models and algorithms achieve significant improvements, with an average increase in target reward of more than 10% over the state of the art.

Related work

Two-stage approach

Combining machine learning (ML) and operations research (OR) is one of the most commonly used approaches to resource allocation problems; this paper calls it the two-stage approach. In the first stage, uplift models are designed to predict the incremental responses of individuals under different treatments. In addition to meta-learners (Kunzel et al. 2019; Nie and Wager 2021) and causal forests (Wager and Athey 2018; Athey, Tibshirani, and Wager 2019; Zhao, Fang, and Simchi-Levi 2017; Ai et al. 2022), representation learning (Johansson, Shalit, and Sontag 2016; Shalit, Johansson, and Sontag 2017; Yao et al. 2018) has also been used for uplift modeling. Several works (Betlei, Diemert, and Amini 2021; Kuusisto et al. 2014) propose unified learning frameworks for ranking CATE. In the second stage, Lagrangian duality, one of the most efficient algorithms, is often used to solve decision problems in many different domains. For example, it has been used to solve the budget allocation problem in marketing (Du, Lee, and Ghaffarizadeh 2019; Ai et al. 2022; Zhao et al. 2019) and to compute the optimal bidding strategy in online advertising (Hao et al. 2020).

Direct learning methods

Policy learning and reinforcement learning are two important methods that learn the treatment assignment policy directly instead of the treatment effect, avoiding the combination of ML and OR. Athey and Wager (2021) propose a general framework for policy learning with observational data based on a doubly robust estimator, and this work has been extended to multi-action policy learning (Zhou, Athey, and Wager 2022). As real-world applications, Xiao et al. (2019) and Zhang et al. (2021) formulate the coupon assignment problem in sequential incentive marketing as a constrained Markov decision process and solve it with reinforcement learning. However, all of these methods move the resource constraint into the reward function by means of Lagrangian multipliers, so the model may need to change whenever the Lagrangian multipliers change.

Decision-Focused Learning (DFL)

Similar in motivation to our work, DFL learns model parameters with respect to the downstream optimization task rather than prediction accuracy. However, many existing works on DFL require the feasible region of the decision variables to be fixed and known deterministically (Wilder, Dilkina, and Tambe 2019; Elmachtoub and Grigas 2022; Shah et al. 2022; Mandi et al. 2022). The work most relevant to ours is probably that of Donti, Amos, and Kolter (2017), who solve a stochastic optimization problem with both probabilistic and deterministic constraints. However, that work assumes continuous decision variables and handles the probabilistic constraints via Lagrangian duality, which differs significantly from our setting.

Binary treatment assignment problem

Treatment assignment without cost considerations

Cost-aware treatment assignment

Budget allocation problem with multiple treatments

Evaluation metric

AUUC (area under the uplift curve) and AUCC (area under the cost curve) (Du, Lee, and Ghaffarizadeh 2019) were developed to evaluate the ranking performance of uplift models without and with treatment costs, respectively. However, neither can be used to evaluate the ranking of marginal utilities across different treatments, which is directly related to the business objective of the multiple-treatment budget allocation problem (MTBAP). To this end, this paper proposes a new evaluation metric, MT-AUCC (Area Under Cost Curve for Multiple Treatments).
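
For orientation, the following Python sketch computes one common variant of the single-treatment AUUC on RCT data. It is not the MT-AUCC proposed in the paper, and AUUC definitions differ across the literature; the variant used here (difference of treated and control means in the top-k individuals, multiplied by k, then averaged over k) and all data are assumptions chosen for illustration.

```python
def uplift_curve(scores, treated, outcomes):
    """Cumulative uplift after targeting the top-k individuals, for k = 1..n.

    Uplift at k is (mean outcome of treated - mean outcome of control) * k,
    computed inside the top-k prefix of the score ranking. A group that is
    still empty in the prefix contributes a mean of 0.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_t = n_c = 0
    sum_t = sum_c = 0.0
    curve = []
    for k, i in enumerate(order, start=1):
        if treated[i]:
            n_t += 1
            sum_t += outcomes[i]
        else:
            n_c += 1
            sum_c += outcomes[i]
        mean_t = sum_t / n_t if n_t else 0.0
        mean_c = sum_c / n_c if n_c else 0.0
        curve.append((mean_t - mean_c) * k)
    return curve

def auuc(scores, treated, outcomes):
    """Area under the uplift curve, approximated by the mean curve height."""
    curve = uplift_curve(scores, treated, outcomes)
    return sum(curve) / len(curve)

# Hypothetical RCT sample: only the first four individuals respond to treatment.
treated = [1, 0, 1, 0, 1, 0, 1, 0]
outcomes = [1, 0, 1, 0, 0, 0, 0, 0]
good_scores = [8, 7, 6, 5, 4, 3, 2, 1]  # ranks the responders first
bad_scores = [1, 2, 3, 4, 5, 6, 7, 8]   # ranks the responders last

print(auuc(good_scores, treated, outcomes))
print(auuc(bad_scores, treated, outcomes))
```

Both rankings end at the same total uplift because the full population is identical; only the ranking that puts the responders first accumulates that uplift early, which is what the area rewards.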

Evaluation

Offline simulation

Conclusion

This paper proposes a method for solving resource allocation problems based on decision factors. Using the decision factor as the learning objective avoids additional mathematical operations on the prediction results. The idea is applied to two key problems in marketing and shows clear advantages in both theory and practice. Large-scale offline simulations and online A/B tests verify the effectiveness of the method. Our future work will focus on applying this method to more complex marketing scenarios. For example, multiple marketing campaigns may run simultaneously and interact with each other, which makes it more challenging to derive decision factors and to conduct direct heterogeneous causal learning.

Origin blog.csdn.net/as472780551/article/details/130094924