An End-to-End Framework for Marketing Effectiveness Optimization under Budget Constraint (Kuaishou)

Table of contents


1 Introduction

2 Related Work

2.1 Budget Allocation

2.2 Gradient Estimation

3 Our end-to-end framework

3.1 Notation

3.2 Two-stage paradigm

3.3 Marketing Goal as a Regularizer

3.4 Gradient Estimation

3.5 Training

4 Experimental results

4.1 Dataset

4.2 Evaluation Metrics

4.3 Implementation Details

4.4 Results on Synthetic Datasets

4.5 Results on CRITEO-UPLIFT v2

5 Conclusion


Title: An End-to-End Framework for Marketing Effectiveness Optimization under Budget Constraint


Affiliation: Kuaishou

Paper link: https://export.arxiv.org/pdf/2302.04477v1.pdf

Code: None 

Abstract: Online platforms often offer incentives to consumers in order to increase user engagement and platform revenue. Since different consumers may respond differently to incentives, allocating the budget at the individual level is an important task in marketing campaigns. Recent advances in this field typically use a two-stage paradigm to solve the budget allocation problem: the first stage uses causal inference algorithms to estimate individual-level treatment effects, and the second stage uses integer programming techniques to find the optimal allocation solution. Since the goals of these two stages may not be completely consistent, such a two-stage paradigm may hurt the overall marketing effect. In this paper, we propose a novel end-to-end framework to directly optimize the business objective under budget constraints. Our core idea is to build a regularizer that represents the marketing objective and to optimize it efficiently using gradient estimation techniques. Hence, the obtained model can learn to maximize the marketing objective directly and precisely. We extensively evaluate the proposed method in offline and online experiments, and the results show that it outperforms current state-of-the-art methods. The proposed method is currently used for marketing budget allocation for hundreds of millions of users on a short video platform, and has achieved significant improvements in the business objective. Our code will be made public.

Keywords: Marketing, Causal Learning, Integer Programming, Neural Networks, Gradient Estimation

1 Introduction

Offering incentives (e.g., cash rewards, discounts, coupons) to consumers is an effective way for online platforms to acquire new users, increase user engagement, and improve platform revenue [2, 3, 9, 20, 21, 32, 37, 38, 40, 41]. For example, coupons are provided in Taobao [37] to increase user activity, promotions are provided in Booking [9] to increase user satisfaction, cash rewards are used in Kuaishou [2] to stimulate user retention, and promotions are used in Uber [40] to encourage users to start using a new product. Although effective, these marketing campaigns may incur high costs, so the total budget is usually limited in practical scenarios. From the platform's perspective, the goal of a marketing campaign is often described as maximizing a specific business objective (e.g., user retention) under a specific budget constraint. To this end, assigning appropriate incentives to different users is crucial for optimizing the marketing effect, since different users respond differently to incentives. This budget allocation problem has major practical implications and has been studied for decades.

Many recent advances use a two-stage paradigm to address the budget allocation problem [2, 3, 32, 38]: in the first stage, individual-level heterogeneous treatment effects are estimated using causal inference techniques, and the estimated effects are then fed as coefficients into an integer programming formulation to find the optimal assignment. However, the objectives of the two stages are different, and in practice we observe that they are not fully aligned. For example, if the total budget is very low, an efficient allocation algorithm should only allocate incentives to the small fraction of users who are highly sensitive to incentives. In this case, a treatment-effect predictor that is more accurate on average over all users but performs poorly on this small subset may actually lead to a worse marketing effect.

In this paper, we propose a novel end-to-end approach to directly and precisely optimize marketing performance under budget constraints. Our core idea is to build a regularizer that represents the marketing objective and to optimize it using gradient estimation techniques. Specifically, we use a deep neural network (DNN) based S-Learner [19] to predict the treatment effects of different users. Given the estimated treatment effects, we further formulate the budget allocation problem as a multiple-choice knapsack problem (MCKP) [16], and use the expected outcome measure (EOM) to establish an unbiased estimate of the marketing objective (i.e., the expected business outcome under a certain budget constraint). This estimate is exactly the marketing effect we want to maximize, so we treat it as a regularizer and integrate it into the training objective of the S-Learner model. However, a challenge remains: constructing this regularizer involves solving a series of MCKPs with Lagrangian multipliers and therefore contains many non-differentiable operations, so modern machine learning libraries such as TensorFlow [1] and PyTorch [24] cannot give the correct gradient. Our solution is to treat the regularizer as a black-box function and use gradient estimation techniques such as the finite-difference method or natural evolution strategies (NES) [34] to obtain its gradient. The regularizer can then be efficiently optimized jointly with the original learning objective of the S-Learner model, endowing the model with the ability to learn directly from the marketing objective and maximize it in a principled manner. Our proposed method has been extensively evaluated in offline simulations and online experiments, and the results demonstrate that our end-to-end training framework achieves better marketing effects than the current state of the art. We further deployed the proposed method on a large-scale short video platform, distributing cash rewards to users to motivate them to create more videos; online A/B tests show that our method significantly outperforms the baseline, increasing the number of video creators by at least 1.24%, which is a substantial improvement over several months. Currently, our proposed method serves hundreds of millions of users.

2 Related Work

2.1 Budget Allocation

Toward this common goal, many budget allocation methods have been proposed. Early methods [10, 39, 40] usually choose each user's incentive amount heuristically after obtaining heterogeneous treatment effect estimates. However, the lack of an explicit formulation of the optimization problem may make these methods less effective at maximizing the marketing objective.

A popular line of work follows a two-stage paradigm [2, 3, 9, 21, 32, 38]: in the first stage, an uplift model is used to predict the treatment effect, and in the second stage, integer programming is invoked to find the optimal allocation. For example, Zhao et al. [38] proposed to use the logit response model [26] to predict the treatment effect and then obtain the optimal allocation under the KKT conditions via root finding. Subsequently, Tu et al. introduced various advanced estimators (including causal trees [4], causal forests [33], and meta-learners [19, 29]) to estimate heterogeneous treatment effects. The second stage has also received much research attention. Makhijani et al. [21] formulated the marketing objective as a minimum-cost network flow optimization problem for better expressiveness. More recently, Ai et al. [2] and Albert and Goldenberg [3] used MCKP to represent the budget allocation problem in the discrete case and developed an efficient solution method based on Lagrangian duality. While effective, the solutions of these two-stage approaches may not be optimal due to the inconsistency between the objectives of the two stages.

Several approaches that directly learn the optimal budget allocation policy have also been proposed [8, 9, 35, 37, 41, 42]. Xiao et al. [35] and Zhang et al. [37] developed reinforcement learning solutions that directly learn the optimal policy based on constrained Markov decision processes. However, learning such a complex policy in a purely model-free, black-box manner that does not exploit the causal structure may result in sample inefficiency. Du et al. [8], Goldenberg et al. [9], and Zou et al. [42] proposed to directly learn the value-to-cost ratio in the binary treatment setting, where users with higher scores are treated first. Later, contemporary research [41] proved that the losses proposed in [8, 42] cannot guarantee the correct rank order even when the loss converges. Furthermore, the method proposed in [9] is only suitable for the case where the budget constraint is a monetary ROI ≥ 0, which is not flexible enough for many marketing campaigns on online Internet platforms. Zhou et al. [41] extended the idea of decision-focused learning to the multi-treatment setting and developed a loss function to learn the decision factor of the MCKP solution. However, this method relies on the assumption of diminishing marginal utility, which often does not hold strictly in practice.

2.2 Gradient Estimation

Estimating the gradient of a black-box function has been studied for decades and is widely used in reinforcement learning [30] and adversarial attacks [11, 13]. The simplest approach is the finite-difference strategy, which computes gradients by definition. Finite-difference strategies can give accurate gradients, but can be computationally expensive if the argument of the black-box function is high-dimensional. Another approach is Natural Evolution Strategies (NES) [34], which generates several search directions to probe the local geometry of the black-box function and estimates the gradient from the sampled directions. NES is generally more computationally efficient than the finite-difference strategy, but the estimated gradients are noisy. Antithetic (mirrored) sampling [28] can reduce the variance of NES gradient estimates. In this paper, we propose both finite-difference-based and NES-based gradient estimation algorithms.
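
A minimal, generic sketch of the two estimators discussed above, applied to an arbitrary scalar black-box function f; the function and all parameter choices here are illustrative, not from the paper.

```python
import numpy as np

def finite_difference_grad(f, x, eps=1e-4):
    """Central-difference estimate of df/dx; costs 2 * x.size evaluations of f."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = eps
        grad.flat[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

def nes_grad(f, x, sigma=0.1, n_samples=50, rng=None):
    """NES-style estimate: probe random search directions and average; costs 2 * n_samples evaluations."""
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        # antithetic pair (u, -u) reduces the variance of the estimate
        grad += (f(x + sigma * u) - f(x - sigma * u)) * u / (2 * sigma)
    return grad / n_samples

if __name__ == "__main__":
    f = lambda x: -np.sum((x - 1.0) ** 2)   # toy black-box objective
    x0 = np.zeros(5)
    print(finite_difference_grad(f, x0))    # accurate up to O(eps^2)
    print(nes_grad(f, x0))                  # noisier, but cheaper in high dimensions
```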

3 Our end-to-end framework

Many budget allocation methods follow a two-stage paradigm, which can lead to performance degradation because the objectives of the two stages are inconsistent. We believe it is beneficial to properly incorporate the second-stage allocation process into the learning objective, so that the model can learn to directly maximize the marketing objective. In this section, we first introduce the main notation, then briefly analyze the two-stage paradigm, and finally present our end-to-end training framework.

3.1 Notation

 3.2 Two-stage paradigm

In the two-stage paradigm, the first-stage S-Learner model takes the user features and the treatment as input and predicts the response and the cost at the same time, and it is trained with the usual supervised loss on these two predictions. (The exact loss formulas and the accompanying figure are omitted in this post; see the original paper.)
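
The following is a minimal sketch of how such an S-Learner could look; it reflects our reading of the setup (a shared DNN, the treatment fed as an extra input, two prediction heads), and every name and layer size is an assumption rather than the authors' code.

```python
import torch
import torch.nn as nn

class SLearner(nn.Module):
    """One shared network predicts both the response and the cost for a (user, treatment) pair."""
    def __init__(self, num_features, num_treatments, hidden=64):
        super().__init__()
        self.treat_emb = nn.Embedding(num_treatments, 8)
        self.backbone = nn.Sequential(
            nn.Linear(num_features + 8, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.response_head = nn.Linear(hidden, 1)
        self.cost_head = nn.Linear(hidden, 1)

    def forward(self, x, t):
        h = self.backbone(torch.cat([x, self.treat_emb(t)], dim=-1))
        return self.response_head(h).squeeze(-1), self.cost_head(h).squeeze(-1)

def two_stage_loss(model, x, t, response, cost):
    """Plain supervised loss on the observed treatment, as used by the first stage."""
    r_hat, c_hat = model(x, t)
    return nn.functional.mse_loss(r_hat, response) + nn.functional.mse_loss(c_hat, cost)
```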

 

3.3 Marketing Goal as a Regularizer

As described in Section 1, the core idea of our end-to-end training is to build a regularizer to represent the marketing objective (i.e., the expected response under a certain budget constraint), and to optimize it efficiently using gradient estimation techniques.

Assume we have a batch of B RCT users from the training set (we set the batch size to B = 10,000). The S-Learner model can predict the response of each user under each of the K candidate treatments, and we stack these predicted responses into a B × K matrix. To simplify notation, the costs of applying any treatment to these users can also be stacked into a B × K matrix in a similar fashion.

 

As mentioned above, the obtained budget allocation scheme is optimal for these users under a certain budget constraint, and we can adjust the total budget by adjusting the value of the Lagrangian multiplier λ (a dual feasible solution). Therefore, we can start from λ = 0 and gradually increase its value; the total budget of the MCKP solution then decreases accordingly.
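
To make the allocation step concrete, here is a minimal sketch of the Lagrangian-relaxation view described above: for a fixed multiplier λ the MCKP decomposes into independent per-user choices, and sweeping λ trades total value against total cost. The matrix names R (predicted responses) and C (predicted costs) and the λ grid are our own notation, and a zero-cost "no incentive" treatment is assumed so that any budget can be met.

```python
import numpy as np

def allocate(R, C, lam):
    """For a fixed multiplier lam, pick for every user the treatment maximizing r - lam * c."""
    assignment = np.argmax(R - lam * C, axis=1)                  # shape (B,)
    total_cost = C[np.arange(len(assignment)), assignment].sum()
    return assignment, total_cost

def allocate_under_budget(R, C, budget, lam_grid=np.linspace(0.0, 10.0, 201)):
    """Start from lam = 0 and increase it until the induced total cost fits the budget."""
    for lam in lam_grid:
        assignment, total_cost = allocate(R, C, lam)
        if total_cost <= budget:
            return assignment, lam
    return assignment, lam                                       # smallest-budget solution on the grid
```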

 3.4 Gradient Estimation
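
The blog omits the details of this section, so the following is our own reconstruction of the idea rather than the paper's exact algorithm: the EOM-based regularizer is treated as a black-box function of the predicted response matrix, and its gradient with respect to that matrix is estimated with NES so that it can be propagated into the S-Learner by the chain rule. All names here are assumptions.

```python
import numpy as np

def estimate_regularizer_grad(eom_fn, R, sigma=0.05, n_samples=20, rng=None):
    """NES estimate of d(eom_fn)/dR, where eom_fn maps a (B x K) prediction matrix to a scalar."""
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(R)
    for _ in range(n_samples):
        U = rng.standard_normal(R.shape)
        # antithetic probes of the black-box marketing objective
        grad += (eom_fn(R + sigma * U) - eom_fn(R - sigma * U)) * U / (2 * sigma)
    return grad / n_samples
```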

 

 3.5 Training
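
The training procedure itself is not reproduced in this post, so below is a hedged sketch of how a joint training step could combine the supervised two-stage loss with the black-box regularizer, reusing the two_stage_loss and estimate_regularizer_grad helpers sketched above; the weight alpha and all other details are assumptions.

```python
import torch

def end_to_end_step(model, optimizer, x, t, response, cost, eom_fn, alpha=1.0):
    optimizer.zero_grad()
    sup_loss = two_stage_loss(model, x, t, response, cost)

    # Predict responses for all K treatments to form the B x K matrix R.
    B, K = x.shape[0], model.treat_emb.num_embeddings
    R = torch.stack(
        [model(x, torch.full((B,), k, dtype=torch.long))[0] for k in range(K)], dim=1
    )

    # Black-box (NES) gradient of the to-be-maximized regularizer w.r.t. R.
    g = torch.from_numpy(estimate_regularizer_grad(eom_fn, R.detach().numpy())).float()

    # Surrogate term whose parameter gradient equals -alpha * g backpropagated through R.
    reg_surrogate = -(alpha * g * R).sum()

    (sup_loss + reg_surrogate).backward()
    optimizer.step()
```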

4 Experimental results

4.1 Dataset

Synthetic datasets. To illustrate the key components of our idea (i.e., EOM and gradient estimation), we generate a synthetic dataset consisting of 10,000 RCT users; a purely illustrative sketch of such a dataset is shown below.
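
The post does not spell out the generation process, so the sketch below is only illustrative: the randomly assigned (RCT) treatment, the heterogeneous sensitivity driven by one feature, and costs that grow with the incentive level are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
B, K, D = 10_000, 5, 10                        # users, treatment levels, feature dim (K and D assumed)
X = rng.standard_normal((B, D))                # user features
T = rng.integers(0, K, size=B)                 # random (RCT) treatment assignment
sensitivity = 1.0 / (1.0 + np.exp(-X[:, 0]))   # heterogeneous response to incentives
response = sensitivity * T + rng.normal(0.0, 0.1, size=B)
cost = T + rng.normal(0.0, 0.05, size=B)       # larger incentives cost more
```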

CRITEO-UPLIFT v2 [7]. This publicly available dataset is designed for evaluating uplift models. It contains 13.9 million samples collected from a randomized controlled trial. Each sample has 12 dense features, a binary treatment indicator, and two binary labels: visit and conversion. In this dataset, the treatment is defined as whether a user was targeted by an ad, and a label is positive if the user visited / converted on the advertiser's website during the two-week test period. To evaluate different budget allocation methods, we follow [41] and use the visit / conversion labels as cost / value, respectively. We compare our proposed method with state-of-the-art methods on CRITEO-UPLIFT v2 to demonstrate its effectiveness.
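
For reference, a small loading sketch that follows the setup described above (visit used as cost, conversion as value, per [41]); the file name and column names follow the public CSV release of the dataset and should be adjusted if your copy differs.

```python
import pandas as pd

df = pd.read_csv("criteo-research-uplift-v2.1.csv")     # ~13.9M rows
features = df[[f"f{i}" for i in range(12)]].to_numpy()  # 12 dense features
treatment = df["treatment"].to_numpy()                  # 1 = user targeted by the ad
cost = df["visit"].to_numpy()                           # binary visit label, used as cost
value = df["conversion"].to_numpy()                     # binary conversion label, used as value
```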

KUAISHOU-PRODUCE-COIN. Produce Coin is a common marketing campaign that incentivizes short video creators to upload more short videos on Kuaishou. In this campaign, a coin task is provided for every short video creator on the Kuaishou platform. Specifically, in each task, if the creator uploads a video within 24 hours, he/she receives a certain number of coins as a reward. Each creator can see the number of coins he/she may get, and if this number is attractive enough, the creator may complete the task.

4.2 Evaluation Metrics

We use the following metrics to evaluate and compare the performance of different methods:

AUCC (Area Under the Cost Curve) [8]. AUCC is commonly used in the existing literature [2, 8, 41] to evaluate the ranking performance of uplift models in the binary treatment setting. Interested readers can refer to the original paper [8] for more details of the AUCC evaluation procedure. In this paper, we use AUCC to compare the performance of different methods on CRITEO-UPLIFT v2.

EOM (Expected Outcome Measure). EOM or similar metrics are also commonly used in the existing literature [2, 41] to empirically estimate the expected outcome (e.g., per-capita response or per-capita cost) of an arbitrary budget allocation policy. EOM gives an unbiased estimate of any outcome: as long as the budget allocation solution is given, the estimate can be computed on RCT data. Therefore, EOM is more flexible than AUCC in practice. In this paper, we use EOM to evaluate different budget allocation methods on the synthetic dataset and the KUAISHOU-PRODUCE-COIN dataset. The technical details of EOM are given in Eq. (5) and Eq. (6) in Section 3.3.
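
As a rough illustration of the EOM idea (our reading of the usual policy-evaluation recipe, not the paper's exact Eq. (5)-(6)): on RCT data, keep the users whose randomly assigned treatment coincides with the treatment the policy would give them, reweight by the assignment probabilities, and average the observed outcome.

```python
import numpy as np

def expected_outcome(policy_assignment, rct_treatment, outcome, treat_probs):
    """policy_assignment, rct_treatment: (B,) treatment indices; treat_probs[k] = P(T = k) in the RCT."""
    match = policy_assignment == rct_treatment
    weights = 1.0 / treat_probs[rct_treatment[match]]   # inverse-propensity weights
    return np.sum(weights * outcome[match]) / np.sum(weights)
```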

4.3 Implementation Details

The implementation details of the different datasets are as follows:

4.4 Results on Synthetic Datasets

 

4.5 Results on CRITEO-UPLIFT v2

The results on CRITEO-UPLIFT v2 are summarized in Table 1. Ours-FD and Ours-NES adopt the finite-difference strategy and NES, respectively, as the gradient estimator. We find that both Ours-FD and Ours-NES significantly outperform all competing methods in terms of AUCC. Ours-NES shows a higher variance of AUCC over 20 runs, a phenomenon that may be partly explained by the fact that NES gradient estimates are noisy. Although Ours-FD outperforms Ours-NES, it takes about 4 times longer to run on an NVIDIA Tesla T4 GPU (17.6 hours for Ours-FD vs. 4.3 hours for Ours-NES), since the regularizer function has to be evaluated many more times.

5 Conclusion

This paper investigates an end-to-end training framework for the budget allocation problem. We formulate the business objective under a certain budget constraint as a black-box regularizer and develop two efficient gradient estimation algorithms to optimize it. For both gradient estimation algorithms, a hyperparameter trades off computational complexity against the accuracy of the estimated gradients. Our proposed method endows the trained DNN model with the ability to learn directly from the marketing objective and further maximize it. Extensive experiments on three datasets show that our method excels in both offline simulations and online experiments. Our future work will focus on reducing the variance of the gradient estimates and applying this method to more complex real-world marketing scenarios.
