Causal inference 3--DRNet (personal notes)

Table of contents

Learning Counterfactual Representations for Estimating Individual Dose-Response Curves

1 Introduction

2 Related Work

3 Methods

4 Experiments

5 Results and Discussion

6 Conclusion

7 Understanding


Paper title:

Learning Counterfactual Representations for Estimating Individual Dose-Response Curves

Venue:

AAAI 2020

Paper link:

https://arxiv.org/abs/1902.00981

Code link:

https://github.com/d909b/drnet


Abstract: Estimating an individual's potential response to varying levels of exposure to a treatment is of high practical relevance in several important fields, including healthcare, economics, and public policy. However, existing methods for learning to estimate counterfactual outcomes from observational data either focus on estimating average dose-response curves or are limited to settings with only two treatments and no associated dose parameter. Here, we propose a novel machine learning approach to learning counterfactual representations for estimating individual dose-response curves for any number of treatments with continuous dose parameters using neural networks. Building on the established potential outcomes framework, we introduce performance metrics, model selection criteria, model architectures, and open benchmarks for estimating individual dose-response curves. Our experiments show that the methods developed in this work set a new state of the art in estimating individual dose-response.

1 Introduction

Estimating dose-response curves from observational data is an important problem in many fields. In medicine, for example, we would like to use data from people who were treated in the past to predict which treatments and associated doses would lead to better outcomes for new patients. At its core, this is a counterfactual question: we want to predict what would happen if we gave a patient a particular treatment at a particular dose in a given situation. Answering such counterfactual questions is challenging and requires either further assumptions about the underlying data-generating process or prospective interventional experiments such as randomized controlled trials [2–4]. However, prospective experiments are expensive, time-consuming, and in many cases ethically unjustifiable. Estimating counterfactual outcomes from observational data alone faces two difficulties [6,7]. First, we only observe the factual outcome and never the counterfactual outcomes that would have occurred had we chosen a different treatment. In medicine, for example, we observe only the outcome of giving a patient a specific treatment at a specific dose, never what would have happened had the patient received an alternative treatment or a different dose of the same treatment. Second, treatments in observational data are typically not assigned at random. In a medical setting, doctors weigh a range of factors when choosing a treatment, such as the patient's expected response. Because of this treatment assignment bias, the treated population may differ significantly from the general population. A supervised model naively trained to minimize the factual error would overfit to the properties of the treated group and therefore fail to generalize to the entire population.

To address these issues, we introduce a novel approach to training neural networks for counterfactual inference that extends to any number of treatments with continuous dose parameters. To control for treatment assignment bias in observational data, we combine our method with several regularization schemes originally developed for the discrete treatment setting, such as distribution matching [8,9], propensity dropout (PD) [10], and matching on balancing scores [7,11,12]. In addition, we devise performance metrics, model selection criteria, and open benchmarks for estimating individual dose-response curves. Our experiments show that the methods developed in this work set a new state of the art in inferring individual dose-response curves. The source code for this work is available at https://github.com/d909b/drnet.

Contributions. We make the following contributions:

  • We introduce a new method for training neural networks for counterfactual inference that, in contrast to existing methods, is applicable to estimating counterfactual outcomes for any number of treatment regimens with associated exposure parameters.
  • We develop performance metrics, model selection criteria, model architectures, and open benchmarks for estimating individual dose-response curves.
  • We extend state-of-the-art counterfactual inference methods for two non-parametric treatments to the setting of multiple parametric treatment options.
  • We perform extensive experiments and show that our method is state-of-the-art for inferring individual dose-response curves from observational data on several challenging datasets.

2 Related Work

Background: Causal analysis of treatment effects through rigorous experiments is an essential tool for validating interventions in many fields. In medicine, prospective experiments such as randomized controlled trials are the de facto gold standard for assessing whether a given treatment is effective for a specific indication in a population [13,14]. However, performing prospective experiments is expensive, time-consuming, and often impossible for ethical reasons. Historically, there has therefore been great interest in developing methods for causal inference from readily available observational data [3,11,15–19]. Naively training supervised models to minimize the observed factual error is generally not suitable for counterfactual inference, because of treatment assignment bias and the inability to observe counterfactual outcomes. To address the shortcomings of unsupervised and supervised learning in this setting, several adaptations of established machine learning methods that aim to estimate counterfactual outcomes from observational data have recently been proposed [6–10,20–22]. In this work, we build on these advances to develop a machine learning approach for estimating individual dose-response with neural networks.

Estimating Individual Treatment Effects (ITE): Matching methods [12] are among the most widely used approaches to causal inference from observational data. They estimate the counterfactual outcome of a sample X under a treatment t from the observed factual outcomes of its nearest neighbors that received t. Propensity score matching (PSM) [11] combats the curse of dimensionality of matching directly on the covariates X by instead matching on the scalar probability p(t|X) of receiving treatment t given the covariates X. Another class of methods uses adjusted regression models that take both the covariates X and the treatment t as inputs. The simplest such model is ordinary least squares (OLS), which can use either a single model for all treatments or a separate model for each treatment [23]. More complex models based on neural networks, such as Treatment Agnostic Representation Networks (TARNETs), can be used to build nonlinear regression models [9]. Estimators that combine a form of adjusted regression with a model of the exposure in a way that makes them robust to misspecification of either one are called doubly robust [24]. In addition to OLS and neural networks, tree-based estimators such as Bayesian Additive Regression Trees (BART) [25, 26] and Causal Forests (CF) [20], and distribution modeling methods such as Causal Multi-task Gaussian Processes (CMGP) [21], Causal Effect Variational Autoencoders (CEVAEs) [22], and Generative Adversarial Nets for inference of Individualized Treatment Effects (GANITE) [6], have also been proposed for ITE estimation. Other methods, such as Balancing Neural Networks (BNNs) [8] and Counterfactual Regression Networks (CFRNET) [9], attempt to achieve balanced covariate distributions across treatment groups by explicitly minimizing the empirical discrepancy distance between treatment groups, using metrics such as the Wasserstein distance [28]. Most of the work mentioned above focuses on the simplest setting, with two available treatment options and no associated dose parameters. A notable exception is the generalized propensity score (GPS) [1], which extends the propensity score to treatments with continuous doses.

In contrast to existing methods, we propose the first machine learning approach that learns to estimate individual dose-response curves for multiple available treatments with continuous dose parameters from observational data using neural networks. We also extend several known regularization schemes for counterfactual inference to account for treatment assignment bias in observational data. To facilitate future research in this important area, we introduce performance metrics, model selection criteria, and open benchmarks. We believe this work is particularly important for precision medicine applications, because current state-of-the-art techniques estimate average dose-response across a population and do not take individual differences into account, even though large differences in dose-response between individuals are well documented for many diseases [29–31].

3 Methods

Problem Statement: We consider a setting in which we are given $N$ observed samples $X$, each with $p$ pre-treatment covariates $x_i$, $i \in [0 \,..\, p-1]$. For each sample, the potential outcome $y_{n,t}(s_t)$ is the response of the $n$-th sample to treatment $t$ from the set of $k$ available treatment options $T = \{0, \dots, k-1\}$, applied at a dose $s_t \in \{ s_t \in \mathbb{R},\ a_t > 0 \mid a_t \le s_t \le b_t \}$, where $a_t$ and $b_t$ are the minimum and maximum doses of treatment $t$, respectively. The treatment set $T$ can contain two or more treatment options. As training data, we receive factual samples $X$ and their observed outcomes $y_{n,f}(s_f)$ after applying a specific observed treatment $f$ at dose $s_f$. Using this training data with factual outcomes, we wish to train a predictive model that produces accurate estimates of the potential outcomes for all available treatment options $t$ over the entire range of $s$.

Assumptions: Following [1,33], we assume unconfoundedness, which consists of three key parts: (1) the conditional independence assumption: given the pre-treatment covariates $X$, the assignment to treatment $t$ is independent of the outcome $y_t$; (2) the common support assumption: for all values of $X$, it must be possible to observe every treatment with a probability greater than 0; and (3) the stable unit treatment value assumption: the observed outcome of any one unit must be unaffected by the assignment of treatments to other units. Furthermore, we assume smoothness, i.e. that units with similar covariates $x_i$ have similar outcomes $y$, both for model training and for model selection.

Metrics: To enable meaningful comparisons of models in the described setting, we use metrics that cover several desirable aspects of models trained to estimate individual dose-response curves. The proposed metrics respectively measure a predictive model's ability (1) to recover the dose-response curve across the entire range of dose values, (2) to identify the optimal dose point for each treatment, and (3) to derive the optimal treatment policy overall, including selecting the right treatment and the right dose point for each individual case. To measure how well a model covers the entire range of the individual dose-response curves, we use the mean integrated squared error (MISE) between the true dose-response $y$ and the dose-response $\hat{y}$ predicted by the model, computed over $N$ samples, all treatments $T$, and the entire dose range $[a_t, b_t]$.
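As a rough illustration of the MISE described above, one way to write it that is consistent with the description is $\mathrm{MISE} = \frac{1}{N} \frac{1}{|T|} \sum_{t \in T} \sum_{n=1}^{N} \int_{a_t}^{b_t} \big( y_{n,t}(s) - \hat{y}_{n,t}(s) \big)^2 \, ds$. The sketch below approximates the integral with the trapezoidal rule on an evenly spaced dose grid. The function names, the callable interface `fn(n, t, s)`, and the grid size are assumptions made for this example and are not taken from the DRNet code.

```python
from typing import Callable, Dict, Tuple

# Hypothetical interface: maps (sample index, treatment index, dose) -> outcome.
ResponseFn = Callable[[int, int, float], float]


def integrated_squared_error(true_fn: ResponseFn, pred_fn: ResponseFn,
                             n: int, t: int, a: float, b: float,
                             num_points: int = 65) -> float:
    """Trapezoidal approximation of the integral of (y - y_hat)^2 over [a, b]."""
    step = (b - a) / (num_points - 1)
    total = 0.0
    for i in range(num_points):
        s = a + i * step
        err = (true_fn(n, t, s) - pred_fn(n, t, s)) ** 2
        # End points get half weight in the trapezoidal rule.
        weight = 0.5 if i in (0, num_points - 1) else 1.0
        total += weight * err
    return total * step


def mise(true_fn: ResponseFn, pred_fn: ResponseFn, num_samples: int,
         dose_ranges: Dict[int, Tuple[float, float]],
         num_points: int = 65) -> float:
    """Average the integrated squared error over all samples and treatments."""
    total = 0.0
    for t, (a, b) in dose_ranges.items():
        for n in range(num_samples):
            total += integrated_squared_error(true_fn, pred_fn, n, t, a, b, num_points)
    return total / (num_samples * len(dose_ranges))


# Toy check: true response y = s, constant prediction 0, one treatment on [0, 1].
toy = mise(lambda n, t, s: s, lambda n, t, s: 0.0,
           num_samples=2, dose_ranges={0: (0.0, 1.0)})
print(toy)  # close to 1/3, the exact integral of s^2 over [0, 1]
```

Note that the true dose-response is only available on synthetic or semi-synthetic benchmarks, so a function like `true_fn` cannot be written for purely observational data.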

Model Architecture: Model architecture plays an important role in representation learning for counterfactual inference with neural networks [7, 9, 35]. A particularly challenging aspect of training neural networks for counterfactual inference is that the influence of the treatment indicator variable $t$ may be lost in high-dimensional hidden representations [9]. To address this problem in the setting of two available treatments without dose parameters, Shalit et al. [9] proposed the TARNET architecture, which uses a shared base network and separate head networks for the two treatments. In TARNETs, each head network is trained only on samples that received the corresponding treatment. Schwab et al. [7] extended the TARNET architecture to the multiple treatment setting by using $k$ separate head networks, one per treatment option. In the setting of multiple treatment options with associated dose parameters, the problem is further compounded because we must preserve throughout the network not only the influence of $t$ on the hidden representations, but also the influence of the continuous dose parameter $s$. To ensure that both $t$ and $s$ keep their influence on the hidden representations, we propose a hierarchical architecture for multiple treatments called the dose-response network (DRNet, Figure 1). DRNets ensure that the dose parameter $s$ retains its influence by assigning a head to each of $E$ equally sized dose strata that subdivide the range $[a_t, b_t]$ of potential dose values. The hyperparameter $E$ defines the trade-off between computational performance and the resolution $(b_t - a_t)/E$ at which the range of dose values is partitioned.
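The subdivision of a treatment's dose range into E equal-width parts, each served by its own head, can be sketched as a simple index computation. The function name and the decision to place the upper bound of the range in the last part are assumptions for illustration; the DRNet code may handle boundaries differently.

```python
def dose_stratum(s: float, a: float, b: float, num_strata: int) -> int:
    """Return the index (0 .. E-1) of the equal-width part of [a, b] that contains s."""
    if num_strata < 1 or b <= a:
        raise ValueError("need num_strata >= 1 and b > a")
    if not a <= s <= b:
        raise ValueError(f"dose {s} lies outside [{a}, {b}]")
    # Each part has width (b - a) / E, so the index is floor((s - a) / width).
    index = int((s - a) * num_strata / (b - a))
    # s == b would give index E; assign it to the last part instead (assumption).
    return min(index, num_strata - 1)


# Example with E = 5 parts on the dose range [0, 1].
print(dose_stratum(0.05, 0.0, 1.0, 5))  # 0
print(dose_stratum(1.0, 0.0, 1.0, 5))   # 4
```

During training, an index like this would decide which head receives a given sample, so each head only sees doses from its own part of the range.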

To further reduce the risk of losing the influence of the dose parameter $s$ within the head layers, we additionally append $s$ to every hidden layer in the head layers. The proposed hierarchy is motivated by the effectiveness of the regress-and-compare approach to counterfactual inference [23], in which a separate estimator is built for each available treatment option. Separate models per treatment suffer from data sparsity, because only the units that received a given treatment can be used to train that treatment's model, and a treatment may not have many samples. DRNets alleviate this data sparsity by sharing information across the whole dose range through the treatment layers and across all treatments through the base layers.
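The idea of feeding the dose into every hidden layer of a head can be sketched in plain Python. This is an illustrative sketch only: the layer sizes, the random initialization, the ReLU activation, and the helper names (`make_layer`, `make_head`, `head_forward`) are assumptions for this example and are not taken from the DRNet source code.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Randomly initialized dense layer: (weights[n_out][n_in], biases[n_out])."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x, relu=True):
    """Apply one dense layer to the input vector x (a list of floats)."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def make_head(n_repr, n_hidden, n_layers, rng):
    """Head whose every hidden layer takes one extra input: the dose s."""
    sizes = [n_repr] + [n_hidden] * n_layers
    hidden = [make_layer(sizes[i] + 1, sizes[i + 1], rng) for i in range(n_layers)]
    output = make_layer(sizes[-1], 1, rng)  # assumption: output layer sees no extra s
    return hidden, output


def head_forward(head, representation, s):
    """Predict one outcome from a hidden representation and a dose s."""
    hidden, output = head
    h = list(representation)
    for layer in hidden:
        h = dense(layer, h + [s])  # re-append the dose before every hidden layer
    return dense(output, h, relu=False)[0]


rng = random.Random(0)
head = make_head(n_repr=4, n_hidden=8, n_layers=2, rng=rng)
print(head_forward(head, [0.1, 0.2, 0.3, 0.4], s=0.5))
```

The ablation that gives the head the dose only once would correspond to appending `s` only before the first hidden layer.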

Model selection: Given multiple models, it is non-trivial to decide which one performs better at counterfactual tasks, because we usually do not have access to the true dose-response needed to compute the error metrics given above. We therefore use a nearest-neighbor approximation of MISE to perform model selection on held-out validation data that was not used for training. This approximation, NN-MISE, substitutes the unobservable true dose-response of each sample with the observed factual outcome of a nearest neighbor.
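A nearest-neighbor stand-in for MISE could look like the sketch below. It is not the paper's exact definition: the record layout, the function names, and in particular the rule that the neighbor is the factual sample with the same treatment that is closest in Euclidean distance over covariates and dose are all assumptions made for this example.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical record layout: (covariates, treatment index, dose, factual outcome).
Factual = Tuple[Sequence[float], int, float, float]


def nn_outcome(x: Sequence[float], t: int, s: float, factuals: List[Factual]) -> float:
    """Factual outcome of the nearest observed unit that received treatment t.

    Assumption: "nearest" means smallest Euclidean distance over (covariates, dose).
    """
    candidates = [f for f in factuals if f[1] == t]
    if not candidates:
        raise ValueError(f"no factual samples for treatment {t}")
    query = list(x) + [s]
    nearest = min(candidates, key=lambda f: math.dist(list(f[0]) + [f[2]], query))
    return nearest[3]


def nn_mise(pred_fn: Callable[[Sequence[float], int, float], float],
            samples: List[Sequence[float]], factuals: List[Factual],
            dose_ranges: Dict[int, Tuple[float, float]],
            num_points: int = 33) -> float:
    """Nearest-neighbor approximation of MISE, integrated with the trapezoidal rule."""
    total = 0.0
    for x in samples:
        for t, (a, b) in dose_ranges.items():
            step = (b - a) / (num_points - 1)
            for i in range(num_points):
                s = a + i * step
                err = (nn_outcome(x, t, s, factuals) - pred_fn(x, t, s)) ** 2
                weight = 0.5 if i in (0, num_points - 1) else 1.0
                total += weight * err * step
    return total / (len(samples) * len(dose_ranges))
```

Because it only needs factual outcomes, a score of this kind can be computed on observational validation data, where the true dose-response is never available.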

Figure 1: The dose-response network (DRNet) architecture with shared base layers, $k$ intermediate treatment layers, and $k \cdot E$ heads for the multiple treatment setting with an associated dose parameter $s$. The shared base layers are trained on all samples, while the treatment layers are trained only on samples from their respective treatment category. Each treatment layer is further subdivided into $E$ head layers (only one set of $E = 3$ head layers, for treatment $t = 0$, is shown in the figure). Each head layer is assigned a dose stratum that subdivides the range of potential doses $[a_t, b_t]$ into $E$ partitions of equal width $(b - a)/E$. Each head layer predicts the outcome $y_t(s)$ for a range of values of the dose parameter $s$, and is trained only on samples that fall within its dose stratum. The hierarchical structure of DRNets enables them to share common hidden representations across all samples (base layers), treatment options (treatment layers), and dose strata (head layers), while maintaining the influence of both $t$ and $s$ on the hidden layers.

Regularization scheme: DRNets can be combined with regularization schemes developed to further address treatment assignment bias. To determine the utility of various regularization schemes, we evaluate DRNets using distribution matching [9], propensity dropout [10], matching on the entire dataset [12], and matching on the batch level [7]. We naïvely extended these regularization schemes, since none of them was originally developed for the dose-response setting (Appendix A).

4 Experiments

Our experiments aim to answer the following questions:

1 How does the performance of our proposed method compare to current state-of-the-art methods for estimating individual dose responses?

2 How do different choices of E affect counterfactual inference performance?

3 How does an increase in treatment assignment bias affect the performance of a dose-response estimator?

Using real-world data, we conduct experiments on three semi-synthetic datasets with two or more treatment options to better understand the empirical properties of our proposed method. To cover a wide range of settings, we selected datasets with different outcome and treatment assignment functions, and different numbers of samples, features, and treatments (Table 1). All three datasets are randomly split into training (63%), validation (27%), and test (10%) sets.
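A random 63/27/10 split of this kind can be sketched with the standard library alone. The function name, the fixed seed, and the rounding rule are assumptions for illustration; the notes do not say how the original split was seeded or whether it was stratified.

```python
import random


def split_indices(num_samples, fractions=(0.63, 0.27, 0.10), seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * num_samples)
    n_val = round(fractions[1] * num_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, so that no sample is dropped
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 630 270 100
```

Taking the test part as the remainder keeps the three parts disjoint and exhaustive even when the fractions do not divide the sample count evenly.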

Models: We evaluate DRNet, its ablations, baselines, and all relevant state-of-the-art methods: k-nearest neighbors (kNN) [12], BART [25,26], CF [20], GANITE [6], TARNET [9], and GPS [1] using the “causaldrf” package [40]. To determine which regularization strategy for learning counterfactual representations is most effective, we train DRNets with a Wasserstein regularizer between treatment group distributions (+Wasserstein) [9], PD (+PD) [10], batch matching (+PM) [7], and matching of the entire training set as a preprocessing step [41] using the PM algorithm (+PSMPM) [7]. To determine whether the DRNet architecture is more effective than its alternatives at learning representations for counterfactual inference, we also evaluate (1) a multi-layer perceptron (MLP) that receives the treatment index $t$ and the dose $s$ as additional inputs, and (2) a TARNET for multiple treatments that receives the dose $s$ as an additional input (TARNET) [7, 8], with all hyperparameters other than the architecture held equal. As a final ablation of DRNet, we test whether appending the dose parameter to each hidden layer of the head networks is effective by also training DRNets that receive the dose parameter only once, in the first hidden layer of the head network (-Repeat). We naïvely extended CF, GANITE, and BART by adding the dose as an additional input covariate, since they were not designed for treatments with doses.

5 Results and Discussion

Counterfactual Inference: To evaluate the relative performance of the various methods across a wide range of settings, we compared the MISE of the listed models for counterfactual inference on the News-2/4/8/16, MVICU, and TCGA benchmarks (Table 2). Across the benchmarks, we found that DRNets outperformed all existing state-of-the-art methods in terms of MISE. We also found that DRNets with an additional regularization strategy outperformed vanilla DRNets on News-2, News-4, News-8, and News-16. On MVICU and TCGA, however, DRNets with additional regularization performed similarly to standard DRNets. Wasserstein regularization between treatment groups (+Wasserstein) and batch matching (+PM) were generally slightly more effective than PSMPM and PD. In addition, DRNets that did not repeat the dose parameter for each layer in the per-dose-range heads (-Repeat) performed worse than DRNets that appended the dose parameter on News-2, News-4, and News-8. Finally, the results show that DRNet improves on the TARNET and MLP baselines by a large margin on all datasets, which demonstrates the effectiveness of the hierarchical dose subdivision introduced by DRNet.

 

Treatment assignment bias: To evaluate the robustness of DRNet and existing methods to increasing levels of treatment assignment bias in observational data, we compared the performance of DRNet with that of TARNET, MLP, and GPS on the test set of News-2 under varying levels of treatment assignment bias $\kappa \in [5, 20]$ (Fig. 3). We find that DRNet outperforms the existing methods across the entire range of treatment assignment bias evaluated.

6 Conclusion

We propose a deep learning approach that learns to estimate individual dose-response to multiple treatments with continuous dose parameters from observational data. We extend several existing regularization strategies to the setting of any number of treatment options with associated dose parameters and combine them with our method to account for the treatment assignment bias inherent in observational data. Additionally, we introduce performance metrics, model selection criteria, model architectures, and new open benchmarks for this setting. Our experiments show that model structure is critical when learning neural representations for counterfactual inference of dose-response curves from observational data, and that there is a trade-off between model resolution and computational performance in DRNets. DRNets significantly outperform existing state-of-the-art methods at inferring individual dose-response curves on multiple benchmarks.

7 Understanding

This paper proposes new metrics, new datasets, and training strategies that allow estimation of outcomes for any number of treatments.

Setting: The paper considers the case where there are multiple treatments and each treatment can be applied at different levels; in a doctor-patient scenario, for example, each treatment comes with a medication dosage. The training goal is to produce an estimate for any value within the range of each treatment, so the causal effect for an individual appears as a curve, that is, a function of the treatment dose.

 To be added:

  1. DRNet source code implementation
  2. VCNet implementation
  3. Research on multi-task learning
  4. Applications of multi-task learning in causal inference

