Causal inference with multiple concurrent medications: A comparison of methods and an application in multidrug-resistant tuberculosis
Arman Alam Siddique^{1}, Mireille E Schnitzer^{2} (http://orcid.org/0000-0001-8049-9646), Asma Bahamyirou^{2} (http://orcid.org/0000-0001-5334-2715), Guanbo Wang^{3}, Timothy H Holtz^{4}, Giovanni B Migliori^{5}, Giovanni Sotgiu^{6}, Neel R Gandhi^{7}, Mario H Vargas^{8,9}, Dick Menzies^{10,11}, Andrea Benedetti^{3,10,11}
1 Department of Statistics, McMaster University, Hamilton, Canada
2 Faculty of Pharmacy, Université de Montréal, Montreal, Canada
3 Department of Epidemiology, Biostatistics & Occupational Health, McGill University, Montreal, Canada
4 Division of Global HIV and TB, Centers for Disease Control and Prevention, New Delhi, India
5 World Health Organization Collaborating Centre for Tuberculosis and Lung Diseases, Fondazione S. Maugeri, Tradate, Italy
6 Clinical Epidemiology and Medical Statistics Unit, Department of Medical, Surgical and Experimental Sciences, University of Sassari, Sassari, Italy
7 Rollins School of Public Health and Emory School of Medicine, Emory University, Atlanta, USA
8 Departamento de Investigación en Hiperreactividad Bronquial, Instituto Nacional de Enfermedades Respiratorias, Mexico City, Mexico
9 Unidad de Investigación Médica en Enfermedades Respiratorias, Instituto Mexicano del Seguro Social, Mexico City, Mexico
10 Respiratory Epidemiology and Clinical Research Institute, McGill University Health Centre, Montreal, Canada
11 Department of Medicine, McGill University, Montreal, Canada
Corresponding author: Mireille E. Schnitzer, Faculty of Pharmacy, Université de Montréal, Montreal, Québec H3C3J7, Canada. mireille.schnitzer@umontreal.ca
Statistical Methods in Medical Research (Stat Methods Med Res) 2019; 28(12): 3534–3549. ISSN 0962-2802 (print), 1477-0334 (online). DOI: 10.1177/0962280218808817. PMID: 30381005. PMCID: 6511477. Manuscript: HHSPA1012864.
This paper investigates different approaches for causal estimation under multiple concurrent medications. Our parameter of interest is the marginal mean counterfactual outcome under different combinations of medications. We explore parametric and non-parametric methods to estimate the generalized propensity score. We then apply three causal estimation approaches (inverse probability of treatment weighting, propensity score adjustment, and targeted maximum likelihood estimation) to estimate the causal parameter of interest. Focusing on the estimation of the expected outcome under the most prevalent regimens, we compare the results obtained using these methods in a simulation study with four potentially concurrent medications. We perform a second simulation study in which some combinations of medications may occur rarely or not occur at all in the dataset. Finally, we apply the methods explored to contrast the probability of patient treatment success for the most prevalent regimens of antimicrobial agents for patients with multidrug-resistant pulmonary tuberculosis.
Keywords: Causal inference, concurrent medications, generalized propensity score, machine learning, multidrug-resistant tuberculosis, targeted maximum likelihood estimation
Introduction
Polypharmacy is the intake of multiple medications, potentially more than medically necessary, at the same time. The increased cost of multiple medications, the degradation of quality of life, the possibility of interactions between those medications, and adverse drug reactions^{1} make polypharmacy an important area of research.
The concurrent usage of multiple medications is necessary for some diseases. Multidrug-resistant tuberculosis (MDR-TB), with almost 500000 new cases in 2016^{2} and a 45% mortality rate worldwide,^{3} is defined as a disease caused by strains of Mycobacterium tuberculosis that are resistant to at least the two most effective drugs, isoniazid and rifampin, used to treat tuberculosis. Patients with MDR-TB are treated with multiple alternative antimicrobial agents in order to cure the infection and prevent further drug resistance (or to prevent the selection of drug resistant strains of M. tuberculosis). Current guidelines recommend the simultaneous usage of five or more antimicrobial agents depending on the therapeutic phase and drug resistance pattern.^{4} A systematic review published in 2012 identified international studies that investigated associations between different treatments and treatment outcomes of MDR-TB.^{5} The combination of individual patient data from these studies is currently the greatest resource for evaluating medication effectiveness in MDR-TB. However, with patients taking as many as 7 antimicrobial agents concurrently,^{5} and the data containing 15 different antimicrobial agents overall, the analysis presents a challenge for the application of causal inference methods.
Many causal estimation techniques for binary treatments use the propensity score, defined as the probability of receiving one of the two treatment options. In the case where multiple treatments are available, Imbens^{6} extended this framework by defining the generalized propensity score (GPS) as the probability of receiving a specific treatment. Imbens,^{6} Imai and Van Dyk,^{7} and Lopez and Gutman^{8} developed various techniques reliant on the GPS for the estimation of causal effects. Further, McCaffrey et al.^{9} proposed using generalized boosted models for the estimation of the GPS for multiple treatments.
In this paper, we explore methods to estimate the relative effects of taking multiple medications. The previous methods cited above primarily estimated the effects of continuous (such as medication dose) or low-dimensional categorical treatment options. In contrast, we are interested in the setting where patients may take more than one medication of interest concurrently, resulting in a potentially large number of possible drug combinations, many of which may not be observed in the data.
In order to approach this problem, we take the exposure to be a categorical variable of regimens, where regimen refers to a specific combination of medications (perhaps taken over a pre-specified period). We then employ various machine learning algorithms for the estimation of the GPS. We provide short introductions for these machine learning algorithms along with several causal estimation procedures in Section 2. We present a simulation study in Section 3 in order to compare the appropriateness of each method. In Section 4, we present an application of these methods for the MDR-TB data in which we provide estimates of the expected rates of treatment success (with outcome defined by the World Health Organization^{4}) for the 10 most prevalent regimens in the combined dataset of Ahuja et al.^{5}
Methods
In order to estimate the causal effects of multiple medications, we propose to estimate the GPS, defined as the probability of taking a specific regimen conditional on covariates. To this end, we investigate the usage of different machine learning algorithms for the GPS. Further, in order to estimate the causal contrasts, we employ Inverse Probability of Treatment Weighting (IPTW),^{10} Propensity Score Adjustment^{11} and Targeted Maximum Likelihood Estimation (TMLE),^{12,13} all of which use the GPS. We also investigate G-Computation,^{14} which exclusively uses a model for the outcome conditional on medications taken and covariates in order to estimate an effect of interest.
General notation
The observed data O_{i} include a vector of covariates, X_{i} = {X_{ij}; j = 1, …, J}, and a univariate outcome, Y_{i}, where i = 1, …, n indexes the set of subjects. We consider a fixed set of K potential medications that all patients in the study are hypothetically eligible for. For any patient i, the binary variable A_{i}^{k} indicates exposure to medication k ∈ {1, …, K}. We define C_{i} = (A_{i}^{1}, …, A_{i}^{K}) as the set of treatments being taken by patient i. We denote R_{i} as a categorical variable corresponding to the observed regimen for patient i, represented by the combination of treatments C_{i}. For each individual, R_{i} corresponds to one of the 2^{K} different possible regimens. We denote a specific fixed regimen as r and the corresponding vector of binary elements as c^{r}. We also define B_{i}^{r} as an indicator for the regimen r, i.e. if patient i took regimen r, then B_{i}^{r} = 1. Clearly, C_{i}, R_{i}, and B_{i}^{r} contain the same information, but we require these definitions in order to describe the proposed models. We drop the i subscript when referring to a random draw of a variable from the population.
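As an illustration of how C_{i}, R_{i}, and B_{i}^{r} carry the same information, the following minimal Python sketch enumerates the 2^{K} medication vectors and maps each one to a regimen label. The value K = 4, the function names, and the labelling order are assumptions made for the example; any one-to-one labelling of the binary vectors would serve equally well.

```python
from itertools import product

K = 4  # number of candidate medications (illustrative value)

# Enumerate all 2^K binary medication vectors c^r and give each a label r.
# The numbering is arbitrary: it only needs to be one-to-one.
regimen_vectors = list(product([0, 1], repeat=K))
regimen_index = {c: r for r, c in enumerate(regimen_vectors, start=1)}


def regimen_of(c_i):
    """Map a medication vector C_i = (A_i^1, ..., A_i^K) to its regimen label R_i."""
    return regimen_index[tuple(c_i)]


def indicator(c_i, r):
    """B_i^r: 1 if the patient with medication vector c_i took regimen r, else 0."""
    return int(regimen_of(c_i) == r)


# Hypothetical patients: each tuple is one C_i.
patients = [(1, 1, 0, 0), (1, 1, 1, 1), (0, 0, 0, 0)]
labels = [regimen_of(c) for c in patients]
```

Each patient has exactly one non-zero indicator among the 2^{K} regimens, which is why the exposure can be treated as a single categorical variable.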
The goal of the analysis is to estimate E(Y^{r}), which is also equivalent to E(Y^{c^{r}}), where Y_{i}^{r} or Y_{i}^{c^{r}} represents the potential outcome of subject i had they received an intervention corresponding with a treatment regimen r. We may then contrast different regimens by comparing their respective estimated values of E(Y^{r}). In the MDR-TB example, the binary outcome is defined as treatment success (the treatment was completed and cured the infection) versus failure (patient still tested culture positive for MDR-TB, died, or defaulted on treatment/were lost to follow-up). The goal of the application was therefore taken to be the estimation of the probability of treatment success under a given regimen of antimicrobial agents. The regimens with the higher probabilities of treatment success may then be interpreted as having greater effectiveness than those with a lower probability.
Estimation of the generalized propensity score
The propensity score^{15} is defined as the probability of receiving a treatment conditional on covariates. When dealing with a binary treatment where C ϵ {0,1}, the propensity score can be mathematically expressed as
$$g(X)=\Pr(C=1\mid X)$$
With multiple treatments, the propensity score was extended to the GPS^{6} defined as
$$g(r,X)=\Pr(R=r\mid X)=\Pr(C=c^{r}\mid X)$$
the probability of receiving a given regimen r. We use multi-class classification, with classes corresponding to regimens, in order to estimate the GPS. Multi-class classification is the fitting of models for different classes in the dataset where the classes are mutually exclusive. In this section, we provide basic descriptions of support vector machines, softmax regression (i.e. multinomial regression), and generalized boosted models, which we later use to estimate the GPS.
Support vector machines
Support Vector Machines (SVMs) (Hastie et al.,^{16} Chapter 12), a supervised learning approach, have been proposed as a method for multi-class classification and have been identified as one of the most important research topics in the field of machine learning.^{17} Computationally efficient, SVMs use hyperplanes to delineate a particular class by identifying the most influential observations in the determination of the boundaries between the classes. These observations are also known as the support vectors. The main aim of SVMs is to find a maximum margin hyperplane, where margin corresponds to the distance between the hyperplane and the closest elements on either side of the hyperplane.
For the pairwise classification of two different regimens, say r_{1} and r_{2}, Soft-Margin SVMs^{18} construct a hyperplane {X; f(X) = w^{T}X + b = 0}, with the constraint {I(R_{i} = r_{1}) – I(R_{i} = r_{2})}(w^{T}X_{i} + b) ≥ 1 −ζ_{i}, for all i = 1, …, n where the ζ_{i} ≥ 0 are called “slack variables” and I(·) is the indicator function. If ζ_{i} = 0 for all i = 1, …, n, this would imply that the hyperplane would be able to perfectly separate and classify the data. The slack variables therefore allow for misclassification.
The parameters w, b and ζ_{i} are estimated by minimizing a loss function F(w, b, ζ) over w and b subject to the above constraints. This loss function is given by
$$F(w,b,\zeta)=\frac{\lVert w\rVert^{2}}{2}+C\sum_{i=1}^{n}\zeta_{i}$$
where C is a constant which maintains the trade-off between the training error and the margins (a smaller C allows for a smoother boundary f(X)). The function F(w, b, ζ) is minimized using optimization methods with Lagrangian multipliers.^{16}
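To make the role of the slack variables concrete, the sketch below evaluates the soft-margin objective for a given hyperplane in a two-class problem. It assumes the usual formulation in which the squared Euclidean norm of w is halved, codes the two regimens as +1 and −1, and sets each slack to the smallest value that satisfies the constraint, max(0, 1 − y_{i} f(X_{i})). The function name and the toy data are hypothetical.

```python
def soft_margin_objective(w, b, X, y, C=1.0):
    """Evaluate F(w, b, zeta) with the smallest feasible slack for each observation.

    X is a list of covariate tuples; y holds +1 for regimen r_1 and -1 for r_2.
    """
    slacks = []
    for x_i, y_i in zip(X, y):
        f_i = sum(w_j * x_ij for w_j, x_ij in zip(w, x_i)) + b
        # Constraint y_i * f(X_i) >= 1 - zeta_i with zeta_i >= 0.
        slacks.append(max(0.0, 1.0 - y_i * f_i))
    penalty = sum(w_j ** 2 for w_j in w) / 2.0
    return penalty + C * sum(slacks), slacks


# Toy data: the third point lies on the wrong side of the boundary.
X_toy = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
y_toy = [1, -1, -1]
value, slacks = soft_margin_objective((1.0, 0.0), 0.0, X_toy, y_toy, C=1.0)
```

Only the misclassified third point has a positive slack here, so it alone contributes to the second term of the objective.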
We apply the default settings of the function svm in the e1071 R package^{19} for the implementation of SVMs. In particular, this function uses One-Vs-One classification^{20} (i.e. constructs boundaries for each pair of classes separately, and the final classification for each observation is determined by which class is most frequently selected), sets C = 1, and applies a non-linear basis expansion with a radial kernel (Hastie et al.,^{16} Section 12.3). Finally, the probability of class membership (following a given regimen r) is estimated by fitting a logistic regression of R = r on the boundary values f(X) computed for each pairwise comparison.^{21,22}
Softmax regression
Softmax regression,^{23} a common classification method, is equivalent to multinomial logistic regression. We model the probability that a patient is treated with regimen r as
$$\Pr(R_{i}=r\mid X_{i},\Phi)=\frac{\exp(\phi_{r}^{T}X_{i})}{\sum_{l=1}^{2^{K}}\exp(\phi_{l}^{T}X_{i})}$$
The model parameters ϕ_{r} ∈ ℝ^{J+1}, r ∈ {1, …, 2^{K}}, with J corresponding to the number of covariates present in the model, are stacked together to form Φ, a matrix of dimension 2^{K} × (J + 1) with entries Φ_{r,j}. The parameter matrix Φ is then estimated by minimizing the loss function L(Φ) (corresponding to the negative quasi log-likelihood), which is given by
$$L(\Phi)=-\sum_{i=1}^{n}\sum_{r=1}^{2^{K}}I(R_{i}=r)\log\frac{\exp(\phi_{r}^{T}X_{i})}{\sum_{l=1}^{2^{K}}\exp(\phi_{l}^{T}X_{i})}$$
For implementation, we use the softmaxreg package^{24} in R.
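The softmax probabilities and the corresponding loss can be written out in a few lines. The following Python sketch is not the softmaxreg implementation; it is a dependency-free illustration in which the function names are hypothetical, each coefficient vector carries its intercept first, and regimens are indexed from 0 for convenience.

```python
import math


def softmax_gps(phi, x):
    """Return the estimated probability of every regimen for one covariate vector x.

    phi is a list of coefficient vectors, one per regimen, each of length J + 1
    with the intercept first; x holds the J covariates without the intercept.
    """
    x1 = [1.0] + list(x)
    scores = [sum(p * v for p, v in zip(phi_r, x1)) for phi_r in phi]
    top = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(phi, X, R):
    """Loss L(Phi): minus the log-probability of each observed regimen, summed over subjects."""
    return -sum(math.log(softmax_gps(phi, x)[r]) for x, r in zip(X, R))


# With all coefficients at zero, every regimen is equally likely.
phi_zero = [[0.0, 0.0, 0.0] for _ in range(4)]
probs = softmax_gps(phi_zero, (0.3, 0.7))
loss = negative_log_likelihood(phi_zero, [(0.3, 0.7), (0.9, 0.1)], [0, 3])
```

The estimated GPS for regimen r and subject i is simply the r-th entry of the returned vector evaluated at X_{i}.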
Generalized boosting
Generalized Boosted Models (GBMs) (Hastie et al.,^{16} Chapter 10) are machine learning algorithms that build up an additive model using multiple classification trees. Classification trees (Hastie et al.,^{16} Chapter 9) create a piecewise model for a treatment by learning which sequential splits in the covariates most improve prediction of the treatment. Boosting generates a sequence of trees while upweighting the observations that were misclassified by the previous trees. Finally, the predictions from the individual trees are combined using an error-weighted majority vote.
Implementations of GBMs have been proposed to estimate the GPS for multiple treatments. To prevent overfitting, one needs to identify the total number of trees to use. McCaffrey et al.^{9} propose to select the number of trees by comparing the values of the covariates in the GPS-weighted treatment group versus the entire sample. A good “balance” means that covariate distributions are similar between these groups. The number of trees can be chosen by satisfying a criterion such as the Absolute Standard Bias (ASB), which compares the standardized difference in covariate means between groups, or the Kolmogorov–Smirnov (KS) Statistic, which compares the empirical distributions. In addition to the number of trees, the tuning parameters include a shrinkage term (learning rate) for the GBM, the minimum number of observations in the trees’ terminal nodes, and the depth of interactions (indicating the maximum number of splits the algorithm performs on a tree after the initial split) included in the model, all of which are important in order to properly smooth the model. We estimate the GPS for each regimen separately using the twang package^{25} in R.
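A balance check of the kind described above can be sketched for a single covariate as follows. The definition used here is an assumption for illustration, not the twang code: the inverse-GPS-weighted mean in the regimen group is compared with the unweighted mean in the entire sample and scaled by the full-sample standard deviation. The function name is hypothetical.

```python
import statistics


def absolute_standardized_bias(x, in_group, gps):
    """ASB for one covariate: |weighted group mean - full-sample mean| / full-sample SD.

    x        : covariate values for all subjects
    in_group : 1 if the subject took the regimen of interest, else 0
    gps      : estimated probability of that regimen for each subject
    """
    weights = [1.0 / g for g, b in zip(gps, in_group) if b]
    values = [v for v, b in zip(x, in_group) if b]
    weighted_mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return abs(weighted_mean - statistics.mean(x)) / statistics.stdev(x)


x_toy = [0.0, 1.0, 2.0, 3.0]
balanced = absolute_standardized_bias(x_toy, [1, 0, 0, 1], [0.5] * 4)
unbalanced = absolute_standardized_bias(x_toy, [1, 1, 0, 0], [0.5] * 4)
```

Under this definition a value near zero indicates that the weighted regimen group resembles the whole sample on that covariate; a tuning procedure would summarize this quantity across covariates for each candidate number of trees.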
Causal estimation methods
After obtaining the GPS, we aim to estimate E(Y^{r}), where Y^{r} is the potential outcome of an arbitrary patient under regimen r. In order to obtain an estimate of E(Y^{r}), one may choose from various causal estimation methods, several of which we describe in this section. Causal estimation methods adjust for the confounders (roughly, those pre-treatment variables X that are related to both treatment regimen and Y) in order to produce estimates of the marginal parameter E(Y^{r}). These causal estimation methods rely on several assumptions,^{6} including 1) positivity: the probability of receiving any regimen r conditional on the confounders, X, should be a non-zero quantity for all subjects; 2) consistency: for any patient i taking regimen R_{i} = r, the counterfactual outcome for patient i under r is the observed outcome of the patient; and 3) conditional exchangeability: the observed covariates should be sufficient to satisfy conditional independence between the regimens and the potential outcomes. Since we have 2^{K} different regimens, some of which may not at all be observed in the data, the assumption of positivity is very likely to fail (either empirically or theoretically) for some regimens. This would imply that without additional extrapolation, we would not be able to estimate E(Y^{r}) for those regimens. In the following, we only estimate the parameter of interest for prevalent regimens.
G-Computation
G-computation is a causal estimation method proposed by Robins^{14} that can be used for the estimation of E(Y^{r}). The algorithm for G-Computation^{26} is as follows:
G-Computation for E(Y^{r})
1: Fit an outcome model for E(Y|R,X) using the available data, defined as Q(R, X). We then compute predictions of the conditional expectations under the regimen r for every subject. In our context, one may use a model Q^{(a)}(R, X) that is conditional on the regimens directly (i.e. subsetting on B^{r} = 1 or taking the indicators B^{r} as covariates) or an alternative Q^{(b)}(C, X) that is conditional on the medications (taking the A^{k} as covariates).
2: For each observation, predict the value of Q_{n}(r,X_{i}) = E_{n}(Y|R=r,X_{i}) using the model obtained above, where a subscript n denotes an estimate of the quantity.
3: The G-computation estimate of E(Y^{r}) is thus given by
$$\psi_{n,\text{G-comp}}^{r}=\frac{1}{n}\sum_{i=1}^{n}Q_{n}(r,X_{i}).$$
The unbiasedness of G-computation relies on the correct specification of the outcome model.
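The three steps can be sketched in Python as below. To stay dependency-free, the sketch swaps the regression model for a saturated outcome model on a single discrete covariate (the mean of Y within each regimen-by-covariate cell); this substitution, the function name, and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict


def g_computation(Y, R, X, r):
    """Return the G-computation estimate of E(Y^r) using cell means as the outcome model."""
    # Step 1: "fit" Q(R, X) as the mean of Y within each observed (regimen, covariate) cell.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for y_i, r_i, x_i in zip(Y, R, X):
        totals[(r_i, x_i)] += y_i
        counts[(r_i, x_i)] += 1

    # Step 2: predict Q_n(r, X_i) for every subject, whatever regimen they actually took.
    predictions = []
    for x_i in X:
        cell = (r, x_i)
        if cell not in counts:
            raise ValueError(f"no subject with regimen {r} and covariate value {x_i}")
        predictions.append(totals[cell] / counts[cell])

    # Step 3: average the predictions over the whole sample.
    return sum(predictions) / len(predictions)


# Hypothetical toy data: one binary covariate, two regimens.
X = [0, 0, 0, 1, 1, 1]
R = [1, 1, 2, 1, 2, 2]
Y = [1, 0, 0, 1, 1, 0]
psi_1 = g_computation(Y, R, X, r=1)
psi_2 = g_computation(Y, R, X, r=2)
```

The error raised in step 2 corresponds to an empirical positivity violation: without extrapolation from a parametric model, no prediction is available for a covariate value at which regimen r was never observed.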
Inverse probability of treatment weighting
IPTW^{10} is an approach for the estimation of E(Y^{r}) using the propensity score. The algorithm for performing IPTW is as follows:
IPTW for E(Y^{r})
1: Estimate the GPS for each regimen, g_{n}(r,X_{i}) = Pr_{n}(R = r|X_{i}).
2: Obtain the weight w_{n}(r,X_{i}) = I(R_{i} = r)/g_{n}(r,X_{i}) for each observation, which is only non-zero for subjects who took the regimen of interest, r.
3: Run a linear regression model of Y on an intercept, with weights w_{n}(r,X).
The resulting estimate of the intercept is our IPTW estimate, ψ_{n,IPTW}^{r}. The consistency of IPTW relies on the correct specification of the propensity score model. In order to calculate the variance of the resulting estimate, we use the sandwich package^{27} in R, which is used for calculating robust variance estimates (that take into account the uncertainty in the propensity score). One could alternatively use the non-parametric bootstrap to estimate the variance, but this may be excessively time-consuming when the GPS is estimated with a machine learning method.
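Because a weighted least-squares regression of Y on an intercept alone returns the weighted mean of Y, the IPTW point estimate can be computed directly. The sketch below assumes the GPS values for regimen r have already been estimated, with weights equal to the regimen indicator divided by the GPS; the function name and data are hypothetical, and no variance estimate is attempted.

```python
def iptw_estimate(Y, R, gps_r, r):
    """IPTW estimate of E(Y^r): weighted mean of Y with weights I(R_i = r) / g_n(r, X_i)."""
    weights = [(1.0 / g if r_i == r else 0.0) for r_i, g in zip(R, gps_r)]
    return sum(w * y for w, y in zip(weights, Y)) / sum(weights)


# Hypothetical data: gps_toy[i] is the estimated probability that subject i receives regimen 1.
Y_toy = [1, 0, 1, 1]
R_toy = [1, 1, 2, 1]
gps_toy = [0.5, 0.25, 0.5, 0.25]
psi_iptw = iptw_estimate(Y_toy, R_toy, gps_toy, r=1)
```

Subjects with a very small estimated GPS receive very large weights, which is how near-violations of positivity translate into unstable estimates.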
Propensity score adjustment
Propensity Score Adjustment (PSA) is a causal estimation method that relies on the specification of the propensity score model in addition to a model for the outcome, conditional on the propensity score and treatment. The propensity score is a balancing statistic, that is, given the propensity score, the potential outcome is conditionally independent of the treatment.^{11} For a single binary treatment C ∈ {0, 1}, one might use the following model
$$E(Y\mid C,g(X))=\theta_{0}+\theta_{1}C+\theta_{2}g(X)$$
where θ_{1} can also be written as θ_{1} = E(Y|C=1,g(X)) − E(Y|C=0,g(X)). The estimate \hat{θ}_{1} is obtained using least squares and is an unbiased estimate of E(Y|C=1,g(X)) − E(Y|C=0,g(X)) = E(Y^{1} − Y^{0}) if the propensity score and the outcome regression model are correctly specified and if the causal assumptions hold. However, if the expected outcome is not linearly dependent on the propensity score or if the propensity score model is incorrectly specified, then the ordinary least squares estimate of θ_{1} is a biased estimator of E(Y^{1} − Y^{0}).^{11} If a non-linear model is used, the above result may not be applicable, since θ_{1} for this case might correspond with a conditional parameter (and estimation would therefore be biased for the marginal contrast between the potential outcomes).
This method of estimation can also be extended to the case with multiple treatments.^{6} For our setting, we propose the following algorithm.
Propensity Score Adjustment for E(Y^{r})
1: Fit a model Q^{(1)}(R, g(r,X)) (conditional on the regimen indicators, B^{r}) or Q^{(2)}(C, g(r,X)) (conditional on the treatment indicators, A^{k}) for E(Y|R,g(r,X)).
2: Using the model fit, obtain predictions
$$Q_{n}^{(1)}(r,g_{n}(r,X))=E_{n}(Y\mid B^{r}=1,g_{n}(r,X))\quad\text{or}\quad Q_{n}^{(2)}(c^{r},g_{n}(r,X))=E_{n}(Y\mid C=c^{r},g_{n}(r,X))$$
3: The estimates of E(Y^{r}) are then given as
$$\psi_{n,PSA(I)}^{r}=\frac{1}{n}\sum_{i=1}^{n}Q_{n}^{(1)}(r,g_{n}(r,X_{i})),\quad\text{and}\quad\psi_{n,PSA(II)}^{r}=\frac{1}{n}\sum_{i=1}^{n}Q_{n}^{(2)}(c^{r},g_{n}(r,X_{i}))$$
Targeted maximum likelihood estimation
TMLE^{13} is a semi-parametric estimation technique that produces doubly robust and locally efficient plug-in estimators. In our situation, TMLE invokes a two-step process that first produces estimates of the conditional expectation of the outcome under a fixed regimen (as in the first step in G-Computation) and then updates these initial estimates.^{28} The update procedure uses the propensity score and is designed to reduce the bias in the estimate of the causal parameter of interest. An algorithm for the computation of TMLE for the multiple medication case with target parameter E(Y^{r}) is described below.
Targeted Minimum Loss-Based Estimation for E(Y^{r})
1: First, fit an outcome model and generate estimates of the conditional expectation under the fixed regimen r, denoted Q_{n}(r,X). We may use Q_{n}^{(a)}(r,X) or Q_{n}^{(b)}(c^{r},X) as described in Section 2.3.1.
2: Define weights w_{n}(r,X) = I(R = r)/g_{n}(r,X).
3: Fit a logistic regression of Y on an intercept only, with offset logit{Q_{n}(r,X)} and weights w_{n}(r,X). Denote the estimate of the intercept term by \hat{ϵ}.
4: Compute the updated estimate, Q_{n}^{*}(r,X), which is given by
$$\operatorname{logit}(Q_{n}^{*}(r,X))=\operatorname{logit}(Q_{n}(r,X))+\hat{\epsilon}$$
5: The TMLE estimate for E(Y^{r}) is then given by
$$\psi_{n,TMLE}^{r}=\frac{1}{n}\sum_{i=1}^{n}Q_{n}^{*}(r,X_{i})$$
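The update in steps 2 to 5 can be sketched as follows, assuming the initial predictions Q_{n}(r,X_{i}) and the GPS values are already available and lie strictly between 0 and 1. The intercept of the weighted, offset logistic regression is found here by Newton's method on the weighted score equation; this solver, the function names, and the toy inputs are assumptions made for the example rather than the authors' implementation.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def tmle_estimate(Y, R, Q_r, gps_r, r, max_iter=50, tol=1e-12):
    """Return (psi, epsilon, Q_star) for the target parameter E(Y^r).

    Q_r   : initial predictions Q_n(r, X_i) for every subject
    gps_r : estimated g_n(r, X_i) for every subject
    """
    # Step 2: weights are non-zero only for subjects who took regimen r.
    weights = [(1.0 / g if r_i == r else 0.0) for r_i, g in zip(R, gps_r)]
    offsets = [logit(q) for q in Q_r]

    # Step 3: solve sum_i w_i * (Y_i - expit(offset_i + eps)) = 0 for eps.
    eps = 0.0
    for _ in range(max_iter):
        fitted = [expit(o + eps) for o in offsets]
        score = sum(w * (y - p) for w, y, p in zip(weights, Y, fitted))
        information = sum(w * p * (1.0 - p) for w, p in zip(weights, fitted))
        step = score / information
        eps += step
        if abs(step) < tol:
            break

    # Step 4: shift every initial prediction by eps on the logit scale.
    Q_star = [expit(o + eps) for o in offsets]
    # Step 5: average the updated predictions over the whole sample.
    return sum(Q_star) / len(Q_star), eps, Q_star


# Hypothetical inputs for regimen r = 1.
Y_toy = [1, 0, 1, 0, 1]
R_toy = [1, 1, 1, 2, 2]
Q_toy = [0.6, 0.4, 0.5, 0.5, 0.7]
g_toy = [0.5, 0.5, 0.4, 0.3, 0.6]
psi_tmle, eps_hat, Q_star = tmle_estimate(Y_toy, R_toy, Q_toy, g_toy, r=1)
```

After the update, the weighted residuals among subjects who took regimen r sum to zero, which is the property that targets the initial fit toward the parameter of interest.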
The double robustness property of this TMLE means that, unlike the propensity score adjustment method, the TMLE is a consistent estimator if either E(Y|R=r,X) or g(r, X) is consistently estimated. For the approximation of the estimation standard error, one may use the efficient influence function (EIF),^{29} corresponding to the first-order expansion of the estimator
$$EIF^{r}(Q,g)(O)=\{Y-Q(r,X)\}\frac{I(R=r)}{g(r,X)}+Q(r,X)-\psi_{TMLE}^{r}$$
In large samples, the variance of the estimator will correspond to the sample variance of the estimated EIF divided by n. Therefore, the 95% confidence interval for ψ_{n,TMLE}^{r} can be estimated by ψ_{n,TMLE}^{r} ± 1.96 √{(σ_{n,TMLE}^{r})^{2}/n}, where (σ_{n,TMLE}^{r})^{2} denotes the sample variance of EIF^{r}(Q_{n},g_{n})(O_{i}).
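The influence-function-based interval can be computed from the same ingredients as the point estimate. The sketch below assumes the updated predictions, the GPS values, and the point estimate are given, and uses the sample variance of the estimated EIF divided by n as the variance of the estimator; the function name and the inputs are hypothetical.

```python
import math
import statistics


def eif_confidence_interval(Y, R, Q_star_r, gps_r, r, psi, z=1.96):
    """Return (lower, upper, standard_error) from the estimated efficient influence function."""
    eif = [
        (y - q) * (1.0 / g if r_i == r else 0.0) + q - psi
        for y, r_i, q, g in zip(Y, R, Q_star_r, gps_r)
    ]
    # Sample variance of the EIF, divided by n, approximates the variance of the estimator.
    standard_error = math.sqrt(statistics.variance(eif) / len(eif))
    return psi - z * standard_error, psi + z * standard_error, standard_error


# Hypothetical inputs chosen so that the EIF values are easy to verify by hand: 1, -1, 0, 0.
lower, upper, se = eif_confidence_interval(
    Y=[1, 0, 1, 0], R=[1, 1, 2, 2], Q_star_r=[0.5] * 4, gps_r=[0.5] * 4, r=1, psi=0.5
)
```

Because the EIF contains the inverse of the GPS, a few very small GPS values can dominate the variance, which is consistent with wide intervals when positivity is nearly violated.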
Simulation study
In order to evaluate the appropriateness of the above causal estimators paired with each GPS method, we contrast their performance in a Monte Carlo simulation study. We first describe the data-generating mechanisms. We estimate the expected counterfactual outcomes under the most prominent regimens. We compare the performance of several implementations of G-computation and then of each causal estimator that uses the GPS. In the Supplementary Materials, we perform a second simulation study with a larger number of medications, often leading to more regimens than subjects in the sample. For this second scenario, we evaluate a data subsetting method that can greatly reduce computational time.
Data generation
Full details of the data generation are given in Section 1 of the Supplementary Materials.
We independently generate 12 baseline variables X_{ij}, j = 1, …, 12 from a standard uniform distribution, i.e. X_{ij} ~ U(0, 1). We also generate four dichotomous treatment indicators, A_{i}^{k}, k = 1, 2, 3, 4, conditional on a subset of the baseline variables. In addition, A^{1} and A^{2} are generated as positively correlated, as are A^{3} and A^{4}, and all other treatment pairs are independent. Specifically, a patient is more likely to take medication 1 if they are also taking medication 2 (and vice versa), and similarly for medications 3 and 4. A binary outcome Y_{i} is generated using a logistic model conditional on the X_{ij} and A_{i}^{k} with first-order interactions (including treatment–treatment, covariate–covariate, and covariate–treatment interactions). As subjects can take up to four medications, there are 2^{4} = 16 possible regimens. The two most likely regimens (on average) are regimen 1 (1,1,0,0) and regimen 2 (1,1,1,1) and are defined as the regimens of interest. The true propensity score Pr(A^{1} = a_{1}, A^{2} = a_{2}, A^{3} = a_{3}, A^{4} = a_{4} | X) in this case can be factorized as
$$\Pr(A^{1}=a_{1},A^{2}=a_{2},A^{3}=a_{3},A^{4}=a_{4}\mid X)=\Pr(A^{1}=a_{1},A^{2}=a_{2}\mid X)\,\Pr(A^{3}=a_{3},A^{4}=a_{4}\mid X)$$
The true values of E(Y^{r}) are 0.61 and 0.57 for regimens 1 and 2, respectively.
Comparison of outcome regression models
Since propensity score adjustment and TMLE both use a model for the outcome, we first evaluate the performance of six implementations of G-Computation to see whether each outcome model produces biased estimates of E(Y^{r}). We fit the following outcome models with logistic regressions: 1) by regimen, subsetting on B^{r} = 1 for each r of interest, and 2) by treatment, adjusting for the treatment indicators A^{k}, k = 1,2,3,4 in the regression. For the latter case, we fit the outcome models without interactions (taking the main terms of A^{k} only) and then with first-order interactions between the A^{k}. We apply these three approaches to G-Computation both with and without adjustment for the baseline covariates as main terms.
We generated 1000 datasets of sample sizes n=500 and n=1000, respectively. Table 1 gives the mean estimates and Monte Carlo standard errors for each implementation. For regimen 1, the G-computation estimate had little bias when adjusting by regimen or by treatment with first-order interactions, regardless of the adjustment for X_{ij} as main terms. However, it was substantially biased when fit with treatment main terms only, regardless of adjustment for X_{ij}. For regimen 2, the G-computation estimate was unbiased when adjusting by regimen or by treatment with first-order interactions but only when also adjusting for confounding by X_{ij}. It was biased when not adjusting for confounding and when the treatment interactions were not included. The standard error was lower for the larger sample size but the bias remained steady.
Comparison of methods
The implementations of causal estimators that are evaluated in this section are:
IPTW, using a weighted linear regression model (Section 2.3.2);
PSA(I), propensity score adjustment with a logistic regression to estimate Q^{(1)} conditional on regimen (Section 2.3.3);
PSA(II), propensity score adjustment with a logistic regression to estimate Q^{(2)} conditional on treatments as main terms (Section 2.3.3);
TMLE(I), using a logistic regression to model the outcome conditional on regimen and baseline covariates, i.e. Q^{(a)} (Section 2.3.4);
TMLE(II), using a logistic regression to model the outcome conditional on treatments and baseline covariates, i.e. Q^{(b)} (Section 2.3.4).
The GPS for each regimen of interest was estimated using the three approaches in Section 2.2. When fitting GBMs for each regimen R_{i}, we chose values of the tuning parameters that optimized the balance between the pretreatment covariates in R_{i} and the pooled sample of all the other regimens for five simulated datasets using the plots function in twang. The maximum number of iterations in the Softmax regression was set to 100 with the default learning rate of 0.05 and the tuning parameters for SVMs were similarly assigned the default values.
We drew 1000 samples of sizes n=500 and n=1000, respectively. Table 2 gives the mean estimates and Monte Carlo standard errors for the top two occurring regimens in our simulated data. The numbers of subjects exposed to each of these regimens varied by sample and are given in Section 3 of the Supplementary Materials. TMLE performed well when implemented with SVMs, Softmax regression, and GBMs. IPTW and PSA(I) performed well with Softmax regression but were biased with SVMs and GBMs for the second regimen, likely due to the suboptimal convergence rate of these nonparametric GPS methods.^{30} The estimates of PSA(I) and IPTW with SVMs and GBMs appeared to slowly approach the true values with larger sample sizes (results not shown) though some bias still existed at n = 10000. PSA(II) performed poorly throughout, due to the incorrect specification of the outcome model when conditional on the treatments only as main terms, and did not converge with larger sample sizes. Note that PSA(II) performed similarly to the closely related adjusted G-Computation with treatment main terms. For the second regimen, TMLE(I) was essentially unbiased but often had more variance than IPTW and TMLE(II).
We conducted a second simulation study with eight dichotomous treatment variables and a sample size of n=500. In our simulated data, out of the 256 possible regimens, roughly 150 different regimens occurred in each dataset. Some of these regimens were only followed by several subjects, making the corresponding GPSs difficult to estimate. We tested whether removing the observations corresponding to the 20 and 30% least supported regimens affected the causal estimation. Specifically, we did not use these observations in the GPS model fitting but kept them in for the other causal estimation steps. We found that, out of a total of 500 observations, this resulted on average in the removal of only 30 and 45 observations, respectively, reduced the computational time, and did not change the quality of the estimation. We present the full description and the results of this simulation study in the Supplementary Materials Section 2.
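The subsetting step can be illustrated with a short sketch. It assumes that "least supported" means the regimens with the fewest subjects, that the stated percentage is a fraction of the distinct observed regimens, and that ties in support are broken by regimen label; these choices and the function name are hypothetical, and the full procedure is described in the Supplementary Materials.

```python
from collections import Counter


def drop_least_supported(R, fraction):
    """Return (kept_indices, dropped_regimens) after removing the least supported regimens.

    R        : observed regimen label for each subject
    fraction : share of the distinct observed regimens to drop (e.g. 0.2)
    """
    counts = Counter(R)
    # Order regimens from least to most supported; ties are broken by label.
    ordered = sorted(counts, key=lambda reg: (counts[reg], reg))
    n_drop = int(fraction * len(ordered))
    dropped = set(ordered[:n_drop])
    kept_indices = [i for i, reg in enumerate(R) if reg not in dropped]
    return kept_indices, dropped


# Hypothetical sample: five regimens with 5, 3, 1, 1, and 2 subjects.
R_toy = [1] * 5 + [2] * 3 + [3] + [4] + [5] * 2
kept_20, dropped_20 = drop_least_supported(R_toy, 0.2)
kept_40, dropped_40 = drop_least_supported(R_toy, 0.4)
```

Only the returned indices would be used to fit the GPS model; the dropped observations remain in the dataset for the remaining estimation steps, as described above. Because rare regimens contribute few subjects each, dropping a sizeable share of regimens removes only a small share of observations.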
Application of the above methods to the MDR-TB data
The Collaborative Group for Meta-Analysis of Individual Patient Data in Multidrug-Resistant Tuberculosis (IPD-MDRTB)^{5} assembled individual patient data on treatment outcomes from 31 observational studies comprising 9290 individual pulmonary MDR-TB patients. This dataset contains information on the antimicrobial agents used, the baseline covariates (summarized in Table 3), and clinical outcomes. Patients were observed to take 15 different antimicrobial agents in various combinations. We refer to these different sets of medications as regimens and present the 10 most prevalent regimens used in the first row of Table 3. Notably, the most common regimens included five or more different antimicrobial agents, while 207 subjects did not take any antimicrobial agent. The antimicrobial agents in the ten most observed regimens are ethambutol (EMB), ethionamide (ETH), ofloxacin (OFX), pyrazinamide (Z), kanamycin (KM), cycloserine (CS), capreomycin (CM), para-aminosalicylic acid (PAS), prothionamide (PTO), streptomycin (SM), and rifabutin (RBT).
A binary outcome was defined as either treatment success (the treatment was completed and cured the infection) or failure (patient still tested culture positive for MDR-TB, died, or defaulted on treatment/were lost to follow-up). After removing the 2.77% of subjects with a missing outcome and the 0.34% with missing baseline information, we were left with a sample size of n = 9001 observations taking 1626 different regimens. The covariate age was divided into six categories (0–24, 25–33, 34–42, 43–52, 53–63, 64–) approximately corresponding to age sextiles and the year of study (defined as the final year of patient treatment) was treated as categorical with 14 values. As observed in Table 3, there are differences across the regimen groups in terms of all covariates. This is evidence of indication bias as medication regimens may be differentially assigned across countries, time periods, and patient disease characteristics.
The objective of this data analysis is to compare the results of the different methodological approaches for the estimation of E(Y^{r}). We do this for the 10 most prevalent regimens in the dataset, corresponding to the first ten regimens in Table 3. The parameter E(Y^{r}) can be interpreted as the proportion of the study population that would have had a successful recovery had all the patients been treated with regimen r. Therefore, larger values of this parameter indicate which regimens may be more beneficially applied on a large scale. Ethics approval was obtained for the reanalysis of this data through the Ethics in Health Research Committee at Université de Montréal (certificate number 17–111-CERES-D).
In order to estimate the GPS with SVMs and Softmax Regression, we removed all of the subjects with regimens only supported by one or two subjects (1420 subjects). The models were fit using the 7581 remaining observations. The GPS was then predicted for the entire population of n = 9001 patients conditional on the covariates in Table 3 and indicators for missing values. GBMs were run using the twang package and we selected the combinations of interaction depth, n.minobsinnode (minimum observations in each node), and shrinkage parameters that produced the best balance using the KS statistic as explained in McCaffrey and others.^{9} After obtaining the GPS with these methods, we proceeded with the causal estimation procedures described in Section 3.3 for the estimation of E(Yr).
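The sample restriction described above (fit the GPS model only on regimens with enough support, then predict for everyone) might look like the sketch below. The helper name `split_by_support`, the threshold argument, and the toy labels are hypothetical; the classifier itself (SVM, softmax regression, or GBM) is deliberately left out.

```python
from collections import Counter


def split_by_support(regimens, min_support=3):
    """Return indices of subjects whose regimen has at least `min_support` subjects.

    `regimens` holds one regimen label per subject. With min_support=3,
    regimens followed by only one or two subjects are left out of training.
    """
    counts = Counter(regimens)
    return [i for i, r in enumerate(regimens) if counts[r] >= min_support]


# Hypothetical toy sample: one regimen label per subject.
regimens = [
    "OFX-KM-Z-ETH-CS", "OFX-KM-Z-ETH-CS", "OFX-KM-Z-ETH-CS",
    "Z-EMB-RBT", "Z-EMB-RBT", "Z-EMB-RBT", "Z-EMB-RBT",
    "OFX-PTO-CS-PAS", "OFX-PTO-CS-PAS",  # supported by two subjects only
    "None",                               # supported by one subject only
]

train_idx = split_by_support(regimens)
n_train = len(train_idx)      # subjects used to fit the GPS model
n_predict = len(regimens)     # the GPS is still predicted for every subject
```

In the application, the analogous split leaves 7581 of the 9001 subjects for model fitting, while predictions are made for all 9001.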
Tables 4 and 5 present the estimates of E(Yr) obtained for the 10 most frequent regimens. No closed-form approximation of the standard error is available for the multi-treatment version of PSA, and given that the machine learning methods were very computationally intensive, numerical methods like bootstrapping were not feasible for our implementation. Therefore, the confidence intervals for this method were omitted. The logistic regression outcome model used in TMLE(I) overfit the data (causing the update step to fail) and therefore a LASSO penalty was added to the outcome model, with the penalty parameter chosen using cross-validation with the R package glmnet.^{31} We used empirical summaries of the weights and GPS (Supplementary Materials Sections 5 and 6) to evaluate whether the positivity assumption may be nearly violated for some subjects. No truncation of the GPS^{32} was used for the results presented, though we conducted a sensitivity analysis where 20% truncation was used to remove the smallest values of the GPS. Numerical results of the sensitivity analyses are presented in the Supplementary Materials Section 6 and discussed below.
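The exact truncation rule used in the sensitivity analysis is not spelled out in this section. As one possible reading, the sketch below raises every GPS value that falls below an empirical lower quantile up to that quantile, which caps the largest inverse-probability weights. The function name, the quantile convention, and the GPS values are all assumptions for illustration.

```python
def truncate_lower(gps, q):
    """Raise every GPS value below the empirical q-quantile up to that quantile.

    The quantile is taken as the order statistic at index floor(q * n),
    a deliberately simple convention chosen for this sketch.
    """
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    ordered = sorted(gps)
    cutoff = ordered[int(q * len(ordered))]
    return [max(g, cutoff) for g in gps]


# Hypothetical GPS values for subjects observed on one regimen.
gps = [0.001, 0.40, 0.02, 0.25, 0.10, 0.30, 0.005, 0.15, 0.35, 0.20]
truncated = truncate_lower(gps, q=0.2)

max_weight_before = 1.0 / min(gps)        # largest weight without truncation
max_weight_after = 1.0 / min(truncated)   # largest weight after truncation
```

Truncation trades some bias for lower variance: the two smallest values here are pulled up to the cutoff, so no single subject can dominate the weighted estimate.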
The point estimates of E(Yr) and the confidence intervals in Tables 4 and 5 often varied depending on which method was used to estimate the GPS. The point estimates also sometimes disagreed between causal inference methods using the same GPS vector (e.g. regimens 1 (OFX-KM-Z-EMB-ETH) and 5 (OFX-SM-PTO-CS-PAS)) and to a lesser extent between GPS methods using the same causal inference method. None of the GPS methods consistently produced narrow confidence intervals for TMLE or IPTW. However, TMLE was often found to have narrower confidence intervals than IPTW. GPS truncation resulted in at most small changes in the point estimates though very small values of the GPS were observed, suggesting possible near-positivity violations.
Table 6 presents the top 5 most beneficial regimens based on the estimates of E(Yr). Regimens 2 (OFX-KM-Z-ETH-CS) and 8 (OFX-CM-Z-ETH-CS-PAS) were often classified in the top 2 and were in the top 5 of all methods except for PSA(II) with SVMs and GBMs. Regimens 3 (OFX-KM-PTO-CS-PAS), 7 (OFX-KM-Z-ETH), and 10 (OFX-KM-Z-EMB-ETH-CS) were also often ranked in the top 5. This would suggest the superior effectiveness of these treatment combinations among the regimens investigated.
World Health Organization (WHO) guidelines^{4,33} suggest that MDR-TB regimens include a fluoroquinolone (such as OFX) and an injectable agent (such as KM, SM or CM). No treatment (included as a benchmark despite questionable clinical interest) and regimen 4 (Z-EMB-RBT) performed the worst overall and follow neither of these guidelines. Regimen 9 (OFX-PTO-CS-PAS), which also performed poorly, also lacked an injectable agent.
WHO guidelines also point to the importance of the number of drugs in the regimen, suggesting five or more that have certain or almost certain effectiveness.^{4} Previous studies have suggested that a majority of MDR-TB patients are resistant to EMB and Z in many settings.^{34,35} When excluding EMB and Z, of the regimens evaluated here, regimens 3, 5, and 8 had five remaining drugs, though only regimens 3 and 8 were found to be among the most effective. Regimen 5 was identical to regimen 3 except that it replaced KM by SM, for which resistance is also commonly seen among MDR-TB isolates. Regimens 2, 9, and 10 had four remaining drugs. Regimens 2 and 10 both included an injectable (KM) and were identical except that regimen 10 also included EMB. Interestingly, regimen 2 was found to perform the best among the regimens evaluated while regimen 10 was found to be less effective. These results point to the potential importance of the inclusion of KM in a regimen. While we estimated the expected mean of the potential outcome under 10 regimens, future applications may use marginal structural models (modeling of the expected potential outcomes conditional on treatments) and a broader range of regimens to estimate the contributions of each individual treatment and treatment interaction on the outcome. In the discussion, we point out some limitations of the simplified analysis in the current paper, which limit the interpretability of the results.
Discussion
In this paper, we investigated the causal estimation of multiple concurrent medications as motivated by the clinical question of how best to treat patients with MDR-TB. The topic of polypharmacy (resulting in potential overmedication and dangerous medication interactions) is gaining in importance in the medical literature. In particular, polypharmacy is highly prevalent in the elderly (ages ≥ 65), an important and growing population^{36}, leading to potential adverse drug reactions.^{37} For example, multiple cardiovascular medications, taken by more than 50% of elderly people, have been shown to be associated with an increased risk of acute kidney disorders.^{38} Given the toxicity of second-line anti-tuberculosis drugs, the analysis of polypharmacy is particularly relevant for treating MDR-TB cases.
In order to address estimation in this challenging scenario, we defined a treatment “regimen” as each unique combination of medications and used three methods to estimate the GPS, or the probability of receiving a specific regimen. One weakness of this GPS approach is that it does not directly allow for information to be shared between different regimens that contain one or more of the same medications. In a Monte Carlo simulation study, we showed that omitting treatment interactions from the outcome model can bias both the PSA and G-Computation estimators. In real-world applications, it might therefore be difficult to correctly specify these models. However, due to its double robustness property, TMLE was found to produce unbiased point estimates even when the outcome model was incorrectly specified. Further investigations could involve the implementation of TMLE with a non-parametric method used for the outcome model as well, which might add additional robustness to the estimation.^{39}
In the application, we estimated the probability of treatment success for the 10 most prevalent medication combinations in the MDR-TB dataset. We focused on the most prevalent regimens because they may be of greatest clinical interest and also have the greatest amount of data support (i.e. number of patients following the regimens), which allowed for better estimation. An interesting question for future research would involve empirically identifying which regimens have sufficient data support. One may also integrate existing methods to data-adaptively select covariates in the GPS for a given regimen.^{40}
The different methods often agreed on the preferred MDR-TB regimens but sometimes produced differing estimates of the probability of treatment success. Closed-form confidence intervals are not available for PSA with multiple regimens and we were unable to use a numerical approach to approximate them given the computational complexity of the GPS estimation. Previous investigations of this data source^{5} used regression analyses to estimate the associations between each treatment and outcome separately, ignoring other treatments. Associations between the number of treatments and duration of treatment were also investigated. In contrast to the previous approach, our general approach considers the joint effect of treatments. TMLE also has the advantage of being doubly robust and therefore consistent when either the GPS or the outcome model is correctly specified. Since the dataset consists of the fusion of multiple observational studies, a more appropriate application of these methods would formally consider the heterogeneity between studies in the point estimation (e.g. using a random effects outcome model by study) and standard error estimation^{41} and account for selection bias as different populations were observed to take different regimens of antimicrobial agents. Our analysis also did not consider known drug resistance, which may affect treatment decisions and outcomes, nor did we address the extrapolation required to synthesize evidence when certain regimens are only observed in select time periods. Ongoing analyses more appropriately address these issues and strong clinical conclusions about medication or regimen effectiveness are beyond the scope of this article.
Because of the large number of regimens, the GPS model may sometimes predict very small probabilities for some regimens. This creates well-known stability problems for methods that weight by the inverse of the GPS. We addressed this problem by using formulations of IPTW and TMLE that use the inverse GPS as a weight in a regression. Alternative approaches (results not shown) were sometimes highly biased in the simulation study. The robustness of the regression approach is likely due to the dampening of the residuals in the weighted regression step. TMLE and IPTW often benefit from GPS truncation as a bias-variance trade-off and data-adaptive approaches have been recently proposed.^{42} However, small values of the GPS may also indicate true positivity violations and the nonexistence of the parameter of interest. Very small values of the GPS could be investigated to identify patients who were truly ineligible for a given treatment due to clinical or demographic features. Related to the simplifications mentioned above, we did not consider this possibility.
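To illustrate why using the inverse GPS as a regression weight can be more stable, the sketch below contrasts two textbook IPTW forms. The paper's own estimators are defined in its Section 3.3, which is not reproduced here, so this is only an illustration under assumptions: the function names and data are hypothetical. The second function is the weighted mean, which is what an intercept-only regression (linear or logistic) weighted by 1/GPS returns among subjects on the regimen.

```python
def iptw_horvitz_thompson(y, on_regimen, gps):
    """(1/n) * sum of I(R = r) * Y / GPS over all n subjects."""
    n = len(y)
    return sum(yi / g for yi, a, g in zip(y, on_regimen, gps) if a) / n


def iptw_weighted_mean(y, on_regimen, gps):
    """Weighted mean of Y among subjects on regimen r, with weights 1 / GPS.

    This equals the fitted mean from an intercept-only regression of Y,
    weighted by 1 / GPS and restricted to subjects on the regimen.
    """
    num = sum(yi / g for yi, a, g in zip(y, on_regimen, gps) if a)
    den = sum(1.0 / g for a, g in zip(on_regimen, gps) if a)
    return num / den


# Hypothetical data: binary outcome, indicator of following regimen r,
# and each subject's estimated probability of receiving regimen r.
y          = [1, 0, 1, 1, 0, 1, 0, 0]
on_regimen = [1, 1, 1, 0, 0, 0, 0, 0]
gps        = [0.01, 0.5, 0.5, 0.2, 0.3, 0.1, 0.4, 0.2]

ht = iptw_horvitz_thompson(y, on_regimen, gps)
wm = iptw_weighted_mean(y, on_regimen, gps)
```

With one subject whose GPS is 0.01, the unnormalized estimate leaves the [0, 1] range expected of a probability, while the weighted mean cannot, because it normalizes by the sum of the weights.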
An alternative approach that we considered but did not take in this paper (that addresses the mentioned limitation) involves treating the regimens not as categorical, but as a multivariate binary variable, with each component indicating whether a subject was on that specific medication. One could then attempt to use multivariate regression modeling^{43} for the GPS that allows for some correlation between the treatments. Optimal Classifier Chains^{44} or simpler regression approaches that otherwise allow for dependencies between the usage of different treatments are potential approaches.
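A minimal sketch of the multivariate binary coding follows, assuming the regimen labels used in the tables (agents joined by hyphens, with "None" for no treatment). The function names are hypothetical and no GPS model is fitted here; the point is only that this coding exposes overlap between regimens that a categorical coding hides.

```python
# The eleven agents named in the application section, in a fixed order.
DRUGS = ["EMB", "ETH", "OFX", "Z", "KM", "CS", "CM", "PAS", "PTO", "SM", "RBT"]


def encode(regimen):
    """Map a regimen label such as 'OFX-KM-Z-ETH' to a 0/1 vector over DRUGS."""
    taken = set() if regimen == "None" else set(regimen.split("-"))
    unknown = taken - set(DRUGS)
    if unknown:
        raise ValueError(f"unrecognised agents: {sorted(unknown)}")
    return [int(d in taken) for d in DRUGS]


def shared_agents(a, b):
    """Number of agents that two regimens have in common."""
    return sum(x & y for x, y in zip(encode(a), encode(b)))


reg2 = encode("OFX-KM-Z-ETH-CS")
reg10 = encode("OFX-KM-Z-EMB-ETH-CS")

# Agents on which the two regimens differ.
differing = [d for d, x, y in zip(DRUGS, reg2, reg10) if x != y]
```

Under the categorical coding, regimens 2 and 10 are unrelated classes; under this coding they differ in a single component (EMB), so a model for the joint distribution of the components could borrow information across them.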
It is clear from the medical literature that the estimation of the effects of multiple concurrent medications is an important topic but standard methods are limited. Given the complexity of the problem, we hope that this paper encourages additional focus on these methodological issues.
Acknowledgements
The data and context were provided by the Collaborative Group for Meta-Analysis of Individual Patient Data in Multidrug-Resistant Tuberculosis. The authors gratefully acknowledge Matthew Cefalu’s (Rand Corporation) recommendations for the implementation of twang.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Canadian Institutes of Health Research (CIHR) (project grant 378067 to MES and AB). MES is also funded by CIHR (New Investigators Salary Award) and the Natural Sciences and Engineering Research Council of Canada (Discovery Grant with Accelerator Supplement). NRG is funded in part by the National Institutes of Health (NIH) (K24 award, K24AI114444). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The corresponding author had full access to all data in the study and had final responsibility for the decision to submit for publication. The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the funding agencies.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material is available for this article online.
References
1. Maher RL, Hanlon J and Hajjar ER. Clinical consequences of polypharmacy in elderly.
2. World Health Organization.
3. Millard J, Ugarte-Gil C and Moore DA. Multidrug resistant tuberculosis.
4. World Health Organization.
5. Ahuja SD, Ashkin D, Avendano M, et al. Multidrug resistant pulmonary tuberculosis treatment regimens and patient outcomes: an individual patient data meta-analysis of 9,153 patients.
6. Imbens GW. The role of the propensity score in estimating dose-response functions.
7. Imai K and Van Dyk DA. Causal inference with general treatment regimes: generalizing the propensity score.
8. Lopez MJ and Gutman R. Estimation of causal effects with multiple treatments: a review and new ideas.
9. McCaffrey DF, Griffin BA, Almirall D, et al. A tutorial on propensity score estimation for multiple treatments using generalized boosted models.
10. Horvitz DG and Thompson DJ. A generalization of sampling without replacement from a finite universe.
11. Vansteelandt S and Daniel RM. On regression adjustment for the propensity score.
12. Scharfstein DO, Rotnitzky A and Robins JM. Adjusting for nonignorable dropout using semiparametric nonresponse models (with discussion and rejoinder).
13. Van der Laan MJ and Rubin D. Targeted maximum likelihood learning.
14. Robins J. A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect.
15. Rosenbaum PR and Rubin DB. The central role of the propensity score in observational studies for causal effects.
16. Hastie T, Tibshirani R and Friedman J.
17. Ahuja Y and Yadav SK. Multiclass classification and support vector machine.
18. Cortes C and Vapnik V. Support-vector networks.
19. Meyer D, Dimitriadou E, Hornik K, et al.
20. Hsu CW and Lin CJ. A comparison of methods for multiclass support vector machines.
21. Karatzoglou A, Meyer D and Hornik K. Support vector machines in R.
22. Wu TF, Lin CJ and Weng RC. Probability estimates for multi-class classification by pairwise coupling.
23. Hosmer DW Jr, Lemeshow S and Sturdivant RX. Applied logistic regression.
24. Ding X.
25. Ridgeway G, McCaffrey D, Morral A, et al.
26. Snowden JM, Rose S and Mortimer KM. Implementation of G-computation on a simulated data set: demonstration of a causal inference technique.
27. Zeileis A. Object-oriented computation of sandwich estimators.
28. Van der Laan MJ and Rose S.
29. Tsiatis A.
30. Kennedy EH. Semiparametric theory and empirical processes in causal inference.
31. Friedman J, Hastie T and Tibshirani R. Regularization paths for generalized linear models via coordinate descent.
32. Cole SR and Hernán MA. Constructing inverse probability weights for marginal structural models.
33. World Health Organization.
34. Munir S, Mahmood N, Shahid S, et al. Molecular detection of isoniazid, rifampin and ethambutol resistance to M. tuberculosis and M. bovis in multidrug resistant tuberculosis (MDR-TB) patients in Pakistan.
35. Allana S, Shashkina E, Mathema B, et al. pncA gene mutations associated with pyrazinamide resistance in drug-resistant tuberculosis, South Africa and Georgia.
36. US Department of Health and Human Services. A profile of older Americans: 2016.
37. Qato DM, Wilder J, Schumm LP, et al. Changes in prescription and over-the-counter medication and dietary supplement use among older adults in the United States, 2005 vs 2011.
38. Chao CT, Tsai HB, Wu CY, et al. Cumulative cardiovascular polypharmacy is associated with the risk of acute kidney injury in elderly patients.
39. Benkeser D, Carone M, Van der Laan MJ, et al. Doubly robust nonparametric inference on the average treatment effect.
40. Gruber S and van der Laan MJ.
41. Schnitzer ME, Van der Laan MJ, Moodie EEM, et al. Effect of breastfeeding on gastrointestinal infection in infants: a targeted maximum likelihood approach for clustered longitudinal data.
42. Ju C, Schwab J and van der Laan MJ. On adaptive propensity score truncation in causal inference.
43. Ip S and Xue J. A multivariate regression view of multi-label classification.
44. Cheng W, Hüllermeier E and Dembczynski KJ.
Table 1. Monte Carlo mean estimates and standard errors for different implementations of G-Computation.

| Adjustment | Outcome model | Q_{n}corr | n = 500, Reg 1 | n = 500, Reg 2 | n = 1000, Reg 1 | n = 1000, Reg 2 |
|---|---|---|---|---|---|---|
| Unadjusted | By regimen | Y | 0.63 (0.05) | 0.62 (0.07) | 0.63 (0.03) | 0.62 (0.04) |
| Unadjusted | By treatment (main terms) | N | 0.48 (0.05) | 0.83 (0.03) | 0.48 (0.03) | 0.83 (0.03) |
| Unadjusted | By treatment (first order interactions) | Y | 0.63 (0.05) | 0.62 (0.07) | 0.63 (0.03) | 0.62 (0.04) |
| Adjusted for X_{ij} | By regimen | Y | 0.64 (0.04) | 0.57 (0.07) | 0.64 (0.03) | 0.57 (0.05) |
| Adjusted for X_{ij} | By treatment (main terms) | N | 0.47 (0.04) | 0.81 (0.03) | 0.47 (0.03) | 0.82 (0.03) |
| Adjusted for X_{ij} | By treatment (first order interactions) | Y | 0.62 (0.05) | 0.58 (0.06) | 0.62 (0.03) | 0.58 (0.04) |

Note: Entries are Monte Carlo means with standard errors in parentheses. The true value for regimen 1 is E(Y1) = 0.61 and the true value for regimen 2 is E(Y2) = 0.57. Q_{n}corr indicates whether the outcome model includes the true treatment–treatment interactions.
Table 2. Monte Carlo means and standard errors over 1000 draws for different causal estimators that utilize the generalized propensity score.

| GPS method | Estimator | Q_{n}corr | n = 500, Reg 1 | n = 500, Reg 2 | n = 1000, Reg 1 | n = 1000, Reg 2 |
|---|---|---|---|---|---|---|
| SVM | IPTW | N/A | 0.63 (0.05) | 0.61 (0.07) | 0.63 (0.04) | 0.61 (0.05) |
| SVM | PSA(I) | Y | 0.64 (0.06) | 0.51 (0.08) | 0.64 (0.04) | 0.52 (0.05) |
| SVM | PSA(II) | N | 0.44 (0.05) | 0.83 (0.04) | 0.44 (0.03) | 0.83 (0.03) |
| SVM | TMLE(I) | Y | 0.62 (0.05) | 0.58 (0.09) | 0.62 (0.04) | 0.58 (0.06) |
| SVM | TMLE(II) | N | 0.62 (0.05) | 0.60 (0.07) | 0.62 (0.04) | 0.59 (0.05) |
| Softmax Regression | IPTW | N/A | 0.62 (0.06) | 0.58 (0.11) | 0.62 (0.04) | 0.58 (0.07) |
| Softmax Regression | PSA(I) | Y | 0.64 (0.05) | 0.57 (0.07) | 0.63 (0.04) | 0.57 (0.05) |
| Softmax Regression | PSA(II) | N | 0.47 (0.04) | 0.83 (0.04) | 0.47 (0.03) | 0.83 (0.03) |
| Softmax Regression | TMLE(I) | Y | 0.62 (0.06) | 0.58 (0.10) | 0.62 (0.04) | 0.58 (0.07) |
| Softmax Regression | TMLE(II) | N | 0.62 (0.06) | 0.58 (0.10) | 0.62 (0.04) | 0.58 (0.07) |
| GBM | IPTW | N/A | 0.62 (0.05) | 0.60 (0.08) | 0.62 (0.04) | 0.59 (0.06) |
| GBM | PSA(I) | Y | 0.63 (0.06) | 0.51 (0.10) | 0.63 (0.04) | 0.52 (0.06) |
| GBM | PSA(II) | N | 0.42 (0.04) | 0.85 (0.04) | 0.43 (0.03) | 0.84 (0.03) |
| GBM | TMLE(I) | Y | 0.62 (0.06) | 0.58 (0.10) | 0.62 (0.04) | 0.58 (0.07) |
| GBM | TMLE(II) | N | 0.62 (0.05) | 0.59 (0.08) | 0.62 (0.05) | 0.59 (0.06) |

Note: Entries are Monte Carlo means with standard errors in parentheses. The true value for regimen 1 is E(Y1) = 0.61 and the true value for regimen 2 is E(Y2) = 0.57. Outcome regression models were fit by (I) regimen and (II) treatments as main terms covariates. Q_{n}corr indicates whether the outcome model includes the true treatment–treatment interactions. SVM: Support vector machine; GBM: generalized boosted model; IPTW: inverse probability of treatment weighting; PSA: propensity score adjustment; TMLE: targeted maximum likelihood estimation.
Table 3. Summary of the baseline and outcome data for the application study in Section 4.

| Regimen | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Agents | OFX-KM-Z-EMB-ETH | OFX-KM-Z-ETH-CS | OFX-KM-PTO-CS-PAS | Z-EMB-RBT | OFX-SM-PTO-CS-PAS | None | OFX-KM-Z-ETH | OFX-CM-Z-ETH-CS-PAS | OFX-PTO-CS-PAS | OFX-KM-Z-EMB-ETH-CS |
| N | 1514 | 364 | 263 | 237 | 209 | 207 | 178 | 153 | 151 | 137 |
| P | 0.16 | 0.04 | 0.03 | 0.03 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.01 |
| Covariates | | | | | | | | | | |
| Year, median (IQR) | 2008 (0) | 2008 (0) | 2004 (2) | 1997 (7) | 2002 (2) | 2002 (7) | 2008 (0) | 2004 (0) | 2004 (2) | 2008 (0) |
| Country income group: High, p | 0.00 | 0.01 | 0.97 | 0.84 | 1.00 | 0.61 | 0.01 | 0.92 | 1.00 | 0.20 |
| Country income group: Lower middle, p | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.03 | 0.00 | 0.08 | 0.00 | 0.00 |
| Country income group: Upper middle, p | 1.00 | 0.99 | 0.00 | 0.16 | 0.00 | 0.36 | 0.99 | 0.00 | 0.00 | 0.80 |
| Age, mean (SD) | 37.71 (10.67) | 36.23 (11.55) | 43.51 (15.72) | 44.82 (14.60) | 44.82 (15.11) | 41.07 (15.66) | 36.03 (11.90) | 36.41 (11.14) | 40.23 (15.32) | 36.64 (10.40) |
| Sex, female, p | 0.37 | 0.36 | 0.22 | 0.29 | 0.24 | 0.35 | 0.42 | 0.12 | 0.29 | 0.35 |
| HIV: +ve, p | 0.26 | 0.25 | 0.00 | 0.40 | 0.00 | 0.21 | 0.26 | 0.02 | 0.00 | 0.22 |
| HIV: −ve, p | 0.43 | 0.37 | 0.95 | 0.49 | 0.99 | 0.58 | 0.34 | 0.97 | 0.99 | 0.54 |
| HIV: Unknown, p | 0.31 | 0.38 | 0.05 | 0.11 | 0.01 | 0.21 | 0.40 | 0.01 | 0.01 | 0.24 |
| Past TB: Yes, p | 0.90 | 0.87 | 0.75 | 0.24 | 0.77 | 0.53 | 0.90 | 0.95 | 0.74 | 0.93 |
| Past TB: No, p | 0.07 | 0.06 | 0.25 | 0.74 | 0.22 | 0.44 | 0.06 | 0.00 | 0.25 | 0.05 |
| Past TB: Unknown, p | 0.03 | 0.07 | 0.00 | 0.02 | 0.01 | 0.03 | 0.04 | 0.05 | 0.01 | 0.02 |
| Sputum smear status: +ve, p | 0.60 | 0.67 | 0.66 | 0.73 | 0.83 | 0.58 | 0.72 | 0.85 | 0.72 | 0.77 |
| Sputum smear status: −ve, p | 0.30 | 0.23 | 0.27 | 0.17 | 0.17 | 0.20 | 0.19 | 0.12 | 0.26 | 0.19 |
| Sputum smear status: Unknown, p | 0.10 | 0.10 | 0.07 | 0.10 | 0.00 | 0.22 | 0.09 | 0.03 | 0.02 | 0.04 |
| Cavities on CXR: +ve, p | 0.55 | 0.64 | 0.42 | 0.22 | 0.46 | 0.35 | 0.64 | 0.54 | 0.37 | 0.57 |
| Cavities on CXR: −ve, p | 0.10 | 0.10 | 0.57 | 0.26 | 0.52 | 0.09 | 0.15 | 0.37 | 0.62 | 0.18 |
| Cavities on CXR: Unknown, p | 0.35 | 0.26 | 0.01 | 0.52 | 0.02 | 0.56 | 0.21 | 0.09 | 0.01 | 0.25 |
| Outcome | | | | | | | | | | |
| Treatment success, p | 0.46 | 0.62 | 0.50 | 0.14 | 0.38 | 0.31 | 0.54 | 0.71 | 0.42 | 0.56 |

p: proportion of subjects following regimen; −ve: Negative; +ve: Positive; SD: Standard deviation; IQR: Interquartile range; TB: Tuberculosis; CXR: Chest X-ray.
Table 4. Estimates of the probability of treatment success along with the confidence intervals under regimens 1–5 for the MDR-TB application in Section 4.

| GPS method | Estimator | Regimen 1: OFX-KM-Z-EMB-ETH | Regimen 2: OFX-KM-Z-ETH-CS | Regimen 3: OFX-KM-PTO-CS-PAS | Regimen 4: Z-EMB-RBT | Regimen 5: OFX-SM-PTO-CS-PAS |
|---|---|---|---|---|---|---|
| SVM | IPTW | 0.46 (0.44, 0.49) | 0.71 (0.62, 0.80) | 0.59 (0.47, 0.70) | 0.27 (0.09, 0.45) | 0.32 (0.17, 0.46) |
| SVM | PSA(I) | 0.44 | 0.67 | 0.63 | 0.32 | 0.55 |
| SVM | PSA(II) | 0.66 | 0.69 | 0.64 | 0.42 | 0.68 |
| SVM | TMLE(I) | 0.61 (0.60, 0.61) | 0.78 (0.76, 0.80) | 0.63 (0.60, 0.65) | 0.54 (0.52, 0.57) | 0.31 (0.28, 0.34) |
| SVM | TMLE(II) | 0.49 (0.48, 0.50) | 0.69 (0.68, 0.70) | 0.60 (0.58, 0.63) | 0.34 (0.31, 0.36) | 0.37 (0.35, 0.38) |
| Softmax Regression | IPTW | 0.46 (0.43, 0.49) | 0.65 (0.59, 0.70) | 0.56 (0.49, 0.64) | 0.27 (0.18, 0.36) | 0.37 (0.29, 0.44) |
| Softmax Regression | PSA(I) | 0.38 | 0.63 | 0.55 | 0.22 | 0.45 |
| Softmax Regression | PSA(II) | 0.56 | 0.64 | 0.59 | 0.36 | 0.62 |
| Softmax Regression | TMLE(I) | 0.60 (0.59, 0.62) | 0.65 (0.62, 0.67) | 0.61 (0.59, 0.64) | 0.57 (0.54, 0.60) | 0.37 (0.35, 0.39) |
| Softmax Regression | TMLE(II) | 0.48 (0.47, 0.50) | 0.64 (0.62, 0.67) | 0.59 (0.57, 0.62) | 0.26 (0.22, 0.30) | 0.45 (0.43, 0.48) |
| GBM | IPTW | 0.55 (0.39, 0.72) | 0.81 (0.64, 0.98) | 0.59 (0.47, 0.70) | 0.25 (0.11, 0.39) | 0.27 (0.01, 0.52) |
| GBM | PSA(I) | 0.43 | 0.68 | 0.63 | 0.35 | 0.55 |
| GBM | PSA(II) | 0.65 | 0.68 | 0.64 | 0.37 | 0.66 |
| GBM | TMLE(I) | 0.63 (0.58, 0.68) | 0.83 (0.79, 0.87) | 0.60 (0.54, 0.67) | 0.54 (0.51, 0.56) | 0.27 (0.22, 0.32) |
| GBM | TMLE(II) | 0.55 (0.50, 0.60) | 0.77 (0.76, 0.79) | 0.57 (0.50, 0.64) | 0.34 (0.30, 0.37) | 0.30 (0.28, 0.32) |

SVM: Support vector machine; GBM: generalized boosted model; IPTW: inverse probability of treatment weighting; PSA: propensity score adjustment; TMLE: targeted maximum likelihood estimation. Outcome regression models were fit (I) by regimen and (II) with treatments as main terms covariates. Confidence intervals are not reported for PSA.
Table 5. Estimates of the probability of treatment success along with the confidence intervals under regimens 6–10 for the MDR-TB application in Section 4.

| GPS method | Estimator | Regimen 6: None | Regimen 7: OFX-KM-Z-ETH | Regimen 8: OFX-CM-Z-ETH-CS-PAS | Regimen 9: OFX-PTO-CS-PAS | Regimen 10: OFX-KM-Z-EMB-ETH-CS |
|---|---|---|---|---|---|---|
| SVM | IPTW | 0.20 (0.08, 0.31) | 0.56 (0.48, 0.64) | 0.67 (0.55, 0.80) | 0.57 (0.37, 0.77) | 0.56 (0.47, 0.64) |
| SVM | PSA(I) | 0.29 | 0.59 | 0.61 | 0.56 | 0.57 |
| SVM | PSA(II) | 0.38 | 0.63 | 0.61 | 0.58 | 0.66 |
| SVM | TMLE(I) | 0.21 (0.18, 0.23) | 0.58 (0.56, 0.60) | 0.67 (0.65, 0.69) | 0.62 (0.58, 0.66) | 0.61 (0.58, 0.63) |
| SVM | TMLE(II) | 0.24 (0.21, 0.27) | 0.58 (0.56, 0.60) | 0.60 (0.58, 0.62) | 0.58 (0.55, 0.61) | 0.57 (0.54, 0.60) |
| Softmax Regression | IPTW | 0.31 (0.24, 0.38) | 0.56 (0.48, 0.64) | 0.69 (0.61, 0.78) | 0.45 (0.35, 0.54) | 0.56 (0.48, 0.65) |
| Softmax Regression | PSA(I) | 0.37 | 0.55 | 0.56 | 0.46 | 0.54 |
| Softmax Regression | PSA(II) | 0.38 | 0.56 | 0.59 | 0.50 | 0.65 |
| Softmax Regression | TMLE(I) | 0.25 (0.22, 0.28) | 0.58 (0.55, 0.61) | 0.68 (0.66, 0.70) | 0.55 (0.52, 0.58) | 0.60 (0.57, 0.64) |
| Softmax Regression | TMLE(II) | 0.35 (0.29, 0.41) | 0.56 (0.53, 0.60) | 0.62 (0.60, 0.64) | 0.49 (0.47, 0.52) | 0.56 (0.52, 0.61) |
| GBM | IPTW | 0.24 (0.17, 0.32) | 0.70 (0.41, 0.98) | 0.75 (0.65, 0.83) | 0.56 (0.25, 0.86) | 0.55 (0.45, 0.65) |
| GBM | PSA(I) | 0.38 | 0.60 | 0.60 | 0.54 | 0.57 |
| GBM | PSA(II) | 0.40 | 0.62 | 0.60 | 0.52 | 0.65 |
| GBM | TMLE(I) | 0.25 (0.21, 0.28) | 0.67 (0.62, 0.73) | 0.73 (0.70, 0.77) | 0.62 (0.56, 0.67) | 0.59 (0.57, 0.62) |
| GBM | TMLE(II) | 0.26 (0.22, 0.31) | 0.67 (0.65, 0.68) | 0.67 (0.63, 0.71) | 0.58 (0.54, 0.61) | 0.54 (0.52, 0.58) |

SVM: Support vector machine; GBM: generalized boosted model; IPTW: inverse probability of treatment weighting; PSA: propensity score adjustment; TMLE: targeted maximum likelihood estimation. Outcome regression models were fit (I) by regimen and (II) with treatments as main terms covariates. Confidence intervals are not reported for PSA.
Table 6. Ranking of the top 5 medication regimens estimated by each method in terms of the estimated population recovery rate of MDR-TB treatment success.

| GPS method | Rank | IPTW | TMLE(I) | TMLE(II) | PSA(I) | PSA(II) |
|---|---|---|---|---|---|---|
| SVM | 1 | Reg 2 | Reg 2 | Reg 2 | Reg 2 | Reg 2 |
| SVM | 2 | Reg 8 | Reg 8 | Reg 8 | Reg 3 | Reg 5 |
| SVM | 3 | Reg 3 | Reg 3 | Reg 3 | Reg 8 | Reg 10 |
| SVM | 4 | Reg 9 | Reg 9 | Reg 7 | Reg 7 | Reg 1 |
| SVM | 5 | Reg 10 | Reg 10 | Reg 9 | Reg 10 | Reg 3 |
| Softmax Regression | 1 | Reg 8 | Reg 8 | Reg 2 | Reg 2 | Reg 10 |
| Softmax Regression | 2 | Reg 2 | Reg 2 | Reg 8 | Reg 8 | Reg 2 |
| Softmax Regression | 3 | Reg 10 | Reg 3 | Reg 3 | Reg 3 | Reg 5 |
| Softmax Regression | 4 | Reg 7 | Reg 10 | Reg 10 | Reg 7 | Reg 8 |
| Softmax Regression | 5 | Reg 3 | Reg 1 | Reg 7 | Reg 10 | Reg 3 |
| GBM | 1 | Reg 2 | Reg 2 | Reg 2 | Reg 2 | Reg 2 |
| GBM | 2 | Reg 8 | Reg 8 | Reg 8 | Reg 3 | Reg 5 |
| GBM | 3 | Reg 7 | Reg 7 | Reg 7 | Reg 7 | Reg 10 |
| GBM | 4 | Reg 3 | Reg 1 | Reg 9 | Reg 8 | Reg 1 |
| GBM | 5 | Reg 9 | Reg 9 | Reg 3 | Reg 10 | Reg 3 |

Reg 1: OFX-KM-Z-EMB-ETH; Reg 2: OFX-KM-Z-ETH-CS; Reg 3: OFX-KM-PTO-CS-PAS; Reg 4: Z-EMB-RBT; Reg 5: OFX-SM-PTO-CS-PAS; Reg 6: None; Reg 7: OFX-KM-Z-ETH; Reg 8: OFX-CM-Z-ETH-CS-PAS; Reg 9: OFX-PTO-CS-PAS; Reg 10: OFX-KM-Z-EMB-ETH-CS; SVM: Support vector machine; GBM: generalized boosted model; IPTW: inverse probability of treatment weighting; PSA: propensity score adjustment; TMLE: targeted maximum likelihood estimation. Outcome regression models were fit (I) by regimen and (II) with treatments as main terms covariates.