03706251170BiometricsBiometricsBiometrics0006-341X1541-042024245800408684210.1111/biom.12103NIHMS568273ArticleSharpening Bounds on Principal Effects with CovariatesLongDustin M.Department of Biostatistics, West Virginia University, Morgantown, WV 26506-9190, USAdmlong@hsc.wvu.eduHudgensMichael G.Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599-7420, USAmhudgens@bios.unc.edu1652014181120131220130872014694812819Summary

Estimation of treatment effects in randomized studies is often hampered by possible selection bias induced by conditioning on or adjusting for a variable measured post-randomization. One approach to obviate such selection bias is to consider inference about treatment effects within principal strata, i.e., principal effects. A challenge with this approach is that without strong assumptions principal effects are not identifiable from the observable data. In settings where such assumptions are dubious, identifiable large sample bounds may be the preferred target of inference. In practice these bounds may be wide and not particularly informative. In this work we consider whether bounds on principal effects can be improved by adjusting for a categorical baseline covariate. Adjusted bounds are considered which are shown to never be wider than the unadjusted bounds. Necessary and sufficient conditions are given for which the adjusted bounds will be sharper (i.e., narrower) than the unadjusted bounds. The methods are illustrated using data from a recent, large study of interventions to prevent mother-to-child transmission of HIV through breastfeeding. Using a baseline covariate indicating low birth weight, the estimated adjusted bounds for the principal effect of interest are 63% narrower than the estimated unadjusted bounds.

BoundsCausal EffectsPartial IdentifiabilityPotential OutcomesPrincipal Strata
1. Introduction

Often in randomized trials to evaluate the effect of a treatment, inference is hampered by possible selection bias induced by conditioning on or adjusting for a variable measured post-randomization. For instance, in randomized clinical trials where some study participants do not comply with their treatment assignment, investigators are often interested in the effect of treatment in participants who were compliant. One approach that avoids potential selection bias induced by conditioning on a post-randomization variable is to focus inference on the causal effect within a principal strata of interest, i.e., the principal effect (Frangakis and Rubin 2002). Principal strata are defined by the pair of potential outcomes under either treatment assignment of the post-randomization variable. For instance, in the setting of non-compliance the principal stratum of interest may be study participants who would comply with their randomization assignment regardless of whether assigned to treatment or control (Angrist et al. 1996). In vaccine trials, a principal stratum of interest may be individuals who would be infected at a certain time regardless of vaccine status (Shepherd et al. 2011). In studies of interventions to prevent mother-to-child transmission (MTCT) of HIV through breastfeeding, a principal stratum of interest is infants who would be uninfected at a certain time regardless of treatment (Nolen and Hudgens 2011; Long and Hudgens 2012). There are many other settings where inference about treatment effects within certain principal strata may be of interest; see VanderWeele (2011) for a recent overview. A primary goal in these settings is often to better understand a treatment’s effect by drawing inference about effects within subgroups defined by the principal strata. The principal stratification approach is less helpful for decision making, as principal stratum membership is generally not identifiable prior to treatment (Joffe 2011).

In many instances, even after treatment assignment individual principal strata membership is not identifiable from the observable data without strong assumptions because only one of the two post-randomization variable potential outcomes is ever observed for an individual. In turn, the principal effect of interest is not identifiable. One approach to cope with lack of identifiability is to conduct sensitivity analysis wherein some model is assumed, indexed by an unidentifiable parameter conditional on which the principal effect is identifiable from the observable data. Inference about the principal effect is then conducted conditional on some value of the unidentifiable parameter and sensitivity of the inference is examined by considering different values of the parameter. An alternative approach entails drawing inference about bounds on the principal effects, e.g., Zhang and Rubin (2003), Cheng and Small (2006). Informally, these extreme bounds provide the smallest and largest possible values of the principal effect consistent with the observed data distribution. This approach is appealing in that typically bounds can be obtained under minimal assumptions. However, in many cases the bounds may be quite wide and therefore not particularly informative about the principal effect.

Grilli and Mealli (2008) derived nonparametric bounds on principal effects under a number of different assumptions. They suggested these bounds can be improved (or narrowed) by creating bounds within strata defined by a baseline covariate and combining these stratum specific bounds by taking a weighted average to obtain new adjusted bounds on the principal effect. Grilli and Mealli employed this approach in the analysis of data from an employment study with mixed results: the adjusted bounds were an improvement on only one side of the unadjusted bounds, i.e., the adjusted upper bound was less than the unadjusted upper bound but the adjusted lower bound was also less than the unadjusted lower bound. The reason for only partial improvement was not addressed. More recent work by Lee (2009) and Mealli and Pacini (2012) indicate the adjusted bounds will never be wider than the unadjusted bounds and sometimes the adjusted bounds will be strictly narrower than the unadjusted bounds. In this paper we characterize the exact circumstances for which adjusting for a baseline covariate leads to improved bounds.

The outline of the remainder of this paper is as follows. In Section 2, notation and assumptions are introduced. Section 3 explains the lack of identifiability of the principal effect and in Section 4 the unadjusted bounds are reviewed. Section 5 defines the adjusted bounds based on a weighted average of bounds within levels of the baseline covariate. Section 6 contains the main result of this paper, giving necessary and sufficient conditions under which the covariate adjusted bounds improve upon (i.e., are narrower than) the unadjusted bounds. In Section 7 large sample inferential methods are discussed. In Section 8, the adjusted and unadjusted bounds are compared using data from a recent, large MTCT study. Conditions in which adjusting for the covariate leads to identification of the principal effect are discussed in Section 9. A brief discussion is given in Section 10. Proofs of the propositions in Section 6 are given in the Web Appendix A.

2. Notation and Assumptions

To motivate, throughout we consider the MTCT example where infants of HIV positive mothers are randomized at birth (i.e., baseline or time 0) to treatment or control. Suppose n infants are enrolled in a MTCT study and for i = 1, …, n let Zi denote the randomization assignment for infant i. Without loss of generality let Zi = 0 correspond to control and Zi = 1 correspond to treatment. Let Xi be some binary variable measured at baseline (prior to randomization) taking on values 0 or 1. For simplicity Xi is assumed to be binary for now, although the results derived below will apply for any baseline categorical covariate with a finite number of levels. The primary endpoint in MTCT studies is typically HIV infection of the infant by some time point τ (e.g., six months) after baseline. Denote the presence or absence of the primary endpoint by Yi, where Yi = 1 indicates infant i became infected by τ and otherwise Yi = 0. Because the goal of treatment is to prevent breast milk transmission of HIV, investigators are primarily interested in infant HIV infections that occur before τ but after some time τ0 > 0 (e.g., τ0 = 2 weeks), as infections prior to τ0 are likely due to in utero or peripartum transmission and not breastfeeding. Let Si denote whether infant i is infected by τ0, where Si = 1 if infant i is HIV infected by τ0 and Si = 0 otherwise. Let Si(z) denote the potential value of Si when assigned treatment z for z = 0, 1 such that Si = (1 − Zi)Si(0) + ZiSi(1). Define Yi(z) similarly. Assume the treatment assignment of an infant does not affect the potential outcomes of other infants (i.e., there is no interference) and that there are not multiple forms of treatment.

An analysis of the effect of treatment that simply excludes infants infected by τ0 (i.e., Si = 1) is subject to selection bias. That is, because the set (or population) of infants that would be infected by τ0 when assigned treatment Zi = 1 is not necessarily the same as the set of infants that would be infected by τ0 if assigned control Zi = 0, direct comparisons between trial arms that exclude infants infected by τ0 in general do not have a causal interpretation. To avoid such selection bias, the principal stratification framework may be adopted (Frangakis and Rubin 2002). Principal strata are defined by sets of infants with the same potential outcome pair (Si(0) = s0, Si(1) = s1). Define the always infected (AI) principal stratum to be infants with s0 = s1 = 1, i.e., infants who would be infected at τ0 regardless of randomization assignment. Similarly define the harmed stratum as those infants with s0 = 0, s1 = 1; the protected stratum as those infants with s0 = 1, s1 = 0; and the never infected (NI) stratum as those infants with s0 = s1 = 0. Based on considerations described above, investigators conducting MTCT trials are interested in the NI stratum. The causal estimand of interest, the principal effect, is the effect of treatment on Yi in infants who would be uninfected at τ0 under either randomization assignment, namely

CE=Pr[Yi(1)=1Si(0)=Si(1)=0]-Pr[Yi(0)=1Si(0)=Si(1)=0].

Below we consider large sample bounds for CE that do and do not adjust for the baseline covariate Xi. Throughout we assume

Assumption 1 (Independent Treatment Assignment): Zi ⊥ (Xi, Si(0), Si(1), Yi(0), Yi(1))

Assumption 2 (Monotonicity): Si(1) ≤ Si(0) for all i

where ⊥ denotes independence. Assumption 1 will hold in randomized trials. Monotonicity assumes that treatment does no harm with respect to the intermediate variable Si, i.e., there are no infants who would be infected by τ0 only if treated. Under Assumption 2, there are only three possible principal strata: AI, NI, and protected.

3. Partial Identifiability

In this section we consider identifiability of CE. Let θzst = Pr[Yi(z) = 1|Si(1) = s, Si(0) = t], πz = Pr[Yi(z) = 1|Si(z) = 0], and γ = Pr[Si(0) = 0|Si(1) = 0], such that CE = θ100θ000. Assume γ > 0 as otherwise the NI stratum is empty with probability 1. Under Assumptions 1 and 2, θ000 = Pr[Yi = 1|Si = 0, Zi = 0], which is identifiable from the observed data. However, θ100 is not identifiable. Following Hudgens and Halloran (2006) note

Pr[Yi(1)=1Si(1)=0]=Pr[Yi=1Si=0,Zi=1]=Pr[Yi(1)=1Si(0)=0,Si(1)=0]Pr[Si(0)=0Si(1)=0]+Pr[Yi(1)=1Si(0)=1,Si(1)=0]Pr[Si(0)=1Si(1)=0],

i.e.,

π1=γθ100+(1-γ)θ101.

Under Assumption 1, π1 is identifiable. Under Assumptions 1 and 2, γ is identifiable because

γ=Pr[Si(0)=0]Pr[Si(1)=0]=Pr[Si=0Zi=0]Pr[Si=0Zi=1].

On the other hand, θ100 and θ101 are not identifiable because infants who were assigned treatment and uninfected at τ0 are a mixture of infants from the protected and NI strata. Solving (1) for θ100 yields

θ100=π1γ-1-γγθ101.

Equation (2) describes a line with intercept π1/γ and slope −(1 − γ)/γ. Any pair of points (θ101, θ100) in the unit square that are on this line will give rise to the same observed data distribution.

Note that CE is identifiable if and only if γ = 1, π1 = 1, or π1 = 0. If γ = 1, then (2) is a horizontal line with intercept π1 and thus θ100 = π1. Note γ = 1 is equivalent to Pr[Si(0) = 0] = Pr[Si(1) = 0], i.e., treatment has no effect on the intermediate variable Si. If π1 = 1, then (1) implies θ100 = 1; geometrically this corresponds to the line (2) intersecting the unit square at the upper right corner (1,1). In words, π1 = 1 means that all treated infants who are uninfected at τ0 will become infected by τ. Likewise, if π1 = 0, then (1) implies θ100 = 0, corresponding to the line (2) intersecting the unit square at the origin (0,0). In words, π1 = 0 means that all treated infants who are uninfected at τ0 will not become infected by τ. Otherwise, if γ < 1 and 0 < π1 < 1, under Assumptions 1 and 2, CE is not identifiable from the observable random variables. On the other hand, θ000, π1, and γ are identifiable, implying the observed data distribution does reveal some information about possible values for CE, i.e., CE is partially identifiable. The focus of the sequel is on large sample bounds, i.e., the smallest and largest possible values of CE that are consistent with the observed data law.

4. Unadjusted Bounds

In this section, we present large sample bounds for CE (Zhang and Rubin 2003; Hudgens et al. 2003) that ignore the baseline covariate X. The large sample bounds for CE are found by first bounding θ100. The upper bound for θ100 is obtained by assuming θ101 = 0 or θ100 = 1. Likewise, the lower bound for θ100 is obtained by assuming θ101 = 1 or θ100 = 0. These bounds can be envisaged as corresponding to the points where the line (2) intersects the unit square (Hudgens and Halloran 2006). In particular, the upper and lower bounds are

θ100u=min{π1γ,1}andθ100l=max{π1-(1-)γ,0}.

Bounds for CE are found by replacing θ100 by θ100u and θ100l, i.e., CEu=θ100u-θ000 and CEl=θ100l-θ000. These bounds will be referred to as “unadjusted” bounds because no information from the covariate is used.

To illustrate, let the probabilities corresponding to a fictitious trial of MTCT of HIV be γ = 0.95, π1 = 0.02, and π0 = 0.05. Using (3), for this trial θ100l=max{(0.02-(1-0.95))/0.95,0}=0 and θ100u=min{0.02/0.95,1}=0.021 (note here and in the sequel that reported numerical results are rounded, in this case to the third decimal place). This gives the unadjusted bounds as [CEl, CEu] = [0–0.05, 0.021–0.05] = [−0.05, −0.029] because θ000 = π0. In this example the bounds exclude zero, implying treatment reduces the risk of Y = 1 in the NI stratum. Let the probabilities for a second fictitious trial be γ = 0.80, π1 = 0.85, and π0 = 0.95. Then θ100u=1 and θ100l=0.813, implying the unadjusted bounds are [CEl, CEu] = [−0.137, 0.05]. These two fictitious trials will be revisited in the next section.

5. Adjusted Bounds

Next we consider the method proposed by Grilli and Mealli (2008) for adjusting the large sample bounds using the binary baseline covariate X, i.e., bounds will be obtained within strata defined by X and then weighted averages of the stratum specific bounds will be computed. Let θzstx = Pr[Yi(z) = 1|Si(1) = s, Si(0) = t, Xi = x], γx = Pr[Si(0) = 0|Si(1) = 0, Xi = x], πzx = Pr[Yi(z) = 1|Si(1) = 0, Xi = x], ϕx = Pr[Xi = x|Si(1) = Si(0) = 0], and λx = Pr[Xi = x|Si(1) = 0]. Throughout it is assumed that Pr[Si(1) = 0, Si(0) = t, Xi = x] > 0 for x, t = 0, 1 such that the conditional probabilities θz0tx, γx, and πzx are all well defined. This assumption also implies γx > 0, ϕx > 0 and λx > 0 for x = 0, 1. Note

θ100=xθ100xϕx, where here and in the sequel x=x=01. As in the unadjusted case, θ100x is not identifiable but using arguments analogous to (2) for Xi = x

θ100x=π1xγx-1-γxγxθ101x, and identifiable upper and lower bounds for θ100x are

θ100xu=min{π1xγx,1}andθ100xl=max{π1x-(1-γx)γx,0}.

Under Assumptions 1 and 2, ϕx is identifiable because Pr[Xi = x|Zi = 0, Si = 0] = Pr[Xi = x|Si(0) = 0] = Pr[Xi = x|Si(1) = Si(0) = 0]. Therefore, identifiable bounds for θ100 can be obtained by combining (4) and (6), namely

θ100Xu=xθ100xuϕxandθ100Xl=xθ100xlϕx.

This leads to adjusted bounds CEXu=θ100Xu-θ000 and CEXl=θ100Xl-θ000.

Table 1 contains the values of two different binary baseline covariates, X1 and X2, for each of the fictional trials discussed in Section 4. For the first trial and X1, by (6) we have θ1000u=min{0.035/0.995,1}=0.035,θ1000l=max{(0.035-(1-0.995))/0.995,0}=0.030,θ1001u=0.011, and θ1001l=0. Thus, θ100Xu=0.035×0.419+0.011×0.581=0.021 and θ100Xl=0.013. Therefore, there is improvement to the lower bound of CE but not to the upper bound when adjusting for X1, and thus the unadjusted bounds [CEl, CEu] = [−0.05, −0.029] are wider than the adjusted bounds [CEX1l,CEX1u]=[-0.037,-0.029]. On the other hand, adjusting by X2 in the first trial does not yield an improvement since θ100Xu=0.021=θ100u and θ100Xl=0=θ100l.

For the second trial and X1, θ1000u=0.854,θ1000l=0.730,θ1001u=1, and θ1001l=0.878. Thus, θ100Xu=0.935 and θ100Xl=0.813. Here adjusting for X1 yields a smaller upper bound resulting in narrower bounds, i.e., [CEl, CEu] = [−0.137, 0.05] to [CEX1l,CEX1u]=[-0.137,-0.015]. Moreover, the adjusted upper bound is less than the null value of 0, indicating treatment has an effect in the NI principal stratum; such a conclusion was not possible prior to adjusting for X1. On the other hand, adjusting for X2 in the second trial yields no improvement in the bounds because θ100Xu=1=θ100u and θ100Xl=0.813=θ100l.

A graphical depiction of the unadjusted and adjusted bounds is given in Figure 1. The unadjusted bounds correspond to where the solid lines intersect the unit square. Bounds within strata defined by X correspond to where the dashed and dotted lines intersect the unit square. The adjusted bounds, represented by ○ and +, are weighted averages of these stratum specific bounds. For example, in the upper left panel corresponding to trial 1 and X1, we see that θ100Xl is greater than θ100l because the vertical value of + is greater than zero, the point where the solid line intersects the horizontal axis.

6. Improvement of the Bounds

The examples in the preceding section illustrate that adjusting for a baseline covariate may or may not improve the bounds on CE. In this section, we give necessary and sufficient conditions for when the adjusted bounds (7) will be narrower than the unadjusted bounds of (3). Proofs of all propositions are given in Web Appendix A.

Proposition 1

[θ100Xl,θ100Xu][θ100l,θ100u] for any baseline binary covariate X.

According to Proposition 1, the adjusted bounds will be at least as narrow as the unadjusted bounds no matter the choice of X. This proposition is analogous to Proposition 1b of Lee (2009) for Y continuous. To characterize the conditions under which the adjusted bounds are strictly narrower than the unadjusted bounds, consider the following two criteria: π1x<γxandπ1x>γxforsomexx and

π1x>(1-γx)andπ1x<(1-γx)forsomexx where the value of x in (8) and (9) is not necessarily the same. In words, (8) and (9) indicate that Xi is informative about particular orderings between (i) the distribution of Si(0) given Si(1) = 0 and (ii) the distribution of Yi(1) given Si(1) = 0. For the motivating example, these criteria indicate that the ordering of (i) the risk of infection (or not) by τ0 when not treated and (ii) the risk of infection by τ when treated is different between strata defined by the levels of Xi among infants who would not be infected by τ0 if treated. On the other hand, if Xi is uninformative about the relation between (i) and (ii) then neither (8) nor (9) will hold. For example, if Xi is independent of Si(0) given Si(1) = 0 and if Xi is independent of Yi(1) given Si(1) = 0, then neither (8) nor (9) will hold. Using (8) and (9), the following propositions characterize exactly the situations when the adjusted bounds will be narrower.

Proposition 2

θ100Xu<θ100u if and only if X satisfies (8).

Proposition 2 states (8) is a necessary and sufficient condition for the adjusted upper bound for θ100 to be less than the unadjusted bound. This proposition is exemplified in the second fictional trial from Section 5 using X1, where π10 < γ0 and π11 > γ1. As shown in the lower left panel of Figure 1, (8) is satisfied because the dashed line has intercept greater than 1 and the dotted line has intercept less than 1. In contrast, for X1 in the first fictional trial (8) is not satisfied because the dashed and dotted lines in the upper left panel of Figure 1 both have intercepts less than 1; thus the adjusted upper bound based on X1 does not improve upon the unadjusted upper bound.

Proposition 3

θ100Xl>θ100l if and only if X satisfies (9).

Proposition 3 provides a necessary and sufficient condition for the adjusted lower bound to be greater than the unadjusted lower bound. This proposition is illustrated in the first fictional trial from Section 5 using X1, where π10 > (1 − γ0) and π11 < (1 − γ1). As shown in the upper left panel of Figure 1, (9) is satisfied because the dashed line intersects the bottom of the unit square whereas the dotted line intersects the right side of the unit square. In contrast, for X1 in the second ficitious trial (9) is not satisfied because the dashed and dotted lines in the lower left panel of Figure 1 both intersect the right side of the unit square; thus the adjusted lower bound equals the unadjusted lower bound.

Additional insight regarding criteria (8) and (9) can be obtained by constructing principal strata based on cross-classification of {Si(0), Si(1)} as well as {Yi(0), Yi(1)}. Further details regarding this approach are given in Web Appendix B.

It follows immediately from Propositions 2 and 3 that if (8) and (9) both hold then the adjusted bounds are strictly contained within the unadjusted bounds and if neither hold, the adjusted and unadjusted bounds are equal. Note while it was assumed that X was a binary covariate, Propositions 1–3 can immediately be extended to any categorical baseline covariate with a finite number of levels k. Specifically, suppose adjusted bounds are calculated within each of the k strata and weighted averages of these bounds are computed analogous to (7). Then Proposition 1 will hold, and if there exists any two levels of X that satisfy either (8) or (9), then either Proposition 2 or Proposition 3 will hold respectively.

7. Inference

The unadjusted and adjusted bounds can be consistently estimated by plugging in consistent estimators of the component parameters of the bounds. Specifically, let

γ^=min{(1-Si)(1-Zi)/(1-Zi)(1-Si)Zi/Zi,1} and

γ^x=min{(1-Si)(1-Zi)I(Xi=x)/(1-Zi)I(Xi=x)(1-Si)ZiI(Xi=x)/ZiI(Xi=x),1}, where =i=1n and I() is the usual indicator function. Likewise, let ϕ̂x = Σ(1 − Si)(1 − Zi)I(Xi = x)/Σ(1 − Si)(1 − Zi), π̂z = ΣYi(1 − Si)I(Zi = z)/Σ(1 − Si)I(Zi = z), and π̂zx = ΣYi(1 − Si)I(Zi = z, Xi = x)/Σ(1 − Si)I(Zi = z, Xi = x). In the data example presented in Section 8 below, individuals drop out of the study prior to evaluation of Yi so that the simple moment estimators π̂z and π̂zx given above will not be possible to compute. We ignore this complication for now and will return to this issue below.

Consistent estimators of the unadjusted bounds are CE^l=θ^100l-θ^000 and CE^u=θ^100u-θ^000 where θ^100l and θ^100u are obtained by plugging in π̂1 and γ̂ into (3) and θ̂000 = π̂0. Similarly, letting θ^100xl and θ^100xu denote (6) evaluated at π̂1x and γ̂x, consistent estimators of the unadjusted bounds are CE^Xl=θ100Xl-θ^000 and CE^Xu=θ^100Xu-θ^000 where θ^100Xl and θ^100Xu are obtained by plugging in ϕ̂x, θ^100xu, and θ^100xl into (7).

These estimated bounds reflect ignorance due to partial identifiability but do not account for uncertainty due to sampling variability; such uncertainty intervals can be constructed using the approach of Imbens and Manski (2004) and Vansteelandt et al. (2006). In particular, Lee (2009) proved the estimated unadjusted bounds CE^l and CE^u are asymptotically normal under standard regularity conditions and provided that Pr[Si(1) < Si(0)] > 0, i.e., treatment has a protective effect on the intermediate variable Si. Under these conditions, a (1 − α) × 100% pointwise uncertainty interval for CE is given by

[CE^l-cα/2var^{CE^l}1/2,CE^u+cα/2var^{CE^u}1/2] where cα/2 can be computed using equation (4.3) of Vansteelandt et al. (2006), and var^{CE^l} and var^{CE^u} are consistent estimators of var{CE^l} and var{CE^u}. As n → ∞, the interval (10) will contain CE with probability (1 −α). A pointwise uncertainty interval based on the estimated adjusted bounds can be constructed analogously.

8. Illustration

The Breastfeeding, Antiretroviral, and Nutrition (BAN) study was a randomized clinical trial to assess interventions for the prevention of breast milk transmission of HIV in 2369 HIV infected mothers and their infants in Lilongwe, Malawi (Chasela et al. 2010). There were three arms in the BAN study: daily antiretroviral therapy (ART) for the infant, daily ART for the mother, or control. While the primary analysis of the study considered comparisons of both ART arms to control, we will focus on comparing the infant ART and control arms only. The primary endpoint of BAN was infant HIV infection by τ = 28 weeks, so we let Yi = 1 if the infant was HIV positive by 28 weeks and Yi = 0 otherwise. Per protocol, infants who died or were infected in the first two weeks post-treatment were to be excluded from the primary analysis. Let Si = 1 if the infant became HIV positive or died by τ0 = 2 weeks, Si = 0 otherwise. An analysis that compares randomization groups conditional on Si = 0 is subject to selection bias. Instead we let the target of inference be CE, the change in risk of infection by 28 weeks due to infant ART in infants who would be HIV negative and alive at two weeks regardless of randomization assignment. Below estimated unadjusted bounds for CE are compared with estimated adjusted bounds based on a binary variable Xi indicating low birth weight (< 2.5 kg), i.e., Xi = 1 if the infant was of low birth weight, 0 otherwise.

Table 2 presents data on S and X from BAN by randomization arm. The proportion of infants who die or become HIV infected by two weeks is somewhat higher in the control group (5.7% versus 4.6%), although the difference is not statistically significant (Fisher’s exact test two-sided p = 0.35). That the proportion was higher in the control group supports the monotonicity assumption. Because some mother-infant pairs dropped out of the study prior to 28 weeks, the simple moment estimators of πz and πzx given in Section 7 cannot be computed. In the absence of competing risks and informative censoring, πx and πzx can be consistently estimated using the Kalpan-Meier estimator. However, in the BAN study death and breastfeeding cessation prior to HIV infection were competing risks. Therefore the Aalen-Johansen estimator (Aalen and Johansen 1978) for the cumulative incidence function can be used to estimate πz and πzx; see Long and Hudgens (2012) for additional details. The Aalen-Johansen estimates (ignoring X) were π̂0 = 0.0581 and π̂1 = 0.0141. Using these estimates, the estimated unadjusted lower and upper bounds of CE are [CE^l,CE^u]=[-0.0556,-0.0438]. Stratifying by X, the Aalen-Johansen estimators for each randomization arm were π̂00 = 0.0609, π̂01 = 0.0233, π̂10 = 0.0107, and π̂11 = 0.0604. Therefore the estimated adjusted lower and upper bounds are [CE^Xl,CE^Xu]=[-0.0482,-0.0431].

In this example, the estimated adjusted lower bound is greater than the estimated unadjusted lower bound, which is expected based on Proposition 3 and the fact Xi satisfies (9) empirically, i.e., π̂10 > (1 − γ̂0) and π̂11 < (1 − γ̂1). On the other hand, the estimated adjusted upper bound is greater than the estimated unadjusted upper bound, empirically contradicting Proposition 1. That the estimated adjusted upper bound is not less than the estimated unadjusted upper bound is actually not surprising based on Proposition 2 and the fact Xi does not satisfy (8) empirically, in particular π̂1x < γ̂x for x = 0, 1. This suggests the adjusted and unadjusted upper bounds are equal, in which case no particular ordering between the adjusted and unadjusted estimators might be expected. Grilli and Mealli (2008) reported a similar finding in an analysis of employment data of university students. This apparent contradiction between the estimated bounds and Proposition 1 suggests alternative adjusted estimators of the bounds, namely CEXl=max{CE^l,CE^Xl} and CEXu=min{CE^u,CE^Xu}. These estimators are consistent for CEXl and CEXu and by construction will always empirically satisfy Proposition 1, i.e., will always be at least as narrow as the estimated unadjusted bounds. For the BAN data, [CEXl,CEXu]=[-0.0482,-0.0438]. That is, by adjusting for the baseline covariate X indicating lower birth weight, the estimated bounds on CE are 63% narrower than the estimated unadjusted bounds.

Uncertainty intervals corresponding to the estimated unadjusted and adjusted bounds were computed using (10) with bootstrap variance estimates (as in Long and Hudgens 2012). For the unadjusted bounds, the estimated 95% pointwise uncertainty interval equals [−0.076, −0.025]. Corresponding to the estimated adjusted bounds [ CEXl,CEXu] based on the low birth weight indicator, the estimated 95% pointwise uncertainty interval equals [−0.069, −0.024], i.e., adjusting for low birth weight also results in a narrower estimated uncertainty interval.

9. Identifiability

As noted at the end of Section 3, in the absence of covariate X, CE is identifiable if and only if one of the following three conditions occur: γ = 1, π1 = 1, or π0 = 0. When at least one of these conditions holds, CE is identifiable and θ100u=θ100l, i.e., the bounds collapse to a single point. In this section we consider conditions under which adjusting for the binary covariate X renders CE identifiable in the sense that the adjusted bounds collapse to a point, i.e., CEXl=CEXu. By the form of the adjusted bounds given in (7) and the assumption ϕx > 0 for x = 0, 1, it follows that the adjusted bounds yield a single point if and only if

γx=1,π1x=1,orπ1x=0 for x = 0 and x = 1.

Ding et al. (2011) also considered identifiability of a principal effect when outcomes are truncated by death, which is mathematically identical to the problem considered here. In addition to Assumptions 1 and 2 above, Ding et al. provided two additional assumptions which are sufficient for identifiability: (i) XiYi|{Si(0), Si(1), Zi} and (ii) Pr[Xi = x|Si(0) = Si(1) = 0] ≠ Pr[Xi = x|Si(0) = 1, Si(1) = 0]. Unfortunately, assumption (i), which under Assumption 1 can be equivalently stated as XiYi(z)|{Si(0), Si(1)} for z = 0, 1, is in general not subject to empirical test based on the observable data. Ding et al. also gave sufficient identifiability conditions that do not require (i) but instead require that X takes on at least three levels or is continuous and that the mean of Y satisfies a particular linear model.

In contrast, condition (11) can be assessed from the observable data because γx and π1x are identifiable under Assumptions 1 and 2 only. Moreover, (11) suggests a strategy for selecting X. In particular, if a covariate X can be found such that (11) holds for x = 0, 1, then CE will be identifiable. If no such covariate is available, then selecting X such that (11) approximately holds for x = 0, 1 should yield adjusted bounds with width close to zero. For instance, in the MTCT from the previous section, the low birth weight indicator covariate X yields γ̂0 = 1.000, i.e., (11) empirically holds for x = 0; while (11) does not hold empirically for x = 1, π̂11 = 0.0604 is not too far from zero and indeed the birth weight adjusted bounds for CE are substantially narrower than the unadjusted bounds.

Finally, we note two special cases where Xi identifies CE. First, suppose Xi = 1 if and only if Si(0) = Si(1) = 0, i.e., Xi is a perfect predictor of membership in the NI principal stratum. Then trivially CE is identifiable under Assumption 1 alone by the stratum of individuals with Xi = 1. This first case is related to the “principal score,” i.e., the probability an individual is within a principal stratum conditional on one or more covariates (Jo and Stuart, 2009). In practice, principal scores are not known but predicted based on fitted models using the observed data. For example, in the MTCT setting a model for Pr[Si(0) = Si(1) = 0|Xi = x] can be fit using data from infants assigned Zi = 0 because under Assumptions 1 and 2 such infants are in the NI stratum if and only if Si = 0. If a set of one or more baseline covariates (not necessarily binary or even discrete) can be found such that the principal scores for the NI stratum equal zero or one for each individual, then CE is identified by the stratum of individuals with principal scores equal to one. In practice this may not be possible; however, if covariates can be found such that the principal scores for the NI stratum are all close to zero or one, i.e., the principal scores are highly predictive of NI stratum membership, then the adjusted bounds constructed by stratifying on a dichotomization of the principal score should have width near zero. For the second special case, suppose Yi = 1 if and only if Xi = 1, i.e., Xi is a perfect predictor of Yi. Then π10 = 0 and π11 = 1 implying (11) holds for x = 0 and x = 1 and therefore CEXl=CEXu. In settings where Zi has an effect on Yi and Zi is assigned randomly, no such perfect predictor Xi will exist (because Xi is measured pre-randomization), such that the second case seems to have little practical implication.

10. Discussion

In this paper we considered whether bounds on principal effects can be improved by adjusting for a categorical baseline covariate. Necessary and sufficient conditions were derived for which the covariate adjusted bounds will be sharper (i.e., narrower) than the unadjusted bounds. The methods were illustrated using data from a study of interventions to prevent mother-to-child transmission of HIV through breastfeeding. Using a baseline covariate indicating low birth weight, the estimated adjusted bounds for the principal effect of interest were 63% narrower than the estimated unadjusted bounds.

The veracity of the analysis of the BAN trial depends on several key assumptions. Assumption 1, independent treatment assignment, is reasonable because infants were randomly assigned treatment. Assumption 2, monotonicity, implies that the treatment is no worse than control for any individual in terms of the intermediate variable S. As mentioned in Section 8 this assumption is supported by the observed data. Additionally, the BAN study principal investigator, Dr. Charles van der Horst, has indicated that monotonicity is reasonable (personal communication). However, in other settings this assumption may be unrealistic. For example, monotonicity might be considered questionable in an analysis comparing the two active arms of the BAN trial, i.e., infant ART versus maternal ART. In such settings, bounds that do not require the monotonicity assumption would be needed.

There are several possible avenues of related future research. One possible direction entails studying how adjusting for covariates affects the efficiency with which the bounds are estimated. For example, is it possible that certain covariates could be advantageous to adjust for with respect to sharpening bounds, but disadvantageous to adjust for in terms of efficiency? In the presence of multiple baseline covariates, future research could explore a formal approach for determining which covariates to adjust for (or include in a principal score model) and how to subsequently draw valid inference accounting for the covariate selection process. Related to Assumption 2, future research could examine relations between adjusted and unadjusted bounds when monotonicity is assumed for the primary endpoint Y.

Supplementary Material

This work was supported in part by grants U48-DP001944 from the US Centers for Disease Control and Prevention (CDC), and U54-GM104942 and R01-AI085073 from the US National Institutes of Health (NIH). The content is solely the responsibility of the authors and does not necessarily represent the official views of CDC or NIH. The authors would like to thank the BAN investigators for access to the data from their study, and the Associate Editor and two anonymous reviewers for helpful comments that improved the paper. This paper is based on a chapter of the first author’s dissertation (Long 2012).

Supplementary Materials

Web Appendices referenced in Sections 1 and 6 are available with this paper at the Biometrics website on Wiley Online Library.

AalenOJohansenS1978An empirical transition matrix for non-homogeneous Markov chains based on censored observationsScandinavian Journal of Statistics53141150AngristJDImbensGWRubinDB1996Identification of causal effects using instrumental variables (disc: P456–472)Journal of the American Statistical Association91444455ChaselaCHudgensMJamiesonDKayiraDHosseinipourMKourtisAMartinsonFTeghaGKnightRAhmedYKamwendoDHoffmanIEllingtonSKachecheZSokoAWienerJFiscusSKazembePMofoloIChigwenembeMSichaliDvan der HorstC2010Maternal or infant antiretroviral drugs to reduce HIV-1 transmissionNew England Journal of Medicine362242271228120554982ChengJSmallDS2006Bounds on causal effects in three-arm trials with non-complianceJournal of the Royal Statistical Society: Series B (Statistical Methodology)68815836DingPGengZYanWZhouXH2011Identifiability and estimation of causal effects by principal stratification with outcomes truncated by deathJournal of the American Statistical Association10615781591FrangakisCERubinDB2002Principal stratification in causal inferenceBiometrics58212911890317GrilliLMealliF2008Nonparametric bounds on the causal effect of university studies on job opportunities using principal stratificationJournal of Educational and Behavioral Statistics331111130HudgensMGHalloranME2006Causal vaccine effects on binary post-infection outcomesJournal of the American Statistical Association101516419096723HudgensMGHoeringASelfSG2003On the analysis of viral load endpoints in HIV vaccine trialsStatistics in Medicine222281229812854093ImbensGWManskiCF2004Confidence intervals for partially identified parametersEconometrica72618451857JoBStuartE2009On the use of propensity scores in principal causal effect estimationStatistics in Medicine28232857287519610131JoffeM2011Principal stratification and attribution prohibition: Good ideas taken too farInternational Journal of Biostatistics71Article 3522049269LeeDS2009Training, wages, and sample selection: Estimating sharp bounds on treatment effectsReview of Economic Studies76310711102LongDM2012Causal Inference and Principal Stratification: Competing Risks, Bounds, and SurrogatesDoctoral DissertationDepartment of Biostatistics, University of North CarolinaChapel HillLongDMHudgensMG2012Comparing competing risk outcomes within principal strata, with application to studies of mother-to-child transmission of HIVStatistics in Medicine31273406341822927321MealliFPaciniB2012Using secondary outcomes and covariates to sharpen inference in randomized experiments with noncomplianceWorking Paper 4Department of Statistics, University of FlorenceNolenTLHudgensMG2011Randomization-based inference within principal strataJournal of the American Statistical Association10658159321987597ShepherdBGilbertPDupontC2011Sensitivity analyses comparing time-to-event outcomes only existing in a subset selected postrandomization and relaxing monotonicityBiometrics6731100111021114663VanderWeeleT2011Principal stratification – uses and limitationsThe International Journal of Biostatistics71Article 2821841939VansteelandtSGoetghebeurEKenwardMGMolenberghsG2006Ignorance and uncertainty regions as inferential tools in a sensitivity analysisStatistica Sinica16953979ZhangJLRubinDB2003Estimation of causal effects via principal stratification when some outcomes are truncated by “death”Journal of Educational and Behavioral Statistics28353368

Graphical depiction of bounds for the two fictional MTCT trials discussed in Section 5 with two different binary covariates X1 and X2. The solid lines depict equation (2) with π1 = 0.02 and γ = 0.95 in the upper panels and π1 = 0.85 and γ = 0.8 in the lower panels. The · · · (– – –) lines represent (5) for X = 0 (X = 1). The vertical value of ○ (+) corresponds to θ100Xu(θ100Xl).

Probabilities from the fictional trials described in Section 4 stratified by X1 and X2.

Trial 1X1
X2
0101
γx0.9950.9200.9800.880
π1x0.0350.0100.0050.055
λx0.4000.6000.7000.300
ϕx0.4190.5810.7220.278
Trial 2X1
X2
0101
γx0.8900.7400.8750.625
π1x0.7600.9100.9100.710
λx0.4000.6000.7000.300
ϕx0.4450.5550.7660.234

Note by Bayes Theorem ϕx must equal γxλx/(γ0λ0 + γ1λ1) for x = 0, 1; values of ϕx reported in the table have been rounded.

Data from the BAN study stratified by X (low birth weight < 2.5 kg).

Control (Z = 0)Treatment (Z = 1)
X = 0X = 1TotalX = 0X = 1Total


S = 128103836339
S = 05844663075162813