Interference, the dependence of an individual’s potential outcome on the exposure of other individuals, is common in medicine and public health. Recently, targeted maximum likelihood estimation (TMLE) has been extended to settings with interference, including estimation of the mean of an outcome under a specified distribution of exposure, referred to as a policy. This paper summarizes how TMLE for independent data is extended to general interference (network-TMLE). An extensive simulation study of network-TMLE is presented, consisting of four data generating mechanisms (unit-treatment effect only, spillover effects only, unit-treatment and spillover effects, infection transmission) in networks of varying structures. Simulations show that network-TMLE performs well across scenarios with interference, but issues arise when policies are not well supported by the observed data, potentially leading to poor confidence interval coverage. Guidance for practical application, freely available software, and areas of future work are provided.

Causal effect estimation often relies on the assumption of no interference, under which an individual’s potential outcomes do not depend on the exposures of other individuals.^{1–3} However, interference is common across many areas of medicine and public health, most notably in infectious disease and the medical social sciences. Examples include interference between injection drug users at risk of HIV,^{4,5} students within the same school,^{6} and individuals connected within social networks.^{7} The ongoing global SARS-CoV-2 pandemic has brought further attention to interference; evaluations of physical distancing and shelter-in-place policies have highlighted how such policies, or lack thereof, can affect nearby geographic regions.^{8,9} Beyond infectious disease, interference occurs across other substantive areas, with examples including transmission of opioid use within households in pharmacoepidemiology,^{10} passive tobacco smoke exposure in cancer epidemiology,^{11,12} and behaviors among children within classrooms in developmental psychology.^{13}

When interference is present, multiple estimands may be considered.^{1,14} One estimand of public health importance is the mean of an outcome under a specific policy. For example, what would the three-month risk of influenza have been if 60% of the population had been randomly selected to receive an influenza vaccine? To estimate this quantity or other related estimands, methods allowing for interference have been developed for two broad settings: partial interference and general interference. The partial interference assumption stipulates that interference occurs within, but not between, groups of individuals.^{1,15} Because the groups can then be treated as independent units, standard statistical theory can be applied.^{16–19} While the partial interference assumption is sometimes reasonable, interference patterns do not always allow individuals to be separated into independent groups. General interference allows, in principle, any two units in a population to affect each other. Methods for general interference may be further distinguished by whether the exposure is randomized. In randomized experiments, methods can leverage the random assignment as the basis for inference.^{20,21} In the observational setting, inference is more challenging because of the potential for confounding and the lack of independent replicates. Extensions of targeted maximum likelihood estimation for independent data (IID-TMLE) have recently been developed to allow for general interference in observational studies.^{22–24}

In this paper, we present simulation studies of TMLE for general interference (network-TMLE) in observational settings. While simulations have been conducted to evaluate the finite sample performance of network-TMLE,^{23,24} previous empirical studies have been limited to relatively simple random networks. In practice, networks often exhibit more complex properties,^{25–27} limiting the utility of previous simulation studies to guide application. Additionally, previous simulations have explored only a narrow set of data generating mechanisms and model specifications. To address these gaps, we conducted simulations for the estimation of the mean potential outcome under varied data generating mechanisms with a wider variety of networks, including an observed network of face-to-face contacts among university students, and various model specifications. Two policies were assessed: setting all individuals to some constant probability of exposure and shifting the probability of exposure by a constant.

The outline of the remainder of this paper is as follows.

The TMLE of average causal effects is a doubly robust substitution estimator that combines an outcome model and a propensity score (or exposure) model through a targeting step.^{28,29} These models are often referred to as nuisance models because they are not of direct interest. Double robustness means that the estimator is statistically consistent if at least one of the two nuisance models is correctly specified. When both nuisance models are consistently estimated, TMLE retains root-n convergence even when data-adaptive (machine learning) estimators are used for the nuisance models, provided those estimators have at least quarter-root-n convergence.^{28,30,31} In the absence of interference, TMLE methods have been developed for average causal effects,^{28} causal effects under different longitudinal treatment plans,^{32} and stochastic policies.^{33} The following is a brief review of IID-TMLE for stochastic policies.

Consider drawing inference about the effect of a binary exposure on either a _{i} indicate a vector of observed baseline covariate(s), _{i} the observed exposure, and _{i} the observed outcome. Assume (_{i}, _{i}, _{i}) for _{i}(^{p} is the average outcome in the target population if policy _{i} and potential outcomes _{i}(^{c}, which corresponds to the mean under the policy _{i} as fixed for the population of

To express the estimand as a function of the observed data, identification assumptions are necessary. The following sufficient identification assumptions are used:

If _{i} = _{i} = _{i}(

If Pr * (

Assumption 1 is causal consistency,^{34} assumption 2 is conditional exchangeability,^{35} and assumption 3 is the positivity assumption for stochastic policies.^{33}

IID-TMLE can be divided into five steps: outcome model estimation, weight estimation, targeting, estimation of ^{p}, ^{s}, and ^{c}, with differences occurring in variance estimation.^{36} Therefore, we only distinguish between the estimands during estimation of the variance. In the case of a continuous

IID-TMLE begins with the estimation of an outcome model for _{i}│_{i},_{i}], using either a parametric model or machine learning. Predicted values from the outcome model _{i}│_{i}) is estimated with either a parametric model or machine learning, and the following weight is computed
_{i}, Pr (_{i} = 1) is replaced with the rescaled _{i}. This targeting step solves the efficient score equation in a single step,^{37} without introducing additional parametric modeling assumptions. The estimated intercept, ^{38} While the clever covariate and weighted targeting approaches are asymptotically equivalent, weighted targeting may have better finite sample performance due to reduced sensitivity to stochastic positivity violations.^{39}
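The weighted targeting step can be illustrated with a minimal plain-Python sketch. It assumes a binary outcome (or a continuous outcome already rescaled to the unit interval), initial outcome-model predictions strictly between 0 and 1, and precomputed weights. The names `weighted_targeting` and `update_predictions` are hypothetical. The intercept-only logistic model, with the logit of the initial predictions as an offset, is fit by Newton-Raphson.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def weighted_targeting(y, q_init, weights, tol=1e-10, max_iter=100):
    """Fit logit(E[Y]) = eps + logit(q_init) by weighted maximum likelihood.

    Only the intercept eps is estimated; logit(q_init) enters as a fixed offset.
    """
    offsets = [logit(q) for q in q_init]
    eps = 0.0
    for _ in range(max_iter):
        preds = [expit(eps + o) for o in offsets]
        # Weighted score and Fisher information for the single parameter eps
        score = sum(w * (yi - p) for w, yi, p in zip(weights, y, preds))
        info = sum(w * p * (1.0 - p) for w, p in zip(weights, preds))
        step = score / info
        eps += step
        if abs(step) < tol:
            break
    return eps

def update_predictions(q_values, eps):
    """Targeted predictions: shift initial predictions by eps on the logit scale."""
    return [expit(logit(q) + eps) for q in q_values]

# Usage with made-up numbers
y = [1, 0, 1, 1, 0]
q_init = [0.6, 0.4, 0.5, 0.7, 0.3]
weights = [1.2, 0.8, 1.0, 1.5, 0.5]
eps = weighted_targeting(y, q_init, weights)
q_star = update_predictions(q_init, eps)
```

The fitted intercept solves the weighted score equation. As a result, the weighted residuals of the targeted predictions sum to (numerically) zero, which is the equation the targeting step is meant to solve.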

To estimate

For stochastic policies, the estimation procedure above requires modification. Because the distribution of _{i} = 1│_{i})). For each
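The Monte Carlo evaluation of a stochastic policy can be sketched as follows. The sketch assumes the policy shifts each unit's log-odds of exposure by a constant `delta`, and that a targeted outcome model `q_targeted(a, w)` is already available. Exposures are repeatedly drawn from the policy, targeted predictions are evaluated at the drawn exposures, and all predictions are averaged. The function names, the toy outcome model, and the number of draws are illustrative assumptions.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def shift_policy_probs(pscores, delta):
    """Policy probabilities after shifting each log-odds of exposure by delta."""
    return [expit(math.log(p / (1.0 - p)) + delta) for p in pscores]

def monte_carlo_policy_mean(covariates, policy_probs, q_targeted, n_draws=500, seed=0):
    """Average targeted predictions over exposures drawn from the policy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        for w, p in zip(covariates, policy_probs):
            a_star = 1 if rng.random() < p else 0  # draw A* ~ Bernoulli(p)
            total += q_targeted(a_star, w)
    return total / (n_draws * len(covariates))

# Usage: hypothetical covariates, propensity scores, and targeted outcome model
covariates = [0.0, 1.0, 2.0]
pscores = [0.2, 0.5, 0.8]
probs = shift_policy_probs(pscores, delta=math.log(2.0))  # doubles each unit's odds
psi_hat = monte_carlo_policy_mean(covariates, probs, lambda a, w: expit(-1.0 + a + 0.2 * w))
```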

Lastly, (1 − _{1 − α/2} denotes the 1 − ^{p} the variance is estimated by^{33}
^{c} the variance is estimated by^{36}
^{36} Therefore, the sample mean variance may be conservatively estimated by the conditional sample mean variance estimator.
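A Wald-type interval can be sketched as follows. For illustration only, the variance for the conditional mean is assumed to be the empirical second moment of the per-unit terms D_i = w_i (Y_i - Q*_i), where Q*_i is the targeted prediction at the observed exposure. The exact expressions for each estimand may differ, and the function name is hypothetical. `statistics.NormalDist` supplies the standard normal quantile z_{1-α/2}.

```python
import math
from statistics import NormalDist

def conditional_mean_ci(psi_hat, y, q_star_obs, weights, alpha=0.05):
    """Wald-type (1 - alpha) CI with an influence-curve-style variance (assumed form)."""
    n = len(y)
    # Per-unit terms: weighted residuals of the targeted outcome model
    d = [w * (yi - q) for w, yi, q in zip(weights, y, q_star_obs)]
    variance = sum(di * di for di in d) / n
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    half_width = z * math.sqrt(variance / n)
    return psi_hat - half_width, psi_hat + half_width

lower, upper = conditional_mean_ci(0.5, [1, 0], [0.5, 0.5], [1.0, 1.0])
```

As noted in the text, this conditional-mean variance can also serve as a conservative variance estimate for the sample mean.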

In the presence of interference, the potential outcomes depend on both an individual’s exposure and the exposure of others in the population. Consider the setting where individuals are connected via a network of edges (e.g., an edge may indicate two individuals are friends within a social network, live within a certain distance of each other, or had a face-to-face conversation in the past week). Suppose the network structure is static (i.e., fixed over time) and can be summarized by an

From _{1},_{2},…,_{n}) and

In the presence of interference, the potential outcomes for individual _{i}(_{i},_{−i}), where _{−i} indicates the exposure for all individuals excluding ^{20} such that the potential outcomes may be denoted by _{1},_{2}, …,_{n}).^{24} Note that where there is no interference and the units are IID, ^{p} can be interpreted as the expected mean outcome for a network of
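Computing summary measures from the adjacency matrix is straightforward. The plain-Python sketch below assumes an undirected network given as an edge list. It shows two common choices of summary measure: the sum over immediate contacts (x^s_i = Σ_j G_ij x_j) and the mean over immediate contacts. Which summary is appropriate depends on the application, and the helper names are hypothetical.

```python
def adjacency_from_edges(n, edges):
    """Symmetric n x n adjacency matrix G for an undirected network."""
    g = [[0] * n for _ in range(n)]
    for i, j in edges:
        g[i][j] = 1
        g[j][i] = 1
    return g

def summary_sum(g, values):
    """x^s_i = sum_j G_ij * x_j, e.g., the number of exposed contacts."""
    return [sum(gij * xj for gij, xj in zip(row, values)) for row in g]

def summary_mean(g, values):
    """Mean of x over immediate contacts (0.0 for units with no contacts)."""
    sums = summary_sum(g, values)
    return [s / sum(row) if sum(row) > 0 else 0.0 for row, s in zip(g, sums)]

# Usage: a four-unit path network 0-1-2-3
g = adjacency_from_edges(4, [(0, 1), (1, 2), (2, 3)])
a = [1, 0, 1, 1]
a_s = summary_sum(g, a)
```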

The following assumptions allow for

If _{i} = _{i} = _{i}(^{s})

^{s}) ⊥ ^{s}|^{s}, for all

If Pr * (^{s} = ^{s}│^{s}) > 0 then Pr (^{s} = ^{s}│^{s}) > 0, for all

^{s} such that the conditional exchangeability assumption is plausible may be informed by subject matter knowledge. In many settings, the degree of units,

^{s} and may also affect the outcome of interest, as individuals who have many contacts may differ from those who have fewer contacts. Accounting for the degree of units is analogous to methods for clustered data that allow for informative cluster sizes,^{40} i.e., associations between the outcome and the cluster size that persist even after conditioning on other baseline covariates.

Network-TMLE extends the TMLE framework to dependent data by allowing _{i} to depend on _{i} and _{i} to depend on _{i}, _{i}, and ^{p} and ^{c} remains the same, with differences occurring in estimation of the variance.

A model for ^{22}). For example, ordinary least squares could be used to estimate the parameters of the model
_{i} and

The weights can be expressed as
^{s}, estimation of the weights can be done using generalized propensity functions.^{41}
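To make the weight concrete, the sketch below computes w_i as a ratio of two probabilities of the observed (A_i, A^s_i): the probability under the policy, divided by the estimated probability under the observed exposure mechanism. Each probability is factored into a Bernoulli term for A_i and a term for A^s_i. Modeling the number of exposed contacts as Binomial(degree, p) is purely an illustrative assumption, since a generalized propensity model could take other forms. All inputs are hypothetical fitted probabilities.

```python
from math import comb

def bernoulli_pmf(a, p):
    return p if a == 1 else 1.0 - p

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

def network_weight(a, a_s, degree, pi_obs, ps_obs, pi_pol, ps_pol):
    """Ratio of policy to observed probabilities of (A_i, A^s_i).

    pi_*: probability that A_i = 1; ps_*: per-contact exposure probability
    in the assumed Binomial(degree, ps) model for A^s_i.
    """
    numerator = bernoulli_pmf(a, pi_pol) * binomial_pmf(a_s, degree, ps_pol)
    denominator = bernoulli_pmf(a, pi_obs) * binomial_pmf(a_s, degree, ps_obs)
    return numerator / denominator

w = network_weight(a=1, a_s=0, degree=1, pi_obs=0.5, ps_obs=0.5, pi_pol=0.8, ps_pol=0.8)
```

When the policy matches the observed exposure mechanism, every weight equals one.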

Estimation of the numerator of the weights can be accomplished using a simulation approach as follows. Here, only policies where _{i} and _{j} are independent conditional on ^{s}), as opposed to specifying a policy in terms of both an individual’s exposure and their contacts summary measure, Pr * (^{s} = ^{s}│^{s}).^{24} The first step to the simulation approach entails generating _{i}, _{i} and
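The simulation approach can be sketched in three steps:

1. Draw many copies of the exposure vector from the policy, using independent Bernoulli draws. This is consistent with restricting attention to policies under which exposures are conditionally independent.
2. Recompute A^s for each copy.
3. Tabulate the resulting distribution.

For simplicity, this sketch tabulates empirical frequencies per unit; in practice, a model for Pr*(A^s | W^s) would more likely be fit to the stacked copies. Function and argument names are hypothetical.

```python
import random
from collections import Counter

def simulate_policy_summary(neighbors, policy_prob, n_copies=1000, seed=0):
    """Empirical distribution of A^s_i (number of exposed contacts) under a policy.

    neighbors[i] lists i's immediate contacts; policy_prob[i] is Pr*(A_i = 1).
    """
    rng = random.Random(seed)
    n = len(neighbors)
    counts = [Counter() for _ in range(n)]
    for _ in range(n_copies):
        # One copy of the exposure vector drawn independently from the policy
        a_star = [1 if rng.random() < policy_prob[i] else 0 for i in range(n)]
        for i in range(n):
            counts[i][sum(a_star[j] for j in neighbors[i])] += 1
    return [{k: c / n_copies for k, c in cnt.items()} for cnt in counts]

# Usage: unit 0 has two contacts; everyone has exposure probability 0.5
neighbors = [[1, 2], [0], [0]]
dist = simulate_policy_summary(neighbors, [0.5, 0.5, 0.5], n_copies=20000)
```

With independent draws and a common probability p, the distribution for a unit with d contacts should approach Binomial(d, p). For unit 0 here, that is (0.25, 0.5, 0.25).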

To target, the following logistic regression model is fit using weighted maximum likelihood,
_{i} = 1) is replaced with the rescaled _{i} in the case of a continuous outcome. As with IID-TMLE, the targeting step solves the efficient score equation. Similarly, regression with a clever covariate could be used instead, with these approaches being asymptotically equivalent.^{22} However, the weighted targeting approach may have better finite sample performance and be less computationally intensive in the network dependence setting compared to the clever covariate approach.^{23}

As in the IID setting, stochastic policies are evaluated using a Monte Carlo approach. First,
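For network data, the Monte Carlo step differs from the IID case: the summary measure must be recomputed from each drawn exposure vector before the targeted outcome model is evaluated. The sketch below makes three assumptions:

- a fitted targeted model `q_targeted(i, a, a_s)` is available and indexed by unit, so that unit i's own covariates and covariate summaries can be looked up;
- A^s is the sum of contacts' exposures;
- all names are illustrative.

```python
import random

def network_policy_mean(neighbors, policy_prob, q_targeted, n_copies=500, seed=0):
    """Monte Carlo estimate of the mean outcome under a stochastic policy."""
    rng = random.Random(seed)
    n = len(neighbors)
    total = 0.0
    for _ in range(n_copies):
        a_star = [1 if rng.random() < policy_prob[i] else 0 for i in range(n)]
        # Recompute the summary measure under the drawn exposures
        a_s_star = [sum(a_star[j] for j in neighbors[i]) for i in range(n)]
        total += sum(q_targeted(i, a_star[i], a_s_star[i]) for i in range(n)) / n
    return total / n_copies

# Usage: path network 0-1-2, all units exposed with probability 0.5,
# and a toy targeted model whose prediction grows with exposed contacts
neighbors = [[1], [0, 2], [1]]
psi_hat = network_policy_mean(neighbors, [0.5] * 3, lambda i, a, a_s: 0.1 * a_s, n_copies=20000)
```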

Estimating the variance of the network mean estimator is challenging. Closed-form and bootstrap variance estimators have been proposed,^{24} but these variance estimators require either that _{1},…,_{n} are IID or that the distribution of ^{24} In settings where network interference may be present, the IID assumption will be unrealistic and correctly modeling the distribution of ^{23} However, this estimator can be uninformative under some skewed degree distributions, which are common in social networks.^{42}

Due to the overly restrictive conditions on the variance estimators for the network mean, the ^{23}
^{24} However, latent dependence, where individuals close in the network are more likely to share unmeasured covariates compared to individuals far away, is likely in practice. Therefore, we also considered the following variance estimator,^{24} which allows for latent dependence limited up to second-order contacts (immediate contacts of
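The variance estimators considered here allow correlation between units that are close in the network. This hedged sketch assumes each estimator takes the form σ² = n^{-1} Σ_i Σ_j R_ij D_i D_j. Here D_i is a per-unit influence-curve-style term, and R_ij indicates that units i and j coincide or lie within a given number of steps of each other.

- Setting `order=2` mirrors the latent dependence up to second-order contacts described above.
- The exact pairs included by the direct-transmission-only estimator are an assumption in this sketch.
- With `order=0`, the sum reduces to the IID form.

Function names are hypothetical.

```python
def dependence_matrix(neighbors, order=1):
    """R[i][j] = 1 if i == j or j is within `order` steps of i in the network."""
    n = len(neighbors)
    r = [[0] * n for _ in range(n)]
    for i in range(n):
        reached = {i}
        frontier = {i}
        for _ in range(order):
            # Expand one step outward, skipping units already reached
            frontier = {k for j in frontier for k in neighbors[j]} - reached
            reached |= frontier
        for j in reached:
            r[i][j] = 1
    return r

def network_variance(d, r):
    """sigma^2 = (1/n) * sum_i sum_j R_ij * d_i * d_j."""
    n = len(d)
    return sum(r[i][j] * d[i] * d[j] for i in range(n) for j in range(n)) / n

neighbors = [[1], [0, 2], [1]]  # path network 0-1-2
d = [1.0, 1.0, 1.0]             # hypothetical per-unit terms
var_first = network_variance(d, dependence_matrix(neighbors, order=1))
var_second = network_variance(d, dependence_matrix(neighbors, order=2))
```

Unlike the IID sum of squares, a sum over dependent pairs is not guaranteed to be nonnegative in finite samples.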

Simulations were conducted to assess the performance of the network-TMLE methods described in the previous section across varying networks and data generating mechanisms. All simulations were repeated 3000 times. We considered two different policy types. For the first policy, all individuals in the network were assigned the same probability of exposure,

Seven different networks were used: uniform random graphs (^{43} The uniform random graphs followed uniform degree distributions with minimum degree 1 and maximum degree 6. The modified clustered power-law random graphs consisted of separately generated clustered power-law random subgraphs, with edges randomly generated between the random subgraphs. Each of the clustered power-law subgraphs was separately generated from a Barabasi-Albert random graph model with a fixed probability for closing triads between nodes.^{44} For each node, three connections were generated and the probability of triad closure was set to 0.75. The advantage of this approach is that the random graph takes on common characteristics of empirical networks, including a power-law degree distribution, a high clustering coefficient, and an underlying community structure. Lastly, the eX-FLU network was based on data from the eX-FLU cluster-randomized trial, a study to assess the efficacy of three-day self-isolation when symptomatic of respiratory illness on subsequent respiratory infection among university students.^{43} Over the ten-week study period, enrolled students reported face-to-face contacts each week. From the ten weeks of self-reported contacts, we generated a single static network and selected the largest connected component. Summary statistics for the networks are provided in

Different nuisance model specifications of network-TMLE were evaluated for each scenario: both nuisance models were correctly specified, misspecification of the outcome model, and misspecification of the exposure model (model specifications are provided in ^{s}, the summary measure(s) of covariates, along with excluding a covariate from ^{s}. As the correct specification of nuisance models will often be unknown in practice, we further evaluated a more flexible specification of nuisance models. This approach consisted of the following procedure. For a categorical covariate, 𝑉^{s}, included in ^{s}, the corresponding summary measure was defined as ^{s} is the only measure included in ^{s} and the possible values for 𝑉_{i} were {0,1,2}. Therefore, the summary measures for covariates would be ^{s} = (𝑉^{s}(^{s}(^{s}(^{45} This approach kept narrower bins at lower values of 𝑉^{s}(^{s}(
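Two steps of the flexible specification can be sketched: forming summary measures for a categorical covariate as the number of contacts in each category, and then binning those counts. The cut points are hypothetical, chosen only to illustrate narrower bins at lower counts; they are not the bins used in the study. Function names are also hypothetical.

```python
def category_counts(neighbors, v, categories):
    """For each unit i, V^s(k)_i = number of contacts j with V_j == k."""
    return [[sum(1 for j in nb if v[j] == k) for k in categories] for nb in neighbors]

def bin_value(x, upper_cuts):
    """Index of the first bin whose (inclusive) upper cut point is >= x."""
    for idx, cut in enumerate(upper_cuts):
        if x <= cut:
            return idx
    return len(upper_cuts)  # open-ended top bin

neighbors = [[1, 2], [0], [0]]
v = [0, 1, 2]                    # categorical covariate with values {0, 1, 2}
counts = category_counts(neighbors, v, categories=[0, 1, 2])
cuts = [0, 1, 3, 7]              # hypothetical: narrow bins at low counts
binned = [[bin_value(x, cuts) for x in row] for row in counts]
```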

Performance of network-TMLE may be improved when restricting inference to nodes below a pre-specified degree for skewed degree distributions.^{24} Therefore, restricting inference by degree was further compared for the clustered power-law random graphs and eX-FLU network. Nodes with degrees above a pre-defined maximum had their value for held as fixed and were considered as background
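Restricting inference by degree can be sketched as follows. The sketch assumes that high-degree units keep their observed exposure in every Monte Carlo draw, so they still contribute to their contacts' summary measures. These units are excluded from the set of units averaged over. The threshold and function names are illustrative.

```python
import random

def split_by_degree(neighbors, max_degree):
    """Units with degree <= max_degree are targets of inference; others are background."""
    target = [i for i, nb in enumerate(neighbors) if len(nb) <= max_degree]
    background = [i for i, nb in enumerate(neighbors) if len(nb) > max_degree]
    return target, background

def draw_with_background(a_obs, policy_prob, background, rng):
    """Draw exposures under the policy while holding background units at observed values."""
    fixed = set(background)
    return [a_obs[i] if i in fixed else int(rng.random() < policy_prob[i])
            for i in range(len(a_obs))]

# Usage: a star network where the hub (unit 0) has four contacts
neighbors = [[1, 2, 3, 4], [0], [0], [0], [0]]
target, background = split_by_degree(neighbors, max_degree=3)
a_star = draw_with_background([0] * 5, [1.0] * 5, background, random.Random(1))
```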

Four data generating mechanisms inspired by real-world scenarios were considered. Each data generating mechanism was selected to feature different possible exposure effects, including individual-specific (i.e., unit-treatment) effects and spillover effects from contacts. Below is a brief description of each of the data generating mechanisms with further details provided in

To simulate a no interference setting, a data generating mechanism based on a hypothetical study on statin initiation and subsequent atherosclerotic cardiovascular disease (ASCVD) was created. Statins are cholesterol-lowering drugs that have been shown to reduce cardiovascular disease risk by reducing cholesterol synthesis.^{46,47} The mechanism of action may reasonably allow researchers to believe that whether individual ^{48} and included age, low-density lipoprotein levels, and ASCVD risk score.

Let _{i} indicate age, _{i} indicate log-transformed low-density lipoprotein, _{i} indicate risk score, and _{i} = (_{i},_{i},_{i}). The conditional probability of taking a statin was specified by:

For spillover effect only, a data generating mechanism based on the effect of naloxone on subsequent opioid overdose deaths was created. Opioid overdose deaths have dramatically increased in recent years.^{49,50} Naloxone has been used as an emergency intervention to rapidly reverse opioid overdoses by blocking opioid receptors,^{51} and has been made increasingly available to the general population to prevent overdose deaths.^{52,53} Nasal spray formulations rely on another person for administration, with self-administration having occurred only in rare cases.^{54} Therefore, the prevention of opioid overdose deaths with naloxone is an example where the protective effect may operate solely via spillover effects. Confounders included gender, recent overdose, and recent release from prison, which have been observed as predictors of opioid overdose in previous studies.^{55,56} In the context of this mechanism, the interference pattern could be thought of as a co-injection network.

Let _{i} indicate gender, 𝑂_{i} indicate recent overdose, 𝑃_{i} indicate recent release from prison, and _{i} = (_{i},_{i},𝑃_{i},_{i}). The conditional probability of naloxone was generated according to:

For simultaneous unit-treatment and spillover effects, a data generating mechanism based on a comprehensive dietary intervention on body mass index (BMI) was created. Research has found BMI to be socially clustered,^{57,58} with the transmission of obesity theorized to result from social pressures or the shared environments of social contacts.^{58} Comprehensive dietary interventions that limit caloric intake and increase the quality of food may reduce BMI.^{59} Our simulation focused on a theoretical dietary intervention that impacts an individual’s BMI as well as their immediate friends’ BMI. Confounders included baseline BMI, gender, and baseline exercise. In this context, the interference pattern can be viewed as a network of friendships.

Let _{i} indicate gender, _{i} indicate baseline BMI, _{i} indicate exercise at baseline, _{i} indicate the unobserved variable (proximity to work), and _{i} = (_{i},_{i},_{i},_{i}). The conditional probability of starting the proposed diet at baseline was specified by:
_{i} ~ Normal(0, 1). Here, _{i} is related to _{i} and not _{i}, and is unobserved (i.e., not included in the network-TMLE outcome nuisance model). Additionally, _{i} was made to be assortative in the underlying network. Therefore, this data generating mechanism was expected to exhibit latent variable dependence.

The fourth simulation mechanism entailed a Susceptible-Infected-Recovered (SIR) model of human-to-human transmission of an infectious agent. The hypothetical vaccine followed a ‘leaky’ model, such that the vaccine reduced the probability of infection given a single exposure to an infectious agent.^{60} The spillover effect of the vaccine was composed of contagion (vaccinated individuals were less likely to become infected and thus less likely to transmit) and infectiousness effects (vaccinated-but-infected individuals had reduced probability of transmitting the disease).^{61}

The stochastic SIR model was implemented as follows. Of

Let 𝑉_{i} indicate asthma, 𝐻_{i} indicate hand hygiene, and _{i} = (𝑉_{i},_{i},_{i}). The probability of being vaccinated was specified by:
_{i,t}) by individual _{j,t} = 1 indicates whether _{i} is the indicator variable of ever infected by the end of follow-up.
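A stochastic network SIR model with a leaky vaccine can be sketched in discrete time. In this sketch, each infectious unit transmits to each susceptible contact independently at each time step. Two multiplicative adjustments reflect the contagion and infectiousness pathways:

- vaccination of the susceptible contact multiplies the transmission probability by (1 - ve_susceptibility);
- vaccination of the infectious unit multiplies it by (1 - ve_infectiousness).

Every parameter value, the fixed infectious period, and the function name are hypothetical rather than the study's settings.

```python
import random

def simulate_sir(neighbors, vaccinated, initial_infected, p_transmit=0.2,
                 ve_susceptibility=0.5, ve_infectiousness=0.3,
                 infectious_period=3, n_steps=30, seed=0):
    """Discrete-time stochastic SIR on a network with a leaky vaccine.

    Returns 1 for units ever infected by the end of follow-up, 0 otherwise.
    All default parameter values are hypothetical.
    """
    rng = random.Random(seed)
    n = len(neighbors)
    state = ["S"] * n
    infected_at = [None] * n
    for i in initial_infected:
        state[i], infected_at[i] = "I", 0
    for t in range(1, n_steps + 1):
        newly_infected = set()
        for i in range(n):
            if state[i] != "I":
                continue
            for j in neighbors[i]:
                if state[j] != "S" or j in newly_infected:
                    continue
                p = p_transmit
                if vaccinated[j]:
                    p *= 1.0 - ve_susceptibility  # leaky protection of the susceptible
                if vaccinated[i]:
                    p *= 1.0 - ve_infectiousness  # reduced infectiousness if vaccinated
                if rng.random() < p:
                    newly_infected.add(j)
        # Recovery after a fixed infectious period
        for i in range(n):
            if state[i] == "I" and t - infected_at[i] >= infectious_period:
                state[i] = "R"
        for j in newly_infected:
            state[j], infected_at[j] = "I", t
    return [0 if s == "S" else 1 for s in state]

# Usage: path network 0-1-2-3 with unit 0 initially infected and unit 2 vaccinated
neighbors = [[1], [0, 2], [1, 3], [2]]
ever_infected = simulate_sir(neighbors, vaccinated=[0, 0, 1, 0], initial_infected=[0])
```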

To assess the performance of network-TMLE, the following metrics were used: bias, empirical standard error (ESE), and 95% CI coverage. Bias was defined as the mean of
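The performance metrics can be computed from simulation output as sketched below. Each metric follows a stated convention:

- Bias, following the figure captions, is the mean of the estimate minus the replicate-specific true conditional sample mean.
- ESE is assumed to be the standard deviation of the point estimates across replicates, a common convention.
- Coverage is the proportion of replicates whose CI contains the truth.

The inputs and function name are hypothetical.

```python
import math

def performance(estimates, lower, upper, truths):
    """Bias, empirical standard error, and CI coverage across simulation replicates."""
    n = len(estimates)
    bias = sum(e - t for e, t in zip(estimates, truths)) / n
    mean_est = sum(estimates) / n
    ese = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (n - 1))
    coverage = sum(lo <= t <= hi for lo, hi, t in zip(lower, upper, truths)) / n
    return {"bias": bias, "ese": ese, "coverage": coverage}

# Usage with three hypothetical replicates
result = performance(estimates=[0.5, 0.6, 0.7],
                     lower=[0.4, 0.58, 0.6],
                     upper=[0.6, 0.7, 0.8],
                     truths=[0.55, 0.55, 0.55])
```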

All simulations were conducted using Python 3.6.6 with the following libraries: NumPy,^{62} SciPy,^{63} statsmodels,^{64} patsy,^{65} and NetworkX.^{66} Since no implementation of network-TMLE was available in Python, we developed one. MossSpider is freely available on the Python Package Index (PyPI) and GitHub (^{23} All simulation code is available at

For the hypothetical study of statins, the no-interference assumption holds, so network-TMLE is not necessary for estimation in this context, but it is still expected to correctly estimate the proportion under each policy. When both nuisance models were correctly specified, there was little bias for all networks (

Misspecification of either the exposure or the outcome model did not substantially alter the performance of network-TMLE. As expected, misspecification of the outcome model resulted in a larger ESE than when both models were correctly specified or when only the outcome model was correctly specified. The proposed flexible approach for modeling the ^{s} terms performed adequately, but policies where the probability of exposure was greater than 0.5 had the lowest CI coverage of the differing model specifications for the uniform and the power-law random graphs (

Under increased

For simulations of naloxone and opioid overdose (where there was a spillover effect only), network-TMLE had variable performance across the scenarios. For the uniform random graph (

For the eX-FLU network restricted by degree, network-TMLE had minimal bias across policies (^{s} was improved over the scenario where only the exposure model was correctly specified. Performance in terms of CI coverage was worse when the eX-FLU network was not restricted by degree, mainly for policies where nearly everyone would have been exposed (

Performance of network-TMLE for the clustered power-law random graph (

For simulations of a comprehensive dietary intervention on BMI with both unit-treatment and spillover effects, network-TMLE point estimates exhibited minimal bias and the corresponding CI coverage was approximately 95% in most scenarios. For the uniform random graph (^{s} had similar performance to other models but coverage was slightly decreased near the extremes of

The overall patterns with the varying model specifications were similar for the eX-FLU network when restricted by degree (^{s} resulted in decreased coverage, particularly for policies where the probability of exposure was set to be low. Restricting by degree reduced the ESE, particularly when the outcome model was misspecified, but did not substantially alter the CI coverage (

Network-TMLE’s performance across the power-law random graphs (^{s} performed adequately for most policies (

For the uniform network (

For the eX-FLU network restricted by degree, bias was present for some policies across all model specifications (

Here, TMLE for IID data with stochastic policies and an extension of TMLE to network-dependent data were reviewed. The performance of network-TMLE was assessed across a variety of network structures, combinations of unit-treatment and spillover effects, and nuisance model specifications. Finally, freely available software implementing network-TMLE in Python is provided, which may facilitate wider application.

Network-TMLE inference about the

To help identify how well a proposed policy is supported by the observed data, we propose the following diagnostic. For the chosen summary measure ^{s}, a bar chart or histogram of the observed values is plotted stratified by ^{s *} are plotted as done for ^{s}. Therefore, the observed distribution of ^{s} can be visually compared to the distribution of exposures under the proposed policy ^{s *}. An illustrative example of this diagnostic plot for both well-supported and poorly supported policies is provided in
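The numbers behind such a diagnostic plot can be sketched without a plotting library, in three steps:

1. Tabulate observed (A, A^s) pairs.
2. Tabulate expected counts per copy under the policy, by simulation.
3. Flag cells that carry policy mass but were never observed.

The threshold `min_share` and the function names are hypothetical. A bar chart of the two tables gives the visual comparison described above.

```python
import random
from collections import Counter

def observed_table(a, a_s):
    """Counts of observed (A_i, A^s_i) pairs."""
    return Counter(zip(a, a_s))

def policy_table(neighbors, policy_prob, n_copies=1000, seed=0):
    """Average count per copy of (A*_i, A^s*_i) pairs under the policy."""
    rng = random.Random(seed)
    n = len(neighbors)
    totals = Counter()
    for _ in range(n_copies):
        a_star = [int(rng.random() < policy_prob[i]) for i in range(n)]
        for i in range(n):
            totals[(a_star[i], sum(a_star[j] for j in neighbors[i]))] += 1
    return {cell: c / n_copies for cell, c in totals.items()}

def unsupported_cells(obs, pol, min_share=0.05):
    """Cells with at least min_share of the policy mass but no observed units."""
    n = sum(obs.values())
    return sorted(cell for cell, c in pol.items()
                  if c / n >= min_share and obs.get(cell, 0) == 0)

# Usage: nobody is exposed in the data, but the policy exposes everyone
neighbors = [[1], [0, 2], [1]]
obs = observed_table([0, 0, 0], [0, 0, 0])
pol = policy_table(neighbors, [1.0, 1.0, 1.0], n_copies=10)
flags = unsupported_cells(obs, pol)
```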

Limiting inference to policies ‘close’ to the observed data, or focusing on policies that more modestly perturb the exposure distribution, may also be more relevant from a practical perspective. In the absence of interference, commonly targeted estimands, like the average causal effect, contrast two extreme exposure distributions: everyone exposed versus no one exposed.^{33,67–69} Such extreme counterfactual exposure settings may be unrealistic or irrelevant in practice. For instance, when assessing the effect of smoking on some health outcome, the counterfactual scenario where all individuals smoke is likely unrealistic. Rather, there may be more interest in the effect of policies or interventions that modestly decrease the likelihood of smoking. As another example, consider policies to encourage influenza vaccination uptake. Previous approaches have resulted in only minor to moderate increases in vaccine receipt.^{70–73} Therefore, the counterfactual scenario of everyone in the population being vaccinated may be less relevant, in addition to being a scenario about which valid inferences are difficult to draw.

In practical application, nuisance models should be flexibly specified. Here, we demonstrated one option that binned ^{s}. To limit unnecessary increases in model dimensionality, the procedure was paired with an L2-penalized regression model. Overall, the flexible modeling approach performed well over a wide range of settings and was comparable in terms of bias and coverage to network-TMLE with both nuisance models correctly specified. As correct model specification is unlikely in practice, flexible specification of the nuisance models is recommended. Nonparametric or data-adaptive approaches could be used; for instance ^{74} Provided the nonparametric estimators of the nuisance models converge at a sufficiently fast rate and other conditions hold, the network-TMLE estimator will still be consistent and asymptotically normal.^{24} Additional empirical research is needed to study the finite-sample performance of network-TMLE when such data-adaptive approaches are used for nuisance model estimation.

In the vaccination and infection mechanism, the infection status of a unit depends on units beyond its immediate contacts. This more widespread dependence, a violation of weak dependence, likely caused some bias and reduced CI coverage, and likely explains the difference observed between the variance estimators. To address this violation, an extension of network-TMLE to longitudinal data is needed.^{22} A longitudinal extension could instead require that weak dependence hold only within each measured time interval,^{24} as opposed to over the entire duration of follow-up. Furthermore, a longitudinal extension would allow a summary measure of infectious immediate contacts within each time interval to be included in the nuisance models.

Future work could consider the following. Our simulation study assessed a flexible approach for including summary measures of covariates, ^{s}, in nuisance models. However, the summary measure ^{s} was assumed to be known. While reliable background information on ^{s} may be available in some settings, this will not always be the case. One method of avoiding specification of a particular summary measure for ^{s} in the exposure model is to factor ^{s} into ^{23} However, this approach is limited to scenarios where the degree distribution is near uniform. Instead, a similar categorization and binning approach could also be applied to ^{s},^{24} which would apply to both nuisance models and allow for non-uniform degree distributions. Here, two closed-form variance estimators for the ^{75,76} with a parametric bootstrap estimator for network-TMLE in the context of the network mean outperforming the closed-form variance estimator.^{24} Further comparison of bootstrap variance estimators in terms of their assumptions and performance remains of interest. Additional empirical evaluation of network-TMLE could be conducted for other estimands, such as marginal unit-treatment effects (i.e., direct effects).^{23} Generalization of network-TMLE to other related estimands (e.g., spillover effects, total effects) is also of interest. Finally, direct comparisons could be undertaken between network-TMLE and auto-g-computation, a recent extension of the parametric g-formula for general interference.^{77}

PNZ was supported by T32-HD091058 and T32-AI007001. MGH was supported by NIH grant R01-AI085073. AEA received funding from NIBIB R01-EB025021 and acknowledges NICHD T32-HD091058 and P2C HD050924. The eX-FLU study was funded by United States Centers for Disease Control and Prevention U01-CK000185 grant. The contents of this publication are solely the responsibility of the authors and do not represent the official views of the National Institutes of Health (NIH) or the United States Centers for Disease Control and Prevention.

We thank Elizabeth Ogburn for her feedback and discussion. We would further like to thank the University of North Carolina at Chapel Hill and the Research Computing group for providing computational resources that have contributed to these results. The Python implementation of network-TMLE is available as MossSpider on the Python Package Index and on GitHub (

All authors declare no conflicts of interest.

Targeted maximum likelihood estimation for statins and atherosclerotic heart disease, and the uniform random graph. The left y-axes and violin plots correspond to bias, defined as the estimated conditional sample mean minus the true conditional sample mean. The right y-axes and markers correspond to 95% confidence interval (CI) coverage. The red diamond corresponds to the direct-transmission-only variance estimator and the blue square corresponds to the latent-variable-dependence variance estimator. The first column corresponds to all individuals in the population having the same set probability of statins. The second column corresponds to the shift in log-odds of the predicted probability of statins for each individual. The proportion of statins in the observed data was 25%. A: Network-TMLE with both nuisance models correctly specified. B: Network-TMLE with the exposure model misspecified. C: Network-TMLE with the outcome model misspecified. D: Network-TMLE with a flexible specification of W^s.

Targeted maximum likelihood estimation for statins and atherosclerotic heart disease, and the eX-FLU network restricted by degree. The maximum degree for participants was restricted to be 22 or less. The left y-axes and violin plots correspond to bias, defined as the estimated conditional sample mean minus the true conditional sample mean. The right y-axes and markers correspond to 95% confidence interval (CI) coverage. The red diamond corresponds to the direct-transmission-only variance estimator and the blue square corresponds to the latent-variable-dependence variance estimator. The first column corresponds to all individuals in the population having the same set probability of statins. The second column corresponds to the shift in log-odds of the predicted probability of statins for each individual. The proportion of statins in the observed data was 24%. A: Network-TMLE with both nuisance models correctly specified. B: Network-TMLE with the exposure model misspecified. C: Network-TMLE with the outcome model misspecified. D: Network-TMLE with a flexible specification of W^s.

Targeted maximum likelihood estimation for naloxone and opioid overdose, and the uniform random graph. The left y-axes and violin plots correspond to bias, defined as the estimated conditional sample mean minus the true conditional sample mean. The right y-axes and markers correspond to 95% confidence interval (CI) coverage. The red diamond corresponds to the direct-transmission-only variance estimator and the blue square corresponds to the latent-variable-dependence variance estimator. The first column corresponds to all individuals in the population having the same set probability of naloxone. The second column corresponds to the shift in log-odds of the predicted probability of naloxone for each individual. The proportion of naloxone in the observed data was 35%. A: Network-TMLE with both nuisance models correctly specified. B: Network-TMLE with the exposure model misspecified. C: Network-TMLE with the outcome model misspecified. D: Network-TMLE with a flexible specification of W^s.

Targeted maximum likelihood estimation for naloxone and opioid overdose, and the eX-FLU network restricted by degree. The maximum degree for participants was restricted to be 22 or less. The left y-axes and violin plots correspond to bias, defined as the estimated conditional sample mean minus the true conditional sample mean. The right y-axes and markers correspond to 95% confidence interval (CI) coverage. The red diamond corresponds to the direct-transmission-only variance estimator and the blue square corresponds to the latent-variable-dependence variance estimator. The first column corresponds to all individuals in the population having the same set probability of naloxone. The second column corresponds to the shift in log-odds of the predicted probability of naloxone for each individual. The proportion of naloxone in the observed data was 34%. A: Network-TMLE with both nuisance models correctly specified. B: Network-TMLE with the exposure model misspecified. C: Network-TMLE with the outcome model misspecified. D: Network-TMLE with a flexible specification of W^s.

Targeted maximum likelihood estimation for diet and body mass index, and the uniform random graph. Left y-axes and violin plots correspond to bias, defined as the estimated conditional sample mean minus the true conditional sample mean. Right y-axes and markers correspond to 95% confidence interval (CI) coverage: the red diamond corresponds to the direct-transmission-only variance estimator and the blue square to the latent-variable-dependence variance estimator. The first column corresponds to a policy in which all individuals in the population have the same set probability of diet. The second column corresponds to a policy that shifts the log-odds of each individual's predicted probability of diet. The proportion on a diet in the observed data was 48%. A: Network-TMLE with both nuisance models correctly specified. B: Network-TMLE with the exposure model misspecified. C: Network-TMLE with the outcome model misspecified. D: Network-TMLE with a flexible specification of W^s.

Targeted maximum likelihood estimation for diet and body mass index, and the eX-FLU network restricted by degree. Participants' degree was restricted to at most 22. Left y-axes and violin plots correspond to bias, defined as the estimated conditional sample mean minus the true conditional sample mean. Right y-axes and markers correspond to 95% confidence interval (CI) coverage: the red diamond corresponds to the direct-transmission-only variance estimator and the blue square to the latent-variable-dependence variance estimator. The first column corresponds to a policy in which all individuals in the population have the same set probability of diet. The second column corresponds to a policy that shifts the log-odds of each individual's predicted probability of diet. The proportion on a diet in the observed data was 52%. A: Network-TMLE with both nuisance models correctly specified. B: Network-TMLE with the exposure model misspecified. C: Network-TMLE with the outcome model misspecified. D: Network-TMLE with a flexible specification of W^s.

Targeted maximum likelihood estimation for vaccination and infection, and the uniform random graph. Left y-axes and violin plots correspond to bias, defined as the estimated conditional sample mean minus the true conditional sample mean. Right y-axes and markers correspond to 95% confidence interval (CI) coverage: the red diamond corresponds to the direct-transmission-only variance estimator and the blue square to the latent-variable-dependence variance estimator. The first column corresponds to a policy in which all individuals in the population have the same set probability of vaccination. The second column corresponds to a policy that shifts the log-odds of each individual's predicted probability of vaccination. The proportion vaccinated in the observed data was 30%. A: Network-TMLE with both nuisance models correctly specified. B: Network-TMLE with the exposure model misspecified. C: Network-TMLE with the outcome model misspecified. D: Network-TMLE with a flexible specification of W^s.

Targeted maximum likelihood estimation for vaccination and infection, and the eX-FLU network restricted by degree. Participants' degree was restricted to at most 22. Left y-axes and violin plots correspond to bias, defined as the estimated conditional sample mean minus the true conditional sample mean. Right y-axes and markers correspond to 95% confidence interval (CI) coverage: the red diamond corresponds to the direct-transmission-only variance estimator and the blue square to the latent-variable-dependence variance estimator. The first column corresponds to a policy in which all individuals in the population have the same set probability of vaccination. The second column corresponds to a policy that shifts the log-odds of each individual's predicted probability of vaccination. The proportion vaccinated in the observed data was 35%. A: Network-TMLE with both nuisance models correctly specified. B: Network-TMLE with the exposure model misspecified. C: Network-TMLE with the outcome model misspecified. D: Network-TMLE with a flexible specification of W^s.

Proposed diagnostic plots for policies in a network. A: observed distribution of Âs by individual's A for 500 individuals. B: distribution of Âs under a well-supported policy. C: distribution of Âs under a poorly supported policy. Here, the network has a maximum degree of six. Since the policy in C has little to no support in the observed data, estimation under that policy should be avoided; otherwise, it should be recognized that the results depend heavily on extrapolation from the nuisance models.
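The support diagnostic described above can be sketched as a comparison of tabulated (A, A^s) pairs: count the pairs in the observed data, average their counts over exposures drawn under the policy, and flag pairs that the policy reaches but the data never (or rarely) contain. This is a minimal sketch under stated assumptions: the network is given as an adjacency list, A^s is assumed to be the number of exposed contacts (the summary measure used in the paper may differ), and all function names are hypothetical.

```python
import random
from collections import Counter


def summary_exposure(adjacency, a):
    # Assumed summary measure: A^s_i = number of i's contacts who are exposed.
    return [sum(a[j] for j in adjacency[i]) for i in range(len(adjacency))]


def observed_cells(adjacency, a_obs):
    # Counts of (A_i, A^s_i) pairs in the observed data (compare panel A).
    return Counter(zip(a_obs, summary_exposure(adjacency, a_obs)))


def policy_cells(adjacency, policy_probs, n_draws=200, seed=0):
    # Average counts of (A*_i, A^s*_i) pairs over exposure vectors drawn
    # under the policy probabilities (compare panels B and C).
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_draws):
        a_star = [1 if rng.random() < p else 0 for p in policy_probs]
        totals.update(zip(a_star, summary_exposure(adjacency, a_star)))
    return {cell: count / n_draws for cell, count in totals.items()}


def unsupported(observed, policy, min_count=1):
    # Pairs reached under the policy but seen fewer than min_count times in the data.
    return sorted(cell for cell in policy if observed.get(cell, 0) < min_count)
```

For a four-person path network in which nobody was exposed, a policy that exposes everyone yields only (A, A^s) pairs absent from the data, so `unsupported` flags all of them; estimates under such a policy would rest entirely on extrapolation from the nuisance models. Raising `min_count` gives a stricter notion of support than simple presence.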