Studies of social networks provide unique opportunities to assess the causal effects of interventions that may impact more of the population than just those intervened on directly. Such effects are sometimes called peer or spillover effects, and may exist in the presence of interference, i.e., when one individual’s treatment affects another individual’s outcome. Randomization-based inference (RI) methods provide a theoretical basis for causal inference in randomized studies, even in the presence of interference. In this article, we consider RI of the intervention effect in the eX-FLU trial, a randomized study designed to assess the effect of a social distancing intervention on influenza-like-illness transmission in a connected network of college students. The approach considered enables inference about the effect of the social distancing intervention on the per-contact probability of influenza-like-illness transmission in the observed network. The methods allow for interference between connected individuals and for heterogeneous treatment effects. The proposed methods are evaluated empirically via simulation studies, and then applied to data from the eX-FLU trial.

The novel coronavirus disease 2019 (COVID-19) can be spread through person-to-person contacts (

This question is not unique to the study of coronavirus. Public health interventions are often designed to disseminate benefits to more of the population than is treated directly (

As a motivating example, consider the eX-FLU trial (

In general, RDS samples will be non-representative of the target population and will exhibit dependence among individuals in the sample (

Many of the existing methods for causal inference in the presence of interference assume either that interference is restricted to disjoint subgroups (

The outline of the remainder of this paper is as follows.

The eX-FLU trial was designed to study the efficacy of a three-day social distancing intervention for prevention of ILI transmission in a network of students at a large university. Social distancing (isolation) aims to reduce contact between individuals in an effort to limit opportunities for disease transmission. The eX-FLU experiment followed the students over a ten-week period during an influenza season to track ILI transmission (

Consider an experiment such as eX-FLU, conducted on _{i} equal 1 if individual _{1}_{n}) represent the vector of observed intervention assignments. Let _{1}_{n}) given the randomization design. For example, eX-FLU employed a cluster randomized design, in which clusters (or groups) of students were defined by subdivisions of student residence halls, and each cluster of individuals was randomly assigned to the intervention or control condition. By design, exactly _{i} = _{j} for all individuals

Suppose the study participants were then followed for _{it} denote the observed outcome for individual _{t} = (_{1t}_{nt}) be the observed outcomes for all _{i0}. In eX-FLU, 25 individuals had ILI at baseline (13 cases in the intervention group).

The eX-FLU intervention was not intended to prevent ILI in the person assigned to the intervention; instead, this study investigated the spillover effect of social distancing on at-risk peers. Therefore, information on social contacts between study participants was also collected at each of the _{ijt} = 1, if either individuals _{ijt} = 0. Contacts are assumed to be symmetric (i.e., _{ijt} = _{jit} for all _{iit} = 0 for all _{t∈{0,…,τ}}
_{ijt} = 1 for individuals

Effects of the intervention can be represented by comparisons of potential outcomes (_{it}(_{it}(_{it}(_{it} is observed and the remaining become counterfactual.

Inference about intervention effects generally requires some assumptions about the structure of the interference (

Randomization-based tests permit inference about experiments based on the exact (i.e., finite sample) distribution of a given test statistic under a specific hypothesis about the intervention effect (_{0}, a test statistic _{0}. In addition to being a function of the randomization vector _{0} : _{it}(_{it}(_{0}, _{it}(_{it} for all _{it}(_{0}. Assuming values of the test statistic far from zero provide evidence against the null hypothesis, the two-sided p-value,

An RI test can assess the presence of an intervention effect without any structural assumptions about the effect. Different possible test statistics are considered below. For example, the test statistic may equal an estimator of the average intervention effect under some working model. However, the validity of the RI framework for hypothesis testing does not rely on correct specification of the working model; that is, type I error control is guaranteed under the null even if the working model is mis-specified (
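The logic of such a test can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for eX-FLU. The function names, the difference-in-means statistic, and the toy data are assumptions made for the example. Under the sharp null the observed outcomes are held fixed, the cluster-level assignment is redrawn from the design, and the p-value is the share of redrawn statistics at least as extreme as the observed one.

```python
import random

def cluster_assignments(clusters, n_treated, rng):
    """Assign exactly n_treated clusters to the intervention (hypothetical helper).

    clusters[i] is the cluster label of individual i; returns a 0/1 list per individual.
    """
    labels = sorted(set(clusters))
    treated = set(rng.sample(labels, n_treated))
    return [1 if c in treated else 0 for c in clusters]

def diff_in_means(z, y):
    """Illustrative statistic: mean outcome among treated minus mean among controls."""
    treated = [yi for zi, yi in zip(z, y) if zi == 1]
    control = [yi for zi, yi in zip(z, y) if zi == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def randomization_p_value(z_obs, y_obs, clusters, n_treated, stat, n_draws=2000, seed=0):
    """Two-sided Monte Carlo p-value under the sharp null of no intervention effect."""
    rng = random.Random(seed)
    t_obs = abs(stat(z_obs, y_obs))
    hits = 0
    for _ in range(n_draws):
        # Outcomes stay fixed under the sharp null; only the assignment is redrawn.
        z = cluster_assignments(clusters, n_treated, rng)
        if abs(stat(z, y_obs)) >= t_obs:
            hits += 1
    return hits / n_draws

# Toy data: four clusters of two individuals, two clusters treated.
clusters = [0, 0, 1, 1, 2, 2, 3, 3]
z_obs = [1, 1, 1, 1, 0, 0, 0, 0]
y_obs = [0, 0, 0, 0, 1, 1, 1, 1]
p = randomization_p_value(z_obs, y_obs, clusters, n_treated=2, stat=diff_in_means)
```

With only six possible assignments in this toy design, two of which give a statistic as extreme as the observed one, the Monte Carlo p-value is close to 1/3. Any statistic can be passed as `stat`, which reflects the point that validity does not depend on the working model behind the statistic.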

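For instance, the test statistic could be based on the maximum likelihood estimator (MLE) of the parameters of a logistic regression model. Let Y_{it} denote the outcome of individual i at time t, Z_{j} the intervention assignment of individual j, and C_{ij,t−1} the indicator of contact between individuals i and j at time t−1. One such working model is logit{E(Y_{it})} = β_{0} + β_{1}(Σ_{j} C_{ij,t−1}Z_{j})/(Σ_{j} C_{ij,t−1}), for each follow-up time t. The parameter β_{1} in this model quantifies the effect on the outcome at time t of the proportion of contacts at time t−1 who were assigned to the intervention. A second working model is logit{E(Y_{it})} = γ_{0} + γ_{1}(Σ_{j} C_{ij,t−1}Y_{j,t−1}Z_{j})/(Σ_{j} C_{ij,t−1}Y_{j,t−1}), where γ_{1} quantifies the effect of contacts with ILI being assigned to the intervention. These models can be used to calculate the Wald test statistics T_{1} and T_{2} for β_{1} and γ_{1}, respectively. Under the working logit model, β_{1} and γ_{1} are zero if there is no effect.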
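The covariates entering the two working models are simple functions of the contact network, the assignment vector, and the lagged outcomes. The sketch below shows one way they might be computed. The function name, the argument names, and the choice to return `None` for individuals with no qualifying contacts are assumptions for illustration; the source does not state how an empty denominator is handled.

```python
def treated_contact_fraction(contacts_prev, z, ili_prev=None):
    """Fraction of each individual's contacts at time t-1 assigned to the intervention.

    contacts_prev[i][j] is 1 if i and j were in contact at t-1 (symmetric, zero diagonal).
    z[j] is the 0/1 intervention assignment of individual j.
    If ili_prev is given, only contacts with ILI at t-1 are counted.
    Returns None where the denominator is zero (an assumption, see lead-in).
    """
    n = len(z)
    fractions = []
    for i in range(n):
        numerator = 0
        denominator = 0
        for j in range(n):
            weight = contacts_prev[i][j]
            if ili_prev is not None:
                weight *= ili_prev[j]
            numerator += weight * z[j]
            denominator += weight
        fractions.append(numerator / denominator if denominator > 0 else None)
    return fractions

# Toy network: three mutually connected individuals.
contacts_prev = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
z = [1, 0, 0]
ili_prev = [1, 0, 1]
all_contacts = treated_contact_fraction(contacts_prev, z)            # [0.0, 0.5, 0.5]
ili_contacts = treated_contact_fraction(contacts_prev, z, ili_prev)  # [0.0, 0.5, 1.0]
```

The resulting vectors would then be supplied as the single covariate in a logistic regression fit, whose Wald statistic serves as the test statistic in the randomization test.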

The test statistic could also be based on an estimator for the average difference in the number of ILI infections that could be attributed to individuals in the intervention and control groups. Let P_{it} = (Σ_{j} C_{ijt}Y_{jt})/(Σ_{j} C_{ijt}) be the proportion of person i's contacts at time t who have ILI, where C_{ijt} indicates contact between individuals i and j at time t, Y_{jt} is the outcome of individual j at time t, and Z_{j} is the intervention assignment of individual j. One such statistic is T_{3} = (Σ_{j,t} P_{jt}Z_{j})/(Σ_{j,t} Z_{j}) − {Σ_{j,t} P_{jt}(1−Z_{j})}/{Σ_{j,t}(1−Z_{j})}, where Σ_{j,t} denotes summation over individuals j and time points t. A second statistic restricts attention to individuals with ILI at the previous time point: T_{4} = (Σ_{j,t} P_{jt}Z_{j}Y_{j,t−1})/(Σ_{j,t} Z_{j}Y_{j,t−1}) − {Σ_{j,t} P_{jt}(1 − Z_{j})Y_{j,t−1}}/{Σ_{j,t}(1 − Z_{j})Y_{j,t−1}}. Since statistics T_{1} and T_{3} use information from all participants, regardless of ILI status, these statistics may capture transmission from asymptomatic or unreported cases. However, asymptomatic individuals in the intervention group were not encouraged to socially distance themselves, suggesting T_{1} and T_{3} may have diminished power. On the other hand, statistics T_{2} and T_{4} utilize information from ILI cases to measure the effect of the intervention in contacts of individuals known to have ILI.
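A small sketch may help fix ideas for the two difference-based statistics. All names here are hypothetical. The code assumes that sums run over follow-up times t ≥ 1, that an individual with no contacts contributes a proportion of zero, and that each arm has at least one contributing individual. None of these conventions is confirmed by the source.

```python
def ili_contact_fraction(contacts_t, y_t):
    """For each individual, the proportion of contacts at time t who have ILI at time t.

    Individuals with no contacts get 0.0 (an assumption made for this sketch).
    """
    fractions = []
    for row in contacts_t:
        n_contacts = sum(row)
        n_ill = sum(c * y for c, y in zip(row, y_t))
        fractions.append(n_ill / n_contacts if n_contacts else 0.0)
    return fractions

def difference_statistic(contacts, y, z, index_cases_only=False):
    """Intervention-minus-control difference in average ILI proportion among contacts.

    contacts[t][i][j] and y[t][i] are indexed by time t = 0, 1, ...; z[i] is 0/1.
    With index_cases_only=True, individual j at time t is weighted by y[t-1][j],
    so only individuals with ILI at the previous time point contribute.
    """
    numerator = [0.0, 0.0]    # position 0: control arm, position 1: intervention arm
    denominator = [0.0, 0.0]
    for t in range(1, len(y)):
        fractions = ili_contact_fraction(contacts[t], y[t])
        for j, zj in enumerate(z):
            weight = y[t - 1][j] if index_cases_only else 1
            numerator[zj] += weight * fractions[j]
            denominator[zj] += weight
    return numerator[1] / denominator[1] - numerator[0] / denominator[0]

# Toy data: two treated and two control individuals, paired within arm, two time points.
pairs = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
contacts = [pairs, pairs]
y = [[1, 0, 1, 0], [1, 0, 1, 1]]
z = [1, 1, 0, 0]
t3 = difference_statistic(contacts, y, z)                         # -0.5
t4 = difference_statistic(contacts, y, z, index_cases_only=True)  # -1.0
```

In the toy data the treated index case infects nobody while the control index case infects its contact, so the statistic restricted to index cases is further from zero than the statistic using all participants.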

For longitudinal studies, the intervention effect can be assessed by considering all available time points simultaneously, as in the test statistics above. Alternatively, a pairwise analysis may be conducted that separately investigates each pair of consecutive time points. A pairwise approach may be preferred if the intervention effect changes over time. While the test statistics defined above incorporate information from all

Randomization tests of the sharp null hypothesis provide information about the presence of an intervention effect, but do not provide information about the magnitude of such an effect. Point estimates and confidence intervals (CIs) of the intervention effect may also be desired. In the absence of interference, a common approach to constructing CIs entails inverting randomization tests. This approach typically relies on a constant treatment effect assumption (_{i}(0) and _{i}(1), then the additive treatment effect assumption supposes _{i}(0) = _{i}(1)+

Unlike the deterministic potential outcome model in _{it}(

The stochastic potential outcome model is parameterized by ILI transmission probabilities from sources inside and outside of the observed network. Let _{1} denote the per-contact probability of ILI transmission from an individual assigned to the social distancing intervention to another individual. Similarly, define _{0} as the per-contact transmission probability from an individual not assigned to the intervention. Additionally, let _{0}, _{1}, _{0}, _{1}, ^{3}}.

The transmission probability model can be tailored to the study design and subject matter knowledge. For instance, the model can incorporate dynamics specific to transmission of the outcome under study, such as pathogen-specific latency or infectious periods. Since ILI is caused by multiple viruses rather than a single pathogen, the transmission probability model used in the eX-FLU analysis assumes that individuals remain susceptible to ILI even after previous infections. Therefore, individuals with ILI at week

The per-contact probability of transmission depends on multiple factors, including the social contact network, ILI status, and intervention assignment. Let _{ijt} be the (unobserved) indicator of whether person _{ijt} = 1) = _{ijt}(_{ijt}(_{j,t−1}(_{ij,t−1}{_{j}_{1} +(1−_{j})_{0}}. Note that if individual _{it} be the (unobserved) indicator that person _{it} = 1) =

Only one individual needs to successfully transmit ILI to individual _{it}(_{i1t}_{int}_{it}}. Assume _{i1t},…, _{int} and _{it} are mutually independent given the outcomes at the previous time point, such that
_{t} ⫫ {_{t−2}_{1}}|_{t−1} for all
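Under the independence assumption, an individual escapes infection at a given time point only if every potential source fails to transmit. The infection probability is therefore one minus a product of escape probabilities. The sketch below simulates one time step on that basis. The names `p0`, `p1`, and `p_ext` (per-contact transmission probability from control and intervention sources, and the probability of infection from outside the network) are placeholders chosen for the example, not the notation of the source.

```python
import random

def infection_probabilities(y_prev, contacts_prev, z, p0, p1, p_ext):
    """Probability that each individual has ILI at time t, given the state at t-1.

    A contact j can transmit only if j had ILI at t-1 and was in contact with i at t-1.
    The per-contact probability is p1 if j was assigned to the intervention, else p0.
    """
    n = len(z)
    probabilities = []
    for i in range(n):
        escape = 1.0 - p_ext  # no infection from outside the observed network
        for j in range(n):
            if y_prev[j] and contacts_prev[i][j]:
                escape *= 1.0 - (p1 if z[j] else p0)
        probabilities.append(1.0 - escape)
    return probabilities

def simulate_step(y_prev, contacts_prev, z, p0, p1, p_ext, rng):
    """Draw the ILI indicators at time t as independent Bernoulli variables."""
    probs = infection_probabilities(y_prev, contacts_prev, z, p0, p1, p_ext)
    return [1 if rng.random() < p else 0 for p in probs]

# Toy chain network 0 - 1 - 2; individual 0 has ILI and is assigned to the intervention.
contacts_prev = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
y_prev = [1, 0, 0]
z = [1, 0, 0]
probs = infection_probabilities(y_prev, contacts_prev, z, p0=0.4, p1=0.1, p_ext=0.05)
y_next = simulate_step(y_prev, contacts_prev, z, 0.4, 0.1, 0.05, random.Random(0))
```

Here individual 1 faces an infection probability of 1 − 0.95 × 0.9 = 0.145, while individuals 0 and 2 face only the outside risk of 0.05. Individuals with ILI at the previous time point are treated as still susceptible, in line with the assumption stated for the eX-FLU analysis.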

Confidence regions for _{0} : _{2} denotes the Euclidean norm. Since this _{0} is not a sharp hypothesis due to the randomness of the potential outcomes, the sampling distribution of the test statistic can no longer be constructed through enumeration for each _{0} : _{5} is evaluated. Then, the p-value

The intervention effect can be defined by a contrast of transmission probabilities; the per-contact risk difference _{θ} = _{1} − _{0} is the focus below. A point estimate for _{θ} based on the MLE for _{θ} is _{θ}. Determining the endpoints of this CI may be computationally challenging in practice. A computationally efficient stochastic search procedure for approximating the CI endpoints is described in the Appendix.
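The projection from a confidence region to an interval for the risk difference can be illustrated with a plain grid search, which is simpler but slower than the stochastic search described in the Appendix. In the sketch below, `p_value` stands for any routine returning the randomization p-value for a candidate pair of transmission probabilities. The stand-in `toy_p_value` is entirely artificial and is used only so that the example runs. All names are assumptions.

```python
import math

def risk_difference_interval(candidates, p_value, alpha=0.05):
    """Range of p1 - p0 over candidate values that the test does not reject.

    candidates is an iterable of (p0, p1) pairs; p_value(p0, p1) returns a p-value.
    Returns None if every candidate is rejected at level alpha.
    """
    kept = [(p0, p1) for p0, p1 in candidates if p_value(p0, p1) >= alpha]
    if not kept:
        return None
    differences = [p1 - p0 for p0, p1 in kept]
    return min(differences), max(differences)

def toy_p_value(p0, p1):
    """Artificial stand-in: p-values decay with distance from (0.30, 0.15)."""
    return max(0.0, 1.0 - 10.0 * math.hypot(p0 - 0.30, p1 - 0.15))

# Grid of candidate values with spacing 0.05 on the unit square.
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
interval = risk_difference_interval(grid, toy_p_value)
```

With this stand-in, the retained candidates form a 3-by-3 block of grid points around (0.30, 0.15), and the resulting interval for the risk difference is approximately (−0.25, −0.05). In a real analysis each call to `p_value` would itself require simulation, which is why a more economical search over candidate values is attractive.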

The RI inferential methods presented above were evaluated via simulations designed to emulate the eX-FLU trial. Baseline networks were simulated with two different network models: an exponential random graph model (ERGM), and a scale-free model that allows for highly connected individuals or super-spreaders (

To emulate the cluster randomization design of eX-FLU, each of _{i0}, was assigned such that 25 of the _{it} for _{it}(_{0}_{1}, and

R code for these simulations is available at

The test statistics from _{θ} for values of

Power results are shown in _{2} and _{4} were more powerful than _{1} and _{3}, demonstrating that contacts without ILI at the previous time period were not informative about the intervention effect. Test statistics _{2} and _{1} also tended to be more powerful than _{4} and _{3}, respectively, suggesting that utilizing a working logistic model may be preferable for assessing intervention effects in settings such as eX-FLU. All four statistics demonstrate type 1 error control, as is guaranteed by RI.

On the other hand, type 1 error control would not be expected in this setting if instead standard methods were employed that assume the observations are independently and identically distributed. To illustrate, the scale-free model simulations described above were repeated under the null hypothesis _{θ} = 0. For each simulated data set, whether to reject the null was determined by naively assuming the Wald statistics _{1} and _{2} follow standard Normal distributions. The results in

For the transmission probability model in _{θ} under various combinations of the true data generating parameter

Empirical coverage of the CI for _{θ} and bias of the estimator _{5} to reject the hypothesis _{0} : _{θ} = 0 was also evaluated. Results are presented in _{5} had moderate power for _{θ}| and controlled type 1 error, although power decreased as

In this section, data from the eX-FLU study are used to illustrate how the methods presented above can be used to draw inference about the effect of the social distancing intervention in the observed network of students. Baseline ILI information was collected in the initial week of the study, after randomization but before the intervention could plausibly spill over to others. Over the ten-week study period (

Different contact types may experience different intervention effects. For instance, since the eX-FLU intervention asks students with ILI to isolate in their dorm room, the effect of the intervention on roommates and non-roommates may differ. The analysis presented here considers the non-roommate and classmate (NC) network, defined by all self-reported or classmate contacts that were not between roommates. The NC network included

Since the primary results from the eX-FLU trial have not yet been published, the ILI outcome is not currently available for analysis. Therefore, ILI outcomes were simulated according to the transmission probability model in

_{1}_{2}_{3}, and _{4} along with the corresponding sampling distributions and p-values for each test based on 1000 randomly sampled _{2} and _{4} indicate strong evidence for an intervention effect (_{1} and _{3} also suggest a possible intervention effect, although these p-values are larger. These results are consistent with the empirical finding in _{2} and _{4} tend to be more powerful than _{1} and _{3}. Fitting the transmission probability model from _{θ} = −0.15. These results demonstrate that the proposed methods can be used to detect and accurately quantify intervention effects in trials such as eX-FLU.

Experiments conducted on social networks create an opportunity to study spillover effects. In such settings, RI methods are valid even in the presence of interference and non-random sampling from a target population. In this article, RI methods were developed for analysis of the eX-FLU trial. Randomization tests were used to assess the null hypothesis of no intervention effect, and a stochastic potential outcomes transmission probability model was proposed to construct point estimates and confidence intervals of the magnitude of the intervention effect. The proposed methods allow for interference between individuals and do not assume constant treatment effects. While motivated by the eX-FLU trial, the transmission model may be tailored to other settings based on existing subject matter knowledge, such as information about latency periods or immunity for the disease pathogen of interest.

There are several other possible analyses of the eX-FLU trial. The study investigators collected additional contact information on a subset of participants via the iEpi smartphone application. The iEpi app uses Bluetooth location information to detect potential interactions between substudy participants. The app then sends prompts to the participants to collect information about the context surrounding the interaction. The contact information provided by iEpi is expected to be more accurate than the weekly self-reported contact questionnaires used in the above analysis. Future work may consider using iEpi contact information to improve inference about the eX-FLU intervention effect. Additionally, the transmission probability model considered here does not incorporate individual susceptibility characteristics, such as receipt of influenza vaccine or hand hygiene habits. Future research could incorporate individual-level covariates that may affect infection susceptibility. One such approach could entail randomization tests of residuals based on regression models of the outcome on baseline covariates under the null hypothesis of no treatment effect (

Randomization-based inference is appealing in that hypothesis testing is exact, i.e., type I error control is guaranteed. On the other hand, adequate statistical power is not guaranteed, and thus the choice of test statistic may be consequential. A test with lower power can fail to detect intervention effects when present, and if inverted can lead to large, uninformative confidence regions. Therefore it is important to select powerful test statistics in practice; as in

Likewise, the form of the stochastic potential outcome model assumed can also have a substantial impact on inferences drawn about a particular data set. Therefore in practice it is important to consider assessment of model fit and robustness of results to the assumed model. For certain test statistics, a randomization test can be viewed as simultaneously assessing the plausibility of both the null parameter value and the assumed model given the observed data (

As any model is an over-simplification to some extent, in practice considering multiple models may provide greater insight than inferences based on a single model (

Increasingly, public health interventions are being designed to impact more of the population than is intervened on directly. The continued development of causal methods for inference about spillover effects of interventions within networks is therefore important for understanding the public health and policy implications of interventions.

The authors thank the Editor, Associate Editor, an anonymous reviewer, and the Causal Inference with Interference working group in the Biostatistics Department at UNC-Chapel Hill (Bryan Blette, Kayla Kilpatrick, Taylor Krajewski, Sam Rosin, Bonnie Shook-Sa, Jaffer Zaidi) for their helpful suggestions. Dr. Aiello acknowledges funding by the Centers for Disease Control and Prevention (U01CK000185), National Institutes of Health (R01 EB025021), and National Science Foundation (NSF 20–4354). This research was also supported by NIH grant R01 AI085073 and by a Gillings Innovation Laboratory award from the UNC Gillings School of Global Public Health. The content is solely the responsibility of the authors and does not represent the official views of the National Institutes of Health.

SUPPORTING INFORMATION

The data that support the findings in this paper are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

The following procedure is used to conduct a stochastic search for the minimum and maximum _{θ} values in

The first step involves finding the values of

In the second step, monotone splines are used to estimate the upper and lower bounds of _{U}(_{U} as the unique value of _{U}(_{U}) = _{L}(_{L} be the unique value of _{L}(_{L}) = _{θ} is then (_{L}_{U}).

This procedure for estimating the CI endpoints is illustrated using a single data set in

Graphical representations of the eX-FLU randomization design. Each node represents one individual, and an edge between nodes indicates reported contact between individuals in at least one week of the study. Darker shades of each color represent cohort participants in the intervention group, and lighter shades represent participants in the control group. Figure (a) shows a random sample of 30% of the network edges in a layout where proximity between nodes is based on relative geographic location of participant residence halls, which were used as clusters for randomization of the intervention. Figure (b) shows 100% of network edges arranged in a layout that bases node proximity on frequency of reported contact using the igraph R package (

Empirical power for test statistics _{1}_{2}_{3}, and _{4} under the scale-free model with _{0} and _{1}. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.

Reported ILI in each of three weeks of the eX-FLU NC network. Nodes are colored red if the participant reported ILI during the specified week of the study period. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.

Sampling distribution and RI p-value (_{1}_{2}_{3}, and _{4} based on the NC network in the eX-FLU trial. The observed test statistic is indicated by the vertical dashed line.