This study estimates the overall effect of two influenza vaccination programs administered consecutively in a cluster-randomized trial in western Senegal over two influenza seasons from 2009 to 2011. We apply novel methodology that combines social contact data with infection data to reduce the bias in estimation that arises from contamination between clusters. Our time-varying estimates reveal a reduction in seasonal influenza from the intervention and a nonsignificant increase in H1N1 pandemic influenza. We estimate an additive change in overall cumulative incidence (which was 6.13% in the control arm) of −0.68 percentage points during Year 1 of the study (95% CI: −2.53, 1.18). When H1N1 pandemic infections were excluded from analysis, the estimated change was −1.45 percentage points and was significant (95% CI: −2.81, −0.08). Because cross-cluster contamination was low (0–3% of contacts for most villages), an estimator assuming no contamination was only slightly attenuated (−0.65 percentage points). These findings are encouraging for studies carefully designed to minimize spillover. Further work is needed to estimate contamination, and its effect on estimation, in a variety of settings.

Influenza is a seasonal respiratory infection that causes a substantial global burden of morbidity and mortality, particularly among children. One meta-analysis estimated that in 2018 the global burden of influenza among children under 5 was 109.5 million influenza episodes, 870,000 hospital admissions for influenza virus-associated acute lower respiratory infection, and between 13,200 and 97,200 deaths (

The study that produced the data analyzed in this paper was a cluster-randomized trial of 20 villages in the Niakhar Demographic Surveillance System (DSS) zone. Villages were assigned to vaccination of children with either inactivated trivalent influenza vaccine or an inactivated polio vaccine as an active control. There is no national recommendation for routine influenza vaccination in Senegal, hence off-study vaccination was expected to be minimal. The trivalent influenza vaccine has been shown to be efficacious in reducing influenza infection in children in other settings (

The total and overall effects are of scientific interest because of the presence of interference in infectious disease processes. Interference, in which one person’s treatment can affect another’s outcome, is both a boon to disease prevention and a classic inferential problem in infectious disease research. The benefit: the very nature of the process induces dependence between people’s outcomes, and treating one person may prevent another’s infection. The drawback: observations are no longer independent, and most mainstream causal inference tools cannot account for the induced dependence. The main approach to dealing with interference is to use cluster-randomized trials (CRTs), which allow for dependence within clusters. The assumption of no interference that would be made in a traditional individually-randomized controlled trial is thus weakened to

Typical methods for estimating the overall effect assume partial interference (e.g.,

This paper continues as follows. In

The data were collected in a cluster-randomized clinical trial conducted in the Niakhar Demographic Surveillance System (DSS) zone from 2009–2011. Among thirty villages in the Niakhar DSS zone, twenty were selected as clusters for inclusion in the trial and randomized in a 1:1 ratio to receive a blinded vaccination campaign of either inactivated trivalent influenza vaccine (TIV) or inactivated poliovirus vaccine (IPV) as an active control. From here on, villages that received TIV will be referred to as “treated” and those that received IPV as “control”. The same villages were followed for two influenza seasons (2009–2010 and 2010–2011). Different formulations of trivalent influenza vaccine were given during the two years; the second formulation included the H1N1 2009 “swine” pandemic strain of influenza, but the first formulation did not. A map of the twenty villages analyzed is included in

Within each treatment group the goal was to vaccinate up to 5,000 children 6 months to 10 years of age in the following approximate numbers per age-group: 1,270 children 6–35 months of age; 2,835 children 36 months to 8 years of age; and 895 children 9–10 years of age. Vaccinees received age-specific doses. In villages assigned to receive influenza vaccine, 3,906 (78.1% of target number for vaccination) were vaccinated with Dose 1, while 3,843 (76.9% of the target) of those in control villages were vaccinated with IPV. These numbers comprised 66.6% and 66.2% of age-eligible children, respectively.

The primary outcome of the study was laboratory-confirmed symptomatic influenza infection. A combination of active and passive surveillance was used for the primary outcome in the Niakhar DSS zone. In this geographic area, residences are organized in compounds, clusters of dwellings typically housing an extended family. For the twenty villages randomized in the study, field workers visited compounds on a weekly basis to inquire about the occurrence of influenza symptoms. If the person had experienced influenza-like illness in the past 7 days, then the field worker consented them into the surveillance study and documented symptoms and epidemiologic data. Influenza-like illness was defined as follows: (1) among children under 2 years of age, the sudden onset of fever (^{◦} C axillary) or subjective (parent-reported) feverishness, plus at least 1 other symptom (cough, sore throat, nasal congestion, rhinorrhea, or difficulty breathing), and (2) among individuals 2 years and older, the sudden onset of fever (^{◦}C axillary) or subjective (parent- or participant-reported) feverishness, plus either a cough or sore throat. Cases of influenza-like illness were reported to the study center, and nasal and throat swab specimens were collected. In addition, individuals seeking medical care at any of the three Niakhar DSS health posts at any time throughout the year were assessed by health post medical staff or a study physician to determine if the person had influenza-like illness. These individuals were consented into the surveillance study, their symptoms were documented, and nasal and throat swab specimens were obtained for influenza testing.

When individuals with influenza-like illness enrolled into the surveillance study, they also responded to a survey about their travel and social contact patterns during the prior three days. The contact survey defined a “contact” as a conversation occurring between two people in the same location. The contact survey collected numbers of contacts in various locations at two time points (AM and PM) for three consecutive days: the survey day and the two prior days. Numbers of contacts recorded on the survey day are subject to truncation bias because most surveys were administered in the morning and exclude contacts occurring after the time of the survey. Contact patterns for asymptomatic participants are included in the data since some participants’ symptoms began on the day of or the day before the survey. For each day, the respondent provided the number of people she contacted in her own compound in the morning and the afternoon/evening. In addition, she indicated yes or no to whether she had visited each of a list of locations: another compound (up to five could be identified in the survey), a market, mosque or church, field, school, sports field or public place, outside the study zone, or another location. For each location visited, the village identification code (and compound identification number, where applicable), the time of day visited (AM, PM, or both), and the number of persons the respondent spoke with during the visit were recorded. For additional details, refer to the example survey form in the

Village of residence was recorded during quarterly censuses conducted by the Niakhar DSS (

In this paper, we consider two estimators for the overall effect of influenza vaccination relative to polio vaccination. The first estimator assumes partial interference (i.e., no contamination), and we refer to it as the

To account for contamination, we use the method developed in _{j} is a binary treatment indicator for cluster _{j} is the total percentage of contacts of susceptibles in cluster _{j} is a cluster-level variable, but the model is an individual-level model, with individuals in the same cluster taking the same value for _{j}.

The coefficient of interest in the additive hazards model—corresponding to the treatment variable—is potentially time-varying. For this reason, we report both that coefficient (visually) and the difference in cumulative hazard of influenza due to the treatment. Because the cumulative hazard is low, this is approximately equal to the difference in cumulative incidence due to treatment. The time-varying coefficients are visualized by displaying the value of their integrals,

The estimand of interest, which we will denote

This additive hazards model for interference has a natural correspondence to a compartmental epidemic model such as an SIR model (Susceptible-Infectious-Recovered; see, e.g.,

Although the hazards are permitted to be time-varying, the model assumes identical hazards for different individuals (with the same attributes) at a given time point. As such, the model does not take into account the differences in individual hazards due to different numbers of infections among neighbors at that time point. Survival analysis models applied in influenza vaccine trials typically make this assumption (

While Cox regression is frequently used for survival analysis, the Cox proportional hazards model does not share this natural correspondence to epidemic compartmental models. Another advantage that the additive hazards model has over the proportional hazards model is collapsibility, which implies that the treatment effect is the causal effect of interest whether or not covariates are included in the model. A drawback of the additive instead of proportional hazards model is that the estimated hazard, or the lower limit of its confidence interval, is not mathematically restricted to be nonnegative. However, we did not observe negative hazard estimates or negative lower bounds for the confidence interval in the models that we fit.
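To make the additive hazards approach more concrete, the following is a minimal sketch of Aalen's least-squares estimator for cumulative regression coefficients. It is an illustration under stated assumptions, not the software used in this study: the function name `aalen_cumulative_coefficients`, the toy data, and the use of a single covariate are all hypothetical. With an intercept and one binary covariate, the cumulative treatment coefficient reduces to the difference between the Nelson-Aalen cumulative hazards of the two groups, and nothing in the fit forces the implied hazards to be nonnegative.

```python
import numpy as np

def aalen_cumulative_coefficients(time, event, covariates):
    """Aalen least-squares estimate of cumulative coefficients B(t).

    time:       follow-up time for each person (event or censoring)
    event:      1 if infected at `time`, 0 if censored
    covariates: one row (or scalar) per person, e.g. treatment exposure
    Returns the distinct event times and, for each, the cumulative
    intercept and covariate coefficients up to that time.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    design = np.column_stack([np.ones(len(time)), np.asarray(covariates, dtype=float)])

    event_times = np.unique(time[event])
    cumulative = np.zeros(design.shape[1])
    path = np.zeros((len(event_times), design.shape[1]))

    for k, t in enumerate(event_times):
        at_risk = time >= t                      # still under observation at t
        d_n = (event & (time == t))[at_risk].astype(float)
        # Increment dB(t) = (X'X)^{-1} X' dN(t) over the risk set.
        # If the risk set loses a group, lstsq returns a minimum-norm
        # solution that should not be interpreted.
        increment, *_ = np.linalg.lstsq(design[at_risk], d_n, rcond=None)
        cumulative = cumulative + increment      # increments may be negative
        path[k] = cumulative
    return event_times, path

# Hypothetical toy data: eight people, alternating control (0) and treated (1).
toy_time = [1, 2, 3, 4, 5, 6, 7, 8]
toy_event = [1, 0, 1, 1, 0, 1, 0, 0]
toy_treated = [0, 1, 0, 1, 0, 1, 0, 1]
toy_times, toy_path = aalen_cumulative_coefficients(toy_time, toy_event, toy_treated)
```

The second column of `toy_path` traces the integrated treatment coefficient over time, which is the kind of quantity that a time-varying effect plot displays.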

Analyses were performed separately for Year 1 and Year 2 of the study. Inputs to the additive hazards model are the time to event (or censoring) for each person, infection status, and the percentage of contacts to treated clusters. Calculation of time-to-event for each survey year is described in detail in the

The treatment exposure value for village _{j}, is the proportion of contacts that susceptible people in village

We analyzed a single contact survey per participant and restricted the analysis to 3,758 contact surveys submitted between August 1, 2009 and February 1, 2010 because this subset had been previously cleaned and analyzed extensively (_{j}, and we will denote our estimator for it by

Our treatment exposure estimates take into account the percentage of contacts reported while the respondent was visiting treated villages (

For each village, we calculate the percentage of contacts reported while respondents from that village were located in treated villages. The denominator is the sum of contacts reported by village residents; the numerator is the sum of those contacts whose reported location was a treated village. Contacts reported to villages that are not in the trial are included in the denominator and are treated the same as contacts to control villages. The numerator includes contacts reported in the respondent’s own compound if the respondent was a resident of a treated village. For participants who moved mid-study, the village of residence is the reported village of residence at the time of the contact survey.
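The numerator and denominator described above can be sketched in a few lines. This is an illustration only: the function name `percent_contacts_in_treated`, the tuple layout of a report, and the village labels are hypothetical, and the sketch ignores missing data and the visitor adjustment.

```python
from collections import defaultdict

def percent_contacts_in_treated(reports, treated_villages):
    """Percentage of each village's reported contacts located in treated villages.

    reports: iterable of (home_village, contact_village, n_contacts), where
             contacts in the respondent's own compound use the home village
             as the contact village.
    treated_villages: set of villages assigned to the influenza vaccine.
    """
    total = defaultdict(int)       # denominator: all contacts reported by residents
    in_treated = defaultdict(int)  # numerator: contacts located in treated villages
    for home, location, n_contacts in reports:
        total[home] += n_contacts
        # Control villages and villages outside the trial both count
        # toward the denominator only.
        if location in treated_villages:
            in_treated[home] += n_contacts
    return {v: 100.0 * in_treated[v] / total[v] for v in total if total[v] > 0}

# Hypothetical example: village "A" is treated, "B" is control, "Z" is outside the trial.
example_reports = [
    ("A", "A", 8),  # own-compound contacts of a treated-village resident
    ("A", "B", 2),
    ("B", "B", 6),
    ("B", "A", 1),
    ("B", "Z", 3),
]
example_percentages = percent_contacts_in_treated(example_reports, {"A"})
```

In this toy example, residents of the treated village report 8 of 10 contacts in treated villages, and residents of the control village report 1 of 10.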

We initially calculated treatment exposure rates using reports by asymptomatic people only, assuming that this would be more representative of behavior when uninfected and that the symptomatic people would travel less. We compared these to the estimates based on reports by symptomatic people and (counterintuitively) found that symptomatic reports included slightly higher rates of contacts to clusters of the opposite treatment assignment (

The above approach assumes that the location of a contact reported by the respondent indicates the residence of the person contacted. As such it does not account for visitors to one’s compound from a cluster of the opposite treatment assignment, so may underestimate cross-cluster exposure. To incorporate exposure from visitors into the estimate, we will define some notation and first consider the estimates for people living in control clusters. Suppose there are _{j} people living in cluster _{i} denote the number of contacts reported by person _{i} denote the number of contacts person

We need to update the numerator to include contacts occurring within the respondent’s own compound to visitors from other clusters. We can use estimates reported by these visitors, rather than by respondents in cluster _{T,j} denote the total number of contacts reported by people in any treated cluster during their visits to compounds in cluster

The rationale for this adjustment is explained in detail in

An analogous update is needed for residents of treated clusters. For these respondents we need to account for visits from members of control clusters. Letting _{C,j} denote the total number of contacts reported by people in any control cluster during their visits to compounds in cluster

The submitted contact surveys had a large number of missing fields, which, if not modelled appropriately, could create bias in the estimates of cross-cluster exposure. For locations visited outside the home two days before the survey, 24% are missing time of day, 59% are missing the number of people contacted, and 32% do not have a village number recorded. Missingness was slightly lower for the day before the survey (55% missing the number of people contacted and 29% missing the village number), but contamination estimates were similar, generally ranging from 0–3% per village. We used data from two days before the survey because of the higher number of reports from asymptomatic people at that time point (24.6% vs. 1.4%). The survey design elicited at-home contacts differently from those that occurred outside the home: the numbers contacted at home in the morning and in the afternoon/evening were recorded, so village and time point were not collected as separate variables. Furthermore, in 60% of analyzed surveys, the number contacted at home in the morning was missing.

We used multiple imputation, expanding on the procedure used in another analysis of this data set (

We created twenty imputed data sets, calculated percentages of contacts to treated clusters for each village in each of these imputed data sets, and combined the percentages using standard rules for combining multiply imputed data (
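As a sketch of the combining step, and assuming that the standard rules referred to here are Rubin's rules, the pooled estimate is the mean of the per-imputation estimates and the total variance adds the within-imputation variance to an inflated between-imputation variance. The function name `pool_rubin` and the example numbers are hypothetical and are not taken from the study data.

```python
import statistics

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules.

    estimates: one point estimate per imputed data set (m of them)
    variances: the corresponding within-imputation variances
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)        # average of the m estimates
    within = statistics.fmean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)    # sample variance across imputations
    total = within + (1 + 1 / m) * between      # Rubin's total variance
    return pooled, total

# Hypothetical percentages of contacts to treated clusters from three imputations.
pooled_estimate, total_variance = pool_rubin([2.0, 3.0, 4.0], [1.0, 1.0, 1.0])
```

If only point estimates are needed, as for a village-level percentage, the pooled value is simply the average across the imputed data sets.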

Our estimated time-varying treatment effects (both unadjusted and contamination-adjusted) are displayed in

The overall incidence rates are displayed in

Our two estimators and confidence intervals are similar, but the no-contamination estimators are slightly attenuated because they assume no mixing between clusters of opposite treatment assignments. The confidence intervals for the contamination-adjusted estimator are slightly wider, reflecting the loss of information caused by contamination, but again, are similar. For Year 1, neither effect is statistically significant when all infections are included, but both narrowly achieve significance when A/H1N1pdm09 infections are excluded. The Year 2 estimates are statistically significant. The two Year 2 estimates are interpreted differently because they cover different time intervals; a larger difference in cumulative incidence is expected for the longer interval if vaccine performance stays the same. While bias from the strike starting January 1, 2011 does not affect the Year 2 estimate censored at that date, the uncensored one could be biased. The rates of reporting infections during health post visits (as opposed to household visits) were 12.5% in treated villages and 17% in control villages, so the vaccine effect could be overestimated by including a time interval with only health post visits. Because the rates are similar, and because the strike lasted 49 days of a 320-day follow-up period, the bias is likely low.

We also perform a simulation study to demonstrate the potential impact of using the contamination-adjusted estimator in settings with higher rates of contamination across communities. The simulation study is similar to the one conducted in _{0} of 2.4 in an unvaccinated population. Each village in each simulation has a small random perturbation added to the infectiousness parameter to induce stochastic variation across clusters and simulations. Individuals have an average of 16.5 face-to-face contacts per day because a previous analysis of the network data collected in this study found a lower bound of 16.5 face-to-face contacts per day for asymptomatic individuals and 15 for symptomatic individuals (

The outcome measure is the cumulative incidence of influenza after 60 days. The contamination-adjusted estimator attempts to recover the difference in cumulative incidence that would be observed between the two arms of the study if there were truly no contamination across clusters. The “true value” for this estimand is found via simulation, based on 2000 replications of the epidemic process in a population with two fully distinct clusters. This gave a reduction in cumulative incidence of 8.3 percentage points in the treated cluster relative to the control. For simulations with interacting clusters, we perform 250 replications of the epidemic process and compute both the contamination-adjusted estimator and the no-contamination estimator (which wrongly assumes there was no contamination between clusters).
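The attenuation that contamination induces in a naive comparison of arms can be illustrated with a much simpler model than the one used in the simulation study. The sketch below is a deterministic two-cluster SIR model, not the stochastic simulation described here. Only the basic reproduction number of 2.4 and the 60-day horizon come from the text; the 3-day infectious period, the 30% immune fraction in the treated cluster, the seed fraction, and all function and parameter names are hypothetical assumptions, so the output is not expected to reproduce the 8.3 percentage point value.

```python
def cumulative_incidence(contamination, r0=2.4, infectious_days=3.0,
                         immune_fraction_treated=0.3, days=60,
                         dt=0.05, seed_fraction=0.001):
    """Cumulative incidence after `days` in a (treated, control) cluster pair.

    contamination: fraction of each person's contacts made with the other cluster.
    All state variables are fractions of the cluster population.
    """
    gamma = 1.0 / infectious_days          # recovery rate
    beta = r0 * gamma                      # transmission rate
    # Index 0 is the treated cluster, index 1 is the control cluster.
    susceptible = [1.0 - immune_fraction_treated - seed_fraction, 1.0 - seed_fraction]
    infectious = [seed_fraction, seed_fraction]
    cumulative = [seed_fraction, seed_fraction]

    for _ in range(int(days / dt)):        # forward Euler steps
        force = [
            beta * ((1 - contamination) * infectious[k]
                    + contamination * infectious[1 - k])
            for k in (0, 1)
        ]
        for k in (0, 1):
            new_infections = force[k] * susceptible[k] * dt
            susceptible[k] -= new_infections
            infectious[k] += new_infections - gamma * infectious[k] * dt
            cumulative[k] += new_infections
    return cumulative[0], cumulative[1]

def naive_difference(contamination):
    """Treated minus control cumulative incidence, ignoring contamination."""
    treated, control = cumulative_incidence(contamination)
    return treated - control

isolated = naive_difference(0.0)   # clusters fully distinct
mixed = naive_difference(0.3)      # 30% of contacts cross clusters
```

Under these assumptions the difference with mixing is closer to zero than the difference with fully distinct clusters, which is the attenuation that the contamination-adjusted estimator is designed to undo.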

The simulation study results in

We have applied novel statistical methodology to estimate the overall effect of a trivalent influenza vaccine program in Niakhar, Senegal. This method incorporates social contact data together with treatment and infection data to reduce the bias in this estimate caused by interference between clusters. Ours is the first study we know of applying this novel method to contact and infection data collected jointly in a clinical trial setting. We produce the first estimates of contact rates between clusters of opposite treatment assignments for this trial and the first, to our knowledge, in Senegal. Our results provide insight into the extent to which the standard assumption of partial interference is violated in a trial of this structure and of the impact of this violation on estimates.

Our time-varying effect estimates show that in Year 1 of the study, the treatment program (vaccination of children) reduced laboratory-confirmed symptomatic infection with seasonal influenza in the community. The treatment program was also associated with a small, statistically nonsignificant increase in infections with A/H1N1pdm09 influenza. While other studies have found evidence for this relationship (

The extent of contamination measured in our data resulted in little difference between the cumulative incidence for the estimator adjusting for contamination and the one assuming no contamination. The latter was smaller because, as has been found in other studies, contacts to members of clusters of the opposite assignment attenuate the estimate of the overall effect from what it would have been with no contamination (

The level of contamination in the data was fairly small: the percent of contacts to clusters of the opposite treatment was between 0% and 3% for most villages, although there were some outliers, with 14% being the largest observed value. To our knowledge, these are the first data-based contamination estimates of this type for Senegal. Our finding that this amount of contamination has a negligible impact on the effect estimate may be encouraging for researchers who carefully define cluster selection to minimize contamination, as was done in this study. As contact and travel patterns can vary substantially between cultures and contexts, our estimates may not generalize to other geographic areas, so further measurement of contamination is recommended.

Our study has several limitations. First, convenience sampling was used in collecting contact and travel data. Instead of random sampling, participants with ILI were surveyed during household surveillance visits, and their responses were used to estimate the percentage of contacts that susceptible individuals made to treated clusters. Information on contact patterns prior to symptom onset suggests that contact patterns while symptomatic and while asymptomatic do not differ substantially. However, in future surveys, random sampling of susceptible individuals is recommended to ensure a representative sample.

Second, the extent of missing data in the contact survey is substantial. As noted previously, for locations visited outside the home two days before the survey, 24% are missing time of day, 59% are missing the number of people contacted, and 32% do not have a village number recorded. We used multiple imputation to adjust for missing data. Simulations have shown that multiple imputation can yield unbiased results even when the proportion of missing data is as high as 90%, as long as the imputation model is correctly specified and the data are Missing At Random (MAR) (

Third, and also related to the contact survey, contacts were reported separately for morning and afternoon time intervals without recording the extent of overlap. Because morning and afternoon contamination estimates were similar, either is likely a reasonable approximation to the percentage of contacts to clusters of the opposite assignment during a full day. However, it would be preferable to record numbers of contacts throughout the entire day in future studies. We also note that contacts recorded on the day of the survey were excluded from the analysis because most surveys were conducted in the morning, which truncates the contacts reported for that day. A diary-based approach would avoid this problem; if interviews are conducted instead, they should focus on days before the survey day. The literacy level of the population of interest should be considered in choosing the optimal approach to collect contact data.

Finally, the type of contact recorded in our study emphasizes transmission via large droplets (in close proximity) rather than via aerosol droplets, which have a longer range. While many studies have investigated the importance of fomite transmission, physical contacts, small droplets, and aerosol droplets for transmission, their relative importance is not well understood (

We also recommend collection and estimation of cross-cluster contamination for different types of contacts (e.g., physical contacts, sexual contacts), for various definitions of clusters in various settings. These estimates can be used to inform future trial designs, choose whether the method we have applied would be better than one which does not adjust for contamination, and ultimately improve the accuracy of vaccine effectiveness and standard error estimates.

We are grateful to all the families who participated in this trial and to the full research teams at Institut de Recherche pour le Développement and Institut Pasteur de Dakar in Senegal. We appreciate helpful comments from two anonymous reviewers on this manuscript. We would like to acknowledge funding received from NIH/NIAID R01 AI085073 (PI Michael Hudgens), NIH/NIAID R37 AI032042 (PI M. Elizabeth Halloran), NIH/NIGMS U01-GM070749 (PI M. Elizabeth Halloran), and 1U01IP000174 (CoAg between CDC and PATH, PI John Victor) funded by National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention.

Map of the twenty villages included as clusters in the influenza vaccine trial.

Panel A shows the estimated effects of the influenza vaccination program for Year 1 (July 2009 - May 2010) of the study. Shading shows 95% confidence intervals. Panel B shows incidence of influenza infections by time and type. Panel C shows the estimated effects of the influenza vaccination program during Year 1 on symptomatic infection with seasonal influenza strains (A/H3N2 or B).

Panel A shows the estimated effects of the influenza vaccination program for Year 2 (July 2010 - May 2011) of the study. Panel B shows incidence of influenza infections by time and type.

Panel A shows the value of the contamination-adjusted (red solid line) and no-contamination (blue dashed line) estimators and associated 95% confidence intervals across values of cross-community contamination. The horizontal line shows the true value of the estimand. Because of the substantial overlap in confidence intervals, the lines are shifted slightly for visibility, but contamination rates were at 5% intervals. Panel B shows the root mean squared error of the estimator (with respect to the true difference in cumulative incidence in the absence of contamination of −0.083).

Percentages of contacts with residents of treated clusters based on (1) contacts reported while located in treated clusters, (2) contacts in the respondent’s own compound to visitors from clusters of the opposite treatment assignment, and (3) total percentages of contacts to residents of treated clusters (treatment exposure).

Village | Treatment Assignment | Percent reported in treated clusters | Percent from visitors | Treatment exposure _{j} (%) |
---|---|---|---|---|
Kalome Ndofane | Vaccine | 100 | 0 | 100 |
Ngayokheme | Vaccine | 99 | 0 | 99 |
Ndokh | Vaccine | 99 | 1 | 99 |
Ngangarlame | Vaccine | 99 | 0 | 99 |
Diohine | Vaccine | 99 | 0 | 98 |
Mokane Ngouye | Vaccine | 99 | 1 | 98 |
Nghonine | Vaccine | 98 | 2 | 96 |
Logdir | Vaccine | 95 | 2 | 93 |
Darou | Vaccine | 96 | 5 | 90 |
Poudaye | Vaccine | 93 | 2 | 90 |
Ngalagne Kop | Control | 0 | 0 | 0 |
Mboyene | Control | 0 | 0 | 0 |
Poultok Diohine | Control | 0 | 0 | 0 |
Bary Ndondol | Control | 0 | 1 | 1 |
Toucar | Control | 1 | 0 | 1 |
Gadiak | Control | 2 | 0 | 2 |
Godel | Control | 2 | 0 | 2 |
Khassous | Control | 3 | 0 | 3 |
Kothiok | Control | 3 | 0 | 3 |
Meme | Control | 14 | 0 | 14 |

Incidence of influenza by treatment group and study year.

Study Year | Treated | Control | All |
---|---|---|---|
Year 1, all infections | 999/18200 (5.49%) | 1076/17550 (6.13%) | 2075/35750 (5.80%) |
Year 1, excluding A/H1N1pdm09 | 630/18200 (3.46%) | 833/17550 (4.75%) | 1463/35750 (4.09%) |
Year 2, all infections | 224/18547 (1.21%) | 341/17815 (1.91%) | 565/36362 (1.55%) |

Estimated difference in cumulative incidence of influenza (measured in percentage points) due to the influenza vaccination program.

Study Year | Contamination-adjusted estimate | Contamination-adjusted 95% CI | No-contamination estimate | No-contamination 95% CI |
---|---|---|---|---|
Year 1, all infections | −0.68 | [−2.53, 1.18] | −0.65 | [−2.40, 1.09] |
Year 1, excluding A/H1N1pdm09 | −1.45 | [−2.81, −0.08] | −1.35 | [−2.64, −0.06] |
Year 2 (July - Dec 2010) | −0.59 | [−1.01, −0.17] | −0.59 | [−0.99, −0.19] |
Year 2 (July 2010 - May 2011) | −0.73 | [−1.16, −0.31] | −0.73 | [−1.14, −0.32] |