Maps of the distribution of medically-important ticks throughout the US remain lacking in spatial and temporal resolution in many areas, leading to holes in our understanding of where and when people are at risk of tick encounters, an important baseline for informing public health response. In this work, we demonstrate the use of Bayesian Experimental Design (BED) in planning spatiotemporal surveillance of disease vectors. We frame survey planning as an optimization problem with the objective of identifying a calendar of sampling locations that maximizes the expected information regarding some goal. Here we consider the goals of understanding associations between environmental factors and tick presence and minimizing uncertainty in high risk areas. We illustrate our proposed BED workflow using an ongoing tick surveillance study in South Carolina parks. Following a model comparison study based on two years of initial data, several techniques for finding optimal surveys were compared to random sampling. Two optimization algorithms found surveys better than all replications of random sampling, while a space-filling heuristic performed favorably as well. Further, optimal surveys of just 20 visits were more effective than repeating the schedule of 111 visits used in 2021. We conclude that BED shows promise as a flexible and rigorous means of survey design for vector control, and could help alleviate pressure on local agencies by limiting the resources necessary for accurate information on arthropod distributions. We have made the code for our BED workflow publicly available on Zenodo to help promote the application of these methods to future surveillance efforts.

Tickborne diseases have tripled in the last two decades and now make up more than 75% of reported vector-borne infections in the United States (

In addition to learning from existing data, a further use of tick distribution models is informing future surveillance and control efforts by anticipating the value of future sampling locations. For example, more fine-grained sampling might follow an initial surveillance effort focused on a subset of areas of potentially high risk (

Thanks to computational advances in recent decades, Bayesian inference has become popular for model fitting in ecology and epidemiology (

Implementing Bayesian Experimental Design (BED) involves three general steps (

In this work, we outline principles for how BED can be incorporated in spatiotemporal surveys to maximize the value of vector surveillance and control efforts, and we illustrate their use for an ongoing tick surveillance effort of South Carolina state parks and other public lands. We compare the ability of different search techniques to find survey schedules that maximize utility based on two design criteria tailored to different priorities of vector surveillance. In addition to informing future data collection efforts, we demonstrate how high utility designs can be further analyzed to provide novel insight into sources of uncertainty in tick distributions.

Data originated from an existing state-wide tick surveillance project in South Carolina city and state parks, beginning in March 2020. The project also included submissions from South Carolina animal shelters and citizen scientists, though these data were not used in this study. Here we used data collected in state and city parks in 2020 and 2021, with observations in 30 distinct parks spanning 26 counties from March to December. Sampling was relatively opportunistic; more visits took place between March and August than in later months, and 4 parks were visited at least 10 times over the two years while 3 parks were visited just once. In total, 59 distinct visits occurred in 2020, and 111 visits occurred in 2021.

A scientific collecting permit from the SC Department of Parks, Recreation, & Tourism was secured for both years, and written permission was obtained from the appropriate municipal government for city parks. The coordinates for each site were selected near the entrance to each park in a forested area for consistency. Tick collections were performed following recommended CDC Ixodidae surveillance guidelines (^{2} muslin cloth baited with 1.5-lb dry ice each were placed in parks along hiking and nature trails and left in the park for 1.5–2 h. Additionally, tick drags were performed along hiking and nature trails. Tick drags were constructed with a 1.22-m × 1.52-m white duck canvas attached to a 1.22-m wooden dowel, with zinc washers as weights on the bottom. Each collection visit consisted of ten tick traps and a 30-min timed tick drag (sixty 30-second segments to regularly check for ticks) to ensure that the recommended surface area of at least 750 m^{2} for host-seeking ticks was surveyed (

Ticks were processed at the Laboratory of Vector-Borne and Zoonotic Diseases at the University of South Carolina, where they were identified to species, sex, and life stage. Morphological identifications were conducted with multiple dichotomous keys (

Several meteorological and geographic variables were selected as potential covariates of tick occurrence based on tick ecology and previous modeling studies (

Complete mathematical definitions for the model specification and experimental design procedure are given in the

To find a model parsimonious with the collections data, 28 candidate models were constructed by simplifying different components of the full model. Each of the environmental, spatial, and temporal model components was considered either shared or different between tick groups, and linear or spline-based functions were considered for the environmental effects. Models were compared based on the Deviance Information Criterion (DIC), which measures a model’s goodness-of-fit to the data and robustness, while penalizing model complexity. The best performing model was used in all subsequent analyses.
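As an illustration of the comparison criterion, the following minimal sketch computes DIC for a plain logistic regression from posterior draws of its coefficients. It assumes such draws are available as an array; the function names (`bernoulli_deviance`, `dic`) and the toy data are hypothetical, and the sketch does not reproduce the mixed-effects models or fitting software used in this study.

```python
import numpy as np

def bernoulli_deviance(y, p):
    """Deviance (-2 * log-likelihood) of binary outcomes y under probabilities p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def dic(y, X, beta_draws):
    """Return (DIC, p_D), where DIC = mean deviance + p_D and
    p_D = mean deviance - deviance at the posterior mean."""
    def probs(beta):
        return 1.0 / (1.0 + np.exp(-(X @ beta)))
    mean_dev = np.mean([bernoulli_deviance(y, probs(b)) for b in beta_draws])
    dev_at_mean = bernoulli_deviance(y, probs(beta_draws.mean(axis=0)))
    p_d = mean_dev - dev_at_mean  # effective number of parameters
    return mean_dev + p_d, p_d

# Toy usage (assumed data): 40 visits, an intercept plus one covariate,
# and stand-in "posterior" draws centered near the simulating values.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0])))))
beta_draws = np.array([-0.5, 1.0]) + 0.2 * rng.normal(size=(500, 2))
dic_value, p_d = dic(y, X, beta_draws)
```

Under this definition a lower DIC is preferred, and the p_D term grows as the posterior spreads over more effective parameters.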

Considering the limited capacity of vector control agencies and researchers, a reasonable surveillance strategy should consider the feasibility and convenience of sampling sites while allowing sufficient diversity to realistically be able to extrapolate to the region of interest. To maintain this balance, we restricted future sampling to a set of 57 sites on public land across South Carolina, of which the 30 sites visited in the initial data were a subset. These candidate sites included all 47 South Carolina state parks and historic sites, 6 locations within national parks and wildlife refuges, and 4 other locations present in the initial data. Collection visits were delineated monthly and could take place in any month, resulting in a space of 684 possible visits (i.e. month-location pairs) which may be added to a candidate survey.
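The size of this design space can be checked with a short sketch. The site labels below are placeholders rather than the actual park names, and `random_survey` is a hypothetical helper that draws a survey of distinct visits uniformly at random.

```python
from itertools import product
import random

N_SITES = 57             # candidate sites on public land
MONTHS = range(1, 13)    # visits are delineated monthly

# Placeholder site labels (assumed); the real list would name each park.
site_ids = [f"site_{i:02d}" for i in range(1, N_SITES + 1)]

# Every possible visit is a (month, site) pair.
candidate_visits = list(product(MONTHS, site_ids))
n_candidates = len(candidate_visits)  # 57 sites x 12 months = 684

def random_survey(n_visits, seed=None):
    """Draw a survey of n_visits distinct visits uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(candidate_visits, n_visits)

survey = random_survey(20, seed=0)
```

A candidate survey is then any subset of these 684 pairs, so even small surveys leave far too many possibilities to enumerate exhaustively.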

In BED, potential outcomes (i.e. presence/absence of ticks) resulting from a proposed design are assigned a

Equipped with a predictive model and utility function, the space of possible designs can be searched to optimize utility (

In the model comparison study, the model with lowest DIC included linear environmental effects shared between species, and both spatial and temporal effects separate for each species, although several alternative models performed nearly as well (

To more concretely visualize the output of our experimental design pipeline, panels C and D of

To further gain a sense of what makes certain surveys better than others, the environmental conditions of each possible visit were then projected using Factor Analysis for Mixed Data (FAMD), a dimension reduction technique for continuous and categorical data (

Accurate information regarding the time and place of probable tick encounters is an essential first step to reducing the burden of tick-borne pathogens. Statistical modeling allows extrapolating available information to a wider scale, which in turn enables local vector control agencies to better direct critical resources. However, the reliability of such model predictions is critically dependent on the nature of available data. Combining a Bayesian workflow and design of experiments is a principled approach to getting more out of data from existing surveillance efforts, and directing future efforts for the greatest effect. Thanks to advances in software and computing throughout the last decade, optimal Bayesian survey design is feasible to implement for a diverse array of researchers throughout epidemiology.

Our results for the application of scheduling monthly tick surveillance in public natural areas demonstrate that large gains in information are possible through carefully chosen surveys. Even when restricting sampling to a limited number of locations, efficient survey design can make the difference for learning critical information and improving reliability of tick distribution maps. Although implementing an optimal surveillance design may require additional planning prior to sampling, the higher quality of information provided by such designs means that ultimately fewer visits are required to reach a certain level of confidence. Optimal sampling can therefore serve to reduce the overall resources necessary for effective surveillance.

Successful designs can also inform general practices for surveillance. For example, for the first design criterion based on environmental covariates, the success of a space-filling strategy shows that spreading future visits across time and space is more valuable than other intuitive options such as focusing sampling on specific months which were previously under-sampled. Because similar environmental conditions will tend to be clustered in time and space, spreading visits in this way will tend to spread design points evenly across covariate space as well, which is theoretically optimal with respect to the d-optimality criterion for simple logistic regression (

While we have emphasized finding efficient designs using optimization, a secondary takeaway from our analysis is that random sampling generally outperforms some common forms of convenience sampling, such as repeated sampling or sampling based on current uncertainty. This reinforces random sampling as a gold standard catch-all technique that is independent of the design criterion (

Analysis of the initial survey data during model comparison provides insight into the current tick patterns in natural areas throughout the southeastern US, while also demonstrating further data are needed. The top performing models all included a temporal trend for each tick species (

The best suited models all included a term for spatial variability for each tick species, which has previously been deemed important for modeling

Our prediction map of expected probability of tick presence in South Carolina generally agrees with previously published results, although data at a similar spatial and temporal scale are limited. For

The application of BED for vector surveillance used in this work focused on establishing tick presence in public natural areas, although we note that the framework used here can be applied to other metrics such as abundance with minimal changes. While measuring tick presence or abundance in outdoor recreational areas such as state parks is a widely used method for establishing human exposure risk (

The BED procedure illustrated here suggests several avenues for future statistical and computational development. First, additional work is needed to better understand optimal designs for the types of mixed-effects models used in this and other studies of species distributions, as research combining BED and mixed-effects models is scarce (

In this work, we have outlined Bayesian Experimental Design as a formal approach to the surveillance of disease vectors. Compared to classical methods of experimental design, a Bayesian framework provides a natural way to incorporate initial survey data, while rigorously accounting for remaining uncertainty in model predictions. We applied a BED workflow to an ongoing tick surveillance study in South Carolina state parks, and found that surveys optimized to satisfy specific goals were universally more efficient than simple random sampling. These results demonstrate the promise of optimal survey design for researchers and vector control agencies to maximize the impact of the data they collect.

The authors would like to thank Drs. Banky Olatosi and Xiaoming Li for their programmatic support of BC during the T35 summer training program at the University of South Carolina.

BC received financial support from National Institutes of Health T35 grant 5T35AI165252–02 and 1P20 GM125498–01 Centers of Biomedical Research Excellence Award. This work was also partially supported by the Centers for Disease Control and Prevention Epidemiology and Laboratory Capacity for Prevention and Control of Emerging Infectious Diseases NU50CK000542-04-00.

CRediT authorship contribution statement

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:

All code and data necessary to reproduce our analyses are available on Zenodo, doi:

Implementing Bayesian Experimental Design in spatiotemporal surveillance. A motivating example with a single environmental covariate is shown, with the goal of establishing environmental factors associated with tick presence. Top: a small design space consisting of four possible survey locations (e.g. parks) and four timepoints (e.g. months). A surveillance schedule is a collection of visits (time-location pairs) and amounts to arranging points in design space. The changing values of the environmental covariate _{init} and associated _{init} are shown in covariate space, two candidate models are compared, and a posterior distribution for the selected model is fit. In step 2, the utility of some candidate design is defined as an average over future outcomes and the amount of new information that would be provided by each outcome. Here a Bayesian d-optimality criterion is used, which scores outcomes based on the volume of confidence ellipsoids approximating the updated posterior distribution. In step 3, finding an effective design is treated as an optimization problem over the space of candidate designs. A generic stepwise procedure is shown, and the best design found after 100 iterations is then examined. In accordance with BED theory, this design spread additional points throughout the middle of covariate space, while putting special attention at the extreme

Posterior environmental and temporal effects from the initial survey. Results are shown for the best-performing model fit to the initial survey data from 2020 to 2021. All results are in log-odds scale. (A) Marginal posterior means for each species intercept and coefficient for the environmental variables are shown as points, while 50% and 95% Highest Posterior Density Intervals are shown as purple bars. (B) Mean temporal trend for each tick group is given by dashed lines, along with full marginal posterior densities for each month/tick group.

Spatiotemporal mean and standard deviation of risk. Results are shown for the best-performing model fit to the initial survey data. Posterior marginals for the probability of tick presence were computed along a 16 km grid of locations across South Carolina, and summarized by the mean (top) and standard deviation (bottom).

Comparing search methods for effective designs of tick surveillance. The utility of designs found using different search strategies is compared to 20 replications of random sampling (light blue distributions) for different sample sizes, and to different convenience/repeated sampling schemes (dashed lines; number of visits in parentheses). (A) Results from optimizing the d-optimality criterion for the environmental covariates. (B) Results using the maximum variance reduction criterion among high risk areas. (C-D) Proposed schedules of 20 visits for the 3 effective search strategies are shown across space and time. The schedule of 170 visits used for the initial survey data is shown as gray points.
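The kind of comparison summarized in this figure can be illustrated with a toy sketch. It makes several assumptions that are not taken from our analysis: a plain logistic regression stands in for the fitted model, the covariates and coefficient draws are simulated, the utility is a common approximation to Bayesian d-optimality (the average log-determinant of the prior precision plus the Fisher information), and a simple random exchange search stands in for the optimization algorithms we compared. All function and variable names are hypothetical.

```python
import numpy as np

def utility(design_idx, X_cand, beta_draws, prior_precision):
    """Approximate Bayesian d-optimality: log-determinant of the approximate
    posterior precision, averaged over draws of the coefficients."""
    X = X_cand[design_idx]
    total = 0.0
    for beta in beta_draws:
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)                       # logistic regression weights
        info = prior_precision + (X * w[:, None]).T @ X
        total += np.linalg.slogdet(info)[1]     # log|info|
    return total / len(beta_draws)

def exchange_search(n_visits, X_cand, beta_draws, prior_precision, rng, n_iter=200):
    """Start from a random design and accept single-visit swaps that raise utility."""
    n_cand = X_cand.shape[0]
    design = list(rng.choice(n_cand, size=n_visits, replace=False))
    best = utility(design, X_cand, beta_draws, prior_precision)
    for _ in range(n_iter):
        pos = rng.integers(n_visits)            # visit to replace
        new = rng.integers(n_cand)              # proposed replacement
        if new in design:
            continue
        trial = design.copy()
        trial[pos] = new
        u = utility(trial, X_cand, beta_draws, prior_precision)
        if u > best:
            design, best = trial, u
    return design, best

# Toy setup (assumed): 684 candidate visits, an intercept plus two covariates.
rng = np.random.default_rng(0)
X_cand = np.column_stack([np.ones(684), rng.normal(size=(684, 2))])
beta_draws = rng.normal(size=(50, 3))           # stand-in for posterior draws
prior_precision = np.eye(3)

design, u_opt = exchange_search(20, X_cand, beta_draws, prior_precision, rng)
u_random = [
    utility(rng.choice(684, size=20, replace=False), X_cand, beta_draws, prior_precision)
    for _ in range(20)                          # 20 replications of random sampling
]
print(f"search: {u_opt:.3f}  best random: {max(u_random):.3f}")
```

Because only improving swaps are accepted, the search can never end below its own random starting design. Whether it also beats every independent random replicate depends on the number of iterations and on the problem.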