This is an open access article distributed under the terms of the Creative Commons Attribution License (

Early detection of outdoor aerosol releases of anthrax is an important problem. The Bayesian Aerosol Release Detector (BARD) is a system for detecting releases of aerosolized anthrax and characterizing them in terms of location, time and quantity. Modelling a population's exposure to aerosolized anthrax poses a number of challenges. A major difficulty is to accurately estimate the exposure level--the number of inhaled anthrax spores--of each individual in the exposed region. Partly, this difficulty stems from the lack of fine-grained data about the population under surveillance. To cope with this challenge, nearly all anthrax biosurveillance systems, including BARD, ignore the mobility of the population and assume that exposure to anthrax would occur at one's home administrative unit--an assumption that limits the fidelity of the model.

We employed commuting data provided by the U.S. Census Bureau to parameterize a commuting model. Then, we developed methods for integrating commuting into BARD's simulation and detection algorithms and conducted two studies to measure the effect. The first study (simulation study) was designed to assess how BARD's detection and characterization performance are impacted by incorporation of commuting in BARD's outbreak-simulation algorithm. The second study (detection study) was designed to measure the effect of incorporating commuting in BARD's outbreak-detection algorithm.

We found that failing to account for commuting in detection (when commuting is present in simulation) leads to a deterioration in BARD's detection and characterization performance that is both statistically and practically significant. We found that a simplified approach to accounting for commuting in detection--simplified to maintain tractability of inference--nearly fully restored both detection and characterization performance of BARD detector.

We conclude that it is important to account for commuting (and mobility in general) in BARD's simulation algorithm. Further, the proposed method for incorporating commuting in BARD's detection algorithm can successfully perform the necessary correction in the detection algorithm, while preserving BARD's practicality. In our future work, we intend to further study the problem of the trade-off between running time and accuracy of the computation in BARD's version that includes commuting and ultimately find the best such trade-off.

Disease surveillance has a long history in Public Health. However, recent years have witnessed a sharp increase in the research devoted to the very early detection of disease outbreaks [

The early detection of outdoor aerosol releases of anthrax is an important problem because a release could infect hundreds of thousands of individuals and without early detection mortality could be as high as 30,000 to 3 million [

Modelling a population's exposure to aerosolized anthrax poses a number of challenges. A major difficulty is to accurately estimate the exposure level--the number of inhaled anthrax spores--of each individual in the exposed region. A key determinant of the exposure level of an individual is his/her spatial location at the time of exposure. Hence, to accurately estimate the exposure level it would be desirable to perform a fine-grained spatial modelling at the person-level. However, the data needed to parameterize such models are difficult to obtain. In fact, the only type of spatial information that is typically contained in biosurveillance databases is the home administrative unit--such as the home zip code--of each patient. To deal with this challenge posed by the lack of fine-grained spatial data, nearly all anthrax biosurveillance systems, including BARD, make two simplifying assumptions. First, they assume that all individuals who live in the same administrative unit would have the same exposure level--typically taken to be the level at the unit's centroid. Second, they assume that exposure to anthrax would occur at one's home administrative unit. Whether it is possible to relax the first assumption remains currently an open problem. On the other hand, in the last few years there has been some progress toward relaxing the second assumption. The key to this progress has been to integrate biosurveillance systems with mobility models that are parameterized through data that describe the travel patterns of the population. For one type of travel, namely the work-related commuting, such datasets are publicly available from the U.S. Census Bureau. For other types of travel, data are more difficult to obtain.

The first study that incorporated a mobility model in the simulation of anthrax outbreaks was conducted by Buckeridge [

This paper describes a method for incorporating commuting into BARD. First, we incorporate commuting into BARD's outbreak-simulation algorithm. Then, we present the results of an experimental study that assessed how BARD's detection and characterization performance are impacted as a result. Next, we present the results of a second study designed to measure the effect of incorporating commuting in outbreak

Here we provide a brief description of BARD. A thorough discussion of this system can be found in Hogan et al. [

List of symbols

release date and time | |

release coordinates | |

release quantity | |

_{
i
} | population of tract |

_{
i
} | number of cases who live in tract |

_{
i
} | dose of spores in the centroid of tract |

_{
i
} | ED-visit probability of each individual who lives in tract |

commuting graph: nodes denote tracts, arcs denote commuting flows | |

_{
ij
} | for |

_{
ij
} | for |

union of set { | |

ED-visit probability of each individual who lives in tract |

BARD detector attempts to discriminate between two hypotheses. The null hypothesis states that only "background" respiratory disease is present in the surveillance region. The alternative hypothesis states that both background respiratory disease and a respiratory disease caused by an aerosol release of anthrax are present. Using Bayes' theorem, BARD computes the posterior probability of the attack hypothesis _{1 }given data, as follows:

In addition, BARD computes the likelihood ratio, or the Bayes' factor

In equations (1)-(2), _{1 }and the Bayes' factor are both measures of the evidence provided by the data in favour of the alternative hypothesis.

The two key quantities in equations (1)-(2) are _{0}, _{0}, and _{1}, _{1}. Assuming conditional independence among the counts of different tracts given a hypothesis and the meteorological conditions, BARD computes the two region-wide likelihoods _{0}, _{1},

In eq. (3), _{i }denotes the 24-hr count of tract _{i }|_{0}, _{i }|_{1},

In addition to computing the two detection statistics discussed above--the posterior probability of _{1 }and the Bayes' factor--BARD attempts to characterize a release by computing an estimate for each element of the release vector _{1}, i.e.,

BARD employs a uniform prior for each element of

BARD can also simulate outbreaks of the respiratory disease caused by an aerosol release of anthrax. Algorithm 1 gives a high-level description of BARD's outbreak simulation algorithm.

Algorithm 1: BARD outbreak simulation

1. compute the _{i }in the centroid of tract

2. compute the probability _{i }that an individual who lives in tract _{i }(see [

3. generate the number of cases _{i }for tract _{i}, _{i}), where _{i }is the population of tract

4.

a. draw a random time interval from a _{i }(see [

b. add this interval to the release date/time to obtain the time of ED visit for this case

end

For brevity, in Algorithm 1 we have left out a number of details that are not necessary for understanding the flow of the computation. These details can be found in [

Our disease data--counts of ED visits with RCC--correspond to a region surrounding the city of Pittsburgh. This region consists of seven counties: Allegheny, Armstrong, Beaver, Butler, Lawrence, Washington, and Westmoreland. The historical ED data for our experiments were provided by 10 EDs operated by one health system. These data represented nearly 30% of all ED visits in the surveillance region. We divided the total period spanned by the available ED data into a training period (1 January 1999 to 31 December 2004), which was used to train BARD's detection algorithm, and a test period (1 January 2005 to 31 December 2005), used in our evaluation experiments. Finally, the weather data were provided from the National Weather Service, while the populations and central zip codes came from the ESRI ArcGIS Desktop product.

The data that describe the commuting patterns were provided by the U.S. Census Bureau. These data were collected during the 2000 Census, have national coverage and are provided at the

As a pre-processing step, we needed to transform the commuting flows from the census tract level provided by the U. S. Census Bureau to one of the two levels supported by the BARD simulator: zip code, or block group. In this paper we performed the latter transformation mainly because the block-group level is finer-grained than the zip-code level and thus has the potential to lead to a more accurate estimation of the release location by BARD. Note that block group is a sub-unit of census tract (i.e., each census tract is partitioned into two or more block groups). To perform the desired transformation, we split each commuting flow between a pair of census tracts _{1}, _{2 }into several smaller-sized flows according to the following two intuitive rules: (i) the number of workers coming from each constituent block group of _{1 }was assumed to be proportional to the block group population; (ii) the number of workers going to _{2 }was assumed to be evenly divided among its constituent block groups. The final commuting graph for our region consisted of 1991 nodes (block groups) and 324,402 arcs (commuting flows).

We refer to the version of BARD simulator that takes commuting into account as the BARD-C simulator. BARD-C simulator takes as input a representation of the commuting graph

where

In eqs. (5)-(6), _{i }denotes the total number of cases who live in tract _{ij}, _{ii }denotes number of people who either don't work, or who both live and work in tract _{ij}, _{ii }denotes number of cases who either don't work, or who both live and work in tract _{j }denotes the probability that a person

This refined computation of the tract-level ED counts in BARD-C could alternatively be implemented by modifying the ED-visit probability of each individual who lives in tract _{i }denote, as before, the probability that an individual _{i }specified by eqs. (5)-(6) is essentially the same as generating the count _{i }from a binomial distribution _{i},

The above two methods for incorporating commuting in simulation differ slightly from the method employed by Buckeridge [

After generating the number of cases for each tract, BARD-C computes the ED-visit time of each simulated case by, again, taking into account the fact that a worker would be exposed at the dosage level of his/her work tract. This idea is implemented in BARD-C by ensuring that in the Step 4(a) of Algorithm 1 the time of ED visit for a worker case is computed as a random deviate from the lognormal distribution that corresponds to the dosage level at his/her work tract.

The incorporation of commuting in simulation leads to an increase in the running time of the Step 3 of BARD's simulation algorithm by a constant factor. It is straightforward to see that this factor equals the average out-degree of the commuting graph. For the block group-level commuting graph of the Pittsburgh region that we created, the average out-degree is nearly 150. However, the factor by which the total running time of BARD simulator is increased due to the incorporation of commuting is smaller. The reason for this is that Step 3 accounts for only a fraction of the running time of BARD. In a computer with a 3 GHz processor and 2 GB of main memory, a simulation for the Pittsburgh region took nearly 2 minutes with BARD and nearly 20 minutes with BARD-C.

To verify that BARD-C simulator works as intended, we generated a random release parameter vector, consisting of the release time, location, and quantity. Then, using this fixed release parameter vector we produced 10 synthetic outbreaks with BARD-C simulator. Next, for each tract _{i}, _{i }denotes the known population of tract _{j }for all

We have also developed a method for integrating commuting with BARD detector. A thorough discussion of this method is given [

We conducted an experiment to measure whether BARD's detection and characterization performance is significantly affected by the incorporation of commuting in outbreak simulation. In the remainder of the paper, we refer to this experiment as the "simulation study" because the emphasis of this study was on the integration of commuting with BARD simulator. In the next section we discuss a second study, where the emphasis is on the integration of commuting with BARD detector. We refer to this second study as the "detection study". Figure

The simulation study followed a matched-pairs design. First, a set of 185 release parameter vectors

We supplied each release vector _{i},

BARD's detection performance was measured through the time-to-detection--or

We employed two methods to compare BARD's performance over the non-commuting group with its performance over the commuting group. The first method was Activity Monitoring Operating Characteristic (AMOC) analysis [

The second method we employed to compare BARD's performance over the non-commuting group with its performance over the commuting group was statistical testing. Our null hypothesis asserted that for a given false alarm rate and performance metric, the mean of the given metric over the non-commuting group was equal to its mean over the commuting group. Testing was carried out through the

We performed a second experiment designed to compare the performance of BARD and BARD-C

The

Figure

Figure

With respect to

Figure

Finally, Table

Results of the test for equality of BARD's errors over the two simulation groups.

0.00014 | 0.06698 | 0.00139 | 0.0055 | |

0.00889 | 0.04066 | 0.00026 | 0.00106 | |

0.00991 | 0.03827 | 0.00024 | 0.00107 | |

0.00074 | 0.09732 | 0.00006 | 0.00061 | |

0.0046 | 0.04967 | 0.00017 | 0.00086 | |

0.00044 | 0.07566 | 0.0002 | 0.00068 | |

0.0041 | 0.01017 | 0.00004 | 0.00008 | |

0.03221 | 0.01087 | 0.00003 | 0.00003 | |

0.01664 | 0.00879 | 0.00059 | 0.00006 | |

0.01007 | 0.00689 | 0.00104 | 0.00006 | |

0.00819 | 0.00762 | 0.00107 | 0.00007 | |

0.00477 | 0.00775 | 0.00124 | 0.00013 |

The p-values of the test H_{0}: mean(metric) non-commuting equals mean(metric) commuting--where metric can be timeliness, t error, x error, or y error.

Some of the results appearing in this section (Figs

Next, we turn to the characterization performance of BARD and BARD-C detectors. As in the simulation study, we found that the posterior means of the release parameters

Hence, in the remainder of this section we focus on the

Figure

Finally, we performed a Wilcoxon signed rank test whose null hypothesis asserted that BARD and BARD-C have identical locations of the distributions of

We incorporated commuting into BARD's simulation and detection algorithms and conducted two experimental studies to evaluate the effect. Incorporation of commuting in simulation is important to improve the fidelity of semi-synthetic outbreaks that BARD generates. Likewise, by incorporating commuting in detection we expect that the laboratory performance of the BARD detector would be closer to a real-world performance.

The results of our simulation study showed that incorporation of commuting in outbreak simulation leads to a statistically significant deterioration in BARD's detection and characterization performance, when BARD's detection algorithm does not account for commuting. Intuitively, this result was expected since incorporation of commuting in simulation causes the spatio-temporal pattern of respiratory disease incidence to be different from the pattern BARD would expect under the alternative hypothesis. However, our precise quantification of the deterioration in BARD's performance due to commuting showed that this deterioration is practically significant. Indeed, BARD's timeliness increased by nearly one hour and based on previous estimates [

Our detection study showed that incorporation of commuting in detection almost fully restores BARD's performance in the sense that the performance of BARD-C detector on simulations produced with the BARD-C simulator is close to the performance of BARD detector on simulations produced with BARD simulator.

Although others have measured the effect of population mobility on outbreak-detection performance, to our knowledge our work is the first to report the effect on outbreak-characterization performance. We found that failing to account for commuting in detection (when commuting is present in simulation) worsens characterization performance. We found that our simplified approach to accounting for commuting in detection--simplified to maintain tractability of inference--significantly improved characterization performance over characterization without commuting.

Finally, we discuss the main limitations of our work. First, in creating the commuting graph of the surveillance region, we employed intuitive but arbitrary rules to transform the commuting flows into a spatial resolution supported by BARD. The need for transforming the commuting flows could be avoided in the future by extending BARD so that it supports the census tract mode, but at the expense of coarser spatial granularity. Second, although we did not use the same commuting graph in both simulation and detection (see [

The authors declare that they have no competing interests.

AC, GLW and WRH developed the proposed methods, designed the experimental studies and prepared the manuscript. AC implemented the proposed methods and performed the experiments.

This work was supported by the Centers for Disease Control and Prevention (CDC) under grant R01PH000026-01.

This article has been published as part of