The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in whether a cluster exists in the data and, if it exists, in finding the most accurate characterization of the cluster.
We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum
Tests against baseline methods indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm used for comparison was more sensitive to smaller outbreaks, while as outbreak size increases, in terms of both area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on par with baseline methods in the task of Bayesian model averaging.
We conclude that the dynamic programming algorithm performs on par with other available methods for spatial cluster detection, and we point to its low computational cost and extensibility as advantages that favor further research on and use of the algorithm.
The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance [
for each subregion of interest
One of the main drawbacks of Kulldorff's approach is the computational expense associated with the randomization testing required to calculate statistical significance. Neill and Moore [
Another approach taken in the literature is the development of Bayesian variants of Kulldorff's statistic [
The use of a Bayesian approach eliminates the need for significance testing since the purpose of significance testing is to measure how likely the alternative hypothesis is given the data (more precisely, it tells us how likely we are to obtain a likelihood ratio that is at least as extreme under the null hypothesis). Since the posterior probability gives us the probability of the alternative hypothesis given the data directly, the need for time-consuming randomization testing is eliminated. An additional benefit of the Bayesian approach is that the introduction of prior probabilities enables these methods to incorporate prior information to potentially enhance detection [
The methods mentioned above so far are "count-based" methods which detect clusters based on aggregate counts of data such as the total number of ED visits in a given ZIP code on a given day, for example. An alternative approach is "agent-based," where each individual in the population is modeled. This is the approach taken by Cooper et al. [
A common way of articulating the spatial scan problem is one that considers the surveillance region to be a rectangular
Most grid-based methods have mainly focused on the detection of single rectangular clusters aligned with the grid.
However, since an outbreak may take other shapes, or may occur in multiple disconnected regions of the surveillance grid, it is of interest to develop algorithms that are able to detect more general subregions. Jiang and Cooper [
Complementary to these approaches are methods for calculating the posterior probability of the presence of clusters anywhere in the region using Bayesian model averaging, such as the work of Shen et al. [
The task of detecting irregularly shaped clusters has been addressed by distance-based methods which, given a set of points in (possibly high-dimensional) space, find clusters of elevated or uniform density [
We implement, develop, and test an algorithm for finding multiple rectangular clusters on a grid that employs dynamic programming in order to consider an exponential number of hypotheses in polynomial time. The algorithm finds a hypothesis
We also present an adaptation of the algorithm that averages over all considered hypotheses
Below, we describe and investigate an algorithm for the detection of multiple rectangular spatial clusters. The algorithm is applied to the simple outbreak model described next. While we restrict the analysis to this particular model, the algorithm is general and can be applied to any underlying model that can provide a likelihood function such as the function
We define the disease model as a Bayesian Network (BN) model, which is a type of graphical model for representing joint probability distributions; it is particularly suited for modeling conditional independence between variables. In the graphical representation each node represents a random variable and the distribution of each variable is defined as a conditional distribution that is conditioned on the variables from which it receives incoming edges, or as a prior probability distribution if the node has no incoming edges [
We employ an agent-based model where each cell of the
Figure
The random variable
Note that this is a well-defined probability distribution only under the constraint that the members of SUB do not intersect. We could alternatively merge this logic into the definition of
The presence of
Where (
The primary purpose of the model is to enable us to calculate the likelihood of the presence of an outbreak in a given subregion, where a subregion is taken to be any set of cells. Throughout this paper, we will often also refer to rectangular subregions, or simply rectangles, to make explicit instances where we require the sets of cells to form multi-cell rectangles on the grid. We will also introduce below the concept of tilings and tiles. In the context of this paper, tiles are always rectangular subregions.
The data likelihood of the presence or absence of an outbreak in a subregion is given by the likelihood of the data about individuals located within the subregion's limits supporting a uniform outbreak of some frequency
Note that the likelihood for the individual given the absence of an outbreak can be obtained from the expression above by letting
Here we are simplifying some of the calculation by capturing the logic behind determining the manifested frequency
Three disease states are modeled: noED, flu, and other, which respectively represent the events that an individual did not come in to the ED, came in to the ED due to influenza, or came to the ED for some other reason. The probability that an individual
In the initial stages of the investigation
To estimate the distribution
Four evidence states are modeled for the evidence variable
A value of "missing" indicates that individual
The proper choice of prior distributions is closely tied to the proper choice of the prior probability of an influenza outbreak being present anywhere in the region. For that purpose, we take the value
The distributions of SUB,
The simplest case is the case where only single-rectangle hypotheses are considered. Under that model, SUB can represent any of the
Note that when SUB is the empty set,
The case that corresponds to the greedy algorithm to which we will be comparing our algorithm is the case where each element
The parameter
Where \ is the set difference operator and |
Since in our experiments this model is used only in the context of selecting the most likely outbreak hypothesis, there is no need to specify
Since in this model every rectangle that appears in SUB is an outbreak rectangle, we again have
The hypothesis space used by the dynamic programming algorithm, the centerpiece of this work, is a space of tilings (discussed in more detail in the next section). To express this hypothesis space, each value of SUB is a tiling: a collection of non-overlapping rectangles such that every cell of the grid is covered by exactly one of these rectangles. In order to represent an outbreak hypothesis as a tiling we use the variable
Here
In the text that follows, we will refer to the choice of a set of tiles
The algorithm presented in this paper searches a space of hypotheses where each hypothesis is a possible tiling of the surveillance grid by non-outbreak and outbreak rectangles. For example, Figure
Note that we make a distinction between two adjacent outbreak tiles (e.g., Hypothesis 5) and one large outbreak tile (e.g., Hypothesis 6). The rationale for this is that each tile represents a region over which the outbreak frequency is uniform. In the 1 × 2 example, we would expect Hypothesis 5 to be more likely than Hypothesis 6 in a case where, for example, 5% of the population in the left cell has the outbreak disease and 10% of the population in the right cell has the outbreak disease. Conversely, if the outbreak disease cases are distributed uniformly among the two cells we would expect Hypothesis 6 to be more likely.
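This intuition can be checked numerically with a toy binomial likelihood. The counts, population sizes, and frequencies below are illustrative stand-ins, not the paper's agent-based model:

```python
from math import comb, log

def binom_loglik(k, n, p):
    """Log-likelihood of observing k cases among n individuals when each
    individual becomes a case independently with frequency p."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

# Two cells of 1000 people each: 5% cases on the left, 10% on the right.
n, k_left, k_right = 1000, 50, 100

# Hypothesis 5: two separate outbreak tiles, each with its own uniform
# frequency matching its cell's empirical rate.
h5 = binom_loglik(k_left, n, 0.05) + binom_loglik(k_right, n, 0.10)

# Hypothesis 6: one merged outbreak tile with a single uniform frequency
# (the pooled rate, 7.5%).
h6 = binom_loglik(k_left, n, 0.075) + binom_loglik(k_right, n, 0.075)

print(h5 > h6)  # True: two tiles fit the non-uniform pattern better
```

When the cases are spread uniformly across the two cells the two likelihoods coincide at the pooled rate, and the prior's preference for fewer tiles then favors the single merged tile.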
In computing tiling scores, we assume conditional independence among separate tiles given a particular tiling, even if the tiles are adjacent. As an alternative, modeling dependencies could help in the cluster detection task to adjust the prior probability of the presence of a disease in one cluster when it is near another cluster. A model that takes such effects into account could achieve improved outbreak detection, especially when modeling infectious diseases like influenza where being near an infected individual increases the probability of transmission of an infection. Modeling such dependencies incorrectly, however, could also hinder detection. In this sense, our choice not to model spatial dependencies can be seen as a cautious approach to avoid making informative spatial dependence assumptions that may be incorrect and therefore deleterious to outbreak detection performance. Even so, our basic choice of priors does favor finding a smaller number of tiles for a region, which often leads to spatially grouping neighboring disease cases.
The independence assumptions we make enable the dramatic computational efficiency that we gain by using dynamic programming, as explained in detail below. Assuming conditional independence does not constrain the outbreak hypotheses that can be identified in principle from the data, if given enough data. Although valid assumptions of conditional dependence may yield a more accurate performance in light of available data, representing and reasoning with conditional dependence carries an enormous computational burden that we avoid. It may be possible to extend our method to take spatial dependencies into account and still maintain computational tractability. We leave this issue as an open problem for future research.
Conditional independence allows us to define the score of a given tiling to be the product of the scores of the individual tiles it is composed of. The score of each individual tile is given by the data likelihood of that tile, as defined by Equation (8), multiplied by the prior probability of the tile's outbreak state. That is, the score of the hypothesis that tile
To illustrate the effects of multiplying the likelihood by a prior, suppose that in the 1 × 2 example in Figure
A multiplicative prior for each tile is used to allow the decomposition of a tiling score into a product of individual tile scores. While we use the same prior for all tiles, it is possible to assign a different prior probability for each tile based on its location and size while maintaining the multiplicative property that the score of a tiling is a product of tile scores. Also note that while the priors for each tile add up to 1, the priors for tilings, which are a product of tile priors, are not normalized. We discuss the details of normalization when we present Bayesian model averaging, since it is especially relevant in that context. The process of selecting the value of the structure prior
It is clear that the space of possible outbreak hypotheses is exponential in the size of the grid, since there are 2
To present the operation of the algorithm, we will first describe the algorithm for finding the highest-scoring tiling of a 1 ×
In order to find the best (highest scoring) tiling of a 1 ×
Figure
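The one-dimensional recurrence can be sketched as follows. Here `tile_score` is a hypothetical stand-in for the product of a tile's data likelihood and its outbreak-state prior:

```python
def best_tiling_score(n, tile_score):
    """Score of the best tiling of a 1 x n row of cells.
    tile_score(i, j, outbreak) -> score of the tile covering cells
    i..j-1 in the given outbreak state (a hypothetical callable standing
    in for likelihood times the tile's state prior)."""
    best = [0.0] * (n + 1)
    best[0] = 1.0  # empty prefix scores 1 (multiplicative identity)
    for i in range(1, n + 1):
        # The best tiling of cells 0..i-1 ends in some last tile j..i-1
        # in some state; everything before it is an optimal sub-tiling.
        best[i] = max(best[j] * tile_score(j, i, state)
                      for j in range(i)
                      for state in (False, True))
    return best[n]
```

Each prefix score is computed once and reused, so the row is processed in O(n²) tile evaluations rather than enumerating all 2^(n-1) partitions.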
This method is extended to the two-dimensional case by performing the same iteration over rows, where each row takes the role analogous to a cell in the one-dimensional version discussed above. For example, Figure
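A sketch of the two-dimensional extension: the outer DP iterates over row boundaries, and each row band is tiled by an inner column-wise DP (again with a hypothetical `tile_score`):

```python
def best_grid_tiling_score(m, n, tile_score):
    """Score of the best row-wise tiling of column-wise sub-tilings of an
    m x n grid. tile_score(r0, r1, c0, c1, outbreak) is a hypothetical
    callable scoring the rectangle spanning rows r0..r1-1 and columns
    c0..c1-1 in the given outbreak state."""
    def best_band(r0, r1):
        # Inner 1-D DP over columns within the row band r0..r1-1.
        best = [0.0] * (n + 1)
        best[0] = 1.0
        for c in range(1, n + 1):
            best[c] = max(best[b] * tile_score(r0, r1, b, c, s)
                          for b in range(c) for s in (False, True))
        return best[n]

    # Outer 1-D DP over row boundaries; each band plays the role a
    # single cell plays in the one-dimensional version.
    best = [0.0] * (m + 1)
    best[0] = 1.0
    for r in range(1, m + 1):
        best[r] = max(best[q] * best_band(q, r) for q in range(r))
    return best[m]
```

In practice the O(m²) band scores would be memoized, giving O(m²n²) tile evaluations overall.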
For readability purposes, a version of the algorithm that only computes the score of the best tiling is presented in Figure
The algorithm takes
We can express the value computed by the algorithm formally as follows: Let a row-wise partition of the grid be denoted by
The space of tilings considered by the algorithm can be described as the space of row-wise tilings of column-wise sub-tilings. Intuitively, this space has the desired property of being able to capture every cell-wise coloring of the grid; however, it is not an exhaustive search over all general colored tilings. Figure
In the above algorithm it was assumed that the task at hand is model selection, that is, finding the most likely tiling given the data. Below we present an adaptation of the dynamic programming algorithm to perform Bayesian model averaging as well, in order to derive a posterior probability that an outbreak is occurring given the data. In simple terms, this is done by replacing maximization with summation to obtain two sums:
The summation algorithm calculates the sum
Here, the function
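In code, the change from model selection to model averaging is literally the replacement of `max` with `sum` in the recurrence. Restricting the tile states to the non-outbreak state alone yields the second sum. A sketch for the one-dimensional case, with a hypothetical `tile_score`:

```python
def total_tiling_score(n, tile_score, states=(False, True)):
    """Sum of scores over every tiling of a 1 x n row: the same
    recurrence as the model-selection DP with max replaced by sum.
    Passing states=(False,) sums only over all-non-outbreak colorings,
    giving the second of the two sums."""
    total = [0.0] * (n + 1)
    total[0] = 1.0
    for i in range(1, n + 1):
        total[i] = sum(total[j] * tile_score(j, i, s)
                       for j in range(i) for s in states)
    return total[n]
```

With a constant tile score of 1 the first sum simply counts colored tilings: for a 2-cell row there are 6 (one 2-cell tile in either state, plus two 1-cell tiles in four state combinations).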
Under the above setup, the tiling scores need to be normalized in order to have the prior probabilities over all hypotheses sum to 1. The normalization constant
Hence the normalization constant
Similarly, the prior probability, as governed by the structure prior parameter
This relationship was used to pick the value of
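For intuition about the normalization in the one-dimensional case: since each tile's two state priors sum to 1, the prior mass summed over all colored tilings of a row equals the number of uncolored tilings, which the same recurrence counts. This is a sketch under that simplification; the full grid case composes row-wise and column-wise counts analogously:

```python
def count_tilings(n):
    """Number of uncolored tilings of a 1 x n row. Because the two state
    priors of every tile sum to 1, this count equals the total prior
    mass over all colored tilings of the row, i.e. the normalization
    constant for the row case."""
    z = [0] * (n + 1)
    z[0] = 1
    for i in range(1, n + 1):
        z[i] = sum(z[j] for j in range(i))
    return z[n]

# count_tilings(n) == 2**(n-1): each of the n-1 cell boundaries is
# either a tile boundary or not.
```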
In order to obtain the posterior probability of an outbreak, we normalize the sums of scores to obtain:
In a typical biosurveillance application this is the posterior that we would use to detect whether an outbreak is occurring anywhere in the surveillance region by testing whether it rises above a pre-defined alert threshold.
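The final step can be sketched as follows; note that the prior normalization constant scales both sums equally and therefore cancels in the ratio:

```python
def outbreak_posterior(sum_all, sum_null):
    """Posterior probability that an outbreak is present somewhere in
    the region. sum_all: DP sum of scores over all tilings; sum_null:
    DP sum over tilings whose tiles are all in the non-outbreak state.
    The (unapplied) prior normalization constant cancels in the ratio."""
    return 1.0 - sum_null / sum_all
```

An alert would then be raised when `outbreak_posterior(...)` exceeds the pre-defined threshold.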
This section describes an evaluation of the dynamic programming (DP) algorithm, both in terms of model selection and in terms of model averaging. The evaluation was performed using real background data consisting of information about chief complaints and home ZIP codes of ED patients who are presumed not to have an outbreak disease. Synthetic data were generated to simulate the presence of influenza outbreak cases. The evaluation consists of a comparison to a baseline that scans over single-rectangle hypotheses (SR), as well as a comparison for model selection against a greedy algorithm (GR) that hypothesizes one or more non-overlapping outbreak rectangles. The next section describes these algorithms and the section that follows it describes the data in further detail.
The simpler of the baseline methods that the DP algorithm is compared to is a method of scanning over single-rectangle hypotheses (SR). For model selection, it consists of iterating over all possible placements of a single rectangle on the grid and finding the placement that maximizes the posterior probability of an outbreak, calculated as:
where
Let the notation
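The single-rectangle scan can be sketched as an exhaustive loop over all placements; `rect_score` is a hypothetical stand-in for the posterior computed from the rectangle's likelihood and prior:

```python
from itertools import product

def best_single_rectangle(m, n, rect_score):
    """Scan all axis-aligned rectangles on an m x n grid and return the
    highest-scoring placement along with its score.
    rect_score(r0, r1, c0, c1) is a hypothetical callable scoring the
    rectangle spanning rows r0..r1-1 and columns c0..c1-1."""
    best, best_rect = float("-inf"), None
    for r0, c0 in product(range(m), range(n)):
        for r1, c1 in product(range(r0 + 1, m + 1), range(c0 + 1, n + 1)):
            s = rect_score(r0, r1, c0, c1)
            if s > best:
                best, best_rect = s, (r0, r1, c0, c1)
    return best_rect, best
```

There are m(m+1)/2 · n(n+1)/2 rectangles, so the scan is O(m²n²) score evaluations.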
The other baseline method that is used in the evaluation is the greedy algorithm (GR) without recursion by Jiang and Cooper [
Outbreak rectangles are considered to be independent conditioned on their positions. The value of
Multiple outbreak scenarios were generated on a 10 × 10 grid with a population distribution based on ZIP Code Tabulation Area populations in Allegheny County, Pennsylvania as reported in the 2000 Census. Real ED data with home ZIP code information were used as a non-outbreak background scenario. Data from the months of June-August of the years 2003-2005, a total of 181 days, were used to evaluate the background scenario. The data for each day consisted of a de-identified list of records for patients who visited emergency departments in Allegheny County on that day, where each record contained the patient's home ZIP code and chief complaints. Ethics approval for use of the data was provided by the University of Pittsburgh IRB.
Outbreaks of varying shapes, sizes, and frequencies were then injected into the data to obtain outbreak scenarios. In simulating outbreaks, the outbreak frequency
For each frequency range, four sets of outbreak specifications were generated where the rectangle sizes were limited to a maximum of 10, 40, 60, and 100 cells. Each of these sets was in turn composed of five sets of specifications with
For each of the 181 days of background data, each of the 800 outbreak specifications was used to create an outbreak scenario by injecting disease cases into that day by selecting an individual in an outbreak rectangle indexed by
For each outbreak scenario, the outbreak's coverage (the proportion of the grid that is covered by outbreak rectangles) was also recorded. This value is later used to stratify the test results.
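The injection step can be sketched roughly as follows. The flat Bernoulli sampling and all parameter names here are assumptions for illustration only, not the paper's exact selection procedure:

```python
import random

def inject_cases(population, rectangles, f, seed=0):
    """Illustrative sketch of injecting synthetic outbreak cases.
    population: dict mapping grid cell -> number of individuals;
    rectangles: iterable of sets of cells (the outbreak rectangles);
    f: outbreak frequency. Each individual in a covered cell becomes an
    injected case independently with probability f (an assumption).
    Returns dict cell -> injected case count."""
    rng = random.Random(seed)
    cases = {}
    for rect in rectangles:
        for cell in rect:
            k = sum(rng.random() < f for _ in range(population.get(cell, 0)))
            cases[cell] = cases.get(cell, 0) + k
    return cases
```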
We evaluated model averaging of the dynamic programming (DP) and the single rectangle (SR) algorithms by obtaining the posteriors for each day in the background data (when no outbreak is occurring) and for each day in the outbreak scenarios, and aggregating these results to obtain Receiver Operating Characteristic (ROC) curves. Note that days are treated as single snapshots of the surveillance region with no temporal context, and the area under the ROC curve gives us a measure of the quality of detection of an outbreak from observing only a single day of the outbreak, regardless of how far into the outbreak's progression that day is. This is unlike another popular metric in the outbreak detection literature, the Activity Monitoring Operating Characteristic (AMOC) curve, where a progressing outbreak is simulated over a span of multiple days and the time to detection is measured. We use an evaluation measure that does not take time into account because the algorithms in question are themselves purely spatial. Additionally, the need to simulate outbreak progression for AMOC analysis necessarily introduces additional assumptions about the dynamics of the outbreaks to be detected, which is not the focus of the current evaluation.
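The day-level ROC evaluation reduces to ranking the posteriors of outbreak days against those of background days; the AUC is then the probability that a randomly chosen outbreak day scores above a randomly chosen background day (ties counted as half):

```python
def auc(neg_posteriors, pos_posteriors):
    """Area under the ROC curve computed directly as the probability
    that a positive (outbreak-day) posterior exceeds a negative
    (background-day) posterior, with ties counted as half a win."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_posteriors for n in neg_posteriors)
    return wins / (len(pos_posteriors) * len(neg_posteriors))
```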
In these tests the disease model used assumes a single-valued
Model averaging comparison in terms of area under the ROC curve
| Frequency range | Coverage | DP AUC | SR AUC | (DP-SR) 95% confidence interval limits | | p-value |
|---|---|---|---|---|---|---|
| Low | < 16% | 0.95047 | 0.95016 | -0.00746 | 0.00808 | 0.9384 |
| | 16-34% | 0.86260 | 0.86852 | -0.01261 | 0.00078 | 0.0831 |
| | ≥ 35% | 0.97409 | 0.97911 | -0.01005 | 0.00001 | 0.0505 |
| Mid | < 16% | 0.96019 | 0.96079 | -0.00530 | 0.00410 | 0.8014 |
| | 16-34% | 0.98246 | 0.98613 | -0.00575 | -0.00159 | 0.0005 |
| | ≥ 35% | 1.00000 | 1.00000 | | | |
| High | < 16% | 0.98468 | 0.98379 | -0.00173 | 0.00352 | 0.5044 |
| | 16-34% | 0.99990 | 0.99984 | -0.00001 | 0.00013 | 0.1096 |
| | ≥ 35% | 1.00000 | 1.00000 | | | |
Comparison of the DP and SR ROC curves obtained from model averaging. The table shows the AUC for each algorithm, the lower and upper 95% confidence limits for the difference in areas (DP-SR) and the associated p-values
The results show that both methods perform similarly and do not achieve statistically significantly different results, except in the case of mid-range frequencies with outbreak coverage of 16-34% of the surveillance grid, where the SR method performs statistically significantly better; the performance difference, however, is inconsequential from a practical standpoint. We can also see that for both methods the area under the ROC curve is strongly influenced by both the frequency and the coverage of the outbreak, increasing as these parameters increase, as expected.
The result that the DP algorithm does not perform better than the SR algorithm is surprising since the DP algorithm is able to consider multiple-rectangle hypotheses which would be expected to drive the posterior probability up significantly when there are multiple simulated outbreaks.
Since it is not immediately obvious from inspection of the algorithms why the anticipated improvement is absent, we performed targeted tests with the same background data and specially-designed outbreaks, such as the one illustrated in Figure
Additional model averaging comparison
| Frequency range | Number of rectangles | DP AUC | SR AUC | (DP-SR) 95% confidence interval limits | | p-value |
|---|---|---|---|---|---|---|
| Low | 1 | 0.51545 | 0.51647 | -0.01450 | 0.01247 | 0.8830 |
| | 2 | 0.52214 | 0.52353 | -0.01462 | 0.01185 | 0.8373 |
| | 3 | 0.55150 | 0.55355 | -0.01652 | 0.01244 | 0.7824 |
| | 4 | 0.55416 | 0.55637 | -0.02003 | 0.01561 | 0.8080 |
| Mid | 1 | 0.66873 | 0.67618 | -0.01967 | 0.00478 | 0.2326 |
| | 2 | 0.80826 | 0.82422 | -0.02771 | -0.00420 | 0.0078 |
| | 3 | 0.78402 | 0.79984 | -0.03444 | 0.00650 | 0.0520 |
| | 4 | 0.88198 | 0.88272 | -0.01471 | 0.01324 | 0.9177 |
| High | 1 | 0.88433 | 0.91615 | -0.04173 | -0.02191 | < 0.0001 |
| | 2 | 0.96696 | 0.97218 | -0.01147 | 0.00105 | 0.1027 |
| | 3 | 0.97819 | 0.98139 | -0.00749 | 0.00109 | 0.1438 |
| | 4 | 0.99626 | 0.99567 | -0.00224 | 0.00340 | 0.6852 |
Comparison of DP and SR ROC curves obtained from model averaging for outbreaks composed of various combinations of the rectangles in Figure 9
In light of these results we also followed another line of investigation, namely, changing the underlying likelihood function, with the thought that a model that more accurately represents the background scenarios would yield more representative likelihoods and possibly better differentiation between the nuances of the two algorithms. Since in reality the proportion
We have compared DP to SaTScan™,a the software package developed by Kulldorff that can use spatial, temporal, or space-time scan statistics [
It is important to note that there are fundamental differences between the DP algorithm and SaTScan™. Firstly, SaTScan™ is designed to detect clusters of higher or lower than normal values of some measurement and report those as clusters, while our method uses a disease model to calculate a likelihood of a state of interest based on observations. This means that in our testing, we cannot use SaTScan™ to directly detect high numbers of influenza cases
Table
DP compared to SaTScan™ in terms of area under the ROC curve
| Frequency range | Coverage | DP AUC | SaTScan™ AUC | (DP-SaTScan™) 95% confidence interval limits | | p-value |
|---|---|---|---|---|---|---|
| Low | < 16% | 0.95047 | 0.49570 | 0.41018 | 0.49936 | < 0.0001 |
| | 16-34% | 0.86260 | 0.46070 | 0.35816 | 0.44562 | < 0.0001 |
| | ≥ 35% | 0.97409 | 0.42947 | 0.50375 | 0.58549 | < 0.0001 |
| Mid | < 16% | 0.96019 | 0.65946 | 0.26789 | 0.33356 | < 0.0001 |
| | 16-34% | 0.98246 | 0.85287 | 0.11099 | 0.14818 | < 0.0001 |
| | ≥ 35% | 1.00000 | 0.81147 | | | |
| High | < 16% | 0.98468 | 0.82496 | 0.14089 | 0.17856 | < 0.0001 |
| | 16-34% | 0.99990 | 0.99490 | 0.00387 | 0.00611 | < 0.0001 |
| | ≥ 35% | 1.00000 | 0.94950 | | | |
Comparison of the DP posteriors obtained from model averaging and SaTScan™ p-values in terms of ROC curves. The table shows the AUC for each algorithm, the lower and upper 95% confidence limits for the difference in areas (DP-SaTScan™) and the associated p-values
A possible factor influencing the poor performance of SaTScan™ is that the task it performs is not a perfect fit for our evaluation measure. We are measuring performance in the task of identifying injected disease outbreaks, while SaTScan™ performs the task of detecting elevated counts of the "cough" or "fever" chief complaint. Also, we injected rectangular outbreaks, whereas SaTScan™ looks for elliptically shaped outbreak regions. Another potential factor in the performance of SaTScan™ is that the background (no-outbreak) scenarios also contain clusters of elevated chief complaint counts of cough and fever, even in the absence of injected influenza outbreak cases. DP is not entirely free of the same problem, since it too assigns high posteriors to some background scenarios, but to a lesser extent.
In order to compare the quality of the detection of the location of outbreak clusters, the cell-wise spatial precision and recall of each algorithm were measured in each outbreak scenario. In terms of the most likely outbreak subregion
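Cell-wise precision and recall against the true injected region can be computed directly; this is a sketch with both regions given as sets of grid cells:

```python
def spatial_precision_recall(detected, true):
    """Cell-wise spatial precision and recall of a detected outbreak
    subregion against the true injected outbreak region, both given as
    sets of grid cells. Precision: fraction of detected cells that are
    truly outbreak cells; recall: fraction of true outbreak cells that
    were detected."""
    overlap = len(detected & true)
    precision = overlap / len(detected) if detected else 0.0
    recall = overlap / len(true) if true else 0.0
    return precision, recall
```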
The disease model used in these tests is also the one that assumes a single-valued
Table
Comparison of model selection in terms of spatial precision
| Frequency range | Coverage | DP precision | GR precision | SR precision | (DP-GR) 95% confidence interval limits | | p-value |
|---|---|---|---|---|---|---|---|
| Low | < 16% | 0.09365 | 0.11775 | 0.07022 | -0.02412 | -0.02408 | < 0.00001 |
| | 16-34% | 0.28551 | 0.33207 | 0.27251 | -0.04659 | -0.04654 | < 0.00001 |
| | ≥ 35% | 0.57974 | 0.62428 | 0.60255 | -0.04462 | -0.04447 | < 0.00001 |
| Mid | < 16% | 0.31987 | 0.34414 | 0.11212 | -0.02430 | -0.02425 | < 0.00001 |
| | 16-34% | 0.67969 | 0.66288 | 0.31934 | 0.01679 | 0.01683 | < 0.00001 |
| | ≥ 35% | 0.92269 | 0.86046 | 0.55602 | 0.06211 | 0.06235 | < 0.00001 |
| High | < 16% | 0.50417 | 0.53018 | 0.15127 | -0.02604 | -0.02599 | < 0.00001 |
| | 16-34% | 0.84815 | 0.78839 | 0.34174 | 0.05974 | 0.05978 | < 0.00001 |
| | ≥ 35% | 0.91292 | 0.83541 | 0.54478 | 0.07740 | 0.07763 | < 0.00001 |
| Very high | < 16% | 0.66221 | 0.51208 | 0.14081 | 0.15010 | 0.15017 | < 0.00001 |
| | 16-34% | 0.84584 | 0.44728 | 0.26127 | 0.39854 | 0.39859 | < 0.00001 |
| | ≥ 35% | 0.92306 | 0.66563 | 0.48019 | 0.25728 | 0.25758 | < 0.00001 |
Comparison of the DP, GR, and SR spatial precisions obtained from model selection. The table shows the mean precision for each algorithm, the lower and upper 95% confidence limits for the difference in precision (DP-GR) and the associated p-values
Comparison of model selection in terms of spatial recall
| Frequency range | Coverage | DP recall | GR recall | SR recall | (DP-GR) 95% confidence interval limits | | p-value |
|---|---|---|---|---|---|---|---|
| Low | < 16% | 0.08729 | 0.09759 | 0.04993 | -0.01031 | -0.01029 | < 0.00001 |
| | 16-34% | 0.13665 | 0.14801 | 0.10795 | -0.01137 | -0.01135 | < 0.00001 |
| | ≥ 35% | 0.18579 | 0.19636 | 0.17592 | -0.01060 | -0.01055 | < 0.00001 |
| Mid | < 16% | 0.39280 | 0.38533 | 0.11793 | 0.00745 | 0.00749 | < 0.00001 |
| | 16-34% | 0.56118 | 0.55025 | 0.24091 | 0.01092 | 0.01095 | < 0.00001 |
| | ≥ 35% | 0.74869 | 0.75813 | 0.42978 | -0.00956 | -0.00932 | < 0.00001 |
| High | < 16% | 0.62454 | 0.60666 | 0.13601 | 0.01786 | 0.01789 | < 0.00001 |
| | 16-34% | 0.82999 | 0.82147 | 0.29433 | 0.00850 | 0.00852 | < 0.00001 |
| | ≥ 35% | 0.91850 | 0.91130 | 0.50550 | 0.00716 | 0.00724 | < 0.00001 |
| Very high | < 16% | 0.93248 | 0.92970 | 0.24123 | 0.00277 | 0.00278 | < 0.00001 |
| | 16-34% | 0.99829 | 0.99809 | 0.62275 | 0.00020 | 0.00020 | < 0.00001 |
| | ≥ 35% | 0.99813 | 0.99815 | 0.74743 | -0.00002 | -0.00001 | < 0.00001 |
Comparison of the DP, GR, and SR spatial recall obtained from model selection. The table shows the mean recall for each algorithm, the lower and upper 95% confidence limits for the difference in recall (DP-GR) and the associated p-values
First note that all differences observed between the DP and GR algorithms, while small, are statistically significant at the large sample sizes used. In terms of both spatial precision and recall, the results show that the greedy algorithm does slightly better than the dynamic programming algorithm at low frequencies, even though the performance of both is very poor. This may be an indication that the greedy algorithm is more sensitive to small differences in likelihood. For the most part, as frequency and coverage of the outbreak increase, the performance of the dynamic programming algorithm overtakes that of the greedy algorithm. Most notably, the dynamic programming algorithm yields much better spatial precision for outbreaks with higher spatial coverage (and therefore more outbreak rectangles). The differences in recall, while statistically significant, do not amount to much from a practical standpoint.
We also compare the DP algorithm to SaTScan™ in terms of spatial precision and recall. Table
Comparison of spatial precision against SaTScan™
| Frequency range | Coverage | DP precision | SaTScan™ precision | (DP-SaTScan™) 95% confidence interval limits | | p-value |
|---|---|---|---|---|---|---|
| Low | < 16% | 0.09365 | 0.09218 | 0.00395 | -0.00100 | 0.12159 |
| | 16-34% | 0.28551 | 0.27073 | 0.01871 | 0.01084 | < 0.00001 |
| | ≥ 35% | 0.57974 | 0.47264 | 0.11638 | 0.09781 | 0.00128 |
| Mid | < 16% | 0.31987 | 0.24489 | 0.07877 | 0.07118 | < 0.00001 |
| | 16-34% | 0.67969 | 0.59986 | 0.08355 | 0.07612 | < 0.00001 |
| | ≥ 35% | 0.92269 | 0.87176 | 0.05817 | 0.04369 | < 0.00001 |
| High | < 16% | 0.50417 | 0.33449 | 0.17408 | 0.16526 | < 0.00001 |
| | 16-34% | 0.84815 | 0.66405 | 0.18728 | 0.18093 | < 0.00001 |
| | ≥ 35% | 0.91292 | 0.82920 | 0.08906 | 0.07839 | < 0.00001 |
| Very high | < 16% | 0.66221 | 0.41769 | 0.24879 | 0.24027 | < 0.00001 |
| | 16-34% | 0.84584 | 0.62706 | 0.22190 | 0.21568 | < 0.00001 |
| | ≥ 35% | 0.92306 | 0.79866 | 0.11971 | 0.12911 | < 0.00001 |
Comparison of the DP and SaTScan™ spatial precision. The table shows the mean precision for each algorithm, the lower and upper 95% confidence limits for the difference in precision (DP-SaTScan™) and the associated p-values
Comparison of spatial recall against SaTScan™
| Frequency range | Coverage | DP recall | SaTScan™ recall | (DP-SaTScan™) 95% confidence interval limits | | p-value |
|---|---|---|---|---|---|---|
| Low | < 16% | 0.08729 | 0.15045 | -0.05982 | -0.06649 | < 0.00001 |
| | 16-34% | 0.13665 | 0.18430 | -0.04485 | -0.05045 | < 0.00001 |
| | ≥ 35% | 0.18579 | 0.16384 | 0.02682 | 0.01708 | < 0.00001 |
| Mid | < 16% | 0.39280 | 0.37422 | 0.02227 | 0.01489 | < 0.00001 |
| | 16-34% | 0.56118 | 0.48644 | 0.07769 | 0.07180 | < 0.00001 |
| | ≥ 35% | 0.74869 | 0.58639 | 0.17157 | 0.15305 | < 0.00001 |
| High | < 16% | 0.62454 | 0.49413 | 0.13376 | 0.12706 | < 0.00001 |
| | 16-34% | 0.82999 | 0.70011 | 0.13170 | 0.12805 | < 0.00001 |
| | ≥ 35% | 0.91850 | 0.76347 | 0.15955 | 0.15052 | < 0.00001 |
| Very high | < 16% | 0.93248 | 0.69904 | 0.23695 | 0.22993 | < 0.00001 |
| | 16-34% | 0.99829 | 0.78494 | 0.21539 | 0.21132 | < 0.00001 |
| | ≥ 35% | 0.99813 | 0.78448 | 0.21034 | 0.21697 | < 0.00001 |
Comparison of the DP and SaTScan™ spatial recall. The table shows the mean recall for each algorithm, the lower and upper 95% confidence limits for the difference in recall (DP-SaTScan™) and the associated p-values
The spatial precision results show that SaTScan™ achieves precision not statistically significantly different from DP for outbreaks of low coverage and frequency, while for other cases DP achieves statistically significantly higher precision. The spatial recall results show that SaTScan™ achieves recall that is statistically significantly higher than DP for outbreaks of coverage below 35% and low frequency. Both methods perform generally poorly in this range. In the remainder of the cases DP outperforms SaTScan™ in terms of recall as well as precision.
We emphasize that DP has the advantage of performing inference about whether an individual has influenza using the same disease model that was used to generate the injected outbreaks. SaTScan™, however, merely searches for elevated chief complaint counts. Also note that the injected outbreaks are rectangular in shape, while SaTScan™ uses elliptical search windows. An ellipse is not able to fit some of the outbreak scenarios well. Additionally, unlike our model which has prior parameters defining a distribution over the chief complaints that are observed when no outbreak is present, SaTScan™ computes the expected counts for the absence of a cluster based on the counts outside the search window. As a result, when an elliptical window fails to accurately enclose the outbreak, SaTScan™'s assessment of the expected level of chief complaint counts is affected. We believe that the combination of these factors is what is responsible for SaTScan™'s lower performance.
A major motivating factor for developing a dynamic programming spatial scan is that brute force multi-region spatial scans are computationally intensive. In order to evaluate the feasibility of using the DP algorithm in a practical application setting, we performed a variety of timing tests on input data of varying grid sizes and populations. The population size affects the running time not due to any feature of the dynamic programming itself, but rather due to the agent-based disease model. In order to properly interpret the timing results, it is first useful to briefly describe some technical aspects of our implementation.
The implementation consists of two modules: one module reads in the data to be scanned and uses the disease model to compute the log-likelihood of each possible grid rectangle for each value of the flu incidence frequency
Figures
Figure
This paper described a dynamic programming algorithm for spatial cluster detection. The algorithm specifies alternative clustering hypotheses in terms of colored tilings of the surveillance grid. We showed how the algorithm can be used both for characterizing the location and shape of the most likely clusters and for calculating the posterior probability of the presence of clusters in the data.
Tests of general detection in terms of area under the ROC curve show that the new method is not statistically significantly better than the naïve calculation that averages only over single-rectangle hypotheses. This result appears to hold even when the true outbreaks cannot be reasonably approximated by a single rectangle. We conjecture that the additive nature of model averaging either gives the naïve method more detection power than expected or hinders the dynamic programming method. On the one hand, even though the single-rectangle method cannot accurately capture multiple clusters with a single hypothesis, it can accurately capture parts of a multi-cluster scenario (the individual rectangles that compose it). These partial agreements may contribute enough to the likelihoods that averaging over all hypotheses yields a relatively high posterior probability for the multi-cluster scenario. On the other hand, while the dynamic programming algorithm can accurately capture a multi-cluster scenario in a single hypothesis, it also considers an exponential number of alternative hypotheses that do not accurately describe the outbreak scenario. It is possible that, when averaging over the entire hypothesis space, the scores of the accurate hypotheses are simply outweighed by the alternatives. The similar performance of the two methods may stem from a combination of these factors, and possibly others; further investigation is needed to explain these results fully.
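For concreteness, the naïve single-rectangle baseline can be sketched as follows. This is a minimal illustration, not the paper's exact implementation or priors: it assumes a user-supplied per-rectangle log-likelihood `loglik(r1, c1, r2, c2)`, a uniform prior over rectangles given an outbreak, and a single overall outbreak prior `p_outbreak`.

```python
import math

def sr_posterior(loglik, n_rows, n_cols, p_outbreak, loglik_null):
    """Posterior P(outbreak | data) by averaging over all single-rectangle
    hypotheses (a sketch of the naive SR baseline, with an assumed
    per-rectangle log-likelihood callable)."""
    rects = [(r1, c1, r2, c2)
             for r1 in range(n_rows) for r2 in range(r1, n_rows)
             for c1 in range(n_cols) for c2 in range(c1, n_cols)]
    # log-sum-exp over equally weighted rectangle hypotheses
    terms = [loglik(*rc) - math.log(len(rects)) for rc in rects]
    m = max(terms)
    log_ev = m + math.log(sum(math.exp(t - m) for t in terms))
    # combine the outbreak evidence with the null hypothesis
    joint_out = math.log(p_outbreak) + log_ev
    joint_null = math.log(1.0 - p_outbreak) + loglik_null
    m2 = max(joint_out, joint_null)
    log_total = m2 + math.log(math.exp(joint_out - m2) + math.exp(joint_null - m2))
    return math.exp(joint_out - log_total)

# Toy likelihood: one rectangle fits the data far better than any other.
toy = lambda r1, c1, r2, c2: 0.0 if (r1, c1, r2, c2) == (0, 0, 1, 1) else -50.0
p = sr_posterior(toy, 3, 3, p_outbreak=0.5, loglik_null=-50.0)
```

The point of the sketch is the averaging step: even hypotheses that only partially overlap the true outbreak contribute to the evidence sum, which is one of the conjectured reasons the SR baseline detects multi-cluster outbreaks better than expected.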
In spite of the lack of improvement in general detection power, tests of location and shape characterization show that under certain conditions the dynamic programming method performs better than the naïve SR method and the greedy method. The most notable improvement is in the precision of the cluster locations detected, as the DP algorithm tends to output fewer false-positive outbreak cells than the other algorithms. Recall of cluster locations is on par with the greedy method.
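Spatial precision and recall as used here are cell-level measures over the surveillance grid: precision penalizes detected cells outside the true cluster, and recall penalizes missed cluster cells. A minimal sketch (the cell sets below are illustrative, not the paper's evaluation code):

```python
def spatial_precision_recall(detected, true):
    """Cell-level precision and recall for a detected cluster.
    `detected` and `true` are sets of (row, col) grid cells."""
    tp = len(detected & true)  # true positives: correctly flagged cells
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(true) if true else 0.0
    return precision, recall

true_cells = {(r, c) for r in range(2, 5) for c in range(3, 6)}  # 3x3 outbreak
found = {(r, c) for r in range(2, 6) for c in range(3, 6)}       # one extra row
prec, rec = spatial_precision_recall(found, true_cells)          # → (0.75, 1.0)
```

In this toy case the detector over-covers the outbreak by one row, so recall is perfect while precision drops, which mirrors the kind of false-positive cells the DP algorithm was observed to avoid.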
Measurements of the DP algorithm's running time over varying grid and population sizes showed that it scales well to larger grids, completing a scan of a 100 × 100 grid in about five minutes. The measurements also show that the dynamic programming algorithm has the general advantage of taking no more time than the naïve single-rectangle scan. Since the scan relies on having access to a likelihood function for an outbreak in a particular rectangle, the runtime will depend on the nature of the likelihood calculation. In our implementation, using our particular disease model, this calculation could be performed in a separate module and timed separately. We observed that the likelihood calculation scales linearly with the population under our agent-based model, taking under four minutes for a population of almost 150 million.
Just like the baseline methods considered, the dynamic programming algorithm can be applied to other likelihood models as long as they provide a likelihood as a function of location. In particular, this means that even though an agent-based model was used in the experiments, these methods are equally suited for use with a count-based model. These methods can also be extended to additional spatial dimensions, to time, and to multiple cluster classes (outbreak states). The DP algorithm can also be modified to perform finer characterization, such as finding the most likely values of
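As an illustration of this pluggability, a count-based model can supply the same per-rectangle log-likelihood interface that the scan consumes. The Poisson formulation below is a hypothetical stand-in, not the agent-based model used in the experiments: cells inside the candidate rectangle have their baseline mean inflated by a relative risk.

```python
import math

def poisson_rect_loglik(counts, baselines, relative_risk):
    """Return a per-rectangle log-likelihood function for a count-based
    model: cells inside the rectangle are Poisson with mean
    baseline * relative_risk, cells outside with mean baseline.
    `counts` and `baselines` are 2-D lists indexed [row][col]."""
    def loglik(r1, c1, r2, c2):
        total = 0.0
        for r, row in enumerate(counts):
            for c, k in enumerate(row):
                mu = baselines[r][c]
                if r1 <= r <= r2 and c1 <= c <= c2:
                    mu *= relative_risk
                # Poisson log-pmf: k*log(mu) - mu - log(k!)
                total += k * math.log(mu) - mu - math.lgamma(k + 1)
        return total
    return loglik

counts = [[1, 1], [1, 10]]
baselines = [[1.0, 1.0], [1.0, 1.0]]
f = poisson_rect_loglik(counts, baselines, relative_risk=10.0)
# The rectangle covering the elevated cell should score higher.
assert f(1, 1, 1, 1) > f(0, 0, 0, 0)
```

Any function with this `(r1, c1, r2, c2) → log-likelihood` shape can in principle be handed to the scan in place of the agent-based disease model.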
aSaTScan™ is a trademark of Martin Kulldorff. The SaTScan™ software was developed under the joint auspices of (i) Martin Kulldorff, (ii) the National Cancer Institute, and (iii) Farzad Mostashari of the New York City Department of Health and Mental Hygiene.
The authors declare that they have no competing interests.
GF Cooper conceived of the basic dynamic programming algorithm presented here, advised and supervised all stages of research, and participated in the drafting of the manuscript. X Jiang performed initial tests and initial implementation of the model selection version of the dynamic programming algorithm and performed the analysis of the number of tilings searched. Y Sverchkov contributed to the development of the model averaging version of the dynamic programming algorithm, implemented and tested all the algorithms presented in this paper, and took the lead in drafting this paper. All authors read and approved the final manuscript.
This work was funded by CDC grant P01HK000086 and NSF grant IIS-0911032.