This is an Open Access article distributed under the terms of the Creative Commons Attribution License (

Ecological niche modeling is a method for estimating species distributions from ecological parameters. Thus far, it has been difficult to determine empirically whether independently generated distribution maps for a single species (maps created through equivalent processes but with different ecological input parameters) differ significantly.

We describe a method for comparing model outcomes that allows a statistical evaluation of whether the strength of prediction and the breadth of predicted areas differ measurably between projected distributions. To create ecological niche models for statistical comparison, we used GARP (Genetic Algorithm for Rule-Set Production) software to generate ecological niche models of human monkeypox in Africa. We created several models, holding the case location input records constant across models but varying the ecological input data. To assess the relative importance of each ecological parameter included in the development of the individual predicted distributions, we performed pixel-to-pixel comparisons between model outcomes and calculated the mean difference in pixel scores. We used a two sample Student's t-test (with the null hypothesis that both maps were identical regardless of which input parameters were used) to examine whether the mean difference in corresponding pixel scores from one map to another was greater than would be expected by chance alone. We also used weighted kappa statistics, frequency distributions, and percent difference to examine the disparities in pixel scores. Multiple independent statistical tests indicated that precipitation was the single most important independent ecological parameter in the niche model for human monkeypox disease.

In addition to improving our understanding of the natural factors influencing the distribution of human monkeypox disease, such pixel-to-pixel comparison tests allow users to distinguish empirically the significance of each of the diverse environmental parameters included in the modeling process. This method will be particularly useful when the output maps appear similar on visual inspection (as with maps generated by other modeling programs such as MAXENT), because it allows an investigator to explore subtle differences among ecological parameters and to demonstrate the individual importance of these factors within an overall model.

Ecological niche modeling is an emerging spatial mapping technology designed to characterize and map the ecological niche distributions occupied by species [

We performed an ecological niche modeling study to describe the distribution of monkeypox disease in humans throughout Africa, its only endemic region [

For our purposes, we utilized the software GARP (Genetic Algorithm for Rule-set Production), which models ecological niches of species and predicts their distributions in geographic space [

The GARP modeling algorithm itself incorporates stringent internal accuracy tests for evaluating the validity of predicted distribution models. Internal model validation occurs both through iterative solving for solution optimization and through division of the input data into multiple training and testing sets for independent confirmation. The distribution output at the conclusion of a modeling session is accompanied by a table of statistics assessing significance, including a chi-square test and resulting

Although internal tests for model accuracy are available within the GARP algorithm, a limitation of this and many other spatial modeling technologies is that they offer no way to compare, in a statistically rigorous fashion, the degree to which individually generated model results agree with one another. For this example, we wished to determine whether a predicted distribution for human monkeypox disease produced from ecological layers

Previously, we used ecological niche modeling software to develop a predicted geographical distribution of human monkeypox disease, shown in figure

Summary of the statistical analysis of the jackknife procedure used to determine the environmental importance of ecological parameters (reprinted with permission from [

| Excluded parameter | Mean difference | Standard deviation | t value | p-value | % difference | Weighted kappa |
|---|---|---|---|---|---|---|
| Aspect | 0.227 | 1.1201 | 75.25 | < .0001 | 14.90913 | 0.8471 |
| Diurnal Temp Range | -0.162 | 1.462 | -41.09 | < .0001 | 16.00786 | 0.8074 |
| Elevation | -0.266 | 1.265 | -78.19 | < .0001 | 14.59951 | 0.8366 |
| Flow Accumulation | -0.014 | 0.9739 | -5.36 | < .0001 | 12.44882 | 0.8683 |
| Flow Direction | | 1.0288 | | < .0001 | 11.30027 | 0.818 |
| Frost Days | -0.005 | 1.0807 | -1.62 | | 13.99543 | 0.8562 |
| Land Cover | | 1.1047 | | < .0001 | 15.73201 | 0.8418 |
| Precipitation | | | | < .0001 | | |
| Minimum Temp | 0.2259 | 1.0138 | 82.75 | < .0001 | 13.24499 | 0.8606 |
| Mean Temp | -0.204 | 1.1033 | -68.74 | < .0001 | 13.09959 | 0.8392 |
| Maximum Temp | -0.298 | 1.4251 | -77.65 | < .0001 | 13.2078 | 0.8063 |
| Topographic Index | -0.026 | 0.957 | -9.94 | < .0001 | 12.70847 | 0.8627 |
| Wet Days | -0.134 | 1.2503 | -39.82 | < .0001 | 14.66416 | 0.833 |

(*) Indicates extreme values of mean difference, standard deviation, t value, and % difference. (§) Indicates the p-value for which exclusion of this parameter from the model caused no significant difference. (±) Indicates the kappa value for which exclusion caused overall model agreement to drop below significance, indicating the model loses internal accuracy without inclusion of this parameter.

To assess the relative importance of each ecological parameter in the development of the model, it was necessary to compare each individual jackknife map statistically with the comprehensive map. This analysis goes one step beyond the scope of previously existing technology. All maps were initially created as ArcInfo grids; we then used a freely downloadable Avenue script to export them as ASCII raster grids, that is, numerical representations of images in which each pixel is a cell holding a specific score for each parameter
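The exported files follow ESRI's ASCII grid layout: a short header of keyword/value lines (ncols, nrows, the lower-left corner coordinates, cellsize and, optionally, NODATA_value) followed by the cell values, one grid row per line. As an illustration only, the minimal Python reader below parses that layout into a header dictionary and a list of rows. It is not the Avenue script or the SAS code used in this study, and the function name `read_ascii_grid` is hypothetical.

```python
def read_ascii_grid(text):
    """Parse an ESRI ASCII raster grid into (header, rows of cell values).

    Header lines are "keyword value" pairs such as ncols, nrows, xllcorner,
    yllcorner, cellsize and the optional NODATA_value; the cell values follow,
    one grid row per line, listed from the northernmost row down.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = {}
    i = 0
    # Header lines start with a keyword; the first line starting with a
    # digit, sign or decimal point is the first row of cell values.
    while i < len(lines) and lines[i].split()[0][0].isalpha():
        key, value = lines[i].split()[:2]
        header[key.lower()] = float(value)
        i += 1
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    values = [float(v) for line in lines[i:] for v in line.split()]
    if len(values) != nrows * ncols:
        raise ValueError(f"expected {nrows * ncols} cells, found {len(values)}")
    # Cut the flat value list back into nrows rows of ncols cells each.
    return header, [values[r * ncols:(r + 1) * ncols] for r in range(nrows)]
```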

ESRI's

After each GARP-generated map was converted to a rasterized ASCII file whose rows and columns correspond to the map's coordinates, every map was a grid of identical size, 886 rows by 739 columns. Next, we created in SAS a one-dimensional array with as many positions as there are x-coordinates (columns) in the map and used it to transform the file from its original grid format into a single column of data. The resulting dataset contained one observation per cell and preserved both the score and the position of each pixel: 654,754 observations (886 × 739), equal to the number of pixels in the original coverage area. This process was repeated for each spatial distribution map we wished to compare (12 jackknife maps plus 1 comprehensive map).
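The reshaping is a row-major flattening: pixel (r, c) of a grid with 739 columns becomes observation r × 739 + c, giving 886 × 739 = 654,754 observations in total. The Python sketch below shows the same idea outside SAS; `flatten_grid` is a hypothetical helper that keeps each pixel's row, column and score so the map can be reassembled afterwards.

```python
def flatten_grid(grid):
    """Reshape a 2-D grid (list of rows) into one record per pixel.

    Each record is (pixel_id, row, col, score) with
    pixel_id = row * ncols + col (row-major order), so the original
    position of every pixel can always be recovered.
    """
    ncols = len(grid[0])
    records = []
    for r, row in enumerate(grid):
        if len(row) != ncols:
            raise ValueError("all grid rows must have the same length")
        for c, score in enumerate(row):
            records.append((r * ncols + c, r, c, score))
    return records
```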

The array datasets from the individual maps were merged into a single dataset with one column per map (each jackknife map and the comprehensive map) and one row per unique pixel position. Each observation consisted of the pixel identifier (its row) and the pixel score (0–10) from each map. To increase computational efficiency, we deleted all observations (i.e., rows) in which every map had a pixel score of zero before beginning the statistical analysis. These 100% niche-absence pixels represent areas that would never include predicted niche for this "species," such as oceans or desert regions. Culling them restricted the data to pixels with at least one non-zero score, which kept the comparisons meaningful and made the statistical calculations more tractable. After this step, 137,857 pixels with at least one non-zero score remained.
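A minimal Python sketch of the merge-and-cull step follows, assuming each map has already been flattened into a list of scores in the same pixel order. The function `merge_and_cull` and any map names used with it are hypothetical; in the study this step was carried out on SAS datasets.

```python
def merge_and_cull(maps):
    """Combine per-map score lists into one row per pixel, dropping all-zero rows.

    `maps` maps a map name to a flat list of pixel scores; every list must
    have the same length and the same (row-major) pixel order.
    Returns {pixel_id: {map_name: score}} for the pixels that are kept.
    """
    names = list(maps)
    n_pixels = len(maps[names[0]])
    if any(len(maps[name]) != n_pixels for name in names):
        raise ValueError("all maps must have the same number of pixels")
    merged = {}
    for pixel_id in range(n_pixels):
        scores = {name: maps[name][pixel_id] for name in names}
        # Keep the pixel only if at least one map predicts some niche there.
        if any(score != 0 for score in scores.values()):
            merged[pixel_id] = scores
    return merged
```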

To complete the dataset for the pixel-by-pixel comparison between each individual jackknife map and the comprehensive map, we created new variables representing the difference in pixel scores between each pair of maps. Consider a particular pixel $n$, with score $x_n$ in the comprehensive map and $y_n$ in a jackknife map; the difference score for that pixel is $d_n = x_n - y_n$. If $d_n = 0$, both maps assign pixel $n$ the same score. A mean difference of zero across all pixels is what the null hypothesis predicts, namely that the jackknife map is identical to the comprehensive map, which would indicate that the missing layer had little or no influence on the predicted niche distribution.
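The difference variables can be sketched as below. This assumes the difference is taken as comprehensive-map score minus jackknife-map score, so a negative value marks a pixel that the jackknife map over-predicts, which is the sign convention implied by the later treatment of negative differences. Both function names are hypothetical.

```python
def difference_scores(comprehensive, jackknife):
    """Per-pixel difference d_n = x_n - y_n (comprehensive minus jackknife).

    Positive d_n: the jackknife map scores the pixel lower (under-prediction).
    Negative d_n: the jackknife map scores the pixel higher (over-prediction).
    """
    if len(comprehensive) != len(jackknife):
        raise ValueError("maps must cover the same pixels")
    return [x - y for x, y in zip(comprehensive, jackknife)]


def mean_difference(diffs):
    """Mean of the per-pixel differences; the null hypothesis expects 0."""
    return sum(diffs) / len(diffs)
```

For example, `difference_scores([5, 10, 0, 7], [3, 10, 1, 7])` gives `[2, 0, -1, 0]`, whose mean difference is 0.25.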

Because the number of pixels compared was large (137,857), the mean pixel difference was assumed to be approximately normally distributed. A two sample Student's t-test was therefore used to evaluate the null hypothesis and to generate statistics including the mean difference, standard deviation,
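The t values in the results table appear consistent with the statistic t = mean(d) / (s / √N) computed on the N = 137,857 per-pixel differences: for Aspect, 0.227 / (1.1201 / √137,857) ≈ 75.2, against the reported 75.25. That is the form of a one-sample (paired) t-test on the differences. The Python sketch below implements that paired form, with a normal approximation for the p-value given the very large N. It is an illustration under these assumptions, not the SAS procedure the authors ran, and the function names are hypothetical.

```python
import math
import statistics


def t_from_summary(mean_diff, sd_diff, n):
    """t statistic for H0: mean difference = 0, from summary statistics."""
    return mean_diff / (sd_diff / math.sqrt(n))


def paired_t_test(diffs):
    """One-sample t-test of the per-pixel differences d_n against zero.

    Returns (mean, sample SD, t, two-sided p). With ~10^5 pixels the t
    distribution is practically normal, so p uses the normal approximation.
    """
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = t_from_summary(mean, sd, len(diffs))
    p = math.erfc(abs(t) / math.sqrt(2))  # P(|Z| >= |t|) for standard normal Z
    return mean, sd, t, p
```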

The t-test and other independent two-group tests evaluate only whether the mean pixel score is the same in both groups, whereas we were also concerned with the distribution of scores. To compare the relative difference of each map, we first computed weighted kappa statistics using the FREQ procedure in SAS. Kappa statistics are most often used to evaluate inter-rater reliability when raters judge a common stimulus. In the map comparison, the 'raters' were the two maps being compared, the items rated were the individual pixels, and the rating was the pixel score each map assigned. The statistics were weighted using the Cicchetti-Allison method, so that disagreements between widely separated scores count as more divergent than disagreements between nearby scores. A kappa value of 1 indicates perfect agreement between raters, and a value of 0 indicates no more agreement than expected by chance. Weighted kappa values between 0.8 and 1 are generally accepted as indicating excellent agreement between raters; values falling below 0.8 may be considered less statistically significant [
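For reference, the Cicchetti-Allison weights give a pair of scores (i, j) the agreement weight 1 − |i − j| / (max − min), and the weighted kappa is (P_o − P_e) / (1 − P_e), where P_o is the weighted observed agreement and P_e the weighted agreement expected from the two maps' marginal score frequencies. The Python sketch below computes it for two maps' pixel scores. It fixes the categories at the GARP scores 0 to 10, which is an assumption: SAS builds the table from the score levels actually present in the data, so results can differ when some scores never occur. The name `weighted_kappa` is hypothetical.

```python
def weighted_kappa(scores_a, scores_b, categories=range(11)):
    """Weighted kappa with Cicchetti-Allison (linear) agreement weights.

    Weight for a pair of scores (i, j): 1 - |i - j| / (max - min), so identical
    scores count as full agreement and 0 vs 10 counts as none.
    """
    cats = list(categories)
    k = len(cats)
    pos = {c: i for i, c in enumerate(cats)}
    n = len(scores_a)
    if n == 0 or n != len(scores_b):
        raise ValueError("need two equal-length, non-empty score lists")
    # k x k contingency table: how often map A scored i while map B scored j.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(scores_a, scores_b):
        counts[pos[a]][pos[b]] += 1
    row = [sum(counts[i]) / n for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]
    span = cats[-1] - cats[0]

    def weight(i, j):
        return 1 - abs(cats[i] - cats[j]) / span

    p_obs = sum(weight(i, j) * counts[i][j] / n for i in range(k) for j in range(k))
    p_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return (p_obs - p_exp) / (1 - p_exp)
```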

The second method for comparing the relative difference of each map was to create histograms showing the frequency distribution of pixel score differences for each ecological parameter's jackknife map relative to the comprehensive map. We generated a 'score' variable by multiplying the number of pixels with a given difference score by that value (e.g., if $d_n = 5$ for 50 pixels in a jackknife map compared with the complete map, the score would be 50 × 5 = 250). For negative differences (indicating over-prediction of the distribution), we multiplied by negative one to obtain a positive score. When a jackknife map had many pixels with scores identical or nearly identical to those of the complete distribution map, its summed score (and hence its mean absolute difference) was close to zero, and it grew as the maps became more dissimilar. Exclusion of precipitation and flow direction again yielded the greatest divergence from the comprehensive map. The distributions of pixel difference scores for 2 exemplar jackknife maps compared with the comprehensive map are shown in Figure
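The histogram counts and the summed 'score' can be computed directly from the per-pixel differences, as in the Python sketch below (function names hypothetical). Because negative differences are made positive before summing, the summed score equals the sum of |d_n| over all pixels, and dividing it by the number of pixels gives the mean absolute difference.

```python
from collections import Counter


def difference_histogram(diffs):
    """Frequency of each difference value d_n (the bars of the histogram)."""
    return dict(sorted(Counter(diffs).items()))


def summed_score(diffs):
    """Sum over difference values of (pixel count x |value|).

    Negative differences are multiplied by -1, so over- and under-prediction
    both add to the score; dividing by the pixel count gives the mean
    absolute difference.
    """
    return sum(count * abs(value) for value, count in Counter(diffs).items())
```

For instance, `summed_score([5] * 50)` returns 250, matching the worked example in the text.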

Finally, we examined the frequency distribution of pixel score differences. Using absolute difference scores, we calculated the percentage of pixel difference scores falling more than one standard deviation from the mean difference in pixel score. Although the exclusion of the flow direction parameter did not stand out on this measure, exclusion of the precipitation parameter nearly doubled the percent difference between the jackknifed map and the comprehensive map (26%, compared with an average of 14%). This result is shown in table
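The text does not spell out exactly which mean and standard deviation define the one-SD band. The Python sketch below takes one plausible reading, counting pixels with |d_n − mean| > SD using the mean and sample SD of that map's differences; treat it as an assumption rather than the authors' exact calculation. The name `percent_outside_one_sd` is hypothetical.

```python
import statistics


def percent_outside_one_sd(diffs):
    """Percent of pixels whose difference lies more than one SD from the mean.

    Assumed reading: count pixels with |d_n - mean| > SD, where mean and SD
    are the mean and sample standard deviation of the map's differences.
    """
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    outside = sum(1 for d in diffs if abs(d - mean) > sd)
    return 100 * outside / len(diffs)
```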

That multiple independent statistical tests showed a significant loss of internal consistency in the overall model when precipitation was left out strongly supported the conclusion that precipitation was the single most important independent ecological parameter in the niche model for human monkeypox disease.

It is broadly accepted that species' range limits are often determined, at their core, by various ecological parameters [

GARP models are sensitive to ecological inputs: we expect most layers to make a meaningful (significant) contribution to the outcome. Furthermore, we expect to see significant differences when a layer is removed; otherwise, the ecological inputs were poorly selected. We are most interested in this statistical method for its capacity to explore subtle differences among the ecological parameters, and we believe its greatest utility lies in revealing the extremes of the individual environmental factors' importance within the overall niche model. For example, the statistics for frost in our model of human monkeypox disease showed that it was not a particularly useful parameter, whereas the statistics for precipitation showed that it was of significant importance. In this study, the differences between the jackknifed models are not subtle, but they are nevertheless in line with our expectations. We anticipate that this method will prove most useful for making meaningful comparisons when distribution maps appear similar on visual inspection.

The method described herein presents a procedure for evaluating the statistical significance of ecological parameters involved in niche modeling. Here we have applied the procedure to output created using the GARP system, but this method is broadly applicable to other spatial modeling technologies as well, such as MAXENT, which others have found to be superior to GARP [

The authors declare that they have no competing interests.

RL and MG conceived of the study, participated in its coordination and execution, and drafted the manuscript. KY created the variables and performed the statistical testing. MW formatted the data. All authors read and approved the final manuscript.

Thank you to the following individuals for their expertise: I.K. Damon, CDC; M.Q. Benedict, CDC; R.C. Holman, CDC; and A.T. Peterson, University of Kansas.

This study was supported by the Centers for Disease Control and Prevention.

The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the funding agency.