This is an Open Access article distributed under the terms of the Creative Commons Attribution License (

Climate may exert a strong influence on health, in particular on vector-borne infectious diseases whose vectors are intrinsically dependent on their environment. Although critical, linking climate variability to health outcomes is a difficult task. For some diseases in some areas, spatially and temporally explicit surveillance data are available, but comparable climate data usually are not. We utilize spatial models and limited weather observations in Puerto Rico to predict weather throughout the island on a scale compatible with the local dengue surveillance system.

We predicted monthly mean maximum temperature, mean minimum temperature, and cumulative precipitation at a resolution of 1,000 meters. Average root mean squared error in cross-validation was 1.24°C for maximum temperature, 1.69°C for minimum temperature, and 62.2 millimeters for precipitation.

We present a methodology for efficient extrapolation of minimal weather observation data to a more meaningful geographical scale. This analysis will feed downstream studies of climatic effects on dengue transmission in Puerto Rico. Additionally, we utilize conditional simulation so that model error may be robustly passed to future analyses.

Transmission of many infectious diseases is conditioned by the environment. Arthropod-borne diseases are particularly susceptible to climatic influence because transmission is reliant on ectothermic vectors that depend on temperature and often precipitation for development and survival [

Our particular interest is the influence of climate on dengue transmission in Puerto Rico. Dengue viruses cause severe morbidity and occasional mortality in people who become infected. Globally, hundreds of thousands of cases are reported annually to the World Health Organization, including tens of thousands of deaths [

The Puerto Rico Department of Health and the U.S. Centers for Disease Control have collected data on reported cases since the 1970s and classified them spatially to the municipal scale based on the patient's residence. Although 76 of Puerto Rico's 78 municipalities are located on a single island 180 km long and 65 km wide, climate varies substantially due to a central mountain range (Figure

Weather is most accurately observed directly at weather stations. Throughout Puerto Rico, 92 stations reported weather observations at some point during our study period. Though the accuracy, temporal resolution, and temporal coverage of these measurements are high, the observations are made at specific geographic points which do not represent the entire island. Moreover, the spatial distribution of stations is neither uniform nor constant through time, with as few as 18 stations reporting temperature in a given month. Previous studies of the association between weather and dengue have treated the spatial limitations of observed weather data in different ways. Some have used averaging [

An alternative to direct observation is using remotely sensed data as a proxy for climate [

Without suitable remotely sensed data to augment estimation, we are left with the observational data. While the temporal coverage and resolution of this data is appropriate, optimization of resolution in the spatial domain is critical to maximize the power of downstream analyses. As mentioned above, previous studies of weather and dengue have neglected to develop and test methods to do this. Here we develop and compare dynamic spatial models utilizing weather station observations to predict weather for the rest of the island using important weather-related covariates including latitude, longitude, altitude, slope, and aspect. Latitude and longitude allow the identification of geographic trends such as the North-South gradient in temperature associated with differing solar exposure. Altitude is an important determinant of temperature because air pressure decreases with increasing altitude. As pressure decreases, air expands, so that heat is adiabatically dispersed. Heat dispersion causes temperature to decrease, increasing water vapor condensation such that clouds form and produce precipitation. While wind moving up a mountain cools and releases precipitation, downsloping wind on the leeward side is drier due to the loss of water vapor and warms as air pressure increases. Slope and aspect (the direction of slope) are critical components of this process, determining how fast warming and cooling occur and where the effects are present geographically.

We analyze three different model types: linear regression, traditional universal kriging, and Bayesian universal kriging (hereto referred to strictly as Bayesian kriging). Linear regression allows prediction for unknown locations based on covariate characteristics. Universal kriging assumes that these covariates do not account for all of the spatial covariance [

In the current paper we focus explicitly on the methodology for estimating weather throughout Puerto Rico on scales consistent with dengue surveillance data. We pay particular attention to model development and assessment with the aim of thoroughly describing the methodology such that it may be used in other settings where different spatial and temporal scales may be relevant. Analysis of the relationship between temperature, precipitation, and dengue transmission in Puerto Rico will be published subsequently.

We first investigated the selection of month-specific covariates for the prediction of maximum temperature (T_{max}), minimum temperature (T_{min}), and precipitation. Altitude entered into every T_{max }model and longitude, latitude, and slope were accepted as covariates in 42%, 28%, and 13% of the 252 monthly models, respectively (Table _{min }models, altitude was again universally included. Longitude (15%) and slope (7%) were the other frequently repeated covariates. Altitude was an anticipated covariate because of altitudinal adiabatic cooling. Latitude was expected to be important due to gradient exposure to solar radiation. However, the more frequent inclusion of longitude in models (42 vs. 28%), suggests that the direction of the prevailing winds may be a more important influence in Puerto Rico. Because latitude and longitude create a grid over the surface, they allow the identification of directional trends such as the cooling and warming that occurs as air moves up and down slopes. Thus the dominant factors are altitude and directionality as represented by latitude and/or longitude. Slope may serve as an additional modifier of the directional and altitudinal effects for transitional areas. Secondary covariates were less frequent in the T_{min }models. T_{min }reflects night-time temperature which is influenced by both the heat capacity of physical surfaces and changes in advection relative to the structure of those surfaces [

Covariate inclusion frequency

Latitude | Longitude | Altitude | Slope | Aspect | |

n (%) | n (%) | n (%) | n (%) | n (%) | |

T_{max} | 71 (28) | 107 (42) | 252 (100) | 32 (13) | 3 (1) |

T_{min} | 0 (0) | 38 (15) | 252 (100) | 17 (7) | 6 (2) |

Precipitation | 141 (56) | 113 (45) | 163 (65) | 9 (4) | 26 (10) |

In precipitation models, the frequently included covariates were altitude (65%), latitude (56%), and longitude (45%, Table

We used linear regression, universal kriging, and Bayesian kriging to predict monthly mean observations for island-wide grids of various resolutions. Assessing the fit of each model is problematic because the limited number of observations makes it desirable to use as many observations as possible for model fitting. For this reason we use cross-validation; each observation is predicted by a model fitted with all observations except the one to be predicted [

_{max }(A), T_{min }(B), and precipitation (C). Mean RMSE is significantly different between models at all resolutions for T_{max }(Bonferroni-corrected, paired _{min }mean RMSE is significantly lower for the linear regression model at all resolutions. Universal and Bayesian kriging models are significantly different at 3,000, 5,000, and 10,000 meters. Precipitation mean RMSE was significantly higher for linear regression models than for universal or Bayesian kriging models, which were not significantly different at any resolution.

Allowing spatial covariation in the temperature models decreased the fit of the models; the simple linear regression model had significantly lower mean efficient cross-validation RMSE at all resolutions compared to the universal kriging or Bayesian kriging models for both T_{max }and T_{min }(Figure

The precipitation models, in contrast, contained significant residual spatial variance in the linear regression model as evidenced by the reduced RMSE of universal and Bayesian kriging models (Figure

Cross-validation error

Linear regression | Universal kriging | Bayesian kriging | |

RMSE (range) | RMSE (range) | RMSE (range) | |

T_{max }(°C) | 1.24 (0.71–2.13) | 1.25* (0.74–2.20) | 1.23 (0.76–2.39) |

T_{min }(°C) | 1.69* (0.77–3.15) | 1.71* (0.77–3.15) | 1.83* (0.88–3.16) |

Precipitation (mm) | 69.2* (13.4–210.3) | 65.8* (13.4–210.3) | 62.2* (13.4–194.6) |

*models significantly different by Bonferroni-corrected paired

The resolution of the model must balance the accuracy and utility of high resolution with the covariate smoothing effects of low resolution. Because we used cross-validation to test the model, we expect the models to perform better at high resolutions where the focus over which geographic features are smoothed is much smaller and thus may be closer to those of the actual station. However, slight changes in grid resolution may increase or decrease the similarity between the characteristics of the prediction pixel and the observation site, so there may be significant flux in the relationship at small intervals.

As expected, T_{max }models exhibited increased error with decreased resolution (Figure _{min }models generally had the opposite trend with error decreasing between 500 and 1,000 meters and again, more drastically, between 1,500 and 3,000 meters. This suggests that the determinants of T_{min }occur over larger areas so that smoothing enhances prediction even as it reduces the specificity of local covariates. The local minima present at 1,000 and 3,000 meter resolutions in the T_{min }model also occur in the precipitation model. The regular occurrence of minima at 1,000 and 3,000 meters suggests that these are the most appropriate resolutions for this data set. Because the 1,000 meter resolution was the minima in precipitation models, close to the minima in T_{max }models, a local minima in T_{min }models, and provides a relatively fine resolution when compared to the smallest administrative area relevant to disease surveillance (approximately 12.5 km^{2}), we select it for our final models.

Completely refitting each final model for cross-validation at 1000 m resolution, the mean RMSE was 1.24°C for T_{max}, 1.69°C for T_{min}, and 62.2 millimeters for precipitation (Table _{max }and T_{min }models, increased cross-validation error occurred at higher altitude, likely due to under-representation in observations from those areas. The same problem occurred in the precipitation model, with the majority of error confined to the particularly rainy areas around the El Yunque peak in the northeastern corner of the island. Cross-validation error translates to prediction error (Figure

_{max }(linear model, °C), T_{min }(linear model, °C), and precipitation (Bayesian kriging model, mm) at 1,000 meter resolution.

_{min }(linear model, °C), and precipitation (Bayesian kriging model, mm) predictions at 1,000 meter resolution.

In the precipitation model, spatial covariance is a critical model parameter. As such, prediction error is very different, principally reflecting station proximity. For example, Culebra, the small island northeast of the mainland, had no observations during this time period. Because of its distance from observation stations, there is very little spatial information to augment covariate information and the prediction error is high. The fact that prediction error increases with distance from observation points is a critical consideration. Kriging models will fit best for areas where observations are relatively close. Though spatial heterogeneity of precipitation is significant even on small scales [

Because we wish to use this model in studies of arbovirus disease transmission, it is critical to have model output that represents the error in the model. Bayesian analytical techniques lend themselves to this purpose because they produce outcome distributions based on simulations that maintain the covariance structure of the model. Simulations can be used in later analyses to account for the possibility of consistent bias in the weather model. The covariance in the simulations preserves model error thus increasing the robustness of downstream analyses.

Figure _{max}, T_{min}, and precipitation in the municipalities of coastal San Juan and mountainous Adjuntas relative to observations when available. The mean and intervals represent the distribution of estimates for each municipality based on 1,000 simulations of each 1 km^{2 }grid point within the municipality. In San Juan, though observations were not made at all time points, the model predicts those observations based on the stations located in other municipalities. Although there are areas where the model appears to fit the data less accurately, the true time series is susceptible to natural local variation or problems with instrumentation, effects that are reduced by utilizing information collected at other sites. In Adjuntas, observations are more consistent throughout the time series. The model uniformly underestimates T_{max }and overestimates T_{min}. This may be due to the specific station location or the fact that Adjuntas is a relatively high-altitude municipality and there are few high-altitude observations.

_{max }(red) and T_{min }(blue) are predicted maximum and minimum (respectively) mean monthly temperatures for each municipality. Precipitation (light blue) is predicted total monthly precipitation. Black lines are actual observations from stations within each municipality.

We have developed temperature and precipitation models for Puerto Rico that enhance the spatial resolution of raw observational data. The methods presented use basic physical characteristics and spatial information to efficiently predict weather for many locations over an extended time period without making

The raw data used here is discontinuous, spatially-disperse weather observations and globally available altitude data. The ability to convert this data into weather predictions at a meaningful spatiotemporal scale is an invaluable tool for further research. Climate varies drastically in time and space. The impact of this variability on health outcomes must therefore be measured at a relatively fine scale and compared to weather patterns at an equally fine scale. Our model provides a mechanism to address the latter problem. As always, models are limited by the data used to create them, but full utilization of that data, robust cross-validation, and conditional simulation can produce useful predictions with robust error consideration.

Climate observations were obtained from the National Climate Data Center (NCDC) [

Altitude, slope, and aspect were derived from the 3 arc-second digital elevation model (DEM) of the Shuttle Radar Topography Mission [

Rasters for model output were created as grids with resolutions of 500 to 10,000 meters. This range of resolutions was selected to explore trade-offs between accuracy, computational efficiency, and final resolution. The DEM layer was smoothed to a resolution equivalent to the specified grid size using a square unit filter of size equivalent to the grid resolution using Spatial Modeler in ERDAS Imagine. Covariates for each grid location were extracted from this layer. The filtering ensures that the covariates represent the average for the grid square rather than a point estimate for its center.

Three different models were assessed: linear regression which ignores the spatial component of the data, and universal and Bayesian kriging models which incorporate spatial structure [

for climate variable _{1}..._{P}, indexed by station location

where ^{2 }parameterize a spatial covariance model. The range, ^{2}, defines the covariance beyond ^{2}, the covariance at zero distance (estimated as a non-zero value due to measurement error and limited sampling at short distances). The statistical package R (version 2.6.0) [

All models were developed so that parameterization and prediction occurs exclusively on a monthly scale. For each month, covariates were selected dynamically using stepwise F-tests for inclusion of the covariates latitude, longitude, altitude, slope, and aspect. This approach was taken to allow temporal flexibility because regular and irregular factors, such as hurricanes or changing direction of prevailing winds, may be critical. Another possible approach is to force some variables into the model [

Internal cross-validation was utilized to measure model fit due to the paucity of data. Observations from each station were predicted with data from all the other stations. Initially, for computational efficiency, models were fitted using the complete dataset and prediction was informed by a reduced dataset including all stations except for the prediction site. This omission only affects the kriging models because outcomes for other sites are incorporated into their covariance structure. For selected analyses where greater precision was required, the more computationally demanding method was used in which the model was completely refit for each prediction.

Fit was assessed as the root mean square error (RMSE),

where _{i }is the observation for each station location (

Final models were used to generate prediction distributions conditioned on the model covariance structure. For the Bayesian model, this is a natural extension as model parameters are assumed to be distributed rather than fixed best estimates. For the linear regression model conditional simulation is accomplished by deriving parameter sets conditioned on the linear model parameter estimates and using these for prediction simulations. This was performed using the R package spBayes [

The authors declare that they have no competing interests.

Both authors participated in study design and manuscript revision. MAJ did the analysis and drafted the manuscript. Both authors read and approved the final manuscript.

GEG was supported by grant U01 GM070708-04 from NIH 'Computational Models of Infectious Disease Threats.'