This paper has two aims: (1) to summarize various geographic information science methods; and (2) to provide a review of studies that have employed such methods. Though not meant to be a comprehensive review, this paper explains when certain methods are useful in epidemiological studies and also serves as an overview of the growing field of spatial epidemiology.
In this paper, we review the use of Geographic Information Systems (GIS) and spatial analysis in environmental epidemiology and public health research. Spatial epidemiologists, health geographers, and others using geographic methods have made significant contributions to understanding potential exposure pathways in space and time, mechanisms that may influence effective biological dose, modeling of the social distributions of pollutants, and finally the assessment of health effects from environmental contaminants. There has also been considerable attention paid to the perceptions of environmental risk and how this may in turn condition biological responses to pollutants or lifestyle factors such as smoking, which may affect subsequent individual-level susceptibility.
The focus here is on the quantitative aspects of environmental risks and how health geographers and others have approached the assessment of risks arising from environmental exposures. Our emphasis is on methods used to study environmental exposures, susceptibilities, ways of adapting, and ultimately the health risks of environmental exposures to human populations. Although we touch upon some of the historical aspects of the use of spatial analysis in public health research, we have drawn specifically on recent research published between 2005 and 2008 to emphasize innovations and emerging trends in the field. Interestingly, this review suggests extraordinarily rapid growth in the use of advanced geographic information science and spatial modeling for addressing questions of environmental risk. This growth has meant that spatial analysis is increasingly applied by people from disciplines beyond geography.
To illustrate the utility of specific methods, we draw examples related to environmental justice, atmospheric pollution, and climate change. We aim the paper at a broad audience who may be unfamiliar with epidemiology and spatial analysis; therefore, some technical details are omitted. Numerous references are given on the statistical models for readers interested in operationalizing these methods, as well as specific examples.
Here we translate Mayer’s [
Modeling combines both visualization and exploration techniques, and the statistical analysis assesses whether spatial patterns apparent in the data have occurred by chance or whether they display significant departures from random or control expectation. Spatial modeling usually focuses on data in the following forms: points (e.g., the location of individuals who have died in a given period), point attribute (e.g., estimates of pollution at a fixed-site monitor), areal form (e.g., a census tract polygon with an age-adjusted mortality rate), or continuous surface form (e.g., surfaces of pollution interpolated from estimates of fixed-point attributes). Point pattern maps are referred to as “dot” or “dot density” maps. Areal data maps are called “choropleth” maps. Maps displaying continuous surfaces are usually referred to as “contour”, “isoline” or “isopleth” maps [
Overlay analysis is the simplest form of spatial modeling, and consists of stacking different thematic maps on top of one another. This method was employed by Lindley
Conurbation-scale risk assessment was performed to evaluate an entire urban system as well as to provide a basis for neighborhood-level analyses. As in the conceptual framework introduced earlier, the authors defined risk as an interaction among hazard, exposure, and vulnerability. This methodology uses GIS to create separate maps of various risk elements (
To demonstrate the method, the authors used conurbation-scale risk assessment to analyze how socio-economic change will affect the risk of heat stress (see
The authors reported this methodology to be valuable for several reasons. Firstly, since each risk element is represented as a separate layer, it is possible to modify each element individually to re-assess the final risk layer. This allows planners to easily evaluate different adaptation strategies to determine how best to mitigate the risk faced by urban areas due to climate change. Secondly, by developing this GIS method it is possible not only to identify current areas where adaptation is most necessary to deal with the risks posed by climate change, but also possible to identify areas that are most at risk in the future. Finally, to perform the conurbation-scale risk assessment, the authors used previously generated data to create the various GIS layers. By using the best available data, it was possible to produce results rapidly, which will become increasingly necessary in order for urban areas to adapt swiftly to climate change.
By employing conurbation-scale risk assessment, the authors demonstrated the usefulness of visualization and cartographic overlay. This assessment is efficient and can be completed relatively quickly since it utilizes the best available spatial data rather than creating new data. It also allows researchers to easily compare various risk scenarios to discern the proper adaptive approach to climate change.
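The layer-by-layer logic described above can be sketched in a few lines. This is a minimal illustration only, assuming each risk element has already been rasterized to the same grid and rescaled to the range 0 to 1. The multiplicative combination rule, the layer values, and the function name `overlay_risk` are assumptions made for illustration and are not the scoring scheme used by the authors.

```python
def overlay_risk(hazard, exposure, vulnerability):
    """Combine three co-registered grids cell by cell.

    The multiplicative rule is a hypothetical choice for this sketch.
    """
    rows, cols = len(hazard), len(hazard[0])
    for layer in (exposure, vulnerability):
        if len(layer) != rows or any(len(r) != cols for r in layer):
            raise ValueError("all layers must share the same grid")
    return [
        [hazard[r][c] * exposure[r][c] * vulnerability[r][c] for c in range(cols)]
        for r in range(rows)
    ]

# Hypothetical 2 x 2 layers, each scaled to 0-1.
hazard = [[0.2, 0.8], [0.5, 1.0]]
exposure = [[1.0, 0.5], [0.0, 1.0]]
vulnerability = [[0.5, 0.5], [1.0, 0.5]]

risk = overlay_risk(hazard, exposure, vulnerability)

# Re-assess the final layer after changing a single element, for example
# a hypothetical adaptation strategy that halves vulnerability everywhere.
adapted = [[v * 0.5 for v in row] for row in vulnerability]
risk_adapted = overlay_risk(hazard, exposure, adapted)
```

Because each element is held as a separate layer, only the modified layer needs to change before the final risk surface is recomputed.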
Other research uses overlay analysis to identify areas of environmental justice concern. Environmental injustice occurs when a social group is disproportionately affected by harmful land uses. This has become an increasingly important topic in the study of health disparities. Researchers have recently examined the health risks of residential racial segregation. In the paper titled
First we will discuss methods for assessing autocorrelation among observations. Tobler’s [
Using ambient air pollution as an example, we might expect pollution levels to be more similar between Pittsburgh and Johnstown (a nearby city) than between Pittsburgh and Seattle. This may occur because of similarities in the underlying social and economic processes that cause pollution (e.g., manufacturing base) or atmospheric processes that suspend pollutants over large distances and disperse pollutants from region to region (e.g., prevailing wind patterns). Usually the level of spatial autocorrelation would diminish as a function of distance between the two regions, unless there is some reason for similarity due to industrial structure or some other factor associated with the pollution phenomenon such as transportation emissions. Autocorrelation tests use point, line, or area features that have attribute values attached to them. One important distinction in these tests is whether they measure global or local autocorrelation.
Global autocorrelation tests measure the tendency, across all data points, for higher (or lower) values to correlate more closely together in space with other higher (or lower) values than would be expected if the data points were drawn from a random distribution. Several tests of global autocorrelation are available, with the Moran’s I being the most common. Positive values of the Moran’s I [
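For readers who want to see the computation, the following is a minimal sketch of the standard global Moran's I, I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where S0 is the sum of all weights. The four-region chain, the binary contiguity weights, and the attribute values are hypothetical, and no significance test is included.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / s0) * (cross / variance_sum)

# Hypothetical chain of four regions; adjacent regions are neighbors.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth_trend = morans_i([1, 2, 3, 4], weights)  # similar values adjoin
checkerboard = morans_i([1, 4, 1, 4], weights)  # dissimilar values adjoin
```

In this toy example the smooth trend yields a positive value and the alternating pattern yields a negative one.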
Sometimes global relationships are of less interest than local relationships or clusters that may display non-stationarity. Local indicators of spatial association (LISA), such as the local Getis-Ord (G) and local Moran’s I statistics, can assess clustering in small areas to identify clusters or “hot spots” of high or low values (see [121–126] for computational details). These local statistics usually break the study area into smaller regions to determine if local areas have attribute values that are higher or lower than would be expected based on the global average or a random expectation for the entire study area. Using the G statistic to investigate mortality in the American Cancer Society (ACS) Cancer Prevention II Study in 1982 on approximately 550,000 subjects followed for vital status until 1989, we found a significant mortality cluster in the lower Great Lakes area (see
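A local statistic of this kind can be sketched as follows. The code computes the Gi* form of the Getis-Ord statistic, in which each location counts as its own neighbor, and returns one standardized score per location. The five-location chain and its values are hypothetical, and the sketch is not the procedure applied to the ACS cohort.

```python
import math

def getis_ord_gi_star(values, weights):
    """Standardized Getis-Ord Gi* score for every location.

    weights[i][j] should be non-zero for neighbors of i, including i itself.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        w_sum = sum(weights[i])
        w_sq = sum(w * w for w in weights[i])
        local_sum = sum(w * v for w, v in zip(weights[i], values))
        denom = s * math.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        scores.append((local_sum - mean * w_sum) / denom)
    return scores

# Hypothetical chain of five areas with high values at one end.
values = [1, 1, 1, 10, 10]
weights = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
scores = getis_ord_gi_star(values, weights)
hot_spot = scores.index(max(scores))
```

Large positive scores mark clusters of high values and large negative scores mark clusters of low values; in practice the scores are compared against a reference distribution with an adjustment for multiple testing.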
A major issue in the assessment of global or local spatial autocorrelation is the selection of a “spatial weights” or “connectivity” matrix. To assess autocorrelation, it is necessary to assign a matrix that formalizes the potential for spatial dependence. The simplest form of connection is the nearest neighbor approach using a series of polygons such as the census tracts in
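As one possible construction, the sketch below builds a row-standardized k-nearest-neighbor weights matrix from centroid coordinates. The coordinates, the choice of k, and the function name `knn_weights` are hypothetical; contiguity-based or distance-band weights are equally common alternatives.

```python
import math

def knn_weights(points, k):
    """Row-standardized k-nearest-neighbor spatial weights matrix."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    matrix = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(points):
        distances = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        for _, j in distances[:k]:
            matrix[i][j] = 1.0 / k  # each row sums to one
    return matrix

# Hypothetical tract centroids along a line.
centroids = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
w = knn_weights(centroids, k=1)
```

Nearest-neighbor weights need not be symmetric: here the third centroid treats the second as its neighbor, but not the reverse. The choice of matrix can therefore change the autocorrelation estimate.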
Other methods can examine more than one type of event and multiple confounding variables at once, yielding a more informative control for confounding and assessment of autocorrelation. The generalized linear mixed models, generalized additive models (GAM), and Bayesian models are some techniques that allow for adjustment of spatial confounding (e.g., residential clustering by age and race) [
In an example from the literature, Webster,
Interpolation is a process whereby known data points are used to infer values over a space between the points to create a continuous surface. For example, data from a network of pollution monitoring stations may be interpolated to estimate the most likely values between sample locations. There are several different types of interpolation, including kriging, inverse distance weighting, splining and Thiessen polygons [
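Of the interpolators listed, inverse distance weighting is the simplest to write down: the estimate at an unsampled location is a weighted average of the known values, with weights proportional to 1 / distance^p. The sketch below assumes projected coordinates, a power of 2, and hypothetical monitor readings.

```python
import math

def idw(known, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    known  : list of (x, y, value) tuples from monitoring sites
    target : (x, y) location where a value is needed
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            return value  # target coincides with a monitor
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical monitors: (x, y, concentration).
monitors = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_estimate = idw(monitors, (1.0, 0.0))
at_monitor = idw(monitors, (2.0, 0.0))
```

Unlike kriging, this estimator supplies no standard error, so it gives no indication of where the surface is less reliable.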
A special type of optimal interpolation known as “kriging” can be used to generate predicted values and their standard errors. These standard errors show where the interpolation tends to be less reliable. Kriging models exploit spatial dependence in the data to develop smoothed surfaces. The spatial dependence can be divided roughly into two broad categories. First-order effects measure broad trends in all the data points such as the global mean, whereas second-order effects measure local variations at shorter distances between the points [
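Kriging implementations commonly begin by summarizing second-order structure with an empirical semivariogram, gamma(h) = sum of (z_i - z_j)^2 over pairs separated by roughly h, divided by twice the number of such pairs. The sketch below illustrates only that preliminary step, under the assumption of a typical workflow; it does not fit a variogram model or solve the kriging equations, and the sample values and bin settings are hypothetical.

```python
import math

def empirical_semivariogram(samples, bin_width, n_bins):
    """Average half squared difference between pairs, by distance bin.

    samples : list of (x, y, value) tuples
    Returns one semivariance per bin, or None for bins without pairs.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            xi, yi, zi = samples[i]
            xj, yj, zj = samples[j]
            lag = math.hypot(xi - xj, yi - yj)
            b = int(lag // bin_width)
            if b < n_bins:
                sums[b] += (zi - zj) ** 2
                counts[b] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Hypothetical transect of four monitors.
samples = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0), (3.0, 0.0, 7.0)]
gamma = empirical_semivariogram(samples, bin_width=1.5, n_bins=3)
```

A semivariance that rises with distance, as in this toy transect, indicates that nearby observations are more alike than distant ones, which is the dependence kriging exploits.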
A third type of modeling deals with the intensity of point patterns over space. This type of modeling addresses the hypothesis that the intensity of point clustering in a given area differs significantly from a random (or control) pattern observed in the entire study area [
At the regional level, Fisher
For certain types of disease analysis, point pattern analysis can provide useful insights (e.g., the incidence of asthma among young adults 20−44 years old). Data for this example came from a respiratory health survey administered in 1993−94 [
In both examples above, we explored “first order” intensity or the tendency of some areas to display a higher density of point cases. Other point pattern analyses like Ripley’s K function seek to assess “second order” effects that measure spatial interaction between the points at various distances [
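A naive version of the K function can be written directly from its definition: count the ordered pairs of points lying within distance d of one another and scale by the study area. The sketch below applies no edge correction, which a real analysis would need, and uses the n(n - 1) normalization (some texts use n^2). The point coordinates and study area are hypothetical.

```python
import math

def ripley_k(points, distance, area):
    """Naive Ripley's K estimate at one distance, without edge correction."""
    n = len(points)
    pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= distance:
                pairs += 1
    return area * pairs / (n * (n - 1))

# Four hypothetical facilities packed into one corner of a 10 x 10 region.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_observed = ripley_k(points, distance=1.0, area=100.0)

# Under complete spatial randomness the expected value is pi * d^2.
k_random = math.pi * 1.0 ** 2
```

An observed K well above the random expectation at a given distance suggests clustering at that scale; values below it suggest regular spacing.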
A final type of modeling deals with spatial association or correlation between two or more attribute values at the same location. For example, we may wish to predict mortality rates in given areas with other attribute data such as socioeconomic, lifestyle, and pollution exposure variables. This approach then becomes similar to regression analysis (see, e.g., [
In another recent study, Jerrett
The researchers found large significant associations between particulate air pollution and mortality, with especially elevated risks for ischemic heart disease. Risks using this intra-urban exposure assessment were more than two times greater than shown in earlier studies that were based on central monitoring data and used exposure contrasts between cities rather than within them.
Importantly, the researchers were able to examine the residual mortality spatially through multilevel modeling. Figures below show the residual mortality pattern present when only the individual risks are included in the model with no pollution term (
These recent methodological advances, with the use of sophisticated Bayesian methods and with multilevel analyses, represent a major new direction in the field. In both instances, confidence in the observed health effects increased substantially with the examination of residual spatial patterns in the data. Removal of these patterns with inclusion of the environmental pollution variables provided stronger evidence that the associations did not occur by chance.
Much of the current quantitative work in spatial analysis assigns estimates of exposure to the home address and occasionally to workplace or school locations. Exposure surfaces can be assigned through raster grid cells or as points in a vector-based lattice. The result is a high-resolution estimate of potential ambient exposure across the entire urban area that can be assigned to the subjects’ addresses through the geocoder file that converts alphanumeric street addresses to a longitude-latitude coordinate or equivalent projected coordinate system such as the Universal Transverse Mercator system.
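The assignment step can be sketched as a simple grid lookup. The example assumes the addresses have already been geocoded to the same projected coordinate system as the raster (for instance, UTM metres), that the raster is north-up with square cells, and that all coordinates and concentrations shown are hypothetical.

```python
def sample_raster(grid, x_min, y_max, cell_size, x, y):
    """Return the value of the raster cell containing the point (x, y).

    grid  : list of rows, with row 0 at the northern edge
    x_min : x coordinate of the western edge of the raster
    y_max : y coordinate of the northern edge of the raster
    Returns None when the point falls outside the raster.
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        return None
    return grid[row][col]

# Hypothetical 2 x 3 exposure surface with 100 m cells.
surface = [
    [8.0, 9.5, 12.0],
    [7.5, 10.0, 14.5],
]
x_min, y_max, cell = 500000.0, 4100200.0, 100.0

home_a = sample_raster(surface, x_min, y_max, cell, 500250.0, 4100150.0)
home_b = sample_raster(surface, x_min, y_max, cell, 500050.0, 4100020.0)
outside = sample_raster(surface, x_min, y_max, cell, 499990.0, 4100050.0)
```

Positional error in the geocoded address carries straight through to the assigned exposure, most noticeably where the surface changes sharply between adjacent cells.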
Although home and work locations are useful, most studies have not assigned exposures based on the “activity space” occupied by individuals. Studies conducted by Kwan [
Remote sensing has emerged as an important innovation in the exposure sciences. Remote sensing can be defined as “the acquisition and measurement of data/information on some property(ies) of a phenomenon, object, or material by a recording device not in physical, intimate contact with the feature(s) under surveillance” [
Because routinely collected satellite data capable of measuring parameters that estimate ground level concentrations are generally of coarser resolution than the 500 m distance selected as a guide for traffic impacts [
The Multi-angle Imaging SpectroRadiometer (MISR) is another space-based instrument capable of estimating AOT. This instrument has a minimum grid size of 17.6 × 17.6 km, and temporal coverage of the Earth every nine days [
Special studies have used Light Detection and Ranging (LiDAR) to augment other meteorological and ground-level data for understanding the spatial and temporal dimensions of aerosols [
Increasingly, land cover information is derived partly or wholly from remotely sensed imagery. For example, as mentioned earlier, the US Multi-Resolution Land Characteristics Consortium of federal agencies has purchased and processed Landsat 7 images to classify land cover for the National Land Cover Database, which encompasses the entire US [
Processed images may also supply useful information as input to exposure models. As an example, the normalized difference vegetation index (NDVI) can be used to derive estimates of vegetative cover (see
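The index itself is a simple band ratio, NDVI = (NIR - Red) / (NIR + Red), computed pixel by pixel from near-infrared and red reflectance. A minimal sketch follows; the reflectance values are hypothetical, and which sensor bands supply the two inputs depends on the instrument.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel.

    Returns None when both reflectances are zero (index undefined).
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total

def ndvi_grid(nir_band, red_band):
    """Apply the index cell by cell to two co-registered bands."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]

# Hypothetical reflectances: a vegetated pixel and a paved pixel.
nir_band = [[0.5, 0.2]]
red_band = [[0.1, 0.2]]
index = ndvi_grid(nir_band, red_band)
```

Values range from -1 to 1, with dense green vegetation toward the upper end and bare or built surfaces near zero.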
Many of the current exposure models used to predict pollutant concentrations at a fine scale utilize ground-based information on pollutant concentrations, land use and traffic. In some instances, the geographic accuracy of these ground data may be of variable or questionable quality. Remotely sensed imagery of high resolution can be used as cross-validation against which to compare these ground data. Some examples include the location of pollution monitoring stations operated by government entities. Although increasingly these sites are marked with GPS coordinates, some error in the GPS coordinates can occur, and sites that rely on coordinates taken from paper maps may have large errors. Digital orthophotos or high-resolution IKONOS or QuickBird images, at 1−5 m resolution, can increase the spatial accuracy of the data used as input to land use regressions (e.g., [
Understanding the interface between scientific research and policy action is a complex and multifaceted undertaking. Prevention policies designed to protect public health usually involve the knowledge base, political will to act, and social strategy to accomplish change [
Some studies have had direct impact on policy. For example, the aforementioned study by Jerrett
The Office of Environmental Health Hazard Assessment (OEHHA) in California has formed a working group with the California Integrated Waste Management Board to assess cumulative environmental impacts and make policy recommendations in accordance with the Cal EPA Environmental Justice Action Plan. Members of this group titled the Cumulative Impacts and Precautionary Approaches (CIPA) Work Group come from industry, academia, and environmental and community groups to collaborate and develop feasible solutions to minimize the effect of adverse environmental impacts. Moreover, environmental justice arguments are being heard in the California legislature with the passage of Assembly Bill (AB) 32, which, as a part of the Global Warming Solutions Act, requires California to reduce greenhouse gas emissions to 1990 levels by 2020. This bill specifically mandates that an Environmental Justice Advisory Committee convene and advise the California Air Resources Board on the development of the planning and implementation of AB 32. Although direct linkages to specific studies are hard to determine, the works of Rachel Morello-Frosch and Jesdale appear to have influenced the consideration of cumulative effects and environmental justice in California because both these scholars are now on the academic partner’s team of CIPA.
This paper has reviewed concepts and methods of spatial analysis used in spatial epidemiology and public health research. Examples from published and ongoing studies served to illustrate the strengths and weaknesses of different types of spatial analysis. We have supplied a reasonably complete summary of the field, but have omitted some point pattern and multivariate methods. For example, principal components analysis may be used to characterize neighborhoods by extracting closely related components of variables describing the social, economic, and demographic characteristics of neighborhoods. The component scores can be mapped and local autocorrelation statistics can be applied to assess hot spots of low socioeconomic status or other areas likely to experience poor health [
Through this review, we have underscored the key limitations of each method and approach. Other perennial issues related to spatial analysis in a health context deserve mention. First is the ecological fallacy. In deriving group rates for display and analysis in choropleth form, aggregation from the individual to the spatial unit can lead to incorrect inferences about individuals (referred to as the “cross-level” bias). This issue has been examined in many studies, and while a thorough review is beyond the intent of this paper, ecologic bias may lead to incorrect inference about associations between risk factors and individual health [
Relying on small units can lead to low counts of health data and subsequently unreliable rates, especially for rare diseases and events such as mortality. Various techniques have evolved for dealing with the “small numbers problem” in disease mapping [
Some of the point pattern techniques discussed earlier rely on simulated data and Monte Carlo distributions to overcome the problem of small counts by using data from larger areas created by buffers that circle a point representing a health outcome or the centroid of an existing administrative unit such as a census tract [
Finally, in most spatial analyses, controlling simultaneously for all known risk factors is problematic, and analysts may have to rely on both temporal and spatial methods. This is especially true for acute exposures that elicit a health response within a short time frame. For example, Poisson regressions of mortality counts on air pollution and weather variables, with appropriate adjustment for serial autocorrelation, build in automatic control for confounding because individuals experiencing health effects are unlikely to change their job, lifestyle, diet, and other risk factors within a short period of 1−3 days [
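The core of such a time-series model can be sketched as a Poisson regression with a log link, log E[deaths] = b0 + b1 * pollution, fitted here by Newton's method. This is a deliberately stripped-down illustration: it has a single covariate, omits the weather terms and the adjustment for serial autocorrelation described above, and uses hypothetical data constructed to lie exactly on a known curve.

```python
import math

def fit_poisson(x, y, iterations=25):
    """Fit log E[y] = b0 + b1 * x by Newton's method (one covariate).

    Assumes x is not constant; otherwise the slope is not identifiable.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical daily pollution levels and expected death counts that
# follow exp(2.0 + 0.1 * pollution) exactly.
pollution = [0.0, 1.0, 2.0, 3.0, 4.0]
deaths = [math.exp(2.0 + 0.1 * p) for p in pollution]

b0, b1 = fit_poisson(pollution, deaths)
relative_risk = math.exp(b1)  # rate ratio per unit increase in pollution
```

The exponentiated slope is read as the relative change in the daily death rate per unit increase in the pollutant.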
Given the potential of these methods, what are their prospects for future use in environmental health research? We will probably see further proliferation of spatial analysis as the methods become more familiar to researchers outside of medical geography and spatial epidemiology. The largest challenge to the expanded use of GIS and allied methods for health surveillance relates to data availability, consistency, and cost. In the United States, the myriad of private medical care suppliers will probably make the task of developing national level data capable of supporting spatial analysis even more difficult. Thus, while the knowledge and the technology are available to utilize spatial analysis in Public Health, the institutional structures for data collection, management, and dissemination are lagging. Until these structures are developed and put in place, spatial analysis will remain in the realm of a specialized approach for specific studies where data are available. While the development of “infostructure” may seem costly, the expense amounts to a rounding error on the expenditures currently made in traditional medical care.
Through this review some central conceptual issues and trends have emerged. In examining the trends, there has been a remarkable growth in the use of advanced spatial modeling that appears an essential component of spatial epidemiology and public health. Use of GIS and spatial analysis is now commonplace in many research projects and health departments, oftentimes not involving traditional health geographers.
On the assessment of health risks, the methodological advent of multilevel models and the substantive idea of contextual influences on health have done much to increase the sophistication of, and insight into, how environmental risks are both conditioned and confounded by numerous social and neighborhood factors. The use of multilevel models has deepened insight into health risks. In some of the more advanced models, the spatial approach has led to much higher confidence in the empirical results, and the demand for this kind of modeling, in a field always at the interface between science and policy, appears likely to grow.
Other future trends are also apparent. GPS devices and activity monitors have given researchers the capacity to move beyond relatively static geographies of risk, with exposures assigned largely to the home address, to characterize mobility and activity while in the exposure space or what Hägerstrand called the “hazard fields”. Interesting and counterintuitive findings are emerging from such studies. For example, Briggs
Although still in its infancy, remote sensing holds promise for studying environmental exposures and even for characterizing susceptibilities, particularly in poorer regions that may lack digitized mapping data. Remote sensing as presented through Google Earth has also awakened the geographic imagination in ways that go beyond the traditional academy and places where health geography is typically practiced. Numerous sites have now used Google Earth to map environmental exposures and risks. Combined with more systematic efforts of web-based mapping [
This paper has reviewed the rationale for GIS and spatial analysis in environmental and public health research, with an emphasis on earlier arguments by Mayer [
Support for this work was provided by grants from the California Air Resources Board (Grant No. 55245A), the Tides Foundation, the Berkeley Center for Environmental Public Health Tracking (Grant No. CDC U19/EH000097-05), Cooperative Agreement Number 5U38EH000186 (National Environmental Public Health Tracking Program Network Implementation) from the Centers for Disease Control and Prevention, and the National Cancer Institute USC Centers for Transdisciplinary Research on Energetics and Cancer (Grant No. UA54CA116848).
Extended Conceptual Framework for Spatial Analysis in Epidemiology and Public Health (Adapted from Jerrett, Gale and Kontgis, 2009 [
Application of conurbation-scale risk assessment.
Lindley
Using conurbation-scale risk assessment to analyze heat stress risk.
This figure is adapted from Lindley,
Local Mortality Cluster as Measured by the Getis-Ord Statistic (from ACS cohort).
Residual mortality unexplained by 44 individual risk factors (e.g., smoking) with a significant cluster of high residual mortality shown in the darker pink color with the yellow outline as estimated by the Getis-Ord Autocorrelation Statistic.
Sulfate Air Pollution and All Cause Mortality Overlay Map (from ACS cohort).
Overlay showing intersection between the residual mortality discussed above in
Comparative Mortality Figures for Men Ages 0−74 in Hamilton (1985−94).
Comparative mortality figures allow for age standardization using methods similar to a standardized mortality index (see Fleiss 1981 [
An example of maps created using Generalized Additive Modeling techniques by Webster,
Modeled Mean Concentrations of Ambient Sulfates in the ACS Study.
Standard Estimation Error Associated with Interpolated Concentrations of Ambient Sulfate Using Kriging.
(a) TRI facilities in the city of Oakland, CA (b) Intensity distribution of TRI facilities in the city of Oakland, CA (c) Ripley’s K function for TRI facilities in the city of Oakland.
Overlay Map of TSP Exceedance Zone on Interpolated Female Asthma Indicator Rates. Areas within the red isolines indicate zones where the regulatory standard for total suspended particulate matter was exceeded. Areas shown in yellow hatching that overlap the blue and purple shading indicate rates of asthma symptoms that exceed what would be expected by chance based on a Monte Carlo simulation.
Land use regression prediction surface of particulate matter less than 2.5 microns in diameter (see Moore
Residual mortality in ZIP code areas after controlling for 44 individual confounders and age, race and sex. Rho represents a spatial autocorrelation term, which was set to zero in this example.
Residual mortality (relative risks of mortality) in ZIP code areas after controlling for 44 individual confounders and age, race and sex, with the PM2.5 pollution term or autocorrelation term included. Note the decline in the amount of, and the spatial pattern in, the residual mortality.
Residual mortality in ZIP code areas after controlling for 44 individual confounders and age, race and sex with the PM2.5 pollution and freeway pollution terms included. Note the further decline in the residual mortality and the associated spatial pattern.
Normalized Difference Vegetation Index for the Los Angeles Metropolitan Area based on Landsat Imagery. Compare to