Preventing Chronic Disease (Prev Chronic Dis). ISSN 1545-1151. Centers for Disease Control and Prevention.
PMID: 23742941. PMCID: PMC3682810. Article 12_0265. DOI: 10.5888/pcd10.120265. 2013;10:E93. Published June 6, 2013.

Tools and Techniques. Peer Reviewed.

Using Geographic Information Systems to Compare Municipal, County, and Commercial Parks Data

Kelly R. Evenson, PhD, MS; Fang Wen, MS, MCS

Author Affiliations: Fang Wen, Gillings School of Global Public Health, Department of Epidemiology, University of North Carolina, Chapel Hill, Chapel Hill, North Carolina.

Corresponding Author: Kelly R. Evenson, Department of Epidemiology, Gillings School of Global Public Health, and Center for Health Promotion and Disease Prevention, University of North Carolina, Chapel Hill, 137 East Franklin St, Ste 306, Chapel Hill, NC 27514. Telephone: 919-966-4187. E-mail: kelly_evenson@unc.edu.

Introduction

Parks are an integral part of a favorable built environment, and several studies have found a positive association between a favorable built environment and physical activity. Parks data are available to researchers from various sources; however, the accuracy of data sources in representing parks is unknown. This study compared secondary parks data obtained from a commercial vendor with data from municipal/county government records, all of which were verified by using Internet searches, telephone inquiries, or on-the-ground audits.

Methods

We studied large metropolitan areas in 3 states: North Carolina (1,837 sq mi), Maryland (1,351 sq mi), and New York (260 sq mi). We collected information on park land area (shapefiles) from municipal/county governments from 2009 through 2012 and from a commercial source in 2010.

Results

Commercial parks data did not include 31.1% (119/383, 20.3 sq mi) of North Carolina, 42.9% (187/436, 21.8 sq mi) of Maryland, and 71.7% (640/892, 13.5 sq mi) of New York parks that we found and verified from municipal/county sources. Municipal/county data did not include 15.7% (60/383, 9.9 sq mi) of North Carolina parks, 27.5% (120/436, 74.6 sq mi) of Maryland parks, and 9.0% (80/892, 6.3 sq mi) of New York parks that we found and verified from commercial sources.

Conclusion

In this study, the combination of commercial and municipal/county data sources that were verified provided the most complete and accurate shapefile. The quality of secondary sources of parks data should be checked prior to use and, if needed, methods incorporated to improve the capture of parks.

Background

Numerous studies have found a positive association between a favorable built environment and physical activity, such as walking or bicycling (1–4). Parks are an integral part of the built environment. They exist in many communities and often provide free places for physical activity (5). Researchers and public health practitioners have studied access to parks to help plan where new parks should be developed, to identify underserved locations, and to determine what facilities should be offered at the parks (6–9). The development of geographic information systems (GIS) has facilitated the study of spatial access and use of parks.

Researchers using GIS to study parks can obtain parks data from several sources, including commercial vendors (9,10), local jurisdictions such as municipal or county governments (7,11), and on-the-ground audits that include measuring park boundaries in the field (12–14). These sources vary in cost and time required. To our knowledge, no study has compared the accuracy of commercial and municipal/county data sources in representing park geographic area and amenities. Our study compared parks data obtained from commercial sources with those from municipal and county government sources for 3 large metropolitan areas in 3 US states: North Carolina, Maryland, and New York. The findings highlight strengths and limitations of both data sources. We also explored the effect of parks being omitted from either data source.

Methods

We defined a park as a public place set aside for physical activity and enjoyment. This definition did not include cemeteries, mobile home parks, historic sites, professional stadiums, country clubs, zoos, private parks, private facilities (such as stand-alone baseball or tennis facilities), or stand-alone recreation centers.

Study area

The study areas corresponded to 3 of 6 US locations from the Multi-Ethnic Study of Atherosclerosis (MESA), a cardiovascular cohort study that enrolled 6,814 participants from 2000 through 2002 (15). The 3 study areas, as defined by the MESA Study, were expanded to capture areas where participants had moved since enrollment. In this paper, we refer to the study areas by state: North Carolina (Davidson, Davie, Guilford, Forsyth, Randolph, Rockingham, Stokes, Surry, and Yadkin counties [1,837 sq mi]); Maryland (79 zip code areas in Anne Arundel, Carroll, Harford, Howard, and Baltimore counties and Baltimore city [1,351 sq mi]); and New York (183 zip code areas in Bronx, Brooklyn, Manhattan, and Queens boroughs, and Westchester County [260 sq mi]).

Data collection

From 2009 through 2012, we used municipal or county GIS shapefiles (GIS files that include the park name and an outline of each park drawn as a polygon) to locate parks, most of which came from planning, parks, and recreation departments. In a few instances, we used Google Maps (http://maps.google.com/maps) to draw a park boundary when no other outline of the park was available. If only part of the polygon for a confirmed park was in the study area, we included it in our study. Parks with multiple polygons but the same name were manually merged and designated as 1 park. Parks were verified by using Internet searches, telephone inquiries, and, if necessary, on-the-ground audits.
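The merging of same-name polygons was done manually in this study. As a minimal sketch of the same bookkeeping, assuming each polygon is reduced to a (name, area) record and that names are matched after ignoring case and extra spaces, the grouping could look like the following. The record layout, the matching rule, and the park names are all hypothetical.

```python
from collections import defaultdict


def merge_by_name(polygons):
    """Group polygon records that share a park name into one park record.

    `polygons` is a list of (park_name, area_sq_mi) tuples. This layout and
    the name normalization are assumptions for illustration only.
    """
    merged = defaultdict(lambda: {"parts": 0, "area_sq_mi": 0.0})
    for name, area in polygons:
        # Normalize the name: collapse repeated spaces and ignore case.
        key = " ".join(name.split()).casefold()
        merged[key]["parts"] += 1
        # Summing areas assumes the parts of one park do not overlap.
        merged[key]["area_sq_mi"] += area
    return dict(merged)


# Hypothetical polygon records; two of them belong to the same park.
records = [
    ("Example Park A", 0.25),
    ("example  park a", 0.5),
    ("Example Park B", 1.0),
]
parks = merge_by_name(records)
```

Automatic name matching of this kind would still need a manual check, because different parks can share a name and the same park can be spelled differently across sources.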

To determine the amenities available at each park (eg, tennis courts, basketball hoops, swimming pools), we searched online, contacted municipal/county departments, or visited the park. This process also allowed us to verify that the park conformed to our park definition.

We obtained commercial data on parks for 2010 from Esri (Esri, Redlands, California). Esri metadata (a summary document containing information on the data set) indicated that parks and forests were identified at the national, state, and local levels, including county and regional parks, and referenced Tele Atlas MultiNet North America (Lebanon, New Hampshire; www.teleatlas.com). We verified the existence of parks and park facilities that Esri identified by using the same methods we used to verify municipal/county sources, primarily through Internet searches and telephone inquiries.

Statistical analysis

We used several tools in ArcGIS 10.0 (Esri, Redlands, California) to compare the park shapefiles obtained from the commercial sources with files obtained from municipal/county sources. For each of the 3 states, GIS files from both data sources were assembled and overlaid using the state plane coordinate system. Parks that partially overlapped were explored manually in ArcGIS by comparing the park name, shape, and percentage of the area overlapping to determine whether the parks were the same.

The area of each park polygon was calculated for both data sources by using the ArcGIS calculating geometry tool. With the 2 shapefiles projected on top of each other in ArcGIS, the concordant park area from the 2 data sources was extracted, corresponding to spatially matched areas. This area in square miles was calculated for both matched and mismatched park areas.
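The matched and mismatched areas were computed in ArcGIS on real park polygons. The sketch below illustrates only the arithmetic, under the simplifying assumption that each park outline is an axis-aligned rectangle given as (xmin, ymin, xmax, ymax) in miles in a projected coordinate system. The function names and coordinates are hypothetical.

```python
def rect_area(rect):
    """Area of an axis-aligned rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    # A negative width or height means the rectangle is empty.
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def overlap_area(a, b):
    """Area shared by two axis-aligned rectangles (0.0 if they are disjoint)."""
    shared = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return rect_area(shared)


def compare_outlines(municipal, commercial):
    """Split the two outlines into matched and mismatched area (sq mi)."""
    matched = overlap_area(municipal, commercial)
    return {
        "matched": matched,
        "municipal_only": rect_area(municipal) - matched,
        "commercial_only": rect_area(commercial) - matched,
        "pct_of_municipal_matched": 100.0 * matched / rect_area(municipal),
    }


# Hypothetical outlines of one park as drawn in each data source.
result = compare_outlines((0.0, 0.0, 2.0, 1.0), (1.0, 0.0, 3.0, 1.0))
```

In this made-up case half of the municipal outline is matched. The percentage of overlapping area is one of the cues, along with name and shape, that the study used when deciding whether 2 partially overlapping polygons were the same park.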

To quantify the impact of missed parks (defined as parks reported in one data source but not the other), we calculated an indicator described in the Centers for Disease Control and Prevention’s (CDC’s) recommended strategies to enhance or create access to places for physical activity (16). The indicator for the extent of the public’s access to parks was defined as “the percentage of US census blocks with parks.” The indicator was calculated as the proportion of 2010 census blocks that have at least 1 park within the block or within 0.5 miles of the block boundary. This metric was calculated for both data sources separately and for both combined. As a second metric to quantify the effect of missed parks, the percentage of parks with each type of facility missed (eg, basketball court, swimming pool) was calculated for both data sources.
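The indicator can be sketched as follows, again assuming, for illustration only, that census blocks and parks are axis-aligned rectangles (xmin, ymin, xmax, ymax) in miles rather than the real polygons analyzed in ArcGIS. The function names and all coordinates are hypothetical.

```python
import math


def rect_distance(a, b):
    """Shortest distance between two axis-aligned rectangles (0 if they touch or overlap)."""
    dx = max(0.0, a[0] - b[2], b[0] - a[2])
    dy = max(0.0, a[1] - b[3], b[1] - a[3])
    return math.hypot(dx, dy)


def pct_blocks_with_park(blocks, parks, buffer_mi=0.5):
    """Percentage of blocks with at least 1 park inside or within buffer_mi of the boundary."""
    if not blocks:
        return 0.0
    served = sum(
        1 for block in blocks
        if any(rect_distance(block, park) <= buffer_mi for park in parks)
    )
    return 100.0 * served / len(blocks)


# Hypothetical census blocks and park outlines.
blocks = [(0.0, 0.0, 1.0, 1.0), (1.0, 0.0, 2.0, 1.0), (3.0, 0.0, 4.0, 1.0), (10.0, 10.0, 11.0, 11.0)]
municipal_parks = [(0.25, 0.25, 0.5, 0.5)]
commercial_parks = [(4.5, 0.0, 5.0, 1.0)]

pct_municipal = pct_blocks_with_park(blocks, municipal_parks)
pct_commercial = pct_blocks_with_park(blocks, commercial_parks)
pct_combined = pct_blocks_with_park(blocks, municipal_parks + commercial_parks)
```

Because a block counts as served when any park qualifies, the combined estimate can never be lower than the estimate from either source alone.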

Results

Overall, we verified the existence of 383 parks in the North Carolina study area, 436 parks in the Maryland study area, and 892 parks in the New York study area (Table 1). The commercial data source did not include the following percentages of parks found and verified in municipal/county sources: 31.1% (119/383, 20.3 sq mi) in North Carolina, 42.9% (187/436, 21.8 sq mi) in Maryland, and 71.7% (640/892, 13.5 sq mi) in New York. The municipal/county data sources did not include the following percentages of parks found and verified in the commercial source: 15.7% (60/383, 9.9 sq mi) in North Carolina, 27.5% (120/436, 74.6 sq mi) in Maryland, and 9.0% (80/892, 6.3 sq mi) in New York. Municipal/county data sources showed higher percentages of land area with parks for North Carolina and New York than did the commercial data sources but a lower percentage for Maryland.

Table 1. Comparison of Parks Data Obtained From Municipal/County Sources With Data Obtained From Commercial Sources in 3 Locations: North Carolina, Maryland, and New York, 2009–2012

| Park Details | North Carolina [a] (n = 383 [b]), Municipal/County [e] | North Carolina, Commercial [f] | Maryland [c] (n = 436 [b]), Municipal/County [e] | Maryland, Commercial [f] | New York [d] (n = 892 [b]), Municipal/County [e] | New York, Commercial [f] |
|---|---|---|---|---|---|---|
| Number of parks | | | | | | |
| Number of parks, total | 323 | 261 | 316 | 246 | 812 | 251 |
| Parks in both data sources [g] | 204 | 201 | 129 | 126 | 172 | 171 |
| Parks in municipal/county data but not in commercial data | 119 | NA | 187 | NA | 640 | NA |
| Parks in commercial data but not in municipal/county data [h] | NA | 60 | NA | 120 | NA | 80 |
| Park area (sq mi) | | | | | | |
| Park area, total | 43.0 | 32.5 | 67.9 | 120.7 | 31.8 | 24.7 |
| Park area spatially overlaid [i] | 22.7 | 22.7 | 46.1 | 46.1 | 18.3 | 18.3 |
| Park area in municipal/county data but not in commercial data | 20.3 | NA | 21.8 | NA | 13.5 | NA |
| Park area in commercial data but not in municipal/county data | NA | 9.9 | NA | 74.6 | NA | 6.3 |
| Percentage of study area in parks | 2.3 | 1.8 | 5.0 | 8.9 | 12.3 | 9.5 |

Abbreviation: NA, not applicable.

[a] North Carolina study area comprised Davidson, Davie, Guilford, Forsyth, Randolph, Rockingham, Stokes, Surry, and Yadkin counties (1,837 sq mi).

[b] Total number of parks derived from combining verified municipal/county and commercial data.

[c] Maryland study area comprised 79 zip code areas in Anne Arundel, Carroll, Harford, Howard, and Baltimore counties and Baltimore city (1,351 sq mi).

[d] New York study area comprised 183 zip code areas in Bronx, Brooklyn, Manhattan, and Queens boroughs and Westchester County (260 sq mi).

[e] Data from municipal/county government sources collected from 2009 through 2012.

[f] Data from Esri (Esri, Redlands, California), 2010.

[g] Parks that were identified in both data sources.

[h] Includes 28 parks in North Carolina, 55 in Maryland, and 30 in New York that did not meet our definition of a park, which was defined as a public place set aside for physical activity and enjoyment. This definition did not include cemeteries, mobile home parks, historic sites, professional stadiums, country clubs, zoos, private parks, private facilities (such as stand-alone baseball or tennis facilities), and stand-alone recreation centers.

[i] The exact area where parks from municipal/county data were overlaid with parks from the commercial data.
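The omission percentages reported in the text follow directly from the counts in Table 1. The short check below recomputes them; the variable and function names are hypothetical, and the counts are transcribed from the table.

```python
# Counts transcribed from Table 1.
TOTAL_VERIFIED = {"North Carolina": 383, "Maryland": 436, "New York": 892}
MUNICIPAL_ONLY = {"North Carolina": 119, "Maryland": 187, "New York": 640}
COMMERCIAL_ONLY = {"North Carolina": 60, "Maryland": 120, "New York": 80}


def pct_missed(missed, total):
    """Percentage of all verified parks that one data source did not include."""
    return round(100.0 * missed / total, 1)


# Parks found only in municipal/county data are the ones the commercial source missed.
missed_by_commercial = {
    area: pct_missed(MUNICIPAL_ONLY[area], TOTAL_VERIFIED[area]) for area in TOTAL_VERIFIED
}
# Parks found only in commercial data are the ones the municipal/county sources missed.
missed_by_municipal = {
    area: pct_missed(COMMERCIAL_ONLY[area], TOTAL_VERIFIED[area]) for area in TOTAL_VERIFIED
}
```

The denominator in each case is the combined, verified park count for the study area, not the count from either single source.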

To examine the effect of parks missing from either data source, we explored how the CDC indicator of at least 1 park within a census block or 0.5 miles from the block boundary varied with commercial data and municipal/county data. On the basis of verified (ie, via Internet searches, telephone inquiries, and audits) and combined data sources (ie, parks identified from either or both commercial and municipal/county sources), the proportion of census blocks with park access was 35.2% in North Carolina, 64.1% in Maryland, and 97.9% in New York (Table 2). Compared with these verified combined estimates, estimates from municipal/county data sources were more accurate than estimates from commercial sources for North Carolina (absolute proportion difference, 1.1% municipal/county vs 6.2% commercial) and New York (absolute proportion difference, 0.7% municipal/county vs 28.6% commercial), but less accurate for Maryland (absolute proportion difference, 7.6% municipal/county vs 5.2% commercial).

Table 2. Parks per Census Block, by Data Source and Study Area: North Carolina, Maryland, and New York, 2009–2012

| Study Area | No. of Census Blocks in Area | Municipal/County [a], n (%) | Commercial [b], n (%) | Combined [c], n (%) |
|---|---|---|---|---|
| North Carolina | 37,492 | 12,798 (34.1) | 10,855 (29.0) | 13,214 (35.2) |
| Maryland | 38,356 | 21,685 (56.5) | 22,610 (58.9) | 24,598 (64.1) |
| New York | 32,819 | 31,910 (97.2) | 22,741 (69.3) | 32,134 (97.9) |

[a] Data from municipal/county government sources collected from 2009 through 2012.

[b] Data from Esri (Esri, Redlands, California), 2010.

[c] Data from both municipal/county government and commercial (Esri) sources combined.
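The absolute proportion differences quoted in the text can be recomputed from the block counts in Table 2. The sketch below assumes that each difference is taken between percentages already rounded to 1 decimal place, an assumption that reproduces the reported values. The names are hypothetical and the counts are transcribed from the table.

```python
# Census block counts transcribed from Table 2:
# (blocks in area, municipal/county, commercial, combined).
TABLE2 = {
    "North Carolina": (37492, 12798, 10855, 13214),
    "Maryland": (38356, 21685, 22610, 24598),
    "New York": (32819, 31910, 22741, 32134),
}


def pct(count, total):
    """Percentage of census blocks with park access, rounded to 1 decimal place."""
    return round(100.0 * count / total, 1)


differences = {}
for area, (total, municipal, commercial, combined) in TABLE2.items():
    reference = pct(combined, total)  # verified, combined estimate
    differences[area] = {
        # Differences are taken between the rounded percentages (assumption).
        "municipal": round(reference - pct(municipal, total), 1),
        "commercial": round(reference - pct(commercial, total), 1),
    }
```

These differences are in percentage points. A smaller value means the single-source estimate sits closer to the verified combined estimate.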

To examine the impact of missing parks in either data source, we also quantified the facilities missed if relying only on 1 data source (Table 3). For example, if relying only on municipal/county park data, the data file would be missing 12 parks with baseball or softball fields in North Carolina, 30 in Maryland, and 14 in New York. If relying only on the commercial park data, the data file would be missing 34 parks with baseball or softball fields in North Carolina, 72 in Maryland, and 105 in New York.

Table 3. Park Facilities Missed by Relying on 1 Data Source, by Study Area [a]

| Parks With Each Facility | Municipal/County Data Only [b]: North Carolina (n = 32), n (%) | Municipal/County Data Only: Maryland (n = 65), n (%) | Municipal/County Data Only: New York (n = 50), n (%) | Commercial Data Only [b]: North Carolina (n = 119), n (%) | Commercial Data Only: Maryland (n = 187), n (%) | Commercial Data Only: New York (n = 640), n (%) |
|---|---|---|---|---|---|---|
| Outdoors | | | | | | |
| Baseball or softball fields | 12 (37.5) | 30 (46.2) | 14 (28.0) | 34 (28.6) | 72 (38.9) | 105 (16.4) |
| Basketball hoops | 5 (15.6) | 26 (40.0) | 17 (34.0) | 25 (21.0) | 84 (44.9) | 391 (61.2) |
| Bocce ball courts | 0 | 0 | 1 (2.0) | 0 | 1 (0.5) | 15 (2.3) |
| Cricket fields | 0 | 0 | 0 | 0 | 0 | 2 (0.3) |
| General purpose fields | 2 (6.3) | 23 (35.4) | 5 (10.0) | 7 (5.9) | 60 (32.1) | 19 (3.0) |
| Golf holes | 1 (3.1) | 0 | 1 (2.0) | 7 (5.9) | 3 (1.6) | 4 (0.6) |
| Football fields | 0 | 1 (1.5) | 1 (2.0) | 3 (2.5) | 0 | 14 (2.2) |
| Skate park | 0 | 1 (1.5) | 1 (2.0) | 4 (3.4) | 2 (1.1) | 5 (0.2) |
| Soccer fields | 1 (3.1) | 0 | 0 | 11 (9.2) | 2 (1.1) | 13 (2.0) |
| Swimming pools | 3 (9.4) | 2 (3.1) | 2 (4.0) | 7 (5.9) | 9 (4.8) | 23 (3.6) |
| Tennis courts | 7 (21.9) | 14 (21.5) | 4 (8.0) | 17 (14.3) | 32 (17.1) | 33 (5.2) |
| Tracks | 0 | 0 | 1 (2.0) | 0 | 0 | 12 (1.9) |
| Volleyball courts | 3 (9.4) | 4 (6.2) | 0 | 9 (7.6) | 9 (4.8) | 32 (5.0) |
| Outdoors or indoors | | | | | | |
| Racquetball, handball, or squash courts | 1 (3.1) | 0 | 5 (10.0) | 0 | 1 (0.5) | 394 (61.6) |
| Indoors | | | | | | |
| General purpose fields | 0 | 0 | 1 (2.0) | 0 | 2 (1.1) | 0 |
| Swimming pools | 1 (3.1) | 0 | 0 | 0 | 0 | 7 (1.1) |

[a] Parks missed when relying only on commercial data or only on municipal/county data. The numbers given in this table for "parks missed when relying only on municipal/county data" (32, North Carolina; 65, Maryland; and 50, New York) are lower than those shown in Table 1 (60, 120, and 80, respectively). The difference arises because some parks listed only in the commercial data (28 in North Carolina, 55 in Maryland, and 30 in New York) did not meet the study's park definition.

[b] Number and percentage of facilities in missed parks.

Discussion

When comparing parks data obtained from commercial and municipal/county sources, we found that both data sources omitted parks whose existence was verified through Internet searches, telephone inquiries, or on-the-ground audits. The most accurate park assessment was derived by combining verified commercial and municipal/county data.

There are several advantages and disadvantages to both commercial data and municipal/county park data for research purposes. Although it may be necessary to purchase commercial data, such data may be easier to use and require less staff time. A disadvantage to commercial data sources is that they may include spaces that are not considered parks by the researchers’ definition.

Municipal/county parks data files were generally more complete than commercial data sources; however, acquiring them required significant staff time. The quality of municipal/county GIS data varied across geographic areas, and it was unclear how frequently data files were updated. Therefore, they may be temporally mismatched across multiple administrative boundaries. Users should be aware that national parks, state parks, and forest areas may not be included in municipal/county parks data.

Neither municipal/county nor commercial sources of parks data provided information on facilities in the park or the quality of parks. Facilities offered at the park can be identified, as in our study, through Internet searches, telephone calls, and site visits if needed; data on park quality can be collected through site visits or, as in New York City, through its park inspection program (8). Neither the municipal/county nor the commercial data sources included private neighborhood parks that may be accessible to the public. Whether these parks are of interest can be determined through an audit or site visit, although private neighborhood parks without road access may still be missed. Audits may miss parks that are unnamed (ie, lack signage), and conducting audits may require significant time and cost (14). Although park shapefiles in commercial data sources are static, we learned that they are fluid in municipal/county sources. By "fluid" we mean that parks may be added, removed, or renamed and that facilities within parks can change over time. Park shapefiles and inventory of amenities should be updated if a study spans an extended period.

Impact of the park data source

To explore the impact of the 2 park data sources, we used a CDC indicator: the percentage of census blocks that had parks within their block or within 0.5 miles of the boundary (16) (Table 2). We compared our results with CDC’s finding of a 20% median across the United States of access to parks, ranging from 2% (Mississippi) to 47% (California). For its calculations, CDC used national, state, county, and local parks data from a 2007 commercial source. We calculated the indicator by using municipal/county and commercial parks data and found that the result varied between the 2 data sources and across locations. When compared with the combined and verified park data, the absolute prevalence difference ranged from 0.7% to 7.6% for municipal/county data and 5.2% to 28.6% for commercial data. The differences were most remarkable for the commercial data for New York, because a large number of parks were missing. For North Carolina and New York, municipal/county data provided estimates closer to the combined and verified data than did data from the commercial source. However, for Maryland the commercial data provided estimates closer to the combined and verified data because of the larger spatial area of parks that were in the commercial data but not in the municipal/county data.

We also calculated the effect of parks missing from either data source by quantifying the facilities at each park that were missed (Table 3). We found that the missed parks contained a variety of facilities and that the omissions most affected the most commonly found active park facilities, such as baseball or softball fields and basketball hoops.

Study limitations

Our study had several limitations. First, we did not compare results from the 2 data sources used here (ie, park data from the commercial source and data from municipal/county sources) with other commercial data sources that may be available. Second, we were unable to compare results by urbanicity, and we recognize that the quality of parks data for urban and rural areas may differ. Third, in some instances, the park shapefiles from the 2 data sources did not exactly match. In these situations, we determined whether parks from the 2 sources were the same by visually inspecting their shapes, comparing their names, and checking the percentage of park area that overlapped. This method was subjective, because we did not go to the parks to see the differences. Fourth, the metadata from both sources could have provided more information on the geospatial data, such as the content, quality, positional accuracy, coverage, scale, and date of collection, but this information was not provided (17).

Conclusion

GIS-derived measures of parks allow practitioners and researchers to investigate park accessibility and associations of parks with physical activity by nearby residents. Studies of park accessibility and associations with physical activity would benefit from quantification of the degree of error in GIS data and ultimately the potential bias that such error introduces to surveillance measures and to environment–health associations (18). In assessing both commercial and municipal/county data sources, we found count errors (neither source listed all parks), attribute errors (commercial sources listed some parks that were not verified as such), and positional errors (parks listed in the 2 data sources did not always align). Using both data sources and verifying that parks existed was the most accurate way to develop the park shapefile in this study. However, it is still possible that parks were missed even though we used both sources.

These findings indicate that practitioners and researchers should check park shapefiles from commercial or municipal/county sources before using them by verifying them against other sources of information. A comprehensive parks file for the entire United States, developed using standardized GIS protocols (17,19), could facilitate parks-related research. With more than 9,000 local parks and recreation departments and organizations that manage more than 108,000 public park facilities and 65,000 indoor facilities (20), the coordination of data across jurisdictions is complex. A database to house this information that is regularly updated could be useful to future research and for surveillance purposes.

Acknowledgment

This study was funded by the National Heart, Lung, and Blood Institute, National Institutes of Health (NIH) (No. 2R01 HL071759). Funding was also provided by the Robert Wood Johnson Foundation (RWJF) through its national program, Active Living Research (No. 52319). The grants were managed through the University of North Carolina's Center for Health Promotion and Disease Prevention (cooperative agreement No. U48-DP000059), a member of CDC's Prevention Research Centers Program. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH, RWJF, or CDC.

The opinions expressed by authors contributing to this journal do not necessarily reflect the opinions of the U.S. Department of Health and Human Services, the Public Health Service, the Centers for Disease Control and Prevention, or the authors' affiliated institutions.

Suggested citation for this article: Evenson KR, Wen F. Using Geographic Information Systems to Compare Municipal, County, and Commercial Parks Data. Prev Chronic Dis 2013;10:120265. DOI: http://dx.doi.org/10.5888/pcd10.120265.

References

1. Durand CP, Andalib M, Dunton GF, Wolch J, Pentz MA. A systematic review of built environment factors related to physical activity and obesity risk: implications for smart growth urban planning. Obes Rev 2011;12(5):e173–82. doi:10.1111/j.1467-789X.2010.00826.x. PMID 21348918.
2. Giles-Corti B, Kelty SF, Zubrick SR, Villanueva KP. Encouraging walking for transport and physical activity in children and adolescents: how important is the built environment? Sports Med 2009;39(12):995–1009. doi:10.2165/11319620-000000000-00000. PMID 19902982.
3. Davison KK, Lawson CT. Do attributes in the physical environment influence children's physical activity? A review of the literature. Int J Behav Nutr Phys Act 2006;3(19):1–17. PMID 16390544.
4. Humpel N, Owen N, Leslie E. Environmental factors associated with adults' participation in physical activity: a review. Am J Prev Med 2002;22(3):188–99. doi:10.1016/S0749-3797(01)00426-3. PMID 11897464.
5. Kaczynski AT, Henderson KA. Parks and recreation settings and active living: a review of associations with physical activity function and intensity. J Phys Act Health 2008;5(4):619–32. PMID 18648125.
6. Talen E. The social equality of urban service distribution: an exploration of park access in Pueblo, Colorado and Macon, Georgia. Urban Geogr 1997;18(6):521–41. doi:10.2747/0272-3638.18.6.521.
7. Diez Roux AV, Evenson K, McGinn A, Brown D, Moore L, Brines S, et al. Availability of recreational resources and physical activity in a sample of adults. Am J Public Health 2007;97(3):493–9. doi:10.2105/AJPH.2006.087734. PMID 17267710.
8. Weiss CC, Purciel M, Bader M, Quinn JW, Lovasi G, Neckerman KM, et al. Reconsidering access: park facilities and neighborhood disamenities in New York City. J Urban Health 2011;88(2):297–310. doi:10.1007/s11524-011-9551-z. PMID 21360245.
9. Zhang X, Lu H, Holt JB. Modeling spatial accessibility to parks: a national study. Int J Health Geogr 2011;10:31. doi:10.1186/1476-072X-10-31. PMID 21554690.
10. Wen M, Kowaleski-Jones L. The built environment and risk of obesity in the United States: racial-ethnic disparities. Health Place 2012;18(6):1314–22. doi:10.1016/j.healthplace.2012.09.002. PMID 23099113.
11. Giles-Corti B, Broomhall M, Knuiman M, Collins C, Douglas K, Ng K, et al. Increasing walking: how important is distance to, attractiveness, and size of public open space? Am J Prev Med 2005;28(2, Suppl 2):169–76. doi:10.1016/j.amepre.2004.10.018. PMID 15694525.
12. Evenson KR, Sotres-Alvarez D, Herring AH, Messer L, Laraia BA, Rodriguez DA. Assessing urban and rural neighborhood characteristics using audit and GIS data: derivation and reliability of constructs. Int J Behav Nutr Phys Act 2009;6:44. doi:10.1186/1479-5868-6-44. PMID 19619325.
13. Day K, Boarnet M, Alfonzo M, Forsyth A. The Irvine-Minnesota inventory to measure built environments: development. Am J Prev Med 2006;30(2):144–52. doi:10.1016/j.amepre.2005.09.017. PMID 16459213.
14. Brownson RC, Hoehner CM, Day K, Forsyth A, Sallis JF. Measuring the built environment for physical activity: state of the science. Am J Prev Med 2009;36(4 Suppl):S99–123. doi:10.1016/j.amepre.2009.01.005. PMID 19285216.
15. Bild DE, Bluemke DA, Burke GL, Detrano R, Diez Roux AV, Folsom AR, et al. Multi-ethnic Study of Atherosclerosis: objectives and design. Am J Epidemiol 2002;156(9):871–81. doi:10.1093/aje/kwf113. PMID 12397006.
16. Centers for Disease Control and Prevention. State indicator report on physical activity, 2010 national action guide. http://www.cdc.gov/physicalactivity/downloads/PA_State_Indicator_Report_2010.pdf. Accessed December 18, 2012.
17. Matthews SA, Moudon AV, Daniel M. Work group II: Using Geographic Information Systems for enhancing research relevant to policy on diet, physical activity, and weight. Am J Prev Med 2009;36(4, Suppl):S171–6. doi:10.1016/j.amepre.2009.01.011. PMID 19285210.
18. Boone JE, Gordon-Larsen P, Stewart JD, Popkin BM. Validation of a GIS facilities database: quantification and implications of error. Ann Epidemiol 2008;18(5):371–7. doi:10.1016/j.annepidem.2007.11.008. PMID 18261922.
19. Story M, Giles-Corti B, Yaroch AL, Cummins S, Frank LD, Huang TT, et al. Work group IV: Future directions for measures of the food and physical activity environments. Am J Prev Med 2009;36(4, Suppl):S182–8. doi:10.1016/j.amepre.2009.01.008. PMID 19285212.
20. Godbey G, Mowen A. The benefits of physical activity provided by park and recreation services: the scientific evidence. http://www.nrpa.org/uploadedFiles/nrpa.org/Publications_and_Research/Research/Papers/Godbey-Mowen-Summary.pdf. Accessed December 18, 2012.