This paper investigates spatiotemporal interpolation methods for air pollution assessment. The air pollutant of interest is fine particulate matter, PM_{2.5}. The paper examines how the choice of time scale affects the shape function-based interpolation method and finds that the measurement scale of the time dimension has an impact on the quality of the interpolation results. Based on the results of 10-fold cross validation, the most effective of the four tested time scales was selected for the PM_{2.5} interpolation. The paper also estimates population exposure to ambient PM_{2.5} air pollution at the county level in the contiguous U.S. in 2009. The interpolated county-level PM_{2.5} values have been linked to 2009 population data, and the population with a risky PM_{2.5} exposure, meaning a PM_{2.5} concentration that exceeds the National Ambient Air Quality Standards, has been estimated. The geographic distribution of the counties with a risky PM_{2.5} exposure is visualized. This work is an important step toward understanding the associations between ambient air pollution exposure and population health outcomes.

In GIS (Geographic Information Systems), spatial interpolation methods are well developed for estimating values at unsampled locations from spatially sampled values. These methods assume that points closer together are more strongly correlated than points farther apart. They are classified as either deterministic or stochastic, depending on whether statistical properties are used. Deterministic interpolation methods determine an unknown value using mathematical functions with predefined parameters, such as distances in Inverse Distance Weighting (IDW) [

Although spatial interpolation methods have been adopted in various applications, many critical problems remain unsolved. One of them is that traditional GIS researchers tend to treat space and time separately even when interpolation needs to be conducted in the continuous space-time domain. The primary strategy identified in the literature is to reduce spatiotemporal interpolation problems to a sequence of snapshots of spatial interpolations [

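Integrating space and time simultaneously is expected to yield better interpolation results than treating them separately in certain typical GIS applications. For example, the ozone data set shown by the solid circular dots (●) in the figure of annual ozone concentration sample points contains 8 annual ozone concentration measurements in 1994, 4 in 1995, 3 in 1996, 6 in 1997, and 8 in 1998. The empty dot (○), W, is an unmeasured location whose value needs to be interpolated for 1996. If space and time are treated separately, only the three 1996 measurements can be used to estimate W, even though some measurements taken in 1995 and 1997 are relatively close to it.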

Therefore, a more integrative framework needs to be developed that allows measurements from other years to be incorporated when interpolating the space-time point W. Depending on the characteristics of the spatiotemporal interpolation method, different groups of measurements may be chosen to interpolate W. One possibility would be to choose F and G in 1997 and D and E in 1995, since these four points are relatively close to W in space and time. In general, spatiotemporal interpolation is based on the assumption that things closer together in the space-time domain are more alike than things farther apart.

To integrate space and time simultaneously, a so-called “extension approach” has been proposed in [

Although the idea of the extension approach is intriguing, extending spatial interpolation to incorporate a time dimension poses many challenges. One of the biggest is that space and time have different and incomparable scales, so a physically sound distance in the space-time domain is difficult to define or compute. For example, how should the spatiotemporal distance between two points P_{1}(x_{1}, y_{1}, t_{1}) and P_{2}(x_{2}, y_{2}, t_{2}) be defined when the extension approach of IDW is applied? This challenge has rarely been investigated, and it largely impedes the development of a more general and logically sound approach to spatiotemporal interpolation. The existing practice used in references [
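To make the scale problem concrete, the sketch below extends IDW to (x, y, t) under one assumed form of spatiotemporal distance, d = sqrt((x_1 - x_2)^2 + (y_1 - y_2)^2 + c^2 (t_1 - t_2)^2), where the time-scale factor c is a hypothetical parameter introduced here for illustration. With the same two samples, the estimate changes as c changes, which is exactly the ambiguity described above.

```python
import math


def st_distance(p, q, c):
    """Euclidean distance in (x, y, t) after multiplying time by c.

    This distance form is an assumption used for illustration only.
    """
    (x1, y1, t1), (x2, y2, t2) = p, q
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (c * (t1 - t2)) ** 2)


def idw(samples, target, c, power=2):
    """IDW estimate at target = (x, y, t) from samples (x, y, t, w)."""
    num = den = 0.0
    for x, y, t, w in samples:
        d = st_distance((x, y, t), target, c)
        if d == 0.0:
            return w  # target coincides with a measured point
        weight = 1.0 / d ** power
        num += weight * w
        den += weight
    return num / den


# One sample at the same place one time unit earlier, and one sample
# two distance units away at the same time.
samples = [(0.0, 0.0, -1.0, 10.0), (2.0, 0.0, 0.0, 20.0)]
print(idw(samples, (0.0, 0.0, 0.0), c=1.0))  # 12.0
print(idw(samples, (0.0, 0.0, 0.0), c=4.0))  # 18.0
```

A small c lets temporally distant samples dominate, while a large c pushes the estimate toward spatially nearby samples from the same time.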

Our paper makes two main contributions. The first is to examine whether the choice of time scale affects the quality of SF (Shape Function) based spatiotemporal interpolation results. SF-based 2-D triangular interpolation methods have been proven to be invariant to coordinate scales [

The second contribution is to take an important initial step toward investigating the associations between ambient air pollution exposure and population health outcomes. Since PM_{2.5} concentrations are only measured at certain monitoring sites and times, the concentrations at unsampled locations and times need to be estimated using an effective method. In this paper, an efficient 3-D SF-based spatiotemporal interpolation method is applied to the PM_{2.5} data set. The interpolated county-level PM_{2.5} values have been linked with 2009 population data, and the population with a risky PM_{2.5} exposure has been estimated. The geographic distribution of the counties exceeding the PM_{2.5} air quality standards is displayed.

Shape functions, which can be viewed as a spatial interpolation method, are popular in engineering applications such as finite element algorithms [

Before SF-based interpolation can be applied, a mesh that divides the total domain into a finite number of simple sub-domains, or elements, must be generated. For example,

for a 2-D spatial problem, a mesh composed of triangular elements should be generated if one wants to use shape functions for triangles to interpolate unknown values in the (x, y) coordinate system;

for a 3-D spatial problem, a mesh composed of tetrahedral elements should be generated if one wants to use shape functions for tetrahedra to interpolate unknown values in the (x, y, z) coordinate system.

Successful algorithms have been developed to generate triangular or tetrahedral meshes, including the popular Delaunay triangulation method [

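Consider a tetrahedron with measured values w_{1}, w_{2}, w_{3} and w_{4} at its four corners, as in the figure on computing 3-D shape functions by tetrahedral volume divisions. The value w at an unmeasured location (x, y, z) inside the tetrahedron is interpolated as

$$w(x,y,z) = N_{1}(x,y,z)\,w_{1} + N_{2}(x,y,z)\,w_{2} + N_{3}(x,y,z)\,w_{3} + N_{4}(x,y,z)\,w_{4},$$

where N_{1}, N_{2}, N_{3} and N_{4} are the following shape functions:

$$N_{1} = \frac{V_{1}}{V}, \quad N_{2} = \frac{V_{2}}{V}, \quad N_{3} = \frac{V_{3}}{V}, \quad N_{4} = \frac{V_{4}}{V}.$$

Here V_{1}, V_{2}, V_{3} and V_{4} are the volumes of the four sub-tetrahedra ww_{2}w_{3}w_{4}, w_{1}ww_{3}w_{4}, w_{1}w_{2}ww_{4} and w_{1}w_{2}w_{3}w, respectively, and V is the volume of the whole tetrahedron w_{1}w_{2}w_{3}w_{4}.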
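The shape functions for a tetrahedron, defined as ratios of sub-tetrahedron volumes to the whole volume, can be computed in a few lines. This is a minimal sketch: the helper names `tet_volume`, `shape_functions` and `interpolate` are hypothetical, and each volume is taken as one sixth of the absolute value of a 3×3 determinant.

```python
def tet_volume(a, b, c, d):
    """Volume of tetrahedron abcd: |det[b - a, c - a, d - a]| / 6."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det) / 6.0


def shape_functions(corners, p):
    """Return [N1, N2, N3, N4] = [V1/V, V2/V, V3/V, V4/V] at point p.

    V_i is the volume of the sub-tetrahedron obtained by replacing
    corner i with p; V is the volume of the whole tetrahedron.
    """
    V = tet_volume(*corners)
    result = []
    for i in range(4):
        sub = list(corners)
        sub[i] = p
        result.append(tet_volume(*sub) / V)
    return result


def interpolate(corners, values, p):
    """w = N1*w1 + N2*w2 + N3*w3 + N4*w4."""
    return sum(n * w for n, w in zip(shape_functions(corners, p), values))


corners = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
values = [1.0, 2.0, 3.0, 4.0]
print(interpolate(corners, values, (0.25, 0.25, 0.25)))  # 2.5
print(interpolate(corners, values, corners[2]))           # 3.0
```

At the centroid all four shape functions equal 1/4, and at a corner the interpolant returns that corner's measured value. For any point inside the tetrahedron the four shape functions sum to 1.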

Our paper focuses on spatiotemporal interpolation problems in a domain of 2-D space and 1-D time. Using the extension approach of SF-based interpolation methods, we treat time as an imaginary third spatial dimension z. Therefore, substituting the time variable t for the z variable in the shape functions N_{1}, N_{2}, N_{3} and N_{4} yields the spatiotemporal shape functions N_{1}(x,y,t), N_{2}(x,y,t), N_{3}(x,y,t) and N_{4}(x,y,t).

Extending spatial interpolation to incorporate a dimension of time using the SF-based interpolation method has shown promising results compared with the extension methods based on IDW and Kriging [

Similar to the proof for 2-D triangular shape functions in [

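Since it is assumed that coordinate scaling happens after the mesh is constructed, each tetrahedron in the mesh has the same set of corner vertices before and after coordinate scaling. For a given tetrahedron, the value of N_{1}(x,y,t) can therefore be compared before and after scaling. Before scaling, N_{1}(x,y,t) = V_{1}/V, where V_{1}, V_{2}, V_{3} and V_{4} are the volumes of the four sub-tetrahedra and V is the volume of the whole tetrahedron. If the x, y and t coordinates are multiplied by positive constants a, b and c, respectively, each of these volumes is multiplied by the same factor abc, so that after scaling

$$N_{1}'(x,y,t) = \frac{V_{1}'}{V'} = \frac{abc\,V_{1}}{abc\,V} = \frac{V_{1}}{V}.$$

It is obvious that N_{1}'(x,y,t) = N_{1}(x,y,t). By the same argument, N_{2}, N_{3} and N_{4} are also unchanged, so the interpolated value is invariant to coordinate scales once the mesh is fixed.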
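The scale-invariance argument can also be checked numerically. The sketch below uses hypothetical helper names and an arbitrary tetrahedron in (x, y, t); it rescales only the time coordinate by a factor k after the element is fixed and evaluates the shape functions at the correspondingly rescaled point.

```python
def volume(a, b, c, d):
    """Volume of tetrahedron abcd in (x, y, t): |det[b - a, c - a, d - a]| / 6."""
    u, v, w = ([q[i] - a[i] for i in range(3)] for q in (b, c, d))
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det) / 6.0


def shape_functions(corners, p):
    """N_i = V_i / V, where V_i is the volume with corner i replaced by p."""
    V = volume(*corners)
    return [volume(*(corners[:i] + [p] + corners[i + 1:])) / V for i in range(4)]


def scale_time(point, k):
    """Multiply only the t coordinate by k."""
    x, y, t = point
    return (x, y, k * t)


corners = [(0.0, 0.0, 1.0), (4.0, 0.0, 2.0), (0.0, 3.0, 3.0), (1.0, 1.0, 7.0)]
p = (1.25, 1.0, 3.25)  # centroid of the tetrahedron
for k in (1.0, 0.1, 1 / 15):
    scaled = [scale_time(q, k) for q in corners]
    print(k, shape_functions(scaled, scale_time(p, k)))
```

Every line prints four values equal to 0.25 up to floating-point rounding, because each volume is multiplied by the same factor k.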

When SF-based spatiotemporal interpolation with the extension approach is applied, an important question has been neglected in the literature: will different time scales lead to different meshes when the Delaunay triangulation method is used?

To answer this question, a simple 2-D example demonstrates whether coordinate scales affect the resulting mesh:

We first used the 76 house locations of the real estate data in [

Then we doubled the y-coordinates and generated a new mesh, shown in the figure of the Delaunay triangulation result with the y coordinate values doubled.

It is not hard to see that the two triangulations, before and after doubling the y coordinates, are different.

Different meshes will eventually lead to different interpolation results, because a point to be interpolated may fall into different elements. Therefore, a reliable mesh constructed with an appropriate time scale is fundamental to the success of SF-based spatiotemporal interpolation using the extension approach.
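The effect of coordinate scaling on a Delaunay mesh can be reproduced on a tiny example. Rather than calling a triangulation library, the sketch applies the Delaunay empty-circumcircle test to four points forming a rhombus (a toy configuration assumed here, not the 76 house locations) and reports which diagonal the criterion selects before and after the y coordinates are doubled.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc; positive if counterclockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of triangle abc."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    # The determinant means "inside" when positive only if abc is counterclockwise.
    return det * orient(a, b, c) > 0


def delaunay_diagonal(quad):
    """Diagonal chosen by the Delaunay criterion for a convex quadrilateral.

    quad lists the corners A, B, C, D in cyclic order; diagonal AC is
    Delaunay unless D lies inside the circumcircle of triangle ABC.
    """
    A, B, C, D = quad
    return "BD" if in_circumcircle(A, B, C, D) else "AC"


def scale_y(points, factor):
    """Multiply every y coordinate by factor."""
    return [(x, factor * y) for x, y in points]


# Toy rhombus: A and C on the x axis, B and D on the y axis.
quad = [(-1.0, 0.0), (0.0, -0.8), (1.0, 0.0), (0.0, 0.8)]
print(delaunay_diagonal(quad))                # BD
print(delaunay_diagonal(scale_y(quad, 2.0)))  # AC
```

Because the two meshes split the rhombus along different diagonals, a point such as (0.1, 0.1) falls in triangle BCD before scaling but in triangle ACD after scaling, so linear shape-function interpolation would combine different corner values. Uniform scaling of both axes, by contrast, preserves circles and leaves the diagonal unchanged.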

Particle pollution (also known as “particulate matter”) in the air includes a mixture of solids and liquid droplets. Such particles are either emitted directly or formed in the atmosphere when other pollutants react. Particles come in a wide range of sizes. The EPA (Environmental Protection Agency) is concerned about particles that are 10 micrometers in diameter or smaller, because those are the particles that generally pass through the throat and nose and enter the lungs. Ten micrometers is less than the width of a single human hair. Once inhaled, these particles can affect the heart and lungs and cause serious health effects. EPA groups particle pollution into two categories:

Inhalable coarse particles (PM_{10}), such as those found near roadways and dusty industries, are larger than 2.5 micrometers and smaller than 10 micrometers in diameter.

Fine particles (PM_{2.5}), such as those found in smoke and haze, are 2.5 micrometers in diameter or smaller. These particles can be directly emitted from sources such as forest fires, or they can form when gases emitted from power plants, industries and automobiles react in the air.

The data used in this study are daily PM_{2.5} concentrations measured in 2009 at monitoring sites across the contiguous U.S.

The data coverage contains the point locations of the monitoring sites, the daily PM_{2.5} concentration measurements, and the days of the measurements. We obtained a number of data sets from the U.S. EPA and reorganized them so that each entry records the location (x, y) of a monitoring site, the day t on which a PM_{2.5} measurement is taken, and the measured PM_{2.5} value w. The reorganized data set has some entries with a PM_{2.5} value of zero, which means that no measurement is available at that site on that day. After all the zero entries are filtered out, there are 146,125 daily measurements at 955 monitoring sites, which are shown as stars (*) on the map of monitoring sites with PM_{2.5} measurements in 2009.

The locations to interpolate are the centroids of 3,109 counties in the contiguous United States, stored in the format (id, x, y). Estimated PM_{2.5} values need to be computed at each county location for each day in 2009, so there are 3,109 × 365 = 1,134,785 spatiotemporal PM_{2.5} values to interpolate.
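The filtering and counting steps described above take only a few lines. This sketch assumes each measurement record is a tuple (x, y, t, w) and that a monitoring site is identified by its (x, y) coordinates; the record layout, helper names and toy values are illustrative, not the EPA file format.

```python
def drop_missing(records):
    """Remove records whose PM2.5 value w is zero (no measurement that day)."""
    return [r for r in records if r[3] != 0]


def count_sites(records):
    """Number of distinct monitoring sites, keyed by (x, y)."""
    return len({(x, y) for x, y, _t, _w in records})


def n_targets(n_counties, n_days=365):
    """Number of (county centroid, day) pairs to interpolate."""
    return n_counties * n_days


# Toy records: (x, y, day, PM2.5 value).
records = [
    (10.0, 20.0, 1, 12.4),
    (10.0, 20.0, 2, 0.0),   # zero marks a missing measurement
    (30.0, 40.0, 1, 18.1),
]
valid = drop_missing(records)
print(len(valid), count_sites(valid))  # 2 2
print(n_targets(3109))                 # 1134785
```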

When implementing SF-based spatiotemporal interpolation, it is important to decide the time scale, as discussed in the previous section. The four time scales shown in the table of time scales tested for the PM_{2.5} data set were examined.

To decide which time scale is best for interpolation, 10-fold cross validation was conducted as follows:

The PM_{2.5} data set with measurements was randomly split into ten nearly equally sized folds.

For each of the four time scales, ten iterations were run, each with a different validation fold:

The spatiotemporal points in one fold (validation fold) were interpolated using the remaining nine folds (learning folds). Each point in the validation fold had both the original PM_{2.5} measurement and an estimated value, after the interpolation.

Six accuracy assessments were made to compare the original and estimated PM_{2.5} values in the validation fold: MAE (Mean Absolute Error), MSE (Mean Squared Error), RMSE (Root Mean Squared Error), MARE (Mean Absolute Relative Error), MSRE (Mean Squared Relative Error) and RMSRE (Root Mean Squared Relative Error). They are defined as follows:
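$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|I_{i}-O_{i}\right|, \quad \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(I_{i}-O_{i}\right)^{2}, \quad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(I_{i}-O_{i}\right)^{2}},$$

$$\mathrm{MARE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|I_{i}-O_{i}\right|}{\left|O_{i}\right|}, \quad \mathrm{MSRE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left(I_{i}-O_{i}\right)^{2}}{O_{i}^{2}}, \quad \mathrm{RMSRE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\frac{\left(I_{i}-O_{i}\right)^{2}}{O_{i}^{2}}},$$

where N is the number of points in the validation fold, O_{i} is the original PM_{2.5} measurement and I_{i} is the interpolated value at the i-th point.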
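The six accuracy measures can be computed directly from the paired original and interpolated values of one validation fold. This is a minimal sketch with a hypothetical function name, `accuracy`; the relative errors divide by the original value, which is one reason the zero-valued entries have to be removed beforehand.

```python
import math


def accuracy(original, estimated):
    """Return MAE, MSE, RMSE, MARE, MSRE and RMSRE for one validation fold."""
    n = len(original)
    abs_err = [abs(e - o) for o, e in zip(original, estimated)]
    sq_err = [(e - o) ** 2 for o, e in zip(original, estimated)]
    mse = sum(sq_err) / n
    msre = sum(s / o ** 2 for s, o in zip(sq_err, original)) / n
    return {
        "MAE": sum(abs_err) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MARE": sum(a / abs(o) for a, o in zip(abs_err, original)) / n,
        "MSRE": msre,
        "RMSRE": math.sqrt(msre),
    }


print(accuracy([10.0, 20.0], [12.0, 18.0]))
```

For this toy pair the result is MAE = 2, MSE = 4, RMSE = 2, MARE = 0.15 and MSRE = 0.025, up to floating-point rounding.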

Since there are ten iterations and a different validation fold is chosen in each, the average of the ten results was calculated for each accuracy measure. The averaged results are listed in the table of accuracy assessments for the PM_{2.5} data set.
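The fold splitting and averaging can be sketched as follows. The `interpolate` argument is a hypothetical stand-in for the SF-based interpolator; the usage example plugs in a simple mean of the learning values, which is only a placeholder and not the method of this paper. A seeded shuffle keeps the fold assignment reproducible, and the code assumes there are at least k points so that no fold is empty.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(points, interpolate, metric, k=10, seed=0):
    """Average `metric` over k iterations, each holding out one fold.

    points: list of (x, y, t, w); interpolate(learning, (x, y, t)) -> estimate.
    """
    scores = []
    for fold in k_fold_indices(len(points), k, seed):
        held_out = set(fold)
        learning = [p for i, p in enumerate(points) if i not in held_out]
        validation = [points[i] for i in fold]
        original = [p[3] for p in validation]
        estimated = [interpolate(learning, p[:3]) for p in validation]
        scores.append(metric(original, estimated))
    return sum(scores) / len(scores)


def mean_of_learning(learning, _target):
    """Placeholder interpolator: ignores location and returns the mean value."""
    return sum(p[3] for p in learning) / len(learning)


def mae(original, estimated):
    return sum(abs(e - o) for o, e in zip(original, estimated)) / len(original)


points = [(float(i), 0.0, float(i % 7), 5.0 + i % 3) for i in range(50)]
print(cross_validate(points, mean_of_learning, mae))
```

Running the same loop once per time scale, with the t coordinate rescaled beforehand, gives one averaged score per scale and measure.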

The SF-based interpolation for the county-level PM_{2.5} data was implemented in Matlab. Interpolated PM_{2.5} values were computed for the 3,109 county centroids in the contiguous U.S. on each day in 2009.

The interpolated county-level PM_{2.5} values were linked to 2009 county population data, and the population with a risky PM_{2.5} exposure was estimated. The EPA National Ambient Air Quality Standards for PM_{2.5}, as revised in 2006, were adopted here:

35 micrograms per cubic meter (35 µg/m^{3}) for 24 hours,

15 micrograms per cubic meter (15 µg/m^{3}) for the annual mean.

The results show that, in 2009,

there is a population of 33,147,335 (33.1 million) residing in counties with an annual mean PM_{2.5} exceeding the national standard of 15 µg/m^{3}, and

more than one third of the U.S. population (111,752,669) residing in counties where PM_{2.5} exceeded 35 µg/m^{3} for at least one day in 2009.
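The exposure estimate links each county's interpolated daily series to its population and sums the populations of counties that exceed each threshold. The sketch follows the two criteria as stated here (an annual mean above 15 µg/m^3, and at least one day above 35 µg/m^3), which are simpler than EPA's formal design-value calculations; the function name, data structures and toy numbers are hypothetical.

```python
NAAQS_24H = 35.0     # µg/m^3, 24-hour standard (2006 revision)
NAAQS_ANNUAL = 15.0  # µg/m^3, annual-mean standard (2006 revision)


def exposed_population(daily_pm, population):
    """Sum county populations over the two exceedance criteria.

    daily_pm:   county id -> list of interpolated daily PM2.5 values
    population: county id -> 2009 population
    Returns (population in counties whose annual mean exceeds 15,
             population in counties exceeding 35 on at least one day).
    """
    annual_total = 0
    daily_total = 0
    for county, values in daily_pm.items():
        pop = population[county]
        if sum(values) / len(values) > NAAQS_ANNUAL:
            annual_total += pop
        if any(v > NAAQS_24H for v in values):
            daily_total += pop
    return annual_total, daily_total


# Hypothetical three-county example with three days of values each.
daily_pm = {"A": [10.0, 12.0, 40.0], "B": [20.0, 18.0, 16.0], "C": [5.0, 6.0, 7.0]}
population = {"A": 1000, "B": 2000, "C": 500}
print(exposed_population(daily_pm, population))  # (3000, 1000)
```

A county can count toward both totals, as county A does here, so the two figures are not additive.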

The geographic distribution of the counties with PM_{2.5} exceeding EPA National Ambient Air Quality Standards in 2009 is visualized on a county map.

First, we tested only four possible time measurement scales and chose the best one for the PM_{2.5} air pollution data. However, a more systematic and effective method should be developed to help decide the most appropriate time scale for a particular application.

Second, the use of county centroids in this study could have biased the estimates of county-level population exposure to PM_{2.5}. In future studies, a finer geographic resolution, such as census tracts and block groups, may provide a more solid basis for evaluating population exposure to air pollutants.

Third, it would be of great value to link air quality with population health outcomes, such as asthma and other respiratory diseases. Such future studies would further demonstrate the utility of the presented method.

Disclaimer

The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention (CDC), USA.

Annual ozone concentration sample points from 1994 to 1998. The solid circular dots (●) are the points with known measurements, while the empty dot (○) is unmeasured and needs to be interpolated for 1996.

Computing 3-D shape functions by tetrahedral volume divisions. w_{1}, w_{2}, w_{3}, and w_{4} are measured values, while the value w at location (x, y, z) is unmeasured and needs to be interpolated.

Delaunay triangulation result with the original coordinate scales.

Delaunay triangulation result with the y coordinate values doubled.

Monitoring sites with PM_{2.5} measurements in 2009.

Geographic distribution of counties in the contiguous United States that exceeded the PM_{2.5} air quality standards in 2009.

FOUR TIME SCALES TESTED FOR THE PM_{2.5} DATA SET.

Date | ScaleA | ScaleB | ScaleC | ScaleD |
---|---|---|---|---|
01/01/2009 | 1 | 0.1 | 0.2 | 0.067 |
01/02/2009 | 2 | 0.2 | 0.4 | 0.133 |
… | … | … | … | … |
12/31/2009 | 365 | 36.5 | 73 | 24.333 |
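The scaled time values in the table of four time scales are consistent with dividing the day of the year by 1, 10, 5 and 15 for ScaleA to ScaleD, respectively. A minimal sketch of that conversion, with a hypothetical `time_value` helper and divisors inferred from the table rather than stated in the text:

```python
from datetime import date

# Divisors inferred from the table: t = day_of_year / divisor.
DIVISORS = {"ScaleA": 1, "ScaleB": 10, "ScaleC": 5, "ScaleD": 15}


def time_value(d, scale):
    """Map a calendar date to the t coordinate used in the (x, y, t) mesh."""
    day_of_year = d.timetuple().tm_yday  # 1 for January 1
    return day_of_year / DIVISORS[scale]


for scale in DIVISORS:
    print(scale, round(time_value(date(2009, 12, 31), scale), 3))
```

For 12/31/2009 this prints 365.0, 36.5, 73.0 and 24.333, matching the last row of the table.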

ACCURACY ASSESSMENTS FOR THE PM_{2.5} DATA SET.

Accuracy | ScaleA | ScaleB | ScaleC | ScaleD |
---|---|---|---|---|
MAE | 3.1538 | 3.5621 | 3.2526 | 3.7344 |
MSE | 77.2331 | 74.5937 | 74.7896 | 73.0842 |
RMSE | 8.6521 | 8.4539 | 8.4200 | 8.3536 |
MARE | 3.2384 | 0.4286 | 0.3866 | 0.4486 |
MSRE | 5462.2800 | 35.4845 | 36.6819 | 35.1620 |
RMSRE | 73.3605 | 3.1898 | 3.4048 | 3.2607 |