
Rapid technological advances have drastically improved the data collection capacity in occupational exposure assessment. However, advanced statistical methods for analyzing such data and drawing proper inference remain limited. The objectives of this paper are (1) to provide new spatio-temporal methodology that combines data from both roving and static sensors for data processing and hazard mapping across space and over time in an indoor environment, and (2) to compare the new method with current industry practice, demonstrating the distinct advantages of the new method and its impact on occupational hazard assessment and on future policy making in environmental and occupational health. A novel spatio-temporal model with a continuous index in both space and time is proposed, and a profile likelihood-based model fitting procedure is developed that allows fusion of the two types of data. To account for potential differences between the static and roving sensors, we extend the model to have nonhomogeneous measurement error variances. Our methodology is applied to a case study conducted in an engine test facility, and dynamic hazard maps are drawn to show features in the data that are captured by the new method but would have been missed by existing approaches.

Occupational exposure assessment refers to the assessment of the level of contaminants to which an employee is exposed during a work shift. The traditional method for occupational exposure assessment is personal monitoring using lightweight devices that can be worn by the workers. Personal exposure estimates are typically sought because they can be compared against regulatory standards to ensure compliance with existing laws. However, personal monitoring is generally expensive and requires workers to carry equipment with them during their work. As such, it is common for a small number of measurements, on a small number of employees, to be collected [

Occupational hazard maps, contour plots of contaminant concentration over the two-dimensional floor plan of the workplace, have gained popularity as a way to overcome some of the limitations of traditional personal sampling, which is generally expensive and yields small sample sizes [

Hazard maps that rely on roving monitor data alone, while cost-effective to produce and conceptually simple, likely fail to represent the temporal variability in concentrations present in many occupational settings [

Statistical methods for integrating different sources of data in space and/or time have been researched in the past. For example,

Combining data from static and roving sensors in a statistically sound way is challenging. First of all, while the roving sensors expand the spatial coverage of data, the observations are sparse in time at any given location. This is in contrast to the static sensors, which occupy a smaller number of sampling locations but provide observations that are denser in time at each location. An ad hoc approach would be to analyze the two types of data separately, but there is potential benefit to be gained by developing statistical methodology that pools the two data sources and takes full advantage of their respective strengths. In addition, inaccurate and missing data can be a thorny issue in such data analysis due to different measurement systems, instrumentation failures, and uneven or asynchronous monitoring times. To address these challenges, we propose a novel spatio-temporal process that has a continuous index in both space and time; that is, in the spatial domain of interest, the sampling locations can occur anywhere, in which sense the modeling is geostatistical [see, e.g.,

As we will demonstrate in a case study conducted in an engine test facility, the dynamic hazard maps that interpolate across space and over time are far more informative and representative of the evolution of hazard levels in space and time. This finding can impact the way occupational hazards are to be mapped in the future and move the industry and regulation forward to more accurate assessment of environmental hazards.

A study was conducted in the spring of 2013 in an engine test facility located in Colorado to evaluate occupational exposure [^{2}. A floor plan is shown in

The measurements sampled over time are plotted in

The static sensors all have dense sampling points in time, and thus relatively complete profiles of the temporal processes at the sampling locations (

In a recent review of hazard mapping approaches,

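We now specify the notation for the spatio-temporal process of a generic hazard. Let $s_1, \ldots, s_{n_S}$ denote the locations of the static sensors, where $s_i = (s_{i,x}, s_{i,y})'$ is the location of the $i$th static sensor and $n_S$ is the number of static sensors. For the $i$th static sensor, there are $p_i$ sampling time points, denoted as $t_{k,i}$, for $k = 1, \ldots, p_i$, $i = 1, \ldots, n_S$. In contrast, let $n_R$ denote the number of roving sensors. For the $j$th roving sensor, there are $q_j$ sampling time points. A roving sensor generally has different sampling locations at different sampling time points and, therefore, the sampling locations are denoted by $r_{1,j}, \ldots, r_{q_j,j}$, with location $r_{l,j}$ visited at time $t_{l,j}$, for $l = 1, \ldots, q_j$, $j = 1, \ldots, n_R$. Let $Y_s(t)$ denote the hazard level at location $s = (s_x, s_y)' \in D$ and time $t$. We denote the samples collected by the static sensors as $Y_{s_i}(t_{k,i})$, for $k = 1, \ldots, p_i$, $i = 1, \ldots, n_S$, and the samples collected by the roving sensors as $Y_{r_{l,j}}(t_{l,j})$, for $l = 1, \ldots, q_j$, $j = 1, \ldots, n_R$.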
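To make the sampling scheme concrete, the following minimal Python sketch shows one way the two data sources could be stored and pooled into a common record format. The class names, the `pool` helper, and the numerical values are hypothetical and serve only as illustration; they are not part of the original analysis.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StaticSensor:
    location: Tuple[float, float]  # one fixed (x, y) location
    times: List[float]             # sampling times at that location
    values: List[float]            # one measurement per sampling time


@dataclass
class RovingSensor:
    locations: List[Tuple[float, float]]  # one (x, y) location per sample
    times: List[float]                    # one sampling time per sample
    values: List[float]                   # one measurement per sample


def pool(static, roving):
    """Stack all samples into (x, y, t, value, kind) records."""
    records = []
    for sensor in static:
        x, y = sensor.location
        for t, v in zip(sensor.times, sensor.values):
            records.append((x, y, t, v, "static"))
    for sensor in roving:
        for (x, y), t, v in zip(sensor.locations, sensor.times, sensor.values):
            records.append((x, y, t, v, "roving"))
    return records


# Made-up illustrative data: one static sensor and one roving sensor.
static = [StaticSensor((0.0, 0.0), [0.0, 1.0, 2.0], [80.1, 80.4, 79.9])]
roving = [RovingSensor([(1.0, 2.0), (1.5, 2.5)], [0.5, 1.5], [78.0, 77.2])]
records = pool(static, roving)
```

Because both space and time are treated as continuous indices, nothing in this layout requires the locations to lie on a grid or the sampling times to be equally spaced.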

To model the static sensor data {_{si}(_{k,i})} and roving sensor data {_{rl,j} (_{l,j})}, we consider a spatio-temporal model
_{s}(_{s}(_{s}(_{s′}(_{s}(^{2} = Var(_{s}(_{s}(_{s}(_{0}(_{0}, _{0}) is a temporal covariance function at any spatial location _{0} ∈ _{s}(_{s}(

The spatio-temporal process _{s}(_{ℓ}(_{ℓ}(

We assume that the spatial covariance function of _{ℓ}(_{ℓ} = Var(_{ℓ}(_{ℓ}(_{ℓ}(⋅; _{ℓ}) is a correlation function parameterized by _{ℓ} and ∥ · ∥ denotes the Euclidean distance. From _{s}(_{0}(_{s}(_{s}(
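As an illustration of a spatial covariance of this form, a variance multiplied by a correlation function of Euclidean distance, the sketch below uses an exponential correlation. The parameterization exp(-d / theta) and the function names are assumptions made for this example; the exact parameterization used in the analysis is not reproduced here.

```python
import math


def euclidean(s, s_prime):
    """Euclidean distance between two planar locations (x, y)."""
    return math.hypot(s[0] - s_prime[0], s[1] - s_prime[1])


def exp_correlation(distance, theta):
    """Exponential correlation with range parameter theta > 0 (assumed form)."""
    return math.exp(-distance / theta)


def spatial_covariance(s, s_prime, lam, theta):
    """Covariance = variance lam times correlation at distance ||s - s'||."""
    return lam * exp_correlation(euclidean(s, s_prime), theta)


# At zero distance the covariance equals the variance lam.
same_site = spatial_covariance((0.0, 0.0), (0.0, 0.0), lam=2.0, theta=5.0)
# At distance 5 with theta = 5 the correlation is exp(-1).
far_site = spatial_covariance((0.0, 0.0), (3.0, 4.0), lam=2.0, theta=5.0)
```

Other correlation families, such as the Matérn class, could be substituted for `exp_correlation` without changing the rest of the structure.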

Our modeling approach is tailored toward the distinct features of static and roving sensor data. The spatial index is continuous in the spatial domain, and the temporal index is continuous within the time window. Thus, the sensors can be placed anywhere in the study area and do not need to be on a regular grid. Further, sampling can occur at any point in time and no regular time intervals are required. In addition, our modeling framework is semiparametric and flexible. The specification of the deterministic mean function _{s}(_{ℓ}(_{ℓ}(_{0}(

The class of spatio-temporal covariance functions _{ℓ} = 0 for

Parameter estimation by maximum likelihood can be challenging due to the large number of parameters in the _{i}{_{i}} [see

Let _{si} = (_{si}(_{1}_{si}(_{pi}_{si} = (_{si}(_{1}_{si}(_{pi}_{k} for _{i} and the corresponding mean vector for _{S}. Let _{l,j} and time points _{l,j} for _{j} and the corresponding mean vector for _{R}. Also, let

Let _{ℓ}_{ℓ}(∥_{ℓ}(_{ℓ}(_{ℓ}
_{ℓ}(_{ℓ}(⋅) is a spatial correlation function that may be modeled by the Matérn class [_{S,S} is a block matrix with blocks corresponding to distinct spatial locations. The submatrices _{S,R} and _{R,R} are defined analogously; however, for a given roving sensor, each distinct spatial location corresponds to its own block. This illustrates that the covariance structure is more complex than a sampling scheme that involves only static sensors, showing that roving sensors play a role in both spatial and temporal dependence. The rank of

Suppose _{s}(_{s}(_{ℓ}(_{N} is the

Estimation of all the components is not always possible depending on the choices made for the parameters _{ℓ} in the spatial correlation function _{ℓ}(⋅) and those made for the shape of the temporal process φ_{ℓ}(⋅). We now develop a profile likelihood approach to parameter estimation. At initialization, we estimate the mean function _{s}(_{x}, _{y})′, _{k}(·) are cubic spline basis functions, _{S}
_{R}.

Next, we estimate λ_{ℓ} and _{ℓ}(_{ζ} = 1 in the ^{ζ} norm _{ζ} sense) for all _{ℓ} is estimated by

Unlike static sensors, it is not possible to obtain a full time series of data at a fixed location _{j,l} for the roving sensor. Therefore, estimates of _{ℓ}(_{l,j} for _{j} , _{R}. The smoothness of _{ℓ}(

Given ^{2} defined as
^{2} and evaluated at

To predict _{s0}(_{0} and time _{0}, we use
_{0}, and _{0} and _{0.}
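For intuition about prediction at a new location and time, the sketch below implements the textbook best linear predictor, mu0 + c0' Sigma^{-1} (y - mu), in plain Python. It is a generic illustration under the assumption that the mean values and covariances are supplied by a fitted model; it is not the exact predictor used in the paper, and all function names and numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def best_linear_predictor(mu0, c0, sigma, y, mu):
    """Return mu0 + c0' Sigma^{-1} (y - mu).

    mu0:   mean at the prediction point
    c0:    covariances between the prediction point and each observation
    sigma: covariance matrix of the observations (list of lists)
    y, mu: observations and their means
    """
    residual = [yi - mi for yi, mi in zip(y, mu)]
    weights = solve(sigma, residual)
    return mu0 + sum(ci * wi for ci, wi in zip(c0, weights))


# Made-up numbers: two observations, both equally correlated with the target.
pred = best_linear_predictor(
    mu0=80.0,
    c0=[1.0, 1.0],
    sigma=[[2.0, 1.0], [1.0, 2.0]],
    y=[83.0, 80.0],
    mu=[80.0, 80.0],
)
```

In this toy example the first observation lies 3 units above its mean and the second lies exactly at its mean, so the prediction is pulled 1 unit above `mu0`.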

We can also predict the spatial loadings _{s}(

The model given in

Consider model _{S}, _{R} are diagonal matrices with diagonal entries equal to 1 for static and roving sensors, respectively, and 0 otherwise. The model _{s}(_{s}(_{ℓ} and _{ℓ}(
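The role of the two diagonal indicator matrices can be sketched as follows, under the assumption that the measurement error covariance is the static variance times the static indicator matrix plus the roving variance times the roving indicator matrix. The function name and the variance values are hypothetical.

```python
def noise_variances(kinds, var_static, var_roving):
    """Diagonal of var_static * D_S + var_roving * D_R (assumed form).

    kinds holds one label per observation, "static" or "roving".
    """
    d_s = [1.0 if k == "static" else 0.0 for k in kinds]  # diagonal of D_S
    d_r = [1.0 if k == "roving" else 0.0 for k in kinds]  # diagonal of D_R
    return [var_static * a + var_roving * b for a, b in zip(d_s, d_r)]


# Illustrative labels and variances, not estimates from the case study.
diag = noise_variances(["static", "roving", "static"], 2.0, 1.0)
```

Setting the two variances equal recovers a homogeneous measurement error model as a special case.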

Before fitting an inhomogeneous variance model for the noise data, we selected the tuning parameters by a leave-one-sensor-out cross-validation approach. More specifically, we considered a grid of values for the number of deterministic spline basis functions _{ℓ}(⋅) are shown in
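The leave-one-sensor-out idea can be sketched generically: hold out all observations from one sensor, fit on the rest, predict the held-out observations, and average the squared prediction errors. In the sketch below, `fit` and `predict` are placeholders for any model; the overall-mean predictor and the data are made up for illustration, and pooling the squared errors over all held-out observations is an assumption about how the MSPE is aggregated.

```python
def loso_mspe(data, fit, predict):
    """Leave-one-sensor-out mean squared prediction error.

    data maps a sensor id to a list of (x, y, t, value) records.
    """
    sq_errors = []
    for held_out in data:
        # Training set: every record from every other sensor.
        train = [rec for sid, recs in data.items() if sid != held_out
                 for rec in recs]
        model = fit(train)
        for x, y, t, value in data[held_out]:
            sq_errors.append((value - predict(model, x, y, t)) ** 2)
    return sum(sq_errors) / len(sq_errors)


def fit_mean(train):
    """Placeholder model: the overall mean of the training values."""
    return sum(rec[3] for rec in train) / len(train)


def predict_mean(model, x, y, t):
    """Placeholder prediction: the same mean at every location and time."""
    return model


# Made-up data for two sensors with two samples each.
data = {
    "s1": [(0.0, 0.0, 0.0, 80.0), (0.0, 0.0, 1.0, 82.0)],
    "s2": [(5.0, 0.0, 0.0, 70.0), (5.0, 0.0, 1.0, 72.0)],
}
mspe = loso_mspe(data, fit_mean, predict_mean)
```

Tuning parameters could then be chosen by evaluating `loso_mspe` over a grid of candidate values and keeping the combination with the smallest error.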

A series of dynamic hazard maps for the predicted noise intensity using our STDF model in _{x}
_{y}

_{ℓ}(⋅) can be made in light of

The standard errors for the parameter estimates are obtained via cross-validation by leaving one sensor out at each time, as detailed in

Both static (

The problem of interpolating hazard maps in time and space from discrete sampled observations was discussed in

To compare the methods globally, the MSPE values for our spatio-temporal data fusion method are obtained using the leave-one-sensor-out cross-validation described in

In

A final remark is that we can consider cases in which the spatial covariance function is not exponential. For example, we repeated the analysis using the Matérn class of covariance functions, with known smoothness parameter

In this paper, we have developed a spatio-temporal static and roving data fusion model, with each data sensor having potentially different instrument variances. The approach to model fitting and statistical inference has been applied to produce hazard maps that capture dependence across space and over time in indoor environments. Modeling the spatio-temporal dependence structure allows the hazard maps to capture features that are missed by the current practice in occupational hazard assessment. Furthermore, our approach enables continuous-time prediction of hazard, which the existing approaches are unable to produce.

With the semiparametric model specification, our method is able to detect unexpected hazard sources that occur sporadically during a study. Sudden fluctuations of intensity, such as those from the secondary noise source in the southeastern corner of the facility, go undetected or are underestimated under current practices, but can be detected by our method. Moreover, the health effects of short-duration but high-level exposures are unclear, and our method provides a way to better capture such transient exposures.

Cross-validation shows that our methodology outperforms the traditional methods in the scientific application, a conclusion that is corroborated by the simulation study given in

While the height of the sensors is not accounted for directly, the model with heterogeneous measurement error variances may accommodate possibly different heights for different sensors. It would be interesting, however, to examine this third dimension more closely, as well as to consider three-dimensional hazard maps when data are collected at different heights [see, e.g.,

Other covariance modeling allows for nonseparability, although stationarity in time is generally assumed [

The authors gratefully acknowledge the Editor, an Associate Editor and four anonymous reviewers for their excellent comments and constructive suggestions that helped to improve this manuscript in content and presentation.

Supported in part by the CAPES Foundation, Brazil, Grant 5588–10–3 (Ludwig), the National Natural Science Foundation of China, Grant 11301536 (Chu), USGS CESU Award G16AC00344 (Zhu), NSF Grants DMS-1106975 and DMS-1521746 (Wang) and NIOSH Grant R01 OH010533 (Koehler).

SUPPLEMENTARY MATERIAL

Appendix: Tuning parameter selection and simulation study

(DOI: 10.1214/16-AOAS995SUPPA;.pdf). The Appendix contains a description of the leave-one-sensor-out cross-validation procedure for MSPE evaluation and tuning parameter selection, a detailed approach for the choice of tuning parameters for the smoother terms and number of components for the data analysis in

(a) The floor plan of the engine test facility. The black rectangle in the upper left corner is the source of noise, white rectangles are offices, gray rectangles are inactive engines, and dark gray rectangles are floor openings. The locations of static sensors are numbered from 1 to 18. (b) The pathway of the first roving sensor is drawn in open circles. The pathway of the second roving sensor is similar, and thus omitted.

Observed noise intensity over time from 9:45:00 am to 11:23:20 am. Gray solid lines are time series for static sensors near one noise source (#1 through 7, and 17). The black solid line is the time series for the static sensor near the secondary noise source (#18). Dashed lines are for the remaining static sensors. Filled and open circles are samples taken by the first roving sensor, which started at 10:28:45 am, and the second roving sensor, which started at 10:52:20 am, respectively.

Static maps of the noise intensity obtained by kriging using the roving sensor data only (left), the roving and static sensor data (center) and the static sensor data only (right), averaging data at the same location in time.

Dynamic hazard maps with contour lines obtained from the spatio-temporal data fusion (STDF) method; each panel corresponds to a point in time from 9:50 am to 11:20 am at 10-minute intervals.

Prediction standard error maps for the dynamic hazard maps given in

Estimated

(a)

Comparison of the STDF hazard map with maps created by the universal kriging (UK) and thin-plate splines (TPS) methods. For UK and TPS, data are from 11:04:00 and 11:05:00, and interpolated linearly between 11:04:20 and 11:04:40. Black points mark the static sensor locations, while white points mark the roving sensor locations at the corresponding time.

Comparison of standard errors of STDF hazard maps with the standard error of universal kriging (UK) and thin-plate splines (TPS) methods. For UK and TPS, data are from 11:04:00 and 11:05:00, and interpolated linearly between 11:04:20 and 11:04:40. Black points mark the static sensor locations, while white points mark the roving sensor locations at the corresponding time.

Parameter estimates and cross-validated standard errors (in parentheses). STDF denotes spatio-temporal data fusion for the inhomogeneous variance case, STDFh the homogeneous variance case, and STDF* the fit using only the static sensors.

| Coefficient | STDF | STDFh | STDF* |
|---|---|---|---|
| _{0} | 83.64 (4.52) | 83.53 (4.49) | 82.23 (3.59) |
| _{x} | −0.40 (0.08) | −0.40 (0.08) | −0.38 (0.11) |
| _{y} | 0.31 (0.04) | 0.31 (0.04) | 0.31 (0.04) |
| _{1} | 22.34 (2.93) | 22.20 (2.95) | 13.17 (1.83) |
| _{2} | 10.93 (2.13) | 12.08 (1.88) | 30.99 (2.57) |
| _{3} | 40.34 (11.15) | 40.34 (11.10) | 32.56 (8.42) |
|  | 1.49 (0.16) | 1.48 (0.15) | 1.46 (0.12) |
|  | 1.05 (0.29) | – | – |