Gaussian Process (GP) models provide a very flexible nonparametric approach to modeling location-and-time indexed datasets. However, the storage and computational requirements for GP models are infeasible for large spatial datasets. Nearest Neighbor Gaussian Processes (Datta A, Banerjee S, Finley AO, Gelfand AE. Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. *Journal of the American Statistical Association* 2016, 111:800–812) provide a highly scalable alternative by using local information from small sets of nearest neighbors, while retaining the inferential benefits of a process-based model.

The growing capabilities of Geographical Information Systems (GIS) have resulted in a deluge of geo-indexed datasets observed over a very large number of locations. Gaussian Process-based models provide a very flexible non-parametric approach to capture the spatial patterns in such datasets. A Gaussian Process over a domain D ⊆ ℝ^d is a q × 1 stochastic process w(s) = (w_1(s), w_2(s), …, w_q(s))′ specified by a cross-covariance function C_θ(s_i, s_j) indexed by parameters θ, such that for any finite collection of locations S = {s_1, s_2, …, s_n} the stacked vector w_S = (w(s_1)′, w(s_2)′, …, w(s_n)′)′ follows a multivariate Gaussian distribution with covariance matrix C_θ(S, S).

Here C_θ(S, S) denotes the nq × nq covariance matrix whose (i, j)th q × q block is the cross-covariance C_θ(s_i, s_j).

The popularity of Gaussian Processes among non-parametric models is largely indebted to their unparalleled out-of-sample predictive performance and their ability to produce a stochastically interpolated surface with uncertainty-quantified predictions at new locations. The latter (known as kriging) can be achieved simply using properties of multivariate Gaussian distributions. For example, if w(s) ~ GP(0, C_θ(·, ·)) and w_S is the realization of the process over the observed locations S, then for a new location s_0 the kriging distribution w(s_0) | w_S is Gaussian with

mean A_θ w_S and covariance C_θ(s_0, s_0) − A_θ C_θ(S, s_0),

where A_θ = C_θ(s_0, S) C_θ(S, S)^{−1} and C_θ(s_0, S) = (C_θ(s_0, s_1), C_θ(s_0, s_2), …, C_θ(s_0, s_n)) collects the cross-covariances between s_0 and each observed location s_i.
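As an illustration of the conditional-Gaussian identities above, the following sketch computes the kriging mean and variance for a univariate response. The exponential covariance and all function and parameter names are our own illustrative assumptions, not part of the original formulation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def krige(s0, coords, w, sigma2=1.0, phi=3.0):
    """Full-GP kriging of w(s0) from observations w at coords.

    Implements the conditional-Gaussian identities
        mean = C_theta(s0, S) C_theta(S, S)^{-1} w_S
        var  = C_theta(s0, s0) - C_theta(s0, S) C_theta(S, S)^{-1} C_theta(S, s0)
    for an illustrative exponential covariance
    C(s, s') = sigma2 * exp(-phi * ||s - s'||).
    """
    C_SS = sigma2 * np.exp(-phi * cdist(coords, coords))
    c_0S = sigma2 * np.exp(-phi * cdist(np.atleast_2d(s0), coords)).ravel()
    a = np.linalg.solve(C_SS, c_0S)       # kriging weights A_theta
    return a @ w, sigma2 - a @ c_0S       # kriging mean and variance
```

At an observed location the surface is interpolated exactly (zero kriging variance), which is the "stochastic interpolation" property mentioned above.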

In practice, we often observe a noisy version of the latent spatial surface. A standard geostatistical model takes the form y(s) = X(s)′β + w(s) + ε(s), where y(s) is the observed response, X(s)′β captures the effect of covariates, w(s) is the Gaussian Process, and ε(s) is an independent noise term.

The noise covariance is typically diagonal; for a univariate response it reduces to the nugget variance τ², and ε(s_i) captures micro-scale variability and measurement error at location s_i.

If the number of locations n is large, these computations quickly become prohibitive. Storing the nq × nq covariance matrix C_θ(S, S) requires O(n²q²) memory and may exhaust storage resources for large n. Moreover, evaluating the Gaussian likelihood involves the inverse and determinant of C_θ(S, S), which require O(n³q³) floating point operations (flops), so even for a univariate response (q = 1) the computations are infeasible when n runs into the tens of thousands.
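To make these costs concrete, here is a minimal sketch of a dense GP log-likelihood evaluation; the stored covariance accounts for the O(n²) memory and the Cholesky factorization for the O(n³) flops. The exponential covariance and all names are our illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import cho_factor, cho_solve

def gp_loglik(w, coords, sigma2=1.0, phi=3.0, jitter=1e-10):
    """log N(w | 0, C_theta(S, S)) with a dense exponential covariance.

    Storing C is O(n^2) memory; the Cholesky factorization is O(n^3)
    flops, the step that becomes infeasible for very large n.
    """
    n = len(w)
    C = sigma2 * np.exp(-phi * cdist(coords, coords)) + jitter * np.eye(n)
    L, lower = cho_factor(C, lower=True)
    quad = w @ cho_solve((L, lower), w)            # w' C^{-1} w
    logdet = 2.0 * np.sum(np.log(np.diag(L)))      # log det(C) from Cholesky
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)
```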

In the next section, we provide a brief overview of the existing, and still growing, literature on modeling spatial and spatiotemporal datasets. Subsequently, we focus upon a class of highly scalable sparsity-inducing Gaussian process models for massive space-time data. This approach has recently been explored in Refs

The remainder of this article proceeds as follows. In Section

There is a burgeoning literature on statistical modeling of large spatial and spatiotemporal datasets. Some approaches replace the expensive full likelihood with cheaper approximations,^{6,7} block composite likelihoods,^{8} or small conditioning sets,^{9,10} among others. Subsequently, kriging at a new location s_0 proceeds by plugging the resulting parameter estimates θ̂ into the kriging equations. This step, however, still involves solving a linear system in C_θ̂(S, S), at a cost of roughly O(n³) flops (plus O(n²) per prediction). So for large n, prediction remains expensive even when parameter estimation has been made cheap.

Compactly supported covariances (CSC)^{11–15} are also popular for reducing the computational burden. CSC yields sparse correlation and precision structures that expedite calculations. Despite the sparsity, computing det(C_θ(S, S)) remains challenging and requires sophisticated sparse-matrix algorithms^{16} or nearest-neighbor type local approximations.^{17} Yet another approach embeds the irregular locations in a larger regular lattice to construct computationally tractable covariance matrices utilizing spectral properties^{18} or Gaussian Markov Random Fields.^{19} Inferences from these methods are limited to the resolution of the embedding lattice and cannot interpolate at finer resolutions.
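As a sketch of how compact support induces sparsity, the snippet below tapers an exponential covariance with a Wendland-type function that vanishes beyond a cutoff; the specific taper, cutoff, and names are our illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy import sparse

def tapered_covariance(coords, sigma2=1.0, phi=3.0, taper_range=0.15):
    """Exponential covariance multiplied elementwise by a Wendland taper.

    The taper is exactly zero beyond `taper_range`, so the product is a
    valid (positive semi-definite, by the Schur product theorem) sparse
    covariance matrix.
    """
    d = cdist(coords, coords)
    cov = sigma2 * np.exp(-phi * d)
    t = d / taper_range
    taper = np.where(t < 1.0, (1.0 - t) ** 4 * (4.0 * t + 1.0), 0.0)
    return sparse.csr_matrix(cov * taper)
```

Sparse solvers can then exploit the zero pattern, though, as noted above, the determinant computation still requires care.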

In machine learning, Gaussian Process regression is used to facilitate uncertainty-quantified interpolation and smoothing of an outcome observed at a large number of inputs (locations). When the input-dimension

For irregularly located large datasets, there is a considerable amount of literature on low-rank approaches.^{2,22–33} Low-rank models approximate the covariance matrix C_θ(S, S) through r ≪ n basis functions or knots, reducing the cost of likelihood evaluation to O(nr² + r³) flops. Some low-rank approaches^{23,26} can be formulated as well-defined Gaussian Processes over the domain. This is very convenient as it provides a unified platform for parameter estimation and kriging at arbitrary resolutions. Furthermore, a full-rank GP prior for the spatial random effects can simply be replaced with a low-rank GP prior in any hierarchical setup. However, as demonstrated in Refs 4 and 34, low-rank processes struggle to emulate the inference from the expensive full-rank GPs. The gain in computational efficiency is often offset by oversmoothed estimates of the spatial surface.
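A hedged sketch of the low-rank idea, in the style of a Gaussian predictive process: the n × n covariance is replaced by a rank-r Nyström-type approximation built from r knots (the exponential covariance and all names are our assumptions).

```python
import numpy as np
from scipy.spatial.distance import cdist

def predictive_process_cov(coords, knots, sigma2=1.0, phi=3.0):
    """Rank-r approximation C(S, S*) C(S*, S*)^{-1} C(S*, S) from r knots S*.

    With this structure, likelihood evaluations cost O(n r^2 + r^3) via the
    Woodbury identity instead of O(n^3).
    """
    expcov = lambda a, b: sigma2 * np.exp(-phi * cdist(a, b))
    C_sk = expcov(coords, knots)                    # n x r
    C_kk = expcov(knots, knots)                     # r x r
    return C_sk @ np.linalg.solve(C_kk, C_sk.T)     # n x n, rank <= r
```

Note that the diagonal of the low-rank matrix never exceeds the true variance sigma2, one symptom of the oversmoothing discussed above.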

Localized GP regression based on a few nearest neighbors has also been used to obtain fast kriging estimates. The Local Approximate Gaussian Process (LAGP) of Ref 17 extends this further by estimating the covariance parameters separately at each new location. LAGP thus essentially provides a non-stationary local approximation to a Gaussian Process at every predictive location and can be used to interpolate or smooth the observed data.

Recently, Ref 4 introduced the Nearest Neighbor Gaussian Process (NNGP), a sparsity-inducing Gaussian Process that is well defined over the entire domain D ⊆ ℝ^d and is constructed from small conditioning sets of nearest neighbors in the spirit of Vecchia's approximation.^{5} Like the Gaussian Predictive Process, NNGP enjoys all the benefits of being a proper GP. It delivers massive scalability both in terms of parameter estimation and kriging. Most importantly, it does not oversmooth like low-rank processes and accurately emulates the inference from full-rank GPs. In the next section, we provide a general method to construct scalable Gaussian processes and demonstrate how particular choices lead to the NNGP and dynamic NNGP models.

The computational bottleneck in evaluating the likelihood N(w_S | 0, C_θ(S, S)) lies in the inverse and determinant of C_θ(S, S). Note, however, that any joint density over S = {s_1, s_2, …, s_n} can be factored into a product of conditional densities, p(w_S) = p(w(s_1)) ∏_{i=2}^{n} p(w(s_i) | w(s_1), …, w(s_{i−1})), and for a GP each conditional density is Gaussian with mean and variance determined by C_θ.

It is straightforward to see that the density obtained by shrinking each conditioning set in this factorization, as proposed in Ref 5, corresponds to a proper multivariate Gaussian distribution for w_S, say Ñ(w_S | 0, C̃_θ(S, S)), whose precision matrix C̃_θ(S, S)^{−1} is sparse.

The construction above can be extended to kriging: let U = {u_1, u_2, …, u_N} be any finite set of new locations outside S.

Define w_U = (w(u_1)′, w(u_2)′, …, w(u_N)′)′ and let each w(u_i) depend only on a small set of locations in S, so that, conditional on w_S, the components of w_U are independent Gaussians with kriging means and covariances Γ_θ(u_1), Γ_θ(u_2), …, Γ_θ(u_N) computed from the respective conditioning sets. Kriging can therefore be carried out independently (and in parallel) at each u_i rather than jointly over U; empirical comparison in Ref 4 reveals that joint kriging hardly yields any noticeable benefits.

The Gaussian Process defined in this manner is consistent over S ∪ U: the joint distribution of w_S and w_U obtained from the construction is a valid multivariate Gaussian for any choice of the new locations in U.

One approach would be to avoid computing the Cholesky factorization of C_θ(S, S) directly and instead specify its inverse through the identity C_θ(S, S)^{−1} = (I − B)′ Λ^{−1} (I − B), where B is strictly lower triangular with entries B_{i,j} and Λ is (block) diagonal. This corresponds to writing each w(s_i) as a linear combination of the preceding w(s_j), j < i, plus an independent Gaussian error.

Let H(s_i) = {s_1, s_2, …, s_{i−1}} denote the set of locations preceding s_i. If the coefficients B_{i,j} are chosen as the kriging weights of w(s_i) given w over H(s_i) (analogous to the conditional means in the factorization above), then the identity recovers C_θ(S, S)^{−1} exactly, but computing these weights for every s_i remains as expensive as the original problem.

Computational efficiency can be achieved by limiting the number of non-zero entries B_{i,j} in each row: replace the full history H(s_i) with a much smaller conditioning set N(s_i) of at most m nearest neighbors, and obtain the non-zero B_{i,j} by kriging w(s_i) from w_{N(s_i)}. The kriging variance supplies the diagonal entries, Λ_i = C_θ(s_i, s_i) − C_θ(s_i, N(s_i)) C_θ(N(s_i), N(s_i))^{−1} C_θ(N(s_i), s_i). Each B_{i,j} and Λ_i then only requires inverting an m × m matrix, so constructing the sparse approximation C̃_θ(S, S)^{−1} = (I − B)′ Λ^{−1} (I − B) costs O(nm³) flops instead of O(n³).
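A minimal sketch of the resulting computation: each w(s_i) is kriged from at most m previously ordered nearest neighbors, so the joint log-density is accumulated from m × m solves rather than one n × n factorization. The exponential covariance, brute-force neighbor search, and names are our illustrative assumptions; with m = n − 1 the sketch reproduces the exact full-GP log-likelihood.

```python
import numpy as np
from scipy.spatial.distance import cdist

def expcov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance (illustrative choice)."""
    return sigma2 * np.exp(-phi * cdist(a, b))

def nngp_loglik(w, coords, m=10, sigma2=1.0, phi=3.0):
    """NNGP log-density of w over ordered locations.

    w(s_i) is kriged from at most m previously ordered nearest neighbors,
    so each step solves an m x m system: O(n m^3) total instead of O(n^3).
    """
    n = len(w)
    ll = 0.0
    for i in range(n):
        if i == 0:
            mean, var = 0.0, sigma2           # first location: marginal
        else:
            d = cdist(coords[i:i + 1], coords[:i]).ravel()
            nb = np.argsort(d)[:m]            # neighbor set N(s_i)
            c_in = expcov(coords[i:i + 1], coords[nb], sigma2, phi).ravel()
            b = np.linalg.solve(expcov(coords[nb], coords[nb], sigma2, phi), c_in)
            mean = b @ w[nb]                  # kriging mean, row i of B
            var = sigma2 - b @ c_in           # kriging variance Lambda_i
        ll += -0.5 * (np.log(2 * np.pi * var) + (w[i] - mean) ** 2 / var)
    return ll
```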

The size m of the neighbor sets controls the sparsity of B: smaller m yields a sparser B and cheaper computations, at the cost of a cruder approximation of the conditional densities p(w(s_i) | w_{H(s_i)}). Empirically, small values of m (around 10 to 15) have been found to be adequate.

NNGP uses nearest neighbors to create the small conditioning sets. Nearest neighbors have been shown to be sub-optimal for predicting at a new location; theoretically, for some special designs, in Ref

Finally, we turn to defining the process at an arbitrary location s outside S. Analogous to the construction over S, let N(s) denote the m nearest neighbors of s in S and krige w(s) from w_{N(s)}: the conditional mean is B_θ(s) w_{N(s)}, where B_θ(s) = C_θ(s, N(s)) C_θ(N(s), N(s))^{−1}, and choosing the conditional covariance Λ_θ(s) = C_θ(s, s) − B_θ(s) C_θ(N(s), s) completes the process specification. Evaluating the likelihood of w_S under the resulting NNGP requires O(nm²q²) storage and O(nm³q³) flops, respectively. As these requirements are linear in n, the NNGP scales to very large datasets.
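Kriging at a new location under this construction touches only the m nearest observed neighbors, as in this sketch (exponential covariance and all names are our assumptions):

```python
import numpy as np
from scipy.spatial.distance import cdist

def nngp_krige(s0, coords, w, m=10, sigma2=1.0, phi=3.0):
    """Krige w(s0) from its m nearest observed neighbors N(s0) only.

    Solves an m x m system instead of the n x n system of full-GP
    kriging; with m = n it reproduces the full-GP answer exactly.
    """
    d = cdist(np.atleast_2d(s0), coords).ravel()
    nb = np.argsort(d)[:m]                                   # N(s0)
    C_nn = sigma2 * np.exp(-phi * cdist(coords[nb], coords[nb]))
    c_0n = sigma2 * np.exp(-phi * d[nb])
    b = np.linalg.solve(C_nn, c_0n)                          # B_theta(s0)
    return b @ w[nb], sigma2 - b @ c_0n                      # mean, Lambda_theta(s0)
```

Conditioning on fewer locations can only increase the kriging variance, so the local answer is a slightly conservative version of the full-GP one.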

As NNGP is a proper Gaussian process, we can use it as a prior for the spatial random effects in any hierarchical model formulation. For example, the Bayesian model in

An efficient Markov chain Monte Carlo (MCMC) sampler using Gibbs steps and random-walk Metropolis steps is described in Ref 4. Posterior predictive inference at a set of new locations u_1, u_2, …, u_N then proceeds by composition sampling from the NNGP kriging distributions, one posterior draw at a time.

Here w_U = (w(u_1)′, w(u_2)′, …, w(u_N)′)′; given each posterior draw of the parameters and of w_S, the components of w_U are drawn independently from Gaussian distributions with means B_θ(u_1) w_{N(u_1)}, …, B_θ(u_N) w_{N(u_N)} and covariances Λ_θ(u_1), Λ_θ(u_2), …, Λ_θ(u_N).

We present the results of a multivariate simulation study to demonstrate the accuracy of scalable Gaussian Processes. The synthetic data comprise

The spatial random effects w_1(s) and w_2(s) were generated from a bivariate Gaussian Process whose cross-covariance between the ith and jth components was built from a coefficient matrix (a linear model of coregionalization), so that the within- and between-response covariances are determined by that matrix.


The covariates x_1 and x_2 were randomly generated; β_{1,0} and β_{1,1} are the intercept and slope associated with the first response variable. Here too, response-specific subscripts index the elements of the cross-covariance matrix. Candidate models were compared using the Deviance Information Criterion (DIC),^{37} GPD score,^{38} and Root Mean Square Predictive Error (RMSPE).^{39}

For all models, the intercept and slope regression parameters were given

Candidate model parameter estimates and performance metrics based on 25,000 iterations are provided in

All model specifications produce similar posterior median estimates and 95% credible intervals that contain the true values of the parameters.

Turning to the out-of-sample prediction results, the NNGP and Full GP models produced comparable RMSPE and mean 95% credible interval widths as shown in

The last row in

The size m of the neighbor sets is typically fixed beforehand. Rather than fixing m, one can treat it as an unknown parameter, assign it a prior distribution, and let the data inform the degree of sparsity in the NNGP.

The support of m is {1, 2, …, m_0}, where m_0 can be determined from the available computational resources, that is, choose the largest m_0 for which the per-iteration computations remain affordable.
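One way such a variable-m model can be updated within MCMC is a discrete Gibbs step that evaluates the NNGP likelihood on the grid {1, …, m_0}. This is our illustrative sketch of the idea, not necessarily the authors' exact algorithm; the exponential covariance and all names are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def vecchia_loglik(w, coords, m, sigma2=1.0, phi=3.0):
    """Log-density of w under an m-neighbor NNGP/Vecchia factorization
    (exponential covariance; brute-force neighbor search)."""
    cov = lambda a, b: sigma2 * np.exp(-phi * cdist(a, b))
    ll = 0.0
    for i in range(len(w)):
        if i == 0:
            mean, var = 0.0, sigma2
        else:
            nb = np.argsort(cdist(coords[i:i + 1], coords[:i]).ravel())[:m]
            c_in = cov(coords[i:i + 1], coords[nb]).ravel()
            b = np.linalg.solve(cov(coords[nb], coords[nb]), c_in)
            mean, var = b @ w[nb], sigma2 - b @ c_in
        ll += -0.5 * (np.log(2 * np.pi * var) + (w[i] - mean) ** 2 / var)
    return ll

def sample_m(w, coords, m0, rng, prior=None):
    """One Gibbs step for m: draw from its discrete full conditional on
    {1, ..., m0}, proportional to NNGP-likelihood(m) times prior(m).
    `prior` is an optional length-m0 vector of prior weights (default uniform)."""
    logp = np.array([vecchia_loglik(w, coords, m) for m in range(1, m0 + 1)])
    if prior is not None:
        logp = logp + np.log(prior)
    p = np.exp(logp - logp.max())
    return rng.choice(np.arange(1, m0 + 1), p=p / p.sum())
```

The cost of one such step is bounded by the largest neighbor set, which is why m_0 should be chosen with the available resources in mind.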

We illustrate the performance of this variable-m model on univariate synthetic data generated as y(s) = β_1 x_1(s) + β_2 x_2(s) + w(s) + ε(s). The covariates x_1 and x_2 were randomly generated. We used flat priors for β_1 and β_2, a uniform prior for the spatial decay φ, and inverse-Gamma priors for σ² and τ².

We used 10,000 MCMC iterations and discarded the first 5000 as burn-in. The estimates of β_1, β_2, and τ² were similar for all three models, whereas the estimate of σ² was much better for the variable-m model, which also produced narrower credible intervals for σ² and φ.

Computationally, the variable-m model is costlier than a fixed-m NNGP: if m_0 denotes the upper bound for the support of m, the flop count per iteration is at most O(nm_0³), with the actual cost depending on the value of m drawn at the current iteration.

Fully model-based Bayesian inference for large spatial and spatiotemporal datasets is challenging because of expensive computations involving matrices without apparent exploitable structure. Recently, Ref

The construction of NNGP requires a pre-ordering of the spatial locations. While the choice of ordering has been empirically shown to have little impact on the performance of the NNGP^{4} or other nearest-neighbor based approaches,^{9,10} it remains an annoyance from a purely theoretical perspective as spatial locations do not have any natural ordering.

Also, NNGP, in its current form, is only constructed using stationary (or isotropic) covariance functions. Extending NNGP to create sparse Gaussian Processes for modeling non-stationary spatial surfaces remains challenging. Kriging based on a few Euclidean nearest neighbors may no longer be accurate if the stationarity assumption is not valid. LAGP^{17} possesses an advantage in this respect, as the covariance parameters are estimated at each predictive location, thereby incorporating non-stationarity. The implementation of LAGP in the R-package

The storage and computational requirements of NNGP are linear in the size of the dataset and the dimension of the locations. Hence, it can easily scale to datasets with hundreds of thousands, or possibly millions, of high-dimensional locations. One potential area of concern is that the spatial random effects are updated sequentially in the MCMC algorithm for NNGP described in Ref 4. While empirical observations reveal that convergence is achieved very fast, a block update of the spatial random effects may speed up the MCMC significantly. Two possible algorithms were described in Ref

The authors thank the Associate Editor and anonymous reviewers for their suggestions which considerably improved the manuscript. Sudipto Banerjee was supported by NSF DMS-1513654. Andrew Finley was supported by National Science Foundation (NSF) DMS-1513481, EF-1137309, EF-1241874, and EF-1253225, as well as NASA Carbon Monitoring System grants.

Conflict of interest: The authors have declared no conflicts of interest for this article.

Multivariate synthetic data analysis: true versus fitted spatial random effects. w^{(1)} and w^{(2)} correspond to the random effects associated with the first and second response variables. (a) Full GP w^{(1)}, (b) NNGP w^{(1)}, (c) Full GP w^{(2)}, and (d) NNGP w^{(2)}.

Posterior distribution of the neighbor-set size m in the variable-m NNGP model.

Multivariate Synthetic Data Analysis: Parameter Estimates and Computing Time in Minutes for Candidate Models. Posterior summaries are 50 (2.5, 97.5) percentiles.

| Parameter | True | NNGP | NNGP | Full Gaussian Process |
|---|---|---|---|---|
| β_{1,0} | 1 | 0.81 (0.22, 1.43) | 0.64 (−0.05, 1.45) | 0.82 (−0.11, 1.71) |
| β_{1,1} | −5 | −4.94 (−5.02, −4.85) | −4.95 (−5.04, −4.86) | −4.95 (−5.04, −4.86) |
| β_{2,0} | 1 | 1.03 (0.15, 2.02) | 1.31 (0.26, 2.37) | 0.95 (−0.27, 2.18) |
| β_{2,1} | 5 | 5.03 (4.91, 5.15) | 5.02 (4.89, 5.14) | 5.01 (4.89, 5.13) |
|  | 4 | 3.88 (3.14, 5.13) | 4.06 (3.20, 5.80) | 4.20 (3.21, 5.47) |
|  | −4 | −3.58 (−4.80, −2.87) | −3.66 (−5.50, −2.86) | −3.79 (−4.96, −2.87) |
|  | 8 | 7.31 (6.02, 9.15) | 7.43 (6.07, 9.70) | 7.18 (5.74, 9.15) |
|  | 0.1 | 0.07 (0.02, 0.25) | 0.06 (0.02, 0.25) | 0.07 (0.02, 0.24) |
|  | 0.1 | 0.10 (0.02, 0.87) | 0.08 (0.02, 0.70) | 0.09 (0.03, 1.33) |
| φ_1 | 6 | 6.98 (4.05, 15.33) | 7.09 (3.21, 14.61) | 6.95 (3.79, 12.19) |
| φ_2 | 6 | 4.14 (3.20, 8.07) | 4.80 (3.14, 9.87) | 5.47 (3.41, 12.07) |
| ν_1 | 0.25 | 0.25 (0.19, 0.38) | 0.27 (0.20, 0.37) | 0.27 (0.21, 0.37) |
| ν_2 | 0.25 | 0.22 (0.16, 0.42) | 0.22 (0.15, 0.37) | 0.23 (0.16, 0.68) |
| DIC | – | 845.47 | 747.82 | 934.73 |
| GPD | – | 30,666.13 | 30,782.05 | 36,182.24 |
| RMSPE | – | 1.68 | 1.67 | 1.67 |
| 95% CI coverage | – | 94.7 | 94.7 | 94.1 |
| Mean 95% CI width | – | 6.31 | 6.24 | 6.14 |
| Time (in minutes) | – | 18.82 | 75.62 | 369.10 |

Parameter Estimates (50% [2.5%, 97.5%] Percentiles) for the Fixed- and Variable-m NNGP Models.

| Parameter | True | NNGP | NNGP | NNGP Variable-m |
|---|---|---|---|---|
| β_1 | 1 | 0.98 (0.95, 1.02) | 0.99 (0.95, 1.02) | 0.99 (0.95, 1.02) |
| β_2 | 5 | 4.98 (4.95, 5.02) | 4.98 (4.95, 5.02) | 4.98 (4.95, 5.02) |
| σ² | 1 | 1.64 (0.78, 6.39) | 1.36 (0.65, 5.72) | 1.17 (0.71, 2.17) |
| τ² | 0.5 | 0.47 (0.43, 0.50) | 0.47 (0.44, 0.51) | 0.47 (0.44, 0.51) |
| φ | 1 | 1.01 (0.26, 2.55) | 1.13 (0.26, 2.87) | 1.36 (0.66, 2.21) |