Nitrate
(NO_{3}^{–}) is a widespread contaminant of groundwater and surface water across
the United States that has deleterious effects to human and ecological
health. This study develops a model for predicting point-level groundwater
NO_{3}^{–} at
a state scale for monitoring wells and private wells of North Carolina.
A land use regression (LUR) model selection procedure is developed
for determining nonlinear model explanatory variables when they are
known to be correlated. Bayesian Maximum Entropy (BME) is used to
integrate the LUR model to create a LUR-BME model of spatial/temporal
varying groundwater NO_{3}^{–} concentrations. LUR-BME results in a leave-one-out
cross-validation ^{2} of 0.74 and 0.33 for
monitoring and private wells, effectively predicting within spatial
covariance ranges. Results show significant differences in the spatial
distribution of groundwater NO_{3}^{–} contamination in monitoring versus
private wells; high NO_{3}^{–} concentrations in the southeastern plains of North
Carolina; and wastewater treatment residuals and swine confined animal
feeding operations as local sources of NO_{3}^{–} in monitoring wells. Results
are of interest to agencies that regulate drinking water sources or
monitor health outcomes from ingestion of drinking water. Lastly,
LUR-BME model estimates can be integrated into surface water models
for more accurate management of nonpoint sources of nitrogen.

Nitrate (NO_{3}^{–}) is a widespread contaminant
of groundwater and surface water across
the United States that has deleterious effects to human and ecological
health.^{1,2} The maximum contaminant level of 10 mg/L
established by the U.S. Environmental Protection Agency was based
on the prevention of methemoglobinemia in infants;^{3} moreover, there is concern of many cancer types^{4−6} and from lower concentration exposures.^{7} Excessive NO_{3}^{–} inputs into the environment can result in adverse changes to ecosystems
such as eutrophication and harmful algal blooms.^{8−10}

Protection
of drinking water sources is mandated by the Safe Drinking
Water Act; however, private well drinking water is unregulated in
contrast to regulated public water systems.^{11} In North Carolina where more than 1/4 of the population relies on
private wells for drinking water,^{12} quantifying
potential exposures is important to protect public health. Monitoring
programs such as the U.S. Geological Survey’s (USGS) National
Water Quality Assessment (NAWQA) Program^{13} and the NC Division of Water Resources (NC DWR) ambient monitoring
program^{14} are effective because they use
consistent sampling and analytical methods, yet this water quality
monitoring data is spatially and temporally sparse.

Land use
regression^{15−21}(LUR) is a proven method that complements monitoring programs and
provides effective means for water quality exposure assessments. Previous
studies have related land use characteristics to NO_{3}^{–} contamination in surface
waters^{22−25} and groundwater. Additionally, regression-based methods have been
implemented for estimating loading to surface waters.^{21,23,24} In North Carolina, groundwater
discharge to streams (baseflow) accounts for roughly two-thirds of
annual streamflow in the Coastal Plains region of North Carolina^{26} and may be contributing excess nutrient loads
in streams;^{27} however, current surface
water models do not directly account for this large source of NO_{3}^{–} from baseflow.

For linear regression models, traditional statistical methods to
select predictor variables include forward, backward, and stepwise
selection. These methods can lead to erroneous models with high multicollinearity
when the candidate variables are related. However, for LUR model studies,
model selection methods have been modified to accommodate the potential
high multicollinearity from selection variables that differ only by
a hyperparameter.^{16,19} Additionally, lasso^{28} and elastic net^{29} regression are potential methods for selecting linear LUR models,
but to the authors’ knowledge has not been employed for LUR
model selection. For nonlinear regression, methods for model selection
based on a large candidate variable space include stepwise logistic
regression^{30,31} and regression tree analysis
which approximates nonlinear relationships;^{32,33} still for continuous variable outcomes with nonlinear models, less
rigorous methods for model selection have been developed. The number
of candidate variables is generally consolidated to a tractable number
through expert knowledge or single variable regression, and then various
combinations of models are tested until one finds the best model in
terms of a validation statistic like ^{2} or Akaike Information Criterion (AIC).^{15,21,24}

The advanced geostatistical method
of Bayesian Maximum Entropy
(BME) has also been shown to successfully estimate groundwater quality
contaminants.^{19,34} An advantage of BME is its ability
to quantify spatial and temporal variability which is then used in
the estimation process at unmonitored locations. BME, like all geostatistical
methods, is data driven and can only provide reliable estimates within
the vicinity of measured values. However, BME utilizes Bayesian epistemic
knowledge blending to combine multiple sources of data, which has
been successfully demonstrated with incorporation of deterministic
mean trend functions into BME for groundwater.^{19}

Local spatial and temporal variability have lead
previous studies
to reduce NO_{3}^{–} variability with a combination of spatial smoothing and temporal
averaging.^{15,35,36} For instance, Nolan and Hitt spatially smoothed NO_{3}^{–} by taking watershed averages
over their study time period, based on watersheds with an average
size of approximately 2000 square-kilometers. They not only helped
elucidate trends and potential explanatory variables, but they were
able to explain a large percentage in the variability of spatially
smoothed NO_{3}^{–} with a ^{2} of 0.80 for shallow aquifer
NO_{3}^{–} and
0.77 for deep aquifer NO_{3}^{–}. However, this advantage of reducing groundwater NO_{3}^{–} variance
is also a limitation because factors affecting spatially smoothed
and temporally averaged NO_{3}^{–} might not affect point-level NO_{3}^{–}, and vice
versa. Furthermore, since groundwater NO_{3}^{–} contains significant local variability,
the need to provide local estimates of its variability naturally follows.
Models developed for predicting spatially smoothed and temporally
averaged NO_{3}^{–} will likely not be successful in predicting observed, point-level
NO_{3}^{–}.

The objectives of this study are to (1) develop a novel nonlinear
regression model for spatial point-level and time-averaged groundwater
NO_{3}^{–} concentrations
in monitoring and private wells of North Carolina, (2) produce the
first space/time estimates of groundwater NO_{3}^{–} concentrations across a large
study domain by integrating LUR models into the BME framework, and
(3) compare space/time NO_{3}^{–} concentration models to the current
standard of spatially averaged NO_{3}^{–} concentration models. Two nonlinear
models, whose form is adopted from Nolan and Hitt^{15} with components that represent NO_{3}^{–} sources, attenuation, and transport,
are created and selected with a new model selection framework for
nonlinear regression models with correlated explanatory variables.
We then integrate the LUR models into the BME framework to model space/time
point-level NO_{3}^{–}. Results are of interest to agencies that regulate drinking water
sources or that monitor health outcomes from ingestion of drinking
water. Additionally, the results can provide guidance on factors affecting
the point-level variability of groundwater NO_{3}^{–} and new resources for
more accurate management of NO_{3}^{–} loads.

NO_{3}^{–} data across North Carolina are obtained
from three data sources (

North
Carolina Division of Water Resources (NC-DWR) collects data near select
permitted, dedicated wastewater treatment residual (WTR) application
fields via monitoring wells. The second source is U.S. Geological
Survey (USGS) data obtained through the National Water Information
System (NWIS). Well depth information is not linked directly to each
monitoring well although a subset of well depth information is available.
Based on the subset with depth information, they have a mean depth
of 33 feet with a standard deviation of 32 feet. Together, the NCDWR
and USGS data represent shallow aquifer monitoring wells (

The last data set of
groundwater NO_{3}^{–} comes from private well data
collected by the North Carolina Department of Health and Human Services
(NC-DHHS). Groundwater NO_{3}^{–} was obtained and address geocoded using
the same process outlined in Messier et al.^{19} Well depth information is not linked to water quality measurements,
but a separate database on private well construction contains well
depths. The mean depth is 95 feet with a standard deviation of 109
ft. This data will hereafter be referred to as “private well
data” and this data is assumed to represent a deeper aquifer
model of groundwater NO_{3}^{–} (

The median
NO_{3}^{–} concentrations
for the NC-DWR, USGS, and private well data are 1.30,
0.10, and 0.62 mg/L respectively. The means are 4.61, 6.14, and 1.66
mg/L respectively. The percent observed above the detection limit
is 79.7, 61.4, and 30.6 respectively. Additional basic statistics
for the data set are available in the

In this work
we develop models for NO_{3}^{–} at three observation scales. The finer scale corresponds
to the space/time point-level NO_{3}^{–} data, that is, NO_{3}^{–} data as it is sampled.
An intermediate observation scale corresponds to the _{3}^{–} at each well is averaged. The time-averaged data provides
point-level spatial resolution, but no time variability. Finally,
the coarser resolution observation scale corresponds to the ^{15,37} While previous works over large study domains have developed models
for spatially smoothed/time average NO_{3}^{–} data, very few models, if any,
have been developed for point-level NO_{3}^{–} data over large study domains.
Our work therefore fills that knowledge gap.

Our notation for variables denotes
a single random variable _{1},...,_{n}]^{T} and _{1},...,_{n}]^{T}.

Due to the high percentage
of nondetect (left-censored) data in both the monitoring well and
private well databases, a maximum likelihood estimation (MLE) is used
for the estimation of monitoring well and private well distribution
parameters,^{38} which is assumed to follow
a log-normal distribution. MLE can directly account for the nondetect
values by modifying the likelihood equation, with the censored observations
given by the cumulative distribution function (CDF) evaluated at the
detection limit. The MLE equation then becomes^{38}_{μ,σ}(_{i}) denotes the normal
probability distribution function (PDF) of log-transformed (natural
log) point-level NO_{3}^{–}, _{i},
with mean and standard deviation parameters μ and σ, and _{μ,σ}(_{i}) denotes the CDF of the distribution
taken at the log of the detection limit _{i}. The estimated distributions are used to quantify
the extent of contamination in monitoring and private wells and to
handle nondetect data. For the regression analysis, the log-NO_{3}^{–} concentration
of a measurement below detection limit _{i} is assigned a value equal to the mean of the normal
distribution N(μ,σ) truncated above log(_{i}), whereas the geostatistical analysis
can handle the full truncated normal distribution.^{19}

Spatial
explanatory variables
representing possible groundwater NO_{3}^{–} sources, attenuation, and transport
factors were constructed prior to model development. Potential variables
are summarized below with details available in the

All of the explanatory variables have
an inherent spatial distance parameter such as circular buffer radius
or exponential decay range, which hereinafter is referred to as the ^{15} are NO_{3}^{–} sources
calculated as kg-NO_{3}^{–}/yr/ha within a circular buffer: Sources include farm
fertilizer, nonfarm fertilizer, manure, and NO_{3}^{–} atmospheric deposition.
Each National Landcover Database (NLCD) category is calculated as
a percent within a circular buffer. On-site wastewater treatment plant
variables, septic density and average nitrate loading, are created
following the methods of Pradhan et al.^{39} The following point sources are calculated as the sum of exponentially
decaying contribution:^{19} Wastewater treatment
residual field application sites (WTR), swine confined animal feeding
operations (CAFOs), poultry CAFOs, cattle farms, and wastewater treatment
plants (WWTP). Mean slope in degrees and topographic wetness index^{40} (TWI) are calculated within circular buffers.
Water withdrawals in cubic meters per second are calculated using
USGS water use estimates.^{12} Lastly, population
density is calculated within circular buffers from U.S. Census block
data assuming an even distribution of population per census block.

In order to develop
a LUR model for NO_{3}^{–} we adopt a similar nonlinear multivariable model implemented
by ^{15} which is also similar to the surface water counterpart ^{21,23,24} We partition explanatory variables into source, attenuation, and
transport terms. Following Nolan and Hitt,^{15} the nonlinear multivariable model is constructed as follows:_{i} is the log-transform of NO_{3}^{–} concentration at point _{0} is the intercept, _{i}^{(k)}(λ_{k}) is the _{k},
β_{k} is its source regression coefficient, _{i}^{(l)}(λ_{l}) is the _{l}, γ_{l}is
its attenuation regression coefficient, _{i}^{(m)}(λ_{m}) is the _{m}, δ_{m} is its transport regression coefficient, and ε_{i} is an error term. The model contains an additive,
linear submodel for sources, and multiplicative exponential terms
for the attenuation and transport variables that act directly on the
source terms.^{15} For example _{i}^{(k)}(λ_{k})
may be equal to a land cover variable or a point source variable.
The attenuation variables,_{i}^{(l)},
physically represent areas that are associated with removing NO_{3}^{–} from groundwater
such as wetlands and histosol soil. The transport variables, _{i}^{(m)}(λ_{m})., may be equal to any variable that effects the movement
of NO_{3}^{–} in the groundwater such as the soil permeability and average slope.
The attenuation variable coefficients, γ_{l}, are constrained to be negative allowing them to only decrease
NO_{3}^{–} concentrations,
while the transport variable coefficients, δ_{m}, are unconstrained allowing variables to increase or decrease
NO_{3}^{–} concentrations.

We developed a nonlinear model regression model selection technique
that accommodates variables that differ only by a hyperparameter and
can be adapted for various nonlinear model forms. Our model selection
procedure is essentially a nonlinear extension of ^{16} since
to the authors’ knowledge there is not a regression selection
strategy for nonlinear LUR. We implement

To improve estimation
accuracy, we integrate the time-averaged LUR
results into the bayesian maximum entropy (BME) method of modern spatiotemporal
geostatistics.^{41,42} BME is a space/time geostatistical
estimation framework grounded in epistemic principles that reduces
to the space/time simple, ordinary, and universal Kriging methods
as its linear limiting case when considering a limited, Gaussian,
knowledge base, while also allowing the flexibility to process a wide
variety of additional knowledge bases (physical laws, empirical relationships,
non-Gaussian distributions, hard and soft data, etc.). We only provide
the fundamental BME equations for mapping NO_{3}^{–}; the reader is referred to other
works for more detailed derivations of BME equations^{41,43} and LUR integration into BME.^{19}

Let _{3}^{–} across
space and time, where _{3}^{–} across
the study domain, and the site-specific knowledge base (S-KB) corresponding
to the hard and soft data _{d} available at a set of specific space/time points _{d}

First, we define
the transformation of log-NO_{3}^{–} data _{d} at locations _{d} as_{Z}(_{h}_{d}, that is, such that _{d} is a realization of _{z}(_{3}^{–}. In this study, we consider two choices for _{z}(_{z}_{h}

The
G-KB for the S/TRF _{x}(_{X}(_{x}(_{x}(_{h}_{h}_{z}_{h}_{h}_{s}_{s}_{3}^{–} is observed
below the detecti limit. Following Messier et al.,^{19} the BME soft data for log-NO_{3}^{–} is modeled as a Gaussian distribution
truncated above the log of the detection limit.

The overall
knowledge bases considered consist of _{x}(_{X}(_{s}(·), _{h}_{K}(_{k}) is the BME
posterior PDF for the offset-removed log NO_{3}^{–}(_{k}) at some unmonitored estimation point _{k}, _{G}(_{h}_{s}_{k}) is the (maximum entropy) multivariate Gaussian PDF for (_{h}_{s}_{k}) with mean and variance-covariance
given by G-KB, _{S}(_{s}_{s}^{–1}is a normalization
constant. After the BME analysis is conducted, _{Z}(_{3}^{–} concentrations.

The robustness of CFN-RHO is tested with a 10-fold cross-validation procedure. In 10-fold cross-validation data is randomly partitioned into 10 equal size subsamples. A single subsample is retained as the validation data for testing the model, and the remaining nine subsamples are used as training data. Each of the 10 subsamples is used exactly once as the validation data. Similar variable selections (which may differ only by hyperparameter) for subsamples demonstrate model selection robustness.

Models
are compared with a leave one-out cross-validation (LOOCV) mean squared
error (MSE) and _{3}^{–} and
time-averaged NO_{3}^{–} models are also tested on how well they predict at the smaller observation
scales. In LOOCV, each log-NO_{3}^{–} value _{j} is removed one at a time, and re-estimated using
the given model based only on the remaining data. Let ^{*(k)}be the re- estimate for method ^{(k)} = (1/_{j = 1}^{n}(_{j}^{*(k)} −_{j})^{2} and the cross-validation R-Squared is ^{2}(^{*(k)}).

The MLE of the statewide monitoring
concentrations resulted in a geometric mean and standard deviation
of the log-normal distribution of 0.62 and 14 mg/L, respectively (

The 25 km
spatially smoothed/time-averaged NO_{3}^{–} LUR model cross-validation results
(Table ^{2} of 0.69 and 0.68 for monitoring and private wells, respectively,
which is of similar magnitude to current literature.^{15} However, as expected, the LUR model calibrated for spatially
smoothed/time-averaged NO_{3}^{–} underperforms and does progressively
worse (top row, moving left to right in Table _{3}^{–} and point-level NO_{3}^{–} with lower ^{2} and higher RMSE. The variables selected for this model
via CFN-RHO are available in the

predicted
value | |||||||
---|---|---|---|---|---|---|---|

spatially smoothed/time-averaged NO_{3}^{–} | time-averaged NO_{3}^{–} | point-level NO_{3}^{–} | |||||

method | MW ( | PW ( | MW ( | PW ( | MW ( | PW ( | |

spatially smoothed/time-averaged LUR | ^{2} | 0.69 | 0.68 | 0.27 | 0.08 | 0.15 | 0.08 |

RMSE | 0.895 | 0.293 | 2.23 | 1.19 | 2.40 | 1.27 | |

time-averaged LUR | ^{2} | 0.37 | 0.09 | 0.23 | 0.09 | ||

RMSE | 2.08 | 1.19 | 2.28 | 1.27 | |||

space/time BME | ^{2} | 0.70 | 0.25 | ||||

RMSE | 1.39 | 1.23 | |||||

space/time LUR-BME | ^{2} | ||||||

RMSE |

Note
that methods were used to
predict at scales more refined or equal to its calibration scale.
MW = Monitoring Well model. PW= Private Well model. _{3}^{–} concentration = mg/L.

10-fold cross-validation of spatially smoothed/time-averaged NO_{3}^{–} LUR models
was done to demonstrate the stability of CFN-RHO (

The LUR variables selected through
CFN-RHO for time-averaged NO_{3}^{–} observed at monitoring wells and private
wells are shown in Table _{3}^{–} obtains a ^{2} of 0.37 and 0.09 for monitoring wells and private wells, respectively
(Table _{3}^{–} with a ^{2} of 0.23 and
0.09 for monitoring and private well, respectively. LUR maps are available
in

monitoring well | private well | |||||
---|---|---|---|---|---|---|

variable | variable range | coefficient estimate | standard error | variable range | coefficient estimate | standard error |

Constant | n/a | –3.71 | 0.191 | n/a | –1.570 | 0.0382 |

Source Variables | ||||||

manure^{a} | 250 m | 0.0759 | 0.0317 | – | – | – |

wastewater treatment residuals
(WTR)^{b} | 5 km | 0.245 | 0.0274 | – | – | – |

farm fertilizer^{a} | 250 m | 0.132 | 0.0193 | 250 m | 0.0432 | 0.0025 |

swine CAFO’s^{c} | 2 km | 0.117 | 0.0218 | – | – | – |

swine lagoons^{b} | – | – | – | 6 km | 0.1079 | 0.0146 |

developed low^{d} | 250 m | 0.112 | 0.0214 | – | – | – |

developed (all combined)^{d} | – | – | – | 100 m | 0.0112 | 7.08e-4 |

atmospheric deposition^{a} | 250 m | 0.477 | 0.129 | 25 km | 2.94e-11 | 2.53e-10 |

Attenuation and Transport Variables | ||||||

forest (all combined)^{d} | 2 km | –0.0064 | 0.00281 | – | – | – |

deciduous forest^{d} | – | – | – | 4 km | –0.0151 | 0.00127 |

herbaceous wetlands^{d} | 5 km | –0.531 | 0.079 | – | – | – |

histosol^{d} | 25 km | –0.0427 | 0.0111 | 25 km | –0.106 | 0.0126 |

hydrologic soil group d^{d} | – | – | – | 500 m | –0.012 | 0.0010 |

slope^{e} | 25 km | –0.074 | 0.0261 | – | – | – |

All variables
are significant
with _{3}^{–}/yr/ha;
b, dimensionless; c- 100 pigs; d, percent; e, degrees; (−)
not a variable in the model.

10-fold cross-validation of time-averaged NO_{3}^{–} LUR models was conducted
(

We modeled the
space/time covariance
of the LUR offset removed log-NO_{3}^{–} S/TRF, ^{19}_{1} = 0.67
(log – mg/L)^{2}, _{r1} = 93m, _{τ1} = 15 days, _{2} = 3.6 (log-mg/L)^{2}, _{r2} = 1750m, _{τ2} = 15840
days for monitoring wells (_{1} = 0.76 (log – mg/L)^{2}, _{r1} = 1181m,_{τ1} = 8640
days (

The LUR-BME
model, which integrates the time-averaged LUR as the offset best predicts
space/time point-level NO_{3}^{–} concentrations with a ^{2} of 0.74 and 0.33 (Table _{3}^{–} concentrations
estimated by LUR-BME for 1 day during the study period for both monitoring
and private well models. These are the first results to show that
there is a 4-fold improvement in predicting point-level NO_{3}^{–} when the
LUR-BME method is used in comparison to previous studies that use
models for spatially smoothed/time-averaged NO_{3}^{–}, and five percent improvement
in ^{2}when integrating a LUR model into
the BME framework, over purely BME. A link to a movie of LUR-BME maps
is available in the

Comparison of LUR-BME
results between the monitoring well (left
of gray bar) model and private well (right of gray bar) model NO_{3}^{–} concentrations.
The extent rectangles shows zoomed in portions of the state and are
identical areas for both models. Extent (B) shows geometric mean predictions
and then geometric standard deviation.

This study
presents a LUR
model for point-level NO_{3}^{–} in North Carolina that elucidates processes affecting
its local variability, and then utilizes the strengths of BME to create
the first LUR-BME model of groundwater nitrate’s spatial/temporal
distribution including prediction uncertainty. The first major finding
is the LUR-BME model for monitoring wells, assumed to represent surficial
aquifers, (Figure _{3}^{–} that is highly variable with many areas
predicted above the current standard of 10 mg/L.

Contrarily,
the private well results (Figure _{3}^{–} concentrations, which is consistent with the current
physical understanding in which sources tend to pollute the surficial
aquifer, but then transport over time to the deeper drinking-water
supply aquifers where concentrations are lower. This finding is significant
because of the studies demonstrating potential significant health
effects at concentrations as low as 2.5 mg/L.^{4−7} Additionally, concentrations of
NO_{3}^{–} could
impact ecological function since there are potential large reserves
in deeper aquifers that can discharge to surface waters.^{27} The standard deviation maps (Figure

The second major finding is the LUR-BME maps
(Figure _{3}^{–} in monitoring
wells is elevated in
the southeastern plains of North Carolina (_{3}^{–} sources and the lack of subsurface
attenuation factors (^{15} which also show spatially
smoothed/time-averaged NO_{3}^{–} to be the highest in the southeastern
plains of North Carolina. This expands that finding with point-level
results showing significant point-level variability within regional
trends. Additional concerns arise since groundwater flow of the southeastern
plains contributes significantly to surface water flow.^{27} Our LUR-BME model can be used with surface water
models to quantify the effect of groundwater NO_{3}^{–} contributing to surface
water contamination.

The use of the methods in this study provide
estimates at a finer
resolution and down to smaller NO_{3}^{–} values than Nolan and Hitt,^{15} resulting in new findings. Nolan and Hitt^{15} generally show greater concentrations than the
LUR-BME model potentially due to their model using significantly less
training data and averaging NO_{3}^{–} over watersheds. Our LUR-BME models
benefit from the large amount of monitoring (

LUR-BME benefits from
the exactitude property of BME, thus our
model results are in 100% agreement at monitoring locations. Contrarily,
when our observed data is compared with Nolan and Hitt^{15} by grouping results according to the bins of
Figure ^{15} overpredicts 48% and 59% of the time for monitoring and
private wells, respectively (_{3}^{–}, our results lead to a significant
new finding about the extent of areas with low level contamination.
Our results show private well concentrations are greater than 0.25
mg/L while monitoring well concentrations are less than 0.25 mg/L
in 30.6% of North Carolina’s area, compared to 2.6% for Nolan
and Hitt^{15} (^{15} Hence whereas Nolan and Hitt^{15} results suggest the geographical extent of the low level
contamination of drinking water aquifer is limited to that of the
shallow aquifer, which is consistent with downward transport of NO_{3}^{–} contamination,
our LUR-BME models shows that in fact the geographical extent of the
contamination of the drinking water extends over a much larger area
than that of the shallow aquifer. This major new finding provides
new evidence indicating that in addition to downward transport, there
is also a significant outward transport of groundwater NO_{3}^{–} in the
drinking water aquifer to areas outside the range of sources. This
is especially significant because it indicates that the deeper aquifers
are acting as a reservoir that is not only deeper, but also wider
than the reservoir formed by the shallow aquifers.

Variables selected through
CFN-RHO show processes influencing monitoring well and private well
NO_{3}^{–} concentrations.
Interpretations of regression sources parameters are based on the
nonlinear model formulation: Since NO_{3}^{–} was log-transformed and the nonlinear
model has multiplicative interaction, the percent increase of the
geometric mean of NO_{3}^{–} is the exponential of the source coefficient multiplied
by the result of the attenuation and transport terms held to their
mean value. For instance, in the monitoring well model, the percent
increase in the geometric mean of NO_{3}^{–} in mg/L for every 1 kg/yr/ha of farm
fertilizer is exp(0.132 × 0.456) = 1.06 = 5% where 0.456 is the
exponential of the mean attenuation and transport variables multiplied
by their coefficients. For the private well model, the percent increase
in the geometric mean of NO_{3}^{–} for every 1 kg/yr/ha of farm fertilizer
is exp(0.0432 × 0.4636) = 1.02 = 2%. Every other source coefficient
interpretation for time-averaged NO_{3}^{–} is provided in the

Comparing variables selected between the spatially
smoothed/time-averaged NO_{3}^{–} LUR and the time-averaged NO_{3}^{–} LUR help
elucidate effects the spatial scale has on groundwater NO_{3}^{–} concentrations.
The variable hyperparameters selected by CFN-RHO help elucidate potential
scales at which the variables affect groundwater NO_{3}^{–} concentrations. For example,
the short buffer range of developed low likely captures the small
size of single-family housing yards and their associated fertilizer
applications. The monitoring well model WTR has an exponential decay
range of 5 km. A possible explanation of this medium range is due
to the volatization of NO_{3}^{–} into the air, which can then be transported
over longer distances than subsurface transport mechanisms alone.
Long buffer ranges for attenuation and transport variables such as
percent histosol soil and mean slope represent variables with larger,
regional scale effects.

The third major finding is that both
wastewater treatment residuals
(WTR) and swine CAFOs were selected as local sources of groundwater
NO_{3}^{–} contamination,
which to our knowledge have not yet been previously identified as
sources in multivariable models that included regional sources. To
help aide state-wide policy decisions concerning regional versus local
sources, Figure _{3}^{–} within an area in response to the percent
decrease in a LUR model source given all other sources remain at current
levels. Farm fertilizer and atmospheric deposition result in the greatest
decrease in groundwater NO_{3}^{–} state-wide (Figure _{3}^{–} in the local area surrounding
the sources, demonstrating the importance of sources on local area
NO_{3}^{–} variability.

Elasticity
curves for monitoring well sources.

This work represents
the first step in the development of modeling observed NO_{3}^{–} over large
domains without averaging. In previous studies, spatial averaging
is utilized because it provides results at the domain (state, regional,
or national) desired for policy making decisions and sheds light on
processes influencing groundwater NO_{3}^{–}. We demonstrated that a LUR at the
point-level in space is currently limited in terms of model predictive
capability but when integrated into the BME framework, the improved
model can estimate within the spatial covariance range similar to
LUR models for spatially smoothed/time-averaged groundwater NO_{3}^{–} concentrations.
Potential explanatory variables that can explain the remaining variability
in the point-level LUR will need primary data collection. For instance,
we found WTR to be a significant variable even though we just used
location of fields. If records of timing and amounts of WTR applications
were improved, then the temporal variability in monitoring wells near
WTR application fields could be improved.^{44} Similarly, a parcel-level query of farm fertilizer application practices
could distinguish farms that use NO_{3}^{–} fertilizers efficiently versus farms
that apply excessively or with poor timing. For private wells, the
short spatial autocorrelation range may be due to differences in effectiveness
of on-site wastewater treatment systems or residential fertilizer
use. Additionally, we note that candidate variables not selected via
CFN-RHO does not necessarily indicate they have no effect on groundwater
NO_{3}^{–} concentrations
in surficial or confined drinking-water aquifers of North Carolina.
Many factors both statistically and physically can affect the selection
such as correlation between candidate variables and local hydrogeology
conditions being overwhelmed by larger scale trends. This study lacked
well depth for the majority of monitoring and private wells. The monitoring
and private well models clearly demonstrate a difference in concentrations
based on depth, so well depth could quantify this more explicitly
as opposed to categorically as done by this study. Furthermore, pumping
rate information was not available for the private well data set thus
the effect of local pumping could not be quantified. The USGS water
use report^{12} has information on domestic-use
water withdrawals; however, it is at the county-scale, based on county
populations, and cannot be down-scaled like the agricultural water
withdrawals variable, thus it was not included as a candidate variable.
Additionally, the detection limit of 1 mg/L for the private well data
is high and lowering that detection limit would improve the ability
of the model to delineate areas with low level contamination that
may act as reservoir to surface water NO_{3}^{–} recharge. The high detection
limit is also potentially responsible for the lower ^{2}in the private well LUR model for time-averaged nitrate
because it results in a low dependent variable variance. Predictions
of the private well LUR model for time-averaged nitrate are likely
biased toward the detection limit; however, the LUR-BME model for
private well models likely avoids this bias due to the exactitude
property along with the good spatial coverage of private well data
across North Carolina. Moreover, greater uncertainty in attenuation
processes in deeper aquifers is likely contributing to the lower ^{2}.

In conclusion, a LUR model with a novel
model selection procedure can elucidate important predictors of point-level
groundwater NO_{3}^{–} in North Carolina monitoring and private wells. The methods are
translatable to other study areas in the United States. LUR-BME models
can be used to predict spatial/temporal varying groundwater NO_{3}^{–} and provide
uncertainty assessments. Further research should integrate groundwater
NO_{3}^{–} results
into surface water models to determine the extent of groundwater’s
contribution to surface water contamination. Lastly, results will
be useful in identifying localities of elevated NO_{3}^{–} for increased monitoring.

Additional information as noted
in text. This material available free of charge via the Internet at

es502725f_si_001.pdf

The authors declare no competing financial interest.

This research was supported in part by funds from the NIH T32ES007018, NIOSH 2T42OH008673, and North Carolina Water Resources Research Institute (WRRI) project number 11-05-W.