We consider shape-restricted nonparametric regression on a closed set

This paper considers Bayesian modeling of an unknown function
_{0} has at most

The shape-constrained regression literature focuses primarily on functions that are monotone, convex, or have a single minimum; that is, cases with

Extending these approaches to broader shape constraints is not straightforward. For example, to obtain

Little work has been done on nonparametric Bayesian testing of curve shapes.

We propose a new approach to incorporating shape constraints based on splines that are carefully constructed to induce curves having a particular number of extrema. This is similar in spirit to the I-spline construction of

Another key aspect of our approach is that we place a prior over a countable dense set of knots, which allows the number of splines in the model space to grow. This bypasses the sensitivity to the choice of the number of knots, while facilitating computation and theory on consistency. In particular, we propose a prior over nested model spaces where the location of the knots is known for each model. This allows for a straightforward reversible jump Markov chain Monte Carlo algorithm (

Let ℱ^{H}_{0} ∈ ℱ^{H}_{0} is continuously differentiable and has _{k}_{(}_{j,k}_{)}(_{1} ≤…≤ _{K}_{0}‖_{∞} ≤ Δ ‖_{0}‖_{∞}, where Δ is the maximum difference between adjacent knots. Though this construction can be used to model _{0} with arbitrary accuracy, it does not ensure that the approximating function ^{H}.

We force ^{H}_{(}_{j,k}_{)}(_{1}, …, _{h}_{k}_{1},…, _{H}^{H}

If
_{k}^{H}.

This result follows from the constraint on the _{k}_{k}_{1}, …, _{H}_{k}_{k+}_{1} = 0 and _{h}_{k+j}_{k+j}_{+1}], _{h}

For any _{0} ∈ ℱ^{H} and ε^{LX}

The flexibility of local extremum splines is attributable to the B-splines used in their construction. The proof of Theorem 1 assumes that ^{H}

Though the polynomial weighting does not affect the ability of the local extremum spline to model arbitrary functions in ℱ^{H}

Bayesian methods for automatic knot selection (

To make these ideas explicit, define
^{N}^{+1}, its children are labeled (2^{N}^{+2} and (2^{N}^{+2}. For example, the node labeled 3/8 at

We induce a prior on the set of local extremum spline basis functions through a branching process over this tree. The process starts at the root node ^{N}^{+1}, which decreases the probability of adding a new node the larger the tree becomes. The tree ℳ generated from this process corresponds to a knot set
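The branching process over the knot tree can be illustrated with a minimal sketch. It assumes a dyadic labeling in which the node (2k+1)/2^{N+1} has children (4k+1)/2^{N+2} and (4k+3)/2^{N+2}, which matches the 3/8 example, and it assumes that a child is added with probability `zeta ** (N + 1)`. The name `zeta`, the always-present root 1/2, and the `max_depth` safeguard are hypothetical choices made for this illustration, not details taken from the text.

```python
import random
from fractions import Fraction


def children(node):
    """Two children of the dyadic node (2k+1)/2^(N+1)."""
    step = Fraction(1, 2 * node.denominator)
    return node - step, node + step


def depth(node):
    """Return N for a node written as (2k+1)/2^(N+1); the root 1/2 has N = 0."""
    return node.denominator.bit_length() - 2


def sample_knot_tree(zeta, rng, max_depth=12):
    """Draw a knot tree: each child of a depth-N node is kept with
    probability zeta**(N+1) (assumed form), so deep nodes are rarer."""
    root = Fraction(1, 2)
    tree = {root}
    frontier = [root]
    while frontier:
        node = frontier.pop()
        if depth(node) >= max_depth:  # illustrative cap, not part of the prior
            continue
        for child in children(node):
            if rng.random() < zeta ** (depth(node) + 1):
                tree.add(child)
                frontier.append(child)
    return tree


knots = sorted(sample_knot_tree(0.6, random.Random(0)))
```

Because every kept node is a dyadic rational, exact `Fraction` arithmetic avoids floating-point ambiguity when testing whether a knot is already in the tree.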

Letting
_{k}_{k}_{k}_{k}_{0} ~ ^{−5}, making the prior indistinguishable from the Gamma distribution.

To allow uncertainty in locations of the change points, we choose the prior

The prior for the change point parameters is defined such that

Define ℱ^{H+}_{0} ∈ ℱ^{H+}_{0} ∈ ℱ^{H+}^{H}^{−1}. Conversely, define ℱ^{H}^{−} as the set of continuously differentiable functions with _{0} ∈ ℱ^{H}^{−} having less than ^{H}^{−1}. The prior places positivity in _{0} in ℱ^{H}^{−} or ℱ^{H+}

Letting ^{LX}_{0} ∈ ℱ^{H−}^{1},
_{0} ∈ ℱ^{H}^{+} if _{0} ∈ ℱ^{H−}

Using this result we can show posterior consistency. Assume that _{1},…, y_{n}^{T}_{1},…,_{n}_{i}_{i}_{+1}|) < (_{1}^{−1}, where 0 < _{1} < 1 and _{ε}_{,}_{n}_{0}(_{n}_{0} − 1| < _{ε}_{Q}_{0}) < _{0} − 1| < _{Q}_{1}, _{2}) = inf {_{1}(_{2}(_{0}, one has:

Let ^{LX}_{0} ∈ ℱ^{H−}^{1}. If
^{H−}^{1} that is defined by _{ε}_{,}_{n}_{ε}_{n}_{0} given
_{0} ∈ ℱ^{H}^{+}, otherwise it holds for _{0} ∈ ℱ^{H−}_{0} ∈ ℱ^{H}^{+}, otherwise it holds for _{0} ∈ ℱ^{H−}

The proof of this consistency result follows from ^{2} can be satisfied with an inverse-Gamma distribution.

Our approach allows one to define the shape of the curve through the _{0} the shape of the curve is not uniquely identifiable based upon the configuration of

Let ℍ_{1} and ℍ_{2} denote two distinct and non-nested sets of _{0} ∈ ℍ_{1}) and pr(_{0} ∈ ℍ_{2}), with the corresponding Bayes factor between the two shapes being
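As a rough illustration of how a Bayes factor between two shape classes might be estimated from sampler output, the sketch below divides the posterior odds by the prior odds. The posterior probabilities are approximated by the fraction of draws assigned to each class. The function name, the string labels, and the use of simple sample frequencies are assumptions made for this example; the estimator used in the paper may differ.

```python
def bayes_factor(shape_draws, prior_1, prior_2, label_1="H1", label_2="H2"):
    """Posterior odds of label_1 against label_2, divided by the prior odds.

    shape_draws: one class label per posterior draw (hypothetical encoding).
    prior_1, prior_2: prior probabilities of the two shape classes.
    """
    n1 = sum(s == label_1 for s in shape_draws)
    n2 = sum(s == label_2 for s in shape_draws)
    if n2 == 0:
        # No draws in the second class: the estimate is unbounded.
        return float("inf")
    return (n1 / n2) / (prior_1 / prior_2)


draws = ["H1"] * 60 + ["H2"] * 40
bf_equal_prior = bayes_factor(draws, 0.5, 0.5)
```

With equal prior mass on the two classes the Bayes factor reduces to the ratio of posterior frequencies; unequal prior mass rescales that ratio.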

Any two shapes falling within ℱ^{H}_{0} is in a class of functions with at least _{1} to correspond to functions in ℱ^{H}

Let ℍ_{1} be the class of functions in ℱ^{H}_{0} ∈ ℍ_{1}, then

This result, an application of Theorem 1 in

We rely on _{−ℳ} denote the coefficients on all the splines that are the same as well as ^{2}, _{ℳ} and
_{−ℳ,} we marginalize _{ℳ} and
_{−}_{ℳ}) and _{−}_{ℳ}). This marginalization requires numerical integration of multivariate normal distributions, which is performed using

All proposals are made between models that are nested and differ by only one knot. When the current model has no children we propose a knot insertion with unit probability. Otherwise, the proposal adds or deletes a knot with probability 1/2, and the inserted or deleted knot is chosen uniformly. For a knot insertion, going from model ℳ to ℳ′, the available knots are the failures in the branching process that generated ℳ. For a knot deletion, going from model ℳ′ to ℳ, the candidate knots are the nodes in the branching process that generated ℳ′ that do not have any children. All other parameters, including the spline coefficients, are sampled in Gibbs steps described in the
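The insertion and deletion candidates described in this paragraph can be sketched as follows. The sketch assumes the model is stored as a set of dyadic knots, that "failures" are the absent children of nodes in the tree, and that the root 1/2 is never deleted. These representation choices and all function names are hypothetical.

```python
import random
from fractions import Fraction

ROOT = Fraction(1, 2)


def children(node):
    """Two children of a dyadic node in the knot tree."""
    step = Fraction(1, 2 * node.denominator)
    return node - step, node + step


def insertable(tree):
    """Failures of the branching process: absent children of present nodes."""
    return sorted(c for n in tree for c in children(n) if c not in tree)


def deletable(tree):
    """Nodes with no children in the tree (the root is kept by assumption)."""
    return sorted(n for n in tree
                  if n != ROOT and not any(c in tree for c in children(n)))


def propose(tree, rng):
    """Insert with probability one when nothing can be deleted; otherwise
    insert or delete with probability 1/2, choosing the knot uniformly."""
    removable = deletable(tree)
    if not removable or rng.random() < 0.5:
        return "insert", rng.choice(insertable(tree))
    return "delete", rng.choice(removable)


move, knot = propose({ROOT, Fraction(1, 4)}, random.Random(0))
```

A full reversible jump step would also need the forward and reverse proposal probabilities, which follow from the sizes of the two candidate lists.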

The posterior distribution is often multimodal: widely different parameter values can be well supported by the data, with low posterior density between these isolated modes, so the sampler can become stuck in a single mode. To increase the probability of jumps between modes, a parallel tempering algorithm (

We investigate our approach through simulations for functions having 0, 1, or 2 local extrema interior to

The Markov chain Monte Carlo algorithm was implemented in the R programming language with some subroutines written in C++ and is available from the first author. Depending on the complexity of the function, the algorithm took between 60 and 90 seconds per 50,000 samples using one core of a 3·3 gigahertz Intel i7-5830k processor. Parallelizing the tempering algorithm on multiple cores may substantially reduce the computation time. Additional information on the convergence of the algorithm, as well as the impact of the B-spline order used, is provided in the

We compare the local extremum spline approach to other nonparametric methods, including Bayesian P-splines (_{i}_{j}_{i}_{i}_{i} ~^{2}). Functions _{1}, _{2} and _{3} are monotone, _{4} and _{5} have one change point, and _{6} and _{7} have two change points. For each simulation, a total of 100 equidistant points were sampled in
^{2} = 1, 4. For each simulation condition, 250 data sets were generated, fitted and compared using the mean squared error,

For the local extremum approach, we collected 50,000 Markov chain Monte Carlo samples, with the first 10,000 samples discarded as burn-in. For the parallel tempering algorithm, we specify 12 parallel chains with {_{1}, …, _{12}} = {1/30, 1/24, 1/12, 1/9, 1/5, 1/3·5, 1/2, 1/1·7, 1/1·3, 1/1·2, 1/1·1, 1}, and monitor the target chain with _{12} = 1. The P-spline approach was defined using 30 equally spaced knots, and the prior over the second-order random walk smoothing parameter was an IG(1, 0·0005) distribution, which was one of the recommended choices in
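To make the role of the 12-value ladder concrete, the following sketch shows a standard swap move between adjacent tempered chains. It assumes the ladder values act as exponents on the likelihood, with the prior left untempered, so the log acceptance ratio for swapping chains a and b is (β_a − β_b)(ℓ_b − ℓ_a). This interpretation, and all function and variable names, are assumptions for illustration rather than the paper's implementation.

```python
import math
import random

# Ladder quoted in the text, read here as likelihood exponents (an assumption).
LADDER = [1/30, 1/24, 1/12, 1/9, 1/5, 1/3.5, 1/2, 1/1.7, 1/1.3, 1/1.2, 1/1.1, 1.0]


def swap_log_ratio(beta_a, beta_b, loglik_a, loglik_b):
    """Log Metropolis ratio for exchanging the states of two tempered chains."""
    return (beta_a - beta_b) * (loglik_b - loglik_a)


def attempt_adjacent_swaps(states, logliks, betas, rng):
    """Sweep once over neighbouring chains, swapping states when accepted."""
    states, logliks = list(states), list(logliks)
    for i in range(len(betas) - 1):
        log_r = swap_log_ratio(betas[i], betas[i + 1], logliks[i], logliks[i + 1])
        if log_r >= 0 or rng.random() < math.exp(log_r):
            states[i], states[i + 1] = states[i + 1], states[i]
            logliks[i], logliks[i + 1] = logliks[i + 1], logliks[i]
    return states, logliks


new_states, new_logliks = attempt_adjacent_swaps(
    ["hot", "cold"], [-1.0, -5.0], [0.5, 1.0], random.Random(0))
```

Only the chain with exponent 1 targets the posterior, so inference would use that chain's draws; the flatter chains exist to carry states across low-density regions.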

We perform a simulation experiment investigating the method’s ability to correctly identify the shape of the response function for three sets of hypotheses. In the first case, the null hypothesis is the set of all functions with one or more extrema, and the alternative, ℍ_{1}, is the set of all monotone functions. In the second test, the null consists of all monotone functions, and the alternative, ℍ_{2}, is all functions with one or more extrema. Finally, for the third test the null hypothesis is the set of functions having at most one extremum, and the alternative, ℍ_{3}, is the set of functions with two extrema, first having a local maximum followed by a local minimum. Functions are defined on
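The hypothesis classes above are distinguished by the number of interior extrema. As a simple illustration, the sketch below counts interior extrema of a curve evaluated on a grid by counting sign changes in its first differences. The grid-based check, the tolerance `tol`, and the function name are assumptions for this example and are not the test procedure itself.

```python
import math


def count_interior_extrema(values, tol=1e-12):
    """Count sign changes in successive differences of a gridded curve.

    Differences smaller than tol in absolute value are treated as flat
    and ignored, so plateaus do not create spurious extrema.
    """
    signs = []
    for a, b in zip(values, values[1:]):
        d = b - a
        if abs(d) > tol:
            signs.append(1 if d > 0 else -1)
    return sum(s != t for s, t in zip(signs, signs[1:]))


grid = [i / 100 for i in range(101)]
n_line = count_interior_extrema(grid)                                    # monotone
n_bowl = count_interior_extrema([(x - 0.5) ** 2 for x in grid])          # one minimum
n_wave = count_interior_extrema([math.sin(2 * math.pi * x) for x in grid])  # max then min
```

Under this counting rule a monotone curve has zero sign changes, a U-shaped curve has one, and a curve with a local maximum followed by a local minimum has two.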

For the simulation, data are generated assuming _{i}_{j}_{i}_{i}_{i} ~^{2}) and ^{2} = 1. We consider sample sizes _{1} and ℍ_{2}, the local extremum approach is compared with the Bayesian method of _{n}

The Bayesian tests produce Bayes factors, while the frequentist tests have corresponding test statistics. We compare the methods based upon the area under the receiver operating characteristic curve. For the simulation, the false positive rate was computed from the values of the test statistics for the other functions not in the test set. As a frequentist calibration of our Bayesian test, one can choose a threshold on the Bayes factor to control the type I error rate at a specified level based on an approximation to the distribution of the Bayes factor under the null hypothesis. We describe this approximation in the

_{1}. This shows that the local extremum approach is superior to the other three approaches across all false positive rates. Further, the estimated area under the receiver operating characteristic curve is 0·94, better than the approaches of Salomond at 0·86, Baraud at 0·77, and Wang and Meyer at 0·74. When looking at the impact of sample size on the tests, the power of the local extremum approach increases with the sample size, and does so at a faster rate than its competitors. The approach is similarly superior for hypothesis ℍ_{2}; data not shown.
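The area under the receiver operating characteristic curve used in this comparison can be computed directly from test statistics or Bayes factors. The sketch below uses the rank-based identity that the area equals the probability that a randomly chosen score from the alternative exceeds a randomly chosen score from the null, counting ties as one half. The function name and the toy scores are hypothetical.

```python
def roc_auc(alt_scores, null_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    alt_scores: statistics from data sets where the alternative holds.
    null_scores: statistics from data sets where the null holds.
    """
    wins = 0.0
    for a in alt_scores:
        for b in null_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(alt_scores) * len(null_scores))


perfect = roc_auc([3.0, 4.0], [1.0, 2.0])
chance = roc_auc([1.0, 2.0], [1.0, 2.0])
```

This pairwise form is quadratic in the number of simulated data sets, which is acceptable for a few thousand scores; a sort-based version would be preferable for larger studies.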

For hypothesis ℍ_{3}, there is no equivalent methodology in the literature, but the performance of our approach is excellent. The area under the receiver operating characteristic curve is 0·94. For the Bayes factor cut point of 6, _{7}, even though it differs this function is only slightly different from _{3}. Function _{8} is the same as _{5}; this simulation gives evidence that the departure from monotonicity may be due to the pronounced U shape in the data and not necessarily because there are two extrema, which requires more data to conclude in favor of ℍ_{3}.

In temperate climates, the prevalence of influenza peaks in the winter months and drops in the warmer months. Estimating this seasonal effect, as well as departures from it, may be of interest when estimating the magnitude of an influenza epidemic. Here, we expect a peak in the winter months followed by a trough in the summer months. Parametric models for this pattern may not be adequate to model the observed phenomena, and smoothing approaches do not guarantee this pattern. We use local extremum splines, setting

The authors would like to thank the referees and associate editor for comments on earlier versions of this manuscript. This research was partially supported by a grant from the National Institute of Environmental Health Sciences of the United States National Institutes of Health.

It is well known that

If _{k}_{h}.

Consider _{0} ∈ ℱ^{H}_{0} has exactly

Let ^{BS} be a taut B-spline approximation of _{0} of order

having exactly ^{BS} is defined on
_{k} |τ_{k} − τ_{k+j}_{0}^{BS} are continuous and differentiable, we define _{0}^{BS}‖ < _{0} = ^{BS}(0). For the exactly ^{BS} defined by the taut spline, set

Rewriting the right hand side of (^{BX} is based upon the derivative formula for B-Splines (

Because of the taut spline construction of ^{BS}, we know that for all _{h}_{k}, τ_{k}_{+}_{j−}_{1}] one has sgn(_{k}_{k}, τ_{k}_{+}_{j−}_{1}]. Here sgn(·) is the signum function. On each of these intervals let
_{(}_{j,k}_{)}(_{k}_{h} ∈_{k}, τ_{k}_{+}_{j−}_{1}].

For the at most _{h} ∈_{k}, τ_{k}_{+}_{j−}_{1}], set these coefficients to zero. As there are a finite number of intervals whose error is non-zero and ^{BS} is bounded, the maximum error is at most (^{−1} that also have ‖_{0} − ^{BS}‖_{∞} <

The function
^{−}^{1} as in Theorem 1, we conclude that

We verify the conditions given in A1 and A2 of Theorem 1 of _{0}, ^{2}), one can use ^{H}^{+} and ℱ^{H}^{+} are subsets of all continuous differentiable functions on
_{ε,n}_{ε}.

As in _{0}, _{1}, _{2}, _{3} > 0. Define
_{ℳ} be the number of spline coefficients in model ℳ then

Now let pr*(ℳ) be the probability of a branching process where ^{2} ≥ pr (ℳ) for all ℳ such that
^{LX}_{∞} > M_{n}_{2} exp(−_{3}). One can find a

The receiver operating characteristic curve for the four tests defined for hypothesis ℍ_{1} for all 1,400 simulations. The black line represents the local extremum spline, dashed line the approach of Salomond, dashed-dotted line the approach of Baraud, and dotted line the approach of Wang and Meyer.

Estimate of the expected rate of seasonal influenza and pneumonia deaths using the local extremum spline, black line, compared to the observed rate of influenza and pneumonia deaths estimated using the Centers for Disease Control and Prevention’s standard approach, gray line. Dots represent observed state-level influenza and pneumonia percentages.

Estimated mean squared error for all functions. For each function, the left value represents the simulation condition σ^{2} = 1 and the right value represents the simulation condition σ^{2} = 4. Asterisks signify that the number is significantly different from that of the local extremum spline at the one-sided 0·05 level.

True Function | Local Extremum Splines | Smoothing Splines | Bayesian P-Splines | Gaussian Process |
---|---|---|---|---|
f_{1} | 1·60/0·49 | 2·11*/0·58 | 2·28*/0·55 | 2·15*/0·71* |
f_{2} | 2·59/0·09 | 4·19*/0·13* | 3·82*/0·11* | 5·26*/0·15* |
f_{3} | 1·57/0·49 | 2·43*/0·67* | 2·26*/0·92* | 2·64*/0·79* |
f_{4} | 1·70/0·49 | 2·10*/0·56* | 2·15*/0·49 | 1·90*/0·59* |
f_{5} | 2·55/0·61 | 3·69*/1·12* | 3·39*/0·98* | 3·90*/1·14* |
f_{6} | 2·17/0·69 | 2·57/0·72 | 5·16*/0·72 | 2·44/0·79* |
f_{7} | 2·38/0·66 | 3·39*/1·05* | 3·96*/0·85* | 3·30*/0·90* |

Percent of samples where the model was correctly chosen as having two extrema, which is hypothesis ℍ_{3}, using a cut point of 6.

Function | n = 100 | n = 200 | n = 300 | n = 400 |
---|---|---|---|---|
f_{7} | 78 | 90 | 98 | 96 |
f_{8} | 14 | 32 | 22 | 46 |
f_{9} | 76 | 88 | 98 | 100 |