^{1}Department of Mathematics and Statistics, American University, Washington, DC, USA

^{2}Occupational Science & Technology, University of Wisconsin-Milwaukee, Milwaukee, WI, USA

Academic Editor: Ruisheng Wang

Truncated power basis expansions and penalized spline methods are demonstrated for estimating nonlinear exposure-response relationships in the Cox proportional hazards model. R code is provided for fitting models to get point and interval estimates. The method is illustrated using a simulated data set under a known exposure-response relationship and in a data application examining risk of carpal tunnel syndrome in an occupational cohort.

The Cox proportional hazards (PH) model is frequently used to model survival data or time-to-event data, particularly in the presence of censored survival times.

Consider an occupational cohort of $n$ individuals in which the time to the event of interest, $t_i$, is measured for individual $i$. These times may be right censored if the individual did not have the event of interest during the study time. Censoring is denoted by an indicator variable, $\delta_i$, which takes the value 1 if the individual had the event and 0 if the time is censored. The general form of the Cox PH model for a single covariate is $h(t_i) = h_0(t_i)\exp(\beta x_i)$, where $h(t_i)$ is the hazard function, $x_i$ is the corresponding quantitative exposure variable, and $h_0(t_i)$ is the unspecified baseline hazard function. The natural logarithm of the hazard ratio, ln(HR), is then $\beta x_i$, and the exposure-response relationship is described as linear (on the log-scale). The HR for a given exposure

A nonlinear exposure-response relationship can be modeled by including a transformation of $x_i$ in the model:

This manuscript provides a detailed introduction to modeling and interpreting nonlinear exposure-response curves using these spline functions. We assume familiarity with the Cox PH model and survival data. The remainder of the paper is structured in three sections.

In the Cox PH model with a nonlinear exposure term, the exposure-response function $s(x_i)$ is modeled as a linear combination of known basis functions, $B_j(x_i)$, $j = 1, \ldots, J$. For a linear association, the single basis function is $B_1(x_i) = x_i$. For a quadratic association, the basis functions are $B_1(x_i) = x_i$ and $B_2(x_i) = x_i^2$. This can be extended to a polynomial of degree $p$ using the basis functions $\{x_i, x_i^2, x_i^3, \ldots, x_i^p\}$. Note that we omit the unit basis function, which corresponds to the intercept term in the model, because in the Cox PH model setting the intercept is subsumed by the unspecified baseline hazard function. Estimates in the Cox PH model are relative to the unspecified baseline hazard.

To provide flexibility in capturing local features in the exposure-response curve, polynomial spline terms may also be used as basis functions. A spline function is a function, typically a polynomial, defined on a subinterval of the range of exposures. Splines allow for estimation of the exposure-response relationship using a piecewise-defined curve. They are generally considered to provide more flexibility in estimating nonlinear relationships than polynomials or other algebraic functions. To define a piecewise linear curve over four regions in which the slope changes from region to region, we would use the set of basis functions $\{x_i, (x_i - \kappa_1)_+, (x_i - \kappa_2)_+, (x_i - \kappa_3)_+\}$, where $\{\kappa_1, \kappa_2, \kappa_3\}$ are the exposure values at which the slope changes and are called "knots." These are user-specified values, similar in spirit to categorical cut-points where changes in the response occur. The "+" subscript notation indicates that the function is equal to the expression given in parentheses when that expression is positive and is zero otherwise. That is, $(x - \kappa_1)_+ = x - \kappa_1$ if $x > \kappa_1$ and 0 otherwise. In this way, a nonlinear association can be estimated by fitting the model with $s(x_i) = \beta_1 x_i + \beta_2 (x_i - \kappa_1)_+ + \beta_3 (x_i - \kappa_2)_+ + \beta_4 (x_i - \kappa_3)_+$. The standard maximum partial likelihood method yields estimates of the coefficients, giving an estimated ln(HR) of $\hat{s}(x_i) = \hat{\beta}_1 x_i + \hat{\beta}_2 (x_i - \kappa_1)_+ + \hat{\beta}_3 (x_i - \kappa_2)_+ + \hat{\beta}_4 (x_i - \kappa_3)_+$. More generally, a spline of degree $p$ with $K$ knots uses the basis functions $\{x_i, x_i^2, x_i^3, \ldots, x_i^p, (x_i - \kappa_1)_+^p, (x_i - \kappa_2)_+^p, \ldots, (x_i - \kappa_K)_+^p\}$. This set is called the truncated power basis of degree $p$.
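The construction of a truncated power basis can be sketched in base R. This is a minimal illustration rather than the authors' code: the function name `tp_basis` and its arguments are hypothetical, and the knot values 3.0, 5.5, and 8.3 are the ones quoted for the simulated example.

```r
# Truncated power basis of a given degree (hypothetical helper, base R only).
# Columns: x, x^2, ..., x^degree, then (x - knot)_+^degree for each knot.
tp_basis <- function(x, knots, degree = 1) {
  poly_part  <- outer(x, seq_len(degree), `^`)
  trunc_part <- outer(x, knots, function(x, k) pmax(x - k, 0)^degree)
  B <- cbind(poly_part, trunc_part)
  colnames(B) <- paste0("B", seq_len(ncol(B)))
  B
}

x     <- c(1, 4, 9)          # three example exposures
knots <- c(3.0, 5.5, 8.3)    # knots quoted for the simulated example

B_lin <- tp_basis(x, knots, degree = 1)  # 4 columns: x and three hinge terms
B_cub <- tp_basis(x, knots, degree = 3)  # 6 columns: x, x^2, x^3, three cubic terms
print(B_lin)
print(B_cub)
```

Each hinge column is zero below its knot, so a coefficient on that column only changes the fitted curve for exposures above the knot.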

As an illustration, we simulated a data set of

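We illustrate the spline-based methods for estimating the exposure-response relationship using the simulated data. A linear truncated power basis with three knots uses four basis functions, $B_1(x) = x$, $B_2(x) = (x - \kappa_1)_+$, $B_3(x) = (x - \kappa_2)_+$, and $B_4(x) = (x - \kappa_3)_+$, with the knots placed at the quartiles of the case exposures ($\kappa_1 = 3.0$, $\kappa_2 = 5.5$, and $\kappa_3 = 8.3$). A cubic truncated power basis representation using these same knots requires six basis functions, $B_1(x) = x$, $B_2(x) = x^2$, $B_3(x) = x^3$, $B_4(x) = (x - \kappa_1)_+^3$, $B_5(x) = (x - \kappa_2)_+^3$, and $B_6(x) = (x - \kappa_3)_+^3$.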

Fitting the Cox PH model requires using the basis function transformations of the exposure variable as the covariates in the model (which introduces regression coefficients $\beta_j$),

The R software package used here for fitting Cox PH models and obtaining the estimates is the _{1} = _{2} = (_{+}, _{3} = (_{+}, and _{4} = (_{+}, then a side effect of the

Based on the calculations and code in Appendices

Although the truncated power basis functions are relatively easy to visualize and implement, they do require a choice of the polynomial degree

With the knots and degree specified, the B-spline basis functions are then the known functions _{j}(

Penalized estimates for the unknown parameters in the basis expansion (_{1},…, _{J}) in (

As with the truncated power basis expansion method of

To illustrate penalized estimates, we used our simulated data with the known quadratic nonlinear exposure-response curve. We fit penalized splines as described above, under three conditions: with df selected using AICc, with df = 2, and with df = 4. The estimates using an unexposed reference are displayed in

These estimated hazard ratios give the estimated hazard (risk) of the outcome at a given exposure relative to the hazard when unexposed. For instance, we estimate from the penalized spline fit using AICc that the hazard of the event at an exposure level of 2.0 is 1.3 times that when unexposed, corresponding to a 30% increase in hazard at this exposure level. For this simulated data set, the linear truncated power basis fit with knots at the quartiles of the case exposures and the penalized spline fit are comparable over most of the exposure range. However, while the linear spline estimate attenuates at high exposures, it does not decrease at the highest exposure values as the penalized spline estimate does.
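A penalized spline fit of this kind, and the hazard ratio at an exposure of 2.0 relative to unexposed, can be sketched as follows. The sketch assumes the `survival` package and uses its `pspline()` term with `df = 0` and `caic = TRUE` to request df selection by corrected AIC. The data are a simulated stand-in (not the paper's data set), and the column names `time`, `status`, and `x` are hypothetical.

```r
library(survival)

# Simulated stand-in cohort (hypothetical; not the paper's data)
set.seed(42)
n <- 500
x <- c(rep(0, 50), runif(n - 50, 0, 10))       # 50 unexposed individuals
ln_hr <- 0.3 * x - 0.02 * x^2                  # assumed true ln(HR)
event_time <- rexp(n, rate = 0.05 * exp(ln_hr))
cens_time  <- runif(n, 0, 30)                  # random censoring times
dat <- data.frame(time   = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  x      = x)

# Penalized spline term; df = 0 asks pspline() to choose df by AIC,
# and caic = TRUE switches to the corrected AIC
fit_ps <- coxph(Surv(time, status) ~ pspline(x, df = 0, caic = TRUE),
                data = dat)

# ln(HR) at exposure 2.0 relative to unexposed: difference of linear predictors
lp <- predict(fit_ps, newdata = data.frame(x = c(0, 2)), type = "lp")
hr_2_vs_0 <- unname(exp(lp[2] - lp[1]))
hr_2_vs_0
```

Taking the difference of two linear predictors cancels the centering that `predict()` applies, so the result is the hazard ratio between the two exposure levels. Replacing `df = 0, caic = TRUE` with `df = 2` or `df = 4` gives fits with fixed degrees of freedom.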

The _{j=1}^{J}_{j}_{j}(

Garg et al. [

An initial assessment of a nonlinear exposure-response was made using plots of the martingale residuals. To do so, the Cox PH model with all covariates excluding the exposure (SI) variable was fit and the martingale residuals were obtained. These martingale residuals were then plotted against the exposure variable and Loess curves were added to the plot. The residual plot is displayed in
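The martingale residual check described above can be sketched as follows, assuming the `survival` package. The data are simulated stand-ins, and the covariate `age`, the exposure `x`, and the Loess span of 0.75 are hypothetical choices, not the values used in the cohort analysis.

```r
library(survival)

# Simulated stand-in data with one non-exposure covariate (hypothetical)
set.seed(42)
n   <- 500
x   <- runif(n, 0, 10)                          # exposure
age <- rnorm(n, mean = 40, sd = 10)             # other covariate
ln_hr <- 0.3 * x - 0.02 * x^2 + 0.02 * (age - 40)
event_time <- rexp(n, rate = 0.05 * exp(ln_hr))
cens_time  <- runif(n, 0, 30)
dat <- data.frame(time   = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  x = x, age = age)

# Step 1: fit the Cox PH model with all covariates except the exposure
fit0 <- coxph(Surv(time, status) ~ age, data = dat)

# Step 2: extract the martingale residuals
dat$mres <- residuals(fit0, type = "martingale")

# Step 3: smooth the residuals against the exposure with a Loess curve
smooth <- loess(mres ~ x, data = dat, span = 0.75)
ord <- order(dat$x)
curve_df <- data.frame(x = dat$x[ord], smoothed = fitted(smooth)[ord])

# Step 4: plot residuals and the smooth (only in an interactive session)
if (interactive()) {
  plot(dat$x, dat$mres, xlab = "Exposure", ylab = "Martingale residual")
  lines(curve_df$x, curve_df$smoothed, lwd = 2)
}
```

A smooth that departs clearly from a straight line suggests that a linear exposure term would not describe the exposure-response relationship well.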

To address the nonlinearity displayed in the residual plots, four models were examined for these revisited analyses: two parametric functional forms (linear and a logarithmic transformation), a linear spline function with a single knot at the median exposure of SI = 13.5 units (as in [

The analyses of the previous sections illustrate a typical modeling conundrum in that the models considered all give differing estimated hazard ratios. For the occupational cohort of the previous section, all examined models provide statistical evidence of elevated risk (or hazard) for carpal tunnel syndrome as SI exposure levels increase relative to unexposed. The linear spline model used by Garg et al. [

A visual representation of the effect size differences (and similarities) between models can be assessed using the

One caution when using the spline-based methods was highlighted in Tables

As an illustration, we simulated two new data sets using the simulation set-up of

Regression modeling often focuses on interpreting coefficient estimates. When exposure-response relationships are nonlinear and a nonparametric or smoothing method is used to estimate the relationship, the resulting regression coefficients are not directly interpretable. However, these methods do provide interpretable effect size estimates: hazard ratios at specific exposures of interest. The methods illustrated here are easily adapted to include a time-varying exposure. They can also be applied to a covariate of interest that is not an exposure measure but some other quantitative covariate, such as a prognostic factor. In these situations, the reference value of

This work was partially supported by the National Institute for Occupational Safety and Health under Grant nos. U01 OH07917 and R01 OH010474.

The hazard ratio for a given exposure

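We use a basis expansion representation for the exposure-response function, $s(x) = \sum_{j=1}^{J} \beta_j B_j(x) = \mathbf{B}(x)^{T}\boldsymbol{\beta}$, where $\mathbf{B}(x) = (B_1(x), \ldots, B_J(x))^{T}$ and $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_J)^{T}$. The ln(HR) for an exposure $x_1$ relative to an exposure $x_0$ is $s(x_1) - s(x_0) = \{\mathbf{B}(x_1) - \mathbf{B}(x_0)\}^{T}\boldsymbol{\beta}$, so the estimated ln(HR) for an exposure $x_1$ relative to an exposure $x_0$ is $\{\mathbf{B}(x_1) - \mathbf{B}(x_0)\}^{T}\hat{\boldsymbol{\beta}}$. With $\mathbf{c} = \mathbf{B}(x_1) - \mathbf{B}(x_0)$ and $\hat{\mathbf{V}}$ the estimated covariance matrix of $\hat{\boldsymbol{\beta}}$, the estimated exposure-response can be written as $\mathbf{c}^{T}\hat{\boldsymbol{\beta}}$ with standard error $\sqrt{\mathbf{c}^{T}\hat{\mathbf{V}}\mathbf{c}}$, and a pointwise $100(1 - \alpha)\%$ confidence interval for the ln(HR) is $\mathbf{c}^{T}\hat{\boldsymbol{\beta}} \pm z_{1-\alpha/2}\sqrt{\mathbf{c}^{T}\hat{\mathbf{V}}\mathbf{c}}$, where $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution.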

The coefficient estimates for the linear truncated power basis have a convenient interpretation: each one after the first is the estimated change in the slope of the exposure-response curve at the corresponding knot. For instance, the estimated slope for exposures up to the first knot of 3.0 corresponds to the coefficient

The R script for creating the linear truncated power basis using knots at the quartiles of the case exposures is given in

The R script for creating the linear truncated power basis and fitting the corresponding Cox PH model is as follows:
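The original listing is not reproduced in this text. A minimal sketch of such a script, assuming the `survival` package and a data frame `dat` with hypothetical columns `time`, `status`, and `x`, might look like the following. The data are simulated stand-ins, and the helper `tp_linear` is an illustrative name.

```r
library(survival)

# Simulated stand-in cohort (hypothetical; not the paper's data)
set.seed(42)
n <- 500
x <- runif(n, 0, 10)                           # exposure
ln_hr <- 0.3 * x - 0.02 * x^2                  # assumed true ln(HR)
event_time <- rexp(n, rate = 0.05 * exp(ln_hr))
cens_time  <- runif(n, 0, 30)                  # random censoring times
dat <- data.frame(time   = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  x      = x)

# Knots at the quartiles of the exposures among the cases
knots <- unname(quantile(dat$x[dat$status == 1], probs = c(0.25, 0.50, 0.75)))

# Linear truncated power basis: x, (x - k1)_+, (x - k2)_+, (x - k3)_+
tp_linear <- function(x, knots) {
  B <- cbind(x, outer(x, knots, function(x, k) pmax(x - k, 0)))
  colnames(B) <- paste0("B", seq_len(ncol(B)))
  B
}
B <- tp_linear(dat$x, knots)

# Fit the Cox PH model with the basis columns as covariates
fit_tp <- coxph(Surv(time, status) ~ B1 + B2 + B3 + B4,
                data = cbind(dat, B))
summary(fit_tp)
```

The coefficient on `B1` is the slope of the ln(HR) below the first knot, and each later coefficient is the change in slope at its knot.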

The R script for computing fitted values at each exposure value, their corresponding standard errors, pointwise 95% confidence intervals, and plotting the results is as follows:
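The original listing is not reproduced in this text. As a hedged sketch, the calculation can be done with the fitted coefficients and their covariance matrix: for each exposure, form the difference in basis vectors relative to the reference exposure, then compute the estimate, its standard error, and a Wald interval. The data, the helper `tp_linear`, the reference exposure `x0 = 0`, and all column names are illustrative assumptions.

```r
library(survival)

# Simulated stand-in cohort (hypothetical; not the paper's data)
set.seed(42)
n <- 500
x <- runif(n, 0, 10)
ln_hr <- 0.3 * x - 0.02 * x^2                  # assumed true ln(HR)
event_time <- rexp(n, rate = 0.05 * exp(ln_hr))
cens_time  <- runif(n, 0, 30)
dat <- data.frame(time   = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  x      = x)

# Linear truncated power basis with knots at quartiles of case exposures
tp_linear <- function(x, knots) {
  cbind(x, outer(x, knots, function(x, k) pmax(x - k, 0)))
}
knots <- unname(quantile(dat$x[dat$status == 1], probs = c(0.25, 0.50, 0.75)))
fit <- coxph(Surv(time, status) ~ tp_linear(x, knots), data = dat)

# Contrast matrix: basis at each grid exposure minus basis at the reference
x0   <- 0
grid <- seq(0, 10, by = 0.1)
C <- sweep(tp_linear(grid, knots), 2, drop(tp_linear(x0, knots)))

# Estimated ln(HR), standard error sqrt(c' V c), and pointwise 95% limits
ln_hr_hat <- drop(C %*% coef(fit))
se <- sqrt(rowSums((C %*% vcov(fit)) * C))
z  <- qnorm(0.975)
est <- data.frame(exposure = grid,
                  HR    = exp(ln_hr_hat),
                  lower = exp(ln_hr_hat - z * se),
                  upper = exp(ln_hr_hat + z * se))
head(est)

# Plot the ln(HR) curve with its pointwise band (interactive sessions only)
if (interactive()) {
  matplot(grid, cbind(ln_hr_hat, ln_hr_hat - z * se, ln_hr_hat + z * se),
          type = "l", lty = c(1, 2, 2), col = 1,
          xlab = "Exposure", ylab = "ln(HR) relative to unexposed")
}
```

Because the contrast is taken against the reference exposure, the estimate at `x0` is exactly zero on the log scale (HR = 1) with zero standard error.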

The R script for fitting a penalized spline with the degrees of freedom selected using the AICc is below. It assumes the

The corresponding output from the

Formal tests of nonlinearity can also be carried out for the truncated power basis methods. Because the truncated power bases include a linear term in their expansion, this corresponds to testing the null hypothesis $H_0\colon \beta_2 = \beta_3 = \cdots = \beta_{p+K} = 0$ versus the alternative hypothesis that at least one of these coefficients is nonzero. That is, $H_0\colon s(x) = \beta_1 x$ versus $H_a\colon s(x) = \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p + \beta_{p+1}(x - \kappa_1)_+^p + \cdots + \beta_{p+K}(x - \kappa_K)_+^p$. A likelihood ratio test can be derived to test the "reduced" model in $H_0$ against the "full" model in $H_a$. The test statistic has the form:
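The original display is not reproduced in this text. As a sketch in assumed notation, the usual likelihood ratio statistic for nested Cox PH models is shown below. The symbols $\Lambda$ and $\ell$ are labels introduced here, with $\ell(\cdot)$ the maximized log partial likelihood. The degrees of freedom equal the number of coefficients set to zero under the null hypothesis, $p + K - 1$.

```latex
\Lambda
  = -2\left[\,\ell\bigl(\hat{\boldsymbol{\beta}}_{\mathrm{reduced}}\bigr)
            - \ell\bigl(\hat{\boldsymbol{\beta}}_{\mathrm{full}}\bigr)\right]
  \;\overset{\mathrm{approx.}}{\sim}\; \chi^{2}_{\,p+K-1}
  \quad \text{under } H_0 .
```

For the linear truncated power basis with three knots ($p = 1$, $K = 3$), this gives a reference distribution with 3 degrees of freedom.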

The

The R code and corresponding output for the likelihood ratio test of nonlinearity in the linear truncated power basis expansion are as follows:
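The original listing and output are not reproduced in this text. A minimal sketch of the test, assuming the `survival` package and simulated stand-in data with hypothetical column names, compares the linear ("reduced") fit with the linear truncated power basis ("full") fit.

```r
library(survival)

# Simulated stand-in cohort (hypothetical; not the paper's data)
set.seed(42)
n <- 500
x <- runif(n, 0, 10)
ln_hr <- 0.3 * x - 0.02 * x^2                  # assumed true ln(HR)
event_time <- rexp(n, rate = 0.05 * exp(ln_hr))
cens_time  <- runif(n, 0, 30)
dat <- data.frame(time   = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  x      = x)

# Linear truncated power basis with knots at quartiles of case exposures
knots <- unname(quantile(dat$x[dat$status == 1], probs = c(0.25, 0.50, 0.75)))
B <- cbind(dat$x, outer(dat$x, knots, function(x, k) pmax(x - k, 0)))
colnames(B) <- paste0("B", 1:4)
dat2 <- cbind(dat, B)

# Reduced model (linear term only) and full model (all basis functions)
fit_lin <- coxph(Surv(time, status) ~ B1, data = dat2)
fit_tp  <- coxph(Surv(time, status) ~ B1 + B2 + B3 + B4, data = dat2)

# Likelihood ratio statistic from the maximized log partial likelihoods
lrt     <- 2 * (fit_tp$loglik[2] - fit_lin$loglik[2])
df      <- length(coef(fit_tp)) - length(coef(fit_lin))
p_value <- pchisq(lrt, df = df, lower.tail = FALSE)
c(statistic = lrt, df = df, p_value = p_value)

# The same comparison via the anova method for nested coxph fits
anova(fit_lin, fit_tp)
```

A small p-value is evidence against the linear exposure-response model in favor of the spline model.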

The authors declare that there are no conflicts of interest regarding the publication of this paper.

True exposure-response relationship used to simulate data (a). Histogram of the simulated exposure data (b). Kaplan-Meier estimates of the survival functions for five exposure groups (c).

Linear spline (a) and cubic spline (b) basis functions using knots at quartiles of the case exposures ($\kappa_1 = 3.0$, $\kappa_2 = 5.5$, and $\kappa_3 = 8.3$).

Estimated ln(HR) and corresponding pointwise 95% confidence intervals using linear spline (a) and cubic spline (b) basis functions with knots at quartiles of the case exposures ($\kappa_1 = 3.0$, $\kappa_2 = 5.5$, and $\kappa_3 = 8.3$).

Linear B-spline (a) and cubic B-spline (b) basis functions using equally spaced knots.

Estimated exposure-response curves on the natural logarithmic scale (logarithm of the hazard ratio) using truncated power basis functions and B-spline basis functions.

Estimated exposure-response curves on the natural logarithmic scale (logarithm of the hazard ratio) using penalized splines.

Unscaled (a) and scaled (b) plots of the martingale residuals versus exposure (SI) with Loess curves using various degrees of smoothing (0.4 to 2.0) from a Cox proportional hazards model with all covariates excluding the exposure variable. Panel (b) is rescaled to focus on the Loess curves. The distribution of the exposure variable is given in the rug plot on the horizontal axis.

Estimated exposure-response curves for carpal tunnel syndrome and strain index in a cohort of 569 workers. Rug plot is of cases.

Estimated exposure-response curves on the natural logarithmic scale (logarithm of the hazard ratio) for simulated data with 41 cases in 500 observations (a, b, c) and with 40 cases in 5000 observations (d, e, f) using linear, linear splines, and linear B-splines (a, d), cubic spline and cubic B-splines (b, e), and penalized splines (c, f).

Estimated hazard ratios (HR) and 95% pointwise confidence intervals from two Cox proportional hazard model fits.

Exposure | Penalized spline, df selected by AICc: HR (95% CI) | Linear spline with knots at quartiles of case exposures: HR (95% CI) | True HR |
---|---|---|---|
2.0 | 1.3 (1.2, 1.5) | 1.3 (1.1, 1.6) | 1.5 |
3.0 | 1.5 (1.3, 1.8) | 1.5 (1.1, 2.1) | 1.7 |
4.0 | 1.8 (1.4, 2.2) | 1.8 (1.3, 2.3) | 2.0 |
5.0 | 2.0 (1.6, 2.5) | 2.1 (1.6, 2.7) | 2.3 |
7.0 | 2.5 (2.0, 3.1) | 2.5 (2.0, 3.3) | 2.9 |
9.0 | 2.9 (2.3, 3.6) | 2.9 (2.2, 3.8) | 3.5 |
19.3 | 3.7 (2.1, 6.3) | 4.1 (2.5, 6.5) | 4.0 |
21.1 | 3.5 (1.7, 7.3) | 4.3 (2.5, 7.5) | 3.5 |
24.0 | 3.3 (1.1, 9.9) | 4.7 (2.4, 9.2) | 2.6 |

Estimated hazard ratios and 95% pointwise confidence intervals from separate Cox proportional hazard models using the carpal tunnel syndrome and strain index exposure data.

Exposure value (SI) | Linear | Logarithmic | Linear spline with knot at 13.5 | Penalized spline function with |
---|---|---|---|---|
0.8 | 1.01 | 1.21 | 1.10 | 1.04 |
6.0 | 1.10 | 1.88 | 2.09 | 1.35 |
9.0 | 1.15 | 2.11 | 3.03 | 1.57 |
13.5 | 1.24 | 2.38 | 5.27 | 1.89 |
18.0 | 1.33 | 2.60 | 4.85 | 2.12 |
20.3 | 1.38 | 2.70 | 4.65 | 2.18 |
54.0 | 2.33 | 3.68 | 2.51 | 2.32 |