
A key step in pharmacogenomic studies is the development of accurate prediction models for drug response based on individuals’ genomic information. Recent interest has centered on semiparametric models based on kernel machine regression, which can flexibly model the complex relationships between gene expression and drug response. However, performance suffers if irrelevant covariates are unknowingly included when training the model. We propose a new semiparametric regression procedure, based on a novel penalized garrotized kernel machine (PGKM), which can better adapt to the presence of irrelevant covariates while still allowing for a complex nonlinear model and gene-gene interactions. We study the performance of our approach in simulations and in a pharmacogenomic study of the renal carcinoma drug temsirolimus. Our method predicts plasma concentration of temsirolimus as well as standard kernel machine regression when no irrelevant covariates are included in training, but has much higher prediction accuracy when the truly important covariates are not known in advance.

Pharmacogenomics studies the role of genomics in drug response by correlating gene expression with drug absorption, distribution, metabolism and elimination. An important problem is to develop accurate drug response prediction models using individuals’ genomic, clinical and demographic information, as well as statistical learning methods for investigating the biological mechanisms underlying the outcome. The investigation that motivated our present work was a study of the anticancer agent temsirolimus (CCI-779), which targets renal cell carcinoma. Our goal is to predict, using an individual’s gene expression levels, the expected concentration of temsirolimus in the patient’s blood plasma. Plasma concentrations reflect the amount of the drug absorbed by the body, so accurate predictions can allow us to identify the patients for whom temsirolimus would be most efficacious.

Standard methods for predictive modeling usually posit a model for the outcome that is linear in the predictors. However, because the relationship between genes and drug plasma concentration may be very complex, e.g., due to gene-gene interactions, linear models may not suffice. Xue et al. [

There are few solutions that can maintain the flexibility of the LSKM while ameliorating the impact of the irrelevant predictors. Popular variable selection methods like the LASSO [

We propose a new kernel machine regression approach using a “garrotized” kernel, which generalizes the idea of Maity and Lin [

For subjects i = 1, …, n, let Y_i denote the drug response, X_i = (X_{i1}, …, X_{iP})^T be a set of clinical and demographic covariates such as age, and Z_i = (Z_{i1}, …, Z_{iQ})^T be expression levels associated with Q genes. We consider the semiparametric model

Y_i = X_i^T β + h(Z_i) + ε_i,

where Y = (Y_1, …, Y_n)^T is the vector of responses, X = (X_1, …, X_n)^T is the n × P matrix of clinical covariates, h = (h(Z_1), …, h(Z_n))^T is the vector of values of an unknown smooth function h(·) of the gene expressions, and ε = (ε_1, …, ε_n)^T is an n × 1 vector of independent errors with ε_i ∼ N(0, σ²). The regression parameter vector β captures the linear effects of the clinical covariates, while h(·) captures the possibly nonlinear and interactive effects of the genes.

It is common to assume that h lies in a reproducing kernel Hilbert space ℋ_K generated by some positive definite kernel function K(·, ·). The space ℋ_K is spanned by a particular set of orthogonal basis functions φ_j(·), and the properties of ℋ_K imply that any function h ∈ ℋ_K can be represented using this set of basis functions as

h(z) = Σ_j ω_j φ_j(z),

or equivalently, for some coefficients α_1, …, α_n, as h(z) = Σ_{m=1}^n α_m K(z, Z_m).

The space ℋ_K can be implicitly defined by choosing a kernel function. A commonly used one is the Gaussian kernel

K(z, z′) = exp{−Σ_{q=1}^Q (z_q − z′_q)² / ρ},

where ρ > 0 is a bandwidth parameter.
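As a concrete illustration, the Gaussian kernel matrix for a sample Z can be computed as follows (a minimal sketch; the function name and the bandwidth parameter `rho` are our own notation):

```python
import numpy as np

def gaussian_kernel(Z1, Z2, rho=1.0):
    """Gaussian kernel matrix: K[i, j] = exp(-sum_q (Z1[i,q] - Z2[j,q])^2 / rho)."""
    sq = ((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / rho)

Z = np.random.default_rng(0).normal(size=(5, 3))
K = gaussian_kernel(Z, Z, rho=2.0)
# K is a symmetric 5 x 5 matrix with unit diagonal and entries in (0, 1]
```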

Model (1) can be fitted by the least-squares kernel machine (LSKM), which estimates β and h(·) jointly from the observed data (Y_i, X_i, Z_i) by penalized least squares.

LSKMs become less accurate when irrelevant components of Z are unknowingly included in the kernel. To adapt to irrelevant covariates, we replace the base kernel with a garrotized version. For the Gaussian base kernel, the garrotized kernel K^{(g)} is defined by

K^{(g)}(z, z′; δ) = exp{−Σ_{q=1}^Q δ_q (z_q − z′_q)² / ρ}, with δ_q ≥ 0 for q = 1, …, Q.

The δ_q act as nonnegative garrote parameters that rescale each covariate inside the kernel: a large δ_q allows covariate q to contribute strongly, a small δ_q damps its contribution, and δ_q = 0 removes Z_q from the kernel entirely, so that variable selection corresponds to estimating which δ_q are zero.
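The role of the garrote parameters can be seen numerically: when δ_q = 0, the kernel no longer depends on coordinate q at all. A small sketch (function name and example values are our own):

```python
import numpy as np

def garrotized_gaussian_kernel(Z1, Z2, delta, rho=1.0):
    """Garrotized Gaussian kernel:
    K^(g)(z, z'; delta) = exp(-sum_q delta_q (z_q - z'_q)^2 / rho), delta_q >= 0.
    Setting delta_q = 0 removes covariate q from the kernel entirely."""
    d = np.asarray(delta, dtype=float)
    sq = (d * (Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / rho)

rng = np.random.default_rng(1)
Z = rng.normal(size=(6, 4))
delta = np.array([1.0, 0.5, 0.0, 2.0])   # covariate with index 2 is excluded
K = garrotized_gaussian_kernel(Z, Z, delta)

# Perturbing only the excluded coordinate leaves the kernel matrix unchanged:
Z_pert = Z.copy()
Z_pert[:, 2] += rng.normal(size=6)
K_pert = garrotized_gaussian_kernel(Z, Z_pert, delta)
```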

To estimate the parameters of model (1), we minimize a penalized least squares criterion of the form

Σ_{i=1}^n {Y_i − X_i^T β − h(Z_i)}² + λ_3 ‖h‖²_{ℋK} + λ_1 Σ_{q=1}^Q δ_q + λ_2 Σ_{p=1}^P |β_p|,

where λ_1 and λ_2 are nonnegative regularization parameters, λ_3 is a tuning parameter which controls the trade-off between goodness of fit and complexity of the model, and ‖h‖_{ℋK} denotes the functional norm in the space ℋ_K generated by the garrotized kernel. The penalty functions involving δ and β shrink the garrote parameters and the linear coefficients toward zero, so that minimizing the criterion performs estimation and variable selection simultaneously.

The representer theorem of Kimeldorf and Wahba [ ] implies that the minimizing h can be written as h(·) = Σ_{j=1}^n α_j K^{(g)}(·, Z_j; δ), where α = (α_1, …, α_n)^T is an unknown vector and K^{(g)}(·, ·) is our garrotized kernel. Minimization of the criterion therefore reduces to a finite-dimensional problem in (β, α, δ) involving the n × n kernel matrix K(δ) with entries K_{ij} = K^{(g)}(Z_i, Z_j; δ), for which ‖h‖²_{ℋK} = α^T K(δ) α.
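As an illustration, the finite-dimensional criterion implied by the representer theorem can be evaluated directly. This is a sketch of one plausible form of the objective (the function name, argument layout, and exact combination of penalties are our assumptions, not the authors' code):

```python
import numpy as np

def pgkm_objective(Y, X, Z, alpha, beta, delta, lam1, lam2, lam3, rho=1.0):
    """Penalized criterion after the representer theorem: h at the data points
    equals K(delta) @ alpha, and ||h||^2_HK = alpha' K(delta) alpha."""
    d = np.asarray(delta, dtype=float)
    sq = (d * (Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq / rho)                    # K_ij = K^(g)(Z_i, Z_j; delta)
    resid = Y - X @ beta - K @ alpha
    return (resid @ resid                    # goodness of fit
            + lam3 * alpha @ K @ alpha       # RKHS norm penalty on h
            + lam1 * d.sum()                 # penalty on the garrote parameters
            + lam2 * np.abs(beta).sum())     # penalty on the linear coefficients
```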

Indeed, our method stands out from the competing methods, in particular, KNIFE [

We propose solving this optimization problem by alternating minimization over the blocks (α, β) and δ:

Step 1. Set initial estimates (α^{ini}, β^{ini}, δ^{ini}); for example, take δ^{ini} = (1, …, 1)^T so that every covariate enters the initial kernel.

Step 2. Holding the current value of δ fixed, update (α, β): with δ fixed the kernel matrix K(δ) is fixed, so this reduces to a penalized least squares problem.

Step 3. Given the estimates of α and β, update the garrote parameters δ_q, q = 1, …, Q, subject to the nonnegativity constraints δ_q ≥ 0.

Step 4. Repeat Steps 2 and 3 until the estimates converge.
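The alternating steps described above can be sketched as follows. This is a hypothetical implementation under our own simplifying assumptions (the lasso step for β is replaced by a plain least-squares update, and δ is updated by projected gradient rather than the authors' exact scheme):

```python
import numpy as np

def fit_pgkm(Y, X, Z, lam1, lam2, lam3, rho=1.0, n_iter=20, step=1e-3):
    """Alternating-minimization sketch (lam2 is accepted but the lasso step on
    beta is omitted here for brevity)."""
    n, Q = Z.shape
    delta = np.ones(Q)                 # Step 1: all covariates in the kernel
    beta = np.zeros(X.shape[1])

    def kernel(d):
        sq = (d * (Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq / rho)

    for _ in range(n_iter):
        K = kernel(delta)
        # Step 2: ridge-type solve for alpha, then least squares for beta
        alpha = np.linalg.solve(K + lam3 * np.eye(n), Y - X @ beta)
        beta = np.linalg.lstsq(X, Y - K @ alpha, rcond=None)[0]
        # Step 3: projected-gradient updates of the garrote parameters
        resid = Y - X @ beta - K @ alpha
        for q in range(Q):
            dK = -K * (Z[:, None, q] - Z[None, :, q]) ** 2 / rho  # dK/d(delta_q)
            grad = -2 * resid @ (dK @ alpha) + lam3 * alpha @ dK @ alpha + lam1
            delta[q] = max(0.0, delta[q] - step * grad)           # project onto delta_q >= 0
    return alpha, beta, delta
```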

Cross-validation is often used for tuning parameter selection but can be computationally inconvenient. Instead, we divide a given dataset into a training set and a validation set. We use the training set to fit models over various prespecified values of (λ_1, λ_2, λ_3), and select the values whose fitted model attains the smallest prediction error on the validation set. The solutions are computed for a decreasing sequence of values of each tuning parameter.
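This train/validation tuning scheme can be sketched generically (the function `select_tuning` and the `fit`/`predict` placeholders are illustrative assumptions; any estimator, such as the PGKM fitting routine, could be plugged in):

```python
import numpy as np
from itertools import product

def select_tuning(fit, predict, train, valid, grids):
    """Fit on the training set for every (lam1, lam2, lam3) triplet on the
    prespecified grids, and keep the triplet with the smallest validation MSPE."""
    (Y_tr, X_tr, Z_tr), (Y_va, X_va, Z_va) = train, valid
    best, best_mspe = None, np.inf
    for lams in product(*grids):
        model = fit(Y_tr, X_tr, Z_tr, lams)
        mspe = np.mean((Y_va - predict(model, X_va, Z_va)) ** 2)
        if mspe < best_mspe:
            best, best_mspe = lams, mspe
    return best, best_mspe
```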

We first compare our proposed PGKM method to the LSKM method of Liu et al. [ ]. In each setting we generate clinical covariates X_{ip}, gene expressions Z_{iq}, and errors ε_i ∼ N(0, σ²), with the noise level σ varied over 0.1, 0.5 and 1.0.

We consider four configurations by varying the sample size

Setting 1:

Setting 2: ^{T},

Setting 3:

Setting 4: ^{T},

For each simulation setting we generate training, validation, and testing datasets of

In settings 1 and 3, where all of the covariates are truly relevant, the proposed PGKM attains average MSPEs comparable to, and slightly smaller than, those of LSKM.

In contrast, in settings 2 and 4, our proposed PGKM always yields much smaller average MSPEs than LSKM. That is mainly because the proposed PGKM method can recognize the irrelevant variables by estimating the corresponding garrote parameters δ_q at or near zero, while also shrinking the coefficients β_p of irrelevant linear covariates.

One byproduct of our proposed PGKM method is that, while estimating the parameters of model (1), we also obtain estimates of the garrote parameters δ_q, which yield a natural variable selection rule. We use 10^{−5} as the threshold to decide whether an estimated δ_q should be treated as zero, i.e., whether the corresponding covariate should be excluded from the model.
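The thresholding rule is simple to apply in practice (the δ̂ values below are made up purely for illustration):

```python
import numpy as np

# Variable selection from fitted garrote parameters: covariate q is kept
# when its estimate exceeds the threshold 1e-5.
delta_hat = np.array([0.82, 3e-6, 0.0, 0.14, 9.9e-6])
selected = np.flatnonzero(delta_hat > 1e-5)
# selected covariate indices: 0 and 3
```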

We report the variable selection performance of PGKM on simulation settings 2 and 4 in

We next compare the performance of our PGKM method to that of the method of He et al. [

Setting 1:

Setting 2: ^{T},

PGKM is tuned over (λ_1, λ_2, λ_3) on a grid of triplets, while He's method is tuned over a single regularization parameter on a grid of scalars. PGKM is nonetheless considerably faster.

The variable selection results of PGKM and He’s method in setting 2 are reported in

In this section, we compare the proposed PGKM method with the KNIFE of Allen [

Fit a semiparametric model using the PGKM method.

Fit a nonparametric model using the KNIFE method directly.

Use the KNIFE method fitted to the residuals of a penalized linear regression of the outcome on the covariates (the Linear-KNIFE method).

The variable selection results for PGKM, KNIFE and Linear-KNIFE methods are reported in

We apply the proposed PGKM method to clinical pharmacokinetics data on temsirolimus (CCI-779) from renal cell carcinoma subjects collected by Boni et al. [

To improve the accuracy of our predictions we first perform dimension reduction using the nonparametric independence screening method proposed by Fan and Song [
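The screening step can be sketched as follows. This is an illustrative simplification (the function `screen`, the polynomial degree, and ranking by marginal residual sum of squares are our assumptions; Fan and Song's procedure uses B-spline marginal regressions):

```python
import numpy as np

def screen(Y, Z, d, degree=3):
    """Regress Y on each covariate alone with a low-degree polynomial, rank
    covariates by the marginal residual sum of squares, and keep the d
    strongest (smallest RSS = strongest marginal signal)."""
    rss = np.empty(Z.shape[1])
    for q in range(Z.shape[1]):
        coef = np.polyfit(Z[:, q], Y, degree)
        rss[q] = np.sum((Y - np.polyval(coef, Z[:, q])) ** 2)
    return np.argsort(rss)[:d]
```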

To compare the prediction errors of these three methods, we randomly select 40 observations for estimation, 9 observations for choosing the best fitted model of each method, and the remaining 9 observations for prediction. We calculate the average MSPE of each method over 1000 replications.

We have proposed a flexible variable selection procedure for semiparametric regression based on a new class of garrotized kernels. It can capture complicated relationships between predictors and outcome and possesses more predictive power than the existing methods in the presence of irrelevant predictors. A key advantage of the proposed PGKM method is that it can achieve variable selection while allowing for a complex nonlinear model. Simulations and our analysis of the plasma concentration of the anticancer drug temsirolimus demonstrate the advantages of our method compared to competing approaches.

In this article we considered only continuous outcomes using a Gaussian base kernel. However, our garrotized kernel machine framework can be extended to estimation and variable selection for a much larger class of models and a much wider range of base kernels. We are pursuing extensions into generalized semiparametric models, for example logistic regression and exponential class models, and other kernels, such as the identity-by-state kernel popular in genome-wide association studies [

Finally, we have so far only studied situations where the number of covariates is smaller than the sample size. In principle, our framework can also be used in the high-dimensional setting where there are more covariates than observations. In practice this requires overcoming significant computational hurdles, and we are currently investigating more efficient algorithms for fitting our PGKM estimate.

We would like to thank the Editor, the Associate Editor and the two referees for their constructive comments and suggestions. Rong’s work was partially supported by National Natural Science Foundation of China (No. 11701021), National Statistical Science Research Project (No. 2017LZ35), Fundamental Research Foundation of Beijing University of Technology and Beijing Outstanding Talent Foundation (No. 2014000020124G047); Zhao’s work was partially supported by NSF grant DMS-1613005; Li’s work was partially supported by NIH grant U01CA209414.

Prediction errors of PGKM and LSKM. The last two columns provide the average MSPEs over 500 replications, with standard deviations in parentheses

| | PGKM | LSKM |
|---|---|---|
| Setting 1 | | |
| | 0.0345 (0.0160) | 0.0379 (0.0121) |
| | 0.0778 (0.0280) | 0.0797 (0.0399) |
| | 0.1617 (0.0903) | 0.1639 (0.0739) |
| Setting 2 | | |
| | 0.0430 (0.0133) | 0.0928 (0.0487) |
| | 0.0693 (0.0196) | 0.1398 (0.0609) |
| | 0.1746 (0.0525) | 0.2369 (0.0513) |
| Setting 3 | | |
| | 0.0689 (0.0165) | 0.0790 (0.0166) |
| | 0.1015 (0.0189) | 0.1045 (0.0182) |
| | 0.1608 (0.0319) | 0.1708 (0.0320) |
| Setting 4 | | |
| | 0.0748 (0.0146) | 0.1622 (0.0264) |
| | 0.1204 (0.0251) | 0.2174 (0.0348) |
| | 0.2403 (0.0448) | 0.3319 (0.0587) |

Variable selection results for the PGKM methods, with different numbers of irrelevant Z. The percentage of 500 simulations in which the true model was exactly selected is denoted by C (correct selection), the percentage in which the correct model was nested in the selected model is denoted by O (over-selection), and the percentage in which the true model was not a subset of the selected model is denoted by U (under-selection)

| σ | P | Q | X: C | X: O | X: U | Z: C | Z: O | Z: U |
|---|---|---|---|---|---|---|---|---|
| Setting 2 | | | | | | | | |
| 0.1 | 2 | 15 | 0.9474 | 0.0526 | 0.0000 | 0.1316 | 0.7522 | 0.1162 |
| 0.5 | 2 | 15 | 0.6842 | 0.1842 | 0.1316 | 0.0000 | 0.7959 | 0.2041 |
| 1.0 | 2 | 15 | 0.2889 | 0.3111 | 0.4000 | 0.0000 | 0.7818 | 0.2182 |
| Setting 4 | | | | | | | | |
| 0.1 | 2 | 30 | 0.9268 | 0.0000 | 0.0732 | 0.0366 | 0.3585 | 0.6049 |
| 0.5 | 2 | 30 | 0.6047 | 0.1047 | 0.2906 | 0.0000 | 0.3140 | 0.6860 |
| 1.0 | 2 | 30 | 0.3608 | 0.3608 | 0.2784 | 0.0000 | 0.3196 | 0.6804 |

Prediction errors and running times of PGKM and He’s method. The second and fourth columns provide the average MSPEs over 500 replications, with standard deviations in parentheses. The third and fifth columns provide the average running times in seconds

| | PGKM MSPE (SD) | Time (s) | He's MSPE (SD) | Time (s) |
|---|---|---|---|---|
| Setting 1 | | | | |
| | 0.0308 (0.0099) | 50 | 0.3742 (0.2642) | 560 |
| | 0.0849 (0.0333) | 79 | 0.7873 (0.7808) | 566 |
| | 0.1732 (0.0585) | 88 | 1.4956 (1.4561) | 398 |
| Setting 2 | | | | |
| | 0.0430 (0.0133) | 25 | 0.4540 (0.3820) | 2764 |
| | 0.0693 (0.0196) | 41 | 0.7814 (0.6853) | 1269 |
| | 0.1746 (0.0525) | 30 | 1.9578 (1.9320) | 525 |

Variable selection results for PGKM and He’s method in setting 2. C, O, U are defined the same as those in

| σ | P | Q | PGKM X: C | PGKM X: O | PGKM X: U | PGKM Z: C | PGKM Z: O | PGKM Z: U | He: C | He: O | He: U |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 2 | 15 | 0.9474 | 0.0526 | 0.0000 | 0.1316 | 0.7522 | 0.1162 | 0.0000 | 0.2609 | 0.7391 |
| 0.5 | 2 | 15 | 0.6842 | 0.1842 | 0.1316 | 0.0000 | 0.7959 | 0.2041 | 0.0000 | 0.1429 | 0.8571 |
| 1.0 | 2 | 15 | 0.2889 | 0.3111 | 0.4000 | 0.0000 | 0.7818 | 0.2182 | 0.0000 | 0.0909 | 0.9091 |

Prediction errors and running times of PGKM, KNIFE, and Linear-KNIFE using data from setting 2 of Section 3.1. The second, fourth, sixth columns provide the average MSPEs over 500 replications, with standard deviations in parentheses. The third, fifth and seventh columns provide the average running times in seconds

| | PGKM MSPE (SD) | Time (s) | KNIFE MSPE (SD) | Time (s) | Linear-KNIFE MSPE (SD) | Time (s) |
|---|---|---|---|---|---|---|
| | 0.0430 (0.0133) | 25 | 7.8800 (5.1100) | 15 | 2.5696 (1.7865) | 16 |
| | 0.0693 (0.0196) | 41 | 6.9600 (5.1600) | 17 | 2.4310 (1.3180) | 18 |
| | 0.1746 (0.0525) | 30 | 9.0400 (4.9900) | 16 | 2.8036 (1.1905) | 17 |

Variable selection results for PGKM, KNIFE and Linear-KNIFE. C, O, U are defined the same as those in

| | PGKM: C | PGKM: O | PGKM: U | KNIFE: C | KNIFE: O | KNIFE: U | Linear-KNIFE: C | Linear-KNIFE: O | Linear-KNIFE: U |
|---|---|---|---|---|---|---|---|---|---|
| X | | | | | | | | | |
| | 0.9474 | 0.0526 | 0.0000 | 0.0875 | 0.0125 | 0.9000 | 0.3125 | 0.2875 | 0.4000 |
| | 0.6842 | 0.1842 | 0.1316 | 0.0750 | 0.0250 | 0.9000 | 0.2125 | 0.4000 | 0.3875 |
| | 0.2889 | 0.3111 | 0.4000 | 0.0750 | 0.0250 | 0.9000 | 0.3000 | 0.3125 | 0.3875 |
| Z | | | | | | | | | |
| | 0.1316 | 0.7522 | 0.1162 | 0.0000 | 0.0500 | 0.9500 | 0.1750 | 0.3150 | 0.5000 |
| | 0.0000 | 0.7959 | 0.2041 | 0.0250 | 0.0750 | 0.9250 | 0.0500 | 0.5500 | 0.4000 |
| | 0.0000 | 0.7818 | 0.2182 | 0.0000 | 0.0250 | 0.9750 | 0.0500 | 0.4750 | 0.4750 |

Average prediction error of each method for 1000 replications, with standard deviations in parentheses

| Methods | MSPE (SD) |
|---|---|
| | 0.4120 (0.2077) |
| | 0.5842 (0.3537) |
| | 2.0343 (1.1360) |