Abstract:
|
Procedures based on Partial Least Squares (PLS) regression, which has recently gained much attention in the analysis of high-dimensional genomic datasets, have been developed to perform variable selection. Most of these procedures rely on tuning parameters that are usually determined by Cross-Validation (CV), which raises important stability issues. We have developed a new dynamic bootstrap-based PLS procedure for selecting significant predictors, suitable for both PLS regression and its extension to the Generalized Linear (GPLS) regression framework. Since it has a very high computational cost, we developed a GPU-based R package to speed up our existing package plsRglm. The plsRglm package aims to handle both complete and incomplete datasets through several techniques that are new or, at least, were not previously implemented in R. It provides not only the extension of PLS regression to generalized linear regression models, but also bootstrap techniques and leave-one-out and repeated k-fold cross-validation. In addition, graphical displays help the user assess the significance of the predictors when using bootstrap techniques.
|