Abstract:
|
Penalization schemes such as LASSO or ridge regression are routinely used to regress a response of interest on a high-dimensional set of features. Commonly used approaches assume that features are exchangeable: the same penalty factor is used for each model coefficient. In many applications, however, additional information is available about the features. Such information can include structural knowledge (e.g., feature sets comprising multiple data types and data qualities, as in biology with transcriptome, genome and epigenome data) and/or different prior probabilities for different feature classes (e.g., based on gene or pathway annotation or prior studies). We present a hierarchical Bayesian model that enables differentially penalizing groups of features based on external covariates and adapts the penalty to the information content of each group in a data-driven way. In an application to drug response prediction for cancer patients from multiple 'omic data types, the method identifies meaningful differences between 'omic data types. Using available covariates extends the range of applications of penalized regression, improves model interpretability and can improve prediction performance.
|