Abstract:
|
Empirical Bayes is an approach to 'learn from a lot' in two ways: first, from a large number of variables, and second, from a potentially large amount of prior information on those variables, termed 'co-data', available for example in public repositories. We review empirical Bayes methods in the context of regression-based prediction models. We discuss formal empirical Bayes methods, which maximize the marginal likelihood, as well as more informal approaches based on other data summaries. We contrast empirical Bayes with cross-validation and full Bayes. Empirical Bayes is particularly useful for estimating multiple hyper-parameters that model the information in the co-data. Examples of co-data include p-values from an external study and genomic annotation. The systematic use of co-data can considerably improve prediction and variable selection, which we demonstrate on (mi)RNAseq applications in cancer diagnostics. Finally, extensions to other prediction methods, such as the random forest, and to other problems, such as network estimation, are briefly discussed.
|