Abstract:
|
We propose a fusion learning method that learns from multiple data sets collected across different experimental platforms through group penalization. The responses of interest may include a mix of discrete and continuous variables. The responses may share the same set of predictors; however, the models and parameters differ across platforms. Integrating information from different data sets can enhance the power of model selection. The goal is to select the predictors that affect any of the responses, where the number of such informative predictors tends to infinity as the sample size increases. We specify a pseudolikelihood that combines the marginal likelihoods and propose a corresponding pseudolikelihood information criterion. Under regularity conditions, we establish selection consistency for this criterion even when the true model size is unbounded. The proposed method includes the Bayesian information criterion with an appropriate penalty term as a special case. Numerical results indicate that fusion learning can dramatically improve upon model selection based on a single data source. In the talk, we will demonstrate the use of the R package "FusionLearn" to perform the proposed fusion learning tasks.
|