Abstract:
Heterogeneous data are ubiquitous in scientific studies. In regression problems, subpopulations may differ not only in the effect sizes of covariates on the response but also in which covariates are useful predictors. We propose using mixtures of finite mixtures (MFM) to address this heterogeneity, treating the number of subpopulations, M, as a random variable. Our models have two distinctive features. First, they adopt a class of priors based on the Normalized Independent Finite Point Process (NIFPP) introduced by Argiento and De Iorio (2019). Second, they include spike-and-slab components in the construction of the NIFPP priors, which yields variable selection specific to each cluster. We show that our model outperforms classical ones, owing to its more flexible priors and its variable selection feature. For posterior computation, we extend existing MCMC algorithms for NIFPP to support versatile posterior inference, including clustering, individual profiling, and prediction.