Abstract:
|
Population heterogeneity is pervasive in practice. For instance, the complexity of an underlying disease process may cause heterogeneity in the association between disease and biomarkers. In the context of binary markers such as single nucleotide polymorphisms (SNPs), we use ideas from logic regression and seek Boolean combinations of markers that explain the association with a binary disease response. Although we typically deal with binary data, our methods can also be applied to continuous variables after appropriate discretization. We cast heterogeneity as unknown subgroups in the population; hence it is natural to adopt a Dirichlet process mixture model (DPMM) and a mixture of finite mixtures (MFM) model for our Bayesian formulation, because both induce clustering. We describe a model in which the Boolean relations are parameters arising from a DPMM or an MFM, and we address the associated challenges in specifying the base distribution and in estimation via Markov chain Monte Carlo (MCMC). For MCMC, we implement both an incremental algorithm (a Gibbs sampler) and a nonincremental algorithm (split and merge). We illustrate the performance of our methods with simulations and discuss applications.
|