Abstract:
Motivation: Advances in next-generation sequencing methods have enabled researchers and agencies to collect a wide variety of sequence data across multiple platforms. The primary motivation for collecting these data is to analyze the resulting datasets jointly to gain new insights into disease prevention, treatment, and cure. Clustering such datasets can provide much-needed insight into disease subtypes and biological associations. However, the differing scales, the heterogeneity, and the size of the mixed dataset are a hurdle for such analysis.
Results: We propose an integrated Bayesian nonparametric approach for clustering high-dimensional mixed data. We use generalized linear models (GLMs) and latent variable approaches to integrate the mixed dataset, and our method performs simultaneous clustering of the high-dimensional mixed data. We apply our method to a glioblastoma multiforme dataset. Moreover, we show that cluster detection is a posteriori consistent as the numbers of covariates and subjects grow. As a byproduct of our work, we derive a working value approach to perform Bayesian beta regression.