Conference Program

All Times EDT

Thursday, September 22
Thu, Sep 22, 9:45 AM - 10:30 AM
White Oak
Poster Session

Multivariate Outcome-Guided Disease Subtyping for High-Dimensional Omics Data (303636)

Lu Tang, University of Pittsburgh - Pittsburgh, PA 
George C. Tseng, University of Pittsburgh - Pittsburgh, PA 
*Wei Zong, University of Pittsburgh - Pittsburgh, PA 

Keywords: High-dimensional clustering, disease subtyping, prediction, precision medicine.

Many complex diseases manifest as heterogeneous therapeutic responses and outcomes. Classifying patients into subgroups with homogeneous outcomes can benefit precision medicine. However, without any guidance, classical unsupervised clustering methods usually yield subgroups that differ mainly in demographic variables of little interest. Univariate outcome-guided clustering has been developed to generate clusters associated with a specific outcome, but it ignores the fact that multiple outcomes can be jointly associated with the underlying subtypes, and forcing molecular clusters to relate to a single outcome may yield only a few non-generalizable features. We therefore propose a unified multivariate outcome-guided clustering model (mogClust) to identify molecular subtypes that are collectively relevant to multiple model-selected outcomes. Its two main components, a disease subtyping model and an outcome association model, interact through a latent subtyping variable to facilitate feature selection and outcome selection toward meaningful cluster definition. An EM algorithm on a penalized likelihood is used for parameter estimation. Through extensive simulation studies, we show that mogClust improves clustering and feature selection performance over existing methods while selecting outcomes accurately. Applied to a lung disease dataset, our method identifies four latent subtypes with progressive diagnosis compositions from COPD to idiopathic UIP and distinct genetic profiles. The four clusters are also highly associated with the four outcomes picked by the model, suggesting effective clustering and outcome selection.
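
The general idea of running EM on a penalized likelihood can be illustrated with a much simpler model than mogClust. The sketch below is not the authors' implementation: it assumes a Gaussian mixture with unit variances and an L1 penalty on the cluster means, and it omits the outcome association component entirely. The function names, the penalty weight `lam`, and the toy data are all hypothetical choices made for illustration.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def penalized_em(X, init_means, lam, n_iter=50):
    """EM for a Gaussian mixture with unit variances and an L1 penalty on means.

    X: list of samples, each a list of standardized feature values.
    init_means: starting cluster means, one list per cluster (assumed given).
    lam: penalty weight; larger values set more feature means to zero.
    """
    n, p, k = len(X), len(X[0]), len(init_means)
    means = [list(m) for m in init_means]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each latent cluster for each sample.
        resp = []
        for x in X:
            log_post = [
                math.log(weights[c])
                - 0.5 * sum((x[j] - means[c][j]) ** 2 for j in range(p))
                for c in range(k)
            ]
            top = max(log_post)  # subtract the max for numerical stability
            unnorm = [math.exp(v - top) for v in log_post]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update mixing weights, then soft-threshold each weighted mean.
        for c in range(k):
            n_c = max(sum(r[c] for r in resp), 1e-12)  # guard against empty clusters
            weights[c] = max(n_c / n, 1e-12)
            for j in range(p):
                avg = sum(r[c] * x[j] for r, x in zip(resp, X)) / n_c
                means[c][j] = soft_threshold(avg, lam / n_c)
    return means, weights, resp


# Toy data: feature 0 separates two groups, feature 1 is uninformative noise.
X = [[(-2.0 if i < 10 else 2.0) + 0.1 * (i % 3 - 1), 0.3 * (-1) ** i]
     for i in range(20)]
means, weights, resp = penalized_em(X, init_means=[[-1.0, 0.0], [1.0, 0.0]], lam=2.0)
labels = [max(range(2), key=lambda c: r[c]) for r in resp]
```

In this toy setting the penalty shrinks the means of the uninformative feature to exactly zero in both clusters, so that feature drops out of the cluster definition. That is the sense in which a penalized likelihood performs feature selection during clustering.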