Abstract:
|
Heterogeneity in human diseases presents clinical challenges for accurate disease characterization and treatment. High-throughput multi-omics data may offer a great opportunity to explore the underlying mechanisms of diseases. Although the growing body of published literature may be informative about diseases of interest, existing clustering procedures such as Sparse Convex Clustering (SCC) cannot directly utilize this prior information, even though SCC produces stable clusters. We develop a novel clustering procedure, information-incorporated Sparse Convex Clustering (iSCC), to respond to the need for disease subtyping in precision medicine. Using a text mining approach, the proposed method leverages existing information from previously published studies through a group lasso penalty to improve disease subtyping and biomarker identification. In simulation studies, the proposed method outperforms other clustering methods, such as SCC, K-means, iCluster+, and BCC. In an application to cancer-related omics data, the proposed method generates more accurate disease subtypes and identifies important biomarkers for future studies.
|