Online Program

All Times EDT

Virtual
Contributed Presentations

Pan-Cancer Identification of Clinically Relevant Genomic Subtypes Using Outcome-Weighted Integrative Clustering (309847)

*Arshi Arora, Memorial Sloan Kettering Cancer Center 

Keywords: supervised clustering, integrative clustering, patient survival, pan-cancer, prognostic stratification

Molecular phenotypes of cancer are complex and influenced by a multitude of factors. Conventional unsupervised clustering of heterogeneous cancer patient populations is inevitably driven by the dominant variation from major factors such as cell of origin or histology, and may or may not be associated with a clinical outcome of interest such as survival time. Such a canonical clustering approach yields clinically meaningful clusters only when, by chance, the major molecular patterns are also correlated with the outcome of interest. When this is not the case, a more targeted approach is essential: one that effectively extracts information directly associated with patient outcome and with localized molecular signatures.

We propose survClust, an outcome-weighted statistical learning algorithm for integrative molecular stratification focusing on patient survival. For each molecular data type, the algorithm learns a weighted distance matrix in which the feature weights are a vector of regression effect sizes for the outcome. To facilitate integration, the weighted distance matrices are standardized and then averaged, summarizing intratumor and inter-patient heterogeneity. Multidimensional scaling (MDS) is then used to map the subjects into an n-dimensional space that preserves between-subject distances for clustering. Results are cross-validated, and the final supervised cluster labels carry prognostic relevance.
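The steps described above (weighted distances per data type, standardization and averaging, then MDS) can be illustrated with a minimal NumPy sketch. This is not the survClust implementation. The function names are hypothetical. The weights are taken as precomputed inputs standing in for the regression effect sizes, features are scaled by the absolute weight, each matrix is standardized by its maximum entry, and classical (eigendecomposition-based) MDS is used; all of these are assumptions made for illustration only.

```python
import numpy as np

def weighted_distance(X, w):
    """Pairwise Euclidean distances between rows of X (subjects x features)
    after scaling each feature by |w| (assumed: precomputed effect sizes)."""
    Xw = X * np.abs(w)
    diff = Xw[:, None, :] - Xw[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def integrate(dist_mats):
    """Standardize each distance matrix by its maximum entry (an assumed
    choice of standardization), then average across data types."""
    scaled = [D / D.max() for D in dist_mats]
    return np.mean(scaled, axis=0)

def classical_mds(D, k):
    """Classical MDS: embed subjects in k dimensions from a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]          # indices of the k largest
    vals_k = np.clip(vals[top], 0.0, None)    # guard against small negatives
    return vecs[:, top] * np.sqrt(vals_k)

# Toy data: 8 subjects, two hypothetical data types with 5 and 3 features.
rng = np.random.default_rng(0)
expr, cn = rng.normal(size=(8, 5)), rng.normal(size=(8, 3))
w_expr, w_cn = rng.normal(size=5), rng.normal(size=3)  # stand-in effect sizes

D_avg = integrate([weighted_distance(expr, w_expr),
                   weighted_distance(cn, w_cn)])
embedding = classical_mds(D_avg, k=2)         # coordinates for clustering
```

Any standard clustering method (for example k-means) could then be run on `embedding`; the cross-validation step mentioned in the abstract is omitted from this sketch.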

survClust was applied to TCGA pan-cancer datasets encompassing over 6,000 tumors across 18 cancer types, demonstrating that our supervised clustering approach outperforms the canonical unsupervised clustering approach while discovering indolent versus aggressive subgroups that may be useful for patient stratification in clinical trial design. Integrating multi-omics data sheds light on the cross-talk between different molecular platforms, leading to substantially improved and novel subtypes not previously identified by unsupervised clustering. These subtypes reveal survival associations driven by mutation burden and concurrent high CD8 T-cell immune marker expression, as well as the aggressive clinical behavior associated with CDKN2A deletion across 18 cancer types.