Keywords: cancer subtyping, omics data integration
Recent technological developments have made it possible to generate many kinds of omics data, providing opportunities to address problems such as disease subtyping and disease mapping by analyzing multiple omics data types jointly. Among the many data-integration methods that have been developed, similarity network fusion (SNF) has shown great potential for identifying new disease subtypes by grouping similar subjects using multi-omics data. SNF builds a similarity network of pairwise patient similarities for each omics data type and fuses these networks into a single network, drawing on both shared and complementary information across data types. In this article, we propose an association-signal-annotation boosted similarity network fusion (ab-SNF) method, which uses feature-level association signal annotations as weights to up-weight signal features and down-weight noise features when constructing subject similarity networks, with the aim of improving disease subtyping performance. In various simulation studies, the proposed ab-SNF outperforms the original, unweighted SNF approach. Most importantly, the improvement in subtyping performance due to the association-signal-annotation weights is amplified in the integration process. Applications to somatic mutation, DNA methylation, and gene expression data for three cancer types from The Cancer Genome Atlas (TCGA) project suggest that ab-SNF consistently identifies new subtypes in each cancer that predict patient survival more accurately and are more biologically meaningful.
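
The feature-weighting idea can be illustrated with a minimal sketch. It is not the authors' implementation: the weighted Euclidean distance, the fixed-bandwidth Gaussian kernel, the toy data, and the weight values are all assumptions made for illustration, and the network fusion step itself is omitted. The sketch shows only how up-weighting a signal feature can change which subjects appear similar within a single data type.

```python
import math

def weighted_distance(x, y, weights):
    """Weighted Euclidean distance between two subjects' feature vectors.

    Weights are normalized to sum to 1 (an assumption for this sketch).
    """
    total = sum(weights)
    return math.sqrt(
        sum((w / total) * (a - b) ** 2 for a, b, w in zip(x, y, weights))
    )

def similarity_matrix(subjects, weights, sigma=1.0):
    """Subject-by-subject similarity from a Gaussian kernel on weighted distances.

    A fixed bandwidth `sigma` is a simplification chosen for this sketch.
    """
    n = len(subjects)
    return [
        [
            math.exp(
                -weighted_distance(subjects[i], subjects[j], weights) ** 2
                / (2 * sigma ** 2)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]

# Toy data (hypothetical): feature 0 carries the subtype signal, feature 1 is noise.
# Subjects 0 and 1 belong to one subtype; subjects 2 and 3 belong to another.
subjects = [[0.0, 2.0], [0.1, -2.0], [3.0, 2.0], [3.1, -2.0]]

uniform_weights = [1.0, 1.0]    # unweighted baseline
annotated_weights = [9.0, 1.0]  # hypothetical association-signal weights

S_uniform = similarity_matrix(subjects, uniform_weights)
S_weighted = similarity_matrix(subjects, annotated_weights)

# Is subject 0 more similar to its subtype-mate (1) than to subject 2?
print("uniform :", S_uniform[0][1] > S_uniform[0][2])
print("weighted:", S_weighted[0][1] > S_weighted[0][2])
```

In this toy setting the noise feature dominates the unweighted distance, so subject 0 looks closer to a subject from the other subtype. With the signal feature up-weighted, the within-subtype pair becomes the more similar one.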