Abstract:
|
Gene Set Enrichment Analysis (GSEA) is a powerful inferential tool that incorporates knowledge from a priori defined gene sets (e.g. molecular pathways) into high-throughput data analysis. Knowledge-based gene sets are available in bioinformatics resources such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Because such databases are constructed generically, multifunctional genes may belong to several gene sets. However, most existing GSEA methods ignore these overlapping genes in study-specific (e.g. disease-specific) analyses. In this study, we reveal substantial overlap among KEGG pathways. Under a disease-specific condition, we illustrate that the overlapping genes exhibit pathway-specific activation. Further, we computationally decompose the overlapping genes in the study-specific context and develop appropriate similarity measures to assign their pathway memberships empirically. Unlike the traditional binary membership (i.e. either 0 or 1), the empirical membership is quantified using continuous weights. We evaluate the approach in simulation studies and demonstrate it through real data analyses. Lastly, all developed methods are implemented with efficient algorithms in an R package.
|