Abstract:
Cancers arise owing to somatic mutations, and characteristic combinations of these mutations form mutational signatures. Although many mutational signatures have been identified, the mutational processes behind many of them remain unknown. This hinders the identification of interventions that could reduce somatic mutation burden and prevent the development of cancer. We demonstrate that the unknown cause of a mutational signature can be inferred from its associations with signatures of known etiology. However, existing association tests lack statistical power because signature data contain excess zeros. To address this limitation, we propose a semiparametric kernel independence test (SKIT). The SKIT statistic is defined as the integrated squared distance between mixed probability distributions, and it decomposes into four disjoint components that pinpoint the source of dependency. We derive its asymptotic null distribution and prove the convergence of its power. Because convergence to the asymptotic null distribution is slow, we compute p-values with a bootstrap method. Applying SKIT to The Cancer Genome Atlas (TCGA) data for over 9,000 tumors across 32 cancer types, we identified a novel association between the unknown-etiology signature 17 and the APOBEC signatures in gastrointestinal cancers.
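To make the four-way decomposition concrete, here is a minimal sketch in Python. It assumes each variable is a mixture of a point mass at zero and a Gaussian-kernel density on its positive values, and it takes the four components to be the integrated squared differences between the joint distribution and the product of marginals over the regions (0,0), (0,+), (+,0) and (+,+). The function names `skit_like_components` and `skit_like_test`, the bandwidth `h`, and the bootstrap scheme (resampling the two variables independently with replacement) are hypothetical choices for illustration; this is not the SKIT statistic or bootstrap as defined by the authors.

```python
import math
import random


def _gauss(d, s):
    """Gaussian density with standard deviation s evaluated at d."""
    return math.exp(-0.5 * (d / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def skit_like_components(x, y, h=0.5):
    """Hypothetical four-component integrated squared distance between the
    joint law of (x, y) and the product of its marginals, treating each
    variable as a point mass at zero plus a Gaussian-kernel density (bandwidth
    h, an assumed tuning parameter) on its positive values.
    Returns (D00, D01, D10, D11) for regions (0,0), (0,+), (+,0), (+,+)."""
    n = len(x)
    # The integral of a product of two N(., h^2) kernels is a N(., 2h^2) kernel.
    s = h * math.sqrt(2.0)
    x0 = [xi == 0 for xi in x]
    y0 = [yi == 0 for yi in y]
    px0 = sum(x0) / n
    py0 = sum(y0) / n
    p00 = sum(a and b for a, b in zip(x0, y0)) / n
    d00 = (p00 - px0 * py0) ** 2  # discrete (0,0) point-mass component

    kx = [[_gauss(x[i] - x[j], s) for j in range(n)] for i in range(n)]
    ky = [[_gauss(y[i] - y[j], s) for j in range(n)] for i in range(n)]

    def quad(c, k):
        # (1/n^2) * sum_ij c_i c_j k_ij: squared L2 norm of the smoothed
        # signed measure (1/n) * sum_i c_i K_h(. - z_i)
        return sum(c[i] * c[j] * k[i][j] for i in range(n) for j in range(n)) / n**2

    # (0,+): sub-density of Y on {X=0, Y>0} minus P(X=0) * sub-density on {Y>0}
    c01 = [float(x0[i] and not y0[i]) - px0 * float(not y0[i]) for i in range(n)]
    d01 = quad(c01, ky)
    # (+,0): same construction with the roles of x and y swapped
    c10 = [float(y0[i] and not x0[i]) - py0 * float(not x0[i]) for i in range(n)]
    d10 = quad(c10, kx)

    # (+,+): ||q - r||^2 = ||q||^2 - 2<q, r> + ||r||^2 for the bivariate part
    a = [float(not x0[i] and not y0[i]) for i in range(n)]
    u = [float(not x0[i]) for i in range(n)]
    v = [float(not y0[i]) for i in range(n)]
    qq = sum(a[i] * a[j] * kx[i][j] * ky[i][j] for i in range(n) for j in range(n)) / n**2
    rx = [sum(u[j] * kx[i][j] for j in range(n)) for i in range(n)]
    ry = [sum(v[k] * ky[i][k] for k in range(n)) for i in range(n)]
    qr = sum(a[i] * rx[i] * ry[i] for i in range(n)) / n**3
    rr = quad(u, kx) * quad(v, ky)
    d11 = qq - 2.0 * qr + rr
    return d00, d01, d10, d11


def skit_like_test(x, y, h=0.5, n_boot=199, seed=0):
    """Total statistic, its four components, and a bootstrap p-value.
    The null is imposed by resampling x and y independently with
    replacement (an assumed scheme, not necessarily the paper's)."""
    comps = skit_like_components(x, y, h)
    t_obs = sum(comps)
    rng = random.Random(seed)
    n = len(x)
    exceed = 0
    for _ in range(n_boot):
        xb = [x[rng.randrange(n)] for _ in range(n)]
        yb = [y[rng.randrange(n)] for _ in range(n)]
        if sum(skit_like_components(xb, yb, h)) >= t_obs:
            exceed += 1
    return t_obs, comps, (exceed + 1) / (n_boot + 1)
```

For example, with `x` and `y` holding the exposures of two signatures across tumors (zero where a signature is absent), `t, comps, p = skit_like_test(x, y)` returns the total statistic, the four components, and a p-value. Comparing the components then indicates whether the dependence comes mainly from co-occurring zeros or from the positive exposures. The closed-form integrals rely on the fact that the integral of a product of two Gaussian kernels with bandwidth h is a Gaussian kernel with standard deviation h√2.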