We propose a stochastic labeling method for testing functional or regional enrichment patterns of mutational signatures in cancer. Mutational signatures are recurrent mutation patterns across the 96 trinucleotide sequence contexts (i.e., six possible substitutions of the middle nucleotide, each in the context of 16 possible combinations of the two flanking nucleotides), learned by non-negative matrix factorization (NMF) on population cohorts of cancer samples. The learned patterns carry information on mutagenic sources such as external carcinogens, endogenous mutagens, or genomic defects. However, the mixture scores produced by NMF do not indicate which mutation is attributable to which signature, which limits their use for quantifying the burden of each signature attributable to risk factors.
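As a concrete illustration of the 96 trinucleotide contexts described above, the following minimal sketch enumerates them. It assumes the common convention of reporting each substitution with a pyrimidine (C or T) reference base; the label format `A[C>T]G` and the function name `trinucleotide_contexts` are illustrative choices, not taken from this study.

```python
from itertools import product

BASES = "ACGT"
# Assumed convention: six substitution classes with a pyrimidine reference base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def trinucleotide_contexts():
    """Return the 6 x 16 = 96 context labels, e.g. 'A[C>T]G'."""
    return [
        f"{five_prime}[{sub}]{three_prime}"
        for sub in SUBSTITUTIONS
        # 16 combinations of the 5' and 3' flanking nucleotides
        for five_prime, three_prime in product(BASES, repeat=2)
    ]


contexts = trinucleotide_contexts()
```

Each signature can then be represented as a probability vector over these 96 labels.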
We extend Latent Dirichlet Allocation (LDA) to incorporate co-occurrence patterns of mutations and correlations among signatures. In this proof-of-concept study, we demonstrate the utility of stochastic labeling for generating informative annotations of mutational signatures, in comparison with existing deterministic labeling methods.
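The contrast between deterministic and stochastic labeling can be sketched with a plain mixture model. This is not the extended LDA model of this study: it is a simplified illustration that assumes signature profiles and per-sample exposures are already known, and it uses a hypothetical toy setting of two signatures over three contexts instead of 96. All function names and numbers below are assumptions made for the example.

```python
import numpy as np


def signature_posterior(exposures, profiles, context_idx):
    """P(signature k | context) for each mutation; shape (n_mutations, K)."""
    # Joint weight: exposure of signature k times its probability of the context.
    weights = (exposures[:, None] * profiles[:, context_idx]).T
    return weights / weights.sum(axis=1, keepdims=True)


def deterministic_labels(posterior):
    """Assign every mutation to its single most probable signature."""
    return posterior.argmax(axis=1)


def stochastic_labels(posterior, rng):
    """Draw one signature label per mutation from its posterior."""
    n_signatures = posterior.shape[1]
    return np.array([rng.choice(n_signatures, p=p) for p in posterior])


# Hypothetical toy inputs: 2 signatures, 3 contexts, equal exposures.
exposures = np.array([0.5, 0.5])
profiles = np.array([[0.8, 0.1, 0.1],
                     [0.4, 0.3, 0.3]])
context_idx = np.zeros(1000, dtype=int)  # 1000 mutations, all in context 0

posterior = signature_posterior(exposures, profiles, context_idx)
hard = deterministic_labels(posterior)
soft = stochastic_labels(posterior, np.random.default_rng(0))
```

In this toy setting each mutation has posterior probability 2/3 for the first signature and 1/3 for the second. Deterministic labeling therefore assigns every mutation to the first signature, whereas stochastic labeling assigns roughly one third of them to the second, so per-signature counts stay consistent with the posterior.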