The genome of a cancer carries somatic mutations that are the cumulative consequence of the DNA damage and repair processes operative in the cellular lineage between the fertilized egg and the cancer cell. Each mutational process leaves a characteristic imprint on the genome of a cancer cell, termed a mutational signature.
In this talk, I will demonstrate that modeling mutational signatures as a blind source separation problem yields an effective computational methodology for deciphering mutational signatures from cancer genomes. By applying unsupervised matrix factorization approaches to 12,023 cancers, we reveal more than 40 distinct mutational signatures. Further application of tensor factorization approaches reveals the dynamics of mutational signatures across cancer subclones. Many of the signatures extracted in an unsupervised manner match mutational patterns generated by known carcinogens.
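To make the blind source separation framing concrete, the following is a minimal sketch, not the method presented in the talk. It assumes non-negative matrix factorization (NMF) with multiplicative updates as one example of an unsupervised matrix factorization approach. It also assumes a catalog matrix `M` of 96 mutation types by samples, approximated as `S @ E`, where the columns of `S` are signatures and `E` holds per-sample exposures. The function name `nmf`, the number of signatures, and the synthetic data are all hypothetical choices for illustration.

```python
import numpy as np

def nmf(M, k, n_iter=500, seed=0, eps=1e-12):
    """Factor a non-negative matrix M (types x samples) as S @ E.

    Uses multiplicative updates that minimize the Frobenius norm.
    Illustrative sketch only; k must be chosen by the caller.
    """
    rng = np.random.default_rng(seed)
    n_types, n_samples = M.shape
    S = rng.random((n_types, k)) + eps    # candidate signatures
    E = rng.random((k, n_samples)) + eps  # candidate exposures
    for _ in range(n_iter):
        E *= (S.T @ M) / (S.T @ S @ E + eps)
        S *= (M @ E.T) / (S @ E @ E.T + eps)
    # Rescale so each signature sums to 1; the product S @ E is unchanged.
    scale = S.sum(axis=0)
    return S / scale, E * scale[:, None]

# Synthetic toy data (assumed): 3 hidden signatures over 96 mutation
# types, mixed with random exposures across 40 samples, plus Poisson noise.
rng = np.random.default_rng(1)
true_S = rng.dirichlet(np.ones(96), size=3).T        # 96 x 3
true_E = rng.gamma(2.0, 500.0, size=(3, 40))         # 3 x 40
M = rng.poisson(true_S @ true_E).astype(float)       # 96 x 40

S, E = nmf(M, k=3)
rel_err = np.linalg.norm(M - S @ E) / np.linalg.norm(M)
```

In this sketch the observed catalogs play the role of mixed signals and the signatures play the role of unknown sources; in practice the number of signatures is itself unknown and must be selected, which the sketch does not attempt.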
The results reveal the diversity of mutational processes underlying the development of cancer, as well as the ability of unsupervised machine learning approaches to extract meaningful, previously unknown features from complex datasets.