Abstract:
|
A tumor contains subpopulations of cells determined by overlapping sets of single nucleotide variants (SNVs). Such subpopulations are known as subclones and are an important target for precision medicine. Reconstructing these subclones from next-generation sequencing (NGS) data is one of the major challenges in the field. We present PairClone, a new tool to reconstruct tumor subclones based on NGS data. The main idea of PairClone is to model read counts mapped to pairs of proximal and phased SNVs, rather than marginal read counts mapped to unpaired SNVs as in most existing methods. Through Bayesian nonparametric models, we estimate posterior probabilities of the number, genotypes, and population frequencies of subclones in one or more tumor samples. We use the categorical Indian buffet process (cIBP) to define the subclones as a vector of categorical matrices corresponding to a set of mutation pairs. Performance of PairClone is assessed using simulated and real datasets. An open-source software package can be obtained at http://www.compgenome.org/pairclone.
|