Abstract:
|
Rapidly dividing cells accumulate mutations, forming clonal and sub-clonal populations with genomic heterogeneity. Correctly identifying variants as clonal or sub-clonal is essential for understanding cancer cell progression. In addition to the uncertainty in the number of populations, cluster identification is complicated by contamination of cancer cell samples with normal cells. Thus, to model mutation profiles accurately, it is necessary to estimate the contamination rate and, simultaneously, the fraction of cells carrying each mutation. We propose a hierarchical Bayesian nonparametric model for data consisting of variant counts and read depths across 22 chromosomes. We model these data as arising from a binomial distribution with a Dirichlet process prior. We extend this model to a framework of dependent Dirichlet processes in which the clustering structure is shared across chromosomes. We employ a Markov chain Monte Carlo algorithm to sample from the joint posterior distribution of the contamination rate and the count parameter. To address the computational challenges posed by the size of the data, we implement a parallel sampling scheme that improves speed and efficiency.
|
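
To make the data model described in the abstract concrete, the following is a minimal forward-simulation sketch, not the authors' sampler. It draws cluster weights from a truncated stick-breaking construction of a Dirichlet process and then generates binomial variant counts at given read depths. Several pieces are assumptions that the abstract does not specify: the link `p = 0.5 * (1 - contamination) * fraction` (heterozygous variants in diploid regions), the Uniform(0, 1) base measure for cluster fractions, the truncation level, and all function and parameter names. The dependent Dirichlet process across chromosomes and the Markov chain Monte Carlo inference are not shown.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    alpha is the concentration parameter; truncation is the number of
    clusters kept (an approximation chosen for illustration).
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last weight absorbs the leftover mass
    return weights


def simulate_variant_counts(depths, contamination, alpha=1.0, truncation=20, seed=0):
    """Simulate variant read counts for loci with the given read depths.

    Returns (counts, labels, fractions): the simulated counts, the cluster
    index of each locus, and the mutated-cell fraction of each cluster.
    """
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)
    # ASSUMPTION: cluster-level mutated-cell fractions come from Uniform(0, 1).
    fractions = [rng.random() for _ in range(truncation)]
    counts, labels = [], []
    for depth in depths:
        k = rng.choices(range(truncation), weights=weights)[0]
        # ASSUMPTION: only tumor cells (1 - contamination) can carry the
        # variant, and a heterozygous variant appears on half of the reads.
        p = 0.5 * (1.0 - contamination) * fractions[k]
        # Binomial(depth, p) draw written as a sum of Bernoulli trials.
        count = sum(rng.random() < p for _ in range(depth))
        counts.append(count)
        labels.append(k)
    return counts, labels, fractions


if __name__ == "__main__":
    depths = [50, 80, 120, 30]
    counts, labels, fractions = simulate_variant_counts(depths, contamination=0.2)
    for d, c, k in zip(depths, counts, labels):
        print(f"depth={d} variant_count={c} cluster={k}")
```

Under this assumed link, a higher contamination rate shrinks every variant allele frequency by the same factor, which is why the contamination rate and the cluster-level fractions have to be estimated jointly rather than separately.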