Abstract:
|
Hidden Markov models have played a dominant role in copy number alteration analysis, the identification of copy number abnormalities in tumor cells. Unlike that of normal cells, the copy number profile of tumor cells is not pure: it can be a combination of dominant mainclone genotypes and minor subclone genotypes across the genome. Identifying mainclone genotypes, subclone regions, and subclone genotypes, as well as estimating the subclonal proportions in subclone regions, is critical to understanding tumor progression. For this purpose, we propose a hidden Markov model with mixtures as emission distributions. For estimation, we add l1-norm penalties on the mixture proportions and on their first-order differences. This fused-lasso-type penalty identifies subclone regions and estimates the mixture proportions at the same time. The model can be generalized to other data characterized by stage-wise mixtures with underlying hidden states. We apply the new model to renal cell carcinoma datasets from The Cancer Genome Atlas. In addition, we conduct simulation studies that demonstrate the good performance of the proposed approach.
|