Human cancers form clones: sets of cells that exhibit similar mutations and genomic rearrangements. As clones evolve to resist chemotherapy, understanding their molecular properties is crucial to designing effective treatments. While it is possible to measure both the DNA (which defines clonal structure) and the RNA (which defines cell state) in the same single cells, such assays are time-consuming and hard to scale. It is therefore far more common to have large datasets in which DNA and RNA are measured in separate cells, albeit cells from the same tumor and consisting of similar clones.
Here we present a highly scalable statistical method that probabilistically assigns each cell measured in gene expression space (scRNA-seq) to a clone defined in copy number space (scDNA-seq). Through simulations, we demonstrate that such assignment is feasible even when only a small fraction (< 20%) of genes exhibit a relationship between copy number and gene expression. We apply our method to a breast cancer patient-derived xenograft to characterize the gene expression of expanding clones. Finally, we show how our framework serves as a basis for generalized multiview clustering from unpairable data sources, and we discuss extensions.