Abstract:
|
Realistic synthetic data are essential for benchmarking the many computational tools developed for single-cell omics data. Here we propose an all-in-one statistical framework that generates single-cell omics data at both the read and count levels from various cell heterogeneity structures, including discrete cell types, continuous cell trajectories, and spatial cell locations. Our framework uses a unified probabilistic model with an accessible likelihood. This probabilistic formulation makes it straightforward to identify the heterogeneity structure that best fits a single-cell omics dataset through statistical model selection. Moreover, the ability to generate sequence reads, in addition to read counts, allows the benchmarking of low-level computational tools that operate on reads.
|