Abstract:
|
Single-cell RNA sequencing (scRNA-seq) is developing rapidly and is widely used to study gene expression at the level of individual cells. Although its use is increasing, no statistical guide has been published to aid in determining sample sizes for scRNA-seq experiments. Here, we consider the sample size problem in scRNA-seq experiments, in particular the number of cells that must be sequenced to identify a rare novel subpopulation within a seemingly homogeneous population. Because new cell types are generally found with statistical clustering techniques, traditional criteria such as type I error and statistical significance are not appropriate. Instead, we define the separation of populations probabilistically and use three parameters to calculate the number of cells: the rarity of the novel subpopulation, the proportion of differentially expressed genes, and the strength of differential expression in those genes. Additionally, we developed an R package to compute the number of cells required in an scRNA-seq experiment.
|