Abstract:
|
Gene set testing (GST) methods allow omnibus testing of a pre-specified set of genes, often by assigning an overall p-value to the set. By considering such sets of genes, GST methods can improve statistical power and help to explain how genes in a common pathway jointly regulate biological processes. Many GST methods exist for bulk RNA-seq data; however, none is specifically designed for single-cell RNA-seq data, which often exhibits excess zeros and overdispersed counts compared to traditional bulk RNA-seq data. Here, we propose TWO-SIGMA-geneset to conduct gene set testing using single-cell RNA-seq data. Our focus is on competitive gene set testing, in which the genes in a given set are compared to the remaining collection of genes. Simulation studies show that type-I error is well controlled in a variety of representative scenarios. Power is improved over state-of-the-art methods, including CAMERA, when varying gene set sizes, the scale of differential expression, the proportion of drop-out events, or the presence of individual-level random effects. Extensions to allow for inter-gene correlation are discussed.
|