Abstract:
|
Two key challenges in any analysis of single-cell RNA-Seq (scRNA-Seq) data are excess zeros due to "drop-out" events and substantial overdispersion due to stochastic and systematic differences. Association analysis of scRNA-Seq data must also contend with the dependency that can arise when multiple single cells are measured from the same sample. Here, we propose TWO-SIGMA, a new TWO-component SInGle cell Model-based Association analysis method. The first component models the drop-out probability with a mixed effects logistic regression model, and the second component models the (conditional) mean read count with a log-linear negative binomial mixed effects regression model. Our approach is novel in that it simultaneously allows for overdispersion, accommodates dependency in both drop-out probability and mean mRNA abundance at the single-cell level, improves statistical efficiency, and provides highly interpretable coefficient estimates. Simulation studies show gains in power and better type-I error control relative to alternative approaches.
|
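The two components described in the abstract can be sketched as a zero-inflated negative binomial model with sample-level random intercepts. This is an illustrative formulation only: the abstract defines no notation, so every symbol here is an assumption. $Y_{ij}$ is the read count for cell $j$ of sample $i$, $p_{ij}$ the drop-out probability, $\mu_{ij}$ the conditional mean, $\phi$ the overdispersion parameter, $z_{ij}$ and $x_{ij}$ covariate vectors, and $a_i$, $b_i$ the random effects.

```latex
% Sketch under assumed notation; none of these symbols appear in the abstract.
\begin{align}
\Pr(Y_{ij} = 0) &= p_{ij} + (1 - p_{ij})
    \left(\frac{\phi}{\mu_{ij} + \phi}\right)^{\phi}, \\
\Pr(Y_{ij} = y) &= (1 - p_{ij})\,
    \frac{\Gamma(y + \phi)}{\Gamma(y + 1)\,\Gamma(\phi)}
    \left(\frac{\mu_{ij}}{\mu_{ij} + \phi}\right)^{y}
    \left(\frac{\phi}{\mu_{ij} + \phi}\right)^{\phi},
    \qquad y = 1, 2, \ldots \\
% Component 1: mixed effects logistic regression for drop-out
\operatorname{logit}(p_{ij}) &= z_{ij}^{\top}\alpha + a_i,
    \qquad a_i \sim N(0, \sigma_a^{2}), \\
% Component 2: log-linear negative binomial mixed effects regression for the mean
\log(\mu_{ij}) &= x_{ij}^{\top}\beta + b_i,
    \qquad b_i \sim N(0, \sigma_b^{2}).
\end{align}
```

Under this parameterization the negative binomial component has variance $\mu_{ij} + \mu_{ij}^{2}/\phi$, so smaller $\phi$ means more overdispersion. The shared $a_i$ and $b_i$ induce within-sample dependency in the drop-out probability and the mean, respectively.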