Abstract:
|
Single-cell RNA-Seq (scRNA-seq) is the most widely used high-throughput technology for measuring genome-wide gene expression at the single-cell level. Unlike in bulk RNA-Seq, the majority of reported expression levels in scRNA-seq are zeros, and the proportion of genes with a reported expression level of zero varies substantially across cells. However, it remains unclear to what extent this cell-to-cell variation is driven by technical rather than biological variation. Here, we use an assessment experiment to examine data from published studies. We present evidence that some of these zeros are driven by technical variation by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lowly expressed genes. This missing data problem is exacerbated by the fact that the technical variation differs from cell to cell, which can be mistaken for novel biological results. Finally, we propose a varying-censoring aware matrix factorization model (VAMF) with cell-specific censoring for dimensionality reduction, which permits the identification of factors in the presence of the above-described systematic bias.
|