Abstract:
|
Single-cell RNA-sequencing (scRNA-seq) allows profiling of genome-wide expression heterogeneity in ensembles of cells. However, cells can vary due to both technical and biological factors, causing correlated changes in gene expression. We build on sparse factor analysis models to jointly infer and account for known covariates such as batch, biological sources of variation in the form of factors from a pathway database, and additional confounding factors. Pathway factors are encoded using a spike-and-slab prior on the weights, and a second level of factor-wise regularization is used to determine which known and hidden factors are relevant in a given dataset. We present an efficient variational inference scheme such that our model scales linearly in the number of cells and factors. In simulation studies as well as several real studies where the true sources of variation are well understood, we show that our model allows decomposing scRNA-seq data into interpretable components, thereby robustly revealing the drivers of expression heterogeneity. We illustrate the potential of our model by exploring associations between DNA methylation heterogeneity and pluripotency variation in single-cell data,
|