Abstract:
|
Single-cell RNA-sequencing (scRNA-seq) enables measurement of genome-wide transcription levels at the resolution of individual cells. However, the amount of RNA captured from each cell in scRNA-seq experiments can be extremely low, which can lead to noisy, zero-inflated and high-dimensional data. This hinders cell type characterization, especially when gene expression differences between experiments (i.e., batch effects) outweigh differences between cell types (i.e., biological signals). Here, we apply a sparse supervised Canonical Correlation Analysis (sCCA) approach to accurately identify cell subtypes in a shared low-dimensional space that is not associated with batches or other confounders. In simulation experiments, we demonstrate that sparse sCCA reliably recovers the true underlying cell subpopulations and their driving genes across different signal-to-noise settings, in comparison with Seurat, mnnCorrect and ZINB-WaVE. The results also suggest that the canonical vectors can potentially be used to predict outcomes and to compare cell types between groups.
|