Abstract:
|
Single-cell RNA sequencing (scRNA-seq) technology provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events cause high sparsity and noise in the data, which obscure downstream analysis. We propose a gene-graph-based imputation method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from the data via graph signal processing. G2S3 optimizes a sparse graph structure from the genes’ expression profiles under the assumption that biological signal changes smoothly between genes that lie close to each other on the graph. We applied G2S3 and other imputation methods to several scRNA-seq datasets to assess and compare their performance. The results showed that G2S3 is superior in recovering true gene expression levels, identifying true cell subtypes and stages, improving differential expression analyses, and enhancing the discovery of regulatory relationships. Moreover, G2S3 is computationally efficient for large scRNA-seq datasets with hundreds of thousands of cells, which have become more available with advances in sequencing technology.
|
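
The idea of imputing dropouts by borrowing information from adjacent genes on a sparse gene graph can be illustrated with a minimal sketch. This is not the G2S3 algorithm: the graph below is a plain k-nearest-neighbour graph over gene expression profiles, used as a stand-in for the graph that G2S3 learns via graph signal processing. The function names `knn_gene_graph` and `impute_by_diffusion`, the parameters `k` and `alpha`, the Gaussian edge weights, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def knn_gene_graph(X, k=1):
    """Build a sparse, symmetric gene-gene adjacency from a genes x cells matrix.

    Stand-in for a learned graph: each gene is linked to its k nearest genes
    (Euclidean distance between expression profiles), with Gaussian weights.
    """
    n_genes = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared distances; clip tiny negatives caused by round-off.
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(D, np.inf)  # a gene is never its own neighbour

    scale = np.median(D[np.isfinite(D)])
    if scale == 0:
        scale = 1.0

    W = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        nbrs = np.argsort(D[i])[:k]
        W[i, nbrs] = np.exp(-D[i, nbrs] / scale)
    return np.maximum(W, W.T)  # symmetrize


def impute_by_diffusion(X, W, alpha=0.5):
    """Smooth expression over the gene graph with one lazy random-walk step.

    Each gene keeps a fraction (1 - alpha) of its own profile and takes a
    fraction alpha from the weighted average of its graph neighbours.
    """
    deg = W.sum(axis=1)
    deg[deg == 0] = 1.0  # isolated genes are left unchanged by the walk
    P = (1.0 - alpha) * np.eye(W.shape[0]) + alpha * (W / deg[:, None])
    return P @ X


# Toy genes x cells matrix (hypothetical values). Gene 0 has a dropout
# (zero) in cell 2, while its closest gene, gene 1, is expressed there.
X = np.array([
    [5.0, 4.0, 0.0, 6.0],
    [5.0, 4.0, 5.0, 6.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
])

W = knn_gene_graph(X, k=1)
X_imp = impute_by_diffusion(X, W, alpha=0.5)
```

In this toy case the zero entry of gene 0 is pulled toward the value of its graph neighbour, which is the behaviour the smoothness assumption is meant to produce. Because the transition matrix is row-stochastic, every imputed value is a convex combination of observed values in the same cell, so the step cannot produce values outside the observed range.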