Abstract:
|
Single-cell RNA sequencing (scRNA-seq) is increasingly applied across biomedical research. This technology provides unprecedented opportunities to study disease heterogeneity at the cellular level. However, characteristics unique to scRNA-seq data, including high dimensionality, high dropout rates, and batch effects, make proper analysis of such data difficult. We present a unified Regularized Zero-inflated Mixture Model framework for scRNA-seq data (RZiMM-scRNA) that simultaneously detects cell subpopulations and identifies differentially expressed genes using an importance score that we develop, while accounting for both dropouts and batch effects. We conduct extensive simulation studies to evaluate the performance of RZiMM-scRNA and compare it with several popular methods, including Seurat, SC3, K-Means, and Hierarchical Clustering. The simulation results show that RZiMM-scRNA achieves better clustering performance and more accurate biomarker detection than the alternative methods, especially when cell subgroups are less distinct, which supports the robustness of our method. Finally, we apply RZiMM-scRNA to a real glioma data set to demonstrate its promise.
|