Abstract:
Clustering single-cell RNA-seq (scRNA-seq) data is a critical analysis task: the resulting clusters shed light on tissue complexity, including the number of cell types present and the transcriptomic signature of each cell type. Because of this importance, several new methods for clustering scRNA-seq data have been developed recently. However, different approaches yield different estimates of the number of clusters and different cluster assignments. Since no single method consistently outperforms the others across datasets, it is often hard to decide which one to use. Our method, SAME-clustering, takes multiple sets of clustering results and builds a consensus with a probabilistic model, producing robust and improved clustering results. Specifically, SAME-clustering uses a finite mixture model of multinomial distributions. We tested SAME-clustering on 15 datasets, with the number of clusters ranging from 3 to 14 and the number of single cells ranging from 49 to 32,695. The results show that the SAME-clustering ensemble, using a mixture model, improves clustering in terms of both cluster assignments and the number of clusters.
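To illustrate the idea of a mixture-of-multinomials consensus, here is a minimal sketch; it is not the authors' implementation. Each cell is represented by the vector of labels it received from M base clustering methods, and each latent consensus cluster k has its own categorical (one-trial multinomial) distribution over each method's labels. The parameters are fitted by expectation-maximization (EM). The function name `consensus_em`, the smoothing constant `alpha`, and the initialization (folding the first method's labels into K components) are illustrative assumptions. The number of consensus clusters K is fixed here, so choosing K, which the abstract says SAME-clustering also improves, is not shown.

```python
import math

def consensus_em(labels, K, n_iter=50, alpha=0.01):
    """Sketch: consensus clustering via a finite mixture of multinomials, fit by EM.

    labels[i][m] is the integer label (0..L_m-1) that base method m gave cell i.
    Returns one consensus label in 0..K-1 per cell. Hypothetical helper, not SAME's code.
    """
    N, M = len(labels), len(labels[0])
    # Number of distinct labels used by each base method.
    L = [max(row[m] for row in labels) + 1 for m in range(M)]

    # Assumed initialization: hard-assign cells from the first base clustering.
    resp = [[1.0 if row[0] % K == k else 0.0 for k in range(K)] for row in labels]

    for _ in range(n_iter):
        # M-step: mixing weights and per-method label probabilities,
        # with additive smoothing (alpha) so no probability is exactly zero.
        Nk = [sum(r[k] for r in resp) for k in range(K)]
        pi = [(Nk[k] + alpha) / (N + K * alpha) for k in range(K)]
        theta = []
        for k in range(K):
            per_method = []
            for m in range(M):
                counts = [alpha] * L[m]
                for row, r in zip(labels, resp):
                    counts[row[m]] += r[k]
                total = sum(counts)
                per_method.append([c / total for c in counts])
            theta.append(per_method)

        # E-step: posterior membership of each cell, computed in log space.
        resp = []
        for row in labels:
            logp = [math.log(pi[k]) + sum(math.log(theta[k][m][row[m]]) for m in range(M))
                    for k in range(K)]
            top = max(logp)
            w = [math.exp(v - top) for v in logp]
            s = sum(w)
            resp.append([v / s for v in w])

    # Consensus label = component with the highest posterior probability.
    return [max(range(K), key=lambda k: r[k]) for r in resp]

# Toy input: 9 cells labeled by 3 methods (columns A, B, C). Method B uses permuted
# label names, and method C disagrees with the others on the third cell.
base = [
    [0, 2, 0], [0, 2, 0], [0, 2, 1],
    [1, 0, 1], [1, 0, 1], [1, 0, 1],
    [2, 1, 2], [2, 1, 2], [2, 1, 2],
]
print(consensus_em(base, K=3))  # → [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Because each method's label names are modeled separately, the consensus does not require the base methods to agree on label names, only on which cells belong together. A single method's disagreement on one cell is outvoted by the other methods.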