Abstract:
|
Unsupervised clustering is a common approach to discovering latent structure in high-dimensional data. It has been applied successfully to genomic data to identify, for example, cancer subtypes and new cell types, and it is widely used in single-cell genomics both for visualization and for discovering new cell types or subtypes. Choosing among clustering methods is difficult because the performance of unsupervised clustering is hard to evaluate. The difficulty is more serious in cell type clustering, even when there is some knowledge of the ‘true clusters’, because existing metrics often treat clusters as exchangeable and fail to recognize the natural hierarchy of cell types and subtypes. In this presentation we discuss the interpretation and the pros and cons of common clustering performance metrics as applied to single-cell data, and present novel measures designed for cell clustering.
|