Abstract:
|
t-distributed Stochastic Neighbor Embedding (t-SNE) is an innovative method for dimension reduction that has proven to be an invaluable tool for identifying structure in high-dimensional data sets with well-defined intrinsic manifolds. t-SNE produces 2-dimensional scatterplot maps that retain both the meaningful groups and the global structure of the high-dimensional data. However, not all data have well-defined structure, and in some cases no intrinsic manifold is present. Recently, researchers have focused on developing metrics of ‘clusterability’ to determine how amenable a data set is to clustering algorithms. Although t-SNE does not explicitly produce class labels for cluster membership, it implicitly identifies groups of similar observations and produces well-separated maps of those groups. We examine t-SNE maps of a range of data sets with varying levels of clusterability to assess the efficacy of the resulting scatterplot maps.
|