
Abstract Details

Activity Number: 127 - SPEED: Statistical Learning and Data Science Speed Session 1, Part 1
Type: Contributed
Date/Time: Monday, July 29, 2019 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #304546
Title: Does t-SNE Identify False Structure? Implications of Clusterability on t-SNE Maps
Author(s): Paul Harmon* and Mark Greenwood and Tristan Anacker
Companies: Montana State University and Montana State University and Montana State University
Keywords: Data Science; High-Dimensional Data; Clustering; Dimension Reduction; Multivariate
Abstract:

t-distributed Stochastic Neighbor Embedding, or t-SNE, is an innovative dimension-reduction method that has proven invaluable for identifying structure in high-dimensional data sets with well-defined intrinsic manifolds. t-SNE produces two-dimensional scatterplot maps of high-dimensional data in which meaningful groups of observations remain visible and global structure from the high-dimensional space is retained. However, not all data have well-defined structure, and in some cases no intrinsic manifold is present. Recently, researchers have focused on developing metrics of ‘clusterability’ that measure how amenable a data set is to clustering algorithms. Although t-SNE does not explicitly assign cluster labels, it implicitly identifies groups of similar observations and produces maps in which those groups appear well separated. We apply t-SNE to a range of data sets with varying levels of clusterability to assess how faithfully the resulting scatterplot maps reflect the structure actually present in the data.
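The abstract does not name the clusterability metrics used. As one hedged illustration of what such a metric computes, the sketch below evaluates the Hopkins statistic, a classic measure of clustering tendency, on two synthetic data sets: well-separated Gaussian blobs and uniform noise. The function names (`hopkins_statistic`, `gaussian_blobs`), the data generators, and all parameter values are hypothetical choices for illustration, not the authors' method. It uses only the Python standard library (3.8+ for `math.dist`).

```python
import math
import random


def hopkins_statistic(points, sample_size, seed=0):
    """One common form of the Hopkins statistic: H = sum(u) / (sum(u) + sum(w)).

    u: distances from uniform probe points (drawn in the data's bounding box)
       to their nearest data point.
    w: distances from randomly sampled data points to their nearest *other*
       data point.
    H near 0.5 suggests no cluster structure; H near 1 suggests clusterable data.
    (Illustrative choice; not necessarily the metric used in the study.)
    """
    rng = random.Random(seed)
    dim = len(points[0])
    lows = [min(p[k] for p in points) for k in range(dim)]
    highs = [max(p[k] for p in points) for k in range(dim)]

    def nearest_distance(query, skip_index=None):
        # Brute-force nearest-neighbour search; fine for small illustrative data.
        return min(
            math.dist(query, p)
            for i, p in enumerate(points)
            if i != skip_index
        )

    u = [
        nearest_distance([rng.uniform(lo, hi) for lo, hi in zip(lows, highs)])
        for _ in range(sample_size)
    ]
    w = [
        nearest_distance(points[i], skip_index=i)
        for i in rng.sample(range(len(points)), sample_size)
    ]
    return sum(u) / (sum(u) + sum(w))


def gaussian_blobs(centers, per_cluster, spread, rng):
    # Hypothetical "clusterable" data: isotropic Gaussian noise around each center.
    return [
        [c + rng.gauss(0.0, spread) for c in center]
        for center in centers
        for _ in range(per_cluster)
    ]


rng = random.Random(42)
dim = 10
centers = [[0.0] * dim, [10.0] * dim, [20.0] * dim]
clustered = gaussian_blobs(centers, per_cluster=50, spread=0.5, rng=rng)
# Hypothetical "unclusterable" data: uniform noise with no intrinsic groups.
uniform = [[rng.uniform(0.0, 10.0) for _ in range(dim)] for _ in range(150)]

h_clustered = hopkins_statistic(clustered, sample_size=30)
h_uniform = hopkins_statistic(uniform, sample_size=30)
print(f"Hopkins (clustered): {h_clustered:.2f}")
print(f"Hopkins (uniform):   {h_uniform:.2f}")
```

The blob data should score well above 0.5 and the uniform data close to 0.5. A natural next step, under the same assumptions, would be to embed both data sets with a t-SNE implementation and compare the apparent separation in each map against these scores.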


Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program