Name: 2019 Joint Statistical Meetings
Start: 2019-07-27T07:00:00+00:00
End: 2019-08-01
Location: Colorado Convention Center

Abstract Details

Activity Number:	127 - SPEED: Statistical Learning and Data Science Speed Session 1, Part 1
Type:	Contributed
Date/Time:	Monday, July 29, 2019 : 8:30 AM to 10:20 AM
Sponsor:	Section on Statistical Learning and Data Science
Abstract #304546
Title:	Does T-SNE Identify False Structure? Implications of Clusterability on T-SNE Maps
Author(s):	Paul Harmon* and Mark Greenwood and Tristan Anacker
Companies:	Montana State University and Montana State University and Montana State University
Keywords:	Data Science; High-Dimensional Data; Clustering; Dimension Reduction; Multivariate
Abstract:	t-distributed Stochastic Neighbor Embedding (t-SNE) is a dimension-reduction method that has proven invaluable for identifying structure in high-dimensional data sets with well-defined intrinsic manifolds. t-SNE produces two-dimensional scatterplot maps of high-dimensional data that aim to preserve both meaningful groups of neighboring observations and the global structure of the high-dimensional space. However, not all data have well-defined structure, and in some cases no intrinsic manifold is present. Recently, researchers have developed metrics of ‘clusterability’ to measure how responsive a data set is likely to be to clustering algorithms. Although t-SNE does not explicitly assign cluster labels, it implicitly identifies groups of similar observations and produces maps in which those groups appear well separated. We examine t-SNE maps of a range of data sets with varying levels of clusterability to assess how faithfully the resulting scatterplot maps reflect the underlying structure.
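The abstract does not name a specific clusterability metric. As one illustration (our choice, not necessarily a metric the authors used), the Hopkins statistic compares nearest-neighbor distances from uniformly placed reference points with nearest-neighbor distances between the observations themselves. The sketch below assumes NumPy; the helper name `hopkins_statistic` and the default sample size are hypothetical, and conventions vary (some versions raise distances to the power d or report 1 - H instead).

```python
import numpy as np


def hopkins_statistic(X, m=None, seed=0):
    """Hypothetical helper: Hopkins statistic H for an (n, d) data matrix X.

    One common variant (plain Euclidean distances, uniform reference points
    drawn in the data's bounding box). Under this convention H near 0.5
    suggests no cluster structure, and H near 1 suggests clusterable data.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if m is None:
        m = max(1, n // 10)  # assumption: sample roughly 10% of the points
    rng = np.random.default_rng(seed)

    # u_i: distance from a uniform reference point to its nearest observation
    lo, hi = X.min(axis=0), X.max(axis=0)
    U = rng.uniform(lo, hi, size=(m, d))
    u = np.linalg.norm(U[:, None, :] - X[None, :, :], axis=2).min(axis=1)

    # w_i: distance from a sampled observation to its nearest *other* observation
    idx = rng.choice(n, size=m, replace=False)
    D = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=2)
    D[np.arange(m), idx] = np.inf  # ignore each point's zero distance to itself
    w = D.min(axis=1)

    return u.sum() / (u.sum() + w.sum())


# Illustrative data: three tight, well-separated blobs vs. uniform noise
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
clustered = np.vstack([c + 0.05 * rng.standard_normal((100, 2)) for c in centers])
uniform = rng.uniform(0.0, 1.0, size=(300, 2))

print(round(hopkins_statistic(clustered), 2))  # expected to be close to 1
print(round(hopkins_statistic(uniform), 2))    # expected to be near 0.5
```

Under this convention, a score near 0.5 flags data without clear cluster structure, which is the situation in which the abstract asks whether well-separated groups on a t-SNE map reflect real structure.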

Authors who are presenting talks have a * after their name.


JSM 2019 Online Program


American Statistical Association