
All Times EDT

Abstract Details

Activity Number: 381 - Recent Advances in High-Dimensional Estimation and Inference Methods
Type: Topic Contributed
Date/Time: Wednesday, August 10, 2022 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #322433
Title: Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data
Author(s): Rong Ma* and Tony Cai
Companies: Stanford University and University of Pennsylvania
Keywords: Clustering; Data visualization; Foundation of data science; Nonlinear dimension reduction; t-SNE
Abstract:

This paper investigates the theoretical foundations of the t-distributed stochastic neighbor embedding (t-SNE) algorithm, a popular nonlinear dimension reduction and data visualization method. A novel theoretical framework for the analysis of t-SNE, based on a gradient descent approach, is presented. For the early exaggeration stage of t-SNE, we show its asymptotic equivalence to power iterations based on the underlying graph Laplacian, characterize its limiting behavior, and uncover its deep connections to Laplacian spectral clustering and to fundamental principles such as early stopping as implicit regularization. The results explain the intrinsic mechanism and the empirical benefits of this computational strategy. For the embedding stage of t-SNE, we characterize the kinematics of the low-dimensional map throughout the iterations and identify two phases: an amplification phase, featuring intercluster repulsion and expansion of the low-dimensional map, and a stabilization phase. The general theory explains the fast convergence rate and the exceptional empirical performance of t-SNE for visualizing clustered data, brings forth interpretations of the t-SNE visualization
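The early exaggeration claim can be illustrated numerically. What follows is a toy sketch under stated assumptions, not the authors' code or the exact statement of their result. It uses plain gradient descent (no momentum or adaptive gains), a single-bandwidth Gaussian affinity in place of perplexity calibration, an exaggeration factor alpha = 12, and a small random initialization. Under those assumptions, linearizing the standard t-SNE gradient around a nearly collapsed embedding (Student-t kernel close to 1, q_ij close to 1/(n(n-1))) gives the update Y <- M Y with M = I - 4h(alpha L(P) - (nI - 11^T)/(n(n-1))), where L(P) = D - P. This is a power iteration whose dominant eigenvectors are the near-null eigenvectors of the graph Laplacian. The function names, the step-size rule and the toy data are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_affinities(X, sigma):
    """Symmetric affinities p_ij proportional to exp(-||x_i - x_j||^2 / (2 sigma^2)).

    Simplification (assumption): one global bandwidth instead of the
    per-point, perplexity-calibrated bandwidths used by standard t-SNE.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()  # entries sum to 1 over all ordered pairs i != j


def tsne_gradient(Y, P, alpha):
    """Exact t-SNE gradient with the input affinities exaggerated by alpha."""
    diff = Y[:, None, :] - Y[None, :, :]                 # y_i - y_j, shape (n, n, d)
    kernel = 1.0 / (1.0 + np.sum(diff**2, axis=-1))      # (1 + ||y_i - y_j||^2)^(-1)
    np.fill_diagonal(kernel, 0.0)
    Q = kernel / kernel.sum()                            # Student-t affinities q_ij
    coef = (alpha * P - Q) * kernel
    return 4.0 * np.sum(coef[:, :, None] * diff, axis=1)


def spread_ratio(Y, labels):
    """Mean within-cluster spread divided by mean spread of the cluster centers."""
    ks = np.unique(labels)
    centers = np.array([Y[labels == k].mean(axis=0) for k in ks])
    within = np.mean([np.linalg.norm(Y[labels == k] - c, axis=1).mean()
                      for k, c in zip(ks, centers)])
    between = np.linalg.norm(centers - centers.mean(axis=0), axis=1).mean()
    return within / between


# Three well-separated Gaussian clusters in 10 dimensions (hypothetical toy data).
rng = np.random.default_rng(0)
n_per, dim, n_clusters = 30, 10, 3
cluster_means = rng.normal(scale=10.0, size=(n_clusters, dim))
X = np.vstack([m + rng.normal(size=(n_per, dim)) for m in cluster_means])
labels = np.repeat(np.arange(n_clusters), n_per)
n = X.shape[0]

P = gaussian_affinities(X, sigma=4.0)
L = np.diag(P.sum(axis=1)) - P                           # graph Laplacian L(P) = D - P

alpha = 12.0                                             # exaggeration factor (assumed)
lam_max = np.linalg.eigvalsh(L)[-1]                      # eigvalsh sorts ascending
h = 0.9 / (4.0 * alpha * lam_max)                        # step size (hypothetical rule)
steps = 100

# Linearized update, valid while all ||y_i - y_j|| stay small so that the
# Student-t kernel is close to 1 and q_ij is close to 1 / (n (n - 1)).
M = np.eye(n) - 4.0 * h * (alpha * L - (n * np.eye(n) - np.ones((n, n))) / (n * (n - 1)))

Y0 = 1e-4 * rng.normal(size=(n, 2))                      # small random initialization
Y_gd, Y_pow = Y0.copy(), Y0.copy()
for _ in range(steps):
    Y_gd = Y_gd - h * tsne_gradient(Y_gd, P, alpha)      # plain gradient descent
    Y_pow = M @ Y_pow                                    # power iteration with M

rel_err = np.linalg.norm(Y_gd - Y_pow) / np.linalg.norm(Y_pow)
print(f"relative gap, gradient descent vs power iteration: {rel_err:.2e}")
print(f"within/between spread ratio: start {spread_ratio(Y0, labels):.2f}, "
      f"end {spread_ratio(Y_gd, labels):.2e}")
```

In this toy setting the two trajectories should stay close, and the within-cluster spread collapses relative to the between-cluster spread, which is the qualitative link to Laplacian spectral clustering. The factor 4 comes from the standard t-SNE gradient; the step-size rule simply keeps the eigenvalues of M positive and is not taken from the paper.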


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program