
All Times EDT

Abstract Details

Activity Number: 218 - Novel Methodology Development in High-Dimensional Longitudinal Data Analysis
Type: Invited
Date/Time: Tuesday, August 9, 2022 : 8:30 AM to 10:20 AM
Sponsor: ENAR
Abstract #319310
Title: Multi-Source Single-Cell Data Integration by Minimized Aggregated Wasserstein Barycenter
Author(s): Lynn Lin* and Jianbo Ye and Jia Li
Companies: Duke University and Amazon Lab126 and Pennsylvania State University
Keywords: Gaussian mixture model; Integrative analysis; Minimized aggregated Wasserstein; Multi-source data; Single-cell; Wasserstein barycenter
Abstract:

One key challenge in single-cell data clustering is combining the clustering results of datasets acquired from multiple sources. We propose to represent the clustering result of each dataset by a Gaussian mixture model (GMM) and to produce an integrated result based on the notion of a Wasserstein barycenter. However, computing the exact Wasserstein barycenter of GMMs is infeasible, and the exact barycenter may not even be a GMM with a reasonable number of components. We therefore propose to use the Minimized Aggregated Wasserstein (MAW) barycenter. The MAW distance between GMMs approximates the Wasserstein metric while being significantly more tractable, and recent theoretical advances further justify its use. In this talk, we develop a new algorithm for computing the barycenter of GMMs under MAW, and we prove that the MAW barycenter has the same expectation as the Wasserstein barycenter. The proposed clustering-integration algorithm scales well with the data dimension and the number of mixture components, and its complexity is independent of the data size. On several single-cell RNA-seq datasets, the new method achieves better clustering results than a few popular methods.
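As a rough illustration of the distance underlying this approach, the sketch below assumes the usual MAW construction: the Wasserstein distance between two GMMs is replaced by an optimal-transport problem over the mixture weights, whose ground cost is the closed-form squared 2-Wasserstein distance between Gaussian components, ||mu1 - mu2||^2 + tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2}). The function names `gmm_from_labels`, `gaussian_w2_sq`, and `maw_sq` are hypothetical. Summarizing each cluster by its proportion, sample mean, and covariance is an illustrative assumption, and the code relies on NumPy and SciPy's `linprog`. It computes only the MAW distance between two GMMs. It does not implement the authors' barycenter algorithm.

```python
import numpy as np
from scipy.optimize import linprog


def gmm_from_labels(x, labels):
    """Hypothetical helper: summarize one dataset's clustering as a GMM with one
    Gaussian per cluster (weight = cluster proportion, moment-matched mean/cov)."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    weights, means, covs = [], [], []
    for k in np.unique(labels):
        pts = x[labels == k]
        weights.append(len(pts) / len(x))
        means.append(pts.mean(axis=0))
        covs.append(np.atleast_2d(np.cov(pts, rowvar=False, bias=True)))
    return np.array(weights), np.array(means), np.array(covs)


def psd_sqrt(m):
    """Symmetric square root of a positive semi-definite matrix via eigh."""
    m = 0.5 * (m + m.T)                    # enforce exact symmetry
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)              # drop tiny negative round-off
    return (v * np.sqrt(w)) @ v.T


def gaussian_w2_sq(mu1, cov1, mu2, cov2):
    """Closed-form squared 2-Wasserstein distance between two Gaussians."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    root2 = psd_sqrt(cov2)
    cross = psd_sqrt(root2 @ cov1 @ root2)
    mean_term = float(np.sum((mu1 - mu2) ** 2))
    cov_term = float(np.trace(cov1 + cov2 - 2.0 * cross))
    return mean_term + max(cov_term, 0.0)


def maw_sq(gmm_a, gmm_b):
    """Sketch of the squared MAW distance between GMMs given as
    (weights, means, covs): optimal transport over mixture weights with
    component-wise Gaussian W2^2 as the ground cost. Returns (value, plan)."""
    wa, ma, ca = gmm_a
    wb, mb, cb = gmm_b
    ka, kb = len(wa), len(wb)
    cost = np.array([[gaussian_w2_sq(ma[i], ca[i], mb[j], cb[j])
                      for j in range(kb)] for i in range(ka)])
    # Flattened plan index i * kb + j; rows must sum to wa, columns to wb.
    a_eq = np.zeros((ka + kb, ka * kb))
    for i in range(ka):
        a_eq[i, i * kb:(i + 1) * kb] = 1.0
    for j in range(kb):
        a_eq[ka + j, j::kb] = 1.0
    b_eq = np.concatenate([np.asarray(wa, float), np.asarray(wb, float)])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return float(res.fun), res.x.reshape(ka, kb)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
    labels = np.repeat([0, 1], 50)
    g1 = gmm_from_labels(x, labels)
    g2 = gmm_from_labels(x + np.array([1.0, 2.0]), labels)  # same clusters, shifted
    d2, plan = maw_sq(g1, g2)
    print(d2)    # squared shift length, i.e. 1^2 + 2^2
    print(plan)  # mass stays on matching components
```

Once each dataset is summarized by a GMM, the cost matrix and the linear program depend only on the number of components and the dimension, not on the number of cells. This appears consistent with the abstract's claim that the complexity is independent of the data size.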


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program