
Abstract Details

Activity Number: 588 - Statistical Learning: Clustering
Type: Contributed
Date/Time: Wednesday, August 2, 2017 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #324469
Title: Randomized SUP: a Clustering Algorithm for Large-Scale Data
Author(s): Shang-Ying Shiu* and Ting-Li Chen and Yen-Shiu Chin and Wush Wu
Companies: Department of Statistics, National Taipei University and Institute of Statistical Sciences, Academia Sinica and Institute of Statistical Sciences, Academia Sinica and Department of Electrical Engineering, National Taiwan University
Keywords: SUP; clustering; randomized algorithm
Abstract:

The self-updating process (SUP) is a clustering algorithm that iteratively updates every data point according to its neighboring points. SUP has been shown to be particularly competitive in clustering (i) data with noise and (ii) data with a large number of clusters. However, the algorithm relies on pairwise similarities between all data points. Because the number of pairs grows quadratically with the sample size, this becomes computationally expensive for large-scale data. We will present a randomized approach to overcome this computational difficulty. At each iteration, only relatively small portions of the data are considered for location updates. The law of large numbers guarantees that the result of the randomized updating process converges to that of the original SUP as the number of data points becomes large. Simulations and real-data examples will be presented to show the clustering performance of the proposed randomized algorithm.
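To make the contrast between the exact and randomized updates concrete, here is a minimal illustrative sketch. It is not the authors' implementation. It assumes an SUP-style influence function exp(-d²/T) truncated at a radius r, and all points are updated simultaneously. It also assumes the randomized variant replaces the full neighbor sum for each point with a fresh uniform sample of m points. The names `influence`, `weighted_mean`, `sup_step`, `randomized_sup_step`, `r`, `T`, and `m` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def influence(x: Point, y: Point, r: float, T: float) -> float:
    """Assumed SUP-style influence of y on x: exp(-d^2 / T) if d <= r, else 0."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / T) if d2 <= r * r else 0.0


def weighted_mean(x: Point, refs: Sequence[Point], r: float, T: float) -> Point:
    """Influence-weighted average of refs; returns x unchanged if no ref is within r."""
    total = 0.0
    acc = [0.0] * len(x)
    for y in refs:
        w = influence(x, y, r, T)
        total += w
        for k, v in enumerate(y):
            acc[k] += w * v
    if total == 0.0:
        return x
    return tuple(a / total for a in acc)


def sup_step(points: List[Point], r: float, T: float) -> List[Point]:
    """Exact update: every point averages over all n points (O(n^2) work)."""
    return [weighted_mean(x, points, r, T) for x in points]


def randomized_sup_step(points: List[Point], r: float, T: float,
                        m: int, rng: random.Random) -> List[Point]:
    """Randomized update (hypothetical variant): each point averages over a
    fresh uniform sample of m current points (O(n*m) work). All points move
    simultaneously, based on the positions from the previous iteration."""
    return [weighted_mean(x, rng.sample(points, m), r, T) for x in points]


if __name__ == "__main__":
    # Two small, well-separated groups; points that collapse onto (nearly)
    # the same location after several iterations are read as one cluster.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    rng = random.Random(0)
    state = data
    for _ in range(10):
        state = randomized_sup_step(state, r=1.0, T=0.5, m=4, rng=rng)
    print(state)
```

Each randomized update is a ratio of two sample averages: the influence-weighted sum of positions divided by the sum of influences. By the law of large numbers, both averages approach their full-data counterparts as the sample grows, which is the intuition behind the convergence claim above. The sampling scheme used here is only one plausible reading of "small portions of data."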


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 
Copyright © American Statistical Association