Activity Number: 588 - Statistical Learning: Clustering
Type: Contributed
Date/Time: Wednesday, August 2, 2017, 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #324469
Title: Randomized SUP: a Clustering Algorithm for Large-Scale Data
Author(s): Shang-Ying Shiu* and Ting-Li Chen and Yen-Shiu Chin and Wush Wu
Companies: Department of Statistics, National Taipei University and Institute of Statistical Sciences, Academia Sinica and Institute of Statistical Sciences, Academia Sinica and Department of Electrical Engineering, National Taiwan University
Keywords: SUP; clustering; randomized algorithm
Abstract:
The self-updating process (SUP) is a clustering algorithm that iteratively updates every data point according to its neighboring points. SUP has been shown to be particularly competitive in clustering (i) noisy data and (ii) data with a large number of clusters. However, the algorithm relies on the pairwise similarities between data points, and computing them becomes expensive for large-scale data. We will present a randomized approach that overcomes this computational difficulty: at each iteration, only relatively small portions of the data are considered for location updates. The Law of Large Numbers guarantees that the result of the randomized updating process converges to that of the original SUP as the number of data points becomes large. Simulations and real-data examples will be presented to show the clustering performance of the proposed randomized algorithm.
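
The randomized updating idea can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the weight function or the sampling scheme, so several pieces are assumptions made for illustration: the truncated exponential weight with radius `r` and scale `T`, uniform sampling of `m` points without replacement at each iteration, and the names `randomized_sup` and `labels_from_positions`. Each point moves to the weighted mean of the sampled points within distance `r`, so an iteration needs an n-by-m distance matrix rather than the full n-by-n one.

```python
import numpy as np


def randomized_sup(X, r, T, m, n_iter=50, tol=1e-6, seed=0):
    """Hypothetical randomized self-updating sketch (not the authors' code).

    X: (n, d) data array; r: neighborhood radius; T: weight decay scale;
    m: number of points sampled per iteration (ideally m << n).
    Returns the final point locations, shape (n, d).
    """
    rng = np.random.default_rng(seed)
    Z = np.asarray(X, dtype=float).copy()
    n = Z.shape[0]
    m = min(m, n)
    for _ in range(n_iter):
        # Assumption: a uniform random subset serves as the reference set.
        idx = rng.choice(n, size=m, replace=False)
        S = Z[idx]
        # n x m distances instead of the full n x n pairwise matrix.
        D = np.linalg.norm(Z[:, None, :] - S[None, :, :], axis=2)
        # Assumed weight: exponential decay, truncated at radius r.
        W = np.where(D <= r, np.exp(-D / T), 0.0)
        total = W.sum(axis=1, keepdims=True)
        has_nb = total[:, 0] > 0
        Z_new = Z.copy()  # points with no sampled neighbor stay in place
        Z_new[has_nb] = (W[has_nb] @ S) / total[has_nb]
        shift = np.abs(Z_new - Z).max()
        Z = Z_new
        if shift < tol:
            break
    return Z


def labels_from_positions(Z, eps):
    """Group points whose final locations lie within eps of a cluster seed."""
    centers = []
    labels = np.empty(len(Z), dtype=int)
    for i, z in enumerate(Z):
        for k, c in enumerate(centers):
            if np.linalg.norm(z - c) <= eps:
                labels[i] = k
                break
        else:
            # No existing seed is close enough: start a new cluster.
            centers.append(z)
            labels[i] = len(centers) - 1
    return labels
```

A call such as `labels_from_positions(randomized_sup(X, r=2.0, T=1.0, m=40), eps=0.5)` would return one integer label per row of `X`; the parameter values here are arbitrary and would need tuning for real data.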
Authors who are presenting talks have a * after their name.