All Times EDT

Abstract Details

Activity Number: 68 - Modern Statistical Learning Methods
Type: Contributed
Date/Time: Monday, August 3, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #313668
Title: Improved Strategies for Clustering Objects on Subsets of Attributes
Author(s): Maarten Kampert* and Jacqueline Meulman and Jerome Friedman
Companies: and Leiden University, Stanford University and Stanford University
Keywords: distance-based clustering; unsupervised learning; dimension reduction; variable selection; high-dimensional data; COSA

Cluster discovery in high-dimensional settings is challenging when objects cluster neither on all attributes nor on a single common subset, but rather on different subsets of attributes. To reveal such a clustering structure, the COSA (Clustering Objects on Subsets of Attributes) procedure was proposed; it produces a representative distance matrix by finding differential attribute weights. This COSA distance matrix can subsequently be analyzed by a variety of distance-based methods, such as hierarchical clustering or multidimensional scaling. We propose a series of improvements to the original procedure: a) making one of the tuning parameters superfluous, b) allowing for variable selection via zero-valued attribute weights, and c) adjusting the COSA distance so as to better separate objects belonging to different clusters. In addition, we implement a more general regularization strategy for the attribute weights, which allows for user-specified initialization and leads to improved group extraction. We demonstrate the performance of the improved COSA by comparing it to the original version and to a number of other state-of-the-art methods, using both simulated and real omics data sets.
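
To make the idea of a distance matrix built from differential attribute weights concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each object already has its own weight vector (in COSA these weights are estimated iteratively, which is omitted here) and that the distance between two objects is the larger of the two weighted L1 distances obtained with either object's weights. The function names, the toy data, and the weights are all hypothetical.

```python
from typing import List

Vector = List[float]


def weighted_distance(x_i: Vector, x_j: Vector, w: Vector) -> float:
    """Weighted L1 distance: sum over attributes k of w[k] * |x_i[k] - x_j[k]|."""
    return sum(wk * abs(a - b) for wk, a, b in zip(w, x_i, x_j))


def distance_matrix(X: List[Vector], W: List[Vector]) -> List[List[float]]:
    """Symmetric distance matrix from per-object attribute weights.

    Assumption for this sketch: the distance between objects i and j is the
    maximum of the weighted distances computed with W[i] and with W[j].
    """
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = max(
                weighted_distance(X[i], X[j], W[i]),
                weighted_distance(X[i], X[j], W[j]),
            )
            D[i][j] = D[j][i] = d
    return D


# Toy data (hypothetical): objects 0 and 1 are close on attribute 0 only,
# objects 2 and 3 are close on both attributes.
X = [[0.0, 5.0], [0.1, -3.0], [4.0, 1.0], [4.2, 1.1]]
# Hand-picked weights; a zero weight removes that attribute for the object,
# which is how zero-valued weights act as variable selection.
W = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]

D = distance_matrix(X, W)
```

With these weights, objects 0 and 1 end up close to each other even though they differ strongly on attribute 1, because that attribute has zero weight for both. The resulting matrix `D` could then be passed to any distance-based method, such as hierarchical clustering or multidimensional scaling.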

Authors who are presenting talks have a * after their name.
