Online Program

Saturday, May 19
Machine Learning
Feature Selection
Sat, May 19, 1:15 PM - 2:45 PM
Grand Ballroom D
 

Supervised Clustering via an Implicit Network for High Dimensional Data (304533)

Tucker McElroy, U.S. Census Bureau 
*Brandon Woosuk Park, George Mason University 
Anand N. Vidyashankar, George Mason University 

Keywords: Supervised Clustering, Implicit Network, Network-wide metrics, High dimension

In high-dimensional data analysis, where the number of parameters exceeds the sample size, it is critical to identify features that are associated with the response variable. It is often also important to detect groups of features, referred to as clusters, that have similar effects on the response variable, since this allows one to summarize information at the cluster level. In this presentation, we introduce a network-based approach to high-dimensional data analysis. We describe a new method for constructing an implicit network and provide a new supervised clustering algorithm based on network-wide metrics. We study the properties of the network-wide metrics and establish theoretical guarantees for the consistency of the supervised clustering algorithm in a high-dimensional setting. In addition, simulation studies and an application to real data illustrate the performance of our supervised clustering algorithm.