Abstract Details

Activity Number: 655
Type: Contributed
Date/Time: Thursday, August 4, 2016 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #320072
Title: A Simultaneous Variable Selection and Clustering Method for High-Dimensional Multinomial Regression Model
Author(s): Sheng Ren* and Jason Lu and Emily Lei Kang
Companies: University of Cincinnati and Cincinnati Children's Hospital Research Foundation and University of Cincinnati
Keywords: high-dimensional; variable selection; variable clustering; multinomial regression

We propose a new data-driven method for simultaneous variable selection and clustering in high-dimensional multinomial regression. Unlike other grouping pursuit methods, such as regression with a graph Laplacian penalty, our method does not assume that moderately to highly correlated variables have similar regression coefficients or belong to the same cluster. Relaxing this assumption is practically meaningful when the response variable is multinomial: for example, moderately to highly correlated expressed genes may be associated with different subtypes of a disease. We propose a penalty function that takes both the regression coefficients and the pairwise correlations into account when defining variable clusters. We also develop an algorithm for this new penalty function that incorporates both convex optimization and clustering. In a simulation study comparing our method with several other methods, we show that it is able to yield correct variable clustering and to improve prediction performance. A real data example will also be presented.
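
To make the contrast concrete, the following is a minimal sketch of the graph Laplacian penalty used by the kind of grouping method the abstract compares against. It is not the penalty proposed in this work, which the abstract does not specify. The function name `graph_laplacian_penalty`, the correlation threshold of 0.5, and the use of absolute correlations as edge weights are all illustrative assumptions, and the coefficient vector is taken to be that of a single response class.

```python
import numpy as np


def graph_laplacian_penalty(beta, X, threshold=0.5):
    """Return beta^T L beta for a correlation graph built from X.

    Assumptions (illustrative, not from the abstract):
    - beta is the (p,) coefficient vector for one response class,
    - an edge joins variables i and j when |corr(i, j)| >= threshold,
    - the edge weight is |corr(i, j)|.
    """
    corr = np.corrcoef(X, rowvar=False)           # (p, p) pairwise correlations
    W = np.where(np.abs(corr) >= threshold, np.abs(corr), 0.0)
    np.fill_diagonal(W, 0.0)                      # no self-loops
    L = np.diag(W.sum(axis=1)) - W                # graph Laplacian L = D - W
    # beta^T L beta equals the sum over edges i<j of w_ij * (beta_i - beta_j)^2
    return float(beta @ L @ beta)


# Toy data: variables 0 and 1 are highly correlated, variable 2 is independent.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.column_stack([
    z + 0.1 * rng.normal(size=200),
    z + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])

same = graph_laplacian_penalty(np.array([1.0, 1.0, 0.0]), X)
different = graph_laplacian_penalty(np.array([1.0, -1.0, 0.0]), X)
print(same, different)
```

Under these assumptions, correlated variables with equal coefficients incur no penalty, while correlated variables with opposite coefficients are penalized heavily. This is the behavior the abstract argues can be inappropriate when correlated variables relate to different response categories.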

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association