Friday, May 31

Computational Statistics for Large-Scale Biological Data

Fri, May 31, 3:40 PM - 5:15 PM
Grand Ballroom K

Computationally Efficient High-Dimensional Interaction Modeling (305037)

Jacob Bien, University of Southern California
Ryan Tibshirani, Carnegie Mellon University
*Guo Yu, University of Washington

Keywords: sparse, lasso, interactions, variable selection, regression, screening

Data sets with 100,000 or more predictor variables are common in fields such as biology. Yet if we wish to fit regression models with interactions, we face enormous variable selection problems with at least five billion features. Problems at this scale demand methods that are computationally cheap in both time and memory yet still have sound statistical properties. Motivated by these problem sizes, we develop a new method that performs feature selection for regression models with interactions. We provide theoretical results indicating favorable statistical properties and empirical results showing strong statistical performance on large-scale interaction problems.
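
The "five billion features" figure can be checked with a short calculation. The sketch below is illustrative only and is not part of the authors' method: it assumes the candidate set consists of the main effects plus all pairwise interactions (no squared terms), and the function name and the float64 storage assumption are hypothetical choices made for the example.

```python
from math import comb


def interaction_feature_count(p: int) -> dict:
    """Count candidate features for p predictors.

    Assumption: main effects plus all pairwise interactions, no squared terms.
    """
    main = p
    pairwise = comb(p, 2)  # p * (p - 1) / 2 distinct pairs
    return {"main": main, "pairwise": pairwise, "total": main + pairwise}


counts = interaction_feature_count(100_000)
print(counts)

# Assumption: each feature value is stored as an 8-byte float64.
bytes_per_row = counts["total"] * 8
print(f"{bytes_per_row / 1e9:.1f} GB to store one fully expanded observation")
```

Under these assumptions, even a single observation of the fully expanded design would take tens of gigabytes, which suggests why memory cost matters here as much as running time.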

ASA Meetings Department