Online Program

Friday, May 31
Computational Statistics
Computational Statistics for Large-Scale Biological Data
Fri, May 31, 3:40 PM - 5:15 PM
Grand Ballroom K

Computationally Efficient High-Dimensional Interaction Modeling (305037)

Jacob Bien, University of Southern California 
Ryan Tibshirani, Carnegie Mellon University 
*Guo Yu, University of Washington 

Keywords: sparse, lasso, interactions, variable selection, regression, screening

Data sets with 100,000 or more predictor variables are common in fields such as biology. Yet if we wish to fit regression models with interactions, we face enormous variable selection problems with at least five billion candidate features (the main effects plus all pairwise interactions). Problems of this scale call for methods that are computationally cheap in both time and memory yet still have sound statistical properties. Motivated by these problem sizes, we develop a new method that performs feature selection for regression models with interactions. We provide theoretical results indicating favorable statistical properties, and empirical results demonstrating strong statistical performance when the method is applied to large-scale interaction problems.
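The "at least five billion features" figure comes from counting the main effects plus every pairwise interaction: p + p(p-1)/2. The Python sketch below works through that arithmetic and also estimates the memory needed to store the full interaction design matrix in double precision. It illustrates only the problem size, not the authors' method. The helper names `num_features` and `dense_design_bytes` and the sample size n = 1,000 are hypothetical and were chosen only for illustration.

```python
def num_features(p: int) -> int:
    """Count main effects plus all pairwise (two-way) interactions among p predictors."""
    return p + p * (p - 1) // 2


def dense_design_bytes(n: int, p: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to store the full n-by-num_features(p) design matrix.

    Assumes 8-byte (float64) entries and a dense layout. Both choices are
    illustrative assumptions.
    """
    return n * num_features(p) * bytes_per_entry


p = 100_000  # number of predictors mentioned in the abstract
n = 1_000    # hypothetical sample size, for illustration only

print(num_features(p))           # 5000050000
print(dense_design_bytes(n, p))  # 40000400000000 bytes, roughly 40 TB
```

Even at this modest hypothetical sample size, a dense interaction design matrix would need about 40 TB. This is why the abstract stresses methods that are cheap in memory as well as in time.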