Abstract:
|
We demonstrate the usefulness of submodularity in statistics. Greedy algorithms such as forward stepwise regression and the lasso perform well in situations that can be characterized by submodularity. In particular, submodularity of the coefficient of determination, R$^2$, provides a natural way to analyze the effects of collinearity on model selection. Model selection poses the search problem of identifying a subset of $k$ covariates whose predictive loss is close to that of the best model of $k$ covariates. Submodularity arises naturally in this setting because of its deep roots in combinatorial optimization: it provides structural results for discrete convexity as well as performance guarantees for greedy algorithms. In statistics, submodularity separates the cases in which collinearity makes the choice of model features difficult from those in which this task is routine. Submodularity of R$^2$ is closely related to other statistical assumptions used in variable screening and in proving the performance of the lasso. This has an important implication for the use of model selection procedures: the situations in which forward stepwise regression and the lasso succeed are closely related.
|
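The greedy search the abstract refers to can be illustrated with a minimal sketch. This is not the paper's method, just a hypothetical forward stepwise routine (`r_squared`, `forward_stepwise` are names invented here) that adds, at each step, the covariate giving the largest gain in R$^2$, then compares the result against exhaustive search over all size-$k$ subsets. When R$^2$ is (approximately) submodular, the classic Nemhauser–Wolsey–Fisher argument bounds the greedy value by a $(1 - 1/e)$ fraction of the best subset's value.

```python
# Hypothetical sketch: greedy forward stepwise selection maximizing R^2.
# Not the paper's implementation; an illustration of the greedy guarantee
# when R^2 behaves submodularly over subsets of covariates.
import itertools
import numpy as np

def r_squared(X, y, subset):
    """R^2 of an OLS fit of y on the columns of X indexed by subset.

    Assumes y is centered, so total sum of squares is y @ y.
    """
    if not subset:
        return 0.0
    Xs = X[:, list(subset)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return 1.0 - (resid @ resid) / (y @ y)

def forward_stepwise(X, y, k):
    """Greedily add the covariate giving the largest R^2 gain."""
    chosen = []
    remaining = set(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda j: r_squared(X, y, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(0)
n, p, k = 200, 8, 3
X = rng.standard_normal((n, p))          # near-orthogonal design
y = X[:, 0] + 0.5 * X[:, 3] + 0.25 * X[:, 6] + 0.1 * rng.standard_normal(n)
y -= y.mean()

greedy = forward_stepwise(X, y, k)
best = max(itertools.combinations(range(p), k),
           key=lambda s: r_squared(X, y, s))
print(sorted(greedy), round(r_squared(X, y, greedy), 3))
print(sorted(best), round(r_squared(X, y, best), 3))
```

With a near-orthogonal design like this one, greedy selection recovers the best subset; the interesting regime the abstract points to is collinear designs, where submodularity of R$^2$ is exactly what separates the cases where greedy still works from those where it fails.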
Copyright © American Statistical Association.