Abstract:
|
Datasets continue to grow, both in width and in depth. For wide data, variable selection is often critical in order to separate the relatively few needles from the mass of hay. Today we have many options to choose from, including the lasso, the relaxed lasso, forward stepwise selection, and, quite recently, best subset selection (thanks to advances in mixed-integer optimization). In this talk I will review these options and argue that a variant of the relaxed lasso is hard to beat in terms of accuracy, speed, and scope of applicability.
|