Abstract:
|
Pathwise coordinate optimization is one of the most important computational frameworks for solving high-dimensional nonconvex sparse learning problems. It differs from classical coordinate optimization algorithms in three salient features: warm start initialization, active set updating, and a strong rule for coordinate preselection. These three features grant superior empirical performance, but also pose significant challenges for theoretical analysis. To tackle this long-standing problem, we develop a new theory showing that these features play pivotal roles in guaranteeing the statistical and computational performance of the pathwise coordinate optimization framework. In particular, we analyze existing methods for pathwise coordinate optimization and provide new theoretical insights into them. The resulting theory motivates several modifications that improve the pathwise coordinate optimization framework and guarantee linear convergence to a unique sparse local optimum with optimal statistical properties. This is the first result establishing computational and statistical guarantees for pathwise coordinate optimization in high dimensions.
|
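
To make the three features concrete, below is a minimal Python sketch of a pathwise coordinate descent loop for the l1-regularized least squares (Lasso) problem. It is illustrative only, not the paper's algorithm; all names (pathwise_cd, soft_threshold, etc.) are our own. It shows warm starts along a decreasing regularization path, strong-rule screening, and sweeps restricted to the preselected set; a full implementation would also re-check KKT conditions on screened-out coordinates.

import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding: the closed-form solution of the
    # one-dimensional l1-regularized quadratic subproblem.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def pathwise_cd(X, y, lambdas, tol=1e-6, max_iter=1000):
    # Sketch of pathwise coordinate descent (illustrative only).
    # Objective: (1/2n) * ||y - X beta||^2 + lam * ||beta||_1.
    n, d = X.shape
    col_sq = (X ** 2).sum(axis=0) / n     # per-coordinate curvature x_j'x_j / n
    beta = np.zeros(d)                    # warm start: reused across lambdas
    lam_prev = np.abs(X.T @ y).max() / n  # lambda_max: beta = 0 is optimal here
    path = []
    for lam in lambdas:                   # decreasing regularization path
        grad = X.T @ (X @ beta - y) / n
        # Strong rule: preselect coordinates whose gradient at the previous
        # solution is large; the rest are unlikely to become nonzero.
        active = np.flatnonzero(np.abs(grad) >= 2.0 * lam - lam_prev)
        for _ in range(max_iter):
            max_change = 0.0
            for j in active:              # sweep only the preselected set
                r = y - X @ beta + X[:, j] * beta[j]   # partial residual
                z = X[:, j] @ r / n
                new_bj = soft_threshold(z, lam) / col_sq[j]
                max_change = max(max_change, abs(new_bj - beta[j]))
                beta[j] = new_bj
            if max_change < tol:          # inner sweeps converged
                break
        path.append(beta.copy())
        lam_prev = lam
    return path

Because each solution on the decreasing path initializes the next solve, the inner sweeps typically terminate after only a few passes, which is the empirical speedup the abstract attributes to warm starts.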