Abstract:
|
This paper studies how to model skewed, heteroscedastic, and continuous or semi-continuous responses and how to perform the associated variable selection, possibly in high dimensions. We propose a Skewed Sparse Regression hurdle model that simultaneously accounts for the unknown skewness, scale, and mean in high dimensions. To tackle the computational challenges arising from nonconvexity, nonsmoothness, and the diversity of penalties, we combine Blockwise Coordinate Descent and Majorize-Minimization algorithms to develop a highly scalable and efficient algorithm with guaranteed convergence. Our statistical analysis shows that the computationally obtained blockwise Skewed Sparse Regression estimators, though not necessarily globally optimal, still attain the minimax rate up to a logarithmic term in the presence of skewness. Extensive simulation studies show that, in comparison with state-of-the-art methods, Skewed Sparse Regression achieves better estimation accuracy and selection consistency at a substantially reduced computational cost. We also demonstrate how a hurdle model derived from Skewed Sparse Regression can help analyze the Medical Expenditure Panel Survey data.
|