Abstract:
|
High-dimensional linear regression models are now pervasive across research domains. We propose a general approach to handle data contamination that can undermine classical estimators. Specifically, we consider the co-occurrence of mean-shift and variance-inflation outliers, which are modeled as additional fixed and random components, respectively, and evaluated independently. Our proposal performs variable selection while detecting and down-weighting variance-inflation outliers, excluding mean-shift outliers, and retaining non-outlying cases with full weight. Feature selection and mean-shift outlier detection are performed through a robust class of nonconcave penalization methods. Variance-inflation outlier detection is based on penalization of the restricted posterior mode. The resulting approach satisfies a robust oracle property for feature selection in the presence of data contamination, even when the number of features increases exponentially with the sample size, and detects truly outlying cases with probability tending to one. This provides an optimal trade-off between high breakdown point and efficiency. Effective and lean heuristic methods are also presented.
|