Abstract: |
Standard stepwise regression techniques that rely on p-values to select predictors often perform poorly with high-dimensional data, especially when the number of candidate predictors (P) is large and exceeds the number of cases (P > N). Alternatively, sparse regression techniques that employ regularization have been shown to yield reliable predictions with high-dimensional data. In particular, Correlated Component Regression (CCR), which is scale invariant, and the lasso are two such regression methods, both of which use cross-validation in place of p-values to tune the model.
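As a minimal sketch of the cross-validation-based tuning described above, the following uses hypothetical P > N data and scikit-learn's LassoCV (not the authors' software); all names and parameter values are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)

# Hypothetical P > N setting: 50 cases, 200 candidate predictors,
# only the first 3 of which carry signal.
N, P = 50, 200
X = rng.normal(size=(N, P))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(size=N)

# Cross-validation, rather than p-values, selects the penalty strength.
fit = LassoCV(cv=5).fit(X, y)
print("penalty chosen by CV:", fit.alpha_)
print("predictors retained:", np.sum(fit.coef_ != 0))
```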
As Magidson (2013) points out, when suppressor variables exist, they are often among the most important predictors in a regression. In this presentation, we simulate high-dimensional data under the assumptions of 2-group linear discriminant analysis (LDA), where one of the important predictors is a suppressor variable. Evaluating the results, we find that CCR outperforms the lasso, in part because CCR is much more likely than the lasso to include the suppressor variable among the final model predictors. We discuss unique features built into the CCR approach that may explain why this difference occurs.
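A minimal sketch of such a simulation, under the assumption that the suppressor x2 has equal group means (no marginal effect) but is correlated with the noise in a discriminating predictor x1. CCR itself is not shown; an L1-penalized logistic regression stands in for the lasso side of the comparison only, and all names and coefficients are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)

# Hypothetical 2-group LDA setting with P > N candidate predictors.
N, P = 100, 200
y = rng.integers(0, 2, size=N)                # group labels
e = rng.normal(size=N)                        # shared noise component
x1 = 1.0 * y + e                              # discriminating predictor
x2 = 0.9 * e + rng.normal(scale=0.3, size=N)  # suppressor: correlated with e, not y
noise = rng.normal(size=(N, P - 2))           # irrelevant predictors
X = np.column_stack([x1, x2, noise])

# L1-penalized logistic regression with a cross-validated penalty.
# A suppressor improves prediction jointly (it cancels the noise in x1)
# despite having essentially zero marginal correlation with y.
fit = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=20, cv=5).fit(X, y)
coef = fit.coef_.ravel()
print("suppressor x2 selected:", coef[1] != 0.0)
print("marginal corr(x2, y):", np.corrcoef(x2, y)[0, 1].round(3))
```

Because the suppressor's marginal association with the outcome is near zero, a method that screens or shrinks predictors based on marginal signal alone can easily drop it, which is the behavior the simulation is designed to probe.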