Abstract:
|
Most variable selection techniques for high-dimensional models are designed for settings where observations are independent and completely observed. In this paper, we present ThrEEBoost (Thresholded EEBoost), a general-purpose variable selection technique that accommodates "messy data" for which an estimating equation is required, by replacing the gradient of the loss with an estimating function. Thresholding affects the number of regression coefficients updated at each step, yielding new variable selection paths. We evaluated ThrEEBoost in simulation studies to assess the effects of different thresholds on prediction error, sensitivity, and specificity under sparse and non-sparse true models with correlated continuous outcomes. We show that ThrEEBoost achieves prediction error similar to that of EEBoost when the true model is sparse, and lower prediction error than EEBoost when the true model is complex. The technique is illustrated on the problem of identifying predictors of weight change in a longitudinal nutrition study.
|