Online Program

Tuesday, September 24
Tue, Sep 24, 1:15 PM - 2:30 PM
Thurgood Marshall South
The Catcher in the Rye: Missing Data Imputation

Strategies for Variable Selection in the Presence of Missing Data: With Clinical Applications (300954)

Prithish Banerjee, Chase Bank 
Ujjwal Das, Indian Institute of Management, Udaipur 
*Samiran Ghosh, Wayne State University 

Keywords: Groupwise Penalty, Multiple Imputation, Variable Selection

Selecting the important variables that contribute to the “true model” generating the data is one of the most challenging tasks in the biomedical, social, and public health sciences. The problem becomes even more complicated in the presence of missing data. A common and simplistic practice is Complete Case Analysis (CCA), which involves list-wise deletion of observations with missing values. The resulting analysis is highly restrictive and inefficient, and it often yields a biased model unless the data are missing completely at random (MCAR). Multiple imputation (MI) is a popular approach to analyzing data with missingness under the missing at random (MAR) condition. However, combining the variables selected across multiply imputed datasets is nontrivial, because the conventional method of combining parameter estimates from MI datasets using Rubin's rules is not applicable to variable selection. In this talk we present an intuitive approach to variable selection under the MAR assumption for linear models. Our approaches are closely related to penalized variable selection methods (e.g., Lasso, SCAD, and MCP) but require significant further innovation because of the missing data. The performance of the proposed methods is evaluated using extensive simulation studies and an application to mental health studies of depression.
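As a rough illustration of the combining problem, and not the authors' method, the sketch below applies a groupwise (group-lasso) penalty across imputed datasets so that each variable is kept or dropped in all imputations at once. Everything in it is an assumption made for the example: the simulated data, the MAR mechanism, the simplified stochastic regression imputation (a proper MI procedure would also draw the imputation-model parameters), the helper names `impute_once`, `standardize`, and `group_lasso_mi`, and the penalty level `lam`.

```python
import numpy as np

def impute_once(X, y, rng):
    """Stochastic regression imputation of column 0 (simplified; assumption)."""
    Xi = X.copy()
    miss = np.isnan(Xi[:, 0])
    # Predict column 0 from the fully observed columns and the outcome.
    Z = np.column_stack([np.ones(len(y)), Xi[:, 1:], y])
    coef, *_ = np.linalg.lstsq(Z[~miss], Xi[~miss, 0], rcond=None)
    resid_sd = np.std(Xi[~miss, 0] - Z[~miss] @ coef)
    Xi[miss, 0] = Z[miss] @ coef + rng.normal(0.0, resid_sd, miss.sum())
    return Xi

def standardize(X):
    """Center each column and scale it to unit mean square."""
    Xc = X - X.mean(axis=0)
    return Xc / np.sqrt((Xc ** 2).mean(axis=0))

def group_lasso_mi(X_list, y_list, lam, n_sweeps=100):
    """Block coordinate descent for
    sum_m ||y_m - X_m b_m||^2 / (2n) + lam * sum_j ||(b_1j, ..., b_Mj)||_2,
    assuming every X_m has standardized columns."""
    M, (n, p) = len(X_list), X_list[0].shape
    B = np.zeros((M, p))
    R = [y.copy() for y in y_list]          # residuals y_m - X_m b_m
    for _ in range(n_sweeps):
        for j in range(p):
            z = np.empty(M)
            for m in range(M):
                xj = X_list[m][:, j]
                R[m] += xj * B[m, j]        # partial residual without variable j
                z[m] = xj @ R[m] / n
            norm = np.linalg.norm(z)
            # Group soft-thresholding: the whole group is shrunk or zeroed together.
            B[:, j] = z * max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
            for m in range(M):
                R[m] -= X_list[m][:, j] * B[m, j]
    return B

rng = np.random.default_rng(0)
n, p, M = 300, 8, 5
beta = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 1.0, 0.0, 0.0])  # hypothetical truth
X_full = rng.normal(size=(n, p))
y = X_full @ beta + rng.normal(size=n)

# MAR: the chance that column 0 is missing depends only on the observed outcome.
p_miss = 0.6 / (1.0 + np.exp(-(y - y.mean()) / y.std()))
X_obs = X_full.copy()
X_obs[rng.random(n) < p_miss, 0] = np.nan

yc = y - y.mean()
X_list = [standardize(impute_once(X_obs, y, rng)) for _ in range(M)]
B = group_lasso_mi(X_list, [yc] * M, lam=0.8)
selected = np.flatnonzero(np.linalg.norm(B, axis=0) > 0)
print("selected variables:", selected)
```

Because the penalty acts on each variable's coefficients across all imputations jointly, the zero pattern is the same in every row of `B`, which avoids having to reconcile different selected sets after the fact. Fitting a separate Lasso to each imputed dataset offers no such guarantee.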