Abstract:
|
Electronic health records contain a large number of candidate variables on many patients, but information is often missing for some patients. In this talk we will discuss appropriate ways to conduct variable selection in the presence of missing data. We assume that data are missing at random and consider variable selection methods that can be combined with imputation. We investigate a general resampling approach (BI-SS) that combines bootstrap imputation with stability selection, the latter of which was developed for fully observed data. The proposed approach is general and can be applied to a wide range of settings. We will report on simulation studies showing that, in terms of variable selection, BI-SS performs best or close to best among several existing methods in both low-dimensional and high-dimensional problems, and that it is relatively insensitive to tuning parameter values. We will also demonstrate the approach in two real data examples.
|