All Times ET
Program is Subject to Change
On the Missing Data Analysis with High-Dimensional Data (308004)
Sixia Chen, University of Oklahoma Health Sciences Center
*Chao Xu, University of Oklahoma Health Sciences Center
Keywords: Missing data, high dimensional data, doubly robust, penalized regression
Advances in data collection and storage technology produce large-volume, high-dimensional data for clinical and basic science research, such as electronic health/medical records with hundreds or even thousands of variables. Most existing methods for missing data imputation were designed for low-dimensional data, in which the number of variables (p) is smaller than the sample size (n); they are inappropriate for high-dimensional data (p ≥ n) because of rank deficiency of the design matrix, among other problems. New methods for missing data analysis that can handle high-dimensional data are urgently needed. In this study, we propose three approaches based on the folded concave smoothly clipped absolute deviation (SCAD) penalty function for handling missing data with high-dimensional covariates. The imputation approach is valid when the imputation model is correctly specified. The propensity score approach is valid when the nonresponse model is correctly specified. The doubly robust approach is valid when either of the two assumed models is correctly specified, and thus enjoys the so-called double robustness property. Simulation and real data analyses show that our high-dimensional approaches outperform existing methods, with smaller imputation bias and variance, when p ≥ n. The imputation and doubly robust approaches yielded similar results and performed better than the propensity score approach. In addition, our approaches closely matched the existing methods when analyzing low-dimensional data.
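For readers unfamiliar with the two ingredients named above, the following is a minimal sketch, not the authors' implementation. It assumes the standard textbook form of the SCAD penalty (with the commonly used default a = 3.7) and the usual augmented inverse-probability-weighted form of a doubly robust estimator of a population mean. The function names `scad_penalty` and `doubly_robust_mean`, and the inputs `m_hat` (fitted imputation-model predictions) and `pi_hat` (fitted response probabilities), are hypothetical; the abstract does not specify how the models are fitted or combined.

```python
def scad_penalty(t, lam, a=3.7):
    """SCAD penalty for a single coefficient t (textbook form, requires a > 2).

    Linear (lasso-like) near zero, quadratic in the middle,
    and constant for large |t|, so large coefficients are not over-shrunk.
    """
    x = abs(t)
    if x <= lam:
        return lam * x
    if x <= a * lam:
        return (2 * a * lam * x - x * x - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def doubly_robust_mean(y, observed, m_hat, pi_hat):
    """Doubly robust (AIPW-style) estimate of the mean of y.

    y        : outcomes; entries for nonrespondents are ignored (may be None)
    observed : 1 if y[i] was observed, 0 if missing
    m_hat    : imputation-model prediction of y[i] for every unit (assumed given)
    pi_hat   : estimated response probability for every unit (assumed given, > 0)
    """
    total = 0.0
    for yi, di, mi, pi in zip(y, observed, m_hat, pi_hat):
        # Start from the imputed value; correct it with the weighted residual
        # only for respondents.
        total += mi
        if di:
            total += (yi - mi) / pi
    return total / len(m_hat)
```

In a penalized regression, the penalty would be summed over the coefficients of the imputation or nonresponse model; the estimator then stays consistent if either `m_hat` or `pi_hat` comes from a correctly specified model.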