Abstract:
|
Analysis of modern biomedical data is often complicated by the presence of missing values. To improve statistical efficiency, it is desirable to make use of potentially high-dimensional observed variables to impute or predict the missing values. Although many methods have been developed for prediction using high-dimensional variables, it is challenging to perform valid inference based on the predicted values. In this presentation, we develop an association test for an outcome variable and a potentially missing covariate, where the covariate can be predicted using a set of high-dimensional auxiliary variables. We use LASSO to estimate the model for the incomplete covariate and adopt a conditional likelihood approach to account for the resulting estimation variability. The method is applicable to general outcome variables, including censored time-to-event variables. We demonstrate the validity of the proposed method and its advantages over existing methods through extensive simulation studies and provide an application to a major cancer genomics study.
|