Abstract:
|
Certain diseases, such as hypertension and metabolic syndrome, are defined directly by cutoffs applied to physiologic measurements (risk factors). While these definitions simplify diagnosis, it is not clear how to determine the diagnosis when risk factor data are missing. We compare the performance of several imputation approaches for computing disease diagnosis in the presence of missing values. Four imputation techniques are evaluated: (1) imputing the risk factors to define the disease diagnosis and then performing additional imputations based on the predicted diagnosis; (2) imputing the risk factors to define the disease diagnosis; (3) imputing the disease diagnosis directly; and (4) imputing the disease diagnosis and risk factors simultaneously. To evaluate each approach, we consider data from the 2015-2016 National Health and Nutrition Examination Survey as well as simulated data. We discuss the performance of the imputation methods for varying levels of missingness and correlation.
|