Abstract:
|
Missing data is common in real-world studies and can compromise statistical inference. Discarding cases with missing values, or replacing missing values using an inappropriate imputation technique, can each result in biased estimates. Many imputation techniques rest on assumptions that are hard to assess in practice, so the appropriate technique is often unclear. To address this issue, a factorial simulation design was developed to measure the impact of certain data set characteristics on the validity of several popular imputation techniques. The factors in the study were the missing data mechanism, the percentage of missing data, and the missing data method. The evaluation included parameter estimates, bias, and confidence interval coverage and width for the parameters of interest. Simulation results suggest that all three factors have a significant impact on the quality of estimation. Additional factors, such as the number of variables, the types of variables, and the correlations among variables, are being incorporated into the simulation. Finally, real data examples are discussed to illustrate the applicability of different missing data imputation methods.
|
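The factorial design described in the abstract can be illustrated with a minimal sketch. This is not the study's actual code: the two mechanisms (MCAR, MAR), the two methods (complete-case analysis, mean imputation), the bivariate normal data, the choice of the mean of y as the parameter of interest, and every name and default value below (`run_grid`, `n`, `reps`, `rho`, and so on) are assumptions chosen only to show how the three factors could be crossed and how bias, confidence interval coverage, and width might be computed.

```python
import itertools
import numpy as np

TRUE_MEAN = 0.0  # assumed parameter of interest: the true mean of y


def simulate(n, rho, rng):
    """Draw (x, y) from a standard bivariate normal with correlation rho."""
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    return x, y


def missing_mask(x, rate, mechanism, rng):
    """Return a boolean mask marking which y values are missing."""
    n = x.size
    if mechanism == "MCAR":
        # Missingness is independent of the data.
        return rng.random(n) < rate
    if mechanism == "MAR":
        # Missingness depends only on the fully observed x: larger x,
        # higher chance that y is missing. The average probability is
        # about `rate` as long as rate <= 0.5.
        ranks = x.argsort().argsort() / (n - 1)
        return rng.random(n) < np.clip(2 * rate * ranks, 0, 1)
    raise ValueError(f"unknown mechanism: {mechanism}")


def estimate_mean(y, mask, method):
    """Estimate the mean of y and a normal-theory 95% interval."""
    observed = y[~mask]
    if method == "complete_case":
        values = observed
    elif method == "mean_imputation":
        values = np.where(mask, observed.mean(), y)
    else:
        raise ValueError(f"unknown method: {method}")
    est = values.mean()
    se = values.std(ddof=1) / np.sqrt(values.size)
    return est, est - 1.96 * se, est + 1.96 * se


def run_cell(mechanism, rate, method, n=200, reps=500, rho=0.6, seed=0):
    """Run one cell of the factorial design and summarise it."""
    rng = np.random.default_rng(seed)
    estimates, covered, widths = [], [], []
    for _ in range(reps):
        x, y = simulate(n, rho, rng)
        mask = missing_mask(x, rate, mechanism, rng)
        est, lower, upper = estimate_mean(y, mask, method)
        estimates.append(est)
        covered.append(lower <= TRUE_MEAN <= upper)
        widths.append(upper - lower)
    return {
        "bias": float(np.mean(estimates) - TRUE_MEAN),
        "coverage": float(np.mean(covered)),
        "width": float(np.mean(widths)),
    }


def run_grid(mechanisms=("MCAR", "MAR"), rates=(0.1, 0.3),
             methods=("complete_case", "mean_imputation"), **kwargs):
    """Cross all factor levels; each cell reuses the same seed."""
    return {
        cell: run_cell(*cell, **kwargs)
        for cell in itertools.product(mechanisms, rates, methods)
    }


if __name__ == "__main__":
    for cell, summary in run_grid().items():
        print(cell, summary)
```

Because every cell reuses the same seed, the methods are compared on identical simulated data sets, so differences between cells reflect the factor levels and not sampling noise. A fuller study would add further methods (for example, multiple imputation) and an MNAR mechanism as extra factor levels.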