Abstract:
|
Missing data is a ubiquitous problem in psychiatry, political science, and many other biomedical and social science disciplines, especially with repeated measurements over time. As mobile devices become more widely adopted, collecting personal health data regularly or even in real time has become feasible and has revolutionized data collection. Multivariate time series of outcomes, exposures, and covariates pose new challenges in handling missing data to obtain unbiased estimates of causal quantities of interest, and they call for more efficient data imputation approaches. We conduct a comprehensive comparison of complete-case analysis with the most commonly used imputation methods, including mean imputation, last observation carried forward (LOCF) imputation, multiple imputation, multiple imputation with long history information, and state-space models, in estimating causal quantities in mobile health (mHealth) data. We consider missing data in the outcome, the exposure, or both, under missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) mechanisms. We further propose improvements to multiple imputation that allow imputed values to be carried backward and forward across lag times, improving precision and the coherence of temporal ordering for valid causal inference.
|