Online Program


All Times EDT

Friday, June 5
Machine Learning
Machine Learning 2
Fri, Jun 5, 1:25 PM - 3:00 PM
TBD

Modern Multiple Imputation Applied to Functional Data (308351)

*Aniruddha Rajendra Rao, Pennsylvania State University 
Matthew Reimherr, Pennsylvania State University 

Keywords: Functional Data Analysis, Sparse, Irregular, Imputation, Non-Linear Models, Missing Data, PACE, MICE.

This work considers the problem of fitting functional data models to sparsely and irregularly sampled data. It overcomes the limitations of current state-of-the-art methods, which struggle to fit more complex nonlinear models. Currently, many of these models cannot be estimated consistently unless the number of observed points per curve grows sufficiently quickly with the sample size, whereas we show numerically that an approach based on random forests and multiple imputation can produce consistent estimates more generally. We extend the ideas of MissForest and Local Linear Forest to functional data and compare their performance with principal components analysis through conditional expectation (PACE) and with other multivariate multiple imputation methods such as Multivariate Imputation by Chained Equations (MICE). This work is motivated by a longitudinal study on smoking cessation in which electronic medical records allow a great deal of data to be collected, but the sampling is highly variable from smoker to smoker. Using our method, we can clearly see the relationship between relapse and diastolic blood pressure (BP): smokers with high BP, or whose BP increases suddenly, tend to relapse more often. We evaluate performance on multiple simulations and on a diverse selection of real datasets with artificially introduced missingness ranging from 50% to 90%. Additionally, the random forest methods are computationally efficient and can cope with higher-dimensional data than PACE and MICE.
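For readers unfamiliar with MissForest-style imputation, the following is a minimal sketch of the general idea, not the authors' method. It assumes each curve has already been placed on a common grid of time points, so that a sparsely sampled curve becomes a row with missing entries. It also uses scikit-learn's `IterativeImputer` with a `RandomForestRegressor`, which the scikit-learn documentation describes as a way to emulate missForest. The simulated curves and the helper names (`simulate_curves`, `mask_at_random`, `forest_impute`) are hypothetical. The sketch produces a single imputation only. Repeating it with different seeds would be a rough stand-in for multiple imputation, but that is an assumption and not a proper MI procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (exposes IterativeImputer)
from sklearn.impute import IterativeImputer


def simulate_curves(n_curves=60, n_grid=20, seed=0):
    """Hypothetical curves X_i(t) = a_i sin(2 pi t) + b_i t + noise on a common grid."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)
    a = rng.normal(1.0, 0.5, size=(n_curves, 1))
    b = rng.normal(0.0, 1.0, size=(n_curves, 1))
    noise = rng.normal(0.0, 0.1, size=(n_curves, n_grid))
    return t, a * np.sin(2 * np.pi * t) + b * t + noise


def mask_at_random(X, frac_missing=0.5, seed=1):
    """Artificially hide a fraction of grid values (True in mask = hidden)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < frac_missing
    # Keep at least one observed value per grid point so no column is entirely empty.
    mask[rng.integers(X.shape[0], size=X.shape[1]), np.arange(X.shape[1])] = False
    X_obs = X.copy()
    X_obs[mask] = np.nan
    return X_obs, mask


def forest_impute(X_obs, n_trees=30, max_iter=5, seed=0):
    """MissForest-style: iteratively regress each grid point on the others with a random forest."""
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=n_trees, random_state=seed),
        max_iter=max_iter,
        random_state=seed,
    )
    return imputer.fit_transform(X_obs)


t, X = simulate_curves()
X_obs, mask = mask_at_random(X, frac_missing=0.5)
X_hat = forest_impute(X_obs)

# Error measured only on the values that were artificially removed.
rmse = np.sqrt(np.mean((X_hat[mask] - X[mask]) ** 2))
print(f"RMSE on held-out grid values: {rmse:.3f}")
```

The observed entries pass through unchanged, and only the hidden entries are filled in. This mirrors the evaluation design in the abstract, where missingness is introduced artificially and imputations are compared against the removed values.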