Online Program

Saturday, February 20
PS3 Poster Session 3 & Continental Breakfast, sponsored by Capital One
Sat, Feb 20, 8:00 AM - 9:15 AM
Ballroom Foyer

Introduction and Comparison of Different Predictive Models Using Incomplete and High-Dimensional Data (303213)

*Jie Yang, Stony Brook University 

Keywords: predictive modeling, classification models, feature selection, data imputation

Predictive modeling to aid decision making has been widely used in many fields, such as chemistry, computer science, physics, economics, finance, and statistics. Many models have been proposed to make accurate predictions. These include regression-based methods (logistic regression, penalized logistic regression, neural networks, partial least squares discriminant analysis), tree-based methods (random forests, decision trees with bagging and/or boosting), and model-free methods (support vector machines, k-nearest neighbors, Naïve Bayes). However, no model consistently outperforms the rest. Many other factors also influence the final model's prediction accuracy, such as the choice of feature selection method and, for incomplete data, the choice of imputation technique. Using a real dataset as an example, we compare the performance of different predictive models for predicting major depressive disorder from brain imaging data (PET and MRI) together with other clinical and patient characteristics, such as genetic information, medical history, age, and gender. Based on the results of this study, we offer general recommendations for the predictive modeling process in practice.
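The kind of comparison described in the abstract (impute missing values, select features, fit several classifiers, compare held-out accuracy) can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the author's analysis: the synthetic data, mean imputation, the correlation-based feature filter, the two classifiers (k-nearest neighbors and Gaussian naive Bayes), and all function names are hypothetical choices made for brevity. One design point is deliberate: imputation and feature selection are refit inside each cross-validation training fold, so the held-out fold never influences them.

```python
import math
import random
from statistics import mean

def make_synthetic(n=200, p=20, n_informative=3, missing_rate=0.1, seed=1):
    """Hypothetical data: the first n_informative features are shifted by 1.5 for
    class 1; each entry is missing (None) with probability missing_rate."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = []
        for j in range(p):
            value = rng.gauss(1.5 * label if j < n_informative else 0.0, 1.0)
            row.append(None if rng.random() < missing_rate else value)
        X.append(row)
        y.append(label)
    return X, y

def impute_means(train, test):
    """Fill None with column means computed from the training rows only."""
    means = []
    for j in range(len(train[0])):
        observed = [row[j] for row in train if row[j] is not None]
        means.append(mean(observed) if observed else 0.0)
    def fill(rows):
        return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]
    return fill(train), fill(test)

def top_k_features(X, y, k):
    """Filter-style selection: keep the k columns with the largest |correlation| to y."""
    y_bar = mean(y)
    var_y = sum((t - y_bar) ** 2 for t in y)
    scores = []
    for col in zip(*X):
        c_bar = mean(col)
        cov = sum((c - c_bar) * (t - y_bar) for c, t in zip(col, y))
        denom = math.sqrt(sum((c - c_bar) ** 2 for c in col) * var_y)
        scores.append(abs(cov) / denom if denom > 0 else 0.0)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def knn_predict(X_tr, y_tr, X_te, k=5):
    """k-nearest neighbors with Euclidean distance and majority vote (k odd)."""
    preds = []
    for x in X_te:
        nearest = sorted(range(len(X_tr)), key=lambda i: math.dist(x, X_tr[i]))[:k]
        votes = sum(y_tr[i] for i in nearest)
        preds.append(1 if 2 * votes > k else 0)
    return preds

def gaussian_nb_predict(X_tr, y_tr, X_te, eps=1e-9):
    """Gaussian naive Bayes: per-class feature means and variances, features independent."""
    stats = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X_tr, y_tr) if t == c]
        cols = list(zip(*rows))
        mus = [mean(col) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + eps for col, m in zip(cols, mus)]
        stats[c] = (math.log(len(rows) / len(X_tr)), mus, variances)
    def log_posterior(x, c):
        log_prior, mus, variances = stats[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, mus, variances)
        )
    return [max((0, 1), key=lambda c: log_posterior(x, c)) for x in X_te]

def cross_val_accuracy(X, y, predict, n_folds=5, n_features=3, seed=0):
    """Stratified k-fold CV; imputation and feature selection are refit inside
    each training fold so the held-out fold never influences them."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for c in (0, 1):
        idx = [i for i, t in enumerate(y) if t == c]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    correct = 0
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        X_tr, X_te = impute_means([X[i] for i in train_idx], [X[i] for i in test_idx])
        y_tr = [y[i] for i in train_idx]
        keep = top_k_features(X_tr, y_tr, n_features)
        X_tr = [[row[j] for j in keep] for row in X_tr]
        X_te = [[row[j] for j in keep] for row in X_te]
        preds = predict(X_tr, y_tr, X_te)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(y)

if __name__ == "__main__":
    X, y = make_synthetic()
    for name, predict in [("k-NN (k=5)", knn_predict), ("Gaussian naive Bayes", gaussian_nb_predict)]:
        print(f"{name}: 5-fold CV accuracy = {cross_val_accuracy(X, y, predict):.3f}")
```

Any other model named in the abstract (for example penalized logistic regression or random forests) could be compared the same way by supplying another function with the same `(X_tr, y_tr, X_te) -> predictions` signature, and the imputation or feature selection step could likewise be swapped to compare those choices. In practice, a library such as scikit-learn provides these models along with pipeline utilities for the same fold-wise fitting.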