Keywords: Machine learning, imbalanced design, multi-level classifier, MRI, Alzheimer's
The growing complexity of big data has led us to develop more advanced data mining algorithms that cater to specific data structures. In this paper, we compare the performance of commonly used algorithms on an imbalanced (multiple minority groups) multi-level classification problem where the sample size 'n' is small. We also evaluate hybrid methodologies. The predictors were brain MRI measures (cortical thickness) in 86 Regions of Interest, used to predict three distinct Alzheimer's disease (AD) subtypes in 121 clinical AD subjects: HpSp (n = 11), Limbic (n = 22), and Typical (n = 88). We tested the following models: a) Lasso, b) Elastic Net, c) Classification and Regression Trees (CART), d) Random Forest, e) Gradient Boosting Machines (GBM), f) Synthetic Minority Over-sampling Technique (SMOTE) with GBM, g) AdaBoost, and h) SMOTE with AdaBoost. Five-fold cross-validation showed that a hybrid two-level SMOTE-GBM outperformed all other algorithms, with an overall accuracy of 81% [80% (Typical), 82% (Limbic), and 85% (HpSp)]. Although multi-class algorithms have been developed to outperform approaches that decompose a problem into multiple two-class problems, hybrid or problem-specific customized tools may still be needed.
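To make the oversampling step concrete, the sketch below shows the core idea of SMOTE: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `smote`, its parameters, and the default `k=5` are hypothetical choices for illustration, and the two-level structure and the GBM stage of the hybrid model are not shown.

```python
import numpy as np

def smote(X_minority, n_synthetic, k=5, rng=None):
    """Hypothetical helper: create synthetic minority samples by SMOTE-style interpolation."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X_minority, dtype=float)
    n = X.shape[0]
    k = min(k, n - 1)  # a point cannot have more neighbours than the other minority points
    # Pairwise Euclidean distances among minority samples only
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude each point from its own neighbour list
    neighbours = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest minority neighbours
    synthetic = np.empty((n_synthetic, X.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)                   # pick a random minority sample
        nb = neighbours[base, rng.integers(k)]   # pick one of its k nearest neighbours
        gap = rng.random()                       # uniform interpolation factor in [0, 1)
        synthetic[i] = X[base] + gap * (X[nb] - X[base])
    return synthetic

# Placeholder data only (random numbers, NOT the study's MRI measurements):
# 11 minority samples with 86 features, oversampled up to the size of the largest class (88).
X_placeholder = np.random.default_rng(0).normal(size=(11, 86))
X_new = smote(X_placeholder, n_synthetic=88 - 11, k=5, rng=1)
print(X_new.shape)  # (77, 86)
```

Because every synthetic point is a convex combination of two real minority points, it stays within the feature ranges of that class. In a cross-validated setting such as five-fold cross-validation, oversampling is usually applied to the training folds only, so that synthetic points derived from held-out subjects do not leak into evaluation.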