Online Program

Friday, February 21
Fri, Feb 21, 5:15 PM - 6:30 PM
Regency EF
Poster Session 2 and Refreshments

Evaluation of Multivariate Classification Models for Analyzing NMR Metabolomics Data (304064)

*Thao T. Vu, University of Nebraska - Lincoln 

Keywords: metabolomics, multivariate, classification models, NMR

Analytical techniques (e.g., NMR and MS) can generate large metabolomics data sets containing thousands of spectral features derived from numerous biological observations. Multivariate data analysis is routinely used to uncover the biological information contained within these data sets by classifying the observations into groups (e.g., control versus treated) and identifying the features that discriminate between those groups. Many classification models are available, including partial least squares (PLS), orthogonal partial least squares (OPLS), and machine learning algorithms (e.g., support vector machines or random forests). However, it is unclear which classification model, if any, is the optimal choice. Herein, we present a comprehensive evaluation of five classification models commonly employed in the metabolomics field, using simulated and experimental NMR data sets with various levels of group separation. Model performance was assessed by prediction accuracy, area under the receiver operating characteristic (ROC) curve, and the identification of true discriminating features. When the models were stressed with subtle group differences, OPLS emerged as the best-performing model.
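The three performance criteria named in the abstract can be made concrete with a minimal sketch for a two-group (e.g., control versus treated) comparison. This is not the authors' evaluation code. The function names are hypothetical, the AUC is computed with the rank-based Mann-Whitney formulation (one standard way to obtain the area under the ROC curve), and scoring feature identification as recall and precision against the known discriminating features of a simulated data set is an assumption, since the abstract does not say how that criterion was quantified.

```python
from typing import Sequence, Set, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Prediction accuracy: fraction of observations assigned to the correct group."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen observation of group 1
    receives a higher score than a randomly chosen observation of group 0,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one observation in each group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def feature_recovery(selected: Set[int], true_features: Set[int]) -> Tuple[float, float]:
    """ASSUMED scoring: recall and precision of the features a model flags as
    discriminating, relative to the features known to differ in simulated data."""
    hits = len(selected & true_features)
    recall = hits / len(true_features) if true_features else 0.0
    precision = hits / len(selected) if selected else 0.0
    return recall, precision


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]                   # hypothetical model scores
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off
    print(accuracy(y_true, y_pred))                  # → 0.75
    print(roc_auc(y_true, scores))                   # → 0.75
    print(feature_recovery({1, 2, 5}, {1, 2, 3, 4})) # recall 0.5, precision 2/3
```

Accuracy depends on the chosen classification threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason an evaluation may report both. Feature recovery can only be scored directly on simulated data, where the truly discriminating features are known by construction.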