
All Times EDT

Abstract Details

Activity Number: 440 - SLDS CSpeed 8
Type: Contributed
Date/Time: Thursday, August 12, 2021 : 4:00 PM to 5:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #318403
Title: Estimating Uncertainty of Machine Learning Predictions Using Bayesian Additive Regression Trees
Author(s): Jeong Hwan Kook* and Andy Liaw and Yuting Xu and Himel Mallick and Vladimir Svetnik
Companies: Merck & Co., Inc. and Merck & Co., Inc. and Merck & Co., Inc. and Merck Research Laboratories and Merck & Co.
Keywords: Machine Learning; QSAR; Uncertainty Estimation; Bayesian Additive Regression Trees; Conditional Coverage
Abstract:

Advances in Machine Learning (ML) have led to highly accurate Quantitative Structure-Activity Relationship (QSAR) models, which predict the biological activity of a molecule from its molecular descriptors. QSAR applications often require quantitative estimates of prediction uncertainty (PU), such as prediction intervals (PIs), alongside the predictions. Because Bayesian Additive Regression Trees (BART) provide estimates of both predictions and PU, we examined BART as a QSAR model. In terms of prediction accuracy, BART underperformed ML algorithms such as Deep Neural Networks (DNN) and Light Gradient Boosting Machine (LGBM). However, estimating PU for these ML algorithms can be difficult because of parameter tuning or methodological constraints. Moreover, the conditional coverage probabilities of these methods have not been studied sufficiently. In this work, we use BART to build a novel PU estimation method that is agnostic to the activity prediction algorithm (e.g., DNN or LGBM). On 30 diverse QSAR datasets, it provides conditional PI estimates that compare favorably with those of alternative methods.
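The abstract does not describe the proposed method itself, so the following is only a hedged illustration of the quantities it evaluates, not the authors' procedure. It forms an equal-tailed prediction interval from posterior predictive draws (the kind of output a BART fit supplies) and contrasts marginal coverage with coverage computed within subgroups, one simple way to probe conditional coverage. All function names are hypothetical, and the grouping variable (for example, bins of predicted activity) is an assumption.

```python
import math


def percentile(sorted_vals, q):
    """Linearly interpolated percentile of an already-sorted list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(math.ceil(pos), len(sorted_vals) - 1)  # guard against float overshoot
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def interval_from_draws(draws, alpha=0.1):
    """Equal-tailed (1 - alpha) interval from one molecule's posterior predictive draws."""
    s = sorted(draws)
    return percentile(s, alpha / 2), percentile(s, 1 - alpha / 2)


def coverage(y_true, intervals):
    """Marginal coverage: fraction of true values that fall inside their interval."""
    hits = [lo <= y <= hi for y, (lo, hi) in zip(y_true, intervals)]
    return sum(hits) / len(hits)


def conditional_coverage(y_true, intervals, groups):
    """Coverage computed separately within each group label (hypothetical grouping)."""
    by_group = {}
    for y, iv, g in zip(y_true, intervals, groups):
        by_group.setdefault(g, []).append((y, iv))
    return {
        g: coverage([y for y, _ in pairs], [iv for _, iv in pairs])
        for g, pairs in sorted(by_group.items())
    }
```

A method can reach the nominal marginal coverage while over-covering some groups and under-covering others; comparing the per-group values against 1 - alpha exposes that gap, which a single marginal number hides.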


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program