Abstract Details

Activity Number: 466 - Statistical Models for Complex Biomedical Data
Type: Contributed
Date/Time: Wednesday, August 2, 2017 : 8:30 AM to 10:20 AM
Sponsor: Biometrics Section
Abstract #323543
Title: Building Quantitative Structure-Activity Relationship Models Using Bayesian Additive Regression Trees
Author(s): Dai Feng* and Matthew Pratola and Robert McCulloch and Vladimir Svetnik
Companies: Merck and The Ohio State University and Arizona State University and Merck & Co., Inc.
Keywords: QSAR ; BART ; prediction error ; variable importance ; interval estimation
Abstract:

Quantitative structure-activity relationship (QSAR) modeling is a widely used technique for predicting the biological activity of a molecule from the information contained in its molecular descriptors. The large number of compounds and descriptors, together with the sparseness of the descriptors, poses important challenges to traditional statistical methods, and machine learning (ML) algorithms such as random forest (RF) have been used in this field. Recently, Bayesian Additive Regression Trees (BART) has been demonstrated to be competitive with widely used ML approaches. Rather than focusing solely on accurate point estimation, BART is formulated entirely within a hierarchical Bayesian modeling framework, which allows one to quantify uncertainty and hence to provide not only point estimates but also interval estimates for a variety of quantities of interest. We studied BART as a model builder for QSAR and demonstrated that the approach provides results competitive with those of RF. More importantly, we investigated BART's natural capability to generate interval estimates not only for molecular activities but also for the variable importance of different descriptors, which cannot be easily obtained through other approaches.
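
As an illustration of how interval estimates follow from a Bayesian fit, the sketch below summarizes a matrix of posterior draws into a point estimate and an equal-tailed credible interval per molecule. It is a minimal example under stated assumptions, not the authors' implementation: the function name `summarize_posterior` is hypothetical, and the draws are simulated with NumPy as a stand-in for the output of a BART sampler.

```python
import numpy as np


def summarize_posterior(draws, level=0.95):
    """Summarize posterior draws of shape (n_draws, n_items).

    Returns the posterior mean and the lower and upper bounds of an
    equal-tailed credible interval for each item (column).
    """
    draws = np.asarray(draws, dtype=float)
    alpha = 1.0 - level
    mean = draws.mean(axis=0)
    lower = np.quantile(draws, alpha / 2, axis=0)
    upper = np.quantile(draws, 1.0 - alpha / 2, axis=0)
    return mean, lower, upper


# ASSUMPTION: simulated draws standing in for posterior samples of the
# predicted activity of three molecules; a real analysis would take these
# from a fitted BART model instead.
rng = np.random.default_rng(0)
fake_draws = rng.normal(loc=[5.0, 6.5, 7.2], scale=[0.3, 0.1, 0.5], size=(2000, 3))

mean, lower, upper = summarize_posterior(fake_draws)
for i in range(mean.size):
    print(f"molecule {i}: mean={mean[i]:.2f}, 95% interval=({lower[i]:.2f}, {upper[i]:.2f})")
```

The same column-wise summary could be applied to posterior draws of a per-descriptor importance measure, which is how interval estimates for variable importance would be read off in this setting.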


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 