Abstract:
|
Quantitative structure-activity relationship (QSAR) modeling is a widely used technique for predicting the biological activity of a molecule from the information contained in its molecular descriptors. The large number of compounds and descriptors, together with the sparseness of the descriptors, poses important challenges to the traditional statistical methods and machine learning (ML) algorithms, such as random forest (RF), used in this field. Recently, Bayesian Additive Regression Trees (BART) has been demonstrated to be competitive with widely used ML approaches. Rather than focusing solely on accurate point estimation, BART is formulated entirely within a hierarchical Bayesian modeling framework, which allows one to quantify uncertainty and hence to provide not only point but also interval estimates for a variety of quantities of interest. We studied BART as a model builder for QSAR and demonstrated that the approach provides competitive results in comparison with RF. More importantly, we investigated BART's natural capability to generate interval estimates not only for molecular activities but also for the variable importance of different descriptors, which are not easily obtained through other approaches.
|