Abstract:
|
Our ability to predict biological function (phenotype) from genetic background (genotype) impacts numerous biological domains. Black-box neural networks are currently the dominant modeling choice because of their unsurpassed accuracy in out-of-sample prediction. These models, however, inherently cannot explain their predictions. As an alternative, we developed LANTERN, a hierarchical Bayesian model that is interpretable by design. LANTERN learns a low-dimensional latent space in which mutations combine additively; the dimensionality is learned automatically from the data through a hierarchical prior on the variance of each dimension. The latent phenotype is then mapped to observed phenotype measurements through a smooth, nonlinear surface, which we learn with a nonparametric Gaussian process prior. To scale to large datasets, we adopt a stochastic variational inference approach. By design, LANTERN's predictions decompose into interpretable components. Despite this simplicity, LANTERN matches or exceeds the predictive accuracy of neural networks across multiple large-scale measurements.
|