Abstract:
Our genomes are a cryptic record of human history, and decoding the patterns of natural selection left by adaptation to changing diets, climates, and pathogens requires sophisticated statistical tools. First, we develop SWIFr, a supervised classifier built using Averaged One-Dependence Estimation (AODE), which produces calibrated probabilities of adaptation at individual mutations in the genome. SWIFr outperforms other machine-learning approaches in simulation and successfully identifies well-known adaptive mutations, while also making predictions in less well-studied genomic regions. A common issue in genome scans such as this one is that many predictions are difficult to interpret, because selection signals can be diffuse across a local region. To address this, we implement a hidden Markov model (HMM) that takes advantage of the spatial structure along the genome, and we use backtraces through the HMM to compute region-level probabilities of adaptation that hold regardless of whether the signature of adaptation can be localized. The HMM framework further allows us to generate measures of uncertainty that are commonly absent from similar genome scans.
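To make the idea of region-level probabilities from HMM backtraces more concrete, here is a minimal sketch under stated assumptions. It is not the thesis implementation. It assumes a hypothetical two-state HMM (neutral vs. adaptive) with made-up transition and initial probabilities (`DEFAULT_TRANS`, `DEFAULT_INIT`). It also assumes per-site emission likelihoods are already available; how classifier outputs would be converted into these emissions is not specified here. "Backtraces" is interpreted as stochastic backtracing, that is, forward filtering followed by backward sampling of complete state paths. The probability that a region contains adaptation is then estimated as the fraction of sampled paths with at least one adaptive site in that region. All function and constant names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical two-state model: state 0 = neutral, state 1 = adaptive.
NEUTRAL, ADAPTIVE = 0, 1
# Illustrative parameters only (assumptions, not fitted values).
DEFAULT_TRANS = [[0.99, 0.01], [0.10, 0.90]]  # trans[from][to]
DEFAULT_INIT = [0.99, 0.01]


def forward_filter(
    emissions: Sequence[Tuple[float, float]],
    trans: Sequence[Sequence[float]],
    init: Sequence[float],
) -> List[List[float]]:
    """Return normalized forward probabilities P(state_t | obs_0..t) for each site."""
    alphas: List[List[float]] = []
    prev = None
    for e in emissions:
        if prev is None:
            raw = [init[s] * e[s] for s in range(2)]
        else:
            raw = [e[s] * sum(prev[r] * trans[r][s] for r in range(2)) for s in range(2)]
        total = sum(raw)
        prev = [x / total for x in raw]  # normalize each step to avoid underflow
        alphas.append(prev)
    return alphas


def sample_path(alphas, trans, rng: random.Random) -> List[int]:
    """Draw one hidden-state path from the posterior by backward sampling."""
    n = len(alphas)
    path = [NEUTRAL] * n
    path[-1] = rng.choices(range(2), weights=alphas[-1])[0]
    for t in range(n - 2, -1, -1):
        nxt = path[t + 1]
        # P(s_t = r | s_{t+1} = nxt, obs_0..t) is proportional to alpha_t(r) * trans[r][nxt]
        weights = [alphas[t][r] * trans[r][nxt] for r in range(2)]
        path[t] = rng.choices(range(2), weights=weights)[0]
    return path


def region_probability(
    emissions, start, end, trans=DEFAULT_TRANS, init=DEFAULT_INIT, n_samples=2000, seed=0
) -> float:
    """Estimate P(at least one adaptive site in [start, end)) from sampled backtraces."""
    rng = random.Random(seed)
    alphas = forward_filter(emissions, trans, init)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(alphas, trans, rng)
        if any(s == ADAPTIVE for s in path[start:end]):
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    # Toy emissions (neutral likelihood, adaptive likelihood); site 2 favours adaptation.
    toy = [(0.9, 0.1), (0.8, 0.2), (0.05, 0.95), (0.8, 0.2), (0.9, 0.1)]
    print(region_probability(toy, start=1, end=4))
```

Because the estimate is taken over whole sampled paths, it does not require deciding which single site carries the signal. This matches the motivation of reporting a region-level probability even when the signal is diffuse. The spread of estimates across seeds or sample sizes also gives a rough sense of Monte Carlo uncertainty.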