Online Program

All Times EDT

Friday, June 5
Practice and Applications
Practice and Applications 3
Fri, Jun 5, 1:25 PM - 3:00 PM
TBD
 

Statistical Inference of Adaptive Mutations and Genes from Worldwide Genome Sequences (308367)

Sohini Ramachandran, Brown University 
*Lauren Alpert Sugden, Duquesne University 

Keywords: Classification, Hidden Markov Model, Population Genetics

Throughout human history, our species has encountered a vast range of climate conditions, fought off novel pathogens, and adapted to myriad lifestyles and diets. This intimate interaction with our environment molded our genomes through the generations via natural selection, leaving a written -- if cryptic -- record in the genomes of present-day individuals around the world. The wealth of genomic data we now have presents a promising opportunity to decrypt this record in order to infer our evolutionary past. I describe here two statistical inference frameworks toward that end, with the goal of identifying adaptive mutations and genes in the human lineage.

I will first describe a published classification method that uses Averaged One-Dependence Estimation to produce calibrated probabilities of adaptation at every mutation in the genome. This supervised classifier, called SWIFr, is trained on demographic data for a range of evolutionary parameters, and shows strong performance on simulated test data. In application to human data, our classifier identifies well-known adaptive mutations, and provides many other predictions in less well-studied genomic regions. An application to genome data from the ǂKhomani San of southern Africa found an enrichment of adaptation in genes associated with energy storage and metabolism, suggesting that fat storage has played an important role in the evolution of this hunter-gatherer population.
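
As a rough illustration of the Averaged One-Dependence Estimation idea mentioned above, the following minimal Python sketch averages over "one-dependence" models, each of which lets every feature depend on the class and on one other feature. It is not the SWIFr implementation: the class name `ToyAODE`, the use of discretized features, the Laplace smoothing constant, and the toy training rows are all assumptions made for this example.

```python
from collections import Counter


class ToyAODE:
    """Toy Averaged One-Dependence Estimator for discrete features (hypothetical)."""

    def __init__(self, n_values, n_classes=2, alpha=1.0):
        self.n_values = n_values    # number of discrete levels of each feature
        self.n_classes = n_classes
        self.alpha = alpha          # Laplace smoothing constant (assumed)
        self.pair = Counter()       # counts of (class, i, x_i)
        self.triple = Counter()     # counts of (class, i, x_i, j, x_j)
        self.n = 0

    def fit(self, X, y):
        for row, label in zip(X, y):
            self.n += 1
            for i, xi in enumerate(row):
                self.pair[(label, i, xi)] += 1
                for j, xj in enumerate(row):
                    if j != i:
                        self.triple[(label, i, xi, j, xj)] += 1
        return self

    def _joint(self, row, label):
        # Average over "parent" features i of P(y, x_i) * prod_j P(x_j | y, x_i).
        a = self.alpha
        total = 0.0
        for i, xi in enumerate(row):
            parent = self.pair[(label, i, xi)]
            p = (parent + a) / (self.n + a * self.n_classes * self.n_values[i])
            for j, xj in enumerate(row):
                if j != i:
                    child = self.triple[(label, i, xi, j, xj)]
                    p *= (child + a) / (parent + a * self.n_values[j])
            total += p
        return total / len(row)

    def predict_proba(self, row):
        # Normalize the averaged joint estimates into class probabilities.
        joint = [self._joint(row, c) for c in range(self.n_classes)]
        z = sum(joint)
        return [v / z for v in joint]


# Toy data: three binary "summary statistics" per site; class 1 = adaptive.
X = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 1, 1], [1, 1, 0], [1, 0, 1]]
y = [0, 0, 0, 1, 1, 1]
model = ToyAODE(n_values=[2, 2, 2]).fit(X, y)
probs = model.predict_proba([1, 1, 1])
```

Here `probs[1]` plays the role of a per-site probability of adaptation. A real application would work with continuous selection statistics and simulated training data rather than these hand-written rows.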

Second, I will describe ongoing work in which we use a hidden Markov model to reduce the noise of the site-by-site predictions returned by SWIFr and other methods. This approach has the added benefit of providing a way to calculate measures of uncertainty about the specific locations of adaptive mutations, as well as region-level probabilities of selection. We accomplish this using a stochastic backtrace to sample thousands of probabilistically representative state paths.
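
The stochastic backtrace described above can be sketched as forward filtering followed by backward sampling. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: the two-state layout (0 = neutral, 1 = adaptive), the function names, and the toy emission likelihoods and transition probabilities are all hypothetical.

```python
import random


def forward(obs_lik, init, trans):
    """Scaled forward pass; obs_lik[t][k] = P(observation at site t | state k)."""
    n_states = len(init)
    first = [init[k] * obs_lik[0][k] for k in range(n_states)]
    total = sum(first)
    alphas = [[a / total for a in first]]
    for t in range(1, len(obs_lik)):
        cur = [
            obs_lik[t][k] * sum(alphas[-1][j] * trans[j][k] for j in range(n_states))
            for k in range(n_states)
        ]
        total = sum(cur)
        alphas.append([a / total for a in cur])
    return alphas


def sample_path(alphas, trans, rng):
    """Stochastic backtrace: draw one state path from P(path | all observations)."""
    n_states = len(trans)
    path = [0] * len(alphas)
    path[-1] = rng.choices(range(n_states), weights=alphas[-1])[0]
    for t in range(len(alphas) - 2, -1, -1):
        nxt = path[t + 1]
        # P(z_t = j | z_{t+1} = nxt, x_1..t) is proportional to alpha_t(j) * trans[j][nxt]
        weights = [alphas[t][j] * trans[j][nxt] for j in range(n_states)]
        path[t] = rng.choices(range(n_states), weights=weights)[0]
    return path


def region_probability(paths, start, end, state=1):
    """Fraction of sampled paths that visit `state` anywhere in sites [start, end)."""
    return sum(1 for p in paths if state in p[start:end]) / len(paths)


# Toy per-site likelihoods (assumed values); site 3 is impossible under "neutral".
obs_lik = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.0, 1.0], [0.5, 0.5], [0.9, 0.1]]
init = [0.95, 0.05]
trans = [[0.9, 0.1], [0.3, 0.7]]

rng = random.Random(0)
alphas = forward(obs_lik, init, trans)
paths = [sample_path(alphas, trans, rng) for _ in range(2000)]
p_region = region_probability(paths, 2, 5)
```

Because each sampled path is a draw from the posterior over state sequences, summaries such as `region_probability`, or the spread of positions at which the adaptive state first appears, give region-level probabilities and location uncertainty directly from the sample.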