Abstract:
|
The recent availability of large genomic studies, with tens of thousands of observations, opens up the intriguing possibility of investigating and understanding the effect of rare genetic variants in human biological evolution, as well as their impact on the development of rare diseases. To do so, it is imperative to develop a statistical framework to assess what fraction of the overall variation present in the human genome is not yet captured by available data sets. In this work, we introduce a novel and rigorous methodology, based on a nonparametric Bayesian approach, to estimate how many new genetic variants are yet to be observed in genomic projects. We show how to use the estimator for optimal experimental design, in which, under a fixed budget, one needs to specify the optimal trade-off between the number of individuals sequenced and the sequencing depth in order to maximize the expected number of genetic variants discovered.
|