Abstract:
|
While the cost of sequencing genomes has decreased dramatically in recent years, this expense often remains non-trivial. Under a fixed budget, scientists face a natural trade-off between quantity and quality: they can spend resources either to sequence more individuals or to sequence with greater accuracy. Optimizing resource allocation promises to reveal as many new variations in the genome as possible, and thus as many new scientific insights as possible. We consider the setting where scientists have conducted a pilot study to reveal genomic variants and are contemplating a follow-up study. We introduce a Bayesian nonparametric methodology to predict the number of new variants in the follow-up study based on the pilot study. When experimental conditions are kept constant between the pilot and the follow-up, we show on real data from the gnomAD project that our prediction is more accurate than three recent proposals and competitive with a more classic proposal. Unlike other methods, though, our method allows practitioners to change experimental conditions between the pilot and the follow-up, enabling more realistic predictions and optimal allocation of a fixed budget between quality and quantity.
|