Abstract:
|
Phylodynamic analysis infers changes in population size from genetic sequence data sampled from individuals across a population. One approach uses a model called the coalescent, which relates the individuals' shared genealogy to the effective size of their population. However, when individuals are sampled at different times, current techniques assume that the sampling times are either fixed ahead of time or distributed randomly without any relationship to the size of the population. Through simulation, we show that when sampling times are related to population size (preferential sampling), those estimation methods may be systematically biased. To address this problem, we propose new methods that explicitly model the sampling times, potentially with a relationship to effective population size, resolving the misspecification. We also incorporate optional time-varying covariates into the sampling model. We implement a method based on integrated nested Laplace approximation for computational efficiency, as well as methods based on Markov chain Monte Carlo, with an eye toward relaxing our current assumption of a fixed genealogy relating the sampled individuals.
|