Abstract:
|
Typical linkage disequilibrium (LD) estimators are attenuated toward zero in the presence of genotype uncertainty, masking dependence between loci. This attenuation is particularly strong in polyploid data, where genotype uncertainty is more prevalent. Previous approaches to adjust for this bias include maximum likelihood approaches and moment-based approaches that use posterior genotypes. The maximum likelihood approaches are prohibitively slow, limiting their applicability to modern large-scale datasets. The approaches that use posterior genotypes, though scalable, require the researcher to have used genotyping methods that adaptively estimate the genotype prior with a sufficiently large sample size. Unfortunately, large sample sizes are not guaranteed in applied studies, and many popular genotyping programs, such as the widely used Genome Analysis Toolkit, are not fully adaptive. Here, we present scalable approaches that require only genotype likelihoods, obviating the need for adaptive genotyping. We show that our new methods are as accurate as previous approaches for LD estimation, opening up scalable bias-corrected LD estimates to a wider range of applied researchers.
|