Abstract:
Advances in high-throughput sequencing technologies have provided unprecedented insight into the vast number of rare variants across the human genome. Recent work has shown that rare variants are often more geographically clustered than common variants and may therefore be a powerful resource for delineating fine-scale population structure. Principal components analysis (PCA) with EIGENSTRAT (Price et al., 2006) has been the prevailing approach in recent years for inferring (and correcting for) population structure using common variants from high-density single nucleotide polymorphism (SNP) genotyping data. However, rare variants can cause EIGENSTRAT to fail because the genetic relatedness matrix (GRM) on which the PCA is based becomes unstable. As an alternative, we propose PCA-seq, a method that uses a modified GRM to appropriately incorporate extremely low-frequency variants into inference about population structure. Simulations demonstrate that PCA-seq substantially improves on EIGENSTRAT for inference with rare variants. Applied to data from the 1000 Genomes Project, PCA-seq identifies population structure patterns among European populations with rare variants that differ from those identified with common variants.
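To illustrate one way rare variants can distort a standardized GRM, the sketch below builds an EIGENSTRAT-style matrix: each variant column is centred on its mean genotype and divided by sqrt(p(1 - p)), with the allele-frequency estimate p = (1 + allele count) / (2 + 2n) as described for EIGENSTRAT, and the GRM is the cross-product averaged over variants. The helper name `standardized_grm` and the toy data are hypothetical, this is a simplified illustration rather than the EIGENSOFT implementation, and the modified GRM used by PCA-seq is not specified in this abstract and is not shown.

```python
import numpy as np


def standardized_grm(genotypes):
    """Hypothetical helper: EIGENSTRAT-style GRM from an n x m genotype matrix.

    Genotypes are coded 0/1/2 (copies of the minor allele); rows are
    individuals and columns are variants.
    """
    g = np.asarray(genotypes, dtype=float)
    n, m = g.shape
    mu = g.mean(axis=0)  # mean genotype of each variant (column centring)
    # Allele-frequency estimate (1 + allele count) / (2 + 2n); it stays strictly
    # between 0 and 1, so monomorphic columns never cause a division by zero.
    p = (1.0 + g.sum(axis=0)) / (2.0 + 2.0 * n)
    z = (g - mu) / np.sqrt(p * (1.0 - p))  # centred and scaled genotypes
    return z @ z.T / m  # n x n relatedness matrix averaged over variants


# Toy data: one singleton, carried (heterozygous) only by individual 0 of 100.
singleton = np.zeros((100, 1))
singleton[0, 0] = 1
grm = standardized_grm(singleton)
print(round(grm[0, 0], 2), round(grm[1, 1], 4))  # 99.98 0.0102
```

Because the scale factor sqrt(p(1 - p)) shrinks toward zero as p does, a singleton contributes roughly n to its carrier's diagonal entry, whereas a variant at allele frequency near 0.5 contributes at most about 4. With many rare variants, a few individuals carrying private variants can therefore dominate the matrix, and the leading eigenvectors may reflect those individuals rather than shared population structure.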
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.