Abstract:
|
Population stratification is routinely accounted for in genome-wide association studies using principal components (PCs) of genetic ancestry-informative markers (AIMs). It is unclear how best to account for population stratification in DNA methylation studies when genotype information is not available. We propose using gap hunting and population-specific minor allele frequencies to identify DNA methylation AIMs (mAIMs): CpG sites on the Illumina 450k or EPIC array where the methylation readout is affected by an underlying genetic AIM. We used this process to identify mAIMs in several public datasets with racially and ethnically diverse participants, some with longitudinal DNA methylation data available. We then computed PCs from these mAIMs to estimate ancestry. Compared with two existing DNA methylation ancestry estimation tools (EPISTRUCTURE and SeSAMe), our approach gives more consistent ancestry estimates within the same individual over time, captures the increased genetic diversity of admixed populations, and predicts self-reported race more accurately.
|