Abstract:
|
Estimating more distant familial relationships from dense genotype data is particularly challenging in samples from ethnically complex populations, such as recently admixed Latinos or African Americans. With the growing emphasis on such populations in genetic studies, producing relatedness or kinship coefficient estimates that are robust to different types of population structure is increasingly important. We evaluate existing approaches for inferring familial relatedness from dense genetic markers that can accommodate complex population structure. Starting from the 1000 Genomes Phase 3 Latino samples, we construct a simulated Latino dataset that, after only one additional generation of random mating, has allele frequencies and linkage disequilibrium patterns similar to those of contemporary Latino populations. The 3,500 simulated samples exhibit complex population structure, including recent admixture, distinct sub-populations, and distant but known familial-like relatedness. Several recent relationship inference methods are compared on this challenging dataset, starting from the 25,235,037 markers with a minor allele frequency of at least 0.01.
|