Abstract:
Genetic association studies of common and rare variants typically collect case and control samples matched by genetic ancestry. As an alternative, existing repositories hold genetic data from tens of thousands of potential control samples. Using these data is challenging for two reasons: privacy concerns mean that genotype data cannot be shared directly, yet the controls must be chosen so that they are comparable in genetic ancestry to the particular case sample. Our proposed approach, the Universal Control Repository Network (UNICORN), aims to provide allele frequency information that is optimally matched to the case sample without compromising privacy. We use spectral clustering both to construct ancestry spaces and to perform projections. The base space and projected controls are then used to estimate the allele frequency surface over the ancestry space. To identify small-scale frequency variation while also borrowing strength from the entire data set, we combine empirical Bayesian analysis across a hierarchical clustering of the controls with, for localized ancestry regions, a Gaussian process model of the minor allele frequency.
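
The idea of borrowing strength across control clusters can be illustrated with a minimal sketch. It is not the UNICORN model: it assumes a flat set of ancestry clusters rather than a hierarchy, uses a simple beta-binomial prior fitted by the method of moments, and omits the Gaussian process step. The function name `shrink_cluster_frequencies` and its inputs are hypothetical and chosen only for illustration.

```python
from statistics import mean, pvariance


def shrink_cluster_frequencies(minor_counts, allele_totals):
    """Shrink per-cluster minor allele frequencies toward the pooled frequency.

    Illustrative assumption: one variant, a flat list of ancestry clusters.
    minor_counts[i]  -- minor-allele count observed in cluster i
    allele_totals[i] -- alleles sampled in cluster i (2 per diploid individual)
    """
    raw = [k / n for k, n in zip(minor_counts, allele_totals)]
    pooled = sum(minor_counts) / sum(allele_totals)

    # Method of moments: variance of the raw frequencies minus the average
    # binomial sampling variance estimates the between-cluster variance.
    sampling_var = mean(pooled * (1 - pooled) / n for n in allele_totals)
    between_var = max(pvariance(raw) - sampling_var, 1e-12)

    # Beta(a, b) prior with mean `pooled` and variance `between_var`;
    # a + b acts as a prior sample size.
    prior_strength = max(pooled * (1 - pooled) / between_var - 1, 0.0)
    a = pooled * prior_strength
    b = (1 - pooled) * prior_strength

    # Posterior mean of the frequency under the beta-binomial model.
    return [(k + a) / (n + a + b) for k, n in zip(minor_counts, allele_totals)]


if __name__ == "__main__":
    # Made-up counts for three clusters; the middle cluster is very small.
    print(shrink_cluster_frequencies([20, 3, 120], [400, 20, 400]))
```

In this sketch a cluster with few sampled alleles is pulled strongly toward the pooled frequency, while large clusters keep estimates close to their raw frequencies, which is the sense in which small groups borrow strength from the whole data set.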
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.