Abstract:
|
In sequencing association studies, the phenotype distribution frequently exhibits complex variation over geographical space. Since rare variants have typically arisen in the recent past and tend to cluster geographically, any rare variant particular to a region of elevated phenotypic mean will appear to be associated with the phenotype, regardless of its actual biological relevance. Such population stratification may not be corrected by popular methods such as principal components (PCs) adjustment and linear mixed models (LMMs), and can therefore yield spurious associations. Here, we propose to account for the spatial variation in phenotypic mean using a natural thin plate spline based on the top two PCs. We show that the resulting smoother can be embedded in an LMM, the variance component of which allows for simultaneous adjustment for nonlinear spatial variation in phenotypic mean, broad-scale population structure, and cryptic relatedness. We derive SNP-set association tests for this spline-embedded LMM, demonstrate through simulation studies that our method effectively controls for population stratification, and illustrate its application using the UK10K data set.
|