Abstract:
|
The computational simplicity of principal component analysis (PCA) makes it a widely used method for population stratification adjustment. However, given the categorical nature of genotype data, it is not appropriate to directly apply PCA, which is designed specifically for continuous variables, to genotype data. In addition, although common variants have been extensively studied, little is known about the stratification of rare variants and its impact on association tests. Because rare variants are not stratified in the same way as common variants, statistical methods are needed that can capture stratification patterns for low-frequency and rare variants. To address these limitations, we investigate the performance of categorical PCA and similarity-matrix based PCA, which might be able to detect underlying structures for rare variants. We demonstrate, through simulated and real data sets, that similarity-matrix based PCA is able to adjust for population stratification in rare variants much more effectively than do standard and categorical PCA.
|