Abstract:
|
Well-powered analysis of gene expression data is hampered by hidden confounders and other unknown covariates. Approaches for discovering these confounders, such as principal components analysis (PCA), assume that samples are independent. This assumption, however, is violated when expression traits are polygenic and the sample has a nonzero level of population structure. Applying PCA, or PCA-based methods, to such samples yields estimated hidden covariates that are a mixture of true hidden confounders and genetic effects. Here, I apply PCA to an expression data set from an isolated population, find that the first 200 PCs are substantially heritable, and show that using these PCs as covariates can substantially reduce estimates of the heritability of the expression traits; that is, genetic signal is removed from the expression data. I also show that my new method does not suffer from this problem. Using simulations, I study how the different approaches affect eQTL effect-size estimates, type I error, and power.
|
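A toy simulation can make the mechanism above concrete. The sketch below is not the new method described in this work, and it does not reproduce the analysis of the isolated-population data set. It assumes a two-subpopulation Balding-Nichols genotype model, purely additive polygenic effects, a single non-genetic hidden confounder, and arbitrary parameter values; the function names (`simulate_structured_expression`, `expression_pcs`, `genetic_share_removed`) are hypothetical. Under these assumptions the top expression PCs track both the hidden confounder and population membership, so regressing them out also removes part of the genetic variance.

```python
import numpy as np


def simulate_structured_expression(n_per_pop=200, n_snps=2000, n_genes=500,
                                   fst=0.1, h2=0.5, seed=0):
    """Hypothetical toy model: two subpopulations, polygenic expression, one hidden confounder."""
    rng = np.random.default_rng(seed)
    # Balding-Nichols model (assumed): subpopulation allele frequencies scatter
    # around an ancestral frequency with variance fst * p * (1 - p)
    p_anc = rng.uniform(0.1, 0.9, n_snps)
    a, b = p_anc * (1 - fst) / fst, (1 - p_anc) * (1 - fst) / fst
    p1, p2 = rng.beta(a, b), rng.beta(a, b)
    geno = np.vstack([rng.binomial(2, p1, (n_per_pop, n_snps)),
                      rng.binomial(2, p2, (n_per_pop, n_snps))]).astype(float)
    geno -= geno.mean(axis=0)  # center each SNP
    pop = np.repeat([0.0, 1.0], n_per_pop)  # subpopulation label per sample

    # Polygenic genetic values: every SNP has a small random effect on every gene
    gval = geno @ rng.normal(size=(n_snps, n_genes))
    gval *= np.sqrt(h2) / gval.std(axis=0)  # genetic variance = h2 per gene

    # One hidden, non-genetic confounder with random per-gene loadings, plus noise
    z = rng.normal(size=2 * n_per_pop)
    confounding = np.outer(z, rng.normal(0.0, 0.5, n_genes))
    noise = rng.normal(0.0, np.sqrt(1 - h2), gval.shape)
    return gval + confounding + noise, gval, pop, z


def expression_pcs(expr, k):
    """Top-k PCs (orthonormal sample scores) of a samples-by-genes matrix."""
    centered = expr - expr.mean(axis=0)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k]


def genetic_share_removed(gval, pcs):
    """Fraction of genetic variance lying in the span of `pcs`,
    i.e. what regressing out those PCs would remove."""
    g = gval - gval.mean(axis=0)
    projected = pcs @ (pcs.T @ g)
    return float((projected ** 2).sum() / (g ** 2).sum())


if __name__ == "__main__":
    expr, gval, pop, z = simulate_structured_expression()
    pcs = expression_pcs(expr, k=2)
    for i in range(2):
        r_pop = abs(np.corrcoef(pcs[:, i], pop)[0, 1])
        r_z = abs(np.corrcoef(pcs[:, i], z)[0, 1])
        print(f"PC{i + 1}: |r| with population = {r_pop:.2f}, with confounder = {r_z:.2f}")
    print(f"share of genetic variance removed: {genetic_share_removed(gval, pcs):.3f}")
```

In this toy setting one PC mostly captures the hidden confounder and another mostly captures ancestry, so treating both as "hidden confounders" discards the ancestry-correlated part of each gene's genetic signal.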