Abstract:
|
Expression quantitative trait loci (eQTL) studies seek associations between genetic variants and gene expression across individuals, in an effort to determine the extent to which genetic diversity underlies phenotypic diversity. It is common practice in eQTL analyses to control for technical sources of variation in gene expression measurements. When these extra sources of variation are not accounted for, the true effects of genetic variants on gene expression can be obscured. Existing methods, including the statistical models PEER/VBQTL and EMMAX as well as more heuristic procedures based on principal components, attempt to identify and remove large-scale structure in gene expression measurements that represents technical variation. However, there is little discussion of, and no consensus on, how to determine the optimal normalization or how much variation should be removed, as measured by the number of latent factors or principal components. We apply existing methods to several eQTL datasets and to simulated data to determine how varying the number of factors removed affects sensitivity and error rate control.
|