Abstract:
|
In genetics, the heritability of a trait is the proportion of phenotypic variance that can be explained by genetic factors. Recently, there has been increased interest in estimating genomic heritability, that is, the proportion of variance of a trait or of disease risk that can be explained by regression on large sets of molecular markers (e.g., SNPs). The debate about methodology has been based largely on results from simulation studies, which, depending on the simulation settings, can yield anything from nearly unbiased to seriously biased estimates. The recent availability of very large biomedical datasets presents numerous opportunities for assessing the sampling properties of restricted maximum likelihood (REML) estimates. In this study, we use real data from UK-Biobank (N~100K, K=1000) to investigate the effects of sample size and model complexity (number of SNPs, from 5K to ~600K) on estimates of genomic heritability, using human height as an example trait. We use recursive partitioning of the training data and show that the average estimate of genomic heritability decreases with sample size; we conclude that the popular REML estimates of genomic heritability can be seriously biased.
|