Abstract:
|
In genome-wide association studies (GWAS), a phenotype (the response) is related to a large number of single nucleotide polymorphisms (SNPs), in our case about 1,000,000 of them. The goal is to estimate the marginal distribution of the effect sizes obtained by regressing the phenotype on each individual SNP. This distribution helps determine the sample sizes needed to explain most of the heritability of individual phenotypes. With a little insight, the task can be seen to be a classical measurement error density deconvolution problem, but with multiple constraints that have not previously been considered: (a) the measurement error is heteroscedastic; (b) the distribution of the true signal is symmetric and unimodal; and (c) most of the true effects are close to zero (most SNPs do not affect the response), while a few are genuine signals. We develop a computationally efficient Bayesian methodology to handle this problem and show that it is vastly superior to current measurement error density deconvolution methods, which take none of these important features into account.
|
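
To make the setup in the abstract concrete, the sketch below simulates data with the three features (a) to (c). It is only an illustrative toy model, not the authors' method: the function name `simulate_gwas_summary`, the two-component normal mixture for the true effects, the uniform range for the standard errors, and all numeric values are assumptions chosen for illustration.

```python
import random


def simulate_gwas_summary(n_snps=10_000, null_fraction=0.99, seed=0):
    """Simulate per-SNP effect estimates under an assumed toy model.

    Returns three lists of length n_snps: observed estimates, their
    standard errors, and the (normally unobserved) true effects.
    """
    rng = random.Random(seed)
    estimates, std_errors, true_effects = [], [], []
    for _ in range(n_snps):
        # (c) Most SNPs have an effect very close to zero; a few are signals.
        # (b) Both components are zero-mean normals, so the mixture is
        #     symmetric and unimodal about zero.
        scale = 0.001 if rng.random() < null_fraction else 0.05
        beta = rng.gauss(0.0, scale)
        # (a) Heteroscedastic measurement error: each SNP has its own
        #     standard error (assumed known, as in summary statistics).
        se = rng.uniform(0.005, 0.02)
        true_effects.append(beta)
        std_errors.append(se)
        estimates.append(beta + rng.gauss(0.0, se))
    return estimates, std_errors, true_effects


if __name__ == "__main__":
    est, se, beta = simulate_gwas_summary()
    print(len(est), min(se), max(se))
```

In this toy version, the deconvolution task is to recover the distribution of `true_effects` given only `estimates` and `std_errors`.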