Abstract:
|
Genome-wide association studies based on modern biobanks comprise data on hundreds of thousands of samples and millions of genomic markers linked to extensive phenotypes. Marker density and recombination rates vary across the genome, producing complex patterns of linkage disequilibrium among SNPs. Variable selection and inference in this setting are challenging because collinearity reduces the power to identify individual variants associated with a phenotype. We therefore develop efficient multi-resolution Bayesian feature selection methods that identify sets of variants confidently associated with a phenotype, providing powerful inference with accurate false discovery rate (FDR) control and fine-mapping resolution. In this study, we: (i) present a multi-resolution Bayesian inference procedure; (ii) propose an algorithm that directly controls the FDR of the discovery set; (iii) justify theoretically that the proposed algorithm provides adequate FDR control; (iv) compare, using simulations, the power-FDR performance and mapping precision of the proposed method with those of existing methods; and (v) apply the methods to fine-mapping of complex traits using data from the UK Biobank.
|