Abstract:
|
In this paper, we consider the detection of signal regions associated with disease outcomes in whole-genome association studies. Existing gene- or region-based methods test for association between an outcome and the genetic variants in a pre-specified region, such as a gene. Because whole-genome association studies also cover vast intergenic regions, we propose a p-value scan statistic-based method that detects both the location and the size of signal regions by scanning the genome continuously. The proposed method accounts for the correlation (linkage disequilibrium) among genetic variants and allows signal regions to contain both causal and neutral variants. We performed simulation studies to evaluate the finite-sample performance of the proposed method. The simulation results showed that the proposed procedure outperforms existing methods, especially when causal variants in a signal region have effects in different directions, when signal regions are contaminated with neutral variants, or when the variants in signal regions are correlated. We applied the proposed method to whole-genome sequencing data to identify genetic regions associated with heart- and blood-related traits.
|
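To make "scanning the genome continuously" concrete, here is a minimal sketch of a generic p-value scan over contiguous windows of variants. It is an illustration under stated assumptions, not the authors' statistic: the function name `scan_pvalues`, the window-size bounds, and the standardized Fisher-type score are hypothetical choices, and the score assumes independent variants, so unlike the proposed method it ignores linkage disequilibrium.

```python
import math
from typing import List, Tuple


def scan_pvalues(pvalues: List[float], min_width: int,
                 max_width: int) -> Tuple[int, int, float]:
    """Hypothetical scan: return (start, end_exclusive, score) of the best window.

    Each window's score is a standardized Fisher-type statistic. It ASSUMES
    independent variants (linkage disequilibrium is ignored) and p-values
    in (0, 1].
    """
    if not 1 <= min_width <= max_width:
        raise ValueError("need 1 <= min_width <= max_width")
    n = len(pvalues)
    # Under the null, -2 ln p follows chi-square(2): mean 2, variance 4.
    contrib = [-2.0 * math.log(p) for p in pvalues]
    # Prefix sums give each window sum in O(1).
    prefix = [0.0]
    for c in contrib:
        prefix.append(prefix[-1] + c)
    best = (0, 0, -math.inf)  # returned unchanged if n < min_width
    # Scan every contiguous window whose size lies in [min_width, max_width].
    for width in range(min_width, min(max_width, n) + 1):
        for start in range(n - width + 1):
            total = prefix[start + width] - prefix[start]
            # Standardize so windows of different sizes are comparable.
            score = (total - 2.0 * width) / math.sqrt(4.0 * width)
            if score > best[2]:
                best = (start, start + width, score)
    return best


pvals = [0.5, 0.4, 0.001, 0.002, 0.003, 0.6, 0.7]
start, end, score = scan_pvalues(pvals, min_width=1, max_width=4)
print(start, end, round(score, 2))  # window covering the three small p-values
```

Because the scan varies both the start position and the width, it returns an estimated location and size together. In practice the significance of the maximum score would still need calibration (for example, by simulation under a null that preserves the correlation among variants). This sketch does not do that.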