Abstract:
|
In this paper, we consider the detection of signal regions associated with disease outcomes in whole genome sequencing association studies. In view of the massive intergenic regions in whole genome association studies, we propose a method based on a quadratic scan statistic that detects the existence and locations of signal regions by scanning the genome continuously. The proposed method accounts for the correlation among genetic variants and allows signal regions to contain both causal and neutral variants, as well as causal variants whose effects are in different directions. We derive an asymptotic threshold that controls the family-wise error rate, and show that, under regularity conditions, the proposed method consistently selects the true signal regions. Our simulation results show that the proposed procedure outperforms existing methods, especially when signal regions contain causal variants whose effects are in different directions, when they are contaminated with neutral variants, or when the variants in signal regions are correlated. We apply the proposed method to a lung cancer GWAS to identify genetic regions associated with lung cancer risk.
|