Abstract:
|
Most research in genetic data analysis focuses on statistical association, but the signals identified by association analysis explain only a small proportion of the heritability of complex diseases. Searching for causal SNPs only within the set of associated SNPs may miss many causal variants. To shift the paradigm of genetic analysis from association analysis to causal discovery, we develop novel causal inference methods for genome-wide causal studies (GWCS). Large simulation studies show satisfactory Type I error rates and high power of the proposed methods in four scenarios: (1) no association and no causation, (2) association without causation, (3) causation without association, and (4) both association and causation. Notably, linkage disequilibrium has little impact on the identification of causal SNPs.
The proposed methods have been applied to the CATIE-MGS-SWD schizophrenia study dataset, with 8,421,111 common SNPs typed in 13,557 individuals, for a GWCS of schizophrenia. At a significance level of 10^(-6), 245 SNPs show causation. Among them, 62 causal SNPs are confirmed by the literature, and four lie in the well-known 108 schizophrenia-associated genetic loci (Nature, 511 (2014), pp. 421-427). We also conducted a GWAS on this dataset. A total of 5,917 SNPs are associated with schizophrenia at the same significance level of 10^(-6), and only 58 of them show causation.
|