Abstract:
|
Identifying rare variant associations is crucial to understanding the genetic contribution to complex traits and diseases. Individual studies are often small and thus have low power to detect rare variant associations, but large publicly available datasets can be leveraged as external controls to increase power. However, these large datasets often differ from the study data in characteristics such as sample ascertainment, ancestry, and data processing, which can bias results if not accounted for appropriately. Here, we evaluate the performance of four rare variant association methods (SKAT-O, iECAT-O, ProxECAT, and ProxECAT v2) across a variety of simulation scenarios. We use RAREsim to emulate the distribution of rare variants, functional annotation, and haplotype structure seen in real data. By identifying the optimal method(s) across these scenarios, we increase the utility of publicly available genetic resources as external controls. Our comprehensive simulation study, along with best practice guidelines for incorporating external control data, will aid in the discovery of new genetic associations.
|