Abstract:
|
In statistical genetics, summary-level data such as allele or genotype frequencies are increasingly publicly available. Used as external controls in case-control association tests, these data can increase the sample size, and possibly the power, of a study and allow resources to be focused on sequencing cases. However, differences between the internal and external samples can cause considerable confounding. Here, we compare two methods that use frequency-level data from external controls while adjusting for batch effects: Integrating External Controls into Association Test (iECAT) and Proxy External Controls Association Test (ProxECAT). We compare the type I error and power of these methods with those of a standard case-control test across a variety of simulation scenarios in which we vary parameters such as sample size, confounding effects, and genetic region. Combining these results with current and projected sequencing costs, we estimate and compare the cost and power of study designs with various sizes of internal and external control samples. This guidance is especially important for studies with limited resources.
|