Abstract:
|
Adequately powered rare variant association tests in genetics require prohibitively large sample sizes. Large common control databases are a potential solution. Approaches such as UNICORN (2016) and SCoRe (2018) provide ancestry matching strategies for homogeneous case samples and common controls. Methods such as ProxECAT (2018) and iECAT-O (2017) correct for batch effects between internal cases and common controls. However, the optimal strategy for rare variant analysis when the case sample has heterogeneous fine-scale ancestry is unclear. We present a robust method for using common controls with a case sample drawn from multiple ancestries, and we provide a genetic distance threshold that indicates when this method is necessary. We evaluate type 1 error and power and find that our method yields a better-controlled distribution of test statistics than current methods that do not explicitly adjust for ancestry. By controlling confounding due to ancestral differences, our method allows for larger case samples collected across multiple ancestries. Power is thereby increased while type 1 error remains controlled, reducing false positives and focusing future resources on signals more likely to be true.
|