23 – Statistical and Bioinformatical Innovations in Genomics Research in the VA Cooperative Studies Program
Capabilities and Analytical Challenges in Utilizing Mega Genomics Cohort in the Veterans Healthcare System
Kelly Cho
MAVERIC, Veterans Administration Boston Healthcare System
David Gagnon
Harvard Medical School
J. Michael Gaziano
MAVERIC, Veterans Administration Boston Healthcare System
Hongsheng Wu
MAVERIC, Veterans Administration Boston Healthcare System
The VA launched the Million Veteran Program, a nationwide genomics resource that has enrolled over 95,000 Veterans since 2011. This provides a promising opportunity to investigate the connection between the VA's longitudinal electronic medical record (EMR) data and genomic data. What we learn will depend heavily on the analytical approaches applied to such mega genomic resources. Rapid advances in tools for collecting and extracting information from genomic data, such as GWAS, microarray, proteomic, and sequence data, highlight the importance of high-dimensional data analysis, including variable selection, multiple testing, data handling and storage, and computational efficiency. Traditional statistical procedures face substantial challenges with these data, where the number of parameters p is far larger than the number of observations n. In addition, mega genomics data present a complex relational structure once interactions and the dynamic complexity of the underlying biology are considered, resulting in ultra-high dimensionality. Further research on statistical accuracy and inference, model interpretability and fitting, and computational efficiency and robustness will play a critical role.