166 – Big Data: Modeling, Tools, Analytics, and Training
Espaliers: A Visualization Method for Big Data
Max Robinson
Institute for Systems Biology
Greg Eley
Scimentis LLC, Inova Translational Medicine Institute
Joseph G. Vockley
Inova Translational Medicine Institute
John E. Niederhuber
Inova Translational Medicine Institute
Gustavo Glusman
Institute for Systems Biology
As thousands of human genomes become available, there is a pressing need for efficient and intuitive analysis and visualization methods. The genotypes observed in a set of genomes can be represented as a [variant × genome] matrix, with one row per variant and one column per genome. Standard PCA-based visualizations of genotype matrices can reveal population structure but give little insight into genetic admixture in individuals or the history of individual variants. We present Espaliers, a novel visualization of non-negative matrices, including genotype matrices. Given an ordering of the genomes (columns), we compute a position for each variant (row) that reflects the information in the genotype matrix. An Espalier plots each variant by this position and by its population frequency (row sum), which is related to the variant's age. The resulting Espalier plot resembles a parsimonious evolutionary tree that connects the genomes and is consistent with their input ordering. We compare Espaliers with PCA, provide examples of Espaliers for Big Data sets from genomics and transcriptomics, and discuss potential future directions.
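To make the two plotted coordinates concrete, the following is a minimal sketch rather than the authors' method. The abstract does not say how a variant's position is computed, so the sketch assumes the position is the weighted mean column index of the genomes carrying the variant. The function name `espalier_coordinates` and the toy matrix `genotypes` are hypothetical, introduced only for illustration.

```python
from typing import List, Tuple


def espalier_coordinates(matrix: List[List[float]]) -> List[Tuple[int, float, float]]:
    """Return (row index, position, frequency) for each non-empty row.

    Rows are variants; columns are genomes in their given order.
    ASSUMPTION: position is the weighted mean column index of the row.
    The source does not define the position function.
    """
    coords = []
    for i, row in enumerate(matrix):
        if any(v < 0 for v in row):
            raise ValueError("matrix must be non-negative")
        total = sum(row)  # row sum: the variant's population frequency (as a count)
        if total == 0:
            continue  # variant absent from every genome: no position to plot
        position = sum(j * v for j, v in enumerate(row)) / total
        coords.append((i, position, total))
    return coords


# Toy genotype matrix: 4 variants (rows) x 4 genomes (columns).
genotypes = [
    [1, 1, 0, 0],  # shared by the first two genomes
    [0, 0, 1, 1],  # shared by the last two genomes
    [1, 1, 1, 1],  # present in every genome
    [0, 1, 0, 0],  # private to the second genome
]

for index, position, frequency in espalier_coordinates(genotypes):
    print(index, position, frequency)
```

Under this assumption, the variant present in every genome lands at the center of the ordering with the highest frequency, while variants confined to a few adjacent genomes sit lower and closer to those genomes. Plotting position against frequency would give the tree-like layout the abstract describes.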