As thousands of human genomes become available, there is a pressing need for efficient and intuitive methods to analyze and visualize them. The genotypes observed in a set of genomes can be represented as a [variant × genome] matrix, with one row per variant and one column per genome. Standard PCA-based visualizations of genotype matrices can reveal population structure but give little insight into genetic admixture within individual genomes or into the history of individual variants.
We present Espaliers, a novel visualization of non-negative matrices, including genotype matrices. Given an ordering of the genomes (columns), we compute a position for each variant (row) that reflects the information in the genotype matrix. An Espalier plots each variant by this position and by its population frequency (row sum), which is related to the variant's age. The resulting plot resembles a parsimonious evolutionary tree connecting the genomes that is consistent with the input ordering of the genomes.
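The abstract does not specify how a variant's position is computed. The sketch below only illustrates the general shape of the inputs and outputs under a loudly labeled assumption: the hypothetical function `espalier_coordinates` places each variant at the weighted mean rank of the genomes that carry it (under the given column ordering) and pairs that position with the row sum. This is a stand-in, not the Espalier definition.

```python
import numpy as np


def espalier_coordinates(G, order):
    """Hypothetical sketch: return per-variant (position, frequency) arrays.

    G     : non-negative matrix, rows = variants, columns = genomes
    order : permutation of column indices giving the genome ordering

    ASSUMPTION: position = weighted mean rank of the genomes carrying the
    variant. The actual Espalier position may be defined differently.
    """
    G = np.asarray(G, dtype=float)
    Gs = G[:, order]                           # reorder genomes (columns)
    ranks = np.arange(Gs.shape[1], dtype=float)  # rank of each genome in the ordering
    freq = Gs.sum(axis=1)                      # row sum = population frequency
    with np.errstate(invalid="ignore", divide="ignore"):
        pos = (Gs @ ranks) / freq              # NaN for variants never observed
    return pos, freq
```

For example, with three variants over four genomes, `espalier_coordinates([[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]], [0, 1, 2, 3])` places the two rarer variants at positions 0.5 and 2.5 and the common variant at 1.5 with frequency 4; plotting `pos` against `freq` (e.g. with matplotlib) gives a crude tree-like scatter.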
We compare Espaliers with PCA, provide examples of Espaliers for Big Data sets from genomics and transcriptomics, and discuss potential future directions.
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.