JSM 2013 Online Program

Abstract Details

Activity Number: 180
Type: Contributed
Date/Time: Monday, August 5, 2013 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract - #309388
Title: SeqArray: An R/Bioconductor Package for Big Data Management of Genome-Wide Sequencing Variants
Author(s): Xiuwen Zheng*+
Companies:
Keywords: big data ; R ; parallel computing ; sequencing variants ; the 1000 Genomes Project ; CoreArray
Abstract:

In this big data era, thousands of gigabyte-size data sets challenge scientists who manage data in the R environment. R is not typically optimized for the high-performance computing needed for large-scale, genome-wide sequencing variant data. Here I introduce an R package, "SeqArray", for data management of genome-wide variants; it uses the efficient data storage technique and parallel implementation of the C/C++ library "CoreArray". The 1000 Genomes Project released 39 million genetic variants for 1092 individuals, and SeqArray created a 26 GB data file to store these sequencing variants with phasing information, using 2 bits as a primitive data type. Because the data contain a large proportion of rare variants, compression algorithms can further reduce the file size to 1.5 GB without sacrificing access efficiency. A uniprocessor benchmark shows that allele frequencies can be calculated in 5 minutes from the compressed data. SeqArray will be of great interest to scientists who analyze large-scale genomic data in the R environment, particularly those with limited experience of low-level C programming and parallel computing.
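The 2-bit storage idea and the allele-frequency calculation can be illustrated with a minimal Python sketch. This is not SeqArray or CoreArray code: the function names, the packing order within a byte, and the coding (0 = reference allele, 1 = alternate allele, 3 = missing) are all assumptions made for illustration, and the abstract does not state the actual on-disk layout. The last lines give a back-of-the-envelope size estimate, assuming two phased alleles per individual at 2 bits each; how the rest of the reported file size is used is not broken down in the abstract.

```python
import numpy as np

MISSING = 3  # hypothetical code for a missing allele call


def pack_2bit(codes):
    """Pack values in 0..3 into bytes, four values per byte."""
    codes = np.asarray(codes, dtype=np.uint8)
    if codes.size and codes.max() > 3:
        raise ValueError("each code must fit in 2 bits (0..3)")
    pad = (-codes.size) % 4  # zero-pad to a multiple of four values
    padded = np.concatenate([codes, np.zeros(pad, dtype=np.uint8)]).reshape(-1, 4)
    # Value k of each group of four occupies bits 2k and 2k+1 of its byte.
    packed = (padded[:, 0]
              | (padded[:, 1] << 2)
              | (padded[:, 2] << 4)
              | (padded[:, 3] << 6))
    return packed.astype(np.uint8)


def unpack_2bit(packed, n):
    """Recover the first n 2-bit values from a packed byte array."""
    packed = np.asarray(packed, dtype=np.uint8)
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 3
    return codes.reshape(-1)[:n]


def ref_allele_freq(packed_row, n_alleles):
    """Reference allele frequency for one variant, ignoring missing calls."""
    codes = unpack_2bit(packed_row, n_alleles)
    called = codes != MISSING
    n_called = int(called.sum())
    if n_called == 0:
        return float("nan")
    return float((codes[called] == 0).sum()) / n_called


# One variant, three individuals, two phased alleles each (toy data).
alleles = [0, 1, 0, 0, MISSING, 1]
row = pack_2bit(alleles)
freq = ref_allele_freq(row, len(alleles))

# Size estimate: variants x individuals x 2 alleles x 2 bits, in bytes.
raw_bytes = 39_000_000 * 1092 * 2 * 2 // 8  # about 21.3 GB
```

In this toy layout six allele calls occupy two bytes instead of six, which is the kind of saving a 2-bit primitive type provides before any compression is applied.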


Authors who are presenting talks have a * after their name.
