Abstract Details
Activity Number: 180
Type: Contributed
Date/Time: Monday, August 5, 2013 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract - #309388
Title: SeqArray: An R/Bioconductor Package for Big Data Management of Genome-Wide Sequencing Variants
Author(s): Xiuwen Zheng*+
Companies:
Keywords: big data; R; parallel computing; sequencing variants; the 1000 Genomes Project; CoreArray
Abstract:
In the big data era, thousands of gigabyte-size data sets challenge scientists who manage data in the R environment. R is not typically optimized for the high-performance computing that large-scale genome-wide sequencing variant data require. Here I introduce an R package, "SeqArray", for data management of genome-wide variants; it builds on the efficient data storage technique and parallel implementation of the C/C++ library "CoreArray". The 1000 Genomes Project released 39 million genetic variants for 1092 individuals, and SeqArray created a 26 GB data file to store these sequencing variants with phasing information, using 2 bits as a primitive data type. Because a large proportion of the variants are rare, compression algorithms can further reduce the file size to 1.5 GB without sacrificing access efficiency. A uniprocessor benchmark shows that allele frequencies can be calculated from the compressed data in 5 minutes. SeqArray will be of great interest to scientists who analyze large-scale genomic data in the R environment, particularly those with limited experience in low-level C programming and parallel computing.
Authors who are presenting talks have a * after their name.
The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
Copyright © American Statistical Association.