Activity Number: 304
Type: Topic Contributed
Date/Time: Tuesday, August 5, 2008, 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Computing

Abstract #302244
Title: High-Performance Processing of Large Data Sets via Memory Mapping: A Case Study in R and C++
Author(s): Daniel Adler*+, Jens Oehlschlägel, Oleg Nenadic, and Walter Zucchini
Companies: Georg-August University of Göttingen; Research Consultant; Georg-August University of Göttingen; Georg-August University of Göttingen
Address: Platz der Göttinger Sieben 5, 37085 Göttingen, Germany
Keywords: large dataset processing; C++; R; memory-mapping
Abstract:
We present the current status of a package (called 'ff') for processing large data sets that do not fit in memory. While database systems are effective for selecting subsets of complex-structured data, mass data processing in scientific contexts typically operates on flat structures (such as vectors and matrices) whose simplicity can be exploited for performance. For example, mirroring regions of persistent storage into main memory (memory mapping) makes it possible to process a data set transparently, as if it were held in RAM. We illustrate these concepts with new R container types that mimic R vectors and matrices, enabling users to work on large data sets with familiar functions. The underlying C++ framework allows new data types to be specified. Space-saving virtual storage modes, such as 1-bit logicals and single-precision reals, are implemented.
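As a minimal sketch of these ideas, the following R example assumes the CRAN release of the 'ff' package; the ff() constructor, the vmode names, and delete() are taken from that release and may differ in detail from the development version described in this abstract. The lengths are kept small here, but because the containers are file-backed and memory-mapped, the same code scales to data that exceed available RAM.

    library(ff)

    ## File-backed double vector: the data live on disk and are
    ## memory-mapped in chunks rather than held in memory.
    x <- ff(vmode = "double", length = 1e6)

    ## Familiar R idioms work on the ff container.
    x[1:5] <- rnorm(5)
    sum(x[1:5])

    ## Space-saving virtual storage modes: a 1-bit logical vector ...
    flags <- ff(vmode = "boolean", length = 1e6)
    flags[1:3] <- c(TRUE, FALSE, TRUE)

    ## ... and a single-precision (4-byte) matrix.
    m <- ff(vmode = "single", dim = c(1000, 1000))
    m[1, 1:3] <- c(0.5, 1.5, 2.5)

    ## Remove the backing files when done.
    delete(x); delete(flags); delete(m)

The vmode argument selects the packed on-disk representation, so a "boolean" vector occupies one bit per element and a "single" matrix four bytes per element, independently of R's in-memory types.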