JSM 2012 Home

JSM 2012 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Online Program Home

Abstract Details

Activity Number: 174
Type: Contributed
Date/Time: Monday, July 30, 2012 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract - #306652
Title: Parallel External Memory Algorithms Applied to Generalized Linear Models
Author(s): Lee Edlefsen*+
Companies: Revolution Analytics
Address: 4309 Eastern Ave N, Seattle, WA, 98103, United States
Keywords: parallel external memory algorithms ; generalized linear models ; parallel computing ; high performance analytics ; statistical computing

For the past several decades the rising tide of technology has allowed the same data analysis code to handle the increase in sizes of typical data sets. That era is ending. The size of data sets is increasing much more rapidly than the speed of single cores, of RAM, and of hard drives. To deal with this, statistical software must be able to use multiple cores and computers. Parallel external memory algorithms (PEMA's) provide the foundation for such software. External memory algorithms (EMA's) are those that do not require all data to be in RAM, and are widely available. Parallel implementations of EMA's allow them to run on multiple cores and computers, and to process unlimited rows of data. This paper describes a general approach to efficiently parallelizing EMA's, using an R and C++ implementation of GLM as a detailed example. It examines the requirements for efficient PEMA's; the arrangement of code for automatic parallelization; efficient threading; and efficient inter-process communication. It includes billion row benchmarks showing linear scaling with rows and nodes, and demonstrating that extremely high performance is achievable.

The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.

Back to the full JSM 2012 program

2012 JSM Online Program Home

For information, contact jsm@amstat.org or phone (888) 231-3473.

If you have questions about the Continuing Education program, please contact the Education Department.