
Abstract Details

Activity Number: 644 - Statistical Computing on Parallel Architectures
Type: Invited
Date/Time: Thursday, August 2, 2018 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract #326937
Title: The pbdR Project: Distributed Computing with R
Author(s): Wei-Chen Chen* and Drew Schmidt and George Ostrouchov
Companies: FDA/CDRH and ORNL and Oak Ridge National Laboratory
Keywords: Big Data; High Performance Computing; Single Program Multiple Data; Message Passing Interface; Client-Server Interface
Abstract:

We introduce the Programming with Big Data in R (pbdR) project, which is composed of several packages available at http://pbdr.org/ and on CRAN. The packages provide a broad parallel computing capability that spans multicore laptops, multi-node clusters, and supercomputers. Our philosophy is to learn from the high performance computing community and provide native R interfaces. pbdR aims to bring R and statistical computing to supercomputer architectures where a combination of shared memory, distributed memory, and co-processor hardware is available.

pbdR was initially developed for the message passing interface (MPI) environment. Later, it focused on using scalable numerical libraries (ScaLAPACK) to extend R's capability on high performance computing systems. Subsequently, several statistical applications were implemented and applied to terascale datasets. In addition to batch programming, we recently developed a client-server interface capable of interactive programming on distributed systems. By using an asynchronous messaging library (ZeroMQ), interactive control of a distributed set of R sessions cooperating in a single program multiple data (SPMD) fashion is possible.


Authors who are presenting talks have a * after their name.
