Abstract Details

Activity Number: 257 - Contributed Poster Presentations: Section for Statistical Programmers and Analysts
Type: Contributed
Date/Time: Monday, July 29, 2019 : 2:00 PM to 3:50 PM
Sponsor: Section for Statistical Programmers and Analysts
Abstract #301774
Title: High-Performance Parallel Computing on a Cluster with R: a Tutorial
Author(s): Ann Marie Weideman* and Katie Rose Mollan
Companies: University of North Carolina at Chapel Hill and University of North Carolina at Chapel Hill
Keywords: parallel computing; high-performance computing; cluster

While parallelization on a local machine can be accomplished using the R packages foreach and doParallel, the local machine is often not an appropriate environment for large-scale tasks. In this tutorial, we show the user how to use a cluster of two or more computers to create and process a swarm of independent jobs. Simply put, the user writes an R script with NULL parameters and a shell script that creates a list of command lines, each dependent on a set of parameters. The resulting file is executed by the batch operating system, and the command-line arguments are passed to R using the parseCommandArgs function in the batch package. A group of these commands will be run in parallel, and the resulting output will be transferred back to R and post-processed using base R. The tutorial example, which employs the central limit theorem and takes seconds to compute, was chosen so that the user can first run on a local machine before reproducing on a cluster. The user is expected to have access to a computing cluster and basic knowledge of secure file transfer and the command line. The tutorial is written using the bash shell and, as such, assumes that the target system is Unix-like.
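As a rough illustration of the shell-script step described in the abstract, the sketch below writes one command line per parameter combination to a file. It is not the authors' script: the file names (clt_sim.R, swarm_commands.txt), the parameter names (n, seed), and the parameter values are all hypothetical. It also assumes that the R script calls parseCommandArgs() from the batch package and that this function reads name/value pairs placed after --args.

```shell
#!/usr/bin/env bash
# Minimal sketch: build a list of independent R command lines, one per
# parameter combination. All names and values below are hypothetical.

r_script="clt_sim.R"            # hypothetical R script that calls parseCommandArgs()
out_file="swarm_commands.txt"   # hypothetical file holding the list of commands

for n in 10 100 1000; do        # hypothetical sample sizes
  for seed in 1 2; do           # hypothetical random seeds
    # Assumed convention: name/value pairs follow --args and are picked up
    # inside R by batch::parseCommandArgs().
    printf 'R --vanilla --args n %s seed %s < %s > clt_n%s_seed%s.Rout\n' \
      "$n" "$seed" "$r_script" "$n" "$seed"
  done
done > "$out_file"              # redirect the whole loop into the command file
```

Each line of the resulting file is an independent job. How the file is handed to the cluster depends on the local scheduler, which the abstract does not name, so that step is left out here.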

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program