Abstract Details

Activity Number: 303 - Big Data
Type: Contributed
Date/Time: Tuesday, August 1, 2017 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Computing
Abstract #324990
Title: What's in a Vector? Major Improvements on the Horizon for R and What They Mean for You
Author(s): Gabriel Becker* and Luke Tierney
Companies: Genentech Research and University of Iowa
Keywords: computing ; R ; rstats ; big data ; efficiency ; statistical computing
Abstract:

Vectors are at the core of everything that R does. We have created a new extensible framework for defining custom implementations of atomic R vectors that are compatible with base R and package code. The implications of this system, and of our applications of it within R itself, are vast for analysts and package developers alike. These include: importing massive vectors from in-memory column stores nearly instantaneously, subsetting and reordering data.frames instantly with no data duplication, vectors with dramatically reduced memory footprints, vectors that know whether they are sorted or contain NAs, improvements to R's sorting and matching capabilities that make use of this knowledge, and more. I will show examples of improvements in the performance of normal R code, as well as how to use the framework from base or package code to help improve R even further. These changes are currently slated to go live in R-devel by Summer 2017 and to be included in the 2018 release of R.
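
As a rough illustration of the idea behind compact vectors that carry metadata, the following is a minimal conceptual sketch in Python. It is not the R framework or its API: the class `CompactSeq` and the helper `sort_vector` are hypothetical names invented for this example. It assumes a vector holding the consecutive integers `start, start + 1, ...`, which can be represented by two numbers instead of a full array, and which is known to be sorted and free of missing values without scanning any data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactSeq:
    """Hypothetical compact vector for start, start + 1, ..., start + length - 1.

    Only two integers are stored, no matter how long the vector is.
    """
    start: int
    length: int
    # Metadata that is known by construction, without inspecting elements.
    is_sorted: bool = True
    has_na: bool = False

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, i: int) -> int:
        # Elements are computed on demand rather than read from memory.
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.start + i

    def materialize(self) -> list:
        # Expand to an ordinary list only when a caller truly needs one.
        return [self.start + i for i in range(self.length)]


def sort_vector(v):
    """Return v in sorted order, skipping the work if v is known to be sorted."""
    if getattr(v, "is_sorted", False):
        return v  # nothing to do, and no data is copied
    return sorted(v)


big = CompactSeq(start=1, length=10**9)
same = sort_vector(big)          # returns immediately, no billion-element sort
small = sort_vector([3, 1, 2])   # ordinary vectors take the usual path
```

In this sketch, sorting the billion-element sequence costs nothing because the sortedness flag is consulted first, which is the kind of shortcut the sorting and matching improvements mentioned above rely on.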


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 
Copyright © American Statistical Association