JSM 2012 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 175
Type: Contributed
Date/Time: Monday, July 30, 2012 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract - #304511
Title: Visualization of Big Data Sets Using Package Bigvis
Author(s): Yue Hu*+ and Hadley Wickham
Companies: Rice University and Rice University
Address: 1717 Bissonnet St, Houston, TX, 77005, United States
Keywords: data visualization ; large data sets
Abstract:

Visualizing large data sets in R is a challenge. Scatterplots used to explore correlations among variables or to identify clusters suffer from overplotting, and R itself imposes memory and speed limitations. This talk proposes new methods for displaying large data sets and provides fast methods for binning and smoothing the data. Instead of plotting each individual point, we visualize densities, which can be obtained efficiently by binning and smoothing in any number of dimensions.
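
As a rough illustration of the bin-then-smooth idea described above, the following minimal sketch counts observations in fixed-width bins and then smooths the counts with a triangular kernel over neighbouring bins. It is written in plain Python, works in one dimension only, and is not the bigvis implementation or its API; the function names `bin_counts` and `smooth_counts`, the `origin`, `width`, and `radius` parameters, and the choice of a triangular kernel are all assumptions made for illustration.

```python
import math
from collections import Counter


def bin_counts(values, origin, width):
    """Count observations per fixed-width bin; returns {bin_index: count}."""
    counts = Counter()
    for v in values:
        # Bin i covers the half-open interval [origin + i*width, origin + (i+1)*width).
        counts[math.floor((v - origin) / width)] += 1
    return counts


def smooth_counts(counts, radius):
    """Smooth binned counts with a triangular kernel spanning `radius` bins each side."""
    # Hypothetical kernel choice: weights fall off linearly with bin distance.
    weights = {k: radius + 1 - abs(k) for k in range(-radius, radius + 1)}
    total = sum(weights.values())
    lo, hi = min(counts), max(counts)
    smoothed = {}
    for b in range(lo - radius, hi + radius + 1):
        s = sum(w * counts.get(b + k, 0) for k, w in weights.items())
        smoothed[b] = s / total
    return smoothed


if __name__ == "__main__":
    data = [0.1, 0.2, 0.9, 1.5, 2.5, 2.6]
    binned = bin_counts(data, origin=0.0, width=1.0)
    print(dict(binned))
    print(smooth_counts(binned, radius=1))
```

Once the data are reduced to bin counts, the cost of smoothing and plotting depends on the number of bins rather than the number of observations, which is why this kind of summary scales to large data sets. Extending the sketch to more dimensions would amount to keying the counts by a tuple of bin indices, one per variable.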

Our tool works with both base R data frames and Revolution's proprietary XDF format. Revolution's RevoScaleR is a commercial product built on top of R that provides fast statistical analysis for data too big to fit into memory. Our preliminary work allows exploratory analysis of data sets of up to 100 million observations, with each plot taking under one minute to produce.


The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.
