JSM Preliminary Online Program
This is the preliminary program for the 2009 Joint Statistical Meetings in Washington, DC.

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.






Activity Number: 47
Type: Invited
Date/Time: Sunday, August 2, 2009 : 4:00 PM to 5:50 PM
Sponsor: Section on Statistical Graphics
Abstract - #303174
Title: Distributed Computing and the Visualization of Huge Data Sets
Author(s): Lee Edlefsen*+
Companies: REvolution Computing, Inc.
Address: 1100 Dexter Avenue N, Seattle, WA 98109
Keywords: visualization; distributed computing; parallel computing; huge data sets; external memory algorithms; Trellis plots
Abstract:

The increasing availability of huge data sets opens exciting new possibilities for data analysis. Variables and relationships can be visualized in much greater detail, and assumptions required in the analysis of smaller samples can be relaxed or eliminated. However, huge data sets also present computational and conceptual challenges, since tools and techniques designed for smaller data sets often do not scale well. This paper discusses an approach that allows "external memory" statistical and data mining algorithms to be distributed automatically and efficiently across computers and across processors. It demonstrates an implementation of this approach and shows how it can be used to rapidly compute detailed summaries, as well as large-scale linear and logistic regression models, that can be used to visualize huge data sets with Trellis plots in R.
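The abstract does not describe the implementation itself. As a generic, hedged illustration of why linear regression suits an "external memory" design that can be distributed, the sketch below uses the textbook sufficient-statistics approach: each chunk of rows is reduced to its cross-products X'X and X'y, partial results from different workers are merged by simple addition, and the normal equations are solved once at the end. This is not the author's method or any product's API. All function names (`chunk_crossproducts`, `merge`, `solve`, `fit_ols`) are hypothetical, and a thread pool stands in for separate processors or machines only to keep the example self-contained.

```python
"""Chunked (external-memory style) least squares with mergeable partial results.

Illustrative sketch only; it is not the implementation described in the abstract.
"""
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Sequence, Tuple

# One observation: (predictor values without intercept, response value).
Row = Tuple[Sequence[float], float]


def chunk_crossproducts(chunk: Iterable[Row], p: int):
    """Return (X'X, X'y, n) for one chunk; only this chunk is needed in memory."""
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    n = 0
    for x, y in chunk:
        row = [1.0, *x]  # prepend the intercept column
        for i in range(p):
            xty[i] += row[i] * y
            for j in range(p):
                xtx[i][j] += row[i] * row[j]
        n += 1
    return xtx, xty, n


def merge(a, b):
    """Combine two partial results; cross-products are additive across chunks."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty, a[2] + b[2]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


def fit_ols(chunks: List[List[Row]], n_predictors: int, workers: int = 4):
    """Reduce chunks in parallel, merge the partial results, solve X'X b = X'y."""
    if not chunks:
        raise ValueError("need at least one chunk")
    p = n_predictors + 1  # coefficients including the intercept
    # Threads stand in for separate processors or machines in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: chunk_crossproducts(c, p), chunks))
    total = parts[0]
    for part in parts[1:]:
        total = merge(total, part)
    xtx, xty, n = total
    return solve(xtx, xty), n


if __name__ == "__main__":
    # Exact data y = 2 + 3x, split into chunks of at most 4 rows.
    data = [((float(x),), 2.0 + 3.0 * x) for x in range(10)]
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    coef, n = fit_ols(chunks, n_predictors=1)
    print([round(c, 6) for c in coef], n)  # [2.0, 3.0] 10
```

Because the merge step is plain addition, the result does not depend on how rows are split into chunks or assigned to workers. A real deployment would read chunks from disk and use processes or separate machines rather than threads. Logistic regression does not reduce to a single pass like this; under the same assumptions it would repeat a similar chunked pass once per iteration of an iterative fitting method such as IRLS, which is not shown here.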


  • The address information is for the authors who have a + after their name.
  • Authors who are presenting talks have a * after their name.



JSM 2009: For information, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised September 2008