Activity Number: 47
Type: Invited
Date/Time: Sunday, August 2, 2009, 4:00 PM to 5:50 PM
Sponsor: Section on Statistical Graphics
Abstract - #303174
Title: Distributed Computing and the Visualization of Huge Data Sets
Author(s): Lee Edlefsen*+
Companies: REvolution Computing, Inc.
Address: 1100 Dexter Avenue N, Seattle, WA 98109
Keywords: visualization; distributed computing; parallel computing; huge data sets; external memory algorithms; Trellis plots
Abstract:
The increasing availability of huge data sets opens exciting new possibilities for data analysis. Variables and relationships can be visualized in much greater detail, and assumptions required in the analysis of smaller samples can be relaxed or eliminated. However, it also presents computational and conceptual challenges, since tools and techniques designed for smaller data sets often do not scale well. This paper discusses an approach that allows "external memory" statistical and data mining algorithms to be distributed automatically and efficiently across computers and across processors. It demonstrates an implementation of this approach, and shows how it can be used to rapidly compute detailed summaries as well as large-scale linear and logistic regression models that can be used to visualize huge data sets using Trellis plots in R.
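The abstract does not describe the implementation, so the following is only a minimal sketch of the general idea behind an "external memory" regression, not the author's method. It assumes a simple linear model y = a + b*x and shows that the fit depends only on a few running sums, which can be accumulated one chunk at a time and merged across chunks (or, in principle, across processors or machines). All names here (`Moments`, `chunks`, `fit_in_chunks`) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[float, float]  # one (x, y) observation


@dataclass
class Moments:
    """Running sums that are sufficient to fit y = a + b*x (illustrative only)."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, chunk: Iterable[Row]) -> "Moments":
        # Only the current chunk needs to be held in memory.
        for x, y in chunk:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    def merge(self, other: "Moments") -> "Moments":
        # Merging is plain addition, so partial results computed
        # independently can be combined in any order.
        return Moments(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def fit(self) -> Tuple[float, float]:
        # Closed-form least squares for one predictor: (intercept, slope).
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


def chunks(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows from any iterable."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def fit_in_chunks(rows: Iterable[Row], size: int) -> Tuple[float, float]:
    """Fit the model by summarizing each chunk separately and merging."""
    total = Moments()
    for chunk in chunks(rows, size):
        total = total.merge(Moments().update(chunk))
    return total.fit()
```

In this sketch each chunk is summarized on its own and the summaries are then added together, which is the property that would let the per-chunk work be spread across workers. The raw sums used here are numerically naive; a production implementation would likely use a more stable formulation.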
- The address information is for the authors that have a + after their name.
- Authors who are presenting talks have a * after their name.
Back to the full JSM 2009 program