JSM Preliminary Online Program
This is the preliminary program for the 2009 Joint Statistical Meetings in Washington, DC.

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.






Activity Number: 245
Type: Invited
Date/Time: Tuesday, August 4, 2009 : 8:30 AM to 10:20 AM
Sponsor: Section on Physical and Engineering Sciences
Abstract - #302931
Title: Reliability in Supercomputing: A Million Processors Cooperating to Solve One Problem
Author(s): George Ostrouchov*+ and Thomas J. Naughton, III and Stephen L. Scott
Companies: Oak Ridge National Laboratory and Oak Ridge National Laboratory and Oak Ridge National Laboratory
Address: P.O. Box 2008, Oak Ridge, TN, 37831
Keywords: high performance computing ; parallel computing ; hardware
Abstract:

The world's largest supercomputers currently have hundreds of thousands of processing cores, and this number will soon surpass a million. Once we count other hardware components such as disks, I/O support, memory, and buses, we are already at a million, even before considering the software components. Combining very large numbers of individually highly reliable components can result in a surprisingly unreliable system if reliability is not addressed explicitly. This talk will describe some of the current supercomputers, their emerging reliability issues, and how those issues are being addressed, including some of our own work. This area still has many more questions than answers, and its solutions will have components based on statistical methods and ideas.
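
The point about many reliable parts forming an unreliable whole can be illustrated with a small calculation. The sketch below is not taken from the talk; it assumes a simple series system in which every component must work, components fail independently, and lifetimes are exponential. The function names and the numbers used are hypothetical and chosen only for illustration.

```python
import math


def series_reliability(p: float, n: int) -> float:
    """Probability that all n components survive a time window.

    Assumes (illustrative assumption, not from the abstract) that the
    components fail independently and each survives with probability p.
    """
    # p ** n, computed through logarithms to stay stable for large n.
    return math.exp(n * math.log(p))


def system_mtbf_hours(component_mtbf_hours: float, n: int) -> float:
    """Mean time between failures of a series system of n components.

    Assumes independent, identically distributed exponential lifetimes,
    so the first failure among n components arrives n times sooner.
    """
    return component_mtbf_hours / n


if __name__ == "__main__":
    n = 1_000_000   # hypothetical component count
    p = 0.999999    # hypothetical per-component reliability over the window
    print(f"system reliability: {series_reliability(p, n):.3f}")
    print(f"system MTBF (hours): {system_mtbf_hours(1_000_000.0, n):.1f}")
```

Under these assumptions, a million components that each survive a given window with probability 0.999999 yield a system that survives the same window with probability of roughly 0.37, and components with a mean time between failures of a million hours yield a system that fails about once an hour. Real machines violate the independence and series assumptions in both directions (correlated failures, redundancy), which is part of why statistical modeling is needed.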


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.



Revised September, 2008