The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
            
             
	Abstract Details
	
	
		
			| 
				
					
						| Activity Number: | 486 |  
						| Type: | Invited |  
						| Date/Time: | Wednesday, August 1, 2012 : 10:30 AM to 12:20 PM |  
						| Sponsor: | Section on Statistical Computing |  
						| Abstract - #303783 |  
						| Title: | Scaling R to Internet Scale Data |  
					| Author(s): | Karl Millar*+ and Murray Stokely |  
					| Companies: | Google and Google |  
					| Address: | 1600 AMPHITHEATRE PKWY, Mountain View, CA, 94043, United States |  
					| Keywords: | Statistical Computing; R; parallelism; MapReduce |  
					| Abstract: | 
							Analyzing internet-scale data sets requires statistical software that scales to data sizes several orders of magnitude larger than R is currently capable of handling. Tools such as MapReduce and Hadoop are capable of scaling to such large data sizes but are impractical for statisticians to use for data analysis.     
We will discuss the overall design and API of packages designed to work over Google's distributed computing architecture that help to address these issues.     
The foundation of this work is a package that provides an R version of the FlumeJava library of Chambers et al. for distributed computation. This package provides a higher-level abstraction on top of Google's MapReduce framework, providing distributed generic collection classes and simple functions for manipulating them, which are automatically converted into an optimized sequence of MapReduces.     
Building on this functionality, Google is developing additional packages that provide distributed implementations of common statistical algorithms and tools for data analysis on large data sets.     
						 |  
 
					The address information is for the authors who have a + after their name. Authors who are presenting talks have a * after their name.
 
				 | 
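The abstract describes distributed collection classes whose operations are automatically converted into an optimized sequence of MapReduces. The following is a minimal local sketch of that general idea, not the package's actual API: operations are recorded rather than executed, and consecutive element-wise operations are fused into a single stage before anything runs. It is written in Python rather than R, it runs in one process, and the names `DeferredCollection`, `parallel_do`, `combine_by_key`, `plan`, and `run` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, List, Tuple


def _apply_fused(fns: List[Callable], element: Any) -> List[Any]:
    """Apply a chain of element-wise functions to one element in a single pass."""
    items = [element]
    for fn in fns:
        items = [out for item in items for out in fn(item)]
    return items


class DeferredCollection:
    """Hypothetical stand-in for a distributed collection; executes locally."""

    def __init__(self, source: Iterable[Any],
                 ops: Tuple[Tuple[str, Callable], ...] = ()):
        self._source = source
        self._ops = ops  # operations are recorded here, not executed

    def parallel_do(self, fn: Callable) -> "DeferredCollection":
        """Record an element-wise function that returns an iterable of outputs."""
        return DeferredCollection(self._source, self._ops + (("map", fn),))

    def combine_by_key(self, fn: Callable) -> "DeferredCollection":
        """Record a group-by-key on (key, value) pairs plus a per-key reduction."""
        return DeferredCollection(self._source, self._ops + (("reduce", fn),))

    def plan(self) -> List[Tuple[str, List[Callable]]]:
        """Fuse each run of consecutive element-wise operations into one stage."""
        stages: List[Tuple[str, List[Callable]]] = []
        for kind, fn in self._ops:
            if kind == "map" and stages and stages[-1][0] == "map":
                stages[-1][1].append(fn)
            else:
                stages.append((kind, [fn]))
        return stages

    def run(self) -> List[Any]:
        """Execute the fused plan locally, one stage at a time."""
        data = list(self._source)
        for kind, fns in self.plan():
            if kind == "map":
                data = [out for element in data
                        for out in _apply_fused(fns, element)]
            else:
                groups = defaultdict(list)
                for key, value in data:
                    groups[key].append(value)
                data = [(key, fns[0](values)) for key, values in groups.items()]
        return data


# Word count: two element-wise steps and one per-key reduction.
lines = ["a b a", "b c"]
counts = (
    DeferredCollection(lines)
    .parallel_do(lambda line: line.split())
    .parallel_do(lambda word: [(word, 1)])
    .combine_by_key(sum)
)
stage_kinds = [kind for kind, _ in counts.plan()]  # two stages, not three
result = dict(counts.run())
```

In this sketch the two `parallel_do` calls collapse into one stage, so the pipeline needs a single map phase and a single reduce phase. A real system would make such decisions over a full dataflow graph and run the stages on a cluster; the sketch only shows why deferring execution makes that kind of optimization possible.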
	
	
	
	
		