2013 Joint Statistical Meetings - Celebrating the International Year of Statistics

JSM 2013 Online Program

Activity Number:	514
Type:	Topic Contributed
Date/Time:	Wednesday, August 7, 2013 : 10:30 AM to 12:20 PM
Sponsor:	Section on Statistical Consulting
Abstract - #309183
Title:	Linear Regression on 1 Terabytes of Data? Some Crazy Observations
Author(s):	Hesen Peng*+
Companies:	Amazon.com
Keywords:	big data ; data mining ; hadoop ; real time analytics
Abstract:	The recent explosion of internet-scale data has posed novel challenges to statistical methodology research that have not been discussed before. First, the sheer amount of data may provide abundant insight into complicated nonlinear associations between multiple variables, which have typically been ignored in traditional small-sample studies. Second, efficient computational algorithms are required to process these data, which are usually distributed among multiple instances or even data centers. In this talk we will start with a discussion of a simple bivariate linear regression with a large sample size, and then proceed to novel universal dependence discovery methods such as the Mira score. We will end by returning to the linear regression problem and discussing how to solve it using Amazon Elastic MapReduce.
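
As an illustration of why a simple bivariate regression suits distributed data, the following is a minimal sketch, not the speaker's method: the abstract does not describe an implementation. It assumes the data arrive as partitions of (x, y) pairs and uses the standard sufficient-statistics approach, in which each partition is reduced to five sums (the map step), the sums are added together (the reduce step), and the least-squares line is computed once from the totals. All function names are hypothetical, and plain Python stands in for a real cluster framework.

```python
from functools import reduce


def map_partition(rows):
    """Map step: reduce one partition of (x, y) pairs to five sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: sufficient statistics from two partitions simply add."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(stats):
    """Ordinary least squares for y = intercept + slope * x from the totals."""
    n, sx, sy, sxx, sxy = stats
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


def distributed_fit(partitions):
    """Run the map and reduce steps locally over an iterable of partitions."""
    totals = reduce(combine, map(map_partition, partitions))
    return fit_line(totals)


if __name__ == "__main__":
    # Two hypothetical partitions drawn from the exact line y = 1 + 2x.
    parts = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
    print(distributed_fit(parts))
```

Because only five numbers leave each partition, the communication cost does not grow with the sample size. Raw sums of squares can lose precision at very large scale, so a production version would likely center the data or use a numerically stabler update.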

Authors who are presenting talks have a * after their name.

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.