Abstract Details
Activity Number: 514
Type: Topic Contributed
Date/Time: Wednesday, August 7, 2013 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Consulting
Abstract - #309183
Title: Linear Regression on 1 Terabytes of Data? Some Crazy Observations
Author(s): Hesen Peng*+
Companies: Amazon.com
Keywords: big data; data mining; hadoop; real time analytics
Abstract:
The recent explosion of internet-scale data has posed novel challenges to statistical methodology research that have not been discussed before. First, the sheer amount of data might provide abundant insight into complicated nonlinear associations between multiple variables, which have typically been ignored in traditional small-sample studies. Second, efficient computational algorithms are required to process these data, which are usually distributed among multiple instances or even data centers. In this talk we will start with a discussion of simple bivariate linear regression with a large sample size, and then proceed to novel universal dependence discovery methods such as the Mira score. We will end by returning to the linear regression problem and discussing how to solve it using Amazon Elastic MapReduce.
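The abstract does not describe the talk's actual implementation. As a rough illustration of why bivariate linear regression suits a map/reduce setting, the sketch below assumes the data arrive as separate chunks (for example, one per instance) and uses the standard sufficient-statistics approach: each chunk is reduced to five sums, the sums are added together, and the least-squares line is solved from the totals. The function names (`map_chunk`, `combine`, `solve`, `fit_distributed`) are hypothetical and are not taken from the talk or from any Amazon Elastic MapReduce API.

```python
from functools import reduce
from typing import Iterable, Sequence, Tuple

# Sufficient statistics for y = intercept + slope * x:
# (n, sum_x, sum_y, sum_xx, sum_xy)
Stats = Tuple[float, float, float, float, float]


def map_chunk(chunk: Iterable[Tuple[float, float]]) -> Stats:
    """Map step (hypothetical name): summarize one chunk of (x, y) pairs."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in chunk:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a: Stats, b: Stats) -> Stats:
    """Reduce step: statistics from two chunks simply add element-wise."""
    return tuple(u + v for u, v in zip(a, b))  # type: ignore[return-value]


def solve(stats: Stats) -> Tuple[float, float]:
    """Solve the normal equations from the pooled statistics."""
    n, sx, sy, sxx, sxy = stats
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


def fit_distributed(chunks: Sequence[Iterable[Tuple[float, float]]]) -> Tuple[float, float]:
    """Fit the line over all chunks without ever holding the full data set."""
    return solve(reduce(combine, map(map_chunk, chunks)))


if __name__ == "__main__":
    # Two toy chunks drawn from y = 2x + 1, standing in for two data partitions.
    chunks = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
    intercept, slope = fit_distributed(chunks)
    print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Because only five numbers per chunk cross the network, the cost of the reduce step does not grow with the sample size. One caveat of this simple form is that raw sums of squares can lose floating-point precision when the values are large, so a production version would likely center the data or use a more stable update.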
Authors who are presenting talks have a * after their name.
2013 JSM Online Program Home
For information, contact jsm@amstat.org or phone (888) 231-3473.
The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
Copyright © American Statistical Association.