Abstract Details

Activity Number: 514
Type: Topic Contributed
Date/Time: Wednesday, August 7, 2013 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Consulting
Abstract - #309183
Title: Linear Regression on 1 Terabytes of Data? Some Crazy Observations
Author(s): Hesen Peng*+
Companies: Amazon.com
Keywords: big data ; data mining ; hadoop ; real time analytics
Abstract:

The recent explosion of internet-scale data has posed challenges to statistical methodology research that have not been discussed before. First, the sheer amount of data may provide abundant insight into complicated nonlinear associations among multiple variables, which have typically been ignored in traditional small-sample studies. Second, efficient computational algorithms are required to process these data, which are usually distributed among multiple instances or even data centers. In this talk we will start with a discussion of simple bivariate linear regression with a large sample size, and then proceed to novel universal dependence discovery methods such as the Mira score. We will end by returning to the linear regression problem and discussing how to solve it using Amazon Elastic MapReduce.
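
As a rough illustration of why bivariate linear regression suits data spread across many instances, the sketch below fits a line from per-partition sufficient statistics combined in a map/reduce pattern. It is not taken from the talk: the function names (`map_partition`, `combine`, `fit_line`, `distributed_fit`) are hypothetical, and plain Python lists stand in for the data partitions that a Hadoop or Elastic MapReduce job would read.

```python
from functools import reduce
from typing import Iterable, Sequence, Tuple

# Sufficient statistics for y = intercept + slope * x:
# (n, sum of x, sum of y, sum of x*x, sum of x*y)
Stats = Tuple[int, float, float, float, float]


def map_partition(rows: Iterable[Tuple[float, float]]) -> Stats:
    """Map step (assumed to run where the data lives): summarize one partition."""
    n, sx, sy, sxx, sxy = 0, 0.0, 0.0, 0.0, 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a: Stats, b: Stats) -> Stats:
    """Reduce step: the statistics are additive, so partitions merge by summing."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3], a[4] + b[4])


def fit_line(stats: Stats) -> Tuple[float, float]:
    """Solve the least-squares normal equations from the merged statistics."""
    n, sx, sy, sxx, sxy = stats
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has zero variance; the slope is not identifiable")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


def distributed_fit(partitions: Sequence[Iterable[Tuple[float, float]]]) -> Tuple[float, float]:
    """Simulate the whole job locally: map every partition, reduce, then solve."""
    return fit_line(reduce(combine, map(map_partition, partitions)))


if __name__ == "__main__":
    # Two toy partitions drawn from y = 2x + 1.
    parts = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]]
    print(distributed_fit(parts))
```

Only five numbers per partition cross the network, regardless of how many rows each partition holds. One caveat with this simple form: raw sums of squares can lose floating-point precision when the values or the row counts are very large, so a production job might accumulate centered statistics instead.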


Authors who are presenting talks have a * after their name.
