JSM 2015 Preliminary Program


Abstract Details

Activity Number: 11
Type: Invited
Date/Time: Sunday, August 9, 2015 : 2:00 PM to 3:50 PM
Sponsor: Council of Chapters
Abstract #314610
Title: The Truth Is Out There, but How Do We Dig It Out?
Author(s): Mikhail Traskin*
Companies: Amazon
Keywords: big data ; scalable machine learning
Abstract:

Big data has been a very popular topic over the past several years, yet there seems to be a fair amount of confusion around it. We illustrate and discuss some of the aspects of big data that make it confusing and challenging to deal with. First, there is a long tail of data set sizes. Next, big data often mean big engineering, because we need to collect, store, and manage the data, and there are various security issues to address. Data transformation (also called variable creation or feature engineering) is an important aspect of using any data, but it becomes even more important in the context of big data, because such data are highly heterogeneous while most of our algorithms work with vector predictors. While there is a perception that big data are a solution to all one's problems (and this seems to be true to some degree for image and text data), we illustrate that this is often not so in other areas, where it is often better to have the "right data" rather than "big data". In our experience, big data do not replace the "problem understanding" aspect of data analysis; hence the statistician's role will remain important, even as we see development aimed at making big data tools more accessible to the general public.


Authors who are presenting talks have a * after their name.


