Abstract:
|
Big data has been a very popular topic over the past several years, yet there is still a fair amount of confusion around it. We illustrate and discuss some of the aspects of big data that make it confusing and challenging to deal with. First, there is a long tail of data set sizes. Second, big data often mean big engineering, because we need to collect, store, and manage the data and address various security issues. Data transformation (also called variable creation or feature engineering) is an important aspect of using any data, but it becomes even more important in the context of big data because such data are highly heterogeneous, while most of our algorithms work with vector predictors. While there is a perception that big data are a solution to all one's problems (and this seems to be true to some degree for image and text data), we illustrate that this is often not so in other areas, where it is often better to have the "right data" than "big data". In our experience, big data do not replace the "problem understanding" aspect of data analysis; hence the statistician's role will remain important even as big data tools are developed to be more accessible to the general public.
|
Copyright © American Statistical Association.