Abstract:
"Big Data" is characterized along several dimensions: size, incompleteness, incongruency, complex representation, multiscale nature, and heterogeneity of its sources. In effect, Big Data is a messy collage of fragmented "conventional data", each fragment representing an alternative view of the same complex natural process, as if inspected through a multispectral prism. Interpreting Big Data raises many statistical challenges (e.g., handling its sparse and discordant format, designing robust data-representation and modeling strategies, and estimating error). We will discuss several examples of high-throughput data analytics and model-free inference, and explore principles of distribution-free and model-agnostic methods for scientific inference based on Big Data sets. Compressive Big Data analytics (CBDA) is an idea for iteratively generating random (sub)samples from the Big Data collection and using classical techniques to develop model-based or non-parametric inference. CBDA repeats the (re)sampling and inference steps many times and uses bootstrapping techniques to quantify probabilities, estimate likelihoods, or assess the accuracy of findings. (Session link: http://goo.gl/75FygQ)
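The resample-and-infer loop described for CBDA can be illustrated with a minimal sketch. This is not the authors' CBDA implementation: the function names (`subsample_estimates`, `percentile_interval`), the synthetic Gaussian data, the choice of the median as the "classical technique", and all parameter values are assumptions made purely for illustration. It shows only the general pattern of drawing many random subsamples, applying an estimator to each, and summarizing the spread of the results with a bootstrap-style percentile interval.

```python
import random
import statistics


def subsample_estimates(data, estimator, n_rounds=500, sample_size=200, seed=0):
    """Repeatedly draw a random subsample (with replacement) and apply `estimator`.

    Hypothetical helper: a stand-in for the "(re)sampling and inference" step.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_rounds):
        # Draw a small random subsample from the full collection.
        subsample = rng.choices(data, k=sample_size)
        # Apply a classical estimator to the subsample only.
        estimates.append(estimator(subsample))
    return estimates


def percentile_interval(estimates, level=0.95):
    """Empirical percentile interval over the resampled estimates."""
    ordered = sorted(estimates)
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * (len(ordered) - 1))
    hi_idx = int(round((1.0 - tail) * (len(ordered) - 1)))
    return ordered[lo_idx], ordered[hi_idx]


if __name__ == "__main__":
    # Synthetic stand-in for a large data collection (assumption, not real data).
    rng = random.Random(42)
    big_data = [rng.gauss(10.0, 3.0) for _ in range(100_000)]

    estimates = subsample_estimates(big_data, statistics.median)
    low, high = percentile_interval(estimates)
    print(f"mean of subsample medians: {statistics.mean(estimates):.3f}")
    print(f"95% percentile interval: ({low:.3f}, {high:.3f})")
```

Each round touches only `sample_size` observations, so the cost of a round does not grow with the size of the full collection. The spread of the per-round estimates reflects the variability of a subsample of that size, and any estimator that accepts a list of observations could be substituted for the median.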
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.