Abstract Details

Activity Number: 223 - Annals of Applied Statistics (AOAS) Lecture
Type: Invited
Date/Time: Monday, July 30, 2018 : 2:00 PM to 3:50 PM
Sponsor: IMS
Abstract #333127
Title: Statistical Paradises and Paradoxes in Big Data (I): Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election
Author(s): Xiao-Li Meng*
Companies: Harvard University
Keywords:
Abstract:

A 5-element identity shows that for any dataset of size n, probabilistic or not, the difference between the sample and population averages is the product of three measures: (1) data quality, (2) data quantity, and (3) problem difficulty. This decomposition tells us: (I) Probabilistic sampling ensures high data quality by controlling a data defect index at the level of 1/√N, where N is the population size; (II) When we lose this control, the estimation error, relative to the benchmarking rate 1/√n, increases with √N, forming the Law of Large Populations; (III) The "bigness" of Big Data (for population inferences) should be measured by the relative size n/N, not the absolute size n; (IV) When combining data sources for population inferences, relatively tiny but higher-quality sources should be given far more weight than their sizes suggest. An application to the 2016 US presidential election reminds us that, when we ignore data quality, population inferences with Big Data are subject to a Big Data Paradox: the more the data, the surer we fool ourselves. The identity also reveals the possibility of simultaneously enhancing data privacy and data quality for non-probabilistic samples.
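The abstract does not write the identity out. A minimal numerical sketch follows, assuming the form in which it is usually stated: sample mean minus population mean = rho(R, Y) x sqrt((1 - f)/f) x sigma_Y, where R is the 0/1 indicator of being recorded, f = n/N, rho is the finite-population correlation between R and Y (data quality), sqrt((1 - f)/f) is data quantity, and sigma_Y is the population standard deviation with divisor N (problem difficulty). The function name `decompose_error`, the simulated population, and the logistic selection mechanism are illustrative assumptions, not part of the talk.

```python
import math
import random


def decompose_error(y, r):
    """Split (sample mean - population mean) into three factors.

    y: list of values for the whole finite population (length N)
    r: list of 0/1 indicators, 1 if the unit is in the sample
    Assumes 0 < n < N and that y is not constant.
    """
    N = len(y)
    n = sum(r)
    if not 0 < n < N:
        raise ValueError("need 0 < n < N")
    f = n / N  # relative size of the data set
    pop_mean = sum(y) / N
    sample_mean = sum(yi for yi, ri in zip(y, r) if ri) / n
    # Finite-population moments, all with divisor N.
    sigma_y = math.sqrt(sum((yi - pop_mean) ** 2 for yi in y) / N)
    cov_ry = sum((ri - f) * (yi - pop_mean) for yi, ri in zip(y, r)) / N
    rho = cov_ry / (math.sqrt(f * (1 - f)) * sigma_y)
    return {
        "error": sample_mean - pop_mean,
        "quality": rho,                        # data defect correlation
        "quantity": math.sqrt((1 - f) / f),    # depends only on n/N
        "difficulty": sigma_y,                 # spread of Y in the population
    }


# Hypothetical population with a non-probabilistic (self-selected) sample:
# units with larger y are more likely to be recorded.
random.seed(0)
N = 20_000
y = [random.gauss(0.0, 1.0) for _ in range(N)]
r = [1 if random.random() < 1 / (1 + math.exp(3.0 - 0.5 * yi)) else 0 for yi in y]

parts = decompose_error(y, r)
product = parts["quality"] * parts["quantity"] * parts["difficulty"]
print(parts["error"], product)  # the two numbers agree up to rounding error
```

Because the identity is purely algebraic, the agreement holds for any selection mechanism; only the size of the quality term changes. Replacing the selection rule with a simple random sample should shrink that term toward the 1/√N scale described in point (I).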


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program