Abstract:
|
The phrase "Big Data" has greatly raised expectations of what we can learn about ourselves and the world in which we live or will live. It appears to have also boosted general trust in empirical findings, because it seems to be common sense that the more data we have, the more reliable our results are. Unfortunately, this common-sense conception can be falsified mathematically even for methods such as the time-honored ordinary least squares regression, and the issue does not go away even when one has an infinite amount of data (Meng and Xie, 2014). Furthermore, whereas the size of the data is a common indicator of the amount of information, what matters far more is the quality of the data. A largely overlooked statistical identity, a potential candidate for the statistical counterpart to the beautiful Euler identity, reveals that trading quantity for quality in statistical estimation is a doomed game, as can be demonstrated mathematically (Meng, 2017). When data quality is not taken into account, Big Data can do more harm than good: the drastically inflated assessment of precision leads to gross overconfidence, which at a minimum can produce serious surprises when reality unfolds, as illustrated by the 2016 US election.
|