Abstract:
|
In 2001, Bill Cleveland published ``Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.'' The plan was six pages long and well ahead of its time. The subsequent 15 years have seen the birth of data science as a field. The rise of data science was driven in part by the contemporaneous rise of ``big data,'' the perceived need to analyze (either directly by a human or in an automated way by machines) very large, automatically collected data sets. Bill has championed the ``divide and recombine'' strategy for handling big data problems, embodied in the modeling and visualization software Tessera. In this talk I will briefly review Cleveland's 2001 ``Action Plan,'' emphasize the distinction between data science and big data, and discuss the advantages of the divide-and-recombine strategy for very large data problems.
|