Abstract:
|
As practicing statisticians and data scientists, we constantly face data challenges that are not strictly statistical. However, not everyone is aware that data management is a discipline in its own right, and even fewer are familiar with effective industry practices for data management and data quality. For many statisticians and data scientists, exposure to the discipline of data management is quite limited; academic curricula in statistics and data science rarely include the topic.
The objective of this talk is to frame data management through the lens of statisticians and data scientists, rather than to explain techniques or methods for data cleaning. We address what data management is, what we mean by data quality, and how data quality should be managed, all in the context of the entire data lifecycle. In doing so, we aim to raise awareness of the data management practices that lie outside the scope of statisticians and data scientists, and to foster a clearer understanding of our own role in data management.
|