Abstract:
|
Frontier social science and evidence-based policy analyses increasingly rely on large-scale, naturally occurring data, such as administrative, transaction, and social media data. These data capture phenomena at higher frequency, lower cost, and greater timeliness than data collected through traditional methods. Using naturally occurring data for analytic purposes is not free, however: it requires managing governance and custody, processing the data, and linking them to other data. Without methods for preservation and access, with appropriate provenance, naturally occurring data may be reproduced repeatedly, at high cost. The cost is not simply in dollars and time. There is also a significant cost to science, because replication becomes impossible. Naturally occurring data change over time. Analyses repeated on data without proper documentation, versioning, or provenance vary from one another for reasons that have nothing to do with the underlying science. The Inter-university Consortium for Political and Social Research (ICPSR) has for over 55 years curated and disseminated social science data for re-use and replication. This paper presents steps ICPSR is taking to develop tools and protocols to address these problems, including a new repository of data linkage algorithms.
|