Activity Number: 50 - Which Sessions Should This Go To? Text Analytics to the Rescue of Conference Committees
Type: Invited
Date/Time: Sunday, July 29, 2018 : 4:00 PM to 5:50 PM
Sponsor: Section on Statistical Computing
Title: Creating a Taxonomy of Statistical Methods Using Text Analysis
Author(s): Wendy L Martinez*
Companies: Bureau of Labor Statistics
Keywords: Clustering; Documents; Taxonomy; Text Analysis

The United Nations Economic Commission for Europe (UNECE) holds an annual workshop on Statistical Data Editing with a focus on official surveys. The organizers of the 2017 workshop formed subgroups tasked with proposing ways to foster the implementation of good practices and international collaboration among the statistical offices of member countries. One proposal was to classify existing methods for data editing and imputation based on papers presented at previous UNECE work sessions on data editing. Another was to create an indexed and searchable inventory of these papers using a taxonomy. This presentation describes research addressing the first idea: constructing a taxonomy of the topics addressed by the UNECE data editing group. To do this, I downloaded all papers from the annual work sessions, converted them to machine-readable format, and applied text analysis approaches to create a taxonomy based on the papers. The presentation will describe the process and tools used to create the taxonomy so that attendees can apply the same ideas to their own document collections.
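The abstract does not name the specific text analysis methods used, so the following is only a minimal sketch of one common way to group documents into candidate taxonomy topics: TF-IDF weighting followed by average-linkage agglomerative clustering on cosine similarity. The toy documents, the stopword list, and every function name here are hypothetical and are not taken from the presentation.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for text extracted from workshop papers (assumption).
DOCS = {
    "paper_a": "selective editing of survey data using score functions",
    "paper_b": "score functions for selective editing in business surveys",
    "paper_c": "imputation of missing values with nearest neighbour donors",
    "paper_d": "donor imputation for missing values in household surveys",
}

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"of", "for", "in", "with", "the", "using", "and"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return L2-normalised TF-IDF vectors as {doc_name: {term: weight}}."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        # Smoothed inverse document frequency.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors[name] = {t: w / norm for t, w in vec.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def cluster(vectors, k):
    """Average-linkage agglomerative clustering down to k clusters."""
    clusters = [[name] for name in sorted(vectors)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j]]
                avg = sum(sims) / len(sims)
                if best is None or avg > best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def label(members, vectors, n_terms=3):
    """Label a cluster with its highest-weighted terms (ties broken alphabetically)."""
    totals = Counter()
    for name in members:
        totals.update(vectors[name])
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n_terms]]


vectors = tfidf_vectors(DOCS)
taxonomy = [(label(members, vectors), members) for members in cluster(vectors, 2)]
for terms, members in taxonomy:
    print(", ".join(terms), "->", members)
```

On this toy input the two editing papers and the two imputation papers end up in separate groups, each labeled by its highest-weighted terms. For a real collection, the number of clusters, the tokenization, and the choice of clustering method would all need to be tuned and the resulting groups reviewed by subject-matter experts.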

