Online Program

Friday, February 19
CS03 Text Analytics Fri, Feb 19, 9:15 AM - 10:45 AM
Diamond I&II

Text Analysis with Survey Data (303163)

*Wendy Martinez, Bureau of Labor Statistics 

Keywords: document clustering and classification, exploratory data analysis, dimensionality reduction

Unstructured and semi-structured text data are left out of most analyses, yet they can be a rich source of information and knowledge. This presentation highlights text analysis approaches that can exploit this type of data. The objective of a text analysis can vary. For example, we might need to assign topic labels or codes to our text, which is supervised learning, or classification. Or we might want to learn the main content or themes in a set of documents or text fields, which is unsupervised learning, or clustering. In this talk, I will describe the main steps of text analysis and present some common approaches for each one. I will discuss real applications, such as coding accident narratives and exploiting text for nonresponse analysis. These examples will be implemented in R or MATLAB, and the code used to analyze the data will be provided. The goal is to help the audience learn these methods and to provide examples that enable them to apply the ideas in their own analyses.
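
As a rough illustration of the classification task mentioned above (assigning codes to short narratives), here is a minimal sketch in Python using only the standard library. It is not the code from the talk, which uses R or MATLAB. The narratives, the codes "fall" and "cut", and the function names are invented for illustration, and the bag-of-words representation with a 1-nearest-neighbor rule is just one simple choice among many.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_vector(text):
    """Represent a document as a bag-of-words term-frequency vector."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(text, labeled_docs):
    """Assign the label of the most similar labeled document (1-nearest neighbor)."""
    query = tf_vector(text)
    best_label, _ = max(
        ((label, cosine(query, tf_vector(doc))) for doc, label in labeled_docs),
        key=lambda pair: pair[1],
    )
    return best_label


# Invented toy narratives and codes, for illustration only (not real survey data).
LABELED = [
    ("worker slipped on wet floor and fell", "fall"),
    ("employee fell from ladder while painting", "fall"),
    ("hand cut by knife while slicing meat", "cut"),
    ("finger cut on sharp metal edge", "cut"),
]

if __name__ == "__main__":
    print(classify("cook cut thumb with knife", LABELED))
```

The same term-frequency vectors and cosine similarity could also feed a clustering method when no labels are available, which corresponds to the unsupervised case.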