JSM 2016 Online Program

Activity Number:	581
Type:	Invited
Date/Time:	Wednesday, August 3, 2016 : 2:00 PM to 3:50 PM
Sponsor:	Section on Statistical Computing
Abstract #317995
Title:	Mining Text in R
Author(s):	David Marchette*
Companies:	Naval Surface Warfare Center
Keywords:	text analytics ; R ; topic models ; latent semantic indexing
Abstract:	Text analytics is an important area of research with wide applications in many areas, including the economy, social media, web search, document analysis, scientometrics, security, and defense. There are many tools available in R for text analytics, and this talk will give a brief overview of some of these, biased by my own interests in the subject. I will discuss simple word-histogram and latent semantic indexing approaches, topic models, and some of the natural language processing tools. This will be more illustrative than comprehensive; the number of packages relevant to text mining is vast and growing. I will show various methods for analyzing, visualizing, and manipulating text and models of text, and will illustrate these on a collection of open-source document sets. Issues of corpus size will be discussed, as well as issues that come up when dealing with specialized text, such as the short, jargon-rich, variably spelled text in Twitter and other social media. This talk will provide a good starting point for anyone with a background in the R programming language who is interested in text analytics.

Authors who are presenting talks have a * after their name.