Abstract:
|
Text analytics is an important area of research with wide applications in the economy, social media, web search, document analysis, scientometrics, security, and defense. Many tools for text analytics are available in R, and this talk will give a brief overview of some of them, biased by my own interests in the subject. I will discuss simple word-histogram and latent semantic indexing approaches, topic models, and some of the natural language processing tools. The overview will be more illustrative than comprehensive: the number of packages relevant to text mining is vast and growing. I will show various methods for analyzing, visualizing, and manipulating text and models of text, and will illustrate these on a collection of open-source document sets. I will also discuss issues of corpus size, as well as issues that arise with specialized text, such as the short, jargon-rich, variably spelled text found on Twitter and other social media. This talk will provide a good starting point for anyone with a background in the R programming language who is interested in text analytics.
|