All Times EDT

Abstract Details

Activity Number: 319 - SLDS CSpeed 6
Type: Contributed
Date/Time: Wednesday, August 11, 2021 : 3:30 PM to 5:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #319138
Title: Stability of Text Analytics and Topic Analysis: A Deeper Look at Popular Methods
Author(s): Mary Milam Whiteside and Mark E Eakin*
Companies: The University of Texas at Arlington and The University of Texas at Arlington
Keywords: Bootstrapping; Correspondence Analysis; Bigrams; Factor Analysis; Cosine Similarity; Robustness

As text analytics and topic analysis have grown in popularity among scholars in many academic fields, statisticians have expressed increasing concern about the robustness of the results. This research explores the stability of text analytics when used to describe the topics that emerge from an analysis of abstracts of statistical publications in the 21st century. We collect all abstracts from eleven leading statistical journals over the twenty years from 2000 to 2019 and apply correspondence analysis (CA) and factor analysis (FA), two topic-extraction techniques included in popular text analytics software. We begin with a bag-of-words approach. After preprocessing the data, we obtain a bigram-by-journal frequency matrix and compute cosine similarities among the journals. We then consider a bootstrap-based sensitivity analysis of FA applied to the cosine similarities and of CA. The within-procedure results are relatively stable.
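
The pipeline described in the abstract (bigram counts per journal, cosine similarity between journals, bootstrap resampling) might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the journal names and texts are made up, the whitespace tokenizer stands in for whatever preprocessing was actually used, and resampling abstracts within each journal is an assumed bootstrap scheme. The FA and CA steps are not shown.

```python
import math
import random
from collections import Counter


def bigrams(text):
    """Lowercase, split on whitespace, and return adjacent word pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def bigram_counts(abstracts):
    """Pool the bigram counts of all abstracts from one journal."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(bigrams(abstract))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def bootstrap_cosine(corpus, j1, j2, n_boot=200, seed=0):
    """Resample abstracts with replacement within each journal (an assumed
    scheme) and recompute the cosine similarity between journals j1 and j2."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_boot):
        resampled = {j: rng.choices(docs, k=len(docs)) for j, docs in corpus.items()}
        sims.append(cosine(bigram_counts(resampled[j1]),
                           bigram_counts(resampled[j2])))
    return sims


# Toy corpus: hypothetical journal names and made-up abstract texts.
corpus = {
    "journal_a": [
        "bayesian variable selection in linear models",
        "variable selection for high dimensional regression",
    ],
    "journal_b": [
        "variable selection with the lasso",
        "high dimensional regression under sparsity",
    ],
}

sims = bootstrap_cosine(corpus, "journal_a", "journal_b")
print("bootstrap range of cosine similarity:", min(sims), max(sims))
```

A narrow spread of the bootstrapped similarities would be one informal indication of stability; a fuller analysis would feed each resampled similarity or frequency matrix into FA or CA and compare the extracted topics.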

Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program