Abstract:
|
As text analytics and topic analysis have grown in popularity among scholars in many academic fields, statisticians have expressed increasing concern about the robustness of the results. This research explores the stability of text analytics when used to describe the topics that emerge from an analysis of abstracts of statistical publications in the 21st century. We collect all abstracts published in eleven leading statistical journals over the twenty years from 2000 to 2019 and apply correspondence analysis (CA) and factor analysis (FA), two techniques for extracting salient topics that are included in popular text analytics software. We begin with a bag-of-words approach. After preprocessing the data, we obtain a bigram-by-journal frequency matrix and compute cosine similarities between the journals. We then carry out sensitivity analyses, based on bootstrapping, of FA applied to the cosine similarities and of CA. Within each procedure, the results are relatively stable.
|
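As an illustration of the bigram-by-journal frequency matrix and the journal-level cosine similarities mentioned in the abstract, the following is a minimal sketch only. The toy corpus, the journal names, and the helper functions `bigrams`, `bigram_counts`, and `cosine` are hypothetical and are not the study's data or code; preprocessing is assumed to have been done already.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus (NOT the study's data): journal -> preprocessed abstracts.
corpus = {
    "Journal A": ["bayesian variable selection for linear models",
                  "variable selection in high dimensional regression"],
    "Journal B": ["bayesian variable selection with shrinkage priors"],
    "Journal C": ["survival analysis for clinical trials"],
}

def bigrams(text):
    """Return the adjacent word pairs of a whitespace-tokenized string."""
    tokens = text.split()
    return list(zip(tokens, tokens[1:]))

def bigram_counts(abstracts):
    """Count bigrams over all abstracts of one journal."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(bigrams(abstract))
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# One bigram-count vector per journal.
counts = {journal: bigram_counts(texts) for journal, texts in corpus.items()}

# Dense bigram-by-journal frequency matrix over a shared, sorted vocabulary.
vocab = sorted(set().union(*counts.values()))
matrix = {journal: [counts[journal][b] for b in vocab] for journal in counts}

# Pairwise cosine similarities between journals.
similarity = {(a, b): cosine(counts[a], counts[b]) for a in counts for b in counts}
```

Under these assumptions, `similarity` plays the role of the journal-by-journal similarity matrix to which FA could be applied, while `matrix` is the kind of frequency table CA would take as input. A bootstrap sensitivity check would, in outline, resample abstracts within each journal and recompute both objects.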