Online Program

Saturday, May 19
Business Analytics
Sat, May 19, 10:30 AM - 12:00 PM
Lake Fairfax B

Identifying and Utilizing Research Topics in Conference Abstracts (304596)

*Stanislav Kolenikov, Abt Associates 
Alison Thaung, Abt Associates 

Keywords: text analysis, text mining, classification, semi-structured learning, hierarchical clustering, professional service

Combining conference abstracts into coherent sessions is one of the most burdensome tasks of an ASA section program chair. Text analytics based on unsupervised learning can help find the abstracts that are most similar to each other in the TF-IDF (term frequency-inverse document frequency) space. Additional expert knowledge can come in the form of a research classification scheme, such as the Mathematics Subject Classification (MSC), where statistics occupies the codes matching "62[A-Z]\d{2}", or the Journal of Economic Literature (JEL) classification, where econometrics occupies "C1\d". By analyzing the human-produced clustering of the Joint Statistical Meetings (JSM) abstracts of the Survey Research Methods Section (SRMS) of the American Statistical Association from 2010 to 2017, we develop a hierarchical classification structure in which the first character of the code denotes a broad topic in survey statistics and the subsequent characters denote progressively more detailed topics. The hierarchical structure of the classification code provides an implicit distance metric for abstracts that share all or part of their classification codes: abstracts that share the full code are deemed closest to one another, while those that share only the stem are more distant. In this approach, instead of analyzing individual terms or bigrams, we first classify each abstract within the classification scheme and then group the abstracts according to their distance in the research topic hierarchy. The process is demonstrated with historical SRMS abstracts and applied to the 2018 submissions.
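The distance implied by a hierarchical classification code can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes every abstract has already been assigned a code, the abstract IDs and codes are hypothetical, distance is taken to be the number of code characters outside the shared prefix, and `assemble_sessions` is a hypothetical sort-and-chunk heuristic for forming fixed-size sessions.

```python
from itertools import combinations


def code_distance(a: str, b: str) -> int:
    """Hierarchy-implied distance between two classification codes.

    Counts the characters of the longer code that lie outside the shared
    prefix: identical codes -> 0, codes that share a longer stem -> smaller,
    codes with different first (broad-topic) characters -> largest.
    """
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return max(len(a), len(b)) - shared


def pairwise_distances(codes: dict[str, str]) -> dict[tuple[str, str], int]:
    """Code distance for every unordered pair of abstract IDs (keys sorted)."""
    return {
        (i, j): code_distance(codes[i], codes[j])
        for i, j in combinations(sorted(codes), 2)
    }


def assemble_sessions(codes: dict[str, str], size: int) -> list[list[str]]:
    """Hypothetical greedy session builder (an assumption, not the paper's method).

    Lexicographic sorting keeps abstracts with identical codes adjacent, and
    all abstracts sharing a stem form one contiguous run, so consecutive
    chunks of `size` abstracts keep topically close talks together.
    """
    ordered = sorted(codes, key=lambda k: (codes[k], k))
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Hypothetical codes: first character = broad topic, later characters = subtopics.
codes = {"a1": "A12", "a2": "A12", "a3": "A13", "a4": "B21", "a5": "B21", "a6": "B30"}
print(code_distance("A12", "A12"))  # 0: same full code
print(code_distance("A12", "A13"))  # 1: share only the stem "A1"
print(code_distance("A12", "B21"))  # 3: different broad topics
print(assemble_sessions(codes, 3))  # [['a1', 'a2', 'a3'], ['a4', 'a5', 'a6']]
```

Sorting works here because strings that share a prefix are contiguous in lexicographic order. Fixed-size chunking can still split a topic across a session boundary; the pairwise distances could instead be fed to a hierarchical clustering routine when a finer assignment is needed.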