Abstract:
Combining conference abstracts into coherent sessions is one of the most burdensome tasks of an ASA section program chair. Pan, Zou and Yu applied unsupervised learning to find the abstracts that are most similar to each other in the TF-IDF space. I extend this approach by adding a semi-supervised component in the form of a research classification scheme: a hierarchical structure in which the first character of a classification code denotes a broad topic and each subsequent character denotes a progressively more detailed topic. The hierarchy provides an implicit distance metric for abstracts that share all or part of their classification codes: abstracts that share the full code are deemed closest to one another, while those that share only the stem are farther apart. Instead of analyzing individual terms or bigrams, I first classify each abstract within the classification scheme and then group the abstracts into sessions based on their distance in the research topic hierarchy. The process is demonstrated with the historical abstracts of the Survey Research Methods Section and applied to the 2018 submissions.
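The implicit distance described above can be illustrated with a minimal sketch. It assumes each abstract has already been assigned a classification code stored as a string of equal-length hierarchical characters (the abstract IDs and codes such as "A12" below are hypothetical), and it defines the distance as the number of code levels below the deepest shared ancestor. The session-building step is a simple stand-in chosen for illustration (sort by code, then cut into fixed-size sessions), not necessarily the procedure used in this work; the classification step itself is not shown.

```python
from __future__ import annotations


def shared_prefix_length(code_a: str, code_b: str) -> int:
    """Count how many leading characters (hierarchy levels) two codes share."""
    n = 0
    for x, y in zip(code_a, code_b):
        if x != y:
            break
        n += 1
    return n


def code_distance(code_a: str, code_b: str) -> int:
    """Hierarchical distance: levels below the deepest shared ancestor.

    Identical codes give 0; codes sharing only the first character (the stem)
    give depth - 1; codes sharing nothing give the full depth.
    """
    depth = max(len(code_a), len(code_b))
    return depth - shared_prefix_length(code_a, code_b)


def make_sessions(codes: dict[str, str], session_size: int) -> list[list[str]]:
    """Hypothetical grouping step: sorting by code places abstracts with
    shared prefixes next to each other, then the ordered list is cut into
    consecutive sessions of at most `session_size` abstracts."""
    ordered = sorted(codes, key=lambda aid: (codes[aid], aid))
    return [ordered[i:i + session_size] for i in range(0, len(ordered), session_size)]


if __name__ == "__main__":
    # Hypothetical abstract IDs mapped to hypothetical classification codes.
    codes = {"a1": "B21", "a2": "A13", "a3": "A12",
             "a4": "B21", "a5": "A12", "a6": "B30"}
    print(code_distance("A12", "A12"))  # same full code
    print(code_distance("A12", "A13"))  # share two levels
    print(code_distance("A12", "B21"))  # share nothing
    print(make_sessions(codes, session_size=3))
```

Sorting is used here only because lexicographic order keeps codes with a common stem contiguous; a clustering method that uses `code_distance` directly (for example, agglomerative grouping) would be a natural alternative under the same assumptions.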