Abstract:
|
We have collected and cleaned a data set of statisticians' publications, consisting of the titles, authors, abstracts, MSC numbers, keywords, and citation counts of 83,661 papers published in 36 journals in statistics, probability, and related fields over a span of 41 years. The data set motivates an array of interesting problems. In this talk, I will discuss the following problems: productivity, centrality, journal ranking, text mining, network analysis, and citation prediction. For text mining, we use the paper abstracts in our data set as the text documents and focus on how to use the estimated topic weights to study the research patterns of statisticians. For network analysis, we focus on community detection, membership estimation, and especially on how to characterize the research trajectories of a selected subset of statisticians over the years. This is joint work with Pengsheng Ji, Tracy Zheng Ke, and Wanshan Li.
|