Abstract:
|
The rapid growth of the scientific community has motivated many Big Data projects. One of them is using the vast volume of publications to study the research output of an academic community (e.g., statisticians). It is desirable to demonstrate an easy-to-extend approach to conducting such a project. We proceeded in two steps. In the first step, we collected and cleaned a large data set on publications in statistics. The data set consists of the titles, authors, abstracts, keywords, MSC numbers, references, and citation counts of 83,331 papers published in 36 statistics-related journals, spanning 41 years. In the second step, we created a template, or pipeline, showcasing how to use the data set to study the statistics community. We identified a dozen problems, such as the overall productivity of statisticians, the centrality of scholars in the field, journal ranking, network community structure, dynamic network evolution, diversity of coauthorship and citation, topic learning, topic trending, citation patterns, and citation prediction, and studied each of them with modern statistical tools.
|