Keywords: Open Source Software, Community Detection, OSS Community Structure
In this paper, we identify and study the communities formed on OSS collaboration networks using a dataset of 3.26 million GitHub users. While most existing work examines how small-scale OSS projects emerge, our work draws on a large-scale network of contributors from GitHub, the world’s largest remote hosting platform. OSS collaborations are characterized by small groups of users who collaborate closely and thus form more short cycles of collaboration within a community than across communities. To better understand how communities are shaped by the cyclic structure of the network (rather than just by the existing edges of the graph), we introduce a novel method for detecting communities: as a preprocessing step, we combine this cyclic property with the strength of collaboration among users, and we feed the resulting topological information about each edge's participation in the cyclic structure of the groups to our clustering methods. Specifically, we first preprocess the network data using Renewal Non-Backtracking Random Walk (RNBRW) and then apply state-of-the-art clustering methods such as Louvain (Blondel et al., 2008) and Clauset-Newman-Moore (CNM). This method provides a stronger approach for detecting small-scale team formation by accounting for preferential attachment to more established collaboration communities. This paper offers useful insights for both open-source software experts and network scholars interested in studying group formation.
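The two-stage pipeline described above (RNBRW preprocessing followed by modularity-based clustering) can be illustrated with a minimal sketch. It is not the implementation used in this paper. It assumes an undirected simple graph stored as an adjacency dictionary with sortable node labels, and it weights each edge only by how often a non-backtracking walk closes a cycle on it; the blend with collaboration strength is omitted. The function names `rnbrw_weights`, `two_cliques_with_bridge`, and `cluster_with_networkx` are hypothetical, and the clustering step assumes a recent networkx release that provides `louvain_communities`.

```python
import random
from itertools import combinations


def rnbrw_weights(adj, n_walks, seed=None):
    """Count, for each edge, how many walks close their first cycle on it.

    adj maps each node to a list of its neighbours (undirected, no self-loops).
    """
    rng = random.Random(seed)
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    counts = {edge: 0 for edge in edges}
    for _ in range(n_walks):
        # Renewal: every walk restarts from a uniformly random directed edge.
        prev, cur = rng.choice(edges)
        if rng.random() < 0.5:
            prev, cur = cur, prev
        visited = {prev, cur}
        while True:
            # Non-backtracking step: never return along the edge just used.
            candidates = [w for w in adj[cur] if w != prev]
            if not candidates:
                break  # dead end: this walk contributes no weight
            nxt = rng.choice(candidates)
            if nxt in visited:
                # The edge (cur, nxt) closes a cycle, so it is credited.
                counts[tuple(sorted((cur, nxt)))] += 1
                break
            visited.add(nxt)
            prev, cur = cur, nxt
    return counts


def two_cliques_with_bridge(size=5):
    """Toy network (an assumption for illustration): two cliques joined by one edge."""
    adj = {i: [] for i in range(2 * size)}
    for block in (range(size), range(size, 2 * size)):
        for u, v in combinations(block, 2):
            adj[u].append(v)
            adj[v].append(u)
    adj[size - 1].append(size)
    adj[size].append(size - 1)
    return adj


def cluster_with_networkx(counts, seed=0):
    """Run Louvain and CNM on the RNBRW-weighted graph."""
    # Imported here so the RNBRW part above needs only the standard library.
    import networkx as nx

    graph = nx.Graph()
    graph.add_weighted_edges_from((u, v, w) for (u, v), w in counts.items())
    louvain = nx.community.louvain_communities(graph, weight="weight", seed=seed)
    cnm = nx.community.greedy_modularity_communities(graph, weight="weight")
    return louvain, cnm
```

On the toy network, the bridge between the two cliques lies on no cycle and therefore receives zero weight, while edges inside each clique accumulate weight. This is the sense in which the preprocessing passes information about cyclic structure, rather than mere edge existence, to the clustering step.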