Keywords: Forests, ecology, communities, machine learning, network analysis, topic modeling
Since the turn of the 20th century, ecologists have debated what defines an ecological community of co-occurring species. Advances in statistics and the collection of large-scale datasets over the last century have provided further evidence of these relationships, but identifying communities remains difficult. Recent machine learning techniques and network analytics tools present opportunities to explore ecological communities in new, data-driven ways, but comparisons between techniques are needed to understand which methods best align with current ecological knowledge. In this research, we applied three machine learning models, Latent Dirichlet Allocation (LDA), Cluster Affiliation Model for Big Networks (BigCLAM), and Metapath2vec, to three decades of US Forest Service Forest Inventory and Analysis (FIA) data, spanning 86 tree species and 70,000 plots across the eastern US. The best-fit number of communities, k, varied with both the model input (relative vs. absolute measures of species abundance, and sapling vs. adult stems) and the method used (LDA usually found more communities than BigCLAM and Metapath2vec). However, when k was held constant, the community composition (the mixture of species within a community) was consistent between methods. These methods were also able to identify changes in both the geographic distributions of communities over time and the overlap between communities within a sampling unit, both of which are closely linked to ecological processes. For example, observed reductions in communities associated with Fraxinus (ash) species could point to the invasion of the emerald ash borer beetle in the US. Other observed community changes could be associated with anthropogenic influences such as climate change and management practices. These models help to further illuminate the relationship between ecological stressors and shifts in forest communities, and can provide insight into the future sustainability of forest ecosystems across the eastern US.