Abstract:

Due to their ability to handle nonconvex data, spectral clustering methods have proven to be effective in a wide use of clustering problems. A key step of spectral clustering algorithms is the construction of an affinity matrix, which contains a measure of similarity between each of the observed data points. Once constructed, the affinity matrix serves as the primary component for determining the how the data are clustered. Thus, effective methods for constructing affinity matrices are crucial to the success of spectral clustering. Motivated by the kernel representation of Random Forests, we present an unsupervised Random Forest algorithm for constructing affinity matrices. We demonstrate that this algorithm leads to sensible data clusters, and is also able to effectively handle high dimensional data. Further possible uses of the algorithm are discussed, with particular attention paid to network link probability estimation.
