Keywords: Graphical models, online networks, natural language processing, hierarchical mixture models, hidden network discovery, overlapping communities, mixture communities.
This project investigates the behaviour of users of Reddit's news subreddit and the relationship between their sentiment and exchange rates. Using graphical models and natural language processing, we discover hidden online communities among Reddit users. The data are a mixture of text and categorical data from a news website: the titles of the news pages, a few user characteristics, and the users' comments. This dataset is well suited to studying user reaction to news because each comment is directly linked to the content of the webpage it responds to. The model considered in this paper is a hierarchical mixture model, a generative model that detects overlapping networks using the sentiment of user-generated content. Its advantage is that the communities (or groups) are assumed to follow a Chinese restaurant process, so the number of communities does not have to be fixed in advance and the model can detect and cluster the communities automatically. The hidden variables and hyperparameters of the model can be inferred using Gibbs sampling.
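To illustrate why a Chinese restaurant process prior removes the need to fix the number of communities, the following minimal sketch draws group assignments from the prior alone. It is not the hierarchical mixture model of this paper: the function name `sample_crp`, the concentration parameter `alpha`, and the seed are assumptions introduced for illustration, and no sentiment data or Gibbs updates are involved.

```python
import random


def sample_crp(n_users, alpha, seed=0):
    """Draw group assignments for n_users from a Chinese restaurant process.

    alpha > 0 is the concentration parameter (hypothetical name): larger
    values make it more likely that a user starts a new group.
    """
    rng = random.Random(seed)
    assignments = []   # group index of each user, in arrival order
    group_sizes = []   # number of users currently in each group

    for _ in range(n_users):
        # An existing group is chosen with probability proportional to its
        # size; a new group is opened with probability proportional to alpha.
        weights = group_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(group_sizes):
            group_sizes.append(1)   # open a new group
        else:
            group_sizes[k] += 1     # join an existing group
        assignments.append(k)

    return assignments, group_sizes


if __name__ == "__main__":
    assignments, group_sizes = sample_crp(n_users=100, alpha=1.0, seed=42)
    print("number of groups:", len(group_sizes))
    print("group sizes:", group_sizes)
```

The number of groups is an outcome of the draw rather than an input, which is the property the abstract relies on. In a full model, a Gibbs sampler would resample each assignment conditional on the observed data and all other assignments.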