Abstract:
|
The number of online health resources is increasing rapidly, and these resources have become a significant source of health information in the U.S. Many local health departments now maintain websites to serve their communities. Given the staggering amount of text data now available, there is growing interest in analyzing and understanding collections of text. Topic models are one popular method, in which a document is represented as a distribution over topics and a topic as a distribution over words. We study the application of topic models to website modeling. Modeling websites introduces a hierarchy between websites and web pages, since each website is a collection of pages. We propose a topic model that incorporates this hierarchy and jointly models website-level and page-level topics. We apply standard topic models and the proposed model to a real dataset of local public health websites in the U.S.
|