Abstract:
|
People who use probabilistic topic models often need to update those models as new or revised data arrive. There has been little research on transfer learning for Latent Dirichlet Allocation (LDA) that would enable such updates. As a result, applied practitioners face an unpleasant tradeoff: either a model goes stale, becoming less useful over time, or it must be re-trained from random initialization, which breaks continuity with the old model's topics. This research explores two complementary methods for transfer learning in LDA. The first uses the topic-word distributions from a previously trained LDA model as a prior for a new model. The second performs a post-hoc alignment that maps the word-topic counts sampled on the original data set onto the new or updated data set.
|