Abstract:
Many models rely on low-dimensional embeddings to provide both interpretability and predictive performance. These models often have efficient implementations, which makes them usable in many real-world applications, from recommendation systems to natural language processing. In such settings, model checking usually involves inspecting the resulting nearest-neighbor graph, e.g., which movies or words are similar to each other. Model selection is often done by comparing performance on an alternative task, such as recommending movies or solving word analogies. In both cases, models are compared only indirectly, through qualitative inspection or a surrogate metric.
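The indirect comparison described above, which inspects nearest-neighbor structure, can be made concrete with a simple overlap score. The sketch below is an illustrative assumption and is not the Bayesian method proposed in the paper. It assumes that both models embed the same items as rows of a NumPy array, uses cosine similarity, and scores agreement as the mean Jaccard overlap of each item's k-nearest-neighbor set. The function names `knn_sets` and `neighbor_overlap` are hypothetical.

```python
import numpy as np


def knn_sets(emb: np.ndarray, k: int) -> list[set[int]]:
    """For each row of `emb`, return the indices of its k nearest neighbors
    under cosine similarity, excluding the item itself (requires k < n_items)."""
    # Normalize rows so that dot products equal cosine similarities.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit.T
    # Exclude self-matches: -inf on the diagonal sorts last below.
    np.fill_diagonal(sims, -np.inf)
    # Sort each row by descending similarity and keep the first k columns.
    order = np.argsort(-sims, axis=1)[:, :k]
    return [set(row.tolist()) for row in order]


def neighbor_overlap(emb_a: np.ndarray, emb_b: np.ndarray, k: int) -> float:
    """Mean Jaccard overlap between the k-NN sets that two embeddings
    (hypothetically, two models of the same items) assign to each item."""
    sets_a = knn_sets(emb_a, k)
    sets_b = knn_sets(emb_b, k)
    scores = [len(a & b) / len(a | b) for a, b in zip(sets_a, sets_b)]
    return float(np.mean(scores))
```

Cosine similarity is unchanged by an orthogonal rotation of the embedding space, so two embeddings that differ only by such a rotation should score at or very near 1.0. Scores near 0 indicate that the two models largely disagree about which items are similar. A single summary like this is exactly the kind of surrogate the abstract contrasts with a more direct, model-based comparison.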
In this paper we propose Bayesian methods for comparing low-dimensional embeddings, and we show how these methods enable better model checking and improvement. We describe a tool that allows one to collect feedback on the nearest-neighbor structure and to monitor how differences between models correlate with better task performance. We illustrate this procedure through two applications: 1) building a recommendation system for JSM content and 2) expanding queries to find rumors spreading on Twitter.
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.