Online Program

Saturday, May 19
Machine Learning
New Directions in Rank Data Aggregation and Modeling
Sat, May 19, 8:30 AM - 10:00 AM
Grand Ballroom D
 

The Bayesian Mallows Model for Analysing Ranks and Preference Data: From Genomics to Recommendation Systems (304342)

Elja Arjas, University of Oslo and University of Helsinki 
Marta Crispino, Bocconi University 
Arnoldo Frigessi, University of Oslo 
Magne Thoresen, University of Oslo 
*Valeria Vitelli, University of Oslo 
Manuela Zucknick, University of Oslo 

Keywords: Incomplete rankings, Bayesian methods, Preference learning with uncertainty, Recommendation systems, Genomic data integration, Meta-analysis

Ranking items is a central way of collecting information about preferences in many areas. Interest often lies both in estimating the consensus ranking of all items and in learning the individual preferences of the users. The latter task is particularly relevant for recommender systems, where posterior distributions of individual rankings allow prediction (with uncertainty) of each user’s missing preferences, and hence personalized recommendations. To this end, we propose to use the Mallows rank model, an intuitive distance-based approach to the analysis of rank data that can flexibly handle very different applied problems. We develop new computationally tractable methods for Bayesian inference in Mallows models that work with any right-invariant distance. The Bayesian paradigm allows a fully probabilistic analysis and easily handles missing data via augmentation procedures. Our method can also perform inference based on partial rankings, such as top-k items or pairwise comparisons. We propose a mixture model for clustering heterogeneous users into homogeneous subgroups, with cluster-specific consensus rankings. Probabilistic predictions of the class membership of users, based on their ranking of just some items, are also easily obtained from the model posterior. Interestingly, this Bayesian framework also allows for genomic data integration, in two different situations. In a typical meta-analysis, different gene lists arise from different studies or platforms, and the aim is to combine them into a stronger unified consensus. Another frequent situation is the availability of heterogeneous microarray data from different sources, whose combination would strengthen the evidence on the biological question under study. The use of ranks in combining genomic studies is relevant, since the biological interest lies in over-expressed genes for a given pathology. Moreover, ranks are insensitive to heterogeneity in the measurement scales.
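
As a rough illustration of the distance-based idea behind the Mallows model, the sketch below computes the probability of every ranking of a few items given a consensus ranking. It is not the authors' inference method: it assumes the Kendall distance (one example of a right-invariant distance), a scale parameter called `alpha`, and the convention of dividing `alpha` by the number of items. The normalizing constant is computed by brute-force enumeration, which is only feasible for a handful of items. The function names are hypothetical.

```python
import itertools
import math


def kendall_distance(r1, r2):
    """Count the item pairs that the two rank vectors order differently.

    r1[i] and r2[i] are the ranks given to item i (1 = most preferred).
    """
    n = len(r1)
    return sum(
        1
        for i, j in itertools.combinations(range(n), 2)
        if (r1[i] - r1[j]) * (r2[i] - r2[j]) < 0
    )


def mallows_pmf(consensus, alpha):
    """Return {ranking: probability} under a Mallows model (illustrative sketch).

    Assumed form: P(r) proportional to exp(-(alpha / n) * d(r, consensus)),
    with d the Kendall distance. All n! rankings are enumerated, so this is
    meant for small n only.
    """
    n = len(consensus)
    rankings = list(itertools.permutations(range(1, n + 1)))
    weights = [
        math.exp(-alpha / n * kendall_distance(r, consensus)) for r in rankings
    ]
    z = sum(weights)  # normalizing constant by brute force
    return {r: w / z for r, w in zip(rankings, weights)}
```

For example, `mallows_pmf((1, 2, 3), alpha=2.0)` returns a dictionary over the six rankings of three items, in which rankings closer to the consensus receive higher probability. Larger values of `alpha` concentrate the distribution more tightly around the consensus.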