Abstract
Rank aggregation aims at combining rankings of a set of items assigned by a sample of rankers to generate a consensus ranking. A typical solution is to adopt a distance-based approach to minimize the sum of the distances to the observed rankings. However, this simple sum may not be appropriate when the quality of rankers varies. This happens when rankers with different backgrounds may have different cognitive levels of examining the items. In this paper, we develop a new distance-based model by allowing different weights for different rankers. Under this model, the weight associated with a ranker is used to measure his/her cognitive level of ranking of the items, and these weights are unobserved and exponentially distributed. Maximum likelihood method is used for model estimation. Extensions to the cases of incomplete rankings and mixture modeling are also discussed. Empirical applications demonstrate that the proposed model produces better rank aggregation than those generated by Borda and the unweighted distance-based models.
Notes
A distance \(d(\pi ,\sigma )\) between two rankings \(\pi \) and \(\sigma \) is said to be right invariant if and only if for any ranking \(\tau \), \(d(\pi ,\sigma )=d(\pi \circ \tau ,\sigma \circ \tau )\), where \(\pi \circ \tau (i)=\pi (\tau (i))\).
References
Aledo, J.A., Gámez, J.A., Molina, D.: Tackling the rank aggregation problem with evolutionary algorithms. Appl. Math. Comput. 222, 632–644 (2013)
Ali, A., Meilă, M.: Experiments with Kemeny ranking: what works when? Math. Soc. Sci. 64(1), 28–40 (2012)
Aslam, J.A., Montague, M.: Models for metasearch. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 276–284 (2001)
Borg, I., Groenen, P.J.: Modern Multidimensional Scaling: Theory and Applications. Springer, Berlin (2005)
Busse, L.M., Orbanz, P., Buhmann, J.M.: Cluster analysis of heterogeneous rank data. In: Proceedings of the 24th International Conference on Machine Learning, pp. 113–120 (2007)
Ceberio, J., Irurozki, E., Mendiburu, A., Lozano, J.A.: Extending distance-based ranking models in estimation of distribution algorithms. In: Evolutionary Computation (CEC), 2014 IEEE Congress on, pp. 2459–2466. IEEE (2014)
de Borda, J.: Mémoire sur les élections au scrutin. Mémoires de l’Académie Royale des Sciences Année, pp. 657–664 (1781)
DeConde, R.P., Hawley, S., Falcon, S., Clegg, N., Knudsen, B., Etzioni, R.: Combining results of microarray experiments: a rank aggregation approach. Stat. Appl. Genet. Mol. Biol. 5(1):Article 15 (2006)
Deng, K., Han, S., Li, K.J., Liu, J.S.: Bayesian aggregation of order-based rank data. J. Am. Stat. Assoc. 109(507), 1023–1039 (2014)
Dwork, C., Kumar, R., Naor, M., Sivakumar, D.: Rank aggregation methods for the web. In: Proceedings of the 10th International Conference on World Wide Web, pp. 613–622 (2001)
Fligner, M.A., Verducci, J.S.: Distance-based ranking models. J. R. Stat. Soc. B 48(3), 359–369 (1986)
Fligner, M.A., Verducci, J.S.: Posterior probabilities for a consensus ordering. Psychometrika 55(1), 53–63 (1990)
Irurozki, E., Calvo, B., Lozano, J.A.: Sampling and learning the Mallows model under the Ulam distance. Technical report, Department of Computer Science and Artificial Intelligence, University of the Basque Country (2014)
Irurozki, E., Calvo, B., Lozano, J.A.: Sampling and learning the Mallows and generalized Mallows models under the Cayley distance. Methodol. Comput. Appl. Probab. 20(1), 1–35 (2018)
Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P., et al.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)
Lee, M.D., Steyvers, M., Miller, B.: A cognitive model for aggregating people’s rankings. PLoS One 9(5), e96431 (2014)
Lin, S., Ding, J.: Integration of ranked lists via cross entropy Monte Carlo with applications to mRNA and microRNA studies. Biometrics 65(1), 9–18 (2009)
Mallows, C.L.: Non-null ranking models. I. Biometrika 44(1–2), 114–130 (1957)
Murphy, T.B., Martin, D.: Mixtures of distance-based models for ranking data. Comput. Stat. Data Anal. 41(3), 645–655 (2003)
Naume, B., Zhao, X., Synnestvedt, M., Borgen, E., Russnes, H.G., Lingjærde, O.C., Strømberg, M., Wiedswang, G., Kvalheim, G., Kåresen, R., et al.: Presence of bone marrow micrometastasis is associated with different recurrence risk within molecular subtypes of breast cancer. Mol. Oncol. 1(2), 160–171 (2007)
Niederreiter, H.: Quasi-Monte Carlo Methods. Wiley Online Library (2010)
Černý, V.: Thermodynamical approach to the traveling salesman problem: an efficient simulation algorithm. J. Optim. Theory Appl. 45(1), 41–51 (1985)
Additional information
The research of Philip L.H. Yu was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. 17303515).
Appendix: Sampling from a general distance-based Model
It is straightforward to simulate ranking data from the Kendall distance-based model because of the nice decomposition of the Kendall distance into a set of independent variables \(V_{i}\left( \pi _{s}\circ \pi _{0}^{-1}\right) \) (see Sect. 4.1), which can be sampled easily. The detailed algorithm can be found in Ceberio et al. (2014). Recently, Irurozki et al. (2014, 2018) developed two different methods to sample from models based on the Ulam and Cayley distances, respectively. These simulation methods require knowledge of special properties of the distance measures and may not generalize to other distances. Here we introduce a general method to sample from any distance-based model.
Our goal is to sample from the target density f(x). The method begins at \(s=0\) with the selection of \(X^{(0)}=x^{(0)}\) drawn at random from some starting distribution g, with the requirement that \(f\left( x^{(0)}\right) >0\). Given \(X^{(s)}=x^{(s)}\), the algorithm generates \(X^{(s+1)}\) as follows:
1. Sample a candidate value \(X^{*}\) from a proposal distribution \(g\left( \cdot \mid x^{(s)}\right) \).
2. Compute the Metropolis–Hastings ratio \(R_{MH}\left( x^{(s)},X^{*}\right) \), where
$$\begin{aligned} R_{MH}\left( x^{(s)},X^{*}\right) =\frac{f\left( X^{*}\right) g\left( x^{(s)}\mid X^{*}\right) }{f\left( x^{(s)}\right) g\left( X^{*}\mid x^{(s)}\right) }. \end{aligned}$$
3. Sample a value for \(X^{(s+1)}\) as follows:
$$\begin{aligned} X^{(s+1)}={\left\{ \begin{array}{ll} X^{*} &{} \text{ with } \text{ probability }\ \min \left\{ 1,R_{MH}\left( x^{(s)},X^{*}\right) \right\} \\ x^{(s)} &{} \text{ otherwise. } \end{array}\right. } \end{aligned}$$
4. Set \(s:=s+1\) and go to step 1.
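As a sketch only, the generic iteration above can be written in Python as follows. The helper names (`target`, `propose`, `proposal_density`) are illustrative and not from the paper; `target` may be an unnormalized density, since any normalizing constant cancels in the ratio.

```python
import random

def metropolis_hastings(target, propose, proposal_density, x0, n_steps):
    """Generic Metropolis-Hastings sampler following steps 1-4 above.

    target:           (possibly unnormalized) density f(x)
    propose:          draws a candidate X* ~ g(. | x)
    proposal_density: evaluates g(x' | x)
    """
    x = x0                                       # requires target(x0) > 0
    samples = []
    for _ in range(n_steps):
        x_star = propose(x)                      # step 1: candidate value
        r = (target(x_star) * proposal_density(x, x_star)) / \
            (target(x) * proposal_density(x_star, x))   # step 2: MH ratio
        if random.random() < min(1.0, r):        # step 3: accept with prob min{1, r}
            x = x_star                           # ...otherwise keep x^(s)
        samples.append(x)                        # step 4: next iteration
    return samples
```

A usage example with a symmetric uniform proposal over a small discrete state space shows the chain visiting high-density states more often.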
Returning to our problem: given \(\pi _{0}\) and \(\lambda \), we need to sample from the distance-based model \(f(\pi )=\frac{\exp \left[ -\lambda d(\pi ,\pi _{0})\right] }{C(\lambda )}\), where \(d\) is the distance function we choose. Note that we do not need to compute the normalizing constant \(C(\lambda )\) in \(R_{MH}\left( x^{(s)},X^{*}\right) \) because the \(C(\lambda )\) terms cancel. It remains to specify the proposal distribution \(g(\cdot \mid \cdot )\).
The proposal distribution can be chosen to be symmetric, so that \(g(x^{*}\mid x^{(s)})=g(x^{(s)}\mid x^{*})\). In our case, the choice of \(g(x^{*}\mid x^{(s)})\) imposes a small perturbation on the elements of \(x^{(s)}\): given \(x^{(s)}\), we uniformly pick two elements of \(x^{(s)}\) and swap their positions to form the proposal \(\pi ^{*}\). Since this swap distribution is symmetric, the Hastings corrections in the Metropolis–Hastings ratio cancel, and the ratio is given by
$$\begin{aligned} R_{MH}\left( \pi ^{(s)},\pi ^{*}\right) =\frac{f\left( \pi ^{*}\right) }{f\left( \pi ^{(s)}\right) }=\exp \left[ -\lambda \left( d(\pi ^{*},\pi _{0})-d(\pi ^{(s)},\pi _{0})\right) \right] . \end{aligned}$$
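The swap-proposal sampler can be sketched as below. This is a minimal illustration, not the authors' implementation; the Kendall distance is used as one example of a right-invariant distance, and function names are hypothetical.

```python
import math
import random

def kendall_distance(pi, sigma):
    """Number of discordant pairs between two rankings (Kendall distance)."""
    n = len(pi)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0)

def sample_distance_model(pi0, lam, n_steps, dist=kendall_distance):
    """MH sampler for f(pi) proportional to exp(-lam * d(pi, pi0)) using a
    random-swap proposal.  The swap proposal is symmetric, so the Hastings
    correction cancels and the acceptance probability reduces to
    min{1, exp(-lam * (d(pi*, pi0) - d(pi, pi0)))}."""
    pi = list(pi0)                # start at the modal ranking, so f(pi) > 0
    d_cur = dist(pi, pi0)
    for _ in range(n_steps):
        i, j = random.sample(range(len(pi)), 2)   # pick two positions uniformly
        cand = list(pi)
        cand[i], cand[j] = cand[j], cand[i]       # swap them: proposal pi*
        d_new = dist(cand, pi0)
        if random.random() < math.exp(-lam * (d_new - d_cur)):
            pi, d_cur = cand, d_new               # accept; else keep current pi
    return pi
```

With a moderately large \(\lambda \), draws concentrate near \(\pi _{0}\), as the model intends.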
To sample from the distance-based latent-scale models, we first sample \(\lambda _{k}\) from the \(\text {Exp}(\lambda )\) distribution, and then sample \(\pi _{k}\) from the simple distance-based model with parameter \(\lambda _{k}\) using the method above.
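This two-stage scheme can be sketched as follows. It is a self-contained illustration, not the paper's code: the Kendall distance stands in for a generic distance, \(\lambda _{k}\) is drawn with rate \(\lambda \) (an assumed parameterization of the exponential), and the inner chain is the same swap-proposal Metropolis–Hastings sampler.

```python
import math
import random

def sample_latent_scale(pi0, lam, n_rankers, n_steps=500):
    """Sample from the latent-scale model: for each ranker k, draw a
    dispersion lambda_k ~ Exp(lam), then draw pi_k from the simple
    distance-based model f(pi) prop. to exp(-lambda_k * d(pi, pi0))
    via a random-swap Metropolis-Hastings chain (Kendall distance here)."""
    def kendall(pi, sigma):
        n = len(pi)
        return sum((pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0
                   for i in range(n) for j in range(i + 1, n))

    rankings = []
    for _ in range(n_rankers):
        lam_k = random.expovariate(lam)          # stage 1: latent scale for ranker k
        pi, d_cur = list(pi0), 0                 # start the chain at pi0
        for _ in range(n_steps):                 # stage 2: swap-proposal MH chain
            i, j = random.sample(range(len(pi)), 2)
            cand = list(pi)
            cand[i], cand[j] = cand[j], cand[i]
            d_new = kendall(cand, pi0)
            if random.random() < math.exp(-lam_k * (d_new - d_cur)):
                pi, d_cur = cand, d_new
        rankings.append(pi)
    return rankings
```

Each returned ranking reflects one ranker's own dispersion \(\lambda _{k}\), so the collection mimics rankers of varying quality around the same consensus \(\pi _{0}\).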
Cite this article
Yu, P.L.H., Xu, H. Rank aggregation using latent-scale distance-based models. Stat Comput 29, 335–349 (2019). https://doi.org/10.1007/s11222-018-9811-9