Abstract
Rank aggregation aims at combining rankings of a set of items assigned by a sample of rankers to generate a consensus ranking. A typical solution is to adopt a distance-based approach to minimize the sum of the distances to the observed rankings. However, this simple sum may not be appropriate when the quality of rankers varies. This happens when rankers with different backgrounds may have different cognitive levels of examining the items. In this paper, we develop a new distance-based model by allowing different weights for different rankers. Under this model, the weight associated with a ranker is used to measure his/her cognitive level of ranking of the items, and these weights are unobserved and exponentially distributed. Maximum likelihood method is used for model estimation. Extensions to the cases of incomplete rankings and mixture modeling are also discussed. Empirical applications demonstrate that the proposed model produces better rank aggregation than those generated by Borda and the unweighted distance-based models.
Notes
A distance \(d(\pi ,\sigma )\) between two rankings \(\pi \) and \(\sigma \) is said to be right invariant if and only if for any ranking \(\tau \), \(d(\pi ,\sigma )=d(\pi \circ \tau ,\sigma \circ \tau )\), where \(\pi \circ \tau (i)=\pi (\tau (i))\).
References
Aledo, J.A., Gámez, J.A., Molina, D.: Tackling the rank aggregation problem with evolutionary algorithms. Appl. Math. Comput. 222, 632–644 (2013)
Ali, A., Meilă, M.: Experiments with Kemeny ranking: what works when? Math. Soc. Sci. 64(1), 28–40 (2012)
Aslam, J.A., Montague, M.: Models for metasearch. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 276–284 (2001)
Borg, I., Groenen, P.J.: Modern Multidimensional Scaling: Theory and Applications. Springer, Berlin (2005)
Busse, L.M., Orbanz, P., Buhmann, J.M.: Cluster analysis of heterogeneous rank data. In: Proceedings of the 24th International Conference on Machine Learning, pp. 113–120 (2007)
Ceberio, J., Irurozki, E., Mendiburu, A., Lozano, J.A.: Extending distance-based ranking models in estimation of distribution algorithms. In: Evolutionary Computation (CEC), 2014 IEEE Congress on, pp. 2459–2466. IEEE (2014)
de Borda, J.: Mémoire sur les élections au scrutin. Mémoires de l’Académie Royale des Sciences Année, pp. 657–664 (1781)
DeConde, R.P., Hawley, S., Falcon, S., Clegg, N., Knudsen, B., Etzioni, R.: Combining results of microarray experiments: a rank aggregation approach. Stat. Appl. Genet. Mol. Biol. 5(1):Article 15 (2006)
Deng, K., Han, S., Li, K.J., Liu, J.S.: Bayesian aggregation of order-based rank data. J. Am. Stat. Assoc. 109(507), 1023–1039 (2014)
Dwork, C., Kumar, R., Naor, M., Sivakumar, D.: Rank aggregation methods for the web. In: Proceedings of the 10th International Conference on World Wide Web, pp. 613–622 (2001)
Fligner, M.A., Verducci, J.S.: Distance-based ranking models. J. R. Stat. Soc. B 48(3), 359–369 (1986)
Fligner, M.A., Verducci, J.S.: Posterior probabilities for a consensus ordering. Psychometrika 55(1), 53–63 (1990)
Irurozki, E., Calvo, B., Lozano, J.A.: Sampling and learning the Mallows model under the Ulam distance. Technical report, Department of Computer Science and Artificial Intelligence, University of the Basque Country (2014)
Irurozki, E., Calvo, B., Lozano, J.A.: Sampling and learning the Mallows and generalized Mallows models under the Cayley distance. Methodol. Comput. Appl. Probab. 20(1), 1–35 (2018)
Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P., et al.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)
Lee, M.D., Steyvers, M., Miller, B.: A cognitive model for aggregating people’s rankings. PLoS One 9(5), e96431 (2014)
Lin, S., Ding, J.: Integration of ranked lists via cross entropy Monte Carlo with applications to mRNA and microRNA studies. Biometrics 65(1), 9–18 (2009)
Mallows, C.L.: Non-null ranking models. I. Biometrika 44(1–2), 114–130 (1957)
Murphy, T.B., Martin, D.: Mixtures of distance-based models for ranking data. Comput. Stat. Data Anal. 41(3), 645–655 (2003)
Naume, B., Zhao, X., Synnestvedt, M., Borgen, E., Russnes, H.G., Lingjærde, O.C., Strømberg, M., Wiedswang, G., Kvalheim, G., Kåresen, R., et al.: Presence of bone marrow micrometastasis is associated with different recurrence risk within molecular subtypes of breast cancer. Mol. Oncol. 1(2), 160–171 (2007)
Niederreiter, H.: Quasi-Monte Carlo Methods. Wiley Online Library (2010)
Černý, V.: Thermodynamical approach to the traveling salesman problem: an efficient simulation algorithm. J. Optim. Theory Appl. 45(1), 41–51 (1985)
Additional information
The research of Philip L.H. Yu was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. 17303515).
Appendix: Sampling from a general distance-based Model
It is straightforward to simulate ranking data from the Kendall distance-based model because of the nice decomposition of the Kendall distance into a set of independent variables \(V_{i}\left( \pi _{s}\circ \pi _{0}^{-1}\right) \) (see Sect. 4.1), which can be sampled easily. The detailed algorithm can be found in Ceberio et al. (2014). Recently, Irurozki et al. (2014, 2018) developed two different methods to sample from models based on the Ulam and Cayley distances, respectively. These simulation methods require knowledge of special properties of the distance measures and may not generalize to other distances. Here we introduce a general method to sample from any distance-based model.
Our goal is to sample from the target density f(x). The method begins at \(s=0\) with the selection of \(X^{(0)}=x^{(0)}\) drawn at random from some starting distribution g, with the requirement that \(f\left( x^{(0)}\right) >0\). Given \(X^{(s)}=x^{(s)}\), the algorithm generates \(X^{(s+1)}\) as follows:
1. Sample a candidate value \(X^{*}\) from a proposal distribution \(g\left( \cdot \mid x^{(s)}\right) \).
2. Compute the Metropolis–Hastings ratio \(R_{MH}\left( x^{(s)},X^{*}\right) \), where
$$\begin{aligned} R_{MH}\left( x^{(s)},X^{*}\right) =\frac{f\left( X^{*}\right) g\left( x^{(s)}\mid X^{*}\right) }{f\left( x^{(s)}\right) g\left( X^{*}\mid x^{(s)}\right) }. \end{aligned}$$
3. Sample a value for \(X^{(s+1)}\) as follows:
$$\begin{aligned} X^{(s+1)}={\left\{ \begin{array}{ll} X^{*} &{} \text{ with } \text{ probability }\ \min \left\{ 1,R_{MH}\left( x^{(s)},X^{*}\right) \right\} \\ x^{(s)} &{} \text{ otherwise. } \end{array}\right. } \end{aligned}$$
4. Set \(s:=s+1\) and go to step 1.
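As a sketch only, the generic iteration above can be written in Python as follows. The helper names (`target`, `propose`, `proposal_density`) are illustrative and not from the paper; `target` may be an unnormalized density, since any normalizing constant cancels in the ratio.

```python
import random

def metropolis_hastings(target, propose, proposal_density, x0, n_steps):
    """Generic Metropolis-Hastings sampler following steps 1-4 above.

    target:           (possibly unnormalized) density f(x)
    propose:          draws a candidate X* ~ g(. | x)
    proposal_density: evaluates g(x' | x)
    """
    x = x0                                       # requires target(x0) > 0
    samples = []
    for _ in range(n_steps):
        x_star = propose(x)                      # step 1: candidate value
        r = (target(x_star) * proposal_density(x, x_star)) / \
            (target(x) * proposal_density(x_star, x))   # step 2: MH ratio
        if random.random() < min(1.0, r):        # step 3: accept with prob min{1, r}
            x = x_star                           # ...otherwise keep x^(s)
        samples.append(x)                        # step 4: next iteration
    return samples
```

A usage example with a symmetric uniform proposal over a small discrete state space shows the chain visiting high-density states more often.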
Returning to our problem: given \(\pi _{0}\) and \(\lambda \), we need to sample from the distance-based model \(f(\pi )=\frac{\exp \left[ -\lambda d(\pi ,\pi _{0})\right] }{C(\lambda )}\), where \(d\) is the distance function we choose. Note that we do not need to compute the normalizing constant \(C(\lambda )\) in \(R_{MH}\left( x^{(s)},X^{*}\right) \) because the \(C(\lambda )\) terms cancel. It remains to specify the proposal distribution \(g(\cdot \mid \cdot )\).
The proposal distribution can be chosen to be symmetric, so that \(g(x^{*}\mid x^{(s)})=g(x^{(s)}\mid x^{*})\). In our case, the choice of \(g(x^{*}\mid x^{(s)})\) imposes a small perturbation on the elements of \(x^{(s)}\): given \(x^{(s)}\), we uniformly pick two elements of \(x^{(s)}\) and swap their positions to form the proposal \(\pi ^{*}\). Since this swap distribution is symmetric, the Hastings corrections in the Metropolis–Hastings ratio cancel, and the ratio is given by
$$\begin{aligned} R_{MH}\left( \pi ^{(s)},\pi ^{*}\right) =\frac{f\left( \pi ^{*}\right) }{f\left( \pi ^{(s)}\right) }=\exp \left[ -\lambda \left( d(\pi ^{*},\pi _{0})-d(\pi ^{(s)},\pi _{0})\right) \right] . \end{aligned}$$
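The swap-proposal sampler can be sketched as below. This is a minimal illustration, not the authors' implementation; the Kendall distance is used as one example of a right-invariant distance, and function names are hypothetical.

```python
import math
import random

def kendall_distance(pi, sigma):
    """Number of discordant pairs between two rankings (Kendall distance)."""
    n = len(pi)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0)

def sample_distance_model(pi0, lam, n_steps, dist=kendall_distance):
    """MH sampler for f(pi) proportional to exp(-lam * d(pi, pi0)) using a
    random-swap proposal.  The swap proposal is symmetric, so the Hastings
    correction cancels and the acceptance probability reduces to
    min{1, exp(-lam * (d(pi*, pi0) - d(pi, pi0)))}."""
    pi = list(pi0)                # start at the modal ranking, so f(pi) > 0
    d_cur = dist(pi, pi0)
    for _ in range(n_steps):
        i, j = random.sample(range(len(pi)), 2)   # pick two positions uniformly
        cand = list(pi)
        cand[i], cand[j] = cand[j], cand[i]       # swap them: proposal pi*
        d_new = dist(cand, pi0)
        if random.random() < math.exp(-lam * (d_new - d_cur)):
            pi, d_cur = cand, d_new               # accept; else keep current pi
    return pi
```

With a moderately large \(\lambda \), draws concentrate near \(\pi _{0}\), as the model intends.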
To sample from the distance-based latent-scale models, we first sample \(\lambda _{k}\) from the \(\text {Exp}(\lambda )\) distribution, and then sample \(\pi _{k}\) from the simple distance-based model with parameter \(\lambda _{k}\) using the method above.
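This two-stage scheme can be sketched as follows. It is a self-contained illustration, not the paper's code: the Kendall distance stands in for a generic distance, \(\lambda _{k}\) is drawn with rate \(\lambda \) (an assumed parameterization of the exponential), and the inner chain is the same swap-proposal Metropolis–Hastings sampler.

```python
import math
import random

def sample_latent_scale(pi0, lam, n_rankers, n_steps=500):
    """Sample from the latent-scale model: for each ranker k, draw a
    dispersion lambda_k ~ Exp(lam), then draw pi_k from the simple
    distance-based model f(pi) prop. to exp(-lambda_k * d(pi, pi0))
    via a random-swap Metropolis-Hastings chain (Kendall distance here)."""
    def kendall(pi, sigma):
        n = len(pi)
        return sum((pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0
                   for i in range(n) for j in range(i + 1, n))

    rankings = []
    for _ in range(n_rankers):
        lam_k = random.expovariate(lam)          # stage 1: latent scale for ranker k
        pi, d_cur = list(pi0), 0                 # start the chain at pi0
        for _ in range(n_steps):                 # stage 2: swap-proposal MH chain
            i, j = random.sample(range(len(pi)), 2)
            cand = list(pi)
            cand[i], cand[j] = cand[j], cand[i]
            d_new = kendall(cand, pi0)
            if random.random() < math.exp(-lam_k * (d_new - d_cur)):
                pi, d_cur = cand, d_new
        rankings.append(pi)
    return rankings
```

Each returned ranking reflects one ranker's own dispersion \(\lambda _{k}\), so the collection mimics rankers of varying quality around the same consensus \(\pi _{0}\).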
Cite this article
Yu, P.L.H., Xu, H. Rank aggregation using latent-scale distance-based models. Stat Comput 29, 335–349 (2019). https://doi.org/10.1007/s11222-018-9811-9