Bayesian Analysis for the Paired Comparison Model
with Order Effects (Using Non-informative Priors)
Ghausia Masood Gilani
College of Statistical and Actuarial Sciences
University of the Punjab, Q.A. Campus, Lahore, Pakistan
gilani_pu@yahoo.com
Nasir Abbas
Department of Statistics, Government Postgraduate College
Jhang, Pakistan
nabbasgcj@yahoo.com
Abstract
Sometimes it may be difficult for a panelist to rank or compare more than two objects or
treatments at the same time. For this reason, the method of paired comparisons is used. In this
study, the Davidson and Beaver (1977) model for paired comparisons with order effects is analyzed
through the Bayesian approach. The posterior means and posterior modes obtained under two
noninformative priors, the uniform and the Jeffreys priors, are compared.
Keywords: Paired comparison method; The Davidson and Beaver model; Prior
distribution; Noninformative prior; Jeffreys prior; Uniform distribution; Posterior
distribution.
1. Introduction
Sometimes it may be difficult for a panelist to simultaneously rank or compare
more than two treatments (objects, items, options, stimuli, etc.), especially when
differences between treatments are small or the criteria are rather subtle. For this
reason, paired comparison data are regarded as more reliable and can be
obtained more readily from panelists. In a paired comparison trial, a panelist is
given a pair of treatments and is asked to pick the better one with respect to a
given attribute. This process is repeated for all pairs of the treatments under
study.
This method is widely used in industry for assessing customer preference and
designing products using trained panelists. For example, in taste testing it is
often difficult for a panelist to cope with more than two tastes at the same time,
and the introduction of a third may be confusing. A good review of paired
comparison models, including their analysis, is given by Bradley (1976). David
(1988) gives a detailed survey of the literature and references concerning the
method of paired comparisons. Further literature can be found in Augustin (2004),
David (2004), Hatzinger and Francis (2004) and the references cited therein.
Recent developments include Abbas and Aslam (2009), who develop a paired
comparison model based on the Cauchy distribution and analyze it via the Bayesian
approach, and Abbas and Aslam (2010), who perform a Bayesian analysis of the
chi-square models suggested by Stern (1990).
Pak.j.stat.oper.res. Vol.IV No.2 2008 pp85-93
Ghausia Masood Gilani, Nasir Abbas
2. The Davidson and Beaver Model
Consider a paired comparison trial with a set of m treatments $T_1, T_2, \dots, T_m$. Each
pair formed from the set of m treatments is ranked $r_{ij}$ times by being presented to
a respondent who is asked to indicate a preference for one treatment of the pair.
It is assumed that the responses to the treatments can be described in terms of
an underlying continuum on which the relative worths of the treatments can be
located. We denote the worth, an index of the relative merit of treatment $T_i$,
by $\pi_i$, $\pi_i \ge 0$, where, without loss of generality, $\sum_{i=1}^{m} \pi_i = 1$. A model can be based
on the idea that when a panelist is confronted with treatment $T_i$, he responds with
an unconscious or latent variable $X_i$. The assumed mechanism is that he prefers
treatment $T_i$ to treatment $T_j$ if $X_i > X_j$.
Bradley and Terry (1952) propose a model of paired comparisons for the trial
described above. They assume that the difference between two latent variables,
$X_i - X_j$, has a logistic (squared hyperbolic secant) density with location
parameter $\ln\pi_i - \ln\pi_j$. The probability $P(X_i > X_j \mid \pi_i, \pi_j)$ that treatment
$T_i$ is preferred to treatment $T_j$ ($i \ne j$), when the two treatments are
compared, is then

$$\phi_{ij} = P(X_i > X_j \mid \pi_i, \pi_j) = \frac{1}{4}\int_{-(\ln\pi_i - \ln\pi_j)}^{\infty} \operatorname{sech}^2(y/2)\,dy = \frac{\pi_i}{\pi_i + \pi_j}, \qquad i \ne j = 1, 2, \dots, m. \tag{1}$$
The Bradley-Terry model is defined by (1).
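The equality between the logistic integral and the ratio $\pi_i/(\pi_i+\pi_j)$ in (1) is easy to check numerically. The following is a minimal sketch in standard-library Python, not part of the original analysis: the helper names `bt_closed_form` and `bt_latent_integral` are hypothetical, and truncating the infinite upper limit at y = 60 is an assumption that leaves only a negligible logistic tail.

```python
import math


def bt_closed_form(pi_i: float, pi_j: float) -> float:
    """Bradley-Terry probability that T_i is preferred to T_j."""
    return pi_i / (pi_i + pi_j)


def bt_latent_integral(pi_i: float, pi_j: float,
                       upper: float = 60.0, n: int = 200_000) -> float:
    """(1/4) * integral of sech^2(y/2) from -(ln pi_i - ln pi_j) to infinity.

    The infinite upper limit is truncated at `upper` (the logistic tail
    beyond 60 is below 1e-26) and the integral is approximated by the
    midpoint rule with `n` panels.
    """
    lower = -(math.log(pi_i) - math.log(pi_j))
    h = (upper - lower) / n
    total = 0.0
    for k in range(n):
        y = lower + (k + 0.5) * h
        total += 0.25 / math.cosh(y / 2.0) ** 2  # logistic density at y
    return total * h


print(bt_closed_form(0.3, 0.7), bt_latent_integral(0.3, 0.7))
```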
Davidson and Beaver (1977) propose a modification of the Bradley-Terry model
to account for the effect of the order of presentation of the treatments within a
pair. A multiplicative order effect is suggested as an alternative to the additive
order effect proposed by Beaver and Gokhale (1975). An important feature of the
Bradley-Terry model is that the values $\ln\pi_1, \ln\pi_2, \dots, \ln\pi_m$ can be used to
represent the merits of the treatments under study on a linear scale. Thus it is
natural to assume that the logarithms of the worths, as opposed to the worths
themselves, are affected additively by the order of presentation. This is
equivalent to assuming that when treatments $T_i$ and $T_j$ appear together in a pair,
their relative worths are subject to a multiplicative within-pair order effect $\theta_{ij}$. It is
assumed that $\theta_{ij} = \theta_{ji}$, i.e., the within-pair order effect depends only on the
pair of treatments. The resulting preference probabilities for the ordered pair $(T_i, T_j)$
are given by
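$$\phi_{ij} = \frac{1}{4}\int_{-(\ln\pi_i - \ln\pi_j - \ln\theta_{ij})}^{\infty} \operatorname{sech}^2(y/2)\,dy = \frac{\pi_i}{\pi_i + \theta_{ij}\pi_j} \quad\text{and}\quad \phi_{ji} = \frac{\theta_{ij}\pi_j}{\pi_i + \theta_{ij}\pi_j}, \qquad i \ne j = 1, 2, \dots, m, \tag{2}$$

where, for the ordered pair $(T_i, T_j)$ with $T_i$ presented first, $\phi_{ij}$ and $\phi_{ji}$ are the
probabilities that $T_i$ and $T_j$, respectively, are preferred. Here $\theta_{ij} > 0$. When $\theta_{ij} = 1$,
there is no order effect and the model reduces to the Bradley-Terry model (1). When
$\theta_{ij} > 1$, the worth of the treatment presented second is inflated, while if $\theta_{ij} < 1$,
the treatment presented first gains an advantage. The case $\theta_{ij} = \theta$ for all (i, j) is of
interest because of the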
considerable reduction in the number of parameters required to specify the
model. The multiplicative model (2) arises naturally in the setting of the linear
model (David, 1988). Suppose that an individual experiences sensations $X_i$ and
$X_j$ when presented with the pair (i, j), and that the response i → j (i preferred to j)
results when $X_i > X_j$, which is interpreted to mean that the sensation produced by
treatment $T_i$ comes closer to the ideal sensation than that produced by treatment $T_j$.
Bradley (1953) notes that when the difference $X_i - X_j$ has a logistic distribution with
location parameter $\ln\pi_i - \ln\pi_j$, one obtains the preference probabilities of the
Bradley-Terry model. Now suppose that for the ordered pair (i, j), the response
i → j results when $X_i - X_j > \delta_{ij}$, where $\delta_{ij}$ is interpreted as a shift on the sensation
scale arising from the order of presentation. Using the logistic distribution for
$X_i - X_j$, one obtains the preference probabilities of the multiplicative model (2)
with $\delta_{ij} = \ln\theta_{ij}$.
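The shift argument can be checked numerically as well. The sketch below (standard-library Python; the function names are hypothetical and the values $\pi_i = 0.3$, $\pi_j = 0.7$, $\theta = 1.5$ are arbitrary illustrations) evaluates the probabilities in (2) directly and as $P(X_i - X_j > \delta_{ij})$ for a logistic difference with location $\ln\pi_i - \ln\pi_j$ and $\delta_{ij} = \ln\theta_{ij}$.

```python
import math


def logistic_cdf(x: float) -> float:
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))


def db_probs(pi_i: float, pi_j: float, theta: float) -> tuple[float, float]:
    """Preference probabilities (2) for the ordered pair (T_i, T_j), T_i first.

    Returns (P(T_i preferred), P(T_j preferred)).
    """
    denom = pi_i + theta * pi_j
    return pi_i / denom, theta * pi_j / denom


def db_via_latent_shift(pi_i: float, pi_j: float, theta: float) -> float:
    """P(X_i - X_j > delta) for a logistic difference with location
    ln pi_i - ln pi_j and order shift delta = ln theta."""
    location = math.log(pi_i) - math.log(pi_j)
    delta = math.log(theta)
    # P(D > delta) = 1 - F(delta - location) = F(location - delta) by symmetry
    return logistic_cdf(location - delta)


p_first, p_second = db_probs(0.3, 0.7, 1.5)
print(p_first, p_second, db_via_latent_shift(0.3, 0.7, 1.5))
```

With θ = 1 the first probability reduces to the Bradley-Terry value 0.3, and with θ = 1.5 > 1 the treatment presented first loses ground, as described above.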
So far it has been assumed that each response to a pair of treatments consists of
an indication of preference for one member of the pair.
3. Notations and Likelihood Function for the Model
Let $w_{ijk}(1)$ and $w_{ijk}(2)$ be the random variables associated with the rank of
treatment $T_i$ in the kth repetition of the ordered treatment pair $(T_i, T_j)$, $i \ne j = 1, 2, \dots, m$,
$k = 1, \dots, r_{ij}$. They are defined as follows:
$w_{ijk}(1)$ = 1 or 0 according as treatment $T_i$ is or is not preferred to treatment
$T_j$ in the kth repetition of the comparison.
$w_{ijk}(2)$ = 1 or 0 according as treatment $T_j$ is or is not preferred to treatment
$T_i$ in the kth repetition of the comparison.
$w_{ij}(1) = \sum_k w_{ijk}(1)$ = the frequency of preference for the treatment presented first (treatment $T_i$).
$w_{ij}(2) = \sum_k w_{ijk}(2)$ = the frequency of preference for the treatment presented second (treatment $T_j$).
$r_{ij}$ = the number of times treatment $T_i$ is compared with treatment $T_j$ (with $T_i$ presented first),
so that $r_{ij} = w_{ij}(1) + w_{ij}(2)$.
$N = \sum_{i} \sum_{j \ne i} r_{ij}$ = the total number of comparisons.
$w_i = \sum_{j \ne i} \{w_{ij}(1) + w_{ji}(2)\}$ = the total number of wins of treatment $T_i$.
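As a small illustration of this notation, the sketch below computes $r_{ij}$, N and $w_i$ from the counts reported in Table 1 (Section 6). It assumes a hypothetical dictionary layout keyed by the ordered pair (i, j), with $T_i$ presented first; this layout is an illustrative choice, not the authors' data format.

```python
from collections import defaultdict

# Counts from Table 1: ordered pair (i, j) -> (w_ij(1), w_ij(2)),
# where T_i is presented first in the pair.
TABLE_1 = {(1, 2): (14, 36), (2, 1): (32, 18)}


def summarize(counts):
    """Return r_ij, N and the total wins w_i defined in the notation above."""
    r = {pair: w1 + w2 for pair, (w1, w2) in counts.items()}
    n_total = sum(r.values())
    wins = defaultdict(int)
    for (i, j), (w1, w2) in counts.items():
        wins[i] += w1  # T_i preferred when presented first
        wins[j] += w2  # T_j preferred when presented second
    return r, n_total, dict(wins)


r, n_total, wins = summarize(TABLE_1)
print(r, n_total, wins)
```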
We now derive the likelihood function of the data for the Davidson and Beaver
model (2). We constrain the treatment parameters of the model to be positive
and to sum to unity. These conditions ensure that the parameters are well
defined and identifiable.
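The probability of the observed result in the kth repetition of the pair $(T_i, T_j)$, with a
common order effect $\theta_{ij} = \theta$, is

$$P_{ijk} = \left(\frac{\pi_i}{\pi_i + \theta\pi_j}\right)^{w_{ijk}(1)} \left(\frac{\theta\pi_j}{\pi_i + \theta\pi_j}\right)^{w_{ijk}(2)}. \tag{3}$$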
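Hence, the likelihood function of the observed outcome x (where x represents the
data $w_{ijk}(1)$, $w_{ijk}(2)$ of the trial) is

$$L(\mathbf{x}; \pi_1, \dots, \pi_m, \theta) = \prod_{i \ne j} \prod_{k=1}^{r_{ij}} P_{ijk} = \frac{\theta^{\sum_{i \ne j} w_{ij}(2)} \prod_{i=1}^{m} \pi_i^{w_i}}{\prod_{i \ne j} (\pi_i + \theta\pi_j)^{r_{ij}}}, \tag{4}$$

where $0 < \pi_i < 1$, $i = 1, 2, \dots, m$, $\sum_{i=1}^{m} \pi_i = 1$ and $\theta > 0$. When the data are recorded
only as the frequencies $w_{ij}(1)$ and $w_{ij}(2)$, (4) is multiplied by the constant
$\prod_{i \ne j} k_{ij}$ with $k_{ij} = r_{ij}!/\{w_{ij}(1)!\,w_{ij}(2)!\}$, which does not involve the parameters. Here θ is
the order effect parameter and $\pi_1, \pi_2, \dots, \pi_m$ are the treatment parameters.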
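The logarithm of (4) can be coded directly from the equivalent pairwise form $\prod_{i \ne j} \pi_i^{w_{ij}(1)} (\theta\pi_j)^{w_{ij}(2)} / (\pi_i + \theta\pi_j)^{r_{ij}}$. The sketch below is a minimal standard-library version that assumes a hypothetical data layout (ordered pair (i, j) mapped to $(w_{ij}(1), w_{ij}(2))$); the constant $\log\prod k_{ij}$ is computed separately because it does not involve the parameters.

```python
import math


def log_likelihood(pi, theta, counts):
    """Logarithm of the likelihood (4) for a common order effect theta.

    pi     : dict treatment -> worth (positive, summing to 1)
    theta  : common order-effect parameter, theta > 0
    counts : dict (i, j) -> (w_ij(1), w_ij(2)), with T_i presented first
    The parameter-free constant prod k_ij is omitted.
    """
    ll = 0.0
    for (i, j), (w1, w2) in counts.items():
        r_ij = w1 + w2
        ll += (w1 * math.log(pi[i]) + w2 * math.log(theta * pi[j])
               - r_ij * math.log(pi[i] + theta * pi[j]))
    return ll


def log_k_constant(counts):
    """log of prod k_ij with k_ij = r_ij! / (w_ij(1)! w_ij(2)!)."""
    return sum(math.lgamma(w1 + w2 + 1) - math.lgamma(w1 + 1) - math.lgamma(w2 + 1)
               for (w1, w2) in counts.values())


TABLE_1 = {(1, 2): (14, 36), (2, 1): (32, 18)}
print(log_likelihood({1: 0.5, 2: 0.5}, 1.0, TABLE_1), log_k_constant(TABLE_1))
```

At $\pi_1 = \pi_2 = 1/2$ and θ = 1 every comparison has probability 1/2, so the log-likelihood for the 100 comparisons in Table 1 is $100 \ln(1/2)$.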
4. The Choice of Prior Distribution
Bayesian analysis is a statistical procedure that estimates the parameters of an
underlying distribution from the observed data. We begin with a prior distribution
of the parameters, combine it with the likelihood function of the sample
observations drawn from the distribution indexed by those parameters, and so
obtain the posterior distribution of the parameters of interest. The entire
parametric inference is then based on this posterior distribution, conditional on
the data. In practice, it is common to take a uniform distribution over the
appropriate range of parameter values as the prior distribution. Adams (2005)
discusses the advantages of the Bayesian approach.
A prior distribution quantifies information about a parameter before any data are
gathered. A prior that expresses specific, definite information about a random
variable is an informative prior. In some cases, however, such as multiparameter
situations, it is difficult to formalize the available prior information into a
distribution. When little prior information is available or prior elicitation is
difficult, the analysis is carried out with a prior that reflects little prior
information. Such priors are known as non-informative priors. A prior distribution
is non-informative if it is flat relative to the likelihood function; that is, it has
minimal impact on the posterior distribution of the parameter of interest and is
dominated by the likelihood: it does not change much over the region in which
the likelihood is appreciable and does not take large values outside that region.
A prior with these properties is said to be locally uniform. Non-informative priors
are also called reference priors, vague priors, ignorant priors or flat priors.
Many approaches to the choice of a non-informative prior have been proposed. One
way of eliminating subjectivity in the choice of prior is to use a flat or diffuse
prior distribution that is uniform across all possible values of the parameter. Such
a diffuse prior is simply a constant, i.e., $p(\theta) = c$ for θ in the parameter space.
With a diffuse prior, the posterior is proportional to the likelihood:

$$p(\theta \mid x) \propto p(\theta)\,L(\theta \mid x) = c\,L(\theta \mid x) \quad\Rightarrow\quad p(\theta \mid x) \propto L(\theta \mid x).$$

Another approach is to use the Jeffreys prior, which satisfies the local uniformity
property for non-informative priors and is based on the Fisher information
matrix.
5. Reference (Jeffreys) Prior for the Parameters of the Model
A non-informative prior suggested by Jeffreys (1946, 1961) is frequently used
when one does not have much information about the parameters. Its density is
proportional to the square root of the determinant of the Fisher information matrix.
Let $p(x \mid \theta)$ denote the density of x given θ. The Fisher information is

$$I(\theta) = -E\left[\frac{\partial^2 \log p(x \mid \theta)}{\partial \theta^2}\right].$$

If $\boldsymbol{\theta}$ is a p × 1 vector, then

$$I(\boldsymbol{\theta}) = -E\left[\frac{\partial^2 \log p(x \mid \boldsymbol{\theta})}{\partial \theta_i\, \partial \theta_j}\right],$$

so that $I(\boldsymbol{\theta})$ is a p × p matrix. The Jeffreys prior is then defined as
$p(\boldsymbol{\theta}) \propto \sqrt{\det I(\boldsymbol{\theta})}$.
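As a one-parameter illustration of this definition (a standard textbook case, not part of the original analysis), the sketch below computes I(p) for a single Bernoulli trial by averaging the negative second derivative of the log-density over x = 0, 1, with the derivative approximated by central differences. It should reproduce $I(p) = 1/\{p(1-p)\}$, so the Jeffreys prior is proportional to $p^{-1/2}(1-p)^{-1/2}$. The step size h is an assumed tuning value.

```python
import math


def bernoulli_loglik(p: float, x: int) -> float:
    """log p(x | p) for a single Bernoulli observation x in {0, 1}."""
    return x * math.log(p) + (1 - x) * math.log(1.0 - p)


def fisher_information(p: float, h: float = 1e-4) -> float:
    """I(p) = -E[d^2/dp^2 log p(x | p)], with the second derivative
    approximated by central differences and the expectation taken
    over x = 1 (probability p) and x = 0 (probability 1 - p)."""
    info = 0.0
    for x, prob in ((1, p), (0, 1.0 - p)):
        d2 = (bernoulli_loglik(p + h, x) - 2.0 * bernoulli_loglik(p, x)
              + bernoulli_loglik(p - h, x)) / h ** 2
        info -= prob * d2
    return info


def jeffreys_unnormalized(p: float) -> float:
    """Jeffreys prior kernel sqrt(det I(p)); here I is 1 x 1."""
    return math.sqrt(fisher_information(p))


for p in (0.1, 0.3, 0.5):
    print(p, fisher_information(p), 1.0 / (p * (1.0 - p)), jeffreys_unnormalized(p))
```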
Properties of the Jeffreys Prior
The Jeffreys prior has several properties that make it an attractive
non-informative prior. It is invariant under one-to-one (1-1) transformations of
the parameter, in the sense that it gives consistent answers in any
parameterization.
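As a small illustration of invariance, take a single Bernoulli trial with success probability p (a textbook case chosen for illustration, not part of the original analysis), for which $I(p) = 1/\{p(1-p)\}$ and, in the log-odds parameterization $\eta = \log\{p/(1-p)\}$, $I(\eta) = p(1-p)$. The sketch checks that the Jeffreys prior computed directly in η equals the Jeffreys prior for p carried over by the Jacobian $|dp/d\eta| = p(1-p)$; the function names are hypothetical.

```python
import math


def expit(eta: float) -> float:
    """Inverse log-odds transformation p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def jeffreys_in_p(p: float) -> float:
    """sqrt(I(p)) with I(p) = 1 / (p (1 - p)) for one Bernoulli trial."""
    return 1.0 / math.sqrt(p * (1.0 - p))


def jeffreys_in_eta(eta: float) -> float:
    """sqrt(I(eta)) with I(eta) = p (1 - p) in the log-odds parameterization."""
    p = expit(eta)
    return math.sqrt(p * (1.0 - p))


def transformed_from_p(eta: float) -> float:
    """Jeffreys prior for p carried over to eta: sqrt(I(p)) * |dp/deta|."""
    p = expit(eta)
    return jeffreys_in_p(p) * p * (1.0 - p)


for eta in (-2.0, 0.0, 1.5):
    print(eta, jeffreys_in_eta(eta), transformed_from_p(eta))
```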
Bernardo (1979) shows that the Jeffreys prior is the appropriate reference prior if
no parameters are regarded as nuisance parameters and the joint posterior
distribution of all the parameters is asymptotically normal. Another important
aspect of the prior is that it is not affected by a restriction on the parameter
space.
Since the likelihood function (4) belongs to the exponential family, it follows from
the regularity conditions of Johnson and Ladalla (1979) that the posterior
distribution is asymptotically normal. Here no parameter is regarded as
‘nuisance’, so the Jeffreys prior is the appropriate choice of non-informative prior
and will hereafter be called the reference (Jeffreys) prior.
Let us consider the case of m = 2 treatments. The likelihood function for the
parameters θ, $\pi_1$ and $\pi_2$ of the Davidson and Beaver model is proportional to

$$l(\mathbf{x}; \pi, \theta) \propto \frac{\theta^{w_{12}(2) + w_{21}(2)}\, \pi^{w_{12}(1) + w_{21}(2)}\, (1 - \pi)^{w_{21}(1) + w_{12}(2)}}{\{\pi + \theta(1 - \pi)\}^{r_{12}}\, \{(1 - \pi) + \theta\pi\}^{r_{21}}},$$

where $\pi_1 = \pi$, $\pi_2 = 1 - \pi$ and $r_{ij} = w_{ij}(1) + w_{ij}(2)$.
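Using the MAPLE package, the reference (Jeffreys) prior is found to be

$$p(\pi, \theta) \propto \frac{1}{\{\pi + \theta(1 - \pi)\}\{(1 - \pi) + \theta\pi\}}, \quad\text{i.e.,}\quad p(\pi, \theta) = \frac{1}{K\{\pi + \theta(1 - \pi)\}\{(1 - \pi) + \theta\pi\}}, \qquad 0 < \pi < 1,\ \theta > 0,$$

where K ≈ 4.935 is the normalizing constant.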
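The MAPLE result can be cross-checked numerically. The sketch below, a standard-library Python illustration rather than the authors' derivation, builds the Fisher information for (π, θ) from the two binomial counts of the ordered pairs, using $I = \sum r\, \nabla\phi\, \nabla\phi^{\mathsf T} / \{\phi(1-\phi)\}$ with gradients by central differences, and checks that $\sqrt{\det I(\pi, \theta)}$ is a constant multiple of the kernel above. For $r_{12} = r_{21} = 50$, taken from Table 1, the ratio works out to $2\sqrt{r_{12} r_{21}} = 100$. It also integrates the kernel after the substitution $\pi = 1/(1 + e^{-a})$, $\theta = e^{b}$ to obtain its normalizing constant, about 4.935 (half the square of the circle constant). The function names, step size and integration grid are assumptions.

```python
import math

R12, R21 = 50, 50  # numbers of comparisons for (1, 2) and (2, 1), Table 1


def phi12(p, t):
    """P(T1 preferred | T1 presented first), with pi1 = p, pi2 = 1 - p."""
    return p / (p + t * (1.0 - p))


def phi21(p, t):
    """P(T2 preferred | T2 presented first)."""
    return (1.0 - p) / ((1.0 - p) + t * p)


def fisher_det(p, t, h=1e-6):
    """det of the 2 x 2 Fisher information for (p, t) from two binomial counts."""
    info = [[0.0, 0.0], [0.0, 0.0]]
    for r, phi in ((R12, phi12), (R21, phi21)):
        f = phi(p, t)
        grad = ((phi(p + h, t) - phi(p - h, t)) / (2 * h),
                (phi(p, t + h) - phi(p, t - h)) / (2 * h))
        for a in range(2):
            for b in range(2):
                info[a][b] += r * grad[a] * grad[b] / (f * (1.0 - f))
    return info[0][0] * info[1][1] - info[0][1] * info[1][0]


def jeffreys_kernel(p, t):
    """Closed-form kernel 1 / [{p + t(1 - p)}{(1 - p) + t p}]."""
    return 1.0 / ((p + t * (1.0 - p)) * ((1.0 - p) + t * p))


def normalizing_constant(limit=25.0, n=500):
    """Integrate the kernel over 0 < p < 1, t > 0 via p = expit(a), t = exp(b)."""
    step = 2 * limit / n
    total = 0.0
    for i in range(n):
        a = -limit + (i + 0.5) * step
        p = 1.0 / (1.0 + math.exp(-a))
        q = 1.0 / (1.0 + math.exp(a))  # 1 - p, computed without cancellation
        for j in range(n):
            t = math.exp(-limit + (j + 0.5) * step)
            # Jacobian of (a, b) -> (p, t) is p (1 - p) t
            total += jeffreys_kernel(p, t) * p * q * t
    return total * step * step


for p, t in ((0.3, 1.2), (0.5, 0.8), (0.7, 2.0)):
    print(p, t, math.sqrt(fisher_det(p, t)) / jeffreys_kernel(p, t))
print(normalizing_constant())
```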
6. Bayesian Analysis of the Model
Using the data given in Table 1, the Bayesian analysis of the model with order
effect is presented for two treatments, with an equal number of comparisons
$r_{ij} = 50$ for each ordered pair (i, j = 1, 2), under two non-informative priors: the
uniform prior and the reference (Jeffreys) prior.
Table 1: Data with Order Effects

Pairs       (1, 2)   (2, 1)
r_ij          50       50
w_ij(1)       14       32
w_ij(2)       36       18
6.1 The Posterior Distributions
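The (joint) posterior distribution under the uniform prior $\{p(\pi_1, \theta) \propto 1\}$ for the
treatment parameter $\pi_1$ (with constraint $\pi_2 = 1 - \pi_1$) and the order effect parameter
θ is

$$p(\pi_1, \theta \mid \mathbf{x}) = \frac{\theta^{w_{12}(2) + w_{21}(2)}\, \pi_1^{w_{12}(1) + w_{21}(2)}\, (1 - \pi_1)^{w_{21}(1) + w_{12}(2)}}{K \{\pi_1 + \theta(1 - \pi_1)\}^{r_{12}} \{(1 - \pi_1) + \theta\pi_1\}^{r_{21}}}, \qquad 0 < \pi_1 < 1,\ 0 < \theta < \infty,$$

where $K = 6.7355 \times 10^{-29}$ is the normalizing constant. Here we may replace $\pi_1$ by π
for simplicity.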
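The (marginal) posterior densities of the parameters $\pi_1$ and θ are:

$$p(\pi_1 \mid \mathbf{x}) = \int_0^\infty p(\pi_1, \theta \mid \mathbf{x})\, d\theta, \qquad 0 < \pi_1 < 1, \tag{5}$$

$$p(\theta \mid \mathbf{x}) = \int_0^1 p(\pi_1, \theta \mid \mathbf{x})\, d\pi_1, \qquad 0 < \theta < \infty. \tag{6}$$

Similarly, the (marginal) posterior densities of the parameters can be derived.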
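The marginal densities (5) and (6) have no simple closed form, but they are easy to approximate on a grid. The following sketch (standard-library Python; the grid sizes and the truncation θ < 6 are illustrative assumptions, not the authors' settings) evaluates the unnormalized joint posterior under the uniform prior for the Table 1 data and sums over one parameter at a time to obtain each marginal.

```python
import math

# Table 1 counts; pi1 = p and pi2 = 1 - p.
W12_1, W12_2, W21_1, W21_2 = 14, 36, 32, 18
R12, R21 = W12_1 + W12_2, W21_1 + W21_2


def log_kernel_uniform(p, t):
    """Log of the unnormalized joint posterior under the uniform prior."""
    return ((W12_2 + W21_2) * math.log(t)
            + (W12_1 + W21_2) * math.log(p)
            + (W21_1 + W12_2) * math.log(1.0 - p)
            - R12 * math.log(p + t * (1.0 - p))
            - R21 * math.log((1.0 - p) + t * p))


def marginals(n_p=400, n_t=600, t_max=6.0):
    """Midpoint-rule versions of (5) and (6) on (0, 1) x (0, t_max)."""
    dp, dt = 1.0 / n_p, t_max / n_t
    ps = [(i + 0.5) * dp for i in range(n_p)]
    ts = [(j + 0.5) * dt for j in range(n_t)]
    logs = [[log_kernel_uniform(p, t) for t in ts] for p in ps]
    peak = max(max(row) for row in logs)  # subtract the maximum to avoid underflow
    dens = [[math.exp(v - peak) for v in row] for row in logs]
    mass = sum(map(sum, dens)) * dp * dt  # total mass of the scaled kernel
    f_p = [sum(row) * dt / mass for row in dens]                                 # (5)
    f_t = [sum(dens[i][j] for i in range(n_p)) * dp / mass for j in range(n_t)]  # (6)
    return ps, ts, f_p, f_t


ps, ts, f_p, f_t = marginals()
print("grid mode of p(pi1 | x):", ps[f_p.index(max(f_p))])
print("grid mode of p(theta | x):", ts[f_t.index(max(f_t))])
```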
The (joint) posterior distribution under the Jeffreys prior for the treatment
parameter $\pi_1$ (with constraint $\pi_2 = 1 - \pi_1$) and the order effect parameter θ is

$$p(\pi_1, \theta \mid \mathbf{x}) = \frac{\theta^{w_{12}(2) + w_{21}(2)}\, \pi_1^{w_{12}(1) + w_{21}(2)}\, (1 - \pi_1)^{w_{21}(1) + w_{12}(2)}}{K \{\pi_1 + \theta(1 - \pi_1)\}^{r_{12} + 1} \{(1 - \pi_1) + \theta\pi_1\}^{r_{21} + 1}}, \qquad 0 < \pi_1 < 1,\ \theta > 0,$$

where $K = 5.3549 \times 10^{-29}$ is the normalizing constant. The (marginal) posterior
densities of the parameters $\pi_1$ and θ may be found using (5) and (6).
6.2 The Posterior Estimates
The posterior means and joint posterior modes of the parameters are taken as
the estimates of the parameters. The posterior means under the uniform and
Jeffreys priors are evaluated by numerical quadrature for the data set given in
Table 1. We evaluate

$$E(\pi_1 \mid \mathbf{x}) = \int_0^1 \pi_1\, p(\pi_1 \mid \mathbf{x})\, d\pi_1 \quad\text{and}\quad E(\theta \mid \mathbf{x}) = \int_0^\infty \theta\, p(\theta \mid \mathbf{x})\, d\theta$$

to obtain the mean estimates of the worth and order effect parameters, respectively.
Using the uniform prior, the posterior means of the parameters $\pi_1$, $\pi_2$ and θ are
0.31972, 0.68028 and 1.296670, whereas using the Jeffreys prior these estimates
are 0.32050, 0.67950 and 1.230970. The posterior means of the worth parameters
under the two priors agree to within 0.001, and those of the order effect parameter
differ by about 0.066.
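For readers who want to reproduce the quadrature, the sketch below computes both sets of posterior means on a midpoint grid over $0 < \pi_1 < 1$ and $0 < \theta < 6$. It is a minimal standard-library illustration, not the authors' program: the grid sizes and the truncation of θ are assumptions, and the Jeffreys kernel $1/[\{\pi_1 + \theta(1-\pi_1)\}\{(1-\pi_1) + \theta\pi_1\}]$ is taken from Section 5.

```python
import math

W12_1, W12_2, W21_1, W21_2 = 14, 36, 32, 18  # Table 1; pi1 = p, pi2 = 1 - p
R12, R21 = W12_1 + W12_2, W21_1 + W21_2


def log_post(p, t, prior):
    """Unnormalized log joint posterior of (pi1, theta)."""
    d1 = p + t * (1.0 - p)
    d2 = (1.0 - p) + t * p
    lp = ((W12_2 + W21_2) * math.log(t) + (W12_1 + W21_2) * math.log(p)
          + (W21_1 + W12_2) * math.log(1.0 - p)
          - R12 * math.log(d1) - R21 * math.log(d2))
    if prior == "jeffreys":
        lp -= math.log(d1) + math.log(d2)  # Jeffreys kernel 1 / (d1 d2)
    return lp  # "uniform": the flat prior adds nothing


def posterior_means(prior, n_p=400, n_t=600, t_max=6.0):
    """E(pi1 | x) and E(theta | x) by midpoint-rule quadrature."""
    dp, dt = 1.0 / n_p, t_max / n_t
    ps = [(i + 0.5) * dp for i in range(n_p)]
    ts = [(j + 0.5) * dt for j in range(n_t)]
    logs = [[log_post(p, t, prior) for t in ts] for p in ps]
    peak = max(max(row) for row in logs)
    total = sum_p = sum_t = 0.0
    for p, row in zip(ps, logs):
        for t, v in zip(ts, row):
            w = math.exp(v - peak)  # grid spacings cancel in the ratios
            total += w
            sum_p += w * p
            sum_t += w * t
    return sum_p / total, sum_t / total


for prior in ("uniform", "jeffreys"):
    m_p, m_t = posterior_means(prior)
    print(f"{prior:9s} E(pi1|x)={m_p:.5f} E(pi2|x)={1 - m_p:.5f} E(theta|x)={m_t:.5f}")
```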
The estimates of the parameters of interest that maximize the joint posterior
density are termed the joint posterior modes. They are found by equating to zero
the first partial derivatives of the logarithm of the joint posterior density with
respect to the unknown parameters; under the uniform prior this is, up to a
constant, the log-likelihood. With $\pi_2 = 1 - \pi_1$ substituted, we solve

$$\frac{\partial \ln p(\pi_1, \theta \mid \mathbf{x})}{\partial \pi_1} = 0 \quad\text{and}\quad \frac{\partial \ln p(\pi_1, \theta \mid \mathbf{x})}{\partial \theta} = 0.$$

The modal estimates under the uniform prior are found for the parameters $\pi_1$, $\pi_2$
and θ by executing a program written in SAS. The joint posterior mode of the
vector $(\pi_1, \pi_2, \theta)$ is obtained as ($\pi_1$ = 0.318665, $\pi_2$ = 0.681335, θ = 1.202676).
Similarly, the modal estimates under the Jeffreys prior are found by running another SAS program
and the joint posterior mode of the parameter vector $(\pi_1, \pi_2, \theta)$ is obtained as
($\pi_1$ = 0.319267, $\pi_2$ = 0.680733, θ = 1.145057). The modal estimates produced by
the two non-informative priors, the Jeffreys and the uniform priors, are also close.
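The modes can also be located without SAS. The sketch below (standard-library Python) maximizes the unnormalized log posterior under each prior; a derivative-free grid-refinement search is used here instead of solving the score equations, purely for simplicity, and its starting box and number of rounds are assumptions. Because the m = 2 design observes both presentation orders, the model has as many parameters as free binomial probabilities, so under the uniform prior the joint mode reproduces the observed first-position win proportions 14/50 and 32/50 exactly. This gives a convenient check on the output.

```python
import math

W12_1, W12_2, W21_1, W21_2 = 14, 36, 32, 18  # Table 1; pi1 = p, pi2 = 1 - p
R12, R21 = W12_1 + W12_2, W21_1 + W21_2


def log_post(p, t, prior):
    """Unnormalized log joint posterior of (pi1, theta)."""
    d1 = p + t * (1.0 - p)
    d2 = (1.0 - p) + t * p
    lp = ((W12_2 + W21_2) * math.log(t) + (W12_1 + W21_2) * math.log(p)
          + (W21_1 + W12_2) * math.log(1.0 - p)
          - R12 * math.log(d1) - R21 * math.log(d2))
    if prior == "jeffreys":
        lp -= math.log(d1) + math.log(d2)
    return lp


def joint_mode(prior, n=41, rounds=10):
    """Maximize log_post over 0 < p < 1, t > 0 by repeatedly refining a grid
    around the current best point (a simple, derivative-free search)."""
    p_lo, p_hi, t_lo, t_hi = 1e-3, 1.0 - 1e-3, 1e-3, 5.0
    for _ in range(rounds):
        dp, dt = (p_hi - p_lo) / (n - 1), (t_hi - t_lo) / (n - 1)
        best_val, best_p, best_t = -math.inf, p_lo, t_lo
        for a in range(n):
            p = p_lo + a * dp
            for b in range(n):
                t = t_lo + b * dt
                v = log_post(p, t, prior)
                if v > best_val:
                    best_val, best_p, best_t = v, p, t
        # shrink the search box to +/- 3 grid cells around the best point
        p_lo, p_hi = max(best_p - 3 * dp, 1e-12), min(best_p + 3 * dp, 1.0 - 1e-12)
        t_lo, t_hi = max(best_t - 3 * dt, 1e-12), best_t + 3 * dt
    return best_p, best_t


for prior in ("uniform", "jeffreys"):
    p_hat, t_hat = joint_mode(prior)
    print(f"{prior:9s} pi1={p_hat:.6f} pi2={1 - p_hat:.6f} theta={t_hat:.6f}")
```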
7. Conclusions
Here we perform the Bayesian analysis of the Davidson-Beaver model for paired
comparisons based on two non-informative priors, the Jeffreys and the uniform
priors. We derive the posterior means and the modal estimates of the model
parameters for an illustrative data set. The results show that the posterior
estimates obtained under the two non-informative priors are very similar, which
gives confidence in using either prior; that is, the results are robust to the
choice between the Jeffreys and the uniform priors. This means that the simpler
uniform prior is a reasonable option in this case.
References
1. Abbas, N. and Aslam, M. (2009). Prioritizing the items through paired comparison models, a Bayesian approach. Pakistan Journal of Statistics, 25(1), 59-69.
2. Abbas, N. and Aslam, M. (2010). Bayesian analysis of gamma models for paired comparisons. Pakistan Journal of Statistics (in press).
3. Adams, E. S. (2005). Bayesian analysis of linear dominance hierarchies. Animal Behaviour, 69, 1191-1201.
4. Augustin, T. (2004). Bradley-Terry-Luce models to incorporate within-pair order effects: Representation and uniqueness theorems. British Journal of Mathematical and Statistical Psychology, 57, 281-294.
5. Beaver, R. J. and Gokhale, D. V. (1975). A model to incorporate within-pair order effects in paired comparisons. Comm. Statist., 4, 923-939.
6. Bernardo, J. M. (1979). Reference posterior distributions for Bayesian inference (with discussion). Journal of the Royal Statistical Society, Series B, 41, 113-147.
7. Bradley, R. A. and Terry, M. E. (1952). Rank analysis of incomplete block designs I. The method of paired comparisons. Biometrika, 39, 324-345.
8. Bradley, R. A. (1976). Science, statistics and paired comparisons. Biometrics, 32, 213-232.
9. Bradley, R. A. (1953). Some statistical methods in taste testing and quality evaluation. Biometrics, 9, 22-38.
10. David, H. A. (1988). The Method of Paired Comparisons, 2nd ed. London: Griffin.
11. David, R. H. (2004). MM algorithms for generalized Bradley-Terry models. The Annals of Statistics, 32, 384-406.
12. Davidson, R. R. and Beaver, R. J. (1977). On extending the Bradley-Terry model to incorporate within-pair order effects. Biometrics, 33, 693-702.
13. Hatzinger, R. and Francis, B. J. (2004). Fitting paired comparison models in R. Research Report Series, Department of Statistics and Mathematics, No. 3, May 2004.
14. Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London, Series A, 186, 453-461.
15. Jeffreys, H. (1961). Theory of Probability. Oxford, UK: Clarendon Press.
16. Johnson, R. A. and Ladalla, J. N. (1979). The large sample behavior of posterior distributions when sampling from multiparameter exponential family models, and allied results. Sankhya, Series B, 41, 196-215.
17. Stern, H. (1990). A continuum of paired comparison models. Biometrika, 77(2), 265-273.