Bayesian Analysis for the Paired Comparison Model with Order Effects (Using Non-informative Priors)

Ghausia Masood Gilani
College of Statistical and Actuarial Sciences, University of the Punjab, Q.A. Campus, Lahore, Pakistan
gilani_pu@yahoo.com

Nasir Abbas
Department of Statistics, Government Postgraduate College Jhang, Pakistan
nabbasgcj@yahoo.com

Abstract
Sometimes it may be difficult for a panelist to rank or compare more than two objects or treatments at the same time. For this reason, the paired comparison method is used. In this study, the Davidson and Beaver (1977) model for paired comparisons with order effects is analyzed through the Bayesian approach. For this purpose, the posterior means and the posterior modes are compared using non-informative priors.

Keywords: Paired comparison method; The Davidson and Beaver model; Prior distribution; Noninformative prior; Jeffreys prior; Uniform distribution; Posterior distribution.

1. Introduction
Sometimes it may be difficult for a panelist to simultaneously rank or compare more than two treatments (objects, items, options, stimuli, etc.), especially when the differences between treatments are small or the criteria are rather subtle. For this reason, paired comparison data are regarded as more reliable and can be obtained more readily from panelists. In a paired comparison trial, a panelist is given a pair of treatments and is asked to pick the better one with respect to a given attribute. This process is repeated for all pairs of the treatments under study. The method is widely used in industry for assessing customer preference and for designing products using trained panelists. For example, in taste testing it is often difficult for a panelist to cope with more than two tastes at the same time, and the introduction of a third may be confusing. A good review of the paired comparison models, including their analysis, is given by Bradley (1976).
David (1988) gives a detailed survey of the literature on the method of paired comparisons. Further literature can be found in Augustin (2004), David (2004), Hatzinger et al. (2004) and the references cited therein. Some recent developments include Abbas and Aslam (2009), who develop a paired comparison model based on the Cauchy distribution and analyze it via the Bayesian approach. Abbas and Aslam (2010) perform a Bayesian analysis of the chi-square models suggested by Stern (1990).

Pak.j.stat.oper.res. Vol.IV No.2 2008 pp85-93

2. The Davidson and Beaver Model
Consider a paired comparison trial with a set of m treatments $T_1, T_2, \dots, T_m$. Each pair formed from the set of m treatments is ranked $r_{ij}$ times by being presented to a respondent who is asked to indicate a preference for one treatment of the pair. It is assumed that the responses to the treatments can be described in terms of an underlying continuum on which the relative worths of the treatments can be located. We denote the worth, an index of the relative merit of treatment $T_i$, by $\theta_i$, $\theta_i > 0$, where, without loss of generality, $\sum_{i=1}^{m} \theta_i = 1$.

A model can be based on the idea that when a panelist is confronted with treatment $T_i$, he responds with an unconscious or latent variable $X_i$. The assumed mechanism is that he prefers treatment $T_i$ to treatment $T_j$ if $X_i > X_j$. Bradley and Terry (1952) propose a model of paired comparisons for the trial described above. They assume that the difference between the two latent variables, $(X_i - X_j)$, has a logistic (squared hyperbolic secant) density with location parameter $(\ln\theta_i - \ln\theta_j)$.
Now the probability $P(X_i > X_j \mid \theta_i, \theta_j)$ that treatment $T_i$ is preferred to treatment $T_j$ $(i \neq j)$, when the treatments $T_i$ and $T_j$ are compared, is defined as

$$\pi_{ij} = P(X_i > X_j \mid \theta_i, \theta_j) = \frac{1}{4}\int_{-(\ln\theta_i - \ln\theta_j)}^{\infty} \operatorname{sech}^2(y/2)\,dy = \frac{\theta_i}{\theta_i + \theta_j}, \qquad i\,(\neq j) = 1, 2, \dots, m. \tag{1}$$

The Bradley-Terry model is defined by (1).

Davidson and Beaver (1977) propose a modification of the Bradley-Terry model to account for the effect of the order of presentation of the treatments within a pair. A multiplicative order effect is suggested as an alternative to the additive order effect proposed by Beaver and Gokhale (1975). An important feature of the Bradley-Terry model is that the values $\ln\theta_1, \ln\theta_2, \dots, \ln\theta_m$ can be used to represent the merits of the treatments under study on a linear scale. Thus it is natural to assume that the logarithms of the worths, as opposed to the worths themselves, are affected additively by the order of presentation. This is equivalent to assuming that when treatments $T_i$ and $T_j$ appear together in a pair, their relative worths are subject to a multiplicative within-pair order effect $\gamma_{ij}$. It is assumed that $\gamma_{ij} = \gamma_{ji}$, i.e., the within-pair order effect depends only on the treatment pair. The resulting preference probabilities for the ordered pair $(T_i, T_j)$ are given by

$$\pi_{ij} = \frac{1}{4}\int_{-(\ln\theta_i - \ln\theta_j) + \ln\gamma_{ij}}^{\infty} \operatorname{sech}^2(y/2)\,dy = \frac{\theta_i}{\theta_i + \gamma_{ij}\theta_j} \quad\text{and}\quad \pi_{ji} = \frac{\gamma_{ij}\theta_j}{\theta_i + \gamma_{ij}\theta_j}, \qquad i\,(\neq j) = 1, 2, \dots, m. \tag{2}$$

Here $\gamma_{ij} > 0$. When $\gamma_{ij} = 1$, there is no order effect and the model reduces to the Bradley-Terry model (1). When $\gamma_{ij} > 1$, the worth of the treatment presented second becomes inflated, while if $\gamma_{ij} < 1$, the treatment presented first gains an advantage. The case $\gamma_{ij} = \gamma$ for all (i, j) is of interest because of the considerable reduction in the number of parameters required to specify the model.
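The preference probabilities of model (2) can be sketched in a few lines of Python for the common-order-effect case. This is a minimal illustration, not code from the paper: the function names and the numerical values are hypothetical, and the check relies on the closed form $\frac{1}{4}\int_a^\infty \operatorname{sech}^2(y/2)\,dy = 1/(1 + e^{a})$.

```python
import math

def db_probabilities(theta_first, theta_second, gamma=1.0):
    """Preference probabilities for an ordered pair under model (2).

    Returns (P(first-presented preferred), P(second-presented preferred)).
    gamma = 1 gives the Bradley-Terry model (1).
    """
    if theta_first <= 0 or theta_second <= 0 or gamma <= 0:
        raise ValueError("worths and order effect must be positive")
    denom = theta_first + gamma * theta_second
    return theta_first / denom, gamma * theta_second / denom

def logistic_tail(lower):
    """(1/4) * integral of sech^2(y/2) over (lower, inf), i.e. 1 / (1 + e^lower)."""
    return 1.0 / (1.0 + math.exp(lower))

# Illustrative values only (not taken from the paper's data).
theta_i, theta_j, gamma = 0.4, 0.6, 1.5
p_first, p_second = db_probabilities(theta_i, theta_j, gamma)

# Same probability via the integral form, with the lower limit shifted by ln(gamma).
lower = -(math.log(theta_i) - math.log(theta_j)) + math.log(gamma)
p_first_integral = logistic_tail(lower)

# No order effect: Bradley-Terry probabilities.
bt_first, bt_second = db_probabilities(theta_i, theta_j)
```

With an order effect greater than one, `p_first` is smaller than `bt_first`, which matches the statement that the treatment presented second is favored in that case.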
The multiplicative model (2) arises naturally in the setting of the linear model (David 1988). Suppose that an individual experiences sensations $X_i$ and $X_j$ when presented with the pair (i, j), and that the response $i \to j$ (i preferred to j) results when $X_i > X_j$, which is interpreted to mean that the sensation for treatment $T_i$ comes closer to the ideal sensation than that for treatment $T_j$. It is noted by Bradley (1953) that when the difference $X_i - X_j$ has a logistic distribution with location parameter $(\ln\theta_i - \ln\theta_j)$, one obtains the preference probabilities of the Bradley-Terry model. Now suppose that for the ordered pair (i, j), the response $i \to j$ results when $X_i - X_j > \eta_{ij}$, where $\eta_{ij}$ is interpreted as a shift on the sensation scale which arises because of the order of presentation. Using the logistic distribution for $X_i - X_j$, one obtains the preference probabilities given by the multiplicative model (2) with $\eta_{ij} = \ln\gamma_{ij}$. So far it has been assumed that each response to a pair of treatments consists of an indication of preference for one member of the pair.

3. Notations and Likelihood Function for the Model
Let $w_{ijk}(1)$ and $w_{ijk}(2)$ be the random variables associated with the rank of the treatments in the kth repetition of the treatment pair $(T_i, T_j)$, $i\,(\neq j) = 1, 2, 3, \dots, m$, $k = 1, \dots, r_{ij}$. They are defined as follows:

$w_{ijk}(1)$ = 1 or 0 according as treatment $T_i$ is preferred to treatment $T_j$ or not in the kth repetition of the comparison.
$w_{ijk}(2)$ = 1 or 0 according as treatment $T_j$ is preferred to treatment $T_i$ or not in the kth repetition of the comparison.
$w_{ij}(1) = \sum_{k} w_{ijk}(1)$ = the frequency of preference for the treatment presented first (treatment i).
$w_{ij}(2) = \sum_{k} w_{ijk}(2)$ = the frequency of preference for the treatment presented second (treatment j).
$r_{ij}$ = the number of times treatment $T_i$ is compared with treatment $T_j$, so that $r_{ij} = w_{ij}(1) + w_{ij}(2)$.
$N = \sum_{i}\sum_{j \neq i} r_{ij}$ = the total number of comparisons.
$w_i = \sum_{j \neq i} \{w_{ij}(1) + w_{ji}(2)\}$ = the total number of wins of treatment $T_i$, whether it is presented first or second.

Now we derive the likelihood function of the data for the Davidson and Beaver model stated in (2), with a common order effect $\gamma$. We constrain the treatment parameters of the model to be positive and to sum to unity. These conditions ensure that the parameters are well defined and identifiable.

The probability of the observed result in the kth repetition of the pair $(T_i, T_j)$ is

$$P_{ijk} = \left(\frac{\theta_i}{\theta_i + \gamma\theta_j}\right)^{w_{ijk}(1)} \left(\frac{\gamma\theta_j}{\theta_i + \gamma\theta_j}\right)^{w_{ijk}(2)}. \tag{3}$$

Hence, the likelihood function of the observed outcome $\mathbf{x}$ of the trial (where $\mathbf{x}$ represents the data $(w_{ijk}(1), w_{ijk}(2))$) is

$$L(\mathbf{x}; \theta_1, \dots, \theta_m, \gamma) = \prod_{i(\neq j)=1}^{m} k_{ij} \prod_{k=1}^{r_{ij}} P_{ijk} = \frac{\gamma^{\sum_{i=1}^{m}\sum_{j \neq i} w_{ij}(2)} \prod_{i=1}^{m} \theta_i^{w_i}}{\prod_{i(\neq j)=1}^{m} (\theta_i + \gamma\theta_j)^{r_{ij}}} \prod_{i(\neq j)=1}^{m} k_{ij}, \tag{4}$$

where $0 < \theta_i < 1$, $i = 1, 2, \dots, m$, $k_{ij} = \dfrac{r_{ij}!}{w_{ij}(1)!\,w_{ij}(2)!}$, $\sum_{i=1}^{m} \theta_i = 1$ and $\gamma > 0$. Here $\gamma$ is the order effect parameter and $\theta_1, \theta_2, \dots, \theta_m$ are the treatment parameters.

4. The Choice of Prior Distribution
Bayesian analysis is a statistical procedure which endeavors to estimate the parameters of an underlying distribution based on the observed distribution. We begin with the derivation of the prior distribution of the parameters, include an assessment of the likelihood function of the sample observations derived from the distribution identified by the parameters, and then merge them to yield a posterior distribution of the parameters of interest.
Then we base the entire parametric inference on this posterior distribution of the parameters of interest, conditional upon the data. In practice, it is common to assume a uniform distribution over the appropriate range of values of the parameters for the prior distribution. Adams (2005) discusses the advantages of the Bayesian approach in detail.

The prior distribution quantifies information about a parameter before any data are gathered. A prior which expresses specific, definite information about a random variable is an informative prior. In some cases, such as multiparameter situations, it becomes difficult to formalize the available prior information into a distribution. When little prior information is known or prior elicitation is difficult, the analysis is done by choosing a prior which reflects little prior information. These priors are known as non-informative priors. A prior distribution is non-informative if it is flat relative to the likelihood function. Thus a prior distribution is non-informative if it has minimal impact on the posterior distribution of the parameter of interest and is dominated by the likelihood; that is, it does not change very much over the region in which the likelihood is appreciable and does not assume large values outside that region. A prior which has these properties is said to be a locally uniform prior. Other names for non-informative priors are reference priors, vague priors, ignorance priors and flat priors.

Many approaches for the choice of a non-informative prior have been given. One way of eliminating the subjectivity in the choice of prior is to use a flat or diffuse prior distribution that is uniform across all possible values of the parameter.
Such a non-informative diffuse prior is simply a constant, i.e., $p(\theta) = c$ for $\theta$ belonging to the parameter space. With a diffuse prior, the posterior is proportional to the likelihood:

$$p(\theta \mid \mathbf{x}) \propto p(\theta)\,L(\theta \mid \mathbf{x}) \;\Rightarrow\; p(\theta \mid \mathbf{x}) \propto c\,L(\theta \mid \mathbf{x}) \;\Rightarrow\; p(\theta \mid \mathbf{x}) \propto L(\theta \mid \mathbf{x}).$$

Another approach is to use the Jeffreys prior, which is based on the Fisher information matrix. It satisfies the local uniformity property for non-informative priors.

5. Reference (Jeffreys) Prior for the Parameters of the Model
A non-informative prior has been suggested by Jeffreys (1946, 1961) which is frequently used in situations where one does not have much information about the parameters. It is defined as the density of the parameters proportional to the square root of the determinant of the Fisher information matrix. Let $p(\mathbf{x} \mid \theta)$ denote the density of $\mathbf{x}$ given $\theta$. The Fisher information is

$$I(\theta) = -E\left[\frac{\partial^2 \log p(\mathbf{x} \mid \theta)}{\partial\theta^2}\right].$$

If $\boldsymbol{\theta}$ is a p × 1 vector, then

$$I(\boldsymbol{\theta}) = -E\left[\frac{\partial^2 \log p(\mathbf{x} \mid \boldsymbol{\theta})}{\partial\theta_i\,\partial\theta_j}\right]$$

and thus $I(\boldsymbol{\theta})$ is a p × p matrix. The Jeffreys prior is proportional to the square root of the determinant of the Fisher information matrix, i.e., $p(\boldsymbol{\theta}) \propto \sqrt{\det\{I(\boldsymbol{\theta})\}}$.

Properties of the Jeffreys Prior
The Jeffreys prior has several properties that make it an attractive non-informative prior. It is invariant under one-to-one (1-1) transformations of the parameter, in the sense that we get consistent answers in any parameterization. Bernardo (1979) shows that the Jeffreys prior is the appropriate reference prior if no parameters are regarded as nuisance parameters and the joint posterior distribution of all the parameters is asymptotically normal. Another important aspect of the prior is that it is not affected by a restriction on the parameter space.
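As a brief illustration of the definition, consider a standard textbook case that is not part of the paper's model: a single binomial observation $x$ out of $n$ trials with success probability $\theta$. The worked steps below use only $E(x) = n\theta$.

```latex
\begin{align*}
\log p(x \mid \theta) &= \text{const} + x\log\theta + (n - x)\log(1 - \theta), \\
\frac{\partial^2 \log p(x \mid \theta)}{\partial\theta^2}
  &= -\frac{x}{\theta^2} - \frac{n - x}{(1 - \theta)^2}, \\
I(\theta) &= \frac{n\theta}{\theta^2} + \frac{n(1 - \theta)}{(1 - \theta)^2}
  = \frac{n}{\theta(1 - \theta)}, \\
p(\theta) &\propto \sqrt{I(\theta)} \propto \theta^{-1/2}(1 - \theta)^{-1/2}.
\end{align*}
```

The result is the kernel of a Beta(1/2, 1/2) density. It is not flat in $\theta$, which shows that the Jeffreys prior and the uniform prior generally differ even in one-parameter problems.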
The likelihood function (4) belongs to the exponential family, and it follows from the regularity conditions of Johnson and Ladalla (1979) that the posterior distribution is asymptotically normal. Here no parameter is regarded as a nuisance parameter, so the Jeffreys prior is the appropriate choice of non-informative prior and hereafter will be called the reference (Jeffreys) prior.

Let us consider the case where the number of treatments is m = 2. The likelihood function for the parameters $\gamma$, $\theta_1$ and $\theta_2$ of the Davidson and Beaver model is proportional to

$$l(\mathbf{x}; \theta, \gamma) \propto \frac{\gamma^{w_{12}(2) + w_{21}(2)}\,\theta^{w_{12}(1) + w_{21}(2)}\,(1 - \theta)^{w_{21}(1) + w_{12}(2)}}{\{\theta + \gamma(1 - \theta)\}^{r_{12}}\,\{(1 - \theta) + \gamma\theta\}^{r_{21}}},$$

where $\theta_1 = \theta$, $\theta_2 = 1 - \theta$ and $r_{ij} = w_{ij}(1) + w_{ij}(2)$. Using the MAPLE package, the reference (Jeffreys) prior is

$$p(\theta, \gamma) \propto \frac{1}{\{\theta + \gamma(1 - \theta)\}\{(1 - \theta) + \gamma\theta\}}, \quad\text{i.e.,}\quad p(\theta, \gamma) = \frac{1}{K\,\{\theta + \gamma(1 - \theta)\}\{(1 - \theta) + \gamma\theta\}}, \qquad 0 < \theta < 1,\; 0 < \gamma < \infty,$$

where K = 5.349 × 10^-29.

6. Bayesian Analysis of the Model
Using the data given in Table 1, the Bayesian analysis of the model with order effect is presented for two treatments with an equal number of comparisons for each ordered pair, $r_{ij} = 50$ $(i \neq j;\; i, j = 1, 2)$, using the non-informative priors: the uniform and the reference (Jeffreys) priors.

Table 1: Data with Order Effects

Pairs      (1, 2)   (2, 1)
r_ij       50       50
w_ij(1)    14       32
w_ij(2)    36       18

6.1 The Posterior Distributions
The (joint) posterior distribution using the uniform prior, $p(\theta, \gamma) \propto 1$, for the treatment parameter $\theta_1$ (with constraint $\theta_2 = 1 - \theta_1$) and the order effect parameter $\gamma$ is

$$p(\theta_1, \gamma \mid \mathbf{x}) = \frac{\gamma^{w_{12}(2) + w_{21}(2)}\,\theta_1^{w_{12}(1) + w_{21}(2)}\,(1 - \theta_1)^{w_{21}(1) + w_{12}(2)}}{K\,\{\theta_1 + \gamma(1 - \theta_1)\}^{r_{12}}\,\{(1 - \theta_1) + \gamma\theta_1\}^{r_{21}}}, \qquad 0 < \theta_1 < 1,\; 0 < \gamma < \infty,$$

where K = 6.7355 × 10^-29 is the normalizing constant. Here we may replace $\theta_1$ by $\theta$ for simplicity.
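The normalizing constant and the posterior means of the two-treatment model can be approximated by simple numerical integration. The sketch below is an illustration under stated assumptions, not the authors' MAPLE or SAS code: it uses a midpoint rule on a grid, maps $\gamma \in (0, \infty)$ to $u \in (0, 1)$ through $\gamma = u/(1 - u)$, and takes the Jeffreys prior in the unnormalized form $1/[\{\theta + \gamma(1 - \theta)\}\{(1 - \theta) + \gamma\theta\}]$. All function names are hypothetical.

```python
import math

# Table 1 counts for the ordered pairs (1, 2) and (2, 1).
W12 = (14, 36)   # (w12(1), w12(2))
W21 = (32, 18)   # (w21(1), w21(2))

def likelihood_kernel(theta, gamma):
    """Two-treatment likelihood kernel with theta1 = theta, theta2 = 1 - theta."""
    r12, r21 = sum(W12), sum(W21)
    d1 = theta + gamma * (1.0 - theta)
    d2 = (1.0 - theta) + gamma * theta
    # Work on the log scale so that extreme grid points do not overflow.
    log_k = ((W12[1] + W21[1]) * math.log(gamma)
             + (W12[0] + W21[1]) * math.log(theta)
             + (W21[0] + W12[1]) * math.log(1.0 - theta)
             - r12 * math.log(d1) - r21 * math.log(d2))
    return math.exp(log_k)

def uniform_prior(theta, gamma):
    return 1.0

def jeffreys_kernel(theta, gamma):
    # Assumed unnormalized Jeffreys prior for the two-treatment model.
    return 1.0 / ((theta + gamma * (1.0 - theta)) * ((1.0 - theta) + gamma * theta))

def posterior_summary(prior, n=300):
    """Midpoint-rule estimates of (K, E[theta | x], E[gamma | x])."""
    h = 1.0 / n
    total = sum_theta = sum_gamma = 0.0
    for a in range(n):
        theta = (a + 0.5) * h
        for b in range(n):
            u = (b + 0.5) * h
            gamma = u / (1.0 - u)
            jacobian = 1.0 / (1.0 - u) ** 2   # d(gamma)/du
            f = likelihood_kernel(theta, gamma) * prior(theta, gamma) * jacobian
            total += f
            sum_theta += theta * f
            sum_gamma += gamma * f
    return total * h * h, sum_theta / total, sum_gamma / total

K_unif, mean_theta_unif, mean_gamma_unif = posterior_summary(uniform_prior)
K_jeff, mean_theta_jeff, mean_gamma_jeff = posterior_summary(jeffreys_kernel)
```

The grid size `n` trades accuracy for run time. The resulting values can be compared with the normalizing constants and posterior means reported in Section 6, bearing in mind that `K_jeff` depends on the scaling chosen for the unnormalized prior.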
The (marginal) posterior densities of the parameters $\theta_1$ and $\gamma$ are:

$$p(\theta_1 \mid \mathbf{x}) = \int_0^{\infty} p(\theta_1, \gamma \mid \mathbf{x})\,d\gamma, \qquad 0 < \theta_1 < 1, \tag{5}$$

$$p(\gamma \mid \mathbf{x}) = \int_0^{1} p(\theta_1, \gamma \mid \mathbf{x})\,d\theta_1, \qquad 0 < \gamma < \infty. \tag{6}$$

Similarly, the (marginal) posterior density of the parameter $\theta_2$ can be derived.

The (joint) posterior distribution using the Jeffreys prior for the treatment parameter $\theta_1$ (with constraint $\theta_2 = 1 - \theta_1$) and the order effect parameter $\gamma$ is

$$p(\theta_1, \gamma \mid \mathbf{x}) = \frac{\gamma^{w_{12}(2) + w_{21}(2)}\,\theta_1^{w_{12}(1) + w_{21}(2)}\,(1 - \theta_1)^{w_{21}(1) + w_{12}(2)}}{K\,\{\theta_1 + \gamma(1 - \theta_1)\}^{r_{12} + 1}\,\{(1 - \theta_1) + \gamma\theta_1\}^{r_{21} + 1}}, \qquad 0 < \theta_1 < 1,\; 0 < \gamma < \infty,$$

where K = 5.3549 × 10^-29 is the normalizing constant. The (marginal) posterior densities of the parameters $\theta_1$ and $\gamma$ may be found using (5) and (6).

6.2 The Posterior Estimates
The posterior means and the joint posterior modes of the parameters are taken as the estimates of the parameters. The posterior means of the parameters under the uniform and Jeffreys priors are evaluated using the quadrature method for the data set given in Table 1. We evaluate the expressions

$$E(\theta_1 \mid \mathbf{x}) = \int_0^1 \theta_1\,p(\theta_1 \mid \mathbf{x})\,d\theta_1 \quad\text{and}\quad E(\gamma \mid \mathbf{x}) = \int_0^{\infty} \gamma\,p(\gamma \mid \mathbf{x})\,d\gamma$$

to find the mean estimates of the worth and order effect parameters, respectively. Using the uniform prior, the posterior means of the parameters $\theta_1$, $\theta_2$ and $\gamma$ are 0.31972, 0.68028 and 1.296670, whereas using the Jeffreys prior these estimates are 0.32050, 0.67950 and 1.230970. The posterior means evaluated using the uniform prior are thus very close to those produced using the Jeffreys prior.

The estimates of the parameters of interest which maximize the posterior density are termed the joint posterior modes.
These are found by solving the equations obtained by equating to zero the first partial derivatives of the logarithm of the likelihood function with respect to the unknown parameters. That is, we solve:

$$\frac{\partial \ln L(\theta_1, \theta_2, \gamma)}{\partial\theta_1} = 0, \qquad \frac{\partial \ln L(\theta_1, \theta_2, \gamma)}{\partial\theta_2} = 0 \quad\text{and}\quad \frac{\partial \ln L(\theta_1, \theta_2, \gamma)}{\partial\gamma} = 0.$$

The modal estimates using the uniform prior are found for the parameters $\theta_1$, $\theta_2$ and $\gamma$ by executing a computer program written in the SAS package. The joint posterior mode of the vector $(\theta_1, \theta_2, \gamma)$ is obtained as ($\theta_1$ = 0.318665, $\theta_2$ = 0.681335, $\gamma$ = 1.22676). Similarly, the modal estimates under the Jeffreys prior are found by running another SAS program, and the joint posterior mode of the parameter vector $(\theta_1, \theta_2, \gamma)$ is obtained as ($\theta_1$ = 0.319267, $\theta_2$ = 0.680733, $\gamma$ = 1.145057). Here we observe that the estimates produced using the two non-informative priors, the Jeffreys and the uniform, are also very close.

7. Conclusions
Here we perform the Bayesian analysis of the Davidson and Beaver model for paired comparisons based on two non-informative priors, the Jeffreys and the uniform. We derive the posterior means and the modal estimates of the model parameters for an illustration based on the observed data. The results show that the posterior estimates obtained using the two types of non-informative priors are very similar, which gives confidence in using either of the priors. The results also exhibit robustness with respect to the choice of non-informative prior. This means the simpler uniform prior is a reasonable option in this case.

References
1. Abbas, N. and Aslam, M. (2009). Prioritizing the Items through Paired Comparison Models, A Bayesian Approach. Pakistan Journal of Statistics, 25(1), 59-69.
2. Abbas, N. and Aslam, M. (2010). Bayesian Analysis of Gamma Models for Paired Comparisons.
Pakistan Journal of Statistics (in press).
3. Adams, E. S. (2005). Bayesian analysis of linear dominance hierarchies. Animal Behaviour, 69, 1191-1201.
4. Augustin, T. (2004). Bradley-Terry-Luce models to incorporate within-pair order effects: Representation and uniqueness theorems. British Journal of Mathematical and Statistical Psychology, 57, 281-294.
5. Beaver, R. J. and Gokhale, D. V. (1975). A model to incorporate within-pair order effects in paired comparisons. Comm. Statist., 4, 923-939.
6. Bernardo, J. M. (1979). Reference Posterior Distributions for Bayesian Inference (with discussion). Journal of the Royal Statistical Society, Series B, 41, 113-147.
7. Bradley, R. A. and Terry, M. E. (1952). Rank analysis of incomplete block designs I. The method of paired comparisons. Biometrika, 39, 324-345.
8. Bradley, R. A. (1976). Science, Statistics and paired comparisons. Biometrics, 32, 213-232.
9. Bradley, R. A. (1953). Some Statistical methods in taste testing and Quality Evaluation. Biometrics, 9, 22-38.
10. David, H. A. (1988). The method of paired comparisons. 2nd ed. London: Griffin.
11. David, R. H. (2004). MM Algorithms for generalized Bradley-Terry models. The Annals of Statistics, 32, 384-406.
12. Davidson, R. R. and Beaver, R. J. (1977). On extending the Bradley-Terry model to incorporate within-pair order effects. Biometrics, 33, 693-702.
13. Hatzinger, R. and Francis, B. J. (2004). Fitting paired comparison models in R. Research Report Series / Department of Statistics and Mathematics, Nr. 3, May 2004.
14. Jeffreys, H. (1946). An Invariant Form for the Prior Probability in Estimation Problems. Proceedings of the Royal Society of London, Series A, 186, 453-461.
15. Jeffreys, H. (1961). Theory of Probability. Oxford, UK: Clarendon Press.
16. Johnson, R. A. and Ladalla, J. N. (1979).
The large sample behavior of posterior distributions when sampling from multiparameter exponential family models and allied results. Sankhya, Ser. B, 41, 196-215.
17. Stern, H. (1990a). A Continuum of Paired Comparison Models. Biometrika, 77(2), 265-273.