

Bayesian analysis of order-statistics models for ranking data

2000, Psychometrika

In this paper, a class of probability models for ranking data, the order-statistics models, is investigated. We extend the usual normal order-statistics model to one in which the underlying random variables follow a multivariate normal distribution. A Bayesian approach with the Gibbs sampling technique is used for parameter estimation. In addition, methods to assess the adequacy of the model fit are introduced. Robustness of the model is studied by considering a multivariate-t distribution. The proposed method is applied to analyze the presidential election data of the American Psychological Association (APA).

PSYCHOMETRIKA, VOL. 65, NO. 3, 281-299, SEPTEMBER 2000

BAYESIAN ANALYSIS OF ORDER-STATISTICS MODELS FOR RANKING DATA

PHILIP L.H. YU
DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE, THE UNIVERSITY OF HONG KONG

Key words: data augmentation, Gibbs sampling, order-statistics model, ranking data.

1. Introduction

Rankings of items are very common in everyday life. Gamblers want to know the rankings of the horses in horse-racing competitions; companies want to know consumers' order of preference for products in the market; social and political leaders want to know which candidate will get more support in an election. Data consisting of rankings appear in many fields such as psychology, sociology, marketing, sports and econometrics.

In the literature, modeling of ranking data has received a lot of attention and many models have been proposed over the years. They can be categorized into four classes: (i) order-statistics models, (ii) distance-based models, (iii) paired-comparison models, and (iv) multistage models. A more comprehensive discussion of these probability ranking models can be found in the review paper by Critchlow, Fligner and Verducci (1991). More recently, a monograph on modeling and analyzing ranking data was published by Marden (1995). Applications of these models to real-life situations can be found in the literature; for example, Koop and Poirier (1994) use rank-ordered logit models to analyze voter preferences, and Lo, Bacon-Shone and Busche (1995) apply ranking probability models to racetrack betting.

Among the four classes of probability models for ranking data, the class of order-statistics models has the longest history in the statistical and psychological literature. As far back as 1927, Thurstone published the famous paper "A law of comparative judgement" in Psychological Review, in which the ranking of two items was considered. The basic idea behind this approach is that the judge may have tastes that fluctuate from one instant to another, so that the judge's perception of each item is not perfectly predictable and hence is a random variable. The ordering of these random variables then determines the judge's ranking of the items. Mathematically, a ranking of k items is defined as a permutation function pi = (pi_1, pi_2, ..., pi_k) from {1, 2, ..., k} onto {1, 2, ..., k}, where pi_i is the rank given to item i, i = 1, 2, ..., k, and larger ranks refer to the more preferred items.

The author is grateful to K. Lam, K.F. Lam, the Editor, an associate editor, and three reviewers for their valuable comments and suggestions. This research was substantially supported by the CRCG grant 335/017/0015 of the University of Hong Kong and a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. HKU 7169/98H).
Upon completion of this paper, I became aware that similar work had been done independently by K.G. Yao and U. Boeckenholt (1999). Requests for reprints should be sent to Philip L.H. Yu, Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong.

Let Y_i, a random variable, be the utility of item i given by a judge (i = 1, ..., k). Then the order-statistics model assigns the ranking pi the probability

P(pi) = P(Y_{i1} < Y_{i2} < ... < Y_{ik}),  pi in Omega,   (1)

where i_r = t if and only if pi_t = r, and Omega is the set of all k! possible rankings. Critchlow et al. (1991) observed that if the Y_i's are allowed to have arbitrary dependencies, any probability distribution on rankings can be expressed as in (1). To simplify the model, some probabilistic structure on the Y's is assumed. The most common assumption is that the Y_i's are independent with c.d.f. F_i(y) = F(y - mu_i). Such models are referred to as Thurstone order-statistics models (see Yellott, 1977; and Critchlow et al., 1991). Two famous Thurstone models that have been studied extensively in the literature are the Thurstone (1927)-Daniels (1950)-Mosteller (1951) model, where F is normal, and the Bradley-Terry (1952)-Luce (1959) (BTL) model, where F is the extreme value distribution, that is, F(y) = 1 - exp(-exp(y)). Because the ranking probability in (1) under the BTL model has a closed form, most applications and extensions are based on the BTL model (see, for example, Beggs, Cardell, & Hausman, 1981; and Chapman & Staelin, 1982, in which the term "rank-ordered logit model" is used). Henery (1983) and Stern (1990a, 1990b) generalized the BTL model to the order-statistics model with latent utility Y_i = log(u_i), where u_i ~ Gamma(r, lambda_i = exp(mu_i)). Properties of the Thurstone order-statistics model are discussed in Henery (1981) and Critchlow et al. (1991).

Despite the fact that the ranking probability (1) under the BTL model has a closed form, the BTL model is sometimes not suitable for use in practice, for two main reasons. First, it satisfies the independence of irrelevant alternatives (IIA) property (Tversky, 1972), which says that the conditional distribution of the utility of each item is independent of the ranking of the other items. Clearly, this property is unrealistic when some items are very similar to, or even substitutes for, the others. Second, when it is applied to some real data, it has been shown not to fit the data very well (see, for example, Boeckenholt, 1993; Brook & Upton, 1974; and Tallis & Dansie, 1983). Therefore, to overcome these problems, dependency structures other than those of the Thurstone order-statistics models are required.

One simple dependency structure on y = (y_1, ..., y_k)' is to assume that y follows a multivariate normal distribution. We refer to such a model as a multivariate normal order-statistics (MVNOS) model. Although the MVNOS model is a reasonable approach to modeling ranking data, it is not popularly used. The main reason is that estimation of the parameters by the classical maximum likelihood (ML) approach is not practically feasible: the exact calculation of the likelihood function requires high-dimensional numerical integration, which is too computationally involved for practical applications, especially when k is large. Hence, most applications of the MVNOS model have involved the ranking of only a few items.
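To make this contrast concrete, the following minimal sketch (assuming NumPy; the item means, covariance matrix and target ordering are hypothetical) evaluates a BTL ranking probability in closed form, using the representation exp(Y_i) ~ Exponential(rate exp(-mu_i)) and the sequential-minimum identity for independent exponentials, checks it by simulation, and then estimates the same ordering probability under correlated normal utilities, for which only simulation or numerical integration is available.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.5, 0.0, -0.5, 1.0])          # hypothetical item means, k = 4
k = len(mu)
ordering = (2, 0, 1, 3)                       # (i1,...,ik): least preferred item first

# BTL: F(y) = 1 - exp(-exp(y)), so exp(Y_i) ~ Exponential(rate exp(-mu_i)) and
# P(Y_{i1} < ... < Y_{ik}) is the sequential-minimum product for exponentials.
lam = np.exp(-mu[list(ordering)])
p_btl = np.prod([lam[r] / lam[r:].sum() for r in range(k)])

# Monte Carlo check of the closed form from simulated BTL utilities.
y_btl = mu + np.log(rng.exponential(size=(200_000, k)))
mc_btl = (np.argsort(y_btl, axis=1) == ordering).all(axis=1).mean()

# Correlated normal utilities (an MVNOS-type model): no closed form, simulate only.
V = 0.5 * np.eye(k) + 0.5 * np.ones((k, k))   # hypothetical covariance matrix
y_mvn = rng.multivariate_normal(mu, V, size=200_000)
mc_mvn = (np.argsort(y_mvn, axis=1) == ordering).all(axis=1).mean()

print("BTL closed form:", round(p_btl, 4), " BTL Monte Carlo:", round(mc_btl, 4))
print("MVN Monte Carlo:", round(mc_mvn, 4))
```

The BTL probability is available as a cheap product, while each ordering probability under correlated normal utilities requires a (k - 1)-dimensional integral or simulation, which is exactly the computational burden discussed above.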
Applications of the MVNOS model in the literature are indeed limited to small k. Thurstone (1927) suggested using the MVNOS model with some special covariance structures to analyze paired comparison data (i.e., k = 2), and Dansie (1986) used this model to investigate the preference for k = 4 cars. To the best of my knowledge, no application to ranking data for k >= 5 items has been reported in the literature.

One major breakthrough in overcoming the computational complexity of the analysis of ranking data is to adopt a Bayesian approach and use Markov chain Monte Carlo (MCMC) methods (see Besag, Green, Higdon & Mengersen, 1995; Tierney, 1994; and the references therein). These powerful techniques have been demonstrated to be extremely useful in fitting the BTL model (Koop & Poirier, 1994). The aim of this paper is to apply such techniques to estimate the parameters of the MVNOS model. We will see later that the proposed estimation method is applicable to the ranking of a batch of items of any size.

This paper is organized as follows. Section 2 formally defines the MVNOS model; covariates can be incorporated into the model, and the identifiability of the parameters is also considered. Section 3 shows how to apply the MCMC technique to draw samples from the joint posterior distribution of the unknown parameters of the new model. Section 4 introduces some methods to assess the adequacy of the model fit. Section 5 presents a simulation study that demonstrates the usefulness of the MCMC technique. The methodology is applied to analyze the data from the 1980 American Psychological Association presidential election in section 6. Section 7 discusses the robustness of the proposed model. Some concluding remarks are given in section 8.

2. The Multivariate Normal Order-Statistics (MVNOS) Models

Suppose a random sample of n judges is asked to rank k items. For j = 1, ..., n, let pi_j = (pi_{1j}, ..., pi_{kj}) be the ranking of the k items assigned by judge j and (i1, ..., ik) be the ordering of the k items corresponding to the ranking pi_j. The multivariate normal order-statistics (MVNOS) model assigns the ranking pi_j the probability

P(pi_j) = P(y_{i1,j} < y_{i2,j} < ... < y_{ik,j}),  j = 1, ..., n,   (2)

where the utility vector y_j = (y_{1j}, ..., y_{kj})' of judge j satisfies

y_j = mu_j + e_j,   (3)

with mu_j = (mu_{1j}, ..., mu_{kj})' representing the mean utility vector for judge j and

e_j ~ iid N(0, V).   (4)

Note that in (4), the diagonal elements of V, v_rr, measure the dispersion of the latent utilities y_rj for all j, while its off-diagonal elements v_rs (r != s) measure the association between the latent utilities y_rj and y_sj for all j. For example, if v_12 > 0, it is more likely that items 1 and 2 are close in their rankings; otherwise, it is more likely that their rankings are far apart.

2.1. The MVNOS Model with Covariates

When some covariates are associated with the judges and items, it is natural to impose the following linear model for mu_j:

mu_j = Z_j beta,  j = 1, ..., n,   (5)

where Z_j is a k x p matrix of covariates associated with judge j and beta is a p x 1 vector of unknown parameters. For example, in a marketing survey, respondents are asked to rank products according to their preference. Usually, apart from the rankings given by the respondents, some socio-economic variables (s_j) about the respondents and the attributes (a_i) of the products are also available.
Then one may study the heterogeneity of the preference due to these variables by assuming the model

mu_ij = a_i' gamma + s_j' delta_i,  i = 1, ..., k,  j = 1, ..., n.   (6)

The parameter vector gamma represents the attribute effect common to all respondents, while the vector delta_i represents the effect of the respondents' socio-economic background on their preference for item i. It is easily seen that (6) is a particular case of model (5), with Z_j formed by stacking the attribute vectors a_i' and the socio-economic variables s_j' in the appropriate blocks and beta = (gamma', delta_1', ..., delta_k')'. Critchlow and Fligner (1993) considered the normal order-statistics model with item covariates. In this paper, we generalize their model to include the judges' covariates.

2.2. The Likelihood Function of the MVNOS Model

Note that one can add an arbitrary constant (location shift) to both sides of (3), or multiply both sides by a positive constant (scale shift), while leaving the ranking probability in (2) unchanged. The location-shift problem is commonly dealt with by subtracting the last element of (3) from the other k - 1 elements. Therefore, after substituting (5) into (3), (3) and (4) become

w_j = X_j beta + eps_j,  j = 1, ..., n,   (7)

eps_j ~ iid N(0, Sigma),   (8)

where w_ij = y_ij - y_kj, the rows of X_j are x_ij = z_ij - z_kj, eps_ij = e_ij - e_kj and Sigma = [I_{k-1}, -1_{k-1}] V [I_{k-1}, -1_{k-1}]'. Here, I denotes an identity matrix and 1 denotes a vector of ones. The ranking pi_j with ordering (i1, ..., ik) then corresponds to the event

E_j = { w_j : w_{i1,j} < ... < w_{i_{r-1},j} < 0 < w_{i_{r+1},j} < ... < w_{ik,j} }  whenever i_r = k.   (9)

For the sake of simplicity, we use the convention w_{i0,j} = -infinity and w_{i_{k+1},j} = +infinity. Notice that the scale-shift problem still exists in the model given by (7), (8) and (9). The usual way to tackle this problem is to put a constraint on Sigma such as sigma_11 = 1. Since the judges are drawn randomly, the likelihood function of (beta, Sigma) is

L(beta, Sigma) = prod_{j=1}^{n} P(E_j).   (10)

Unless the number of items k is small, direct maximization of this likelihood function is computationally demanding and numerically unstable, because the evaluation of the likelihood by quadrature of a (k - 1)-dimensional integral is not accurate and there is no closed-form solution. Hence, most applications of the MVNOS model have been restricted to the ranking of a small number of items. For instance, Dansie (1986) used this model to investigate the preference for k = 4 cars.

2.3. Identifiability of Model Parameters

Because rankings of items depend only on utility differences, beta and Sigma (with sigma_11 fixed) are estimable, but the original parameters mu_j and V cannot be fully identified. For example, as pointed out by one of the referees, suppose k = 3 and mu_j = mu, j = 1, 2, ..., n. Then the following three sets of parameters under the MVNOS model lead to the same ranking probabilities:

Set A:  mu_A = (1, 0, -1)',
        V_A = [ 1.0  0.0  0.2
                0.0  1.0  0.8
                0.2  0.8  1.0 ]

Set B:  mu_B = (-1, -2, -3)',
        V_B = [ 0.4  0.0  0.4
                0.0  1.6  1.6
                0.4  1.6  2.0 ]

Set C:  mu_C = (1, 0, -1)',
        V_C = [  0.756  -0.444  -0.311
                -0.444   0.356   0.089
                -0.311   0.089   0.222 ]

This is because they all lead to the same joint distribution of the utility differences y_1 - y_3 and y_2 - y_3, namely

N( (2, 1)',  [ 1.6  0.0
               0.0  0.4 ] ).

Generally speaking, the parameter beta can be identified in the presence of covariates X_j. However, when there are no covariates, that is, mu_j = mu, the values of the mu_i's are determined only up to a location shift. This indeterminacy can be eliminated by imposing one constraint on the mu_i's, say mu_k = 0.
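A small numerical check of this example is sketched below (assuming NumPy; V_B is obtained here through the additive transformation V -> V + c 1' + 1 c' that appears as (11) in the next subsection). All three parameter sets give the same mean vector and covariance matrix for the utility differences, and hence the same ranking probabilities.

```python
import numpy as np

# Parameter sets A, B and C from the referee's example (k = 3 items).
mu_A = np.array([1.0, 0.0, -1.0])
V_A  = np.array([[1.0, 0.0, 0.2],
                 [0.0, 1.0, 0.8],
                 [0.2, 0.8, 1.0]])
mu_B = np.array([-1.0, -2.0, -3.0])
mu_C = mu_A.copy()
V_C  = np.array([[ 0.756, -0.444, -0.311],
                 [-0.444,  0.356,  0.089],
                 [-0.311,  0.089,  0.222]])

def shift(V, c):
    """The transformation (11): V -> V + c 1' + 1 c'."""
    c = np.asarray(c, float).reshape(-1, 1)
    one = np.ones_like(c)
    return V + c @ one.T + one @ c.T

V_B = shift(V_A, [-0.3, 0.3, 0.5])

# Rankings depend only on utility differences; difference against item 3.
B = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, -1.0]])
for name, mu, V in [("A", mu_A, V_A), ("B", mu_B, V_B), ("C", mu_C, V_C)]:
    # Each set gives mean (2, 1) and covariance diag(1.6, 0.4) for the differences.
    print(name, B @ mu, np.round(B @ V @ B.T, 3))
```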
The major identification problem is due to the indeterminacy of the variance-covariance matrix V of the utilities. Because the rankings implied by the utilities y_ij, i = 1, ..., k, are invariant under any scale shift of V and under any transformation of V of the form

V -> V + c 1_k' + 1_k c',   (11)

for any constant vector c (Arbuckle & Nugent, 1973), V can never be identified unless it is structured. In the previous example, V_A can be transformed to V_B and V_C by applying (11) with c = (-0.3, 0.3, 0.5)' and c = (-0.122, -0.322, -0.389)', respectively. This identifiability problem is well known in the context of Thurstone's order-statistics models and multinomial probit models (Arbuckle & Nugent, 1973; Bunch, 1991; Dansie, 1985; Yai, Iwakura, & Morichi, 1997). Various solutions that impose constraints on the covariance matrix V have been proposed in the literature; they are summarized in Table 1.

Among the existing methods given in Table 1, Chintagunta's method provides the most flexible form for V. However, it assumes that the utilities must be correlated, and this is not always true. In this paper, we suggest a new approach, shown in the last row of Table 1. Its advantage over the existing methods is that it always produces a nonsingular matrix that includes the identity matrix (after a scale shift) as a special case. In addition, it is easy to show that this nonsingular matrix is an invariant transformation of the matrix used by Chintagunta (1992) under the transformation (11) with c = (1/(2k)) 1_k. See the application described in section 6 for a comparison of the two methods. In the next section, we demonstrate how to apply the Bayesian method and MCMC techniques to estimate the parameters of the MVNOS model.

TABLE 1.
Various Solutions for Identifying V

Reference / approach           Constraints on V                                              Is V full rank?   Conditions under which the method can be used
Albright, Lerman and           Set the utility y_kj = 0 for all j, i.e.                      No                item k is a universal benchmark and thus its utility
Manski (1977)                  v_ik = v_ki = 0 (i = 1, ..., k), and v_11 = 1                                   is uncorrelated with the other utilities
Dansie (1986)                  Set v_ii = 1 (i = 1, ..., k) and v_12 = 0                     Yes               similar variation of the utilities, and the utilities
                                                                                                               of items 1 and 2 are uncorrelated
Chintagunta (1992)             Set sigma_11 = 1 and each column of V sums to zero,           No                flexible form, but the utilities must be correlated
                               so that V = B^- Sigma (B^-)', where B = [I_{k-1}, -1_{k-1}]
Our approach                   Set sigma_11 = 1 and each column of V sums to one             Yes               flexible form; includes the identity matrix (after a
                                                                                                               scale shift) as a special case

3. Bayesian Analysis of the MVNOS Model

3.1. Prior Distribution

In a Bayesian approach, the first step is to specify the prior distributions of the identified parameters. As mentioned in section 2, one constraint on Sigma must be imposed in order to fix the scale and hence to identify all the parameters. However, when this constraint is added, the usual Wishart prior distribution for the constrained Sigma cannot be used, and the problem becomes nonstandard. In this paper, we follow similar lines as in McCulloch and Rossi (1994). Let f(beta, Sigma) denote the joint prior distribution of (beta, Sigma). Then the posterior density of (beta, Sigma) is

f(beta, Sigma | Pi) proportional to L(beta, Sigma) f(beta, Sigma),   (12)

where Pi = {pi_1, ..., pi_n}. We use a normal prior on beta, beta ~ N(beta_0, A_0^{-1}), and an independent Wishart prior on G = Sigma^{-1}, G ~ W_{k-1}(alpha, P). Note that our parameterization of the Wishart distribution is such that E(Sigma^{-1}) = alpha P^{-1}.
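As a sketch of this prior specification (assuming SciPy, whose Wishart distribution is parameterized by a degrees-of-freedom and a scale matrix with mean df x scale, so that P here corresponds to scale = P^{-1}; the numerical values of alpha and P are the diffuse choices used later in the simulation study of section 5):

```python
import numpy as np
from scipy.stats import wishart

k = 6                                   # hypothetical number of items
alpha = k + 1                           # prior degrees of freedom
P = (k + 1) * np.eye(k - 1)             # prior scale matrix P = (k + 1) I

# G = Sigma^{-1} ~ W_{k-1}(alpha, P) with E(G) = alpha * P^{-1}.
# scipy.stats.wishart(df, scale) has mean df * scale, so we pass scale = P^{-1}.
draws = wishart.rvs(df=alpha, scale=np.linalg.inv(P), size=20000, random_state=1)
print(np.round(draws.mean(axis=0), 2))              # close to alpha * P^{-1} = I
print(np.round(alpha * np.linalg.inv(P), 2))
```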
It is clear that (12) is very intractable for Bayesian calculations, as it involves the complicated likelihood defined by (10). This problem can easily be solved by using Gibbs sampling with data augmentation (Tanner & Wong, 1987). We augment the parameter (beta, Sigma) by the latent variables W = (w_1, ..., w_n). The joint posterior density of (beta, Sigma, W) is then

f(beta, Sigma, W | Pi) proportional to f(Pi | W) f(W | beta, Sigma) f(beta, Sigma),   (13)

which allows us to sample from the full conditional posterior distributions. The details are given in the next section.

3.2. Gibbs Sampling Algorithm for the MVNOS Model

The Gibbs sampling algorithm for the MVNOS model, which consists of drawing samples consecutively from the full conditional posterior distributions, is as follows:

1. draw w_ij from f(w_ij | w_{-i,j}, beta, Sigma, Pi), for i = 1, ..., k - 1, j = 1, ..., n;
2. draw beta from f(beta | Sigma, W, Pi) proportional to f(beta | Sigma, W); and
3. draw Sigma from f(Sigma | beta, W, Pi) proportional to f(Sigma | beta, W),

where w_{-i,j} is w_j with w_ij deleted. To write down these densities explicitly, we follow closely the notation used in McCulloch and Rossi (1994). Let x_ij' be the i-th row of X_j, X_{-i,j} be X_j with the i-th row deleted, and g_{-i,i} be the i-th column of G with g_ii deleted. Suppose (i1, ..., ik) is the ordering of items corresponding to the ranks pi_j = (pi_{1j}, ..., pi_{kj}); then pi_{tj} = r if and only if i_r = t. In Step 1 we have

w_ij | w_{-i,j}, beta, Sigma, Pi ~ N(m_ij, v_ij^2)  subject to  w_{i_{r-1},j} < w_ij < w_{i_{r+1},j}  whenever pi_ij = r,   (14)

where m_ij = x_ij' beta - g_ii^{-1} g_{-i,i}' (w_{-i,j} - X_{-i,j} beta) and v_ij^2 = g_ii^{-1}. To simulate from this truncated univariate normal distribution, we can adopt the inverse c.d.f. method (see Devroye, 1986, for the details of data generation).

Now we turn to Steps 2 and 3. Since we are conditioning on W, the MVNOS model becomes a standard Bayesian linear model (Box & Tiao, 1973). Therefore, the full conditional distribution of beta is

beta | Sigma, W ~ N_p(beta_1, A_1^{-1}),   (15)

where A_1 = A_0 + sum_{j=1}^{n} X_j' Sigma^{-1} X_j and beta_1 = A_1^{-1} (A_0 beta_0 + sum_{j=1}^{n} X_j' Sigma^{-1} w_j). Finally, the full conditional distribution of Sigma is such that Sigma = G^{-1} with

G | beta, W ~ W_{k-1}( alpha + n,  P + sum_{j=1}^{n} (w_j - X_j beta)(w_j - X_j beta)' ).   (16)

To simulate from this Wishart distribution, we employ a simple transformation method suggested by Johnson (1987, p. 204); this method is efficient, especially when the sample size is large (see also Anderson, 1984, pp. 245-247).

In practice, with a starting value for (beta, Sigma, W), the Gibbs sampling of drawing successively from the full conditional distributions is iterated M + N times. The first M burn-in iterations are discarded. Because the iterates in the Gibbs sample are autocorrelated, we keep every s-th draw in the last N iterates, so that the resulting sample contains approximately N/s independent draws from the joint posterior distribution. The value of s can be determined from the graph of the sample autocorrelations of the Gibbs iterates.

It is rather simple to choose a starting value for (beta, Sigma); a natural choice is (0, I). It is less standard to find a starting value for W. In this paper, we adopt an approach motivated by the fact that the ranking of {w_1j, ..., w_{k-1,j}, 0} must be consistent with the observed ranking {pi_1j, ..., pi_kj}. Using this fact, a simple choice of starting values for the w's is w_ij = (pi_ij - pi_kj)/((k^2 - 1)/12)^{1/2}, a type of standardized rank score.
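Step 1 requires repeated draws from univariate normal distributions truncated to the interval defined by the neighbouring w's. A minimal sketch of the inverse-c.d.f. draw is given below (assuming SciPy; m, v and the end points a, b stand for m_ij, v_ij and (w_{i_{r-1},j}, w_{i_{r+1},j}), and the numerical values are illustrative only).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

def draw_truncated_normal(m, v, a, b, rng):
    """Draw from N(m, v^2) restricted to (a, b) by the inverse-c.d.f. method."""
    lo, hi = norm.cdf((a - m) / v), norm.cdf((b - m) / v)
    u = rng.uniform(lo, hi)               # uniform on the probability interval
    return m + v * norm.ppf(u)            # map back through the normal quantile

# e.g. a w_ij with conditional mean 0.3 and conditional s.d. 0.8, truncated
# between its rank-neighbours w_{i_{r-1},j} = -0.1 and w_{i_{r+1},j} = 0.5.
print(draw_truncated_normal(0.3, 0.8, -0.1, 0.5, rng))
```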
Finally, we return to the scale-shift problem mentioned in section 3.1. If a diffuse (improper) prior for the parameters were used in the Gibbs sampling algorithm, the full conditional distributions stated in (14), (15) and (16) would lead to an improper posterior; as a result, the Gibbs sequence of draws would follow a random walk and would not converge. Hence a proper prior should be used, although it can still be chosen to be rather diffuse. It should be remarked that Thurstone's normal order-statistics model belongs to the MVNOS class with V = I_k; its parameters can be estimated by fixing V at I_k, or equivalently fixing Sigma at I_{k-1} + 1_{k-1} 1_{k-1}', and skipping the step of generating Sigma in the above Gibbs sampling algorithm.

4. Adequacy of the Model Fit

We now consider the problem of assessing the goodness-of-fit of the model. One may consider all possible rankings (k! cases) as the events in a multinomial distribution (see Cohen & Mallows, 1983) and then compute the likelihood-ratio G^2 or Pearson X^2 statistics using the observed and expected frequencies of the k! rankings. However, it is quite common to encounter many empty multinomial cells, especially when k is large and the sample size is comparatively small. This problem is commonly dealt with by grouping the k! rankings into a small number of meaningful subgroups and examining the fit for each subgroup. In particular, let n_i be the observed frequency with which item i is ranked as the top item. Also let

p_hat_i = P( y_i > y_1, ..., y_{i-1}, y_{i+1}, ..., y_k | beta_hat, Sigma_hat )

be the estimated probability of ranking item i first under the fitted MVNOS model with posterior mean estimates beta_hat and Sigma_hat. The fit can be examined by comparing the observed frequency n_i with the expected frequency n p_hat_i, i = 1, 2, ..., k, or by calculating the standardized residuals

r_i = (n_i - n p_hat_i) / ( n p_hat_i (1 - p_hat_i) )^{1/2},  i = 1, 2, ..., k.

In practice, this assessment of model adequacy is more applicable when there are no covariates. Otherwise, most of the observed frequencies would be either 1 or 0, and hence large G^2, X^2 and standardized residuals would usually be obtained even under a valid model.

One complication in using this method to assess the goodness-of-fit of a MVNOS model is the difficulty of evaluating numerically the estimated probability p_hat_i, which may be expressed as

p_hat_i = integral_{-infinity}^{0} ... integral_{-infinity}^{0} phi(v | mu*, Sigma*) dv,

where v = (y_1 - y_i, ..., y_{i-1} - y_i, y_{i+1} - y_i, ..., y_k - y_i)' ~ N(mu*, Sigma*), and mu* and Sigma* can be obtained from beta_hat and Sigma_hat. Instead of computing this integral by high-dimensional numerical quadrature, we employ a reliable Monte Carlo simulation approach, the Geweke (1992)-Hajivassiliou (1993)-Keane (1994) (GHK) method (see also Hajivassiliou, McFadden, & Ruud, 1996). In the application below, 10,000 replications are used to obtain the GHK simulator for p_hat_i.

5. Simulation Study

We now apply the method described in the previous sections to a simulated data set containing n = 2000 rankings of k = 6 items with a covariate Z, generated from the following model: for j = 1, ...,
2000, the j-th ranking is the ranking of {w_1j, w_2j, w_3j, w_4j, w_5j, 0}, with

w_ij = -2 X_ij + eps_ij,  i = 1, 2, 3, 4, 5,

where X_ij = Z_ij - Z_6j, with the Z_ij simulated from the uniform distribution over (-1, 1), and eps_j = (eps_1j, eps_2j, eps_3j, eps_4j, eps_5j)' simulated from a multivariate normal distribution with zero mean vector and covariance matrix

Sigma = D R D,  D = diag(1, sqrt(2), sqrt(3), sqrt(4), sqrt(5)),

where R is the correlation matrix with unit diagonal and all off-diagonal elements equal to 0.5; that is, the variances of the eps_ij are 1, 2, 3, 4, 5 and all pairwise correlations equal 0.5.

Clearly, estimation based on the maximum likelihood method is time-consuming and practically impossible here, so we consider the Bayesian method. Although we must use proper priors for beta and Sigma, as explained in section 3.2, we choose rather diffuse but proper priors: beta ~ N(beta_0 = 0, A_0^{-1} = 100) and Sigma^{-1} ~ W_{k-1}(alpha = k + 1, P = (k + 1) I). The first 1000 burn-in iterations were discarded, and a further 10,000 iterations were taken as a sample from the joint posterior distribution of the unknown parameters as well as the latent utilities w_ij.

Figure 1 shows the sample autocorrelations (SACFs) of beta/sqrt(sigma_11), sigma_22/sigma_11, sigma_55/sigma_11, rho_14 and rho_35 based on the full sample of 10,000 iterates and on a reduced sample obtained by keeping every 20th draw from the full sample. The substantial reduction of the SACFs for the reduced sample over the full sample indicates that the reduced sample gives approximately independent draws from the posterior distribution. Table 2 shows the posterior means, standard deviations, and 90% posterior intervals of the parameters based on the reduced sample. The posterior intervals are constructed from the 0.05 and 0.95 percentiles of the Gibbs samples.

[FIGURE 1. SACFs of beta/sqrt(sigma_11), sigma_22/sigma_11, sigma_55/sigma_11, rho_14 and rho_35 using the full and reduced Gibbs samples for the simulated dataset.]
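For concreteness, a sketch of the data-generating scheme just described is given below (assuming NumPy; variable names are illustrative only).

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 2000, 6

# Covariance of the differenced errors: variances 1,...,5, all correlations 0.5.
sd = np.sqrt(np.arange(1, k))                  # (1, sqrt(2), ..., sqrt(5))
R = 0.5 * np.ones((k - 1, k - 1)) + 0.5 * np.eye(k - 1)
Sigma = np.outer(sd, sd) * R                   # D R D

Z = rng.uniform(-1.0, 1.0, size=(n, k))        # a covariate for each item
X = Z[:, : k - 1] - Z[:, [k - 1]]              # differences against item k
eps = rng.multivariate_normal(np.zeros(k - 1), Sigma, size=n)
W = -2.0 * X + eps                             # latent utility differences

# The j-th ranking is the ranking of {w_1j, ..., w_5j, 0}; rank 1 = least preferred.
U = np.hstack([W, np.zeros((n, 1))])
ranks = U.argsort(axis=1).argsort(axis=1) + 1
print(ranks[:3])
```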
TABLE 2.
Results from the Simulated Data Set

                          true     posterior   posterior    90% posterior
parameter                 value    mean        std. dev.    interval
beta/sqrt(sigma_11)       -2       -2.015      0.056        (-2.107, -1.922)
sigma_22/sigma_11          2        2.009      0.126        ( 1.801,  2.217)
sigma_33/sigma_11          3        3.050      0.183        ( 2.749,  3.350)
sigma_44/sigma_11          4        4.035      0.255        ( 3.616,  4.454)
sigma_55/sigma_11          5        5.025      0.308        ( 4.519,  5.532)
rho_12                     0.5      0.514      0.028        ( 0.468,  0.559)
rho_13                     0.5      0.501      0.028        ( 0.456,  0.546)
rho_14                     0.5      0.491      0.028        ( 0.444,  0.537)
rho_15                     0.5      0.526      0.026        ( 0.483,  0.569)
rho_23                     0.5      0.492      0.024        ( 0.453,  0.532)
rho_24                     0.5      0.477      0.027        ( 0.433,  0.521)
rho_25                     0.5      0.499      0.025        ( 0.458,  0.540)
rho_34                     0.5      0.515      0.023        ( 0.478,  0.553)
rho_35                     0.5      0.517      0.023        ( 0.478,  0.556)
rho_45                     0.5      0.502      0.023        ( 0.464,  0.540)

It can be seen from Table 2 that all the posterior means are very close to their corresponding true parameter values and that all the 90% posterior intervals cover the true values. This indicates that the posterior means are good estimators of the parameters of the MVNOS model. Figure 2 displays the traces of the Gibbs sequences and the Gaussian kernel density estimates (see Venables & Ripley, 1994, p. 136) of the posterior densities for beta/sqrt(sigma_11), sigma_22/sigma_11, sigma_55/sigma_11, rho_14 and rho_35. The traces of the Gibbs sequences are structureless, implying that the Gibbs iteration has converged. Moreover, some posterior densities are slightly asymmetric, indicating that asymptotic normality may not hold for maximum likelihood estimation in this situation.

[FIGURE 2. (a) Traces of the Gibbs sequences, and (b) posterior densities for beta/sqrt(sigma_11), sigma_22/sigma_11, sigma_55/sigma_11, rho_14 and rho_35 for the simulated dataset. The dotted line indicates the position of the true value.]

6. Analysis of the APA Election Data

In 1980, the American Psychological Association conducted an election in which five candidates (A, B, C, D and E) were running for president, and voters were asked to rank order all of the candidates. Among those voters, 5738 gave complete rankings, and these complete rankings are considered here. The data are given in Diaconis (1988). Suppose that a higher rank implies more favorable. Then the average ranks received by candidates A, B, C, D and E are 3.16, 2.84, 3.08, 2.91 and 3.01, respectively. This suggests that voters prefer candidate A the most, candidate C second, and then candidates E, D and B. However, in order to make inferences about the preferences of the voters, modeling of the ranking data is needed.

Let y_ij be the j-th voter's utility of selecting candidate i, i = A, B, C, D, E. We therefore consider the MVNOS model in which (i) the j-th voter's ranking is assumed to be formed by the relative ordering of y_Aj, y_Bj, y_Cj, y_Dj, y_Ej; and (ii) the y's satisfy the model

y_ij = mu_i + e_ij,  i = A, B, C, D, E,  j = 1, ..., 5738,  (e_Aj, e_Bj, e_Cj, e_Dj, e_Ej)' ~ iid N(0, V).

Equivalently, the ranking can be formed by the relative ordering of w_Aj, w_Bj, w_Cj, w_Dj, 0, where the w's satisfy

w_ij = beta_i + eps_ij,  i = A, B, C, D,  j = 1, ..., 5738,  (eps_Aj, eps_Bj, eps_Cj, eps_Dj)' ~ iid N(0, Sigma),

where beta_i = mu_i - mu_E and Sigma = (sigma_ij) with sigma_ij = v_ij + v_EE - v_iE - v_jE.
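As a small numerical illustration of this reparameterization (assuming NumPy, and using the posterior mean of V reported later in Table 4), the differencing w_i = y_i - y_E gives Sigma = B V B' with B = [I_4, -1_4], and the resulting sigma_AA equals 1, in agreement with the scale constraint sigma_11 = 1 of section 2.2.

```python
import numpy as np

# Posterior mean of V from Table 4 (order A, B, C, D, E); for illustration only.
V_hat = np.array([[ 0.524, 0.116,  0.246,  0.041,  0.074],
                  [ 0.116, 0.498,  0.087,  0.178,  0.121],
                  [ 0.246, 0.087,  0.833, -0.123, -0.043],
                  [ 0.041, 0.178, -0.123,  0.679,  0.224],
                  [ 0.074, 0.121, -0.043,  0.224,  0.624]])
mu_hat = np.array([0.086, -0.071, 0.067, -0.048, 0.0])   # mu_E = 0 by constraint

# Differencing against candidate E: w_i = y_i - y_E, i = A, B, C, D.
B = np.hstack([np.eye(4), -np.ones((4, 1))])
beta_hat = B @ mu_hat                    # beta_i = mu_i - mu_E
Sigma_hat = B @ V_hat @ B.T              # sigma_ij = v_ij + v_EE - v_iE - v_jE

print(np.round(beta_hat, 3))
print(np.round(Sigma_hat, 3))            # sigma_AA comes out as 1.000, the fixed scale
```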
As in the simulation study, the classical maximum likelihood estimation method would not work well for these data. Thus, we apply the proposed method to estimate the parameters beta and Sigma of the MVNOS model. Using the priors specified in the simulation study and every 20th draw in the last 10,000 of 11,000 Gibbs iterations, we obtain the Gibbs sequences for beta and Sigma. By imposing the constraint mu_E = 0 and our constraint on V proposed in section 2.3 on the Gibbs sequences, estimates of mu_i (i = A, B, C, D, E) and v_ij (i <= j) can be obtained.

6.1. Adequacy of Model Fit and Model Comparison

To examine the goodness-of-fit of the MVNOS model, Table 3 shows the observed proportions and the estimated first-choice probabilities under the MVNOS model. The results for Thurstone's normal order-statistics model and Stern's mixtures of BTL models are also listed in Table 3 as alternatives to the MVNOS model. Thurstone's model is fitted by repeating the Gibbs sampling with V fixed at I_k, while the parameter estimates of Stern's mixture models can be found in Stern (1993). Stern found that the data seem to be a mixture of 2 to 3 groups of voters; this feature is also supported by Diaconis's (1989) spectral analysis and McCullagh's (1993) model of inversions.

As seen from Table 3, the estimated first-choice probabilities for the MVNOS model match the observed proportions quite well. The absolute standardized residuals for the MVNOS model are all fairly small (< 2), but not those for the other models. This indicates that the MVNOS model provides a good fit to the first-choice probabilities. On the other hand, since no sparse cells are encountered in these data, we can compute the likelihood-ratio G^2 and Pearson X^2 statistics; their values for the four models are shown in Table 3. By this criterion, none of the four models fits the data very well. As mentioned by Stern (1993), it is difficult to draw conclusions about appropriate models for the APA election data, because the large sample size can produce a great change in the likelihood even when there is only a minor improvement in the model.

TABLE 3.
Observed Proportions and Estimated Probabilities that a Candidate Is Ranked First under Various Models for the APA Election Data (the value in brackets is the residual r_i)

                                 Estimated probabilities p_hat_i under the following models
             Observed            MVNOS             Thurstone's         Stern's mixture model
Candidate    proportion n_i/n    model             model               2 components      3 components
A            0.184               0.193 (-1.87)     0.170 (  2.78)      0.199 (-2.89)     0.189 (-1.16)
B            0.135               0.130 ( 1.17)     0.231 (-17.24)      0.153 (-3.84)     0.155 (-4.27)
C            0.280               0.276 ( 0.70)     0.179 ( 20.12)      0.276 ( 0.80)     0.272 ( 1.19)
D            0.204               0.198 ( 1.25)     0.220 ( -2.93)      0.186 ( 3.47)     0.192 ( 2.28)
E            0.197               0.200 (-0.58)     0.200 ( -0.65)      0.186 ( 2.12)     0.189 ( 1.41)
Likelihood ratio G^2             334.13            1589.47             453.70            325.63
Pearson X^2                      348.13            1941.87             485.09            339.90
No. of identified parameters     13                4                   9                 14
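The first-choice probabilities p_hat_i in Table 3 were obtained with the GHK simulator of section 4. A cruder but simple alternative is sketched below (assuming NumPy, and using the posterior means reported in Table 4): p_hat_i is approximated by the frequency with which each candidate has the largest simulated utility, and the standardized residuals r_i of section 4 then follow directly. The results should be close to the MVNOS column of Table 3 up to Monte Carlo and rounding error.

```python
import numpy as np

rng = np.random.default_rng(2000)

# Posterior mean estimates from Table 4 (mu_E = 0 by the identification constraint).
mu_hat = np.array([0.086, -0.071, 0.067, -0.048, 0.0])
V_hat = np.array([[ 0.524, 0.116,  0.246,  0.041,  0.074],
                  [ 0.116, 0.498,  0.087,  0.178,  0.121],
                  [ 0.246, 0.087,  0.833, -0.123, -0.043],
                  [ 0.041, 0.178, -0.123,  0.679,  0.224],
                  [ 0.074, 0.121, -0.043,  0.224,  0.624]])

n = 5738                                                      # complete APA rankings
counts = np.array([0.184, 0.135, 0.280, 0.204, 0.197]) * n    # observed first choices

# Crude Monte Carlo estimate of the first-choice probabilities.
y = rng.multivariate_normal(mu_hat, V_hat, size=200_000)
p_hat = np.bincount(y.argmax(axis=1), minlength=5) / y.shape[0]

# Standardized residuals r_i = (n_i - n p_i) / sqrt(n p_i (1 - p_i)).
resid = (counts - n * p_hat) / np.sqrt(n * p_hat * (1 - p_hat))
for cand, p, r in zip("ABCDE", p_hat, resid):
    print(cand, round(p, 3), round(r, 2))
```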
TABLE 4.
Parameter Estimates of the MVNOS Model for the APA Election Data

             posterior   posterior    90% posterior
parameter    mean        std. dev.    interval
mu_A          0.086      0.015        ( 0.018,  0.111)
mu_B         -0.071      0.014        (-0.097, -0.048)
mu_C          0.067      0.018        ( 0.037,  0.095)
mu_D         -0.048      0.014        (-0.071, -0.026)
v_AA          0.524      0.008        ( 0.511,  0.537)
v_AB          0.116      0.006        ( 0.106,  0.126)
v_AC          0.246      0.008        ( 0.233,  0.257)
v_AD          0.041      0.008        ( 0.027,  0.053)
v_AE          0.074      0.004        ( 0.067,  0.081)
v_BB          0.498      0.011        ( 0.479,  0.516)
v_BC          0.087      0.009        ( 0.072,  0.100)
v_BD          0.178      0.007        ( 0.166,  0.191)
v_BE          0.121      0.007        ( 0.109,  0.132)
v_CC          0.833      0.024        ( 0.795,  0.870)
v_CD         -0.123      0.014        (-0.146, -0.101)
v_CE         -0.043      0.010        (-0.060, -0.026)
v_DD          0.679      0.018        ( 0.651,  0.708)
v_DE          0.224      0.008        ( 0.212,  0.239)
v_EE          0.624      0.008        ( 0.610,  0.638)

6.2. Interpretation of the Fitted MVNOS Model

Table 4 tabulates the posterior means, standard deviations and 90% posterior intervals for the parameters of the MVNOS model. It is not surprising that the ordering of the posterior means of the mu_i's is the same as that of the average ranks. Apart from the posterior means, the Gibbs samples can also provide other information, such as the estimated probability that candidate i is more favorable than candidate j. For instance, the probability that candidate A is more favorable than candidate C is estimated by the sample mean of

Phi( (mu_A - mu_C) / (v_AA + v_CC - 2 v_AC)^{1/2} )

over the Gibbs samples, which is found to be 0.509 (posterior standard deviation = 0.006). According to the boxplots of mu_i, v_ii and r_ij = v_ij/(v_ii v_jj)^{1/2} (i != j) shown in Figure 3, the distributions of some parameters are slightly skewed. In addition, the large estimate of v_CC indicates that voters have fairly large variation in their preference for candidate C.

To further investigate the structure of the covariance matrix V, a principal components analysis of the posterior mean estimate of V was performed; the result is presented in Table 5. Using properties of principal components and the fact that a_2 = sqrt(0.2) 1_5, the utilities of the five candidates {A, B, C, D, E} can be characterized by

y = (0.086, -0.071, 0.067, -0.048, 0)' + sqrt(1.015) a_1 z_1 + sqrt(0.2) 1_5 z_2 + sqrt(0.440) a_3 z_3 + sqrt(0.357) a_4 z_4 + sqrt(0.346) a_5 z_5,   (17)

where the z's are independently and identically distributed as N(0, 1) and the principal components a_1, ..., a_5 are given in Table 5. Because rankings of items depend only on utility differences, the term sqrt(0.2) 1_5 z_2 in (17) does not affect the rankings, and hence interpretation is based on the remaining four components.

[FIGURE 3. Boxplots of mu_i, v_ii and r_ij = v_ij/(v_ii v_jj)^{1/2} (i != j) for the APA election data.]

TABLE 5.
Principal Components Analysis of the Posterior Mean Estimate for V

              Principal component a_i
Candidate     1          2          3          4          5
A             0.245      0.447      0.046      0.789     -0.340
B            -0.087      0.447     -0.412      0.133      0.778
C             0.726      0.447      0.012     -0.515     -0.081
D            -0.524      0.447     -0.442     -0.281     -0.502
E            -0.361      0.447      0.796     -0.125      0.145
Variance      1.015      1.000      0.440      0.357      0.346

Component 1 separates two groups of candidates, {A, C} and {D, E}, implying that there are two groups of voters: those who prefer candidates A and C more, and those who prefer candidates D and E more. Component 3 contrasts candidate E with candidates B and D, indicating that voters either prefer B and D to E, or prefer E to B and D; for instance, if voters like B, they prefer D to E. Finally, components 4 and 5 indicate a contrast between A and C and a contrast between B and D, respectively.
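As a check on Table 5 and (17) (assuming NumPy; V_hat is the posterior mean from Table 4), the principal components are simply the eigenvectors and eigenvalues of V_hat, and they should agree with Table 5 up to rounding and sign.

```python
import numpy as np

# Posterior mean of V from Table 4 (candidates A, B, C, D, E).
V_hat = np.array([[ 0.524, 0.116,  0.246,  0.041,  0.074],
                  [ 0.116, 0.498,  0.087,  0.178,  0.121],
                  [ 0.246, 0.087,  0.833, -0.123, -0.043],
                  [ 0.041, 0.178, -0.123,  0.679,  0.224],
                  [ 0.074, 0.121, -0.043,  0.224,  0.624]])

eigval, eigvec = np.linalg.eigh(V_hat)          # eigh returns ascending eigenvalues
order = np.argsort(eigval)[::-1]                # largest variance first
eigval, eigvec = eigval[order], eigvec[:, order]

print(np.round(eigval, 3))    # close to (1.015, 1.000, 0.440, 0.357, 0.346), cf. Table 5
print(np.round(eigvec, 3))    # columns close to a_1,...,a_5 (up to sign)
# a_2 is (nearly) a constant vector, so the term sqrt(0.2) 1_5 z_2 in (17) shifts
# all five utilities equally and cannot affect the rankings.
```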
Based on the variances of the components, we can see that component 1 dominates, and hence it plays the major role in ranking the five candidates.

So far, the analysis has been based on our proposed constraints on V stated in section 2.3. In fact, it can be shown that the same result is obtained using Chintagunta's method, except that the term sqrt(0.2) 1_5 z_2 in (17), which does not affect the rankings, does not appear. It should be noticed, however, that although different parameterizations of V are essentially equivalent in the sense that they produce the same ranking probabilities, it is not guaranteed that they lead to the same interpretation of the parameters and the principal components. Moreover, since the form of V is not identifiable, it is impossible to choose the parameterization of V on the basis of the ranking data alone. An appropriate parameterization is therefore important for the interpretation of the components and is usually selected subjectively.

7. Robustness of the MVNOS Model

One way to address this issue is to study the sensitivity of the parameter estimates when an outlying ranking is added to the data. First, five X_i's are simulated in the same way as described in section 5. Then the outlying ranking constructed by reversing the order of {-2X_1, -2X_2, ..., -2X_5} is added to the simulated dataset created in section 5. We apply the same Gibbs sampling procedure to this dataset with an outlier and reproduce the posterior moments of the parameters given in Table 2. The changes in the posterior means are small, less than 0.1% of the true parameter values in magnitude, whereas the changes in the posterior standard deviations are small and mainly positive.

Note that the MVNOS model relies mainly on the assumption of a multivariate normal distribution for the latent variables y_j, j = 1, ..., n. Therefore, an alternative way to address the robustness of the model is to consider a more general distribution and look at the differences between the two sets of estimates. A well-known class of distributions which includes the multivariate normal distribution as a special case is the multivariate-t distribution.

7.1. Multivariate-t Latent Distribution and the MVNOS(nu) Model

If the random vector y_j of judge j is multivariate-t distributed with location vector mu_j, scale matrix V and degrees of freedom nu, its probability density function is

f(y_j) = [ Gamma((nu + k)/2) / ( Gamma(nu/2) (nu pi)^{k/2} |V|^{1/2} ) ] ( 1 + (1/nu) (y_j - mu_j)' V^{-1} (y_j - mu_j) )^{-(nu + k)/2}.

Note that in the case of k = 2, this model becomes the one for binary data proposed by Albert and Chib (1993), in which a student-t link function is used. When the degrees of freedom nu tend to infinity, this model becomes the MVNOS model. Therefore, this model allows us to study the sensitivity of the parameter estimates when there are extreme utilities. Note that it is sufficient to assume a known degrees of freedom nu.

With the multivariate-t distribution replacing the multivariate normal distribution, it is not difficult to modify the Gibbs sampling algorithm, because the multivariate-t distribution can be treated as a scale mixture of multivariate normal distributions (see Andrews & Mallows, 1974; and Carlin & Polson, 1991). More specifically, the vector y_j has the representation

y_j = mu_j + e_j,   (18)

where

e_j | lambda_j ~ iid N(0, lambda_j V),   (19)

and

nu / lambda_j ~ chi-square(nu).   (20)

After conditioning on lambda_j, the distribution of the utility vector y_j follows a multivariate normal distribution.
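A sketch of the scale-mixture representation (18)-(20) is given below (assuming NumPy; nu, mu and V are hypothetical): each judge draws a scale lambda_j with nu/lambda_j ~ chi-square(nu) and then a normal utility vector with covariance lambda_j V, which is exactly a multivariate-t draw with nu degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(3)
nu = 5                                        # degrees of freedom (assumed known)
mu = np.array([0.5, 0.0, -0.5])               # hypothetical location vector
V = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.3],
              [0.0, 0.3, 1.0]])               # hypothetical scale matrix
n, k = 10_000, len(mu)

lam = nu / rng.chisquare(nu, size=n)          # (20): nu / lambda_j ~ chi-square(nu)
z = rng.multivariate_normal(np.zeros(k), V, size=n)
y = mu + np.sqrt(lam)[:, None] * z            # (18)-(19): y_j | lambda_j ~ N(mu, lambda_j V)

# The implied rankings (rank 1 = least preferred), as in the three-step scheme of 7.2.
ranks = y.argsort(axis=1).argsort(axis=1) + 1
print(ranks[:3])
# For nu > 2 the t covariance is (nu/(nu - 2)) V, so this rescaled sample covariance ~ V.
print(np.round(np.cov(y, rowvar=False) * (nu - 2) / nu, 2))
```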
Because of this conditional-normal structure, we shall denote this model as the MVNOS(nu) model.

7.2. A Latent-Structure Interpretation of the MVNOS(nu) Model

In the rationale for generating rankings from the MVNOS model stated in (3) and (4), the utilities given by the judges are assumed to be measured on the same scale. However, this is not true in general. The MVNOS(nu) model provides a solution to this problem. By viewing lambda_j in (19) as a level of scale employed by judge j, the MVNOS(nu) model extends the MVNOS model to provide a latent structure for describing the scale heterogeneity of the judges. In other words, the ranking R_j given by judge j under the MVNOS(nu) model can be generated by the following steps:

1. draw randomly from the distribution (20) a level of scale, denoted by lambda_j;
2. draw independently a utility vector y_j from N(mu_j, lambda_j V); and
3. order the utilities y_1j, ..., y_kj to produce the ranking R_j.

Notice that the scales lambda_j are unobserved and are assumed to be randomly sampled from the (scaled) inverse chi-square distribution with nu degrees of freedom given in (20). The larger nu is, the smaller the variation of the scales employed by the judges.

7.3. Gibbs Sampling Algorithm for the MVNOS(nu) Model

Following the Bayesian approach to the MVNOS model, we apply the w-transformation to the y_j's. The Gibbs sampling algorithm for the MVNOS(nu) model becomes very simple when we augment the vector (beta, Sigma, W) by the latent vector lambda = (lambda_1, ..., lambda_n)'; the algorithm then consists of four main steps. Using the same priors defined in section 3.1 and the linear model for mu_j defined in (5), these four steps are as follows:

1. draw w_ij | w_{-i,j}, beta, Sigma, lambda, Pi ~ N(m_ij, v_ij^2) with w_{i_{r-1},j} < w_ij < w_{i_{r+1},j} whenever pi_ij = r, where m_ij = x_ij' beta - g_ii^{-1} g_{-i,i}' (w_{-i,j} - X_{-i,j} beta) and v_ij^2 = lambda_j g_ii^{-1};
2. draw beta | Sigma, lambda, W ~ N_p(beta_1, A_1^{-1}), where A_1 = A_0 + sum_{j=1}^{n} X_j' Sigma^{-1} X_j / lambda_j and beta_1 = A_1^{-1} (A_0 beta_0 + sum_{j=1}^{n} X_j' Sigma^{-1} w_j / lambda_j);
3. draw Sigma = G^{-1} such that G | beta, lambda, W ~ W_{k-1}( alpha + n, P + sum_{j=1}^{n} (w_j - X_j beta)(w_j - X_j beta)' / lambda_j ); and
4. draw lambda_j | beta, Sigma, W ~ IG( (nu + k - 1)/2, nu/2 + (w_j - X_j beta)' Sigma^{-1} (w_j - X_j beta)/2 ),

where IG(a, b) denotes the inverse-gamma distribution with shape parameter a and scale parameter b.

7.4. APA Election Data Revisited

Consider the APA election data again. To evaluate the robustness of the MVNOS model, we examine six values of the degrees of freedom: nu = 1, 2, 3, 5, 10, 20. Based on the same initial settings used in section 6, the Gibbs sequences of the unknown parameters are obtained from six sampling chains driven by the same stream of random numbers.
TABLE 6.
Parameter Estimates of the MVNOS(nu) Model for the APA Election Data
(posterior mean, with posterior std. dev. in parentheses; the values in the last column are extracted from Table 4)

parameter  nu = 1          nu = 2          nu = 3          nu = 5          nu = 10         nu = 20         nu = infinity
mu_A        0.107 (0.018)   0.099 (0.017)   0.094 (0.016)   0.088 (0.015)   0.088 (0.015)   0.086 (0.015)   0.086 (0.015)
mu_B       -0.092 (0.017)  -0.080 (0.016)  -0.077 (0.016)  -0.076 (0.015)  -0.073 (0.014)  -0.072 (0.014)  -0.071 (0.014)
mu_C        0.084 (0.021)   0.077 (0.020)   0.074 (0.020)   0.070 (0.019)   0.069 (0.019)   0.067 (0.018)   0.067 (0.018)
mu_D       -0.063 (0.016)  -0.054 (0.015)  -0.052 (0.016)  -0.051 (0.014)  -0.050 (0.014)  -0.049 (0.014)  -0.048 (0.014)
v_AA        0.522 (0.008)   0.523 (0.008)   0.523 (0.008)   0.524 (0.008)   0.524 (0.007)   0.524 (0.008)   0.524 (0.008)
v_AB        0.119 (0.007)   0.117 (0.006)   0.117 (0.007)   0.116 (0.007)   0.116 (0.006)   0.116 (0.007)   0.116 (0.006)
v_AC        0.242 (0.008)   0.243 (0.009)   0.243 (0.009)   0.244 (0.008)   0.244 (0.008)   0.245 (0.008)   0.246 (0.008)
v_AD        0.043 (0.008)   0.042 (0.008)   0.041 (0.008)   0.042 (0.007)   0.041 (0.008)   0.041 (0.007)   0.041 (0.008)
v_AE        0.074 (0.005)   0.074 (0.004)   0.075 (0.005)   0.074 (0.004)   0.074 (0.005)   0.074 (0.004)   0.074 (0.004)
v_BB        0.495 (0.012)   0.497 (0.011)   0.497 (0.011)   0.498 (0.011)   0.498 (0.012)   0.499 (0.011)   0.498 (0.011)
v_BC        0.089 (0.009)   0.088 (0.009)   0.088 (0.009)   0.087 (0.009)   0.088 (0.009)   0.087 (0.009)   0.087 (0.009)
v_BD        0.176 (0.007)   0.177 (0.007)   0.178 (0.007)   0.178 (0.007)   0.178 (0.008)   0.178 (0.007)   0.178 (0.007)
v_BE        0.120 (0.008)   0.120 (0.007)   0.120 (0.007)   0.121 (0.007)   0.121 (0.007)   0.120 (0.007)   0.121 (0.007)
v_CC        0.833 (0.023)   0.833 (0.023)   0.834 (0.024)   0.833 (0.025)   0.834 (0.024)   0.832 (0.023)   0.833 (0.024)
v_CD       -0.121 (0.014)  -0.121 (0.013)  -0.122 (0.014)  -0.122 (0.014)  -0.122 (0.014)  -0.121 (0.014)  -0.123 (0.014)
v_CE       -0.044 (0.010)  -0.044 (0.010)  -0.044 (0.010)  -0.043 (0.010)  -0.044 (0.010)  -0.043 (0.010)  -0.043 (0.010)
v_DD        0.679 (0.018)   0.679 (0.017)   0.679 (0.018)   0.678 (0.017)   0.679 (0.018)   0.679 (0.018)   0.679 (0.018)
v_DE        0.223 (0.008)   0.223 (0.008)   0.223 (0.008)   0.224 (0.008)   0.224 (0.009)   0.224 (0.008)   0.224 (0.008)
v_EE        0.626 (0.009)   0.626 (0.008)   0.626 (0.009)   0.624 (0.009)   0.625 (0.009)   0.625 (0.009)   0.624 (0.008)

Table 6 shows the posterior means and standard deviations of the parameters for the various values of nu. Generally speaking, the posterior means and standard deviations are fairly robust with respect to the degrees of freedom nu, apart from a slow decay in the magnitudes of the posterior moments of the mu's. Therefore, it is reasonable to say that the MVNOS model is fairly insensitive to an extreme utility or, as a result, to an unusual ranking.

8. Concluding Remarks

This paper studied the order-statistics model for ranking data in which the latent utilities follow a multivariate normal distribution. A Bayesian approach via the Gibbs sampling technique is proposed for parameter estimation. The simulation study and the example given in sections 5 and 6 show that the proposed method is computationally efficient and more flexible than the maximum likelihood method in estimating the MVNOS model, especially when the number of items is moderately large, say, more than 5.

In this paper, the adequacy of the model fit is assessed by comparing observed and expected frequencies of all possible rankings, or of a small number of non-overlapping subgroups of rankings, such as the subgroups of rankings in which item i is ranked first (i = 1, ..., k).
Of course, we might consider other choices of non-overlapping subgroups, such as the subgroup of rankings in which items i and j are the top two, or the subgroup of rankings in which item i is first and item j is second. As we adopt a Bayesian approach for the estimation of the MVNOS model, it would be better to consider Bayesian methods for assessing model fitness (see Albert & Chib, 1995; Gelman, Meng, & Stern, 1996; and Yao & Boeckenholt, 1999, for some recent developments in this direction). Further investigation is needed.

It was mentioned in section 2.3 that the parameters of a MVNOS model cannot be fully identified unless the variance-covariance matrix V is structured. In the context of the multinomial probit model, some structured forms of V have been proposed in the literature; see, for example, Hausman and Wise (1978), Elrod and Keane (1995), and Yai, Iwakura, and Morichi (1997) and the references therein. Further research on extending the MVNOS model to incorporate a structured V could be considered. All these interesting problems will be deferred to later papers.

References

Albert, J.H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, 669-679.
Albert, J.H., & Chib, S. (1995). Bayesian residual analysis for binary response regression models. Biometrika, 82, 747-759.
Anderson, T.W. (1984). An introduction to multivariate statistical analysis (2nd ed.). New York: John Wiley.
Andrews, D.F., & Mallows, C.L. (1974). Scale mixtures of normality. Journal of the Royal Statistical Society, Series B, 36, 99-102.
Arbuckle, J., & Nugent, J.H. (1973). A general procedure for parameter estimation for the law of comparative judgement. British Journal of Mathematical and Statistical Psychology, 26, 240-260.
Beggs, S., Cardell, S., & Hausman, J. (1981). Assessing the potential demand for electric cars. Journal of Econometrics, 17, 1-19.
Besag, J., Green, P., Higdon, D., & Mengersen, K. (1995). Bayesian computation and stochastic systems. Statistical Science, 10, 3-66.
Boeckenholt, U. (1993). Applications of Thurstonian models to ranking data. In M.A. Fligner & J.S. Verducci (Eds.), Probability models and statistical analyses for ranking data. New York: Springer-Verlag.
Box, G.E.P., & Tiao, G.C. (1973). Bayesian inference in statistical analysis. New York: John Wiley.
Bradley, R.A., & Terry, M.E. (1952). Rank analysis of incomplete block designs, I: The method of paired comparisons. Biometrika, 39, 324-345.
Brook, D., & Upton, G.J.G. (1974). Biases in local government elections due to position on the ballot paper. Applied Statistics, 23, 414-419.
Bunch, D.S. (1991). Estimability in the multinomial probit model. Transportation Research, Series B, 25(1), 1-12.
Carlin, B.P., & Polson, N.G. (1991). Inference for nonconjugate Bayesian models using the Gibbs sampler. Canadian Journal of Statistics, 19, 399-405.
Chapman, R.G., & Staelin, R. (1982). Exploiting rank ordered choice set data within the stochastic utility model. Journal of Marketing Research, 19, 288-301.
Chintagunta, P.K. (1992). Estimating a multinomial probit model of brand choice using the method of simulated moments. Marketing Science, 11(4), 386-407.
Cohen, A., & Mallows, C.L. (1983). Assessing goodness of fit of ranking models to data. The Statistician, 32, 361-373.
Critchlow, D.E., & Fligner, M.A. (1993). Ranking models with item covariates. In M.A. Fligner & J.S. Verducci (Eds.), Probability models and statistical analyses for ranking data (pp. 1-19). New York: Springer-Verlag.
Critchlow, D.E., Fligner, M.A., & Verducci, J.S. (1991). Probability models on rankings. Journal of Mathematical Psychology, 35, 294-318.
Daniels, H.E. (1950). Rank correlation and population models. Biometrika, 33, 129-135.
Dansie, B.R. (1985). Parameter estimability in the multinomial probit model. Transportation Research, Series B, 19(6), 526-528.
Dansie, B.R. (1986). Normal order statistics as permutation probability models. Applied Statistics, 35, 269-275.
Devroye, L. (1986). Non-uniform random variate generation. New York: Springer-Verlag.
Diaconis, P. (1988). Group representations in probability and statistics (IMS Lecture Notes, Vol. 11). Hayward, CA: Institute of Mathematical Statistics.
Diaconis, P. (1989). A generalization of spectral analysis with application to ranked data. Annals of Statistics, 17, 949-979.
Elrod, T., & Keane, M.P. (1995). A factor-analytic probit model for representing the market structure in panel data. Journal of Marketing Research, 32, 1-16.
Gelman, A., Meng, X.L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4), 733-760.
Geweke, J. (1992). Efficient simulation from the multivariate normal and student-t distributions subject to linear constraints. In Computer Science and Statistics: Proceedings of the Twenty-Third Symposium on the Interface (pp. 571-578). Alexandria, VA: American Statistical Association.
Hajivassiliou, V. (1993). Simulation estimation methods for limited dependent variable models. In G.S. Maddala, C.R. Rao, & H.D. Vinod (Eds.), Handbook of statistics (Econometrics) (Vol. 11, pp. 519-543). Amsterdam: North-Holland.
Hajivassiliou, V., McFadden, D., & Ruud, P. (1996). Simulation of multivariate normal rectangle probabilities and their derivatives: Theoretical and computational results. Journal of Econometrics, 72, 85-134.
Hausman, J.A., & Wise, D.A. (1978). A conditional probit model for qualitative choice: Discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica, 46(2), 403-426.
Henery, R.J. (1981). Permutation probabilities as models for horse races. Journal of the Royal Statistical Society, Series B, 43, 86-91.
Henery, R.J. (1983). Permutation probabilities for gamma random variables. Journal of Applied Probability, 20, 822-834.
Johnson, M.E. (1987). Multivariate statistical simulation. New York: John Wiley.
Keane, M.P. (1994). A computationally practical simulation estimator for panel data. Econometrica, 62, 95-116.
Koop, G., & Poirier, D.J. (1994). Rank-ordered logit models: An empirical analysis of Ontario voter preferences. Journal of Applied Econometrics, 9, 369-388.
Lo, V.S.Y., Bacon-Shone, J., & Busche, K. (1995). The application of ranking probability models to racetrack betting. Management Science, 41, 1048-1059.
Luce, R.D. (1959). Individual choice behavior. New York: Wiley.
Marden, J.L. (1995). Analyzing and modeling rank data. New York: Chapman and Hall.
McCullagh, P. (1993). Permutations and regression models. In M.A. Fligner & J.S. Verducci (Eds.), Probability models and statistical analyses for ranking data (pp. 196-215). New York: Springer-Verlag.
McCulloch, R.E., & Rossi, P.E. (1994). An exact likelihood analysis of the multinomial probit model. Journal of Econometrics, 64, 207-240.
Mosteller, F. (1951). Remarks on the method of paired comparisons: I. The least squares solution assuming equal standard deviations and equal correlations. Psychometrika, 16, 3-9.
Stern, H. (1990a). A continuum of paired comparisons models. Biometrika, 77, 265-273.
Stern, H. (1990b). Models for distributions on permutations. Journal of the American Statistical Association, 85, 558-564.
Stern, H. (1993). Probability models on rankings and the electoral process. In M.A. Fligner & J.S. Verducci (Eds.), Probability models and statistical analyses for ranking data (pp. 173-195). New York: Springer-Verlag.
Tallis, G.M., & Dansie, B.R. (1983). An alternative approach to the analysis of permutations. Applied Statistics, 32, 110-114.
Tanner, M.A., & Wong, W.H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528-549.
Thurstone, L.L. (1927). A law of comparative judgement. Psychological Review, 34, 273-287.
Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion and rejoinder). Annals of Statistics, 22, 1701-1762.
Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79, 281-299.
Venables, W.N., & Ripley, B.D. (1994). Modern applied statistics with S-Plus. New York: Springer-Verlag.
Yai, T., Iwakura, S., & Morichi, S. (1997). Multinomial probit with structured covariance for route choice behavior. Transportation Research, Series B, 31(3), 195-207.
Yao, K.G., & Boeckenholt, U. (1999). Bayesian estimation of Thurstonian ranking models based on the Gibbs sampler. British Journal of Mathematical and Statistical Psychology, 52(1), 79-92.
Yellott, J. (1977). The relationship between Luce's choice axiom, Thurstone's theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology, 15, 109-144.

Manuscript received 25 SEP 1996
Final version received 17 MAY 1999