[go: up one dir, main page]

Academia.eduAcademia.edu

Bayesian estimation and stochastic model specification search for dynamic survival models

2011, Statistics and Computing

Stat Comput (2011) 21: 231–246 DOI 10.1007/s11222-009-9164-5 Bayesian estimation and stochastic model specification search for dynamic survival models Helga Wagner Received: 13 February 2009 / Accepted: 27 November 2009 / Published online: 9 December 2009 © Springer Science+Business Media, LLC 2009 Abstract Dynamic survival models are a useful extension of the popular Cox model as the effects of explanatory variables are allowed to change over time. In this paper a new auxiliary mixture sampler for Bayesian estimation of the model parameters is introduced. This sampler forms the basis of a model space MCMC method for stochastic model specification search in dynamic survival models, which involves selection of covariates to include in the model as well as specification of effects as time-varying or constant. The method is applied to two well-known data sets from the literature. Keywords Piecewise exponential model · Time-varying effects · Noncentered parameterization · Variable selection · Model choice · Cox model 1 Introduction For survival data the predominately used model to assess the effect of explanatory variables is the Cox model, where the hazard function is modelled as λ(t | z) = λ(t) · exp(z′ β). Here z is a vector of covariates and β is the vector of unknown parameters. The Cox model relies on the assumption that the hazard ratio for two individuals with covariate values z and z∗ respectively is constant over time and only de- H. Wagner () Department of Applied Statistics Johannes Kepler Universität Linz, Linz, Austria e-mail: helga.wagner@jku.at pends on the difference between their linear predictors   λ(t | z) = exp (z − z∗ )′ β . λ(t | z∗ ) This is an assumption which is not necessarily true in applications where the effect of covariates may vary over time, e.g. a certain treatment may have a positive short-term effect which vanishes in the long run. Thus models where the covariate effects are allowed to change over time are often more appropriate. A common approach to model timevarying effects is by piecewise constant functions, as these are flexible enough to capture any shape of the covariate effects, see Verweij and van Houwelingen (1995) for a frequentist and Gamerman (1991), Arjas and Gasbarra (1994), Sinha et al. (1999) and Ibrahim et al. (2001) for a Bayesian approach. In the dynamic survival model, introduced by Gamerman (1991) the baseline log-hazard as well as the effects of covariates are modeled by piecewise constant functions and a specification of the stochastic evolution over time. Whereas Gamerman (1991) used a random walk with process disturbances specified only up to their second moments, recently Hemming and Shaw (2002, 2005) considered the normal dynamic survival model with Gaussian process disturbances. Closely related to this model specification is the model of Hennerfeind et al. (2006) who use penalized splines for baseline log-hazard and covariate effects with a correlated normal prior process for the spline coefficients. Estimation of the dynamic survival model was accomplished in Gamerman (1991) using linear Bayes approximation. In a fully Bayesian approach posteriors for all parameters of normal dynamic survival models can be obtained by Monte Carlo Markov Chain (MCMC) methods which however require a Metropolis-Hastings-Algorithm, see Hemming and Shaw (2005). In this paper a new approach for 232 estimation of the normal dynamic survival model relying on data augmentation is proposed. This new auxiliary mixture sampler involves only draws from standard densities and needs no further tuning. By introducing two sequences of latent variables a representation of the normal dynamic survival model as a Gaussian state space model is obtained, where the multidimensional latent state vector consisting of baseline log-hazard and time-varying covariate effects can be sampled in one move. Auxiliary mixture sampling was introduced for Bayesian analysis of stochastic volatility models by Shephard (1994) and was applied in this context to different models by a couple of authors (Kim et al. 1998; Chib et al. 2002; Omori et al. 2007). Frühwirth-Schnatter and Wagner (2006a, 2006b) introduced auxiliary mixture sampling for Bayesian analysis of parameter-driven models for count data based on the Poisson distribution and Frühwirth-Schnatter and Frühwirth (2007) apply auxiliary mixture sampling to binary and multinomial logit models. Dynamic survival models are very flexible but model specification is a complex task as one has to decide not only which covariates to include in the final model but also whether the effect of a certain covariate is constant or varies over time. A classical model choice and variable selection strategy for similar models based on the conditional AIC was only recently implemented in Hofner et al. (2008). In a Bayesian setting model selection can be accomplished by model space MCMC methods, e.g. the reversible jump algorithm (Green 1995) or the stochastic variable selection approach (George and McCulloch 1993, 1997). Konrath et al. (2008) consider Bayesian regularisation together with a hard shrinkage rule for variable selection. The stochastic variable selection approach, usually applied for model selection in regression models, was recently extended to model selection in state space models by Frühwirth-Schnatter and Wagner (2010). Based on the data augmentation scheme introduced in this paper its implementation is feasible also for the normal dynamic survival model. Thus it is possible to start with a general model specification where all covariates are included with their effects specified as time-varying. By exploring the model space during MCMC covariate effects are identified as constant rather than time-varying or even as zero leading to a parsimonious model specification. The rest of the paper is organized as follows: Sect. 2 describes the model specification. Data augmentation by auxiliary variables and the resulting sampling scheme are discussed in Sect. 3. In Sect. 4 the noncentered parameterization of the model is introduced and variable selection is incorporated. The methods are applied on two data sets from the literature in Sect. 5. Finally Sect. 6 concludes by summarizing the results and discussing possible extensions. Stat Comput (2011) 21: 231–246 2 The normal dynamic survival model 2.1 Model specification As survival data usually are subject to right-censoring, we assume that each individual i, i = 1, . . . , n, has a survival time ti and a censoring time ci which are independent random variables. Observed data consist of the observation time yi = min(ti , ci ), a failure indicator di , taking the value 1 for complete and 0 for censored observations, and a vector of K covariates (zi1 , . . . , ziK ). Extending Cox’s proportional hazards model, not only the baseline hazard but also covariate effects are assumed to be functions of time, i.e.   K  λ(t|zi ) = exp β0 (t) + zik βk (t) . k=1 The dynamic survival model, as proposed by Gamerman (1991), is a piecewise exponential model for lifetimes, with correlated prior processes for the baseline log-hazard and the covariate effects. It is based on a given partition of the time axis S = {s0 = 0, s1 , . . . , sJ }, s0 < s1 < · · · < sJ . These division points form J intervals (s0 , s1 ], . . . , (sJ −1 , sJ ]. The baseline log-hazard β0 (t) as well as the covariate effects βk (t) are defined for k = 0, . . . , K via the piecewise constant functions βk (t) = βkj , for t ∈ Ij = (sj −1 , sj ]. To model stochastic evolution each βk (t) is assumed to follow a random walk. Whereas Gamerman (1991) specified process disturbances only up to their second moments, recently Hemming and Shaw (2002) considered the normal dynamic survival model with Gaussian random walks, βkj = βk,j −1 + ωkj , ωkj ∼ N (0, θk ). (1) The random walk priors are smoothness priors which penalize abrupt jumps of baseline log-hazard and covariate effects in subsequent intervals. Baseline log-hazard and timevarying effects can also be interpreted as a linear combination of J B-spline basis functions of degree zero with knots {s0 , s1 , . . . , sJ }, βk (t) = J  j =1 βkj I(sj −1 ,sj ] (t) and thus the normal dynamic survival model is a special case of the model considered in Hennerfeind et al. (2006). The division points {s0 , s1 , . . . , sJ } should be chosen fine enough to capture the shape of baseline hazard and timevarying effects. Usually sJ is taken to be the last observed failure or censoring time. Hemming and Shaw (2002) use Stat Comput (2011) 21: 231–246 233 equally spaced time points and Gamerman (1991) and Hemming and Shaw (2005) use a data-dependent division where the division points are the observed failure times. In this paper I consider estimation and model selection for the normal dynamic survival model where the piecewise constant hazard of subject i is defined by λ(t|zi ; t ∈ Ij ) = λij = exp(z′i β j ), (2) β j = β j −1 + ωj , (3) ωj ∼ N(0, Q(θ )). Here zi = (1, zi1 , . . . , ziK )′ is the vector of covariates and β j = (β0j , . . . , βKj )′ denotes the effects in interval Ij = (sj −1 , sj ], j = 1, . . . , J , β0j being the baseline log-hazard in Ij . The components of the state vector β are assumed to evolve independently, hence Q(θ ) = diag(θ0 , . . . , θK ). In this model an evolution variance θ0 = 0 implies a constant baseline hazard, and θk = 0 a constant effect of covariate zk . If all evolution variances θk , k = 0, . . . , K are zero, the model reduces to the exponential regression model λ(t|zi ) = exp(z′i β), with fixed covariate effects β = (β0 , . . . , βK ). 2.2 Likelihood The likelihood of the normal dynamic survival model defined in (2)–(3) is the product of individual likelihood contributions L(yi , di |zi ) = S(yi |zi )λ(yi |zi )di where S(y|.) and λ(y|.) denote the survival and the hazard function. The survival function of an observation yi ∈ Il can be expressed through conditional survival functions, see Hemming and Shaw (2005), as S(yi |zi ) =  l−1  j =1 Shaw (2002) use Gibbs sampling with a single move random walk Metropolis-Hastings-step to sample the state vector from the conditional posterior p(β|y, d, θ ). In this paper a simple Gibbs scheme based on data augmentation and avoiding Metropolis-Hastings steps is proposed. Building on the ideas of Frühwirth-Schnatter and Wagner (2006a) and Frühwirth-Schnatter and Frühwirth (2007) two sequences of latent variables are introduced which lead to a representation of the normal dynamic survival model as a conditional Gaussian state space model where direct sampling of β in one move is possible. 3.1 Splitting the observation time into episodes The factorization of the likelihood into independent contributions from the time intervals Ij suggests to split an observation time yi ∈ Il into episodes uij , j = 1, . . . , l experienced under the regime of the constant hazard λij . The total observation time yi is the sum of these episodes, yi = l j =1 uij . Consider e.g. an observation time yi ∈ I2 which is the sum of the time ui1 = s1 spent under hazard rate λi1 and the time ui2 = yi − s1 spent under hazard rate λi2 , see Fig. 1 for illustration. Each episode uij except the last is just the length of the interval Ij , uij = sj − sj −1 , j = 1, . . . , l − 1. The likelihood contributions of these episodes are of the form   exp −λij uij , j = 1, . . . , l − 1 and are equal to the likelihood contribution of a rightcensored Ex(λij ) observation. The last episode subject i experiences is uil = yi − sl−1 with a likelihood contribution given as   (5) exp −λil uil (λil )di . Dependent on the censoring indicator di , (5) corresponds to the likelihood either of a complete (for di = 1) or a rightcensored Ex(λil ) observation. S(sj |T > sj −1 , zi ) S(yi |T > sl−1 , zi ). Hence   L yi , di |λi1 , . . . , λil  l−1    = exp − λij (sj − sj −1 ) j =1   × exp − λil (yi − sl−1 ) (λil )di . (4) 3 Auxiliary mixture sampling Estimation of the unknown parameters of the normal dynamic survival model, i.e. the state vector β and the process variances θ , is feasible by MCMC methods. Hemming and Fig. 1 Hazard in the piecewise exponential model 234 Stat Comput (2011) 21: 231–246 3.2 Data augmentation In the first data augmentation step all episodes not ending by the occurrence of the interesting event are interpreted as right-censored. For each censored episode uij an unobserved complete survival time τij is introduced. The residual lifetime ξij = τij − uij , conditional on {τij > uij }, follows the exponential Ex(λij )-distribution, due to the no-memory property of the exponential distribution. Therefore the complete auxiliary survival times τij are given as τij = uij + ξij , ξij ∼ Ex(λij ) for j = 1, . . . , l − 1 (6) and uil τil = uil + ξil , ξil ∼ Ex(λil ) if di = 1, if di = 0. (7) are arranged decreasingly, so that y1 is the largest and yn is the smallest observed time. Defining a multivariate observation vector xj of dimension nj as ⎞ − ln τ1j − mr1j .. ⎟ ⎜ xj = ⎝ ⎠, . − ln τnj ,j − mrnj ,j ⎛ and εj = (εr1j , . . . , εrnj ,j ) the model can be written in the following linear Gaussian state space form: xj = Zj β j + ε j , ε j ∼ N (0, Vj ), (11) β j = β j −1 + ωj , ωj ∼ N (0, Q(θ )), (12) where Vj = diag(vr1j , . . . , vrnj ,j ) and Zj is an nj × (K + 1) matrix containing the design vectors z1 , . . . , znj for all individuals at risk at time sj −1 as its rows Using the auxiliary survival times τij the normal dynamic survival model defined in (2) and (3) can be represented as a dynamic generalized linear model (West et al. 1985) with exponentially distributed observations ⎞ z′1 ⎜ . ⎟ Zj = ⎝ .. ⎠ . z′nj τij |β j ∼ Ex(exp(z′i β j )), (8) β j = β j −1 + ωj , (9) Thus, instead of the original dynamic survival model we arrive at a partially Gaussian state space model, where the transition equation is the same as for the original model. The model for the log-hazard however is replaced by a Gaussian observation equation with a multivariate response vector xj , where xj is determined from the auxiliary survival times of the risk population at the beginning of interval Ij . ωj ∼ N (0, Q(θ )). Taking logarithms the observation equation of this model is transformed into the linear model − ln τij = zi β j + ǫij where the error ǫij has a type I extreme value distribution. As shown in Frühwirth-Schnatter and Frühwirth (2007) the density of the type I extreme value distribution can be approximated very accurately by a mixture of ten normal components pǫ (ǫ) = exp(−ǫ − e −ǫ )≈ 10  3.3 Prior distributions Priors have to be chosen for the initial value of the state vector β 0 and the process variances. We assume independent normal priors for the elements of β 0 βk0 ∼ N (ek , Pk ) wr fN (ǫ; mr , vr ). (10) r=1 The weights wr , means mr and variances vr which have been determined numerically by minimizing the Kullback– Leibler distance between the density of the type I extreme value distribution and the mixture approximation are tabulated in Frühwirth-Schnatter and Frühwirth (2007, Table 1). Introducing the component indicators rij ∈ {1, . . . , 10} as a second sequence of latent variables leads to the Gaussian state space model − ln τij = z′i β j + mrij + εrij , ⎛ εrij ∼ N (0, vrij ). Let nj denote the number of individuals at risk at the beginning of interval Ij , and assume that the observation times and independent inverse Gamma priors for the process variances θk ∼ IG(ck0 , Ck0 ), k = 0, . . . , K. 3.4 Sampling scheme With this prior choice a simple MCMC sampling scheme combining data augmentation with Bayesian estimation of Gaussian state space models as in Frühwirth-Schnatter (1994) can be implemented to sample jointly the latent process β, the variances θ = (θ0 , . . . , θK ) and the auxiliary variables xj and Rj = (r1j , . . . , rnj ,j ) for j = 1, . . . , J : Select starting values for θ and the augmented variables xj and Rj for j = 1, . . . , J and repeat the following steps: Stat Comput (2011) 21: 231–246 (a) Sample the latent states β conditional on the process variances θ and the auxiliary variables xj and Rj . The whole sequence β can be sampled by forward-filtering backward sampling (FFBS, Frühwirth-Schnatter 1994; Carter and Kohn 1994; de Jong and Shephard 1995) from the conditionally Gaussian state space form (11) and (12). Details on the implementation of FFBS are given in Appendix. (b) Sample θk , k = 0, . . . , K conditional on β from the inverse Gamma distribution IG(ck , Ck ) , where ck = ck0 + J /2, Ck = Ck0 + J  j =1 (βkj − βk,j −1 )2 /2. (c) Sample the auxiliary variables xj and Rj conditional on β (c1) Sample the auxiliary variables τij conditional on the corresponding hazard λij = exp(z′i β j ), i = 1, . . . , nj ; j = 1, . . . , J as described in (6)–(7). (c2) Sample the component indicators rij conditional on τij and λij from the following discrete distribution Pr{rij = r ∗ |τij , λij } ∝ wr ∗ ϕ(− ln τij − ln λij ; mr ∗ , sr2∗ ) for r ∗ = 1, . . . , 10. Here ϕ(x; μ, σ 2 ) denotes the probability density function of the N(μ, σ 2 ) distribution at x, see Frühwirth-Schnatter and Wagner (2006a) for details. Starting values for xj and Rj can be obtained by performing sampling step (b) with a starting value λ0ij for the hazards. In the applications I used the same hazard rate for all subjects at risk in an interval, λ0ij = λ0j , where λ0j was determined as the number of failures in interval Ij divided by the sum of observation times spent in Ij . 4 Model selection Model selection in the normal dynamic survival model means not only to select which covariate to include in the model but also to decide whether the effect of a certain covariate is constant or varies over time. To perform model selection we use a non-centered parameterization of the augmented state space model defined in (11) and (12). 4.1 The noncentered parameterization Let bj = Q−1/2 (β j − β 0 ) denote the standardized latent process. Obviously b starts in b0 = 0 and its stochastic evo- 235 lution is described by bj = bj −1 + ω̃j , ω̃j ∼ N (0, I). (13) Using the standardized state vector bj observation equation (11) can be written as xj = Zj β 0 + Zj Q1/2 bj + εj , ε j ∼ N (0, Vj ). (14) This noncentered parameterization is not identified as in the observation equation xij = K  k=0 zik βk0 + K  k=0  zik bkj (± θk ) + εij (15) √ for each k = 0, . . . , K the sign of θk and the sequence bkj , j = 1, . . . , J may be changed without changing the likelihood. As discussed in Frühwirth-Schnatter and Wagner (2010) the likelihood function of the noncentered parameterization of a state space model is bimodal in the direction of a process standard deviation if the respective component of the state vector is stochastic, and symmetric with a mode at 0 for a constant component. This means that p(x|θ , b) will be bi√ modal in the direction of θk if the effect of covariate k √ varies over time and symmetric with a mode θk = 0 if the effect is constant. Note that non-identifiability of the noncentered parameterization concerns only the sign of a component of the state vector and the corresponding process standard deviation: the effect of covariate k in each interval √ j given as βk0 + bkj (± θk ) is identified. However, this nonidentifiability has to be taken into account in the MCMC sampling scheme to guarantee exploration of the whole posterior distribution. 4.2 The parsimonious normal dynamic survival model In the observation equation of the noncentered parameterization (15) the mean of the auxiliary variable xij is equal to the log-hazard log λij of subject i in interval Ij . This log hazard has two components, the first resulting from the initial covariate effects βk0 , the second from the modification √ of these initial effects in interval Ij , bkj (± θk ). To perform model selection we introduce for each covariate effect k = 1, . . . , K two indicator variables δk and γk . The first of these indicators δk is 0 iff the initial effect β0k = 0 and thus selects the effect of covariate zk at time 0. The second indicator γk is defined as γk = 0 iff θk = 0 and selects effects that vary over time. We will include a constant baseline hazard effect in each model, hence β00 is not selected. However an additional indicator γ0 , defined as γ0 = 0 iff θ0 = 0, is introduced to select between a constant and a time-varying baseline hazard. 236 Stat Comput (2011) 21: 231–246 The final model for variable selection is then given as: xij = β00 + K  k=1 δk zik βk0 + K  k=0  γk zik bkj (± θk ) + εij , εij ∼ N(0, vrij ), (16) bkj = bk,j −1 + ω̃kj , ω̃kj ∼ N (0, 1). (17) The variable selection model can be written as the following state space model xj = Zj (δ)β 0 + Hj (γ )bj + ε j , bj = bj −1 + ω̃j , εj ∼ N(0, Vj ), ω̃l ∼ N(0, I) (18) (19) where Zj (δ) and Hj (γ ) depend on the indicators and are given as Zj (δ) = Zj diag(1, δ), Hj (γ ) = Zj Q1/2 diag(γ ). 4.3 Priors To complete model specification prior distribution for all unknown model parameters (δ, γ , β 0 , θ ) have to be chosen. 4.3.1 Prior distribution for the indicators In the variable selection model (16)–(17) there is a choice between two modeling options for the baseline log-hazard namely constant, γ0 = 0, or time-varying, γ0 = 1. For each covariate k = 1, . . . , K there are three different modeling options which are interesting from a practical point of view: no effect of the covariate, a constant effect or a time-varying effect. We identify these cases by the values ζ1 = (0, 0), ζ2 = (1, 0) and ζ3 = (1, 1) for the indicator pair (δk , γk ). Assuming independence of the indicators for k = 0, 1, . . . , K leads to a prior p(δ, γ ) = p(γ0 ) K  p(δk , γk ). k=1 Let p(γ0 = 1) = η0 and p((δk , γk ) = ζl ) = ηl for l = 1, 2, 3. As a first prior, we consider fixed prior probabilities η0 = 21 and ηl = 31 , l = 1, 2, 3. This prior, denoted as prior 1 in the following, assigns equal probability p = 0.5 · 3−K to all models under consideration. Let hl , l = 1, 2, 3 denote the number of covariates for which modeling option l is selected, i.e. hl = K k=1 × I{(δk ,γk )=ζl } and let h0 = γ0 . The prior induced on h = (h0 , h1 , h2 , h3 ) is then given as   K 1 3−K . p1 (h) = 2 h1 h2 h3 For a large number K of covariates this prior can be rather informative on the number of selected fixed and timevarying effects. A second prior is obtained by putting a hyperprior on the inclusion probabilities as in Smith and Kohn (2002) and Frühwirth-Schnatter and Tüchler (2008). For model selection in regression models Ley and Steel (2009) showed that a prior where the inclusion probability is random clearly outperforms priors with fixed inclusion probabilities. Conjugate priors are a Beta prior for η0 and a Dirichlet prior for (η1 , η2 , η3 ). For η0 ∼ Beta(1, 1) and (η1 , η2 , η3 ) ∼ Dirichlet(1, 1, 1) the resulting prior is p2 (δ, γ ) =  1 2 3l=1 Ŵ(hl + 1) · . 2 Ŵ(K + 3) This prior is uniform on h as   1 K p2 (h) = p(δ, γ ) = . (K + 1)(K + 2) h1 h2 h3 (20) As prior 3 we specify a prior of the form p(δk , γk ) = p(δk )p(γk |δk ) where Pr{γk = 1|δk = 0} = 0. This prior somehow reflects the strategy of Hofner et al. (2008), who in a first step select covariates into the model and in a second step for each of the selected covariates choose a modeling alternative (fixed or time-varying effect). A flexible prior is obtained by assuming that Pr{δk = 1|ηδ } = ηδ , k = 1, . . . , K, Pr{γk = 1|δk = 1, ηγ } = ηγ , k = 1, . . . , K, Pr{γ0 = 1|ηγ } = ηγ . If both hyperparameters ηδ and ηγ are iid uniform on [0,1], the resulting prior is p3 (δ, γ ) = p3 (δ)p3 (γ |δ) with  p3 (δ) = B 1 + K  k=1 δk , 1 + K − K  k=1  δk ,   p3 (γ |δ) = B 1 + γ0 + γk , k:δk =1 1 + (1 − γ0 ) +   (1 − γk ) , k:δk =1 where B(·, ·) is the Beta function. Note that this prior leads to a uniform distribution over the number of selected covariates hδ = K k=1 δk , p3 (hδ ) = 1 . K +1 (21) The resulting conditional prior p3 (γ |δ) is uniform over the number of regressors with potentially time-varying effect. Stat Comput (2011) 21: 231–246 237 As the log-baseline hazard is included in each model, the number of these regressors is hδ + 1 and    p3 γ 0 + γk = hγ |δ = k:δk =1 1 . hδ + 2 An interesting question is, whether one of these 3 priors favours selection of more covariates or more time varyingeffects. Under prior 1 the number of selected covariates hδ = h2 + h3 as well as the number of covariates with time-varying effect h3 follow a Binomial distribution, hδ ∼ Bino(K, 2/3) and h3 ∼ Bino(K, 1/3). Prior 2 induces a uniform distribution over all combinations of h, see (20) and therefore p2 (hδ ) = 2(hδ + 1) (K + 1)(K + 2) and p2 (h3 ) = 2(K + 1 − h3 ) . (K + 1)(K + 2) Finally, under prior 3, p3 (hδ ) is the uniform distribution given in (21) and as p3 (h3 |δ) = hδ1+1 we get p3 (h3 ) = K  hδ =h3 p3 (h3 |hδ )p3 (hδ ) = K 1  1 . K +1 j +1 j =h3 The induced priors on hδ and h3 are shown as line plots for K = 10 in Fig. 2. Prior 1 favours selection of about two third of the covariates, i.e. hδ ≈ 2/3K. Prior 2 induces prior probabilities increasing in hδ whereas prior 3 assigns equal probability to all values of hδ . With regard to the number of time-varying effects, prior 1 favours about one third of the covariates being time-varying. Both prior 2 and 3 induce prior probabilities decreasing in h3 . Fig. 2 Priors induced on the number of selected covariates hδ (left) and on the time-varying effects h3 (right) for K = 10 4.3.2 Prior distributions for the effects Conditional on the state vector, the observation equation (16) of the variable selection model defines a Gaussian regression model with heteroscedastic errors and known error variances. Denoting by α the parameter vector α = √ √ (β00 , . . . , β0K , ± θ0 , . . . , ± θK ), this regression model can be written as δ,γ xij = (wij )′ α δ,γ + εij , εij ∼ N (0, vrij ). (22) If all indicators take the value 1 then α δ,γ = α and the design δ,γ vector wij is of the form   w′ij = 1, zi1 , . . . , ziK , b0j , zi1 b1j , . . . , ziK bKj . Otherwise the restricted parameter vector α δ,γ and the preδ,γ dictor vector wij contain only those elements for which the corresponding indicator is equal to 1. Let x, R and ε denote the vectors obtained by stacking the vectors xj , Rj = (r1j , . . . , rnj ,j ) and ε j , j = 1, . . . , J , V the diagonal matrix containing the variances of the error terms δ,γ and Wδ,γ the regressor matrix with rows equal to (wij )′ in appropriate order. Using this notation the regression model for all N = Jj=1 nj auxiliary variables can be written as x = Wδ,γ α + ε, ε ∼ N (0, V). Conditional on the indicators δ and γ we specify a fractional prior for α δ,γ as in Frühwirth-Schnatter and Tüchler (2008). The basic idea for the fractional prior is to use a proportion f ∈ (0, 1) of the conditional likelihood p(x|α δ,γ , b, R) to obtain a proper posterior under the improper prior p(α δ,γ ) ∝ const. Here the fractional prior is a Normal distribution,   δ,γ δ,γ 1 δ,γ p(α |b, x, R) = N aN , (AN ) , (23) f 238 Stat Comput (2011) 21: 231–246 where δ,γ (AN )−1 = (Wδ,γ )′ V−1 Wδ,γ , δ,γ δ,γ aN = AN (Wδ,γ )′ V−1 x. (24) (25) The fraction of the fractional prior is chosen as f = 1/N , as in Smith and Kohn (2002). 4.4 Sampling scheme The noncentered parameterization together with the prior choices allow for a simple MCMC sampling scheme to sample jointly the indicators (δ, γ ), the unrestricted elements of the parameter vector α, the state process b = (b1 , . . . , bJ ) and the auxiliary variables x and R. The sampling scheme consists of the following steps: (a) Sample each indicator pair (δk , γk ) from (δk , γk |b, x, R, δ \k , γ \k ) ∝ p(x|δ, γ , b, R)p(δk , γk , δ \k , γ \k ) (b) (c) (d) (e) where δ \k and γ \k denote the elements of the indicator vectors except δk and γk . For updating the whole indicator vector (δ, γ ) a random order of updating the indicator pairs (δk , γk ) is used. Sample all unrestricted elements of the initial √ values of β 0 and all unrestricted variance parameters θk jointly δ,γ δ,γ from the multivariate normal posterior N (aN , AN ) conditional on the state vector b and the auxiliary variables (x, R). Set all remaining initial values of β 0 and all remaining variances equal to 0. Sample b = (b1 , . . . , bJ ) from the state space form p(b|δ, γ , α, x, R) given in (18) and (19). For √ each k = 0, . . . , K: Perform a random sign-switch for √θk and bkj , j = 1, . . . , J . Thus with probability √ 1/2 θk and bkj , j = 1, . . . , J are replaced by − θk and −bkj , j = 1, . . . , J . Sample the auxiliary variables xj and Rj conditional on α δ,γ and bj as in Sect. 3.4. Details on sampling steps (a) and (c) are given below. The likelihood of the noncentered parameterization is not affected by changing the sign of a component of the state vector and the corresponding process standard deviation, see Sect. 4.1. The random sign-switch step (d) of the sampling scheme guarantees that the full, eventually multimodal, posterior distribution is explored. 4.4.1 Sampling the indicators and the unrestricted parameters Sampling the indicators (δ, γ ) requires one to marginalize over the parameters which are subject to model selection, see Smith and Kohn (1996) for a full account. This is feasible for the noncentered parameterization of the normal dynamic survival model as the auxiliary mixture sampling scheme leads to a conditionally Gaussian regression model, where the marginal likelihood can be derived analytically. To sample the indicators marginally with respect to α δ,γ we combine the fractional prior (23) only with the remaining (1 − f ) fraction of the likelihood p(x|α δ,γ , b, R) not used for constructing the prior. Integration with respect to α δ,γ yields the marginal likelihood   |V|−1 (1−f )/2 p(x|δ, γ , b, R) = f (qδ +qγ )/2 (2π)N   (1 − f ) S , × exp − 2 where δ,γ δ,γ δ,γ S = x′ V−1 x − (aN )′ (AN )−1 (aN ). qδ and qγ are the number of nonzero elements in δ and γ δ,γ δ,γ respectively and aN and AN are given in (25) and (24). 4.4.2 Sampling the state vector In sampling step (c) forward-filtering-backward-sampling (FFBS, Frühwirth-Schnatter (1994), Carter and Kohn (1994), de Jong and Shephard (1995)) is used to sample the state vector b = (b1 , . . . , bJ ). If at least one indicator γk = 0 a reduced state space form is used. As the observation equation is independent of any component of the state vector b where the corresponding γk = 0, FFBS is applied to the reduced state vector of the components where γk = 1 and all other components are sampled from the prior (17). 5 Case studies 5.1 Gastric cancer data As a first application I use a data set of patients with gastric cancer, analyzed previously by Gamerman (1991) and Hemming and Shaw (2005). The data are survival times of 90 patients, randomly allocated to a therapy. Treatment was chemotherapy in the first group, and in the second group a combination of chemotherapy and radiation was applied. Overall 10 observation times were right censored. In this setting it is of interest whether there is a treatment effect of combined therapy as compared to chemotherapy alone (which serves as control) and whether this effect varies over time. The covariate vector is zi = (1, z1i ) where z1i indicates whether patient i underwent combined therapy or not. The parameter vector β therefore has 2 components. Stat Comput (2011) 21: 231–246 239 The fact that the estimated survival functions for treatment and control group cross (see Fig. 3), indicates that a Cox model with constant treatment effect might not be appropriate. 5.1.1 Fitting a dynamic survival model To fit a dynamic survival model, the failure times in the data were used as division points of the time axis, as in Gamerman (1991) and Hemming and Shaw (2005). The auxiliary mixture sampler of Sect. 3.4 was implemented in M ATLAB (Version 7.2.0) and run for M = 100,000 iterations after a burn-in of 20,000. Prior moments for the initial values β 0 were ek = 0 and Pk = 100, k = 0, 1. For the process variances proper, but uninformative inverse Gamma priors with parameters c0k = C0k = 0.01, k = 0, 1 were chosen. Table 1 reports point estimates and standard errors as well as 95%-highest posterior density intervals for the process variances θ0 and θ1 of baseline log-hazard and treatment effect. As already noted in previous analyses the process variance of the treatment effect is higher than for the baseline log-hazard. The estimated survival functions for the fitted normal dynamic survival model are compared to the Kaplan-Meier estimates in Fig. 3. The estimated baseline log-hazard and the effect of combined therapy are shown in Fig. 4. The baseline log-hazard increases early followed by a sharp decline. The estimated treatment effect declines from an effect with positive sign to a negative sign in the long run. Risk of death is thus higher for patients treated with combined therapy during the first 200 days but lower than for those treated only with chemotherapy later on. As the credible intervals include a straight line, a constant baseline hazard, as assumed in Gamerman (1991) and a constant treatment effect cannot be ruled out. 5.1.2 Comparison of the auxiliary mixture sampler to a single move MH algorithm Hemming and Shaw (2005) use a single move random walk MH step to sample the state vector. For a slightly different defined model with β 1 as initial value of the state vector they sample βk1 and the innovations ωkj , k = 0, . . . , K; j = 2, . . . , J from p ∗ (βk1 |.) ∝ p(βk1 )L1 , (26) Table 1 Gastric data: Estimated process variances (centered parameterization) p ∗ (ωkj |.) ∝ p(ωkj )Lj Parameter where Lj denotes the likelihood contribution for interval Ij ,    Lj = exp − λij (sj − sj −1 ) Mean Std.dev. 95%H.P.D. intervals θ0 0.024 0.025 [0.002, 0.067] θ1 0.057 0.059 [0.002, 0.167] for j = 2, . . . , J, (27) i:yi >sj ×  i:yi ∈Ij   exp − λij (yi − sj −1 ) (λij )di . These are however not the full conditional posterior distribj utions, as βkj = βk0 + l=1 ωkl and therefore all intervals Il where l > j contain information on ωkj . For an adequate comparison of the single move MH algorithm with auxiliary mixture sampling, βk0 and ωkj , k = 0, . . . , K; j = 1, . . . , J have to be sampled from Fig. 3 Gastric data: Estimated survival function and Kaplan-Meier-estimates for the treatment group (blue) and the control group (red) p(βk0 |.) ∝ p(βk0 ) J  Ll , p(ωkj |.) ∝ p(ωkj ) J  Ll (28) l=1 l=j for j = 1, . . . , J. (29) To compare both algorithms with respect to their efficiency and computing time the single move MH algorithm was also implemented in M ATLAB (Version 7.2.0) and run under the same priors as in Sect. 5.1.1 for M = 100,000 iterations after a burn-in of 20,000 on the same notebook with a 2.0 GHz processor. Using normal random walk proposals, 240 Stat Comput (2011) 21: 231–246 Fig. 4 Gastric data: Posterior means and 95% credible regions for the baseline log-hazard (left) and the effect of combined treatment (right) Table 2 Gastric data: Comparison of the auxiliary mixture sampler (AUX) and the single move MH sampler 5.1.3 Unrestricted noncentered parameterization Parameter Method τ ESS θ0 AUX 349.8 285.9 2.53 MH 200.3 499.3 0.46 AUX 360.4 277.5 2.45 MH 291.1 343.5 0.32 A fit of the unrestricted noncentered parameterized model provides a useful tool for exploratory analysis as the posterior of process standard deviations gives insight whether an effect is time-varying or not. To fit the noncentered parametrized model sampling steps (b)–(e) of the sampler described in Sect. 4.4 were run for 50000 iterations after a burnin of 20000. As convergence of the sampler was rather slow using the failure times as division points, a coarser division of the time axis was used, where every fifth failure time defines an interval endpoint. Histograms of the √ posterior densities for the process standard deviations ± θi , i = 0, 1 are plotted in Fig. 5. Both posteriors are bimodal, pointing to a model where both baseline log-hazard and treatment effect vary over time. θ1 ESS per min. acceptance rates were between 28% and 58%. Estimation results for process variances and baseline log-hazard as well as treatment effects were essentially the same as for the auxiliary mixture sampler. Table 2 summarizes the results of this comparison reporting the inefficiency factor, the effective sample size (ESS) and the effective sample size per minute for both process variances. Inefficiency factors, defined as τ = 1 + 2 L l ρ(l), where ρ(l) is the empirical autocorrelation at lag l, were computed using the initial monotone sequence estimator (Geyer 1992) for L. The effective sample size ESS = M/τ estimates the number of independent samples required to obtain a parameter estimate with the same precision as the MCMC estimate. As draws are less autocorrelated for the MH algorithm than for the auxiliary mixture sampler, the MH algorithm performs better in terms of inefficiency factors and effective sample sizes. However, taking into account computation time, the order is reversed with ESS per minute of the auxiliary mixture sampler being more than 5 times as high as for the MH algorithm. 5.1.4 Stochastic model specification search Stochastic model specification search was carried out using the fractional prior and the three different priors for the model indicators discussed in Sect. 4.3.1. MCMC sampling was carried out for M = 100,000 draws after a burn-in of 20,000 draws. The first 10,000 draws of the burn-in were drawn from the unrestricted model, model selection began after these first 10,000 draws. Results of the variable selection procedure are summarized in Tables 3 and 4. Note that prior 1 and prior 2 in this case where only the effect of one covariate is considered coincide and all 6 models considered have equal prior probability. This is not the case for prior 3, which assigns a prior probability of 1/4 to models 1 and 2, 1/6 to models 3 and 6 and 1/12 to models 4 and 5. The most frequently visited model in Table 3 is robust against the prior choice, but the frequency with which this model is selected varies, in Stat Comput (2011) 21: 231–246 241 Fig. 5 Gastric√data: posterior densities of ± θ0 (left) and √ ± θ1 (right) Table 5 Description of variables in the Worcester heart attack study data Table 3 Gastric data: number of draws from each model Model δ1 γ0 γ1 Prior 1/2 Prior 3 1 0 0 0 5534 10300 2 0 1 0 6992 10089 3 1 0 0 2218 2215 Variable Description age age at hospital admission in years (centered at 67 years) gender 4 1 0 1 25236 13163 5 1 1 0 2739 1565 sho 62668 cpk 6 1 1 1 57281 0 = male, 1 = female cardiogenic shock complications (0 = no, 1 = yes) peak cardiac enzyme measured in international units (IU) chf Table 4 Gastric data: posterior means of indicators miorder Prior δ1 γ0 γ1 1/2 0.8747 0.6701 0.8252 3 0.7961 0.7432 0.7583 particular models 4 and 5 are visited less frequently under prior 3. Table 4 reports the posterior means of the indicator variables under both priors. If the selected model is defined as comprising all parameters where the posterior mean of the corresponding indicator is greater than 0.5, the same model, namely the model where both log-baseline and treatment effect are time-varying, results for the two priors. 5.2 Worcester heart attack data As a further application I analyzed the data of the Worcester heart attack study described in Hosmer and Lemeshow (1999). The main goal of the study was to describe trends over time in the incidence and survival rates following hospital admission for acute myocardial infarction. The data set provided by Hosmer and Lemeshow (1999) is a sample of mitype left heart failure complications (0 = no, 1 = yes) myocardial infection order (0 = first, 1 = recurrent) myocardial infection type (0 = Q wave, 1 = not Q wave or indeterminate) the main data set with information on 481 patients. Additionally to length of follow-up, defined as days from hospital admission and status of last follow-up (dead or alive), several covariates were available which are described in Table 5. 5.2.1 Fitting a dynamic survival model in centered parameterization As some of the covariates are related to the myocardial infarction, a time-varying effect for these covariates might be more plausible than a simple proportional hazards model. In a first analysis a dynamic survival model with time-varying effects for all covariates was fitted. As many deaths occur early, the time axis was partitioned into intervals of increasing lengths, starting with a finer division of one day length, followed by intervals of half a week, a week, one month, 3 months, half a year to intervals of one year length (from year 3 to year 10). The number of events in these time in- 242 Table 6 WHAS data: interval endpoints and number of events in the first month Stat Comput (2011) 21: 231–246 j 1 sj (days) 1 2 number of events 8 16 tervals ranges from 3 (in year 10) to 18 (in year 6). Table 6 summarizes the division of the time axis and the number of events for the first month. For the initial values of the random walks (for baseline log-hazard and covariate effects), independent normal distributions with mean 0 and prior variance 100 were chosen. For the process variances inverse Gamma priors with parameters c0 = 0.1 and C0 = 0.01 were used. The auxiliary mixture sampler for the centered parameterization was run for 50,000 iterations after a burn-in of 20,000. Posterior means of log baseline hazard and covariate effects are displayed in Fig. 6 with 95% credible regions versus time (in days). Baseline log-hazard and most of the covariate effects show large changes in the beginning, many of them a sharp decline, and level off in the long run. Due to the general model specification credible intervals are rather wide making it hard to assess whether an effect is constant over time or not. The effect of age is fairly constant with positive sign which means a negative effect, i.e. risk of death after myocardial infarction increases with age. Risk is higher for female than for male patients during the first three months, but approximates those of men later on. The adverse effect of cardiogenic shock complications (sho) shows a decline in the beginning but remains high even in the long run and is significantly positive until year three after infarction. For cardiac enzyme (cpk) there is evidence for a time-varying effect: higher values lead to a higher risk during the first four days, but to a lower risk for those surviving more than five years with even a beneficial effect in the long run. Left heart failure complications (chf) have a negative effect (i.e. higher risk) during the first two years: Risk increases at the beginning, reaching its maximum in the third month, then declines. Persons experiencing a recurrent infarction (miorder) have a higher risk during the first three months, but this effect wears off in the long run. Finally for q-wave infarctions (mitype) there is a sharp decline of the risk during the first 3 months which gives evidence to a time-varying effect. 5.2.2 Model selection As a next step the unrestricted model in the noncentered parametrization was fit under a fractional prior using steps (b)–(e) of the MCMC sampler described in Sect. 4.4 with all indicators fixed to one. Histograms of the posterior densities (based on 100,000 iterations after a burnin of 20,000) 2 3 4 5 6 7 8 9 3 4 7 10 14 21 30 10 7 7 14 12 12 7 √ for the process standard deviations ± θk , k = 0, . . . , 7 are shown in Fig. 7. The posterior of the process standard deviation is bimodal with two clearly separated modes for log-baseline, √ cpk and mitype. The posterior ordinate at ± θk = 0 is practically zero, indicating a time-varying baseline and time-varying effects of cpk and mitype. For miorder, gender and chf the posterior for the process √ standard deviation has also two modes, but values of ± θ close to zero have a positive posterior probability. For age and sho the posterior is unimodal indicating a constant effect of these covariables. For stochastic model specification search the MCMC sampling scheme of Sect. 4.4 was run for M = 200, 000 draws after a burn-in of 20, 000 draws. The first 10,000 draws of the burn-in were drawn from the unrestricted model, model selection began after these first 10,000 draws. Results of the variable selection procedure under the fractional prior and three different choices for the prior on the indicators discussed in Sect. 4.3.1 are summarized in Table 7. For all priors the selected model (based on indicators with posterior mean >0.5) includes a time-varying logbaseline hazard and time-varying effects of cpk and mitype, whereas the effects of age and sho are selected as constant and the effect of gender is not selected at all. For chf and miorder the resulting specification is different for the three priors: under prior 1 a constant effect is selected for both variables, prior 2 selects a constant effect for chf, but a time-varying effect for miorder. Under prior 3 both effects are selected as time-varying. The most frequently visited model under each prior was the model containing a time-varying log-baseline hazard, time-varying effects of cpk, mitype and miorder, constant effects of age, sho and chf and no effect of gender, i.e. the model selected under prior 2. Each of the three selected models is among the top five models for any of the three priors, but the number of visits for any model varies for the different priors. To illustrate the effect of the prior on the number of timevarying effects h3 , Fig. 8 displays the three different priors and the corresponding posterior distributions of h3 . As during MCMC models are visited according to their posterior probabilities, the posterior probability of h3 = k is determined as the relative frequency of visits to models with k time-varying effects. The posterior mean of h3 is 2.8 under prior 1, 3.3 under prior 2 and 3.8 under priors 3. Prior 1 and Stat Comput (2011) 21: 231–246 243 Fig. 6 WHAS data: Posterior means and 95% credible regions for the baseline log-hazard (a) and the effects of age (b), gender (c), sho (d), cpk (e), chf (f), miorder (g) and mitype (h) versus time (in days) 2 have identical mean E(h3 ) = 2.3, but prior 2 is less informative on h3 and hence shrinkage to the prior mean is less pronounced than under prior 1. Posterior probabilities for h3 = 4, . . . , 7 are larger under prior 3 than for any of the other priors, though models with few time-varying ef- fects are favoured. This indicates a conflict between prior and likelihood: Prior 3 assigns more than 30% probability mass to h3 = 0, but models with no time-varying effect are not supported by the likelihood, actually none of these models was visited during MCMC under any of the three priors. 244 Stat Comput (2011) 21: 231–246 √ Fig. 7 WHAS data: posterior densities of the process standard deviations ± θ for baseline log-hazard (a) and the effects of age (b), gender (c), sho (d), cpk (e), chf (f), miorder (g) and mitype (h) Table 7 WHAS data: posterior means of indicators Prior 1 1.00 1.00 0.23 1.00 1.00 0.28 1.00 1.00 0.35 δ γ 3 age δ γ 2 baseline δ γ 6 Concluding remarks In this paper a new auxiliary mixture sampler for dynamic survival models is proposed, which is easy to implement and needs no tuning. Its convenience results from representing the log-hazard model as a partial Gaussian model for auxiliary variables which allows to deal with any form of the linear predictor where Gibbs sampling for the equivalent model with Gaussian errors is feasible. Sampling algorithms for Gaussian models are easily adapted to survival models with the same linear predictor by adding two steps of data augmentation. This is demonstrated for the noncentered parameterization of the log-hazard model by implementing gender sho cpk chf miorder mitype 0.19 1.00 0.93 1.00 0.77 0.90 0.06 0.12 0.90 0.35 0.38 0.81 0.46 1.00 0.99 1.00 0.90 0.95 0.18 0.16 0.97 0.38 0.50 0.85 0.43 1.00 0.90 0.99 0.89 1.00 0.21 0.25 0.87 0.51 0.60 0.97 stochastic model specification search for state space models as proposed in Frühwirth-Schnatter and Wagner (2010). For survival data stochastic model specification search is attractive as additionally to variable selection the form of each effect, constant vs. time-varying has to be specified. Three different priors for the model indicators were proposed and investigated in the applications. To get deeper insight how prior assumptions affect inference and model selection results for dynamic survival models further research is needed. The auxiliary mixture sampler has a wider application, as e.g. inclusion of normal frailties, unstructured or structured normal spatial effects or nonlinear effects of covari- Stat Comput (2011) 21: 231–246 245 Fig. 8 Priors (left) and corresponding posterior distributions (right) for the number of the time-varying effects h3 ates modelled by P-splines as in Hennerfeind et al. (2006) is straightforward. The key property that has to be maintained for application of the auxiliary mixture sampler is the piecewise constant structure of the log-hazard as a function of time. Also model specification search for random effects described for normal and logit models in Frühwirth-Schnatter and Tüchler (2008) and Tüchler (2008) could be incorporated easily for dynamic survival models. As a further extension missing information different from right-censoring, e.g. interval censoring, can be dealt with by introducing complete auxiliary survival times conditional on the available information. Estimation of covariate effects in Cox-type models is often based on the partial likelihood, which does not involve the baseline hazard function and has a marginal posterior interpretation according to Sinha et al. (2003). As the data augmentation step proposed in this paper makes explicit use of the full likelihood of the normal dynamic survival model it cannot be applied to posterior inference based on the partial likelihood. Development of an appropriate data augmentation scheme to apply the methods of this paper to partial likelihood inference will be an interesting topic for further research. Acknowledgements I would like to thank two referees and Sylvia Frühwirth-Schnatter for helpful comments and Christoph Pamminger for critical reading of the manuscript. Appendix: Forward filtering backward sampling Let xj = (x1 , . . . , xj ) denote all observations from a linear Gaussian state space model up to time j , for j = 1, . . . , J . Under a normal prior β 0 ∼ N(e0 , P0 ) on the initial value of the state vector β, the moments ej and Pj of the normal filtering density p(β j |xj , θ ) can be determined by running the Kalman filter from j = 1, . . . , J . For the conditional Gaussian state space model defined by (11) and (12) these moments are given by the following recursions: ej = ej −1 + Kj (xj − Zj ej −1 ), (30) Pj = (I − Kj Zj )Pj |j −1 , (31) where Pj |j −1 = Pj −1 + Q and Kj = Pj |j −1 Z′j (Zj Pj |j −1 Z′j + Vj )−1 . (32) The moments for the prior specified in Sect. 3.3 are e0 = (e0 , . . . , eK )′ and P0 = diag(P0 , . . . , PK ). After forward filtering, a realization of the whole multivariate state vector β J = (β 0 , . . . , β J ) can be drawn in one move from the smoothing density p(β J |xJ , θ ). This is accomplished by sampling the last element β J from N (eJ , PJ ) and the remaining elements β j from p(β j |β j +1 , xJ , θ ) back in time for j = J − 1 to j = 0. The conditional densities p(β j |β j +1 , xJ , θ ) are multivariate normal densities with moments   β̂ j |J = I − Lj +1 ej + Lj +1 β j +1 , (33)   (34) Pj |J = I − Lj +1 Pj , where Lj +1 = Pj (Pj + Q)−1 . Formulas for these moments in more general linear Gaussian state space models are given in Frühwirth-Schnatter (1994). To sample b in the noncentered parameterization from the conditional linear Gaussian state space model given in (18) and (19) formulas (30)–(34) have to be modified in an obvious way. Note however, that b starts at 0 in this case and therefore e0 = 0 and P0 is a zero-matrix, P0 = O. References Arjas, E., Gasbarra, D.: Nonparametric Bayesian inference from right censored survival data using the Gibbs sampler. Stat. Sin. 4, 505– 524 (1994) Carter, C., Kohn, R.: On Gibbs sampling for state space models. Biometrika 81, 541–553 (1994) 246 Chib, S., Nardari, F., Shephard, N.: Markov chain Monte Carlo methods for stochastic volatility models. J. Econom. 108, 281–316 (2002) de Jong, P., Shephard, N.: The simulation smoother for time series models. Biometrika 82, 339–350 (1995) Frühwirth-Schnatter, S.: Data augmentation and dynamic linear models. J. Time Ser. Anal. 15, 183–202 (1994) Frühwirth-Schnatter, S., Frühwirth, R.: Auxiliary mixture sampling with applications to logistic models. Comput. Stat. Data Anal. 51, 3509–3528 (2007) Frühwirth-Schnatter, S., Tüchler, R.: Bayesian parsimonious covariance estimation for hierarchical linear mixed models. Stat. Comput. 18, 1–13 (2008) Frühwirth-Schnatter, S., Wagner, H.: Auxiliary mixture sampling for parameter-driven models of time series of counts with applications to state space modelling. Biometrika 93, 827–841 (2006a) Frühwirth-Schnatter, S., Wagner, H.: Data augmentation and Gibbs sampling for regression models of time series of small counts. Student 5, 221–234 (2006b) Frühwirth-Schnatter, S., Wagner, H.: Stochastic model specification search for Gaussian and non-Gaussian models. J. Econom. 154, 85–100 (2010) Gamerman, D.: Dynamic Bayesian models for survival data. Appl. Stat. 40, 63–79 (1991) George, E.I., McCulloch, R.: Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88, 881–889 (1993) George, E., McCulloch, R.: Approaches for Bayesian variable selection. Stat. Sin. 7, 339–373 (1997) Geyer, C.: Practical Markov chain Monte Carlo. Stat. Sci. 7, 473–511 (1992) Green, P.J.: Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732 (1995) Hemming, K., Shaw, J.E.H.: A parametric survival model applied to breast cancer survival times. Appl. Stat. 51, 421–435 (2002) Hemming, K., Shaw, J.E.H.: A class of parametric dynamic survival models. Lifetime Data Anal. 11, 81–98 (2005) Hennerfeind, A., Brezger, A., Fahrmeir, L.: Geoadditive survival models. J. Am. Stat. Assoc. 101, 1065–1075 (2006) Stat Comput (2011) 21: 231–246 Hofner, B., Kneib, T., Hartl, W., Küchenhoff, H.: Building Cox-type structured hazard regression models with time-varying effects. Stat. Model. Int. J. (2009, to appear) Hosmer, D.W., Lemeshow, S.: Applied Survival Analysis. Wiley, New York (1999) Ibrahim, J., Chen, M., Sinha, D.: Bayesian Survival Analysis. Springer, New York (2001) Kim, S., Shephard, N., Chib, S.: Stochastic volatility: Likelihood inference and comparison with ARCH models. Rev. Econ. Stud. 65, 361–393 (1998) Konrath, S., Kneib, T., Fahrmeir, L.: Bayesian regularisation in structured additive regression models for survival data. Technical report (2008) Ley, E., Steel, M.F.J.: On the effect of prior assumptions in Bayesian model averaging with applications to growth regression. J. Appl. Econom. 24, 651–674 (2009) Omori, Y., Chib, S., Shephard, N., Nakajima, J.: Stochastic volatility with leverage: fast likelihood inference. J. Econom. 140, 425–449 (2007) Shephard, N.: Partial non-Gaussian state space. Biometrika 81, 115– 131 (1994) Sinha, D., Chen, M.H., Ghosh, S.K.: Bayesian analysis and model selection for interval censored data. Biometrics 55, 585–590 (1999) Sinha, D., Ibrahim, J.G., Chen, M.H.: A Bayesian justification of Cox’s partial likelihood. Biometrika 90, 629–641 (2003) Smith, M., Kohn, R.: Nonparametric regression using Bayesian variable selection. J. Econom. 75, 317–343 (1996) Smith, M., Kohn, R.: Parsimonious covariance matrix estimation for longitudinal data. J. Am. Stat. Assoc. 97, 1141–1153 (2002) Tüchler, R.: Bayesian variable selection for logistic models using auxiliary mixture sampling. J. Comput. Graph. Stat. 17, 76–94 (2008) Verweij, P.J.M., van Houwelingen, H.C.: Time-dependent effects of fixed covariates in Cox regression. Biometrics 51, 1550–1556 (1995) West, M., Harrison, P.J., Migon, H.S.: Dynamic generalized linear models and Bayesian forecasting. J. Am. Stat. Assoc. 80, 73–83 (1985)