Abstract
Left-truncated data often arise in epidemiology and individual follow-up studies because of a biased sampling plan: subjects with shorter survival times tend to be excluded from the sample. Moreover, the survival times of recruited subjects are often subject to right censoring. In this article, a general class of semiparametric transformation models that includes the proportional hazards model and the proportional odds model as special cases is studied for the analysis of left-truncated and right-censored data. We propose a conditional likelihood approach and develop the conditional maximum likelihood estimators (cMLE) for the regression parameters and the cumulative hazard function of these models. The derived score equations for the regression parameter and the infinite-dimensional function suggest an iterative algorithm for computing the cMLE. The cMLE is shown to be consistent and asymptotically normal. The limiting variances of the estimators can be consistently estimated using the inverse of the negative Hessian matrix. Extensive simulation studies are conducted to investigate the performance of the cMLE. An application to the Channing House data is given to illustrate the methodology.
References
Bennett S (1983) Analysis of survival data by the proportional odds model. Stat Med 2:273–277
Chen K, Jin Z, Ying Z (2002) Semiparametric analysis of transformation models with censored data. Biometrika 89:659–668
Chen Y-H (2009) Weighted Breslow-type and maximum likelihood estimation in semiparametric transformation models. Biometrika 96:591–600
Chen L, Lin DY, Zeng D (2012) Checking semiparametric transformation models with censored data. Biostatistics 13:18–31
Cheng SC, Wei LJ, Ying Z (1995) Analysis of transformation models with censored data. Biometrika 82:835–845
Cheng YJ, Huang CY (2014) Combined estimating equation approaches for semiparametric transformation models with length-biased survival data. Biometrics 70:608–618
Cox D (1972) Regression models and life tables (with discussion). J R Stat Soc Ser B 34:187–220
Crowley J, Hu M (1977) Covariance analysis of heart transplant survival data. J Am Stat Assoc 72:27–36
Dabrowska DM, Doksum KA (1988) Estimation and testing in the two-sample generalized odds-rate model. J Am Stat Assoc 83:744–749
Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, Hoboken
Kim JP, Lu W, Sit T, Ying Z (2013) A unified approach to semiparametric transformation models under general biased sampling schemes. J Am Stat Assoc 108:217–227
Klein JP, Moeschberger ML (1997) Survival analysis: techniques for censored and truncated data. Springer, New York
Liu Y, Zhang X (2011) Analysis of dependently truncated sample using inverse probability weighted estimator. Mathematical Thesis, Department of Mathematics and Statistics, Georgia State University
Mandel M, Betensky RA (2007) Testing goodness of fit of a uniform truncation model. Biometrics 63:405–412
Martinussen T, Scheike TH (2006) Dynamic regression models for survival data. Springer, New York
Meira-Machado L, de Uña-Álvarez J, Cadarso-Suárez C, Andersen PK (2009) Multi-state models for the analysis of time-to-event data. Stat Methods Med Res 18:195–222
Murphy SA, Rossini AJ, van der Vaart AW (1997) Maximum likelihood estimation in the proportional odds model. J Am Stat Assoc 92:968–976
Pan W, Chappell R (2002) Estimation in the Cox proportional hazards model with left-truncated and interval-censored data. Biometrics 58:64–70
Qian J, Betensky RA (2014) Assumptions regarding right censoring in the presence of left truncation. Stat Probab Lett 87:12–17
Shen P-S (2011) Semiparametric analysis of transformation models with left-truncated and right-censored data. Comput Stat 26:521–537
Shen P-S (2015) Semiparametric analysis of transformation models with dependently left-truncated and right-censored data. Commun Stat-Simul Comput. doi:10.1080/03610918.2015.1048879
Wang M-C (1987) Product-limit estimates: a generalized maximum likelihood study. Commun Stat Theory Methods 16:3117–3132
Wang M-C (1989) A semiparametric model for randomly truncated data. J Am Stat Assoc 84:742–748
Wang M-C (1991) Nonparametric estimation from cross-sectional survival data. J Am Stat Assoc 86:130–143
Wang M-C, Brookmeyer R, Jewell NP (1993) Statistical models for prevalent cohort data. Biometrics 49:1–11
Woodroofe M (1985) Estimating a distribution function with truncated data. Ann Stat 13:163–167
Zeng D, Lin DY (2006) Efficient estimation of semiparametric transformation models for counting processes. Biometrika 93:627–640
Zeng D, Lin DY (2010) A general theory for maximum likelihood estimation in semiparametric regression models with censored data. Stat Sin 20:871–910
Appendix: Proof of Theorem 1
The conditional likelihood function for the observations \(X_i\le \tau _c~(i=1,\dots ,n)\) can be written as follows:
where \(O_i=(X_i,V_i,\delta _i,Z_i)\) denotes the \(i\)th observation and
As in the proof of Zeng and Lin (2010), we require the following conditions:
(C1) The true value \(\beta _0\) lies in the interior of a compact set \(\mathcal C\), and the true function \(R_0 (\cdot )\) is strictly increasing and continuously differentiable in \([0,\tau _c]\).
(C2) With probability one, \(P(\inf _{s\in [0,t]} Y_i (s)\ge 1|Z_i)> \delta _0 > 0\) for all \(t\in [0,\tau _c]\).
(C3) There exist a constant \(c_1 > 0\) and a random variable \(r_1(O_i) > 0\) such that \(E[\log r_1 (O_i)] < \infty \) and, for any \(\beta \in \mathcal{C}\) and any \(R\),
almost surely. In addition, for any constant \(c_2\),
where \(||w||_{V[0,\tau _c]}\) is the total variation of \(w(\cdot )\) in \([0,\tau _c]\) and \(r_2 (O_i)\), which may depend on \(c_2\), is a finite random variable with \(E[|\log r_2(O_i)|]<\infty \).
Furthermore, we require certain smoothness of \(\varPsi \). Let \({\dot{\varPsi }}_{\beta }\) denote the derivative of \(\varPsi \) with respect to \(\beta \) and \({\dot{\varPsi }}(H)\) denote the derivative of \(\varPsi \) along the path \(R+\epsilon H\), where \(H\) belongs to the set of functions for which \(R +\epsilon H\) is increasing with bounded total variation.
(C4) For any \((\beta _1,\beta _2)\in \mathcal{C}\) and \((R_1,R_2)\), \((H_1,H_2)\) with uniformly bounded total variations, there exist a random variable \(R(O_i)\in L_4(P)\) and a stochastic process \(\mu _i(t,O_i)\in L_6 (P)\) such that
In addition, \(\mu _i (t,O_i)\) is non-decreasing and \(E[R(O_i)\mu _i(t,O_i)]\) is left-continuous with uniformly bounded left- and right-derivatives for any \(t\in [0,\tau _c]\).
(C5) If
almost surely, then \(\beta ^{*}=\beta _0\) and \(R^{*}(t)=R_0 (t)\) for \(t\in [0,\tau _c]\).
(C6) Let \(BV[0,\tau _c]\) denote the space of functions with bounded total variation in \([0,\tau _c]\). There exist a bounded function \(\zeta (t,\beta _0,R_0)\in BV[0,\tau _c]\) and a matrix \(M(\beta _0,R_0)\) such that
In addition,
where \(\eta _1(s,\beta ,R)\) is a \(p\)-dimensional bounded function and \(\eta _2(s,t,\beta ,R)\) is a bounded bivariate function. Furthermore, there exists a constant \(c_3\) such that \(|\eta _2(s,t_1,\beta _0,R_0)-\eta _2(s,t_2,\beta _0,R_0)|\le c_3|t_1 -t_2|\) for any \(s,t_1,t_2\in [0,\tau _c]\).
(C7) If
for some vector \(v\in \mathcal{R}^{p}\) and \(w\in BV[0,\tau _c]\), then \(v=0\) and \(w=0\).
Define \(\mathcal{V}=\{v\in \mathcal{R}^{p},|v|\le 1\}\) and \(\mathcal{D}=\{w(t):||w(t)||_{V[0,\tau _c]}\le 1\}\).
By (C3), the conditional likelihood function is bounded by
Furthermore, (C2) implies that the maximum of (A.1) can be attained only for \({\hat{R}}(\tau _c) <\infty \). Differentiating (A.1) with respect to \(dR(X_i)\) for which \(dN_i (X_i)=1\) and \(Y_i (X_i)=1\) shows that \({\hat{R}}\) satisfies
Let \({\tilde{R}}\) be a step function with jumps only at the \(X_i\) for which \(dN_i (X_i)=1\) and \(Y_i(X_i)=1\), i.e. \({\tilde{R}}\) satisfies
Under condition (C4), by Lemma 1 of Zeng and Lin (2010) and the Glivenko-Cantelli Theorem, \({\tilde{R}}\) converges uniformly to \(R_0\) in \([0,\tau _c]\). Let \(l_n(\beta ,R,\tau _c)\) denote the logarithm of the conditional likelihood \(L_c(\beta ,R,\tau _c)\). By arguments similar to those of Step 2 of Zeng and Lin (2010), the difference between \(l_n({\hat{\beta }},{\hat{R}},\tau _c)\) and \(l_n(\beta _0,{\tilde{R}},\tau _c)\) is eventually negative if \({\hat{R}}(\tau _c)\) diverges, which contradicts the maximality of \(({\hat{\beta }},{\hat{R}})\). Hence, \(\lim \sup _{n}{\hat{R}}(\tau _c) <\infty \) almost surely. Since \({\hat{R}}\) is bounded and monotone, Helly’s Theorem implies that from any subsequence we can choose a further subsequence along which \({\hat{R}}\) converges pointwise to some monotone function \(R_{*}\). Without loss of generality, assume that \({\hat{\beta }}\) converges to \(\beta _{*}\) along the same subsequence. Note that
Under condition (C3), by Lemma 1 of Zeng and Lin (2010) and the Glivenko-Cantelli Theorem, the numerator and denominator in the integrand of (A.2) converge uniformly to deterministic functions. Under condition (C5), it follows from the arguments of Step 3 of Zeng and Lin (2010) that \(\beta _{*}=\beta _0\) and \(R_{*}=R_0\). The proof is complete.
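To make the objects in the proof concrete, the following is a minimal sketch of a conditional log-likelihood for left-truncated and right-censored observations \(O_i=(X_i,V_i,\delta _i,Z_i)\). It is an illustration under stated assumptions, not the exact expression (A.1): it assumes time-independent covariates, a cumulative hazard of the form \(G(R(t)e^{\beta ^{T}Z})\), the logarithmic transformation family \(G_r(x)=\log (1+rx)/r\) (with \(r=0\) giving proportional hazards and \(r=1\) proportional odds), and a step function \(R\) whose jumps sit at distinct uncensored times. The names `G`, `G_prime` and `cond_loglik` are hypothetical.

```python
import numpy as np

def G(x, r):
    """Assumed transformation family: r = 0 is G(x) = x (PH), r = 1 is log(1 + x) (PO)."""
    return x if r == 0 else np.log1p(r * x) / r

def G_prime(x, r):
    """Derivative of G with respect to x."""
    return np.ones_like(x) if r == 0 else 1.0 / (1.0 + r * x)

def cond_loglik(beta, jumps, jump_times, X, V, delta, Z, r=0.0):
    """Conditional log-likelihood given survival past the truncation time V_i.

    jumps      : positive jump sizes of the step function R
    jump_times : sorted, distinct times of those jumps (the uncensored X_i)
    X, V, delta: observed time, truncation time, event indicator
    Z          : (n, p) covariate matrix (time-independent, by assumption)
    """
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    jumps = np.asarray(jumps, dtype=float)
    jump_times = np.asarray(jump_times, dtype=float)
    X, V, delta = (np.asarray(a, dtype=float) for a in (X, V, delta))
    Z = np.asarray(Z, dtype=float).reshape(len(X), -1)

    # R(t) = sum of the jumps at times <= t
    cum_R = np.concatenate(([0.0], np.cumsum(jumps)))
    R_X = cum_R[np.searchsorted(jump_times, X, side="right")]
    R_V = cum_R[np.searchsorted(jump_times, V, side="right")]

    lin = Z @ beta
    risk = np.exp(lin)

    # Jump of R at X_i, used only for uncensored subjects
    idx = np.clip(np.searchsorted(jump_times, X, side="left"), 0, len(jumps) - 1)
    log_dR = np.where(delta == 1, np.log(jumps[idx]), 0.0)

    # Event term: log of the discrete hazard dR(X) * exp(beta'Z) * G'(R(X) exp(beta'Z))
    event_part = delta * (log_dR + lin + np.log(G_prime(R_X * risk, r)))
    # Survival term: log of S(X | Z) / S(V | Z), which accounts for left truncation
    survival_part = -G(R_X * risk, r) + G(R_V * risk, r)
    return float(np.sum(event_part + survival_part))
```

In this sketch a subject contributes only over \((V_i,X_i]\), which is how conditioning on the truncation time enters. The function could be handed to a generic optimizer over `beta` and `jumps`, but that would be a substitute for, not a reproduction of, the iterative algorithm based on the score equations.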
Remark 5
Note that, by the arguments of Section 10.1 of Zeng and Lin (2010), a sufficient condition for (C3), (C4) and (C6) is condition (S1), given as follows.
(S1) \(G(t)\) is four-times differentiable with \(G(0)=0\) and \(G^{'}(0) > 0\), and for any integer \(m\ge 0\) and any sequence \(0<t_1< \dots < t_m\le y\),
for some constants \(\mu _0\) and \(k_0\). In addition, there exists a constant \(\rho _0\) such that
Cite this article
Chen, CM., Shen, PS. Conditional maximum likelihood estimation in semiparametric transformation model with LTRC data. Lifetime Data Anal 24, 250–272 (2018). https://doi.org/10.1007/s10985-016-9385-9
DOI: https://doi.org/10.1007/s10985-016-9385-9