Abstract
The paper presents a new method for the flexible fitting of D-vines. Pair-copulas are estimated semi-parametrically using penalized Bernstein polynomials or constant and linear B-splines as spline bases in each node of the D-vine at every level. A penalty induces smoothness of the fit, while the high-dimensional spline basis guarantees flexibility. To ensure uniform univariate margins of each pair-copula, linear constraints are placed on the spline coefficients, and quadratic programming is used to fit the model. The amount of penalization for each pair-copula is driven by a penalty parameter, which is selected in a numerically efficient way. Simulations and practical examples accompany the presentation.
References
Aas, K., Czado, C., Frigessi, A., Bakken, H.: Pair-copula constructions of multiple dependence. Insur. Math. Econ. 44(2), 182–198 (2009)
Applegate, D.L.: The Traveling Salesman Problem. Prin. Ser. Appl. Math. Princeton University Press, Princeton (2006)
Bedford, T., Cooke, R.: Probability density decomposition for conditionally dependent random variables modeled by vines. Ann. Math. Artif. Intell. 32, 245–268 (2001)
Bedford, T., Cooke, R.M.: Vines: a new graphical model for dependent random variables. Ann. Stat. 30(4), 1031–1068 (2002)
Bouezmarni, T., Rombouts, J., Taamouti, A.: Asymptotic properties of the Bernstein density copula estimator for α-mixing data. J. Multivar. Anal. 101(1), 1–10 (2010)
Brechmann, E.C.: Truncated and simplified regular vines and their applications. Diploma thesis, Technische Universität München (2010)
Brechmann, E., Czado, C., Aas, K.: Truncated regular vines in high dimensions with applications to financial data. Can. J. Stat. 40(1), 68–85 (2012)
Burnham, K., Anderson, D.R.: Model Selection and Multimodel Inference—A Practical Information-Theoretic Approach. Springer, Berlin (2010)
Czado, C.: Pair-copula constructions of multivariate copulas. In: Bickel, P., Diggle, P., Fienberg, S., Gather, U., Olkin, I., Zeger, S., Jaworski, P., Durante, F., Härdle, W.K., Rychlik, T. (eds.) Copula Theory and Its Applications. Lecture Notes in Statistics, vol. 198, pp. 93–109. Springer, Berlin (2010)
de Boor, C.: A Practical Guide to Splines. Springer, Berlin (1978)
Doha, E.H., Bhrawy, A.H., Saker, M.A.: On the derivatives of Bernstein polynomials: an application for the solution of high even-order differential equations. Bound. Value Probl. 2011, 829543 (2011)
Efron, B.: Selection criteria for scatterplot smoothers. Ann. Stat. 29, 470–504 (2001)
Eilers, P.H.C., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11(2), 89–121 (1996)
Frahm, G., Junker, M., Szimayer, A.: Elliptical copulas: applicability and limitations. Stat. Probab. Lett. 63, 275–286 (2003)
Hahsler, M., Hornik, K.: TSP: infrastructure for the traveling salesperson problem. J. Stat. Softw. 23(2), 1–21 (2007)
Härdle, W., Okhrin, O.: De copulis non est disputandum—copulae: an overview. AStA Adv. Stat. Anal. 94(1), 1–31 (2009)
Hurvich, C.M., Tsai, C.-L.: Regression and time series model selection in small samples. Biometrika 76(2), 297–307 (1989)
Jaworski, P., Durante, F., Härdle, W., Rychlik, T.: Copula Theory and Its Applications. Lecture Notes in Statistics, vol. 198. Springer, Berlin (2010). Proceedings
Joe, H.: Families of m-variate distributions with given margins and m(m−1)/2 bivariate dependence parameters. In: Rüschendorf, L., Schweizer, B., Taylor, M. (eds.) Distributions with Fixed Marginals and Related Topics, vol. 28, pp. 120–141. Institute of Mathematical Statistics, Hayward (1996)
Kakizawa, Y.: Bernstein polynomial probability density estimation. J. Nonparametr. Stat. 16(5), 709–729 (2004)
Kauermann, G., Schellhase, C., Ruppert, D.: Flexible copula density estimation with penalized hierarchical B-splines. Scand. J. Stat. (2013). doi:10.1111/sjos.12018
Kolev, N., Anjos, U., Mendes, B.: Copulas: a review and recent developments. Stoch. Models 22(4), 617–660 (2006)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
Kurowicka, D., Cooke, R.: Uncertainty Analysis with High Dimensional Dependence Modelling. Wiley, Chichester (2006)
Lorentz, G.: Bernstein Polynomials. Mathematical Expositions, vol. 8. University of Toronto Press, Toronto (1953)
McNeil, A., Frey, R., Embrechts, P.: Quantitative Risk Management. Princeton Series in Finance. Princeton University Press, Princeton (2005)
Min, A., Czado, C.: Bayesian model selection for D-vine pair-copula constructions. Can. J. Stat. 39(2), 239–258 (2011)
Nelsen, R.: An Introduction to Copulas, 2nd edn. Springer, Berlin (2006)
Okhrin, O., Okhrin, Y., Schmid, W.: On the structure and estimation of hierarchical Archimedean copulas. J. Econom. 173(2), 189–204 (2013)
Petrone, S.: Bayesian density estimation using Bernstein polynomials. Can. J. Stat. 27(1), 105–126 (1999)
Qu, L., Yin, W.: Copula density estimation by total variation penalized likelihood with linear equality constraints. Comput. Stat. Data Anal. 56(2), 384–398 (2012)
R Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2013)
Rank, J. (ed.): Copulas. Risk Books, London (2007)
Reiss, T., Ogden, R.: Smoothing parameter selection for a class of semiparametric linear models. J. R. Stat. Soc. B 71(2), 505–523 (2009)
Rivlin, T.: An Introduction to the Approximation of Functions. Blaisdell, Waltham (1969)
Rosenkrantz, D., Stearns, R.E., Lewis, P.M. II: An analysis of several heuristics for the traveling salesman problem. SIAM J. Comput. 6(3), 563–581 (1977)
Rue, H., Martino, S., Chopin, N.: Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. B 71(2), 319–392 (2009)
Ruppert, D., Wand, M., Carroll, R.: Semiparametric Regression. Cambridge University Press, Cambridge (2003)
Ruppert, D., Wand, M.P., Carroll, R.J.: Semiparametric regression during 2003–2007. Electron. J. Stat. 3, 1193–1256 (2009)
Sancetta, A., Satchell, S.: Bernstein copula and its applications to modeling and approximations of multivariate distributions. Econ. Theory 20(3), 535–562 (2004)
Savu, C., Trede, M.: Hierarchies of Archimedean copulas. Quant. Finance 10(3), 295–304 (2010)
Schall, R.: Estimation in generalized linear models with random effects. Biometrika 78(4), 719–727 (1991)
Schellhase, C.: penDvine: flexible pair-copula estimation in D-vines using bivariate penalized splines. R package version 0.2.2 (2013)
Schepsmeier, U., Brechmann, E.C.: CDVine: statistical inference of C- and D-vine copulas. R package version 1.1-4 (2011)
Searle, S., Casella, G., McCulloch, C.: Variance Components. Wiley, New York (1992)
Shen, X., Zhu, Y., Song, L.: Linear B-spline copulas with applications to nonparametric estimation of copulas. Comput. Stat. Data Anal. 52(7), 3806–3819 (2008)
Sklar, A.: Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris 8, 229–231 (1959)
Smith, M., Min, A., Almeida, C., Czado, C.: Modeling longitudinal data using a pair-copula construction decomposition of serial dependence. J. Am. Stat. Assoc. 105, 1467–1479 (2010)
Stein, M.L.: A comparison of generalized cross validation and modified maximum likelihood for estimating the parameters of a stochastic process. Ann. Stat. 18, 1139–1157 (1990)
Wahba, G.: A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. Ann. Stat. 13, 1378–1402 (1985)
Wahba, G.: Spline Models for Observational Data. SIAM, Philadelphia (1990)
Wand, M.P., Ormerod, J.T.: On semiparametric regression with O’Sullivan penalised splines. Aust. N. Z. J. Stat. 50(2), 179–198 (2008)
Wood, S.N.: Generalized Additive Models. Chapman & Hall/CRC Press, London/Boca Raton (2006)
Wood, S.: Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. R. Stat. Soc. B 73(1), 3–36 (2011)
Additional information
This article is an extended and improved excerpt of chapter five of the dissertation: Schellhase, C. (2012): Density and Copula Estimation using Penalized Spline Smoothing, Universität Bielefeld. http://nbn-resolving.de/urn:nbn:de:hbz:361-25291040.
Appendices
Appendix A: Quadratic programming
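To estimate the pair-copula, we use the quadprog package in R, which solves the quadratic program. For simplicity, we write \(\boldsymbol{v}=\boldsymbol{v}^{(i,j|D)}\) and \(\lambda=\lambda^{(i,j|D)}\) in this section. Let \({\mathbf{s}}^{p}_{ij|D}(\boldsymbol{v},\lambda)\) and \({\mathbf{H}}^{p}_{ij|D}(\boldsymbol{v},\lambda)\) denote the first and second order derivatives of (16), that is,
\[ {\mathbf{s}}^{p}_{ij|D}(\boldsymbol{v},\lambda)=\frac{\partial l^{p}_{ij|D}(\boldsymbol{v},\lambda)}{\partial\boldsymbol{v}}, \qquad {\mathbf{H}}^{p}_{ij|D}(\boldsymbol{v},\lambda)=\frac{\partial^{2} l^{p}_{ij|D}(\boldsymbol{v},\lambda)}{\partial\boldsymbol{v}\,\partial\boldsymbol{v}^{T}}. \]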
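We approximate the penalized likelihood \(l^{p}_{ij|D}\) in (16) through a second order Taylor expansion, yielding
\[ l^{p}_{ij|D}(\boldsymbol{v}+\boldsymbol{\delta}^{(ij|D)},\lambda) \approx l^{p}_{ij|D}(\boldsymbol{v},\lambda) + {\mathbf{s}}^{p}_{ij|D}(\boldsymbol{v},\lambda)^{T}\boldsymbol{\delta}^{(ij|D)} + \frac{1}{2}\,\boldsymbol{\delta}^{(ij|D)T}\,{\mathbf{H}}^{p}_{ij|D}(\boldsymbol{v},\lambda)\,\boldsymbol{\delta}^{(ij|D)}, \tag{24} \]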
where \(\boldsymbol{\delta}^{(ij|D)}\) is the iteration step, selected by maximizing (24) subject to the linear constraints (8), (9) and (12). This optimization is carried out iteratively, re-approximating the likelihood as in (24) at each iteration step. To start the algorithm, an admissible starting value for \(\boldsymbol{v}^{(i,j|D)}\) is required. We use the uniform distribution on the unit square \([0,1]^2\), which defines the starting value uniquely.
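A single iteration step can be sketched as an equality-constrained Newton step. The Python code below maximizes the quadratic model (24), \(\mathbf{s}^{T}\boldsymbol{\delta}+\frac12\boldsymbol{\delta}^{T}\mathbf{H}\boldsymbol{\delta}\), subject to \(A\boldsymbol{\delta}=0\) by solving the KKT system, so that a coefficient vector satisfying linear equality constraints \(A\boldsymbol{v}=b\) still satisfies them after the update. This is a simplified stand-in for the quadprog solver mentioned above, not the authors' code: it assumes that A has full row rank and that H is negative definite on the null space of A, it ignores inequality constraints such as nonnegativity of the coefficients (which a quadratic programming solver like quadprog would also enforce), and the function names are hypothetical.

```python
# Sketch: one equality-constrained Newton step for a quadratic model like (24).
# Hypothetical helpers; inequality constraints (e.g. nonnegativity) are ignored.


def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(a) for a in row] + [float(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        # Choose the row with the largest pivot in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("KKT system is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def constrained_newton_step(s, H, A):
    """Maximize s^T d + 0.5 d^T H d subject to A d = 0.

    Stationarity of the Lagrangian s^T d + 0.5 d^T H d - mu^T A d gives
        [ -H  A^T ] [ d  ]   [ s ]
        [  A   0  ] [ mu ] = [ 0 ],
    so v + d still satisfies A v = b whenever v does.
    """
    n, m = len(s), len(A)
    kkt = [[-H[i][j] for j in range(n)] + [A[r][i] for r in range(m)] for i in range(n)]
    kkt += [list(A[r]) + [0.0] * m for r in range(m)]
    return solve_linear(kkt, list(s) + [0.0] * m)[:n]


# Usage: s = (1, 0), H = -I, one constraint d1 + d2 = 0.
step = constrained_newton_step([1.0, 0.0], [[-1.0, 0.0], [0.0, -1.0]], [[1.0, 1.0]])
print(step)  # [0.5, -0.5]
```

Without constraints (an empty A) the step reduces to the ordinary Newton step \(-\mathbf{H}^{-1}\mathbf{s}\); the constraint rows only redirect it into the admissible subspace.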
Appendix B: Penalty matrix for penalizing second order derivatives
For simplicity, we write \(\boldsymbol{v}=\boldsymbol{v}^{(i,j|D)}\) and \(\lambda=\lambda^{(i,j|D)}\) in this section. For the marginal penalties in \(u_i\) and \(u_j\) in (15), it follows with (7) and some transformations that
The integrals of the second order derivatives of Bernstein polynomials are easily calculated. The second order derivative of (10) equals (see Doha et al. 2011)
This is rewritten as
with
and \(w=\frac{(K+1)!}{(K-2)!}\). Therefore, the matrices \(P_{z_{i}}\) and \(P_{z_{j}}\) are given by
So the penalty can be written as the quadratic form \(\lambda\,\boldsymbol{v}^{T}P_{int}\,\boldsymbol{v}\), where λ is the penalty parameter steering the amount of smoothness and \(P_{int}:=P_{u_{i}}+P_{u_{j}}\).
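To make the building block of the penalty concrete, the Python sketch below computes the one-dimensional matrix with entries \(\int_0^1 \phi_k''(u)\,\phi_l''(u)\,du\) in exact rational arithmetic. It assumes that the basis (10) is the normalized Bernstein density \(\phi_k(u)=(K+1)\binom{K}{k}u^k(1-u)^{K-k}\), \(k=0,\dots,K\), which is consistent with the constant \(w=\frac{(K+1)!}{(K-2)!}\) above, and it uses the standard rule that the second derivative of a degree-K Bernstein polynomial is a second difference of degree-(K−2) Bernstein polynomials. How (7) lifts this matrix to the bivariate coefficient vector is not reproduced here, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def bern_inner(a, b, n):
    """Exact value of the integral over [0, 1] of B_{a,n}(u) * B_{b,n}(u),
    where B_{j,n}(u) = C(n, j) u^j (1 - u)^(n - j) and B_{j,n} = 0 outside 0..n."""
    if not (0 <= a <= n and 0 <= b <= n):
        return Fraction(0)
    return Fraction(comb(n, a) * comb(n, b), (2 * n + 1) * comb(2 * n, a + b))


def second_derivative_penalty(K):
    """P[k][l] = integral of phi_k''(u) phi_l''(u) du for the (assumed) basis
    phi_k(u) = (K + 1) C(K, k) u^k (1 - u)^(K - k), k = 0, ..., K."""
    if K < 2:
        raise ValueError("K must be at least 2")
    w = factorial(K + 1) // factorial(K - 2)  # (K+1)!/(K-2)! = (K+1) K (K-1)
    n = K - 2
    # phi_k'' = w * (B_{k-2,n} - 2 B_{k-1,n} + B_{k,n}): (index offset, weight)
    stencil = ((-2, 1), (-1, -2), (0, 1))
    P = [[Fraction(0)] * (K + 1) for _ in range(K + 1)]
    for k in range(K + 1):
        for l in range(K + 1):
            total = sum(ci * cj * bern_inner(k + oi, l + oj, n)
                        for oi, ci in stencil for oj, cj in stencil)
            P[k][l] = w * w * total
    return P


def quad_form(v, P):
    """v^T P v, the unscaled penalty (multiply by lambda for the penalty term)."""
    return sum(v[k] * P[k][l] * v[l] for k in range(len(v)) for l in range(len(v)))


P = second_derivative_penalty(5)
print(quad_form([1] * 6, P), quad_form(list(range(6)), P))  # 0 0
print(quad_form([k * k for k in range(6)], P))  # 57600 = 4 * w^2 with w = 120
```

The checks illustrate the design of the penalty: coefficient vectors that are constant or linear in k represent linear functions of u and are therefore not penalized, while a quadratic trend in the coefficients receives a strictly positive penalty.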
Appendix C: Marginal likelihood
The prior (18) is degenerate, which needs to be corrected as follows. We decompose \(\boldsymbol{v}^{(i,j|D)}\) into two components \({\boldsymbol{v}}^{(i,j|D)}_{\sim}\) and \({\boldsymbol{v}}^{(i,j|D)}_{\bot}\), such that \({\boldsymbol{v}}^{(i,j|D)}_{\sim}\) is a normally distributed random vector with non-degenerate variance and \({\boldsymbol{v}}^{(i,j|D)}_{\bot}\) contains the remaining components, which are treated as parameters (see also Wand and Ormerod 2008). Based on a singular value decomposition, we have
\[ P=\tilde{\mathbf{U}}\,\tilde{\boldsymbol{\varLambda}}\,\tilde{\mathbf{U}}^{T}, \]
with \(\tilde{\boldsymbol{\varLambda}}\) the diagonal matrix of positive eigenvalues and \(\tilde{\mathbf{U}} \in\mathbb{R}^{(K+1) \times h}\) the matrix of corresponding eigenvectors, where K+1 is the number of elements in \(\boldsymbol{v}^{(i,j|D)}\) and h=K+1−4 is the rank of P. Extending \(\tilde{\mathbf{U}}\) by \(\check{\mathbf{U}}\) to an orthogonal basis \(\mathbf{U}=(\tilde{\mathbf{U}}, \check{\mathbf{U}})\) gives \({\boldsymbol{v}^{(i,j|D)}_{\sim}}= \tilde{\mathbf{U}}^{T}\boldsymbol{v}^{(i,j|D)}\), with the a priori assumption \({\boldsymbol{v}^{(i,j|D)}_{\sim}} \sim N(0, \lambda^{-1}\tilde{\boldsymbol{\varLambda}}^{-1})\), and \(\boldsymbol{v}^{(i,j|D)}_{\bot}={\check{\mathbf{U}}}^{T} \boldsymbol{v}^{(i,j|D)}\). Conditioning on \({\boldsymbol{v}^{(i,j|D)}_{\sim}}\), x is distributed according to (6), and with (18) we get the mixed model log likelihood
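The split into \(\boldsymbol{v}^{(i,j|D)}_{\sim}\) and \(\boldsymbol{v}^{(i,j|D)}_{\bot}\) can be sketched as below. For a symmetric positive semi-definite penalty matrix the singular value decomposition coincides with the eigendecomposition, so the sketch uses cyclic Jacobi rotations in plain Python purely to stay dependency-free; a LAPACK-based routine would be the practical choice. The example penalty matrix (second-order differences on four coefficients), the relative tolerance that decides which eigenvalues count as zero, and the function names are illustrative assumptions, not taken from the paper.

```python
import math


def jacobi_eigh(A, tol=1e-20, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V); column k of V is the eigenvector of eigenvalue k."""
    n = len(A)
    A = [[float(a) for a in row] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def split_coefficients(P, v, rel_tol=1e-9):
    """Return (positive eigenvalues, U_tilde^T v, U_check^T v) for a symmetric PSD P."""
    eigvals, V = jacobi_eigh(P)
    n = len(P)
    scale = max(abs(e) for e in eigvals) or 1.0
    pos = [k for k in range(n) if eigvals[k] > rel_tol * scale]
    null = [k for k in range(n) if eigvals[k] <= rel_tol * scale]

    def project(cols):
        return [sum(V[i][k] * v[i] for i in range(n)) for k in cols]

    return [eigvals[k] for k in pos], project(pos), project(null)


# Second-order difference penalty on four coefficients (rank 2).
P = [[1, -2, 1, 0], [-2, 5, -4, 1], [1, -4, 5, -2], [0, 1, -2, 1]]
lam, v_pen, v_free = split_coefficients(P, [1.0, 2.0, 3.0, 4.0])
print(sorted(round(x, 6) for x in lam))  # [2.0, 10.0]
```

Because the linear trend (1, 2, 3, 4) lies in the null space of this example penalty, its penalized part is numerically zero and its whole length is carried by the unpenalized components, mirroring the role of \(\boldsymbol{v}^{(i,j|D)}_{\bot}\) as fixed parameters.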
The integral can be approximated by a Laplace approximation (see also Rue et al. 2009)
where \(\hat{\boldsymbol{v}}^{(i,j|D)}\) denotes the penalized maximum likelihood estimate. We can now differentiate (26) with respect to λ, which gives
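Since the marginal likelihood rests on a Laplace approximation, a one-dimensional sketch of the approximation itself may help; it is not formula (26). In h dimensions, the term \(\frac12\log(2\pi)\) becomes \(\frac h2\log(2\pi)\), and \(-\frac12\log(-f'')\) becomes minus one half the log-determinant of the negative Hessian at the mode. The function name and the Newton search for the mode are illustrative choices.

```python
import math


def laplace_log_integral(f, df, d2f, x0, tol=1e-12, max_iter=200):
    """Laplace approximation of log(integral of exp(f(x)) dx) in one dimension:
    f(x_hat) + 0.5*log(2*pi) - 0.5*log(-f''(x_hat)), where x_hat maximizes f.
    The mode x_hat is located with Newton's method started at x0."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol * (1.0 + abs(x)):
            break
    curvature = -d2f(x)
    if curvature <= 0:
        raise ValueError("f is not concave at the located stationary point")
    return f(x) + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(curvature)


# Gaussian integrand: the approximation is exact, log(sqrt(pi / 2)).
g = laplace_log_integral(lambda x: -2 * (x - 1) ** 2, lambda x: -4 * (x - 1),
                         lambda x: -4.0, x0=0.0)
# Gamma integrand x^a e^{-x}: recovers Stirling's formula for log(a!).
a = 50
s = laplace_log_integral(lambda x: a * math.log(x) - x, lambda x: a / x - 1,
                         lambda x: -a / x ** 2, x0=1.0)
print(abs(g - 0.5 * math.log(math.pi / 2)) < 1e-12, 0 < math.lgamma(a + 1) - s < 0.01)
```

The approximation is exact for Gaussian integrands and slightly underestimates the Gamma integral, which is why it works well when the integrand in the mixed model is close to Gaussian around \(\hat{\boldsymbol{v}}^{(i,j|D)}\).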
Appendix D: Computational aspects
The travelling salesman problem (TSP) solved in step 2 of Algorithm 1 is NP-hard (see Applegate 2006). Nevertheless, the TSP for, e.g., p=10 dimensions, based on the cAIC values of the \(\binom{10}{2}=45\) pair-copulas, is solved rapidly in R using the R package TSP (see Hahsler and Hornik 2007). Even for a higher dimension, say p=36, the TSP is still solved rapidly, albeit approximately, since the R package TSP uses the ‘nearest neighbor and repetitive nearest neighbor algorithms for symmetric and asymmetric TSPs’ (see Rosenkrantz et al. 1977). The \(\binom{p}{2}\) pair-copulas required before solving the TSP are calculated rapidly using parallel computing, since each pair-copula takes only a few seconds to fit. Therefore, high-dimensional D-vines can be fitted in a short period of time. Table 8 presents the elapsed system.time in R for bivariate, four-dimensional and ten-dimensional data with N=100 and N=500 from a Frank copula with τ=0.5 and K=14, on an Intel® Core™ 2 Quad CPU Q9550 @ 2.83 GHz.
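The repetitive nearest-neighbour heuristic mentioned above can be sketched in a few lines. This sketch assumes a symmetric p×p cost matrix whose entries are built from the pairwise cAIC values; how Algorithm 1 turns cAIC values into TSP weights is not shown in this section, the function names are hypothetical, and the code is not the implementation of the R package TSP.

```python
def tour_cost(tour, cost):
    """Length of the closed tour, including the edge back to the start."""
    return sum(cost[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def nearest_neighbor_tour(cost, start):
    """Greedy tour: always move to the cheapest unvisited city."""
    tour, unvisited = [start], set(range(len(cost))) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: (cost[last][j], j))  # ties -> lower index
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def repetitive_nearest_neighbor(cost):
    """Run the nearest-neighbour heuristic from every start and keep the cheapest tour."""
    tours = (nearest_neighbor_tour(cost, s) for s in range(len(cost)))
    return min(((t, tour_cost(t, cost)) for t in tours), key=lambda tc: tc[1])


# Toy example: four variables on a line, cost |i - j|.
cost = [[abs(i - j) for j in range(4)] for i in range(4)]
tour, length = repetitive_nearest_neighbor(cost)
print(tour, length)  # [0, 1, 2, 3] 6
```

Each run costs O(p²) operations and the repetitive variant O(p³) in total, which is why the step stays fast for p=36. To read off a D-vine order, which is a path rather than a closed tour, the tour would still have to be cut at one edge; that step is not covered by the sketch.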
When calculating the optimal λ of a pair-copula, we face the classical problems of estimation based on Fisher scoring. In particular, the initial choice of the penalty \(\lambda_0\) influences the number of iterations needed to determine the optimal penalty λ.
Cite this article
Kauermann, G., Schellhase, C. Flexible pair-copula estimation in D-vines using bivariate penalized splines. Stat Comput 24, 1081–1100 (2014). https://doi.org/10.1007/s11222-013-9421-5
DOI: https://doi.org/10.1007/s11222-013-9421-5