Ming-wei MA¹, Li-liang REN*¹, Song-bai SONG², Jia-li SONG³, Shan-hu JIANG¹
1. State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Hohai University, Nanjing 210098, P. R. China
2. College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling 712100, P. R. China
3. Business School, Hohai University, Nanjing 211100, P. R. China
Goodness-of-fit tests for multi-dimensional copulas: Expanding application to historical drought data
Choosing the copula model that best fits a given dataset is a major limitation of the copula approach, and the present study investigates goodness-of-fit tests for multi-dimensional copulas. A goodness-of-fit test based on Rosenblatt's transformation was mathematically extended from two dimensions to three dimensions, and the procedures for a bootstrap version of the test were provided. Through stochastic copula simulation, an empirical application to historical drought data at the Lintong Gauge Station shows that the goodness-of-fit tests perform well, revealing that both trivariate Gaussian and Student t copulas are acceptable for modeling the dependence structures of the observed drought duration, severity, and peak. Goodness-of-fit tests for multi-dimensional copulas can support the application of a wider range of copulas to describe the associations of correlated hydrological variables. However, applying copulas with more than three dimensions requires greater computational effort as well as further exploration and parameterization of the corresponding copulas.
goodness-of-fit test; multi-dimensional copulas; stochastic simulation; Rosenblatt’s transformation; bootstrap approach; drought data
Copulas, introduced by Sklar (1959), are functions that join univariate distribution functions to form multivariate distribution functions. They offer flexibility in multivariate modeling because the margins can be chosen from different families of univariate distributions independently of the choice of a suitable dependence structure. Owing to these favorable properties, copulas have proved useful in financial applications (Frees et al. 1996; Mendes and Souza 2004). In recent years, copulas have been introduced into analyses of multivariate hydrological extreme events and have become a popular tool for modeling the dependence structures of correlated (non-independent) hydrological random variables, e.g., rainfall (Evin and Favre 2008; Wang et al. 2010; Zhang et al. 2012), floods (Grimaldi and Serinaldi 2006; Zhang and Singh 2007; Chowdhary et al. 2011), and droughts (Shiau 2006; Song and Singh 2010; Zhang et al. 2011; Ma et al. 2012).
Given the large number of available copula families, criteria such as the Akaike information criterion (AIC), Bayesian information criterion (BIC), and root mean square error (RMSE) are widely used to select appropriate copulas, as well as other multi-dimensional models, by comparing their fitting errors. However, a relatively small fitting error does not guarantee a satisfactory representation of the observations. Whether a given copula or parametric family of copulas adequately describes the dependence structure of historical data can be investigated with specialized goodness-of-fit tests for copulas. Although several such tests have been proposed, there are no general guidelines for selecting the optimal parametric copula. Genest and Rivest (1993) developed an empirical method to identify the best copula in the Archimedean case. Since copulas are invariant under strictly increasing transformations of the variables (Nelsen 1999), Diebold et al. (1998, 1999), Berkowitz (2001), and Berg and Bakken (2005) used the probability integral transform (PIT) of the data to evaluate copula models. Panchenko (2005) focused on positive definite bilinear forms, while Genest et al. (2006) used Kendall's process. For further contributions to this field, see Malevergne and Sornette (2003), Breymann et al. (2003), Dobrić and Schmid (2005), Junker and May (2005), and Fermanian (2005).
Dobrić and Schmid (2007) proposed a test for parametric families of bivariate copulas based on Rosenblatt's transformation, an approach also suggested and applied by Breymann et al. (2003). These applications mainly investigated bivariate copulas, and the methodology was tested and verified with either financial data or artificial samples. Although Dobrić and Schmid (2007) stated that the computation of the test statistics could be extended to higher-dimensional copulas, studies exploring multi-dimensional copulas with hydrological data have not yet been reported. In fact, specific difficulties are expected to arise when the transformation is extended from two to three (or more) dimensions. Therefore, the present study aims (1) to propose a goodness-of-fit test for multi-dimensional copulas with parametric expressions based on Rosenblatt's transformation, and (2) to verify the capability of the test through stochastic simulation of trivariate Gaussian and Student t copulas using historical drought observations.
2.1 Rosenblatt’s transformation
Rosenblatt (1952) proposed a transformation that maps a k-variate random vector with a continuous distribution to one that is uniformly distributed on the k-dimensional unit hypercube. The transformation can be used to obtain residuals for various multivariate probability models, which allows formal goodness-of-fit testing of these models. A brief description of Rosenblatt's transformation is as follows:
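Following the notation of Rosenblatt (1952), let $X = (X_1, X_2, \ldots, X_k)$ be a random vector with distribution function $F(x_1, x_2, \ldots, x_k)$. The conditional cumulative distribution functions are defined as

$$F_1(x_1) = P(X_1 \le x_1), \qquad F_i(x_i \mid x_1, \ldots, x_{i-1}) = P(X_i \le x_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}), \quad i = 2, \ldots, k.$$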
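Then, Rosenblatt's transformation $T$ is given by $z = (z_1, z_2, \ldots, z_k) = Tx = T(x_1, x_2, \ldots, x_k)$, where

$$z_1 = F_1(x_1), \qquad z_i = F_i(x_i \mid x_1, \ldots, x_{i-1}), \quad i = 2, \ldots, k.$$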
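If the distribution of $X$ is continuous, the random vector $Z = TX$ is uniformly distributed on the $k$-dimensional hypercube $[0, 1]^k$; that is, $Z_1, \ldots, Z_k$ are independent and each is uniformly distributed on $[0, 1]$.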
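As a minimal sketch of the transformation above, the Python code below applies it to a multivariate normal vector, for which every conditional distribution $F_i(x_i \mid x_1, \ldots, x_{i-1})$ is again normal, with a regression mean and a residual variance. The function name `rosenblatt_gaussian` and the use of NumPy/SciPy are assumptions for illustration only; for other distributions the conditional CDFs have to be derived separately.

```python
import numpy as np
from scipy.stats import norm


def rosenblatt_gaussian(x, mean, cov):
    """Rosenblatt transform of the rows of x (shape n x k) drawn from N(mean, cov).

    z_1 = F_1(x_1) and z_i = F_i(x_i | x_1, ..., x_{i-1}); for a multivariate
    normal each conditional law is normal, so each z_i is a normal CDF value.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n, k = x.shape
    z = np.empty_like(x)
    for i in range(k):
        if i == 0:
            mu = np.full(n, mean[0])
            var = cov[0, 0]
        else:
            # Regression of X_i on X_1..X_{i-1}: conditional mean and variance.
            w = np.linalg.solve(cov[:i, :i], cov[i, :i])
            mu = mean[i] + (x[:, :i] - mean[:i]) @ w
            var = cov[i, i] - cov[i, :i] @ w
        z[:, i] = norm.cdf(x[:, i], loc=mu, scale=np.sqrt(var))
    return z


rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8, 0.6], [0.8, 1.0, 0.7], [0.6, 0.7, 1.0]])
x = rng.multivariate_normal(np.zeros(3), cov, size=5000)
z = rosenblatt_gaussian(x, np.zeros(3), cov)
# The columns of z should look independent and uniform on [0, 1].
print(np.round(z.mean(axis=0), 2))
print(np.round(np.corrcoef(z, rowvar=False), 2))
```

The correlation matrix in the usage example is hypothetical; the strongly correlated columns of `x` become approximately uncorrelated uniform columns in `z`.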
2.2 Mathematical derivation of goodness-of-fit test
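Let $X$, $Y$, and $Z$ denote three random variables with joint probability distribution function $F_{X,Y,Z}(x, y, z) = P(X \le x, Y \le y, Z \le z)$ for $(x, y, z) \in \mathbb{R}^3$ and marginal distribution functions $F_X(x) = P(X \le x)$, $F_Y(y) = P(Y \le y)$, and $F_Z(z) = P(Z \le z)$ for $x, y, z \in \mathbb{R}$. Suppose $F_X$, $F_Y$, and $F_Z$ are all continuous; then, by Sklar's theorem, there exists a unique copula $C: [0, 1]^3 \to [0, 1]$ with

$$F_{X,Y,Z}(x, y, z) = C\big(F_X(x), F_Y(y), F_Z(z)\big)$$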
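where $C(\cdot)$, the trivariate copula, is the joint distribution function of the transformed variables. Let $U = F_X(X)$, $V = F_Y(Y)$, and $W = F_Z(Z)$, so that $C(u, v, w) = P(U \le u, V \le v, W \le w)$ for $(u, v, w) \in [0, 1]^3$. The conditional distribution function of $W$ given $U = u$ and $V = v$ can then be expressed as

$$C_{W \mid U,V}(w \mid u, v) = P(W \le w \mid U = u, V = v) = \frac{\partial^2 C(u, v, w) / \partial u \, \partial v}{\partial^2 C(u, v, 1) / \partial u \, \partial v}$$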
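Here, we assume that the second-order partial derivatives exist. According to Rosenblatt (1952), the random variables

$$Q_1 = U, \qquad Q_2 = C_{V \mid U}(V \mid U), \quad \text{where } C_{V \mid U}(v \mid u) = \frac{\partial C(u, v, 1)}{\partial u},$$

and

$$Q_3 = C_{W \mid U,V}(W \mid U, V)$$

are independent and uniformly distributed in $[0, 1]$. Thus, the random variable

$$S = \left[\Phi^{-1}(Q_1)\right]^2 + \left[\Phi^{-1}(Q_2)\right]^2 + \left[\Phi^{-1}(Q_3)\right]^2,$$

where $\Phi^{-1}$ denotes the inverse of the standard normal distribution function, follows a chi-square distribution with three degrees of freedom, extending the bivariate construction of Dobrić and Schmid (2007). The null hypothesis $H_0$ that the copula belongs to the specified parametric family can therefore be examined through the hypothesis $H_0^*$ that $S$ follows $\chi^2_3$.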
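The statistic can be sketched in Python as follows. Assuming the three Rosenblatt outputs are stored row-wise in an n × 3 array, the sum of squared normal scores $S = \sum_i [\Phi^{-1}(\cdot)]^2$ should follow $\chi^2_3$ under the null hypothesis, and the fit of $S$ to $\chi^2_3$ can be summarized by the Kolmogorov-Smirnov $D_n$ and Anderson-Darling $A_n$ statistics. The helper names and the tail clipping at $10^{-12}$ are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.stats import chi2, kstest, norm


def chi_square_statistic(q):
    """S_j = sum_i [Phi^{-1}(q_ji)]^2 for Rosenblatt outputs q of shape (n, d)."""
    q = np.clip(np.asarray(q, dtype=float), 1e-12, 1 - 1e-12)  # avoid infinite scores
    return np.sum(norm.ppf(q) ** 2, axis=1)


def ks_ad_statistics(s, df=3):
    """Kolmogorov-Smirnov D_n and Anderson-Darling A_n of s against chi2(df)."""
    g = np.clip(np.sort(chi2.cdf(s, df)), 1e-12, 1 - 1e-12)
    n = g.size
    i = np.arange(1, n + 1)
    d_n = max(np.max(i / n - g), np.max(g - (i - 1) / n))
    # A_n = -n - (1/n) * sum (2i - 1) [ln g_(i) + ln(1 - g_(n+1-i))]
    a_n = -n - np.mean((2 * i - 1) * (np.log(g) + np.log(1 - g[::-1])))
    return d_n, a_n


rng = np.random.default_rng(1)
q = rng.uniform(size=(200, 3))  # stand-in for Rosenblatt outputs under H0
s = chi_square_statistic(q)
d_n, a_n = ks_ad_statistics(s, df=3)
d_ref = kstest(s, "chi2", args=(3,)).statistic  # cross-check of D_n with SciPy
print(round(d_n, 4), round(a_n, 4))
```

In an actual test, the uniform stand-in would be replaced by the Rosenblatt outputs $Q_1, Q_2, Q_3$ of the fitted copula, and the resulting statistics compared with critical values.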
2.3 Procedures of bootstrap version for trivariate copulas
According to Dobrić and Schmid (2007), Genest et al. (2009), Song and Singh (2010), and Ma and Song (2010), the procedures of goodness-of-fit tests for trivariate copulas using a bootstrap approach are as follows:
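(2) The joint probability distribution of $(X_i, Y_i, Z_i)$ is estimated using a chosen trivariate copula:

$$\hat{F}(x_i, y_i, z_i) = C_{\hat{\theta}}\big(\hat{F}_X(x_i), \hat{F}_Y(y_i), \hat{F}_Z(z_i)\big)$$

for $i = 1, 2, \ldots, n$.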
The modeled samples required by the goodness-of-fit tests are generated by copula simulation (step (6) in the procedures above). Therefore, simulation procedures for Gaussian and Student t copulas, together with a case study, are provided below to illustrate goodness-of-fit tests for trivariate copulas.
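Because only part of the numbered procedure is preserved here, the following is a generic parametric-bootstrap sketch in the spirit of Genest et al. (2009), written for a Gaussian copula; it is not necessarily the authors' exact algorithm. Assumptions: pseudo-observations are ranks divided by n + 1, and the correlation matrix is estimated by the correlation of normal scores as a simple stand-in for maximum pseudo-likelihood. Each bootstrap sample is simulated from the fitted copula and refitted, and the p-value is the share of simulated statistics at least as large as the observed one.

```python
import numpy as np
from scipy.stats import chi2, norm, rankdata


def pseudo_obs(x):
    """Rank-based pseudo-observations in (0, 1), computed column by column."""
    return rankdata(x, axis=0) / (x.shape[0] + 1)


def fit_gaussian_copula(u):
    """Simplified estimator (assumption): correlation matrix of normal scores."""
    return np.corrcoef(norm.ppf(u), rowvar=False)


def gof_stats(u, r):
    """D_n and A_n of S = sum_i [Phi^{-1}(Q_i)]^2 against chi2(d) for a Gaussian copula."""
    x = norm.ppf(u)
    n, d = x.shape
    e = np.empty_like(x)
    e[:, 0] = x[:, 0]  # Phi^{-1}(Q_1) = Phi^{-1}(U)
    for i in range(1, d):
        # For a Gaussian copula, Phi^{-1}(Q_i) is the standardized regression residual.
        w = np.linalg.solve(r[:i, :i], r[i, :i])
        e[:, i] = (x[:, i] - x[:, :i] @ w) / np.sqrt(r[i, i] - r[i, :i] @ w)
    g = np.clip(np.sort(chi2.cdf(np.sum(e**2, axis=1), d)), 1e-12, 1 - 1e-12)
    k = np.arange(1, n + 1)
    d_n = max(np.max(k / n - g), np.max(g - (k - 1) / n))
    a_n = -n - np.mean((2 * k - 1) * (np.log(g) + np.log(1 - g[::-1])))
    return np.array([d_n, a_n])


def bootstrap_test(data, n_boot=200, seed=0):
    """Parametric bootstrap p-values for D_n and A_n under a fitted Gaussian copula."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    u = pseudo_obs(data)
    r = fit_gaussian_copula(u)
    observed = gof_stats(u, r)
    chol = np.linalg.cholesky(r)
    boot = np.empty((n_boot, 2))
    for b in range(n_boot):
        # Simulate from the fitted copula; only ranks matter, so normal margins suffice.
        sample = rng.standard_normal((n, d)) @ chol.T
        u_b = pseudo_obs(sample)
        boot[b] = gof_stats(u_b, fit_gaussian_copula(u_b))
    p_values = np.mean(boot >= observed, axis=0)
    return observed, p_values


rng = np.random.default_rng(2)
cov = np.array([[1.0, 0.8, 0.6], [0.8, 1.0, 0.7], [0.6, 0.7, 1.0]])
data = np.exp(rng.multivariate_normal(np.zeros(3), cov, size=100))  # non-normal margins
stats, p = bootstrap_test(data, n_boot=100, seed=3)
print(np.round(stats, 4), p)
```

Re-estimating the parameters inside every bootstrap replicate is what lets the simulated statistics reflect the effect of parameter estimation on the null distribution.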
3.1 Trivariate Gaussian and Student t copulas
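According to Fang et al. (2002) and Demarta and McNeil (2005), the trivariate Student t copula can be parametrically expressed as

$$C(u_1, u_2, u_3; \mathbf{R}, \nu) = t_{\mathbf{R},\nu}\big(t_\nu^{-1}(u_1), t_\nu^{-1}(u_2), t_\nu^{-1}(u_3)\big) = \int_{-\infty}^{t_\nu^{-1}(u_1)} \int_{-\infty}^{t_\nu^{-1}(u_2)} \int_{-\infty}^{t_\nu^{-1}(u_3)} \frac{\Gamma\left(\frac{\nu + 3}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) (\nu \pi)^{3/2} |\mathbf{R}|^{1/2}} \left(1 + \frac{\mathbf{x}^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{x}}{\nu}\right)^{-\frac{\nu + 3}{2}} \mathrm{d}x_3 \, \mathrm{d}x_2 \, \mathrm{d}x_1$$

where $\mathbf{R}$ is the $3 \times 3$ correlation matrix, $\nu$ is the number of degrees of freedom, $t_\nu^{-1}$ is the inverse of the univariate Student t distribution function with $\nu$ degrees of freedom, and $\mathbf{x} = (x_1, x_2, x_3)^{\mathrm{T}}$.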
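For likelihood-based estimation the copula density is more convenient than the integral form. A minimal sketch, assuming SciPy 1.6 or later (which provides `scipy.stats.multivariate_t`), evaluates $\log c(\mathbf{u}) = \log f_{\mathbf{R},\nu}(\mathbf{x}) - \sum_i \log f_\nu(x_i)$ with $x_i = t_\nu^{-1}(u_i)$, i.e., the joint t density divided by the product of its univariate t margins. The function name `t_copula_logpdf` is hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_t, t


def t_copula_logpdf(u, corr, df):
    """Log-density of the Student t copula with correlation matrix corr and df degrees of freedom."""
    u = np.asarray(u, dtype=float)
    corr = np.asarray(corr, dtype=float)
    x = t.ppf(u, df)  # x_i = t_nu^{-1}(u_i)
    joint = multivariate_t(loc=np.zeros(corr.shape[0]), shape=corr, df=df).logpdf(x)
    margins = np.sum(t.logpdf(x, df), axis=-1)  # sum_i log f_nu(x_i)
    return joint - margins


corr = np.array([[1.0, 0.8, 0.6], [0.8, 1.0, 0.7], [0.6, 0.7, 1.0]])
print(t_copula_logpdf([[0.9, 0.85, 0.95], [0.5, 0.5, 0.5]], corr, df=4))
```

Summing this log-density over pseudo-observations gives the objective maximized in pseudo-likelihood estimation of $\mathbf{R}$ and $\nu$; the correlation values above are illustrative only.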
3.2 Gaussian copula simulation
(1) Simulate independent random variables $v_1$, $v_2$, and $v_3$, each uniformly distributed on $[0, 1]$.
(2) Set $u_1 = v_1$.
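A common way to carry the steps above through to $u_2$ and $u_3$ is the conditional-distribution method, $u_2 = C^{-1}_{2 \mid 1}(v_2 \mid u_1)$ and $u_3 = C^{-1}_{3 \mid 12}(v_3 \mid u_1, u_2)$. For a Gaussian copula these inverses have closed forms because conditional normal laws are normal. The sketch below assumes this method; the function name and the correlation values in the usage example are hypothetical and are not the estimates of Table 3.

```python
import numpy as np
from scipy.stats import kendalltau, norm


def simulate_gaussian_copula(n, r12, r13, r23, seed=None):
    """Conditional-method simulation of a trivariate Gaussian copula."""
    rng = np.random.default_rng(seed)
    v = rng.uniform(size=(n, 3))  # step (1): independent U(0, 1)
    x1 = norm.ppf(v[:, 0])  # step (2): u_1 = v_1, mapped to the normal scale
    # u_2 = C^{-1}(v_2 | u_1): X_2 | X_1 ~ N(r12 * x1, 1 - r12^2)
    x2 = r12 * x1 + np.sqrt(1 - r12**2) * norm.ppf(v[:, 1])
    # u_3 = C^{-1}(v_3 | u_1, u_2): regress X_3 on (X_1, X_2)
    cov12 = np.array([r13, r23])
    w = np.linalg.solve(np.array([[1.0, r12], [r12, 1.0]]), cov12)
    sd = np.sqrt(1.0 - cov12 @ w)
    x3 = np.column_stack([x1, x2]) @ w + sd * norm.ppf(v[:, 2])
    return norm.cdf(np.column_stack([x1, x2, x3]))


u = simulate_gaussian_copula(3000, 0.8, 0.6, 0.7, seed=4)
tau12, _ = kendalltau(u[:, 0], u[:, 1])
# For elliptical copulas, Kendall's tau equals (2 / pi) * arcsin(rho).
print(round(tau12, 3), round(2 / np.pi * np.arcsin(0.8), 3))
```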
3.3 Student t copula simulation
(2) Set $u_1 = v_1$.
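The same conditional method can be sketched for the Student t copula. It relies on the standard property that, given the first $k$ components $\mathbf{x}_{1:k}$ of a standardized t vector, the next component is Student t with $\nu + k$ degrees of freedom, location $\boldsymbol{\sigma}_{21} \mathbf{R}_{11}^{-1} \mathbf{x}_{1:k}$, and squared scale $\frac{\nu + m}{\nu + k}\,(1 - \boldsymbol{\sigma}_{21} \mathbf{R}_{11}^{-1} \boldsymbol{\sigma}_{12})$, where $m = \mathbf{x}_{1:k}^{\mathrm{T}} \mathbf{R}_{11}^{-1} \mathbf{x}_{1:k}$. This is an assumed implementation choice, not necessarily the procedure used by the authors, and the parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau, t


def simulate_t_copula(n, corr, df, seed=None):
    """Conditional-method simulation of a Student t copula (columns are u_1..u_d)."""
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    d = corr.shape[0]
    v = rng.uniform(size=(n, d))  # independent U(0, 1) variables
    x = np.empty((n, d))
    x[:, 0] = t.ppf(v[:, 0], df)  # u_1 = v_1, mapped to the t scale
    for k in range(1, d):
        s11_inv = np.linalg.inv(corr[:k, :k])
        w = s11_inv @ corr[:k, k]
        mu = x[:, :k] @ w  # conditional location
        var = corr[k, k] - corr[k, :k] @ w  # residual variance
        maha = np.einsum("ij,jk,ik->i", x[:, :k], s11_inv, x[:, :k])  # m per row
        scale = np.sqrt((df + maha) / (df + k) * var)
        # X_k given the first k components is Student t with df + k degrees of freedom.
        x[:, k] = mu + scale * t.ppf(v[:, k], df + k)
    return t.cdf(x, df)


corr = np.array([[1.0, 0.8, 0.6], [0.8, 1.0, 0.7], [0.6, 0.7, 1.0]])
u = simulate_t_copula(3000, corr, df=5, seed=5)
tau12, _ = kendalltau(u[:, 0], u[:, 1])
print(round(tau12, 3), round(2 / np.pi * np.arcsin(0.8), 3))  # should be close
```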
4.1 Data
The historical drought data from the Lintong Gauge Station in the Weihe Basin, China, were used to illustrate the proposed goodness-of-fit tests for trivariate copulas. Monthly precipitation records from 1959 to 2008 were used to define droughts based on the theory of runs. All the data were obtained from the National Climate Center of the China Meteorological Administration and are complete, with no missing records. The Mann-Kendall test shows no obvious trend in the data, so they can be regarded as temporally homogeneous. As illustrated in Fig. 1 (where $t$ is time, $X_t$ is the observed precipitation time series, and $X_0$ is a given threshold), a drought event is defined as a period during which precipitation is equal to or less than the predetermined threshold. Drought characteristics, i.e., duration (D), severity (S), and peak (P), were extracted for each drought event using the monthly precipitation averages as truncation levels, and basic statistics of these three variables are shown in Table 1. The Pearson's $r_n$, Spearman's $\rho_n$, and Kendall's $\tau_n$ correlation coefficients given in Table 2 show that the observed drought duration, severity, and peak are highly correlated with one another, with a maximum correlation coefficient exceeding 0.9. These results are confirmed by the Chi-plots shown in Fig. 2 (for more details on Chi-plots, see Fisher and Switzer (1985, 2001), Ma et al. (2012), and references therein). Most of the empirical points fall outside the confidence band ($\alpha = 0.05$) in the Chi-plots, which indicates that clear dependence exists among drought duration, severity, and peak.
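A minimal sketch of run-based event extraction is given below. It assumes that severity is the cumulative deficit below the threshold and that peak is the largest single-month deficit within a run, and that adjacent events are not pooled; the paper does not spell out these formulas here. A per-month threshold array, such as the long-term monthly means used as truncation levels above, can be passed in place of a scalar. The function name `extract_droughts` is hypothetical.

```python
import numpy as np


def extract_droughts(x, x0):
    """Return (duration, severity, peak) for each run with x_t <= x0_t."""
    x = np.asarray(x, dtype=float)
    thresholds = np.broadcast_to(np.asarray(x0, dtype=float), x.shape)
    events, deficits = [], []
    for value, level in zip(x, thresholds):
        if value <= level:
            deficits.append(level - value)  # deficit in this month
        elif deficits:
            events.append((len(deficits), float(sum(deficits)), float(max(deficits))))
            deficits = []
    if deficits:  # a run still open at the end of the series
        events.append((len(deficits), float(sum(deficits)), float(max(deficits))))
    return events


print(extract_droughts([5, 2, 1, 6, 3, 7], 4))
# [(2, 5.0, 3.0), (1, 1.0, 1.0)]
```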
Both the correlation coefficients and the Chi-plots reveal significantly positive dependence between each pair of drought variables; the degree of dependence between drought duration and severity is greater than that between duration and peak, and smaller than that between severity and peak. However, the distributions of the points in the Chi-plots also indicate different dependence structures among the drought variables: the structures for duration-severity and duration-peak are similar (almost symmetric), whereas that for severity-peak is distinctly different (strongly asymmetric).
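The Chi-plot coordinates can be sketched as below, following the definitions of Fisher and Switzer (1985): with $F_i$, $G_i$, and $H_i$ the proportions of the other points lying below point $i$ in $x$, in $y$, and in both, $\chi_i = (H_i - F_i G_i) / \sqrt{F_i (1 - F_i) G_i (1 - G_i)}$ and $\lambda_i = 4 S_i \max\{(F_i - 1/2)^2, (G_i - 1/2)^2\}$, where $S_i$ is the sign of $(F_i - 1/2)(G_i - 1/2)$. Only points with $|\lambda_i| < 4\{1/(n - 1) - 1/2\}^2$ are kept. The 95% control band is commonly drawn at $\pm 1.78/\sqrt{n}$, a value recalled from that literature that should be checked before use. The function name is hypothetical.

```python
import numpy as np


def chi_plot(x, y):
    """Chi-plot coordinates (lambda_i, chi_i) following Fisher and Switzer (1985)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    lam, chi = [], []
    for i in range(n):
        others = np.arange(n) != i
        below_x = x[others] <= x[i]
        below_y = y[others] <= y[i]
        f = below_x.mean()  # F_i
        g = below_y.mean()  # G_i
        h = (below_x & below_y).mean()  # H_i
        denom = f * (1 - f) * g * (1 - g)
        if denom == 0:
            continue  # marginal extremes have no defined chi value
        s = np.sign((f - 0.5) * (g - 0.5))
        l_i = 4 * s * max((f - 0.5) ** 2, (g - 0.5) ** 2)
        if abs(l_i) < 4 * (1 / (n - 1) - 0.5) ** 2:  # keep non-extreme points only
            lam.append(l_i)
            chi.append((h - f * g) / np.sqrt(denom))
    return np.array(lam), np.array(chi)


xs = np.arange(20.0)
lam_pos, chi_pos = chi_plot(xs, xs)  # perfect positive dependence: chi = 1
lam_neg, chi_neg = chi_plot(xs, -xs)  # perfect negative dependence: chi = -1
```

Points far from $\chi = 0$ relative to the control band indicate dependence, and the pattern of points along $\lambda$ reveals whether that dependence is symmetric, as discussed above for duration, severity, and peak.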
Fig. 1 Definition of drought using the theory of runs
Assuming that drought duration, severity, and peak are continuous variables, a variety of univariate cumulative distribution functions (CDFs) were first fitted to the observed drought data. Two criteria (AIC and RMSE) and several goodness-of-fit tests (the Chi-square, Kolmogorov-Smirnov, Cramér-von Mises, Anderson-Darling, and modified weighted Durbin-Watson tests) were adopted to select the margins. The exponential, Weibull, and generalized Pareto distributions were eventually chosen as the optimal marginal distributions for drought duration, severity, and peak, respectively. The maximum likelihood (ML) method was applied to estimate the parameters of the exponential distribution for drought duration, while the parameters of the Weibull distribution for drought severity and the generalized Pareto distribution for drought peak were estimated with the probability-weighted moments (PWM) method. The dependence structures of drought duration, severity, and peak were then modeled with the trivariate Gaussian and Student t copulas to obtain their trivariate joint distribution. Parameters of the Gaussian and Student t copulas were estimated using the maximum pseudo-likelihood method (Nadarajah 2006; Song and Singh 2010) and are shown in Table 3.
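A minimal sketch of maximum pseudo-likelihood estimation for the trivariate Gaussian copula is given below. Assumptions: pseudo-observations are ranks divided by n + 1; the Gaussian copula log-density $-\tfrac{1}{2}\log|\mathbf{R}| - \tfrac{1}{2}\mathbf{z}^{\mathrm{T}}(\mathbf{R}^{-1} - \mathbf{I})\mathbf{z}$ is maximized with `scipy.optimize.minimize`; and the correlation matrix is parametrized through $\rho_{12}$, $\rho_{13}$, and the partial correlation $\rho_{23 \mid 1}$ so that every trial matrix is positive definite. The function names are hypothetical, and the Student t case would additionally require estimating $\nu$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, rankdata


def corr_from_partials(p12, p13, p23_1):
    """3x3 correlation matrix from (partial) correlations in (-1, 1); always positive definite."""
    r23 = p23_1 * np.sqrt((1 - p12**2) * (1 - p13**2)) + p12 * p13
    return np.array([[1.0, p12, p13], [p12, 1.0, r23], [p13, r23, 1.0]])


def gaussian_copula_mpl(data):
    """Maximum pseudo-likelihood estimate of a trivariate Gaussian copula correlation matrix."""
    n = data.shape[0]
    z = norm.ppf(rankdata(data, axis=0) / (n + 1))  # normal scores of pseudo-observations

    def neg_loglik(theta):
        r = corr_from_partials(*np.tanh(theta))  # tanh maps the real line to (-1, 1)
        _, logdet = np.linalg.slogdet(r)
        quad = np.einsum("ij,jk,ik->i", z, np.linalg.inv(r) - np.eye(3), z)
        # Negative of sum over rows of [-0.5 log|R| - 0.5 z^T (R^{-1} - I) z]
        return 0.5 * n * logdet + 0.5 * np.sum(quad)

    result = minimize(neg_loglik, x0=np.zeros(3))
    return corr_from_partials(*np.tanh(result.x))


rng = np.random.default_rng(6)
true_corr = np.array([[1.0, 0.8, 0.6], [0.8, 1.0, 0.7], [0.6, 0.7, 1.0]])
data = np.exp(rng.multivariate_normal(np.zeros(3), true_corr, size=2000))
est = gaussian_copula_mpl(data)
print(np.round(est, 3))
```

Because only ranks enter the pseudo-likelihood, the copula parameters are estimated separately from the fitted exponential, Weibull, and generalized Pareto margins; the simulated data here are hypothetical.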
Table 1 Basic statistics of drought variables
Table 2 Correlation coefficients of drought variables
Fig. 2 Chi-plots for drought duration, severity, and peak
Table 3 Parameters of Gaussian and Student t copulas
4.2 Results and discussion
Following the procedures described in Section 2.3, the Kolmogorov-Smirnov ($D_n$) and Anderson-Darling ($A_n$) statistics for the Gaussian and Student t copulas were computed numerically and are shown in Tables 4 and 5, respectively. At the significance level $\alpha = 0.05$, all test statistics based on the observed drought duration, severity, and peak are smaller than the corresponding critical values, which indicates that neither the Gaussian copula nor the Student t copula can be rejected at this level. In other words, neither the null hypothesis $H_0^*$ nor $H_0$ is rejected; i.e., both the Gaussian and Student t copulas are acceptable for describing the dependence structures of drought duration, severity, and peak and for modeling their trivariate joint probability distribution.
Table 4 Critical values of $D_n$ for Gaussian and Student t copulas
Table 5 Critical values of $A_n$ for Gaussian and Student t copulas
| Copula | $A_n$ | Critical value, α = 0.20 | α = 0.15 | α = 0.10 | α = 0.05 | α = 0.01 |
|---|---|---|---|---|---|---|
| Gaussian | 2.0004 | 2.7811 | 3.3118 | 4.0328 | 5.2031 | 7.7620 |
| Student t | 1.0078 | 3.8809 | 4.4928 | 5.3252 | 6.7816 | 10.3348 |
In the limited applications of copula-based methods to multivariate drought analysis to date, Archimedean copulas (many of which are generally suited only to roughly identical and symmetric dependence structures among the variables considered) appear to have been used most commonly (Ma et al. 2012). In reality, however, many of the contributing variables in hydrological or meteorological processes (e.g., rainfall, floods, and especially droughts) are likely to have different dependence structures and degrees of association, which are asymmetric and unbalanced. For instance, the markedly heterogeneous dependences of drought duration, severity, and peak reflected in the Chi-plots (Fig. 2) are better modeled by a meta-elliptical family of copulas. The fits of the trivariate Gaussian and Student t copulas shown in Fig. 3 are consistent with the results of the goodness-of-fit tests, indicating that both copulas produce a satisfactory representation of the historical drought observations. Thus, the dependence structures of drought duration, severity, and peak can be readily modeled using the Gaussian and Student t copulas to obtain the corresponding multivariate characteristics (such as joint probabilities and return periods) of drought events. This information is essential for drought risk management as well as for practical design and planning: because drought duration, severity, and peak are considered jointly, various combinations of drought characteristics can be evaluated for different purposes in hydrological practice.
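As an illustration of how a fitted copula yields joint probabilities and return periods, the sketch below computes an "AND" return period $T = E(L) / P(D > d, S > s, P > p)$ for a Gaussian copula, where the joint exceedance probability follows from inclusion-exclusion: $1 - u_1 - u_2 - u_3 + C(u_1, u_2) + C(u_1, u_3) + C(u_2, u_3) - C(u_1, u_2, u_3)$ with $u_1 = F_D(d)$, $u_2 = F_S(s)$, and $u_3 = F_P(p)$. This return-period definition, the mean drought interarrival time `mean_interarrival` $= E(L)$, the function names, and all numerical inputs are assumptions for illustration; none of them are results from this study.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm


def gaussian_copula_cdf(u, corr):
    """C(u) = Phi_R(Phi^{-1}(u_1), ..., Phi^{-1}(u_d)) for a Gaussian copula."""
    x = norm.ppf(np.asarray(u, dtype=float))
    return multivariate_normal(mean=np.zeros(x.size), cov=corr).cdf(x)


def and_return_period(u, corr, mean_interarrival):
    """Return period of D > d, S > s, and P > p jointly, given u = (F_D(d), F_S(s), F_P(p))."""
    corr = np.asarray(corr, dtype=float)

    def pair(i, j):
        # Bivariate margins of a Gaussian copula are Gaussian copulas as well.
        return gaussian_copula_cdf([u[i], u[j]], corr[np.ix_([i, j], [i, j])])

    # Inclusion-exclusion for the joint survival probability P(U > u1, V > u2, W > u3).
    survival = (1 - u[0] - u[1] - u[2] + pair(0, 1) + pair(0, 2) + pair(1, 2)
                - gaussian_copula_cdf(u, corr))
    return mean_interarrival / survival


corr = np.array([[1.0, 0.8, 0.6], [0.8, 1.0, 0.7], [0.6, 0.7, 1.0]])
print(and_return_period([0.9, 0.9, 0.9], corr, mean_interarrival=1.5))  # hypothetical inputs
```

With independent variables and all $u_i = 0.5$, the joint exceedance probability is 0.125, so $T = 8\,E(L)$; positive dependence raises the joint exceedance probability and shortens $T$.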
Fig. 3 Comparison of multivariate empirical and theoretical distributions
Rosenblatt's transformation can be applied to copulas to construct a goodness-of-fit test, and this technique can in principle be used for any parametric family of copulas. The mathematical foundations of the goodness-of-fit test for trivariate copulas and the corresponding procedures of a bootstrap approach were provided. Using the Gaussian and Student t copulas as examples, we demonstrated through copula simulation that trivariate meta-elliptical copulas are acceptable models for the observed historical drought data at the Lintong Gauge Station at the significance levels considered. As copulas are increasingly used to describe the dependence of correlated random variables, goodness-of-fit testing for multi-dimensional copulas can provide strong support for the further application of a wide variety of copulas to explore the dependence relationships and multivariate joint probability distributions of non-independent hydrological variables with different dependence structures and degrees of association.
Although, in theory, the goodness-of-fit tests for trivariate copulas described in this paper could be extended to higher dimensions, considerably more computational effort would be required. Moreover, as noted in the introduction, the framework applies to copulas with parametric expressions and remains ineffective for non-parametric families of copulas, which are numerous in potential applications; the derivation of analytical formulas and the estimation of parameters for multi-dimensional copulas also need to be addressed in future work.
Berg, D., and Bakken, H. 2005. A Goodness-of-fit Test for Copulae based on the Probability Integral Transform. Oslo: Department of Mathematics, University of Oslo.
Berkowitz, J. 2001. Testing density forecasts, with applications to risk management. Journal of Business and Economic Statistics, 19(4), 465-474. [doi:10.1198/07350010152596718]
Breymann, W., Dias, A., and Embrechts, P. 2003. Dependence structures for multivariate high-frequency data in finance. Quantitative Finance, 3(1), 1-14. [doi:10.1080/713666155]
Chowdhary, H., Escobar, L. A., and Singh, V. P. 2011. Identification of suitable copulas for bivariate frequency analysis of flood peak and flood volume data. Hydrology Research, 42(2-3), 193-216. [doi:10.2166/nh.2011.065]
Demarta, S., and McNeil, A. J. 2005. The t copula and related copulas. International Statistical Review, 73(1), 111-129. [doi:10.1111/j.1751-5823.2005.tb00254.x]
Diebold, F. X., Gunther, T. A., and Tay, A. S. 1998. Evaluating density forecasts with applications to financial risk management. International Economic Review, 39(4), 863-883. [doi:10.2307/2527342]
Diebold, F. X., Hahn, J., and Tay, A. S. 1999. Multivariate density forecast and calibration in financial risk management: High-frequency returns on foreign exchange. The Review of Economics and Statistics, 81(4), 661-673. [doi:10.1162/003465399558526]
Dobrić, J., and Schmid, F. 2005. Testing goodness of fit for parametric families of copulas: Application to financial data. Communications in Statistics - Simulation and Computation, 34(4), 1053-1068. [doi:10.1080/03610910500308685]
Dobrić, J., and Schmid, F. 2007. A goodness of fit test for copulas based on Rosenblatt's transformation. Computational Statistics and Data Analysis, 51(9), 4633-4642. [doi:10.1016/j.csda.2006.08.012]
Evin, G., and Favre, A. C. 2008. A new rainfall model based on the Neyman-Scott process using cubic copulas. Water Resources Research, 44, W03433. [doi:10.1029/2007WR006054]
Fang, H. B., Fang, K. T., and Kotz, S. 2002. The meta-elliptical distributions with given marginals. Journal of Multivariate Analysis, 82(1), 1-16. [doi:10.1006/jmva.2001.2017]
Fermanian, J. D. 2005. Goodness of fit tests for copulas. Journal of Multivariate Analysis, 95(1), 119-152. [doi:10.1016/j.jmva.2004.07.004]
Fisher, N. I., and Switzer, P. 1985. Chi-plots for assessing dependence. Biometrika, 72(2), 253-265.
Fisher, N. I., and Switzer, P. 2001. Graphical assessment of dependence: Is a picture worth 100 tests? American Statistician, 55(3), 233-239.
Frees, E. W., Carriere, J., and Valdez, E. 1996. Annuity valuation with dependent mortality. The Journal of Risk and Insurance, 63(2), 229-261.
Genest, C., and Rivest, L. P. 1993. Statistical inference procedures for bivariate Archimedean copulas. Journal of the American Statistical Association, 88(423), 1034-1043. [doi:10.1080/01621459.1993.10476372]
Genest, C., Quessy, J. F., and Rémillard, B. 2006. Goodness-of-fit procedures for copula models based on the probability integral transformation. Scandinavian Journal of Statistics, 33(2), 337-366. [doi:10.1111/j.1467-9469.2006.00470.x]
Genest, C., Rémillard, B., and Beaudoin, D. 2009. Goodness-of-fit tests for copulas: A review and a power study. Insurance: Mathematics and Economics, 44(2), 199-213. [doi:10.1016/j.insmatheco.2007.10.005]
Grimaldi, S., and Serinaldi, F. 2006. Asymmetric copula in multivariate flood frequency analysis. Advances in Water Resources, 29(8), 1155-1167. [doi:10.1016/j.advwatres.2005.09.005]
Junker, M., and May, A. 2005. Measurement of aggregate risk with copulas. Econometrics Journal, 8(3), 428-454. [doi:10.1111/j.1368-423X.2005.00173.x]
Ma, M. W., and Song, S. B. 2010. Elliptical copulas for drought characteristics analysis of Xi’an gauging station. Journal of China Hydrology, 30(4), 36-42. (in Chinese)
Ma, M. W., Song, S. B., Ren, L. L., Jiang, S. H., and Song, J. L. 2012. Multivariate drought characteristics using trivariate Gaussian and Student t copulas. Hydrological Processes, published online at http://onlinelibrary.wiley.com/doi/10.1002/hyp.8432/abstract on April 17, 2012 [doi:10.1002/hyp.8432]
Malevergne, Y., and Sornette, D. 2003. Testing the Gaussian copula hypothesis for financial asset dependences. Quantitative Finance, 3(4), 231-250. [doi:10.1088/1469-7688/3/4/301]
Mendes, B. V. M., and Souza, R. M. 2004. Measuring financial risks with copulas. International Review of Financial Analysis, 13(1), 27-45. [doi:10.1016/j.irfa.2004.01.007]
Nadarajah, S. 2006. Fisher information for the elliptically symmetric Pearson distributions. Applied Mathematics and Computation, 178(2), 195-206. [doi:10.1016/j.amc.2005.11.037]
Nelsen, R. B. 1999. An Introduction to Copulas. New York: Springer.
Panchenko, V. 2005. Goodness-of-fit test for copulas. Physica A, 355, 176-182. [doi:10.1016/j.physa.2005.02.081]
Rosenblatt, M. 1952. Remarks on a multivariate transformation. The Annals of Mathematical Statistics, 23(3), 470-472.
Shiau, J. T. 2006. Fitting drought duration and severity with two-dimensional copulas. Water Resources Management, 20(5), 795-815. [doi:10.1007/s11269-005-9008-9]
Sklar, A. 1959. Distribution functions of n dimensions and margins. Publications of the Institute of Statistics of the University of Paris, 8, 229-231. (in French)
Song, S. B., and Singh, V. P. 2010. Meta-elliptical copulas for drought frequency analysis of periodic hydrologic data. Stochastic Environmental Research and Risk Assessment, 24(3), 425-444. [doi:10.1007/s00477-009-0331-1]
Wang, X. J., Gebremichael, M., and Yan, J. 2010. Weighted likelihood copula modeling of extreme rainfall events in Connecticut. Journal of Hydrology, 390(1-2), 108-115. [doi:10.1016/j.jhydrol.2010.06.039]
Žežula, I. 2009. On multivariate Gaussian copulas. Journal of Statistical Planning and Inference, 139(11), 3942-3946. [doi:10.1016/j.jspi.2009.05.039]
Zhang, L., and Singh, V. P. 2007. Trivariate flood frequency analysis using the Gumbel-Hougaard copula. Journal of Hydrologic Engineering, 12(4), 431-439. [doi:10.1061/(ASCE)1084-0699(2007)12:4(431)]
Zhang, Q., Chen, Y. Q., Chen, X. H., and Li, J. F. 2011. Copula-based analysis of hydrological extremes and implications of hydrological behaviors in the Pearl River basin, China. Journal of Hydrologic Engineering, 16(7), 598-607. [doi:10.1061/(ASCE)HE.1943-5584.0000350]
Zhang, Q., Li, J. F., and Singh, V. P. 2012. Application of Archimedean copulas in the analysis of the precipitation extremes: Effects of precipitation changes. Theoretical and Applied Climatology, 107(1-2), 255-264. [doi:10.1007/s00704-011-0476-y]
(Edited by Yun-li YU)
This work was supported by the Program of Introducing Talents of Disciplines to Universities of the Ministry of Education and State Administration of Foreign Experts Affairs of China (the 111 Project, Grant No. B08048) and the Special Basic Research Fund for Methodology in Hydrology of the Ministry of Science and Technology of China (Grant No. 2011IM011000).
*Corresponding author (e-mail: RLL@hhu.edu.cn)
Received Nov. 22, 2011; accepted Apr. 12, 2012
Water Science and Engineering, 2013, Issue 1