[go: up one dir, main page]

Academia.eduAcademia.edu
Environmental Modelling & Software 23 (2008) 1438–1447 Contents lists available at ScienceDirect Environmental Modelling & Software journal homepage: www.elsevier.com/locate/envsoft Evaluating remotely sensed rainfall estimates using nonlinear mixed models and geographically weighted regression Y. Kamarianakis a, b, *, H. Feidas c, G. Kokolatos d, N. Chrysoulakis b, V. Karatzias b a University of Crete, Department of Applied Mathematics, Heraklion, Greece Institute of Applied and Computational Mathematics, Foundation for Research and Technology – Hellas, Vassilika Vouton, 711 10 Heraklion, Crete, Greece c Aristotle University of Thessaloniki, Department of Geology, Division of Meteorology–Climatology, Thessaloniki, Greece d University of the Aegean, Department of Geography, Mytilene, Greece b a r t i c l e i n f o a b s t r a c t Article history: Received 16 October 2007 Received in revised form 7 April 2008 Accepted 10 April 2008 Available online 13 June 2008 This article evaluates an infrared-based satellite algorithm for rainfall estimation, the Convective Stratiform technique, over Mediterranean. Unlike a large number of works that evaluate remotely sensed estimates concentrating on global measures of accuracy, this work examines the relationship between ground truth and satellit0e derived data in a local scale. Hence, we examine the fit of ground truth and remotely sensed data on a widely adopted probability distribution for rainfall totals – the mixed lognormal distribution – per measurement location. Moreover, we test for spatial nonstationarity in the relationship between in situ observed and satellite-estimated rainfall totals. The former investigation takes place via using recent algorithms that estimate nonlinear mixed models whereas the latter uses geographically weighted regression. Ó 2008 Elsevier Ltd. All rights reserved. Keywords: Rainfall estimation Remotely sensed estimations Zero inflated lognormal distribution Nonlinear mixed models Geographically weighted regression 1. Introduction Remotely sensed information from satellites, having a high spatial coverage and high temporal sampling, can play a key role in monitoring precipitation in flood-prone regions, sea precipitation, and other extreme weather events. Such information is of particular importance in areas with sparse rain gauge and precipitation radar networks from which reliable real time assessments of precipitation can be obtained. An area where the latter argument applies is the Mediterranean basin; its climate is known for its variety and variability, due to the surrounding orography, to the relative high temperature of the sea and to the different origin and physical characteristics of the air masses. There is a paucity of dense rain gauge networks or precipitation radar networks due to the presence of the Mediterranean Sea and to the complex topography. In addition, rain gauge observations are not generally available in real time in many regions, and missing reports and grossly erroneous reports occur in cases of extremely heavy rainfall and in regions of steep terrain. * Corresponding author. Regional Analysis Division, Institute of Applied and Computational Mathematics, Foundation for Research and Technology – Hellas, Vassilika Vouton, P.O. Box 1527, GR-711 10, Heraklion Crete, Greece. Tel.: þ30 2810 391771; fax: þ30 2810 391761. E-mail address: kamarian@iacm.forth.gr (Y. Kamarianakis). 1364-8152/$ – see front matter Ó 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.envsoft.2008.04.007 A number of techniques have been developed to indirectly estimate rainfall using visible (VIS) and infrared (IR) satellite data. Most of these methods are based on the notion that deep convective clouds might produce more rain and on operational findings, which show that regions of rainfall tend to be correlated with bright (VIS), cold (IR) clouds. When IR satellite data of high spatial and temporal resolution became available, precipitation was correlated to the cloud-top temperature (CTT) and the relationships used in the satellite techniques were redefined as a function of CTT. Numerous new precipitation estimation algorithms have been developed that use IR data as the only data input; i.e. the Arkin technique (Arkin, 1979), the Area Time Integral (ATI) technique (Doneauld et al., 1984; Lopez et al., 1989), the GOES Precipitation Index (GPI) (Arkin and Meisner, 1987), the Griffith–Woodley Technique (GWT) (Griffith et al., 1976, 1978, 1981), the Negri–Adler– Woodley Technique (NAWT) (Negri et al., 1984), the Convective Stratiform Technique (CST) (Adler and Negri, 1988) and the technique for Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) (Hsu et al., 1997; Hong et al., 2004). One disadvantage of VIS/IR techniques for estimating precipitation rates is that they are necessarily inferential (precipitation is inferred from cloud observations). The accuracy of such techniques is not adequately specified and the techniques are not readily transferable from location to location. In contrast, Y. Kamarianakis et al. / Environmental Modelling & Software 23 (2008) 1438–1447 microwave (MW) remote sensing techniques can give direct information on precipitation. Passive MW radiometry from satellite platforms has also the potential to estimate rainfall rates, because the microwave radiation contains direct signals of the cloud and precipitation microphysics. But, the lack of sufficient spatial and temporal coverage of the current polar-orbiting sensors for midlatitudes as well as the lack of active MW satellite sensors for this area constitutes a major disadvantage for studies of convective rain events. In addition, satellite-measured microwave radiances are also influenced by soil and vegetation effects over land surfaces (including mixed pixel effects in coastal areas). Despite the disadvantages of the rainfall estimates inferred by IR satellite data, interest in making such estimates has not faded. The reason for this is the short duration and high temporal variability of precipitation events and, therefore, the need for high temporal resolution in the observations. Thus, despite the availability of more accurate techniques using microwave (MW) satellite data, the geosynchronous observations (currently limited to IR wavelengths) become extremely important. However, the more physically based MW remote sensing techniques provide enough accurate precipitation estimates to be used as training data in a calibration process of an infrared satellite technique. Most of infrared techniques provide precipitation estimates that are technique-dependent and change as a function of the particular region. Because of varying rain characteristics according to different climate regimes, any developed infrared method must be evaluated against appropriate in situ measurements taken over the region of interest before any application is made. Attempts to this direction have been made for Sardinia by Marrocu et al. (1993), for the Korean peninsula by Oh et al. (2002) and for Eastern Africa by Menz (1997). The abovementioned articles, similar to the vast majority of works that evaluate remotely sensed estimates,1 calculate global measures of accuracy – like root mean square error – which are not informative in a local scale. That is, such measures cannot uncover the presence of spatial nonstationarity in the relationship of remotely sensed and ground truth data. However, a satellite technique might perform systematically better or worse in some geographic regions. Knowledge of such spatial nonstationarities is important for the optimal development of calibration algorithms. This article aims to present statistical modelling tools for the evaluation of satellite (or radar) rainfall estimates, which are informative in a local scale. For that purpose, we evaluate a satellite infrared technique for rainfall estimation, the Convective Stratiform Technique (CST), over Mediterranean. The study uses data on 12-h rainfall totals collected daily during the winter of 2004. Ground truth data are obtained from a set of 350 rain gauges and the corresponding remotely sensed estimates are derived from infrared Meteosat-7 satellite data. The dataset and the satellite algorithm are presented in detail in Sections 2 and 3, respectively. A widely adopted probability distribution for rainfall totals, the mixed (or zero inflated) lognormal distribution is fitted to both in situ rainfall measurements and CST estimates. The spatial variability of the parameters that characterize inflation, location and scale for this distribution and the spatial trends in observed and estimated mean rainfall totals are of particular interest. Section 4 explores these issues using recent advances from nonlinear mixed models. To test for spatial homogeneity in the relationship between ground truth and satellite-estimated data, we perform geographically weighted regression (GWR). The magnitude of the local 1 We cite, for instance, Feidas (2006), Feidas et al. (in press), Haiyan et al. (2006), Pinkerton (2000), Pinkerton and Aiken (1999), von Clarmann (2006) and Yoo (2002). 1439 intercepts and slopes derived from GWR, indicates the presence of geographic regions where the algorithm performs better or worse. The statistical significance of such differences in magnitude is examined via a Monte Carlo test in Section 5. As discussed in Section 6, the GWR models developed here can be also used for calibration of the CST estimates if the relationship between in situ measurements and satellite estimates is constant over time. 2. Remotely sensed and rain gauge data The dataset used in this study, comprises infrared data from Meteosat-7 satellite and rain gauge observations collected during a three-month period (October 1, 2004 to December 31, 2004). It is part of the datasets examined in Feidas et al. (in press) conducted a preliminary investigation and showed, using exploratory statistical methods and global measures of accuracy, that CST estimates performed worst during the winter in the study region. Meteosat-7 satellite images were available at half-hourly intervals during the period of interest for the Mediterranean basin. The infrared (IR) images cover the spectral area that ranges from 10.5 to 12.5 mm and the spatial resolution at the satellite subpoint is 5  5 km2 reaching 5.6  7 km2 at the eastern Mediterranean basin. Images have been pre-processed in EUMESAT (European Organisation for the Exploitation of Meteorogical Satellites): radiances were obtained from EUMESAT’s calibration data and converted to brightness temperatures through a lookup table; next, the exact brightness temperatures that correspond to the observed radiances were obtained via linear interpolation. Six-hourly precipitation totals recorded by a network of 350 weather stations (rain gauges) were employed as ground truth data. The rain gauge data were provided by the European Centre for Medium Weather Forecast (ECMWF). Fig. 1 depicts the geographical distribution of the weather stations. Preliminary tests indicated the presence of a few outlying observations that were removed from the dataset. To compare rainfall estimations from the satellite IR technique with rain gauge measurements, satellite derived rainfall maps were reprojected to the latitude–longitude coordinate system (WGS84) of the weather station coordinates (Fig. 2). The method follows the standard presented by EUMESAT (The Meteosat Archive – User Handbook EUM TD 06 – Issue 2.5). Rain depth was calculated by the precipitation algorithm for each pixel i of the image, and the corresponding values were added for 12, 24 and 48 successive satellite images corresponding to 6 h, 12 h and 24 h accumulated precipitation, according to: Si ¼ 12 X Dij (1) j¼1 where Si is the rain depth accumulated over each period (6 h, 12 h and 24 h) and assigned to each pixel i, and Dij is the estimated rain depth in the pixel i of the image j. Finally, 6 h, 12 h and 24 h accumulated precipitation (Gi) for each of the pluviometric stations processed, were compared to the rainfall ðSi Þ retrieved from satellite data for the corresponding pixel location. For each station, the total precipitation Si was calculated as the average value of 3  3 pixels, centred over each pixel corresponding to the station. That was done to reduce the effect of the error arisen by the uncertainty of 1 pixel in the co-location of the pluviometric stations on the Meteosat images. The full dataset of this study comprises cumulative 6-h rainfall estimates measured at midnight, 06.00, 12.00 and 18.00 C.E.T., 12-h cumulative rainfall estimates measured at 06.00 and 18.00 C.E.T., 24-h cumulative rainfall estimates measured at 18.00 C.E.T and the corresponding ground truth data from the rain gauges. For the purpose of this article we display results from the analysis of 1440 Y. Kamarianakis et al. / Environmental Modelling & Software 23 (2008) 1438–1447 Fig. 1. Geographical distribution of the rain gauges in the study area. the cumulative 12-h rainfall totals measured at 18.00 C.E.T. from 1, October to 31, December 2004. The results of Sections 4 and 5 are very similar for the six remaining pairs of variables. The average percentage of missing values per measurement location is 11.3% for the rain gauge data and 14.9% for the remotely sensed rainfall estimates. The histograms depicted at Fig. 3 indicate that during the study period, remotely sensed 12-h rainfall estimates measured at 18.00 C.E.T. are larger on average than the observations from the rain gauges. On the other hand, the values of standard deviation, skewness and kurtosis indicate that CST estimates do not capture some extreme rainfall events. 3. The calibrated CST algorithm In its original form, the CST method was developed for the study of convective precipitation systems in the tropical Atlantic and mid- and low-latitude continental areas (Adler and Negri, 1988). The technique locates, in an array of IR data, all local minima in the brightness temperature field. In the original study, digital radar and visible channel imagery was used to remove thin, non-raining cirrus from convective clouds and remaining brightness temperature minima were assumed to be centers of convective rain. A stratiform rain assignment, based on the mode temperature (the temperature that occurs most frequently) of each cloud system, completes the rain estimation. Since IR techniques, such as CST, can only indirectly estimate precipitation, the estimated values are technique-dependent. Fig. 2. Satellite derived rainfall estimations over the study area for 16 October 2004. Bright tones indicate high rainfall amounts. The positions of the rain gauge stations (red triangles) as well as the coastlines and country borders are overlaid. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Moreover, since retrieval techniques change as a function of the particular region, calibration of the CST parameters to the geoclimatic conditions of the study area is required before any application is made. In the present study, we used the version of the CST that has been recalibrated over the Mediterranean basin by Feidas et al. (2006) using coincident rainfall estimates from the TRMM PR for the wet period of a year (October 2003–March 2004). The goal of the recalibration was to determine parameters so that the CST will be able to reproduce the total rain volume and the PR-observed division between convective and stratiform rain for the Mediterranean region using infrared data from Meteosat-7 satellite. The CST was adjusted on a statistical basis, over the 6-month calibration period, without explicit constraints on instantaneous rain estimates. In summary, the recalibrated technique, dubbed as CST/ Met7, finds local IR minima, decides if they are convective features using a probability function, then assigns a rain area and rain rate based on the PR calibration parameters. A stratiform rain area is then defined, and a lower rain rate is assigned to those cloud pixels colder than the stratiform IR threshold and not previously assigned convective rain. As noted in the previous section, Feidas et al. (in press) evaluated the CST/Met7 technique using global measures of accuracy and observed that it achieves its worst performance during the winter. 4. A zero inflated lognormal model with random effects In Section 3, we displayed histograms of in situ rainfall measurements and CST/Met7 estimates. If our study was examining a small number of measurement sites, we could compare the corresponding histograms per location. However, given the large number of rain gauges examined in this study (350), a different approach is needed. In this section, we fit the most widely adopted probability model for rainfall, to both gauge and remotely sensed data per location, and examine the variability of the parameters p; m; s (see below) that characterize it. Density plots approximate histograms and here, instead of examining histograms, we examine densities. At this point we should emphasize that gauge data represent rainfall totals aggregated over time at a single spatial location, whereas the remotely sensed data are spatial averages over grid squares of size 3  3 pixels or 225 km2. In situ and remotely sensed data are of different spatial support and hence some of their features are not directly comparable. For the change of support problem and statistical methods that account for it, the reader is referred to Atkinson and Tate (2000), Gelfand et al. (2001), Gotway Y. Kamarianakis et al. / Environmental Modelling & Software 23 (2008) 1438–1447 1441 Fig. 3. Histograms of rain gauge estimated (top) and CST/Met7 retrieved (down) cumulative 12-h rainfall at 18.00 C.E.T. for the period of October 2004–December 2004. and Young (2002), Kyriakidis (2004) and Kyriakidis and Yoo (2005). Spatial averages have a lower probability of zero rain, and a lower variance when it is raining, than rainfall at a point (see for instance, Zhang et al., 1990). On the other hand, mean in situ versus remotely sensed rainfall levels should be close if CST data are unbiased. During the past 20 years a large number of studies aimed to determine both the spatial and temporal distribution of rainfall. Using data from the Global Atmospheric Research Program (GARP), Kedem and Chiu (1987) have shown that rainfall fits closely a lognormal distribution both spatially and temporally. Comparisons between the lognormal and gamma distributions performed by Kedem et al. (1990) have shown the lognormal distribution to fit the data better. However, the lognormal probability distribution is only valid for nonzero rainfall values. Further studies applied to rainfall data of different spatial supports, proposed a mixed model consisting of a lognormal distribution combined with a spike at zero (zero inflated lognormal distribution), see for instance Pavlopoulos and Kedem (1992), Berg and Chase (1992), Shimizu (1993), Kedem et al. (1994, 1997), Shoji and Kitaura (2006) and De Michele et al. (2008). In compliance with the abovementioned works, the analysis displayed below assumes that cumulative rainfall amounts per measurement location and the corresponding remotely sensed estimates follow the zero inflated lognormal distribution. Distributional heterogeneity amongst measurement locations is 1442 Y. Kamarianakis et al. / Environmental Modelling & Software 23 (2008) 1438–1447 Table 1 Maximum likelihood estimates for the location m, shape s, and inflation p parameters of the zero inflated lognormal distribution for the rain gauge data (R.G.) and the remotely sensed estimates (R.S.) R.G. R.S. m s p 0.401 (22.95) 0.193 (14.13) 1.578 (123.92) 1.6924 (150.49) 0.7473 (299.77) 0.611 (214.03) Table 3 Maximum likelihood estimates for model 3 for the rain gauge data (R.G.) and the remotely sensed estimates (R.S.) R.G. R.S. m bs ss p 0.416 (25) 0.193 (14.13) 1.553 (84.16) 1.6924 (150.48) 0.060 (7.35) 1.037E7 (0.01) 0.7473 (299.77) 0.611 (214.03) Note: t-statistics in parentheses. Note: t-statistics in parentheses. accommodated via allowing the parameters that characterize the zero inflated lognormal distribution to vary locally. Let X denote the cumulative 12 h rainfall observed daily at each measurement location. X takes the value 0 (no rain) with positive probability p (0 < p < 1); otherwise, conditional on positive rainfall, X is lognormally distributed with distribution function Lðxjm; sÞ where m and s denote the location and shape parameter, respectively. The distribution function of X is therefore given by PðX  xÞ ¼ GðxÞ ¼ pHðxÞ þ ð1  pÞLðxjm; sÞ; x0 (2) where H(x) stands for the step function, H(x) ¼ 0 for x < 0 and H(x) ¼ 1 for x  0. From Eq. (2) the mean rainfall amount for a specific location s is given by EðXÞ ¼ ð1  pÞEðXjX > 0Þ or 2 EðXÞ ¼ ð1  pÞemþð1=2Þs (3) and the variance at a specific location is given by  2 2 varðXÞ ¼ ð1  pÞe2mþs es  ð1  pÞ (4) At the first stage of the analysis, we assume no spatial variation in the parameters p; m; s that characterize the mixed distribution function in Eq. (2). That is we assume a zero inflated lognormal distribution with fixed parameters across measurement locations (model 1). Maximum likelihood estimates for the inflation, location and shape parameters for the rain gauge data and the remotely sensed estimates are depicted in Table 1. One observes that the estimate of m that is based on the remotely sensed data, deviates significantly from the corresponding estimate that is based on the ground truth data. The estimate of p from the remotely sensed data is smaller than the one that corresponds to ground truth, as expected beforehand given the difference in spatial supports. The t-statistics indicate that all estimates are statistically significant at the 0.01 level. Based on the estimates in Table 1 and Eq. (3), one obtains that the estimate for the mean of the zero inflated lognormal distribution that characterizes rain gauge data is equal to 0.589 whereas for the remotely sensed data the corresponding estimate equals 1.342. By applying the delta method (Cox, 1998) the 95% confidence intervals for these estimates are (0.557, 0.622) and (1.278, 1.407), respectively. The distance between the two confidence intervals indicates that CST/Met7 estimates are biased and a calibration procedure should be applied to the remotely sensed estimates, as for example in Brown et al. (2001) and Clarke et al. (2006). From Eq. (4), the estimate of the variance of the mixed distribution in Eq. (2) is equal to 16.316 for the rain gauge data and 79.454 for the Table 2 Maximum likelihood estimates for model 2 for the rain gauge data (R.G.) and the remotely sensed estimates (R.S.) R.G. R.S. bm sm s p 0.401 (22.95) 0.193 (14.13) 1.175E7 (0.01) 1.102E7 (0.02) 1.580 (123.92) 1.6925 (150.48) 0.7473 (299.77) 0.611 (214.05) Note: t-statistics in parentheses. remotely sensed data; the 95% confidence intervals are (13.467, 19.165) and (66.683, 92.224), respectively. The mean of the conditional distribution function for ground truth data, given that it is raining, equals 2.326 and its variance equals 59.837. For the remotely sensed data the corresponding mean equals 3.134 and the variance is equal to 197.101. The fact that the variance of CST/Met7 data is larger than the corresponding in situ, given positive rainfall amounts, contradicts what one would expect given the different spatial supports and further indicates that CST/Met7 estimates are biased and should be recalibrated before they are applied for predictive purposes during the winter. To explore spatial heterogeneity, we allow m to vary between measurement locations according to a Gaussian distribution Nðbm ; s2m Þ (model 2). The conditional distribution given the random effects, is still the zero inflated lognormal. This is essentially a nonlinear mixed model in which both fixed ðs; pÞ, and random ðmÞ parameters enter nonlinearly. We fit this model by numerically maximizing an approximation to the marginal likelihood – that is the likelihood integrated over the random effects. The integral approximation is performed via the adaptive quadrature method as documented in Pinheiro and Bates (1995). This approximation uses the empirical Bayes estimates of the random effects (analogous to empirical BLUPs in linear mixed models) as the central point for the quadrature and updates them for every iteration. The resulting marginal likelihood can be maximized using a variety of alternative optimization techniques; here we adopted the dual quasi-Newton algorithm. The estimation procedure was implemented in SAS, via a modification of the code provided in Littell et al. (2006, Chapter 15) for a zero inflated Poisson distribution. Table 2 presents the fixed and random parameter estimates for model 2. We observe that the estimate for the mean of the Gaussian distribution equals the corresponding (fixed) estimate at Table 1. The approximate t-statistics for the variance of the random parameter m and the Akaike information criterion (AIC) in Table 6, indicate that m does not vary significantly across measurement locations, for both the rain gauge data and the remotely sensed estimates. Table 3 presents the maximum likelihood estimates of the nonlinear mixed model in which s is distributed according to a Gaussian distribution Nðbs ; s2s Þ across measurement locations and ðm; pÞ are considered fixed (model 3). In this case both the approximate t-statistics for the variance of s and the AIC indicate significant variability of s over measurement locations for the rain gauge data. However, this is not true for the remotely sensed data. Thus, for the CST/Met7 estimates, we observe that the mean and variance of positive rainfall totals do not vary significantly across measurement locations. To put it otherwise, given that rainfall occurred, the distributions of rainfall totals do not vary significantly across measurement sites. This contrasts what one observes for the Table 4 Maximum likelihood estimates for model 4 for the rain gauge data (R.G.) and the remotely sensed estimates (R.S.) R.G. R.S. m s bp sp 0.401 (22.95) 0.193 (14.13) 1.578 (123.92) 1.6924 (150.49) 0.7423 (141) 0.611 (150.8) 0.008 (7.07) 0.003 (6.81) Note: t-statistics in parentheses. 1443 Y. Kamarianakis et al. / Environmental Modelling & Software 23 (2008) 1438–1447 Table 5 Maximum likelihood estimates for model 5 for the rain gauge data R.G. m s ss bp sp r 0.416 (25.02) 1.539 (81.83) 0.065 (7.14) 0.7424 (141.28) 0.008 (7.13) 0.3923 (4.46) Note: t-statistics in parentheses. 2 mi ¼ ð1  pi Þemi þð1=2Þsi (5) denote the expected rainfall amount at measurement location i. One may combine the estimates pi ; m; si obtained from model 5 for the rain gauge data, with Eq. (3), to obtain the rain gauge estimate for expected rainfall, mi , at location i. Similarly, (3) can be combined ~i ; m ~ obtained from model 4 for the remotely ~; s with the estimates p ~ i , the CST/Met7 estimate for expected sensed data, to produce m rainfall at measurement location i. A simple way to exploit the explanatory power of geographical coordinates is to formulate a regression model that includes them as explanatory variables. Thus, we formulate two linear regression models which aim to capture linear spatial trends: ˇ ˇ ˇ ˇ Table 6 The Akaike information criterion for models 1–5 for the rain gauge and the remotely sensed data Model Model Model Model Model 1 2 3 4 5 AIC R.G. AIC R.S. 72,769 72,771 72,646 72,408 72,272 108,730 108,732 108,732 108,623 Fig. 4. Scatterplot of the empirical Bayes estimates for the random effects that correspond to p (x-axis) and s (y-axis). ˇ rain gauge measurements, where given positive rainfall levels, the mean and the variance differ significantly across measurement locations. Table 4 presents the maximum likelihood estimates of the nonlinear mixed model in which the probability of no rain, p is distributed across measurement locations according to a Gaussian distribution Nðbp ; s2p Þ and ðm; sÞ are considered fixed (model 4). In this case both the approximate t-statistics for the variance of p and the AIC indicate significant variability of p over measurement locations for both the rain gauge data and the remotely sensed estimates. The magnitude of the variability is much smaller in the remotely sensed measurements as expected from the difference in spatial supports. Tables 1–4 and Table 6 indicate that model 4 is adequate for the remotely sensed data. To capture the significant variability of both s and p for rain gauge data, we fit a model in which only m is considered fixed across measurement locations and s, p are distributed according to a two-dimensional Gaussian distribution with correlation r (model 5, Table 5). The values of the AIC depicted in Table 6, reveal that this model improves significantly over the previous models 3 and 4. s and p appear to be negatively correlated which agrees with intuition: locations with higher probability of no rainfall tend to have lower mean rainfall levels which vary less (Fig. 4). At the final stage of this analysis, we examine how in situ and CST/Met7 estimated rainfall totals vary per measurement location as a function of geographical coordinates. For that purpose, we use the expansion method (Casetti, 1972, 1997; Jones and Casetti, 1992), which allows the parameter estimates to vary locally by making the parameters functions of other attributes. Since in our study the parameters are functions of location, a spatial expansion model results, in which trends in parameter estimates over space can be measured (Fotheringham and Pitts, 1995). Let pi ; mi ; si denote the true parameters of the mixed distribution function (2) per measurement location i. Let also mi ¼ a0 þ a1  lon þ a2  lat þ 31i (6) ~ i ¼ b0 þ b1  lon þ b2  lat þ 32i m (7) where lon and lat stand, respectively, for longitude and latitude in degrees, divided by 1000 so that the estimates of the parameters a0 ; a1 ; a2 ; b0 ; b1 ; b2 are in an easily interpretable scale and 31i wNð0; s21 Þ, 32i wNð0; s22 Þ are random errors that are normally distributed. Table 7 presents least squares estimates and model fit statistics for (6) and (7). The goodness of fit statistic (R2) indicates that 52.5% of the variability of remotely sensed estimates of mean rainfall levels, is explained by the simple linear model (7). On the other hand, only 10.7% of the variability of rain gauge-derived mean rainfall estimates is explained by model (6). The above suggest that CST/Met7 estimates display less spatial variation than rain gauge data as expected due to the difference in spatial supports. The remotely sensed estimates should capture the spatial trends followed by the ground truth data. Since the p-value for the t-statistic of a2 is greater than 0.05, one may observe that, in contrast with CST/Met7 estimates, rain gauge data do not display any significant linear dependency to the north–south direction. On the other hand, the magnitude of b2 suggests that satellite-estimated mean rainfall levels increase fast as the latitude increases. Both estimates for rainfall means decrease as the longitude increases and this is in accordance with intuition. Finally, apart from the difference in slopes, one may observe a significant difference with regard to the intercepts: the intercept term b0 is not significant at the 0.01 level of significance for the satellite-estimated means whereas a0 is significant for the model of the rain gauge data with a p-value which is less than 0.0001. Table 7 Least squares estimates for models (6) and (7) for the rain gauge data (R.G.) and the remotely sensed estimates (R.S.) a0 a1 a2 R2 R.G. 0.836 (43.88) 13.127 (63.51) 0.430 (1.02) 0.107 b0 b1 b2 R2 R.S. 0.023 (2.34) 8.352 (79.09) 35.551 (164.51) 0.5246 Note: t-statistics in parentheses. 1444 Y. Kamarianakis et al. / Environmental Modelling & Software 23 (2008) 1438–1447 Table 8 Global regression estimates Table 10 ANOVA table for GWR fit Parameter Estimate Std Error t-value Source SS DF MS F Intercept Slope 0.517 0.329 0.035 0.018 14.906 17.817 OLS residuals GWR improvement GWR residuals 10,595.1 1045.7 9549.4 2 88.16 4526.84 11.861 2.110 5.623 Note: residual sum of squares, 10,595; s, 1.515; AIC, 16,943.58; R2, 0.064. 5. Testing spatial nonstationarity via geographically weighted regression Ideally, remotely sensed estimates would have been unbiased. To put it otherwise, the relationship between rain gauge measurements and satellite rainfall estimates at each measurement location would have been expressed by regression lines with intercepts that are close to 0, slopes that are close to 1 and small variances for the error terms. Feidas et al. (in press) performed a regression (on the pooled data) between CST/Met7 estimates and rain gauge observations using the dataset of this study. Their findings suggest that the above statement is far from being true as the estimated intercept was positive and statistically significant and the estimated slope was significantly less than 1. Ideally, the spatial variation of the coefficients of the regression lines per measurement site should have been negligible: there would have been no regions (comprised by neighboring measurement locations) where the CST/Met7 algorithm performs exceptionally better or worse than others. In this section, we examine this hypothesis by fitting a regression model and testing the null hypothesis that the linear relationship it implies, is fixed over space. In other words we test for the presence of spatial nonstationarity in the relationship between ground truth data and remotely sensed estimates; this examination is expected to reveal geographic regions where the remotely sensed data perform better. The statistical tool used for that purpose is Geographically Weighted Regression (GWR). The GWR model is generally formulated as yi ¼ aðui ; vi Þ þ X xij bj ðui ; vi Þ þ 3i (8) j where ðui ; vi Þ is the location in geographical space of the ith observation and subscript j refers to different explanatory variables; for instance for a model that includes an intercept and one explanatory variable as is the case below, j ¼ 1. The setting is similar to the ordinary regression model (e.g. 3i is normally distributed with zero mean), the single difference being that the GWR model allows for different regression coefficients aðui ; vi Þ; bj ðui ; vi Þ per measurement location. In fact, aðui ; vi Þ; bj ðui ; vi Þ are considered as realizations of the continuous functions aðu; vÞ; bj ðu; vÞ at point i. The method proposed to calibrate Eq. (8) is basically a moving kernel window approach (Fotheringham et al., 2002); to estimate að:; :Þ; bj ð:; :Þ at the location of the ith observation, one carries out a weighted regression where each observation is given a weight wki , if the observations are indexed by the variable k. wki should be a monotone decreasing function of the distance between observations i and k so that observations farther from the point at which the model is calibrated weight less than observations closer to that point. Thus, implicitly one assumes that observed data near to location i have more of an influence in the estimation of the að:; :Þ; bj ð:; :Þ than do data located farther from i. For an extensive Table 9 GWR parameter five-number summary Parameter Minimum Lwr quartile Median Upr quartile Maximum Intercept Slope 0.556 0.1 0.143 0.271 0.452 0.379 0.789 0.467 1.322 0.784 Note: residual sum of squares, 9549; s, 1.452; AIC, 16,643.84; R2, 0.157. presentation of the GWR model the reader is referred to Fotheringham et al. (2002). In this study, GWR was applied to approximately 12% of the initial dataset that corresponds to nonzero rainfall observations estimated by nonzero remotely sensed measurements. Similar to Brown et al. (2001), in modelling these data, we make the fundamental assumption that the relationship between gauge and remotely sensed measurements can be described by a power law: bðui ;vi Þ Gi;t ¼ aðui ; vi ÞRi;t ei;t (9) where Gi,t is the gauge measurement at measurement location ðui ; vi Þ and time t, Ri,t is the corresponding CST/Met7 estimate, ei,t is a multiplicative error term and aðui ; vi Þ; bðui ; vi Þ are location specific coefficients that need to be estimated. It is straightforward to observe that after taking logarithms in Eq. (9) one obtains a linear relationship2 and the setting is the same as in Eq. (8). Moreover the probabilistic setting complies with the one adopted in the previous section, with a special treatment of zero values since taking logarithms in Eq. (9) precludes the inclusion of zero values for either the CST/Met7 or the gauge measurements. A preliminary data investigation revealed that remotely sensed estimates give a very reliable indication that rain is not currently falling, i.e. Ri,t ¼ 0 implies that Gi,t ¼ 0 with high probability (approximately 0.8). It is intuitively clear from the untransformed power law model (9) that these zero values convey no information about the coefficients aðui ; vi Þ; bðui ; vi Þ and can therefore be treated as missing for inference about aðui ; vi Þ and bðui ; vi Þ. Our setting is different to the one presented in Brown et al. (2001) in two aspects: first, aðui ; vi Þ and bðui ; vi Þ are not functions of time in our case; second, neighboring relationships between measurement locations are taken into account in a way that does not depend on a definition of a specific space–time covariance structure. The GWR model was calibrated via using an adaptive bi-square kernel. In particular, the weight of the jth data point at regression point i is given by:  2 wij ¼ 1  dij b wij ¼ 0 when dij  b when dij > b dij represents the distance between the jth and the ith data points and b is the bandwidth. One may think of the bandwidth as a smoothing parameter, with larger bandwidths causing greater smoothing. The effects of alternative choices for the functional form of the kernel diminish with sample size, see Fotheringham et al. (2002). Two techniques were applied for optimal bandwidth selection: AIC minimisation and cross-validation. Estimation results were practically identical for the two cases. Table 8 depicts the fit of a simple regression model with constant coefficients across measurement locations. The coefficient of determination (R2) indicates that the model fits very poorly; given the amount of data, the model is statistically significant but it 2 Preliminary tests applied to data obtained at different locations, indicated that the null hypothesis of a linear relationship between the logarithms of ground truth data and CST/Met7 estimates could not be rejected. Y. Kamarianakis et al. / Environmental Modelling & Software 23 (2008) 1438–1447 1445 Fig. 5. Voronoi map of the local intercepts of the GWR model. Fig. 6. Voronoi map of the local slopes of the GWR model. explains less than 10% of the variability of rain gauge measurements. In Table 9, we present summary statistics for the GWR estimates; from this table one may get a ‘feel’ for the degree of spatial nonstationarity in the relationship, by comparing the range of local parameter estimates with a confidence interval around the global estimate of the equivalent parameter. The reduction in the AIC from the global model suggests that the local model is better even accounting for differences in degrees of freedom. Table 10 presents the ANOVA table that tests the null hypothesis that the GWR model represents no improvement over a global model. As expected from the reduction in the AIC, the F test rejects the null hypothesis and suggests that the GWR model is a significant improvement on the global model. The significance of the spatial variability of the local parameter estimates was examined more formally via a Monte Carlo test3 (see Brunsdon et al., 1998, 1999a,b). The test indicated significant spatial variation for both the intercept and the slope as both p-values were less than 0.0001. This result suggests the presence of spatial nonstationarity in the relationship between satellite estimates and ground truth data. Apparently, the algorithm performs better in 3 All estimations and tests were performed with the GWR 3 software. some geographical regions and given the statistical significance of the test, it is unlikely that these differences emerged in the study region by chance. Figs. 5 and 6 depict Voronoi maps4 of the local estimates for the intercepts and slopes, respectively. One may observe large values for the intercepts and close to zero values for the slopes indicating a bad performance of the satellite technique, mostly in southern regions of the study area. On the other hand it appears that satellite estimates perform satisfactory in some regions (sets of neighboring locations) located in northern regions of the study area. The Monte Carlo test for spatial nonstationarity indicated that these spatial clusters displaying good performance were unlikely to emerge by chance. Finally, Fig. 7 indicates that in most of the southern and eastern part of the study area, the CST/Met7 estimates failed to capture the variability of ground truth data. This may happen due to a variety of reasons that need to be examined before a recalibrating procedure model is applied. 4 The Voronoi diagram of a collection of geometric objects is a partition of space into cells, each of which consists of the points closer to one particular object than to any others. 1446 Y. Kamarianakis et al. / Environmental Modelling & Software 23 (2008) 1438–1447 Fig. 7. Voronoi map of the approximate t-statistics for the local slopes of the GWR model. 6. Concluding remarks The wide range of applications of remote sensing techniques resulted in a large number of studies that aimed at their validation against in situ data. The vast majority of such works presented simple exploratory graphs, correlation coefficients and global measures of accuracy like mean square error and mean absolute error. Such tools are not informative in a local scale though; that is they cannot reveal geographic regions where a satellite algorithm performs significantly better or worse. This work validated an infrared-based satellite algorithm for rainfall estimation, the Convective Stratiform Technique, focusing on the spatial variability of the parameters of the probability distribution that characterizes satellite estimates and ground truth data. For that purpose, nonlinear mixed models have been developed, based on a widely adopted probability model for rainfall totals: the mixed (or zero inflated) lognormal distribution. Such models – which, to our knowledge, appear for the first time in the literature – can be used as exploratory tools for validating satellite or radar based techniques when the number of in situ locations is large and thus it is practically impossible to conduct exploratory statistical analysis and validation per measurement location. A previous article (Feidas et al., in press) used global measures of accuracy, which suggested that during the study period examined here, the CST did not display a satisfactory predictive performance in the study region. The analysis presented in Section 4 extended previous work and indicated that, in contrast to ground truth data, the distributions of positive rainfall amounts for remotely sensed estimates do not vary significantly in space. This is in contrast with intuition and combined with the bias that satellite estimates display and the different spatial trends for ground truth and satellite estimates, suggest that the technique needs to be recalibrated before it is applied for predictive purposes. But how should this calibration procedure be performed? Should it focus in specific regions or should a single model be applied to the whole study region? The above research questions suggest that a test of the null hypothesis of spatial stationarity in the relationship between ground truth data and remotely sensed estimates should be applied. For that purpose, we performed a test on the local intercepts and slopes of a geographically weighted regression model. Results indicated the existence of three regions in the northern part of the study region that the satellite algorithm performed significantly better than the rest of the study region. The GWR model presented here, can be used for calibration if the relationship between rain gauge data and satellite estimates is constant over time. Otherwise, similar to Brown et al. (2001), instead of as, bs in Eq. (9) one should estimate temporally varying coefficients ast, bst per measurement location. Such a space–time model has an advantage compared to the one presented in Brown et al. (2001) as it does not involve any assumptions with regard to the functional form of the space-time covariance function of the data. Unfortunately, given the relatively small temporal dimension of our dataset, we were unable to test if the local intercepts and slopes in (9) are functions of time. This, and the development of the associated space–time models will hopefully be presented in a future publication. Acknowledgements This work was conducted in the frame of the SATERM (A Satellite Technique for Estimating Rainfall over the Mediterranean basin) project funded by the ‘‘Competitiveness’’, Action 4.3.6.1d ‘‘Cooperation with R&T institutions in non-European countries’’ of the General Secretary of Research and Technology, Ministry of Development of Greece. References Adler, R.F., Negri, A.J., 1988. A satellite infrared technique to estimate tropical convective and stratiform rainfall. Journal of Applied Meteorology 27, 30–51. Arkin, P.A., 1979. The relationship between fractional coverage of high clouds and rainfall accumulation during GATE over B-scale array. Monthly Weather Review 107, 1382–1387. Arkin, P.A., Meisner, B.N., 1987. The relationship between large-scale convective rainfall and cold cloud over the Western Hemisphere during 1982–1984. Monthly Weather Review 115, 51–74. Atkinson, P.M., Tate, N.J., 2000. Spatial scale problems and geostatistical solutions: a review. Professional Geographer 52, 607–623. Berg, W., Chase, R., 1992. Determination of mean rainfall from the special sensor Microwave/Imager (SSM/I) using a mixed lognormal distribution. Journal of Atmospheric and Oceanic Technology 9, 129–141. Brown, P.E., Diggle, P.J., Lord, M.E., Young, P.C., 2001. Space–time calibration of radar rainfall data. Journal of the Royal Statistical Society Series C: Applied Statistics 50, 221–241. Brunsdon, C., Fotheringham, A.S., Charlton, M., 1998. Geographically weighted regression-modelling spatial nonstationarity. The Statistician 47, 431–443. Brunsdon, C., Fotheringham, A.S., Charlton, M., 1999a. Some notes on parametric significance tests for geographically weighted regression. Journal of Regional Science 39, 497–524. Brunsdon, C., Aitkin, M., Fotheringham, A.S., Charlton, M., 1999b. A comparison of random coefficient modelling and geographically weighted regression for spatially nonstationary regression problems. Geographical and Environmental Modelling 3, 47–62. Casetti, E., 1972. Generating models by the expansion method: applications to geographic research. Geographical Analysis 4, 81–91. Casetti, E., 1997. The expansion method, mathematical modeling, and spatial econometrics. International Regional Science Review 20, 9–32. Clarke, E.D., Speirs, D.C., Heath, M.R., Wood, S.N., Gurney, W.S.C., Holmes, S.J., 2006. Calibrating remotely sensed chlorophyll-a data by using penaized regression Y. Kamarianakis et al. / Environmental Modelling & Software 23 (2008) 1438–1447 splines. Journal of the Royal Statistical Society Series C: Applied Statistics 55, 331–353. Cox, C., 1998. Delta Method. In: Armitage, P., Colton, T. (Eds.), Encyclopedia of Biostatistics. John Wiley, New York, pp. 1125–1127. De Michele, C., Vezzoli, R., Pavlopoulos, H., Scholes, R.J., 2008. A minimal model of soil water–vegetation interactions forced by stochastic rainfall in water-limited ecosystems. Ecological Modelling 212, 397–407. Doneauld, A.A., Ionesku-Niscov, S., Priegnitz, D.L., Smith, P.L., 1984. The area time integral as an indicator for convective rain volume. Journal of Climate and Applied Meteorology 23, 555–561. Feidas, H., 2006. Validating three infrared-based rainfall retrieval algorithms for intense convective activity over Greece. International Journal of Remote Sensing 27, 2787–2812. Feidas, H., Kokolatos, G., Negri, A., Manyin, M., Chrysoulakis, N., 2006. A TRMMcalibrated infrared technique for rainfall estimation: application on rain events over eastern Mediterranean. Advances in Geosciences 7, 181–188. Feidas, H., Kokolatos, G., Negri, A., Manyin, M., Chrysoulakis, N., Kamarianakis, Y., A validation of an infrared-based satellite algorithm to estimate accumulated rainfall over the Mediterranean basin. Theoretical and Applied Climatology, in press, doi: 10.1007/s00704-007-0360-y. Fotheringham, A.S., Brunsdon, C., Charlton, M., 2002. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Wiley. Fotheringham, A.S., Pitts, T.C., 1995. Directional variation in distance-decay. Environment and Planning A 27, 715–729. Gelfand, A.E., Zhu, L., Carlin, B.P., 2001. On the change of support problem for spatiotemporal data. Biostatistics 2, 31–45. Gotway, C.A., Young, L.J., 2002. Combining incompatible spatial data. Journal of the American Statistical Association 97, 632–648. Griffith, C.G., Woodley, W.L., Browner, S., Teijeiro, J., Maier, M., Martin, D.W., Stout, J., Sikdar, D.N.,1976. Rainfall estimation from geosynchronous satellite imagery during daylight hours. NOAA Technical Report ERC 356-WMPO 7, Washington, D.C., USA. Griffith, C.G., Grube, P.G., Martin, D.W., Stout, J., Sikdar, D.N., 1978. Rainfall estimation from geosynchronous satellite imagery, visible and infrared studies. Monthly Weather Review 106, 1153–1171. Griffith, C.G., Augustine, J.A., Woodley, W.L., 1981. Satellite rain estimation in the U.S. High Plains. Journal of Applied Meteorology 20, 53–66. Haiyan, J., Black, P.J., Ziptser, E.J., Marks, F.D., Uhlhorn, E.W., 2006. Validation of rain rate estimation in hurricanes from the stepped frequency microwave radiometer: algorithm correction and error analysis. Journal of the Atmospheric Sciences 63, 252–267. Hong, Y., Hsu, K., Sorooshian, S., Gao, X., 2004. Precipitation estimation from remotely sensed imagery using an artificial neural network cloud classification system. Journal of Applied Meteorology 43, 1834–1852. Hsu, K., Gao, X., Sorooshian, S., Gupta, H.V., 1997. Precipitation estimation from remotely sensed imagery using an artificial neural network. Journal of Applied Meteorology 36, 1176–1190. Jones III, J.P., Casetti, E., 1992. Applications of the expansion method. Routledge, London. Kedem, B., Chiu, L.S., 1987. On the lognormality of rain rate. Proceedings of the National Academy of Sciences USA 84, 901–905. 1447 Kedem, B., Chiu, L.S., North, G.R., 1990. Estimation of mean rain rate: application to satellite observations. Journal of Geophysical Research 95, 1965–1972. Kedem, B., Pavlopoulos, H., Guan, X., Short, D., 1994. A probability distribution model for rain rate. Journal of Applied Meteorology 33, 1486–1493. Kedem, B., Pfeiffer, R., Short, D.A., 1997. Variability of space–time mean rain rate. Journal of Applied Meteorology 36, 443–451. Kyriakidis, P.C., 2004. A geostatistical framework for area-to-point spatial interpolation. Geographical Analysis 36, 259–289. Kyriakidis, P.C., Yoo, E., 2005. Geostatistical prediction and simulation of point values from areal data. Geographical Analysis 37, 124–151. Littell, R.C., Milliken, G.A., Stroup, W.W., Wolfinger, R.D., Schabenberger, O., 2006. SAS for Mixed Models, second ed. SAS Institute Inc. Lopez, R.E., Atlas, D., Rosenfeld, D., Thomas, J.L., Blanchard, D.O., Holle, R.L., 1989. Estimation of rainfall using the radar echo area time integral. Journal of Applied Meteorology 28, 1162–1175. Marrocu, M., Pompei, A., Dalu, G., Liberti, G.L., Negri, A., 1993. Precipitation estimation over Sardinia from satellite infrared data. International Journal of Remote Sensing 14, 115–134. Menz, G., 1997. Regionalization of precipitation models in east Africa using Meteosat data. International Journal Climatology 17, 1011–1027. Negri, A.J., Adler, R.F., Wetzel, P.J., 1984. Satellite rain estimation: an analysis of the Griffith–Woodley technique. Journal of Climate and Applied Meteorology 23, 102–116. Oh, H.J., Sohn, B.J., Smith, E.A., Turk, F.J., Ae-suk, S., Chung, H.S., 2002. Validating infrared-based rainfall retrieval algorithms with 1-minute spatially dense raingauge measurements over Korean peninsula. Meteorology and Atmospheric Physics 81, 273–287. Pavlopoulos, H., Kedem, B., 1992. Stochastic modelling of rain rate processes: a diffusion model. Stochastic Models 8, 397–420. Pinheiro, J.C., Bates, D.M., 1995. Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics 4, 12–35. Pinkerton, M.H., 2000. Validating remotely sensed ocean colour data using a moored databuoy. University of Southampton, Faculty of Science, School of Ocean and Earth Science, Ph.D. Thesis, 187 pp. Pinkerton, M.H., Aiken, J., 1999. Calibration and validation of remotely sensed observations of ocean color from a Moored data buoy. Journal of Atmospheric and Oceanic Technology 16, 915–923. Shimizu, K., 1993. A bivariate lognormal distribution with an analysis of rainfall data. Journal of Applied Meteorology 32, 161–171. Shoji, T., Kitaura, H., 2006. Statistical and geostatistical analysis of rainfall in central Japan. Computer and Geosciences 32, 1007–1024. von Clarmann, T., 2006. Validation of remotely sensed profiles of atmospheric state variables: strategies and terminology. Atmospheric Chemistry and Physics 6, 4311–4320. Yoo, C., 2002. A ground validation problem of remotely sensed soil moisture data. Stochastic Environmental Research and Risk Assessment 16, 175–187. Zhang, R., Warrick, A.W., Myers, D.E., 1990. Variance as a function of sample support size. Mathematical Geology 22, 107–122.