[go: up one dir, main page]

Academia.eduAcademia.edu
Spatio-temporal analysis of NDFF records: generating dynamic distribution maps of flora and fauna species T. Hengl1 , H. Sierdsema2 1 Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands Telephone: +31-(0)20-5257379, Fax: +39-(0)20-5257431 Email: T.Hengl@uva.nl 2 SOVON Dutch Centre for Field Ornithology, Rijksstraatweg 178, 6573 DG Beek-Ubbergen, The Netherlands Telephone: +31-(0)24-6848145, Fax: +31-(0)24-6848122 Email: Henk.Sierdsema@sovon.nl Abstract. NDFF is the Dutch National Database on Flora and Fauna database that contains over 20 millions of records for the period of >40 years, and that covers almost all major taxa occurring in the Netherlands. In this article, two representative species (Rana lessonae and Microtus oeconomus) were analyzed to produce maps of potential and actual species’ distribution. The Genetic Algorithm for Rule Set Production (GARP) as implemented in openModeller plugin has been used to generate the maps of potential spreading; GLM-kriging with a log link function has been used to generate maps of actual spreading for the representative species. A large repository of environmental maps (500 m resolution) has been used as environmental predictors. These preliminary result show that both GARP and GLM-kriging provide informative maps that are of potential use to decision makers. 1 I NTRODUCTION Species’ Distribution Modelling aims at explaining species’ distribution in both space and time by using field records of species’ occurrence metrics. A Species Distribution Model (SDM) can be defined as an analytical algorithm that predicts either actual or potential distribution of a species, given field observations, auxiliary maps, as well as expert knowledge. The preferred spatial prediction techniques used in biogeography are: various type of Generalized Linear Models (GLMs), classification and regression trees (CART), Ecological Niche Factor Analysis (ENFA), Genetic Algorithms (GA), point pattern analysis algorithms, maximum entropy-based techniques and similar [8]. The type of model that we can use depends largely on type of occurrence records we work with — presence-only records; presence/absence records; counts and/or actual measurements of species’ attributes. Methods such as ENFA or point pattern analysis are typically applied to presence-only records. Although statistical theory for various SDM’s exist for quite some time, it was not until recently that the tools to generate species’ distribution maps have become accessible to the wider research community. This progress is largely facilitated by the increasing development of R environment for statistical computing, and its numerous Environmetrics and Spatial Analysis packages [5, 2]. Among many, probably the most known packages for spatial analysis of occurrence records are the adehabitat package for habitat analysis for animals; BiodiversityR for community ecology analysis; spatstat for analysis of point patterns; and gstat for geostatistical analysis. In addition to the R packages for the analysis of occurrence records, a group of researchers from the Reference Center on Environmental Information in Campinas, SP, have developed openModeller — an open source plugin that contains an extensive Niche modeling library [7]. Although openModeller is not available as a R package, it can be easily controlled from R — a model can be run using a simple text configuration files that specify which input datasets to use. In this article we test R-based SDM tools for analysis of spatio-temporal records of species’ occurrences. We use records from the Dutch National Database on Flora and Fauna (NDFF) database that contains over 20 millions of records for the period of >20 years, and that covers almost all major taxa occurring in the Netherlands. We focus on six representative species, assuming that a generic workflows can be developed to analyze majority of species recorded in the Netherlands in a semi-automated way. 2 M ETHODS AND MATERIALS We implement two groups of methods to generate distribution maps from the NDFF records: (1) the GARP (Genetic Algorithm for Rule Set Production) technique that predicts potential spreading of species based on the feature space analysis; and (2) the GLMbased regression-kriging method that produces maps of actual spreading of target species, based on both feature and geographical space analysis. We focus on two representative continental species of high importance for biodiversity conservation in the Netherlands: Rana lessonae (pool frog) and Microtus oeconomus (root vole). The environmental predictors of interest for SDM are typically: (1) species’ density at previous year (or in further past); (2) julian day (season); (3) climatic variables; (4) soil and geology variables; (5) forest canopy, biomass and land cover; (6) proximity to fresh water supply; (7) proximity to food supply; (8) proximity to natural shelters / nesting locations; (9) proximity to human settlement / human impact variables; (10) species distribution of the competing species, and similar. In this case study, we use a large repository (32 maps in total) of geographical layers ranging from: land cover maps, proximity and density maps, climatic, soil and vegetation variables (remote sensing-based), soil-hydrological parameters and similar. 2.1 Modelling potential distribution (habitat mapping) For prediction of the potential spreading, we use the GARP algorithm as implemented in the openModeller; more specifically, we use the GARP with Best Subsets method [1]. GARP has been evaluated as one of the most accurate SDM among competitors by [8]. The recent implementation of GARP with Best Subsets method generates a map showing the potential of the species to occur (0–100 index), but also the model statistics for training data: overall accuracy, omission error, AUC, true negatives and true positives. Because GARP is not as sensitive on poor spatial representation of the area of interest (i.e. sampling bias), we split the occurrence records in blocks of five years and then generate a list of maps showing the potential spreading to determine temporal trends in the species’ distributions. 2.2 Spatial prediction of the actual distribution For the generation of the actual distribution maps, we use the GLM-kriging computational framework as described in [6] and [3]. We fit a log link function1 to explain the deterministic part of variation in spatial spreading of a species: E (zt (s)) = µt (s) = g −1 (q · β) (1) where zt (s) is the count value at some location s and for a given time-slice t, q are environmental predictors, and the link function g is a log-tranformation: g(µt (s)) = log (zt (s)) (2) so that the deterministic value can be estimated as:   µ̂t (s0 ) = exp β̂0,t + β̂1,t · q1,t (s0 ) + . . . β̂p,t · qp,t (s0 ) (3) After we fit the GLM, we can derive the Pearson residuals [6], which can be analyzed for spatial auto-correlation, i.e. fit a variogram and then interpolate the residuals using ordinary kriging. The final predictions are made by adding the predicted deterministic part of variation and the interpolated residual variation. Hengl et al. [3] have recently demonstrated that, if only occurrence2 (positive) records are available, the GLM model in Eq.(3) will be hard to fit and will perform poorly. To avoid this problem, we use the GARP to generate the pseudo-absences at locations that are further away from the observation points and at typical locations in feature space where occurrences are unlikely to occur. Next, the occurrences and pseudo-absences are combined and used to fit a GLM model with a log link function. In the third step, the GLM residuals are analyzed for spatial autocorrelation, variogram model fitted and residuals interpolated and added to the GLM predictions. At this stage, we avoid producing spatio-temporal interpolations, and instead focus only on the mean spread of species over the whole period of observation. We aggregate the records for all years and basically ignore the t =time parameter. The target variable is cumulative space-time density expressed in number of observed counts per ha−1 year−1 (counts divided by plot area in ha, and observation period in days/365). 3 R ESULTS 3.1 Species’ potential distribution and temporal trends The derived maps of potential spreading for two species for periods 1990–1995, 1995– 2000, and 2000–2005, can be seen in Fig. 1. In this case, Microtus oeconomus seems 1 The Poisson model is used to account for the skewed distribution of count data, which is typical for most of the species records. The model is fitted using the glm method as implemented in the stats package. 2 Sampling is purposively designed so that surveyors only visit the areas where the anticipate to see certain species. to have a more stable habitat than Rana lessonae. Running these models from R using openModeller as external library is rather straight forward. Fig. 1 demonstrates potential of GARP to detect significant changes in the species population considering its potential habitat. k = 84% k = 96% k = 94% 100.000 66.667 33.333 1990-1995 1995-2000 2000-2005 0.000 k = 92% k = 52.9% k = 66.3% Figure 1: Temporal changes in the species’ habitat suitability (0-100%) for: Microtus oeconomus (above) and Rana lessonae (below); derived using the GARP with Best Subsets method [1], as implemented in the openModeller. k indicates the prediction accuracy. Notice that the accuracy for the habitat suitability maps is high (confusion matrix statistics reported by GARP): it ranges from 84% to 96% for Microtus oeconomus, and from 52.9% to 92.0% for Rana lessonae. Wether this prediction accuracy is realistic or result of over fitting, remains to be tested. 3.2 Actual distribution of the six representative species The result of GLM-kriging show that predictors are significant in explaining the distribution of the species: in the case of Microtus oeconomus the model explains 98% of the variability (N =1716); in the case of Rana lessonae 97% (N =2449). By comparing maps in Fig. 1 and Fig. 2, we can notice that there are quite some difference between the actual and potential spreading of these two species — as anticipated, the actual spreading is much narrower than the potential (GLM-kriging remains the existing hot-spots). Note also that analysis and predictions using GLM modeling is somewhat more complex than GARP. We experienced many constraints considering possible automation of processing. Typically, the fitting of the model is sensitive to extreme values and feature space coverage of point samples. Once we produce maps of the actual spreading of species, we can further analyze such maps for a range of species to determine if their spreading is spatially connected, to derive biodiversity indices and delineate biodiversity hot-spots. Microtus oeconomus 5.00 3.33 1.67 0.00 Rana lessonae 11.000 7.333 3.667 0.000 Figure 2: Actual counts in no ha−1 year−1 (log-scale) interpolated using regressionkriging at 500 m resolution (right). The maps on the left show actual records of this species, with included simulated pseudo-absences. 4 D ISCUSSION AND NEXT STEPS The results of initial testing of R-based workflows show that semi-automated retrieval and generation of distribution maps from NDFF records provides an important insight into the spatio-temporal patterns of the species distribution in the Netherlands. The accuracy of maps we produced is relatively high. In the case of Rana lessonae, the habitat suitability maps for the period 1995–2005 seem to be less accurate with many false positives. This demonstrates that SDM methods need to be tailored species-based; likewise, the future monitoring networks need to be carefully designed to secure maximum possible mapping accuracy given the limited resources. We anticipate that the biggest constraint for the success of semi-automated SDM mapping systems will be the quality of the occurrence/count records — especially their spatial reference that is extremely variable, but also inconsistent sampling density considering both spatial and temporal domains. The model in Eq.(3) assumes that the count data are spatially independent; the variogram of residuals is fitted separately and it is not used to improve GLM modelling. From a theoretical point of view, non-Gaussian, mixed geostatistical models as implemented in the geoR package would be more appropriate for this type of data than simple linear models implemented in the gstat package. GARP is computationally intensive — to generate maps shown in Fig. 1 (364,000 grid cells) takes several minutes per map on a standard PC; based on the default settings. Interpolation of maps using regression-kriging takes even more time. These are all important technical issues that will need to be addressed before we will be able to provide robust tools and related automated mapping products e.g. SDM via a web-service. EcoGRID.nl, a project managed by the Dutch Nature Data Authority, aims at providing researchers, policy-makers and stake-holders with relevant biodiversity information, visualization and tools. This work is also heavily based on the use of the Free and Open Source GIS packages. The authors would like to especially thank the developers of adehabitat, spatstat, gstat and openModeller for kindly providing these tools as open source packages. R EFERENCES [1] R.P. Anderson, D. Lew, and A.T. Peterson. Evaluating predictive models of species’ distributions: criteria for selecting optimal models. Ecological Modelling, 162:211–232, 2003. [2] R. Bivand, E. Pebesma, and V. Rubio. Applied Spatial Data Analysis with R. Use R Series. Springer, Heidelberg, 2008. [3] T. Hengl, H. Sierdsema, A. Radović, and A. Dilo. Spatial prediction of species’ distributions from occurrence-only records: combining point pattern analysis, ENFA and regression-kriging. Ecological Modelling, accepted for publication, 2009. [5] T. Kneib and T. Petzoldt. Introduction to the Special Volume on “Ecology and Ecological Modeling in R”. Journal of Statistical Software, 22(1):1–7, 2007. [6] E. J. Pebesma, R. N. M. Duin, and P. A. Burrough. Mapping sea bird densities over the North Sea: spatially aggregated estimates and temporal changes. Environmetrics, 16(6):573–587, 2005. [7] R. Piñeiro, J. F. Aguilar, D. D. Munt, and G.N. Feliner. Ecology matters: Atlantic-Mediterranean disjunction in the sand-dune shrub Armeria pungens (Plumbaginaceae). Molecular Ecology, 16:2155– 2171, 2007. [8] A. Tsoar, O. Allouche, O. Steinitz, D. Rotem, and R. Kadmon. A comparative evaluation of presenceonly methods for modelling species distribution. Diversity & Distributions, 13(9):397–405, 2007.