[go: up one dir, main page]

Academia.eduAcademia.edu
Ecotoxicology DOI 10.1007/s10646-015-1421-0 Analysing chemical-induced changes in macroinvertebrate communities in aquatic mesocosm experiments: a comparison of methods Eduard Szöcs • Paul J. Van den Brink • Laurent Lagadic • Thierry Caquet Marc Roucaute • Arnaud Auber • Yannick Bayona • Matthias Liess • Peter Ebke • Alessio Ippolito • Cajo J. F. ter Braak • Theo C. M. Brock • Ralf B. Schäfer • Accepted: 17 January 2015 Ó Springer Science+Business Media New York 2015 Abstract Mesocosm experiments that study the ecological impact of chemicals are often analysed using the multivariate method ‘Principal Response Curves’ (PRCs). Recently, the extension of generalised linear models (GLMs) to multivariate data was introduced as a tool to analyse community data in ecology. Moreover, data aggregation techniques that can be analysed with univariate statistics have been proposed. The aim of this study was to compare their performance. We compiled macroinvertebrate abundance datasets of mesocosm experiments designed for studying the effect of various organic chemicals, mainly pesticides, and re-analysed them. GLMs for multivariate data and selected aggregated endpoints were compared to PRCs regarding their performance and Electronic supplementary material The online version of this article (doi:10.1007/s10646-015-1421-0) contains supplementary material, which is available to authorized users. potential to identify affected taxa. In addition, we analysed the inter-replicate variability encountered in the studies. Mesocosm experiments characterised by a higher taxa richness of the community and/or lower taxonomic resolution showed a greater inter-replicate variability, whereas variability decreased the more zero counts were encountered in the samples. GLMs for multivariate data performed equally well as PRCs regarding the community response. However, compared to first axis PRCs, GLMs provided a better indication of individual taxa responding to treatments, as separate models are fitted to each taxon. Data aggregation methods performed considerably poorer compared to PRCs. Multivariate community data, which are generated during mesocosm experiments, should be analysed using multivariate methods to reveal treatmentrelated community-level responses. GLMs for multivariate data are an alternative to the widely used PRCs. E. Szöcs (&)  R. B. Schäfer Institute for Environmental Sciences, University KoblenzLandau, Fortstraße 7, 76829 Landau, Germany e-mail: szoecs@uni-landau.de M. Liess Department System Ecotoxicology, UFZ – Helmholtz Centre for Environmental Research, Permoserstrasse 15, 04318 Leipzig, Germany P. J. Van den Brink  T. C. M. Brock Alterra, Wageningen University and Research Centre, P.O. Box 47, 6700 AA Wageningen, The Netherlands P. Ebke Mesocosm GmbH, Neu-Ulrichstein 5, 35315 Homberg (Ohm), Germany P. J. Van den Brink Department of Aquatic Ecology and Water Quality Management, Wageningen University, Wageningen University and Research Centre, P.O. Box 47, 6700 AA Wageningen, The Netherlands A. Ippolito International Centre for Pesticides and Health Risk Prevention (ICPS), University Hospital Luigi Sacco, Via GB Grassi, 74, 20157 Milan, Italy L. Lagadic  T. Caquet  M. Roucaute  A. Auber  Y. Bayona INRA, UMR0985 Ecologie et Santé des Ecosystèmes, Équipe Écotoxicologie et Qualité des Milieux Aquatiques, Agrocampus Ouest, 65 rue de Saint Brieuc, 35042 Rennes Cedex, France C. J. F. ter Braak Biometris, Wageningen University, Wageningen, The Netherlands 123 E. Szöcs et al. Keywords Mesocosms  Principal Response Curves  Generalised linear models  Multivariate analysis  Community-level effects Introduction Mesocosm experiments Freshwater model ecosystems (hereafter mesocosms) are a valuable tool for ecological risk assessment (ERA) since they allow assessing contaminant effects on community structure and functioning (Giddings et al. 2002; OECD 2006a; Van Leeuwen and Vermeire 2007). Mesocosms are at an intermediate level between laboratory and field studies, regarding ecological relevance and experimental control (Newman and Clements 2008) and are usually conducted if lower-tier studies indicate potential risks (de Jong et al. 2008; EFSA PPR 2013). Lentic and lotic mesocosms are used in ecotoxicology. They allow simulating more or less natural freshwater communities and realistic exposure regimes so that both direct and indirect treatment-related effects can be studied. In addition, they also allow medium- to long-term observations so that latency of effects and recovery can be studied (Brock et al. 2011). Communities in mesocosms are monitored multiple times in control and treated systems before and after toxicant exposure and therefore mesocosm experiments can evaluate causality between treatments and effects. Such experiments generate large and complex data sets: multiple taxa are sampled repeatedly from many replicate mesocosms (Van den Brink 2013). Analysing, summarising, and interpreting such data is challenging and special statistical techniques are needed to assess effects at the community level. The standard tool to analyse community-level responses in mesocosms is a multivariate technique termed Principal Response Curves (PRCs; Van den Brink and Ter Braak 1999). Recently, generalised linear models (GLMs) were proposed as a tool to analyse ecological community data [generalised linear models for multivariate data (GLMmv), Warton et al. 2012]. To date, however, this new method has only been applied to mesocosm data in one study (CañedoArgüelles et al. 2014). Moreover, data aggregation techniques have been suggested to analyse such data (SanchezBayo and Goka 2012; Beketov et al. 2013). We compared these three statistical methods by reanalysing data from several mesocosm studies. Our aim was to evaluate these tools with respect to their ability to detect effects and identification of responding taxa. Moreover, we identified factors affecting inter-replicate variability in mesocosm experiments. 123 Methods Data sets We compiled 11 datasets (published and unpublished, see Table 1) from mesocosm studies. These datasets covered a wide range of experimental designs: pond and stream systems, single and repeated applications, single compounds and mixtures, as well as different sample sizes and durations. Only macroinvertebrate communities were considered in this study. Each of these studies was analysed using PRCs, GLMmv and selected aggregated endpoints. Inter-replicate variability We measured inter-replicate variability as variation in community structure. This can be expressed as dispersion in multivariate space and quantified by using any distance measure between observations (Anderson 2006). We ^ based on Eumeasured the multivariate dispersion (r) clidean distance using log-transformed abundances (see next section), as this distance metric is used in PRCs. For the Euclidean distance the multivariate dispersion is equal to the sum of taxon variances (Anderson et al. 2011). Dispersions for the replicated mesocosms were calculated for every treatment and at each sampling event. We note that inter-replicate variability integrates various processes that lead to variation (e.g. different community development, sampling error, observer bias). We used linear mixed effect modelling (Bolker et al. 2009) to identify factors determining inter-replicate vari^ Two study level characteristics ability (expressed as r). (total number of taxa encountered in the study and duration of the experiment) and three replicate level characteristics [proportion of zero counts, number of replicates and treatment (binary variable: control or treatment)] were used as predictors of variability. Study and treatment within study were used as random intercepts. R2 values were calculated using the approach of Nakagawa and Schielzeth (2013). Model assumptions were visually inspected using diagnostic plots of residuals (Zuur et al. 2010). Principal Response Curves PRCs (Van den Brink and Ter Braak 1998, 1999) is the most widely used method to analyse community-level responses from mesocosm experiments. Variation in such experiments can be explained by three sources of variation: (1) the time course during the experiment, (2) the treatment and (3) the random (residual) variation between replicates. PRC belongs to the multivariate methods of constrained ordination and is a special form of redundancy analysis Chemical-induced changes in macroinvertebrate communities Table 1 Overview and source of the analysed mesocosm datasets ID Type Chemical Application No. of taxa No. of treatmentsa No. of samplings No. replicates per treatment No. replicates control Reference 1 Indoor microcosms Carbendazim Repeated 86 6 7 2 2 Cuppen et al. (2000) 2 Outdoor ponds –b Single 27 5 12 2 4 pers. comm. Ebke, P. 3 Outdoor ponds Mix Repeated 69 3 32 3 4 Auber et al. (2011)c 4 Outdoor ponds Mix Repeated 69 3 32 3 4 Auber et al. (2011)c 5 Outdoor streams Mix Repeated 46 3 5 2 5 Ippolito et al. (2012) 6 Outdoor streams Thiacloprid Repeated 35 4 9 2/4d 10/4d Liess and Beketov (2011) 7 Outdoor ditches Chlorpyrifos Single 188 5 15 2 4 van den Brink et al. (1996) Caquet et al. (2007) 8 Outdoor ponds Deltamethrin Single 40 4 21 4/3 5 9 Outdoor ditches k-Cyhalothrin Repeated 75 6 4 2 2 Roessink et al. (2005) 10 Outdoor ponds Thiram Repeated 55 3 14 2 3 Bayona et al. (2014)c 11 Outdoor ponds Hydrocarbon Repeated 60 5 14 2 3 Bayona et al. (2014)c a Incl. control b Proprietary data c Split into two separate analyses, due to experimental design d Changed during experiment (RDA), a multivariate extension of linear regression with dimension reduction (Legendre and Legendre 2012). For PRC a RDA model is fitted to the multivariate response using treatment, time and their interaction as predictors. However, there is usually little interest in the temporal change due to overall community development during the experiment. Therefore, the main effect of time is ‘partialled out’. The resulting RDA axes (dimensions) display only the variation that can be explained by treatment and the treatment 9 time interaction, but do not display the general temporal trend. PRCs are displayed in a specific diagram, from which one can obtain three types of information that are of primary ecotoxicological interest (see Supplemental Material for an overview): i. ii. iii. Deviations of treated communities from control Displayed as the mean difference of site scores between treatment and control on the first axis over time. Taxa responsible for the observed treatment-related differences Species scores on the first axis indicate the contribution of taxa to the observed pattern. However, species scores should be interpreted with caution and a low taxon weight does not necessarily translate to a small response (Van den Brink and Ter Braak 1999). Test of significance Whether the first PRC axis displays a statistically significant amount of variation can be tested using permutations (Legendre et al. 2011). In a PRC usually only the first axis is explored; however, if responses are complex subsequent PRC axes should also be explored. This allows a large variety of species responses to be represented by a limited number of diagrams (Van den Brink and Ter Braak 1998). PRC preserves the Euclidean distance between samples, therefore, we ln(Ax ? 1) transformed the abundances, where the term Ax was chosen to be 2 for the lowest abundance value (x) greater than zero (Van den Brink et al. 2000). The significance of the overall treatment effect (main effect and interaction) was tested using permutations and the first eigenvalue (Van den Brink and Ter Braak 1998). Given that mesocosms are sampled multiple times during the experiment, samples are not independent, which we considered by restricting the possible permutations. To evaluate the significance of the effect at each time point (e.g. to determine possible community recovery after a particular time point), separate RDAs using treatment as predictor were applied to each sampling event. We used 1,000 permutations for all significance tests. Generalized linear models for multivariate data Count data, as usually recorded in mesocosm experiments, are inherently not normally distributed: counts have a lower bond at zero, are discrete values and show typically a 123 E. Szöcs et al. mean–variance relationship. Therefore, other distributions than the normal distribution should be used to model such kind of data (Stroup 2014). GLMs can model different types of distributions, the normal distribution being a special case. Generalized Linear Models for multivariate data (GLMmv) are the extension of GLMs to a multivariate response. As demonstrated recently, these may have higher statistical power compared to multivariate techniques like RDA that commonly have been used by ecologists (Warton et al. 2012). GLMmv generalizes the regression model approach of (partial) RDA, as it is also possible to specify a normal response distribution. However, GLMmv differ from PRCs in that they cannot perform dimension reduction. GLMmv fit separate GLMs to each taxon, where the error distribution of the response is chosen by the type of response. To make inference about the (multivariate) community response these individual species models are P combined using the sum-of-Likelihood-Ratios ( LR) statistic (Warton 2011; Warton et al. 2012). Contributions of each taxon to the community response can be derived from the deviance of the univariate GLM and univariate responses are directly available. Model assumptions of univariate models can be checked using residual plots (see Supplemental Material for an overview). We fitted GLMmv to the dataset using the negative binomial distribution, because it is more flexible than the Poisson distribution and appropriate for count data (O’Hara and Kotze 2010). Nested models were compared using a Likelihood-Ratio test based on permutations to make inference about the community response. The model y  time þ treatment þ time  treatment incorporates a treatment 9 time interaction, therefore, allows the treatment effect to vary over time. For testing any treatment related effect, this model was compared to a model including only time as predictor y  time Moreover, we examined the interaction by testing the treatment effects for each sampling event. As with PRC significance was tested using 1,000 restricted permutations. Aggregated endpoints Aggregated endpoints summarise the multivariate response to a single response variable, which can be analysed using univariate techniques. Sanchez-Bayo and Goka (2012) studied four community endpoints: total abundance, taxonomic richness, a diversity index (Shannon–Wiener) and a similarity index (percentage difference, i.e., Bray–Curtis coefficient). Beketov et al. (2013) showed in a simulation study that using the total abundance as response variable 123 had better statistical power than using RDA, especially if inter-replicate variation was high. Three community endpoints were calculated for each sampling event and treatment: i. ii. iii. Total abundance (N) The sum of all individuals. Additionally, the sum of log-transformed abundances (N (log)) was used, as this down-weights highly abundant taxa. Taxa richness (TR) The sum of taxa encountered in the samples. Diversity index (H’) Expressed as Shannon–Wiener index. Similarity indices were not considered in our study, as they have already been compared with PRCs by Van den Brink and Ter Braak (1998) and proven to be less sensitive for detecting effects. The overall treatment effect was assessed for these endpoints by analysis of variance (ANOVA). Comparison of methods Performances of GLMmv and aggregated endpoints were compared to PRCs to assess (a) whether the conclusions drawn based on statistical significance are in agreement and (b) their ability to reveal taxon responses. We compare both by testing the overall treatment effect (treatment ? treatment 9 time interaction) in each study and testing the treatment effect per sampling event (165 sampling events in 11 studies, see Table 1). Ecotoxicologists are mainly interested in testing hypotheses and determining the no observed effect concentrations (Brock et al. 2014). Therefore, we based our comparisons on p values related to the respective hypotheses. Although the use of p values is subject to debate (e.g. Ellison et al. 2014), they provide a common currency to compare the three investigated methods, as all methods have been applied to the same datasets and tested the same hypotheses. The underlying characteristics of the real world data are unknown. For convenience and because it represents the current standard method in ERA, we defined PRC as the reference method. We used contingency table statistics to compare the performances of the other methods to PRCs. False negative rates (FNR) were calculated, where the results of PRCs were assumed as reference (truth): FNR ¼ FN=ðTP þ FN Þ where TP = true positive, i.e., both, PRCs and the compared method showed a statistically significant effect; FN = false negative, i.e., PRCs showed a statistically significant effect, but the compared method did not. Chemical-induced changes in macroinvertebrate communities Software Table 2 Mixed-effects modelling of the effects of study properties on inter-replicate variability All computations were performed using R [version 3.0.2 on Linux, 64-bit (R Core Team 2013)]. Linear mixed effect models were fitted using the lme4 package (Bates et al. 2014). PRCs were calculated using the vegan package (Oksanen et al. 2013), GLMmv using the mvabund package (Wang et al. 2012) and restricted permutations were created using the permute package (Simpson 2013). The data cannot be provided because it is partly proprietary. However, a reproducible R tutorial on how to analyse mesocosm data of Study no. 7 with the above mentioned methods is provided in the Supplemental Material. Model parameters b [95 % CI] Variance Fixed effects Intercept % zero counts 5.908 -0.67 [5.263, 6.552] [-0.798, -0.535] Total no. taxa 3.213 [2.614, 3.811] Day 0.172 [0.081, 0.263] No. replicates 0.004 [-0.163, 0.179] -0.118 [-0.596, 0.360] Treatment Random effects Treatment in study 0.30 Study 0.81 Residual 0.86 Model statistics Results Most studies showed comparable inter-replicate variability (median r^ = 5.08, median absolute deviation r^ = 1.19). Study no. 7 showed greatest (median r^ = 14.19) and Study no. 2 (median r^ = 2.89) the smallest variability. r^ decreased as more zero counts were recorded in the samples, and increased slightly during the time course of the experiments (Table 2). However, the most important factor determining variability was the total number of taxa recorded in the studies. The inter-replicate variability increased with the number of taxa recorded. We found no relationship between variability and the number of replicates as well as between control and treatment replicates (Table 2 and Supplemental Material). Results obtained by GLMmv were in agreement with results from PRCs. Analysing the overall pattern by comparing the full with reduced models resulted in matching conclusions in 10 of 11 studies (Fig. 1a). For Study no. 5 PRCs showed a statistically significant result, whereas GLMmv did not. Except for this study, p-values obtained from GLMmv were equal or lower to those obtained from PRCs. GLMmv showed high taxon deviances, where the PRCs assigned low species scores (Fig. 2). Comparable outcomes were found for GLMmv and RDA when applied per sampling event (Fig. 1b). Data aggregation methods did not perform as well as PRCs. Compared with the outcome of PRCs, FNR for data aggregation methods were high (mean FNR = 48.6 ± 2.9 %). For total abundance, down-weighting abundant taxa by using the logarithm improved the performance only slightly (FNR = 46.5 %). Performance of the three investigated aggregation methods was best when taxa richness was used as the descriptor of macroinvertebrate communities (FNR = 45.9 %). R2LMMðmÞ 80.3 % R2LMMðcÞ 92.2 % AIC 1,845.6 BIC 1,885.7 N 640 NStudy 11 Ntreatment 47 Predictors have been scaled to zero mean and unit variance. All parameters were estimated using restricted maximum likelihood and model assumptions were checked visually. Bold values indicate parameters with zero outside the confidence interval CI confidence interval, R2LMMðmÞ marginal R2LMM , R2LMMðcÞ conditional R2LMM , AIC Akaike Information Criterion, BIC Bayesian information criterion Discussion Inter-replicate variability The total number of taxa encountered in the studies was most strongly associated with inter-replicate variability: The smaller the number of taxa and/or the greater the taxonomic resolution of identification, the greater the interreplicate variability. A greater number of taxa is likely associated with greater community complexity. A stochastic loss of species may lead to very different communities, depending on species functions and interactions within the community. This might explain why more complex communities showed a greater inter-replicate variability. More importantly, another explanation is the dispersion measure. For Euclidean distances r^ increases as more taxa are recorded within the community, explaining the great 123 E. Szöcs et al. a b Fig. 1 p values analysing the 11 datasets with principal response curves and multivariate generalized linear models. a p value of GLMmv versus PRCs assessing overall treatment effect (treatment ? treatment 9 time interaction) in the 11 studies. The point of equal performance at p = 0.001 comprises studies 1, 2, 6 and 8. b p values of testing treatment effect per sampling event (165 sampling events in total, see Table 1). Black line represents the 1:1 line, dashed lines show an alpha-level of 0.05. Study ID refers to Table 1 (Color figure online) variability of the most species-rich mesocosm study (Study no. 7). This is also enhanced by a different taxonomic resolution between studies. A lower taxonomic resolution leads to aggregation into higher levels and therefore to fewer taxa in the samples. Variability decreased as more zero counts were encountered in the samples. This could be attributed to the underlying measure of multivariate dispersion. The Euclidean distance, as used in PRC, takes the joint absence of a taxon from two observations into account. For example, samples having no taxa in common may be more similar than samples sharing the same taxa. This phenomenon is widely known as the double-zero problem (Legendre and Legendre 2012). Therefore, replicates with many zero counts tend to be more similar and cause reduced variability. Distance measures that ignore double zeros might be more appropriate in this case (Legendre and Legendre 2012). Over the experiments time course variability increased slightly. Similarly, using univariate population level statistics, Sanderson et al. (2009) found an initial increase of the variability in micro/mesocosm studies. Knauer et al. (2005) used multivariate dispersion measures to analyse inter-replicate variability of zooplankton over a period of 3 years. They found an increase of variability over time only during 1 year and the variability between years was greater than within a given year. Similarly, our results suggest that this observed increase in variability is negligible as compared to other sources of variation (Table 2 and Supplemental Material). Generalized linear models for multivariate data 123 We found that GLMmv showed a similar performance when compared to PRCs (Fig. 1). Except for Study no. 5, GLMmv provided comparable inferences for the overall treatment effect test. The experimental setup in Study no. 5 was profoundly different from the setups of all other studies considered: no pre-application data were available, as samples were taken immediately after treatments (Ippolito et al. 2012). Mesocosm studies usually show a characteristic pattern: diverging communities directly after treatment followed by possible recovery during the time course of the experiment, when exposure has ceased. The study conducted by Ippolito et al. (2012) lacked this pattern with a constant treatment effect over time. Therefore, the interaction term may be ignored in this experiment (no significant treatment 9 time interaction, p = 0.375). However, we fitted the same model including the interaction term to each dataset for purpose of comparison. Omitting the interaction term for Study no. 5 lead to a statistically significant treatment effect on the communities (p = 0.012). An important difference between GLMmv and PRCs is that the latter implies a dimension reduction. We considered in our study only the first PRC axis (rank 1), whereas GLMmv work on full rank. Significance tests in PRCs could be also done in full rank. This difference did not affect our results, as the first PRC axis displayed the maximum proportion of explained variance (see Supplemental Material). Chemical-induced changes in macroinvertebrate communities Fig. 2 Absolute PRC species scores compared to GLM deviances. Both represent a measure of species response to treatment. Each point represents a taxon and both measures represent the contribution of this taxon to the community response. Deviances and species scores were standardized within study. Colour indicates whether the univariate GLMs showed a statistically significant treatment effect (p values are unadjusted for multiple testing). Black line represents the 1:1 line. Study ID refers to Table 1 (Color figure online) Using simulations, Warton et al. (2012) showed that GLMmv have better power properties than (full-rank) RDA when error variance of the affected taxa is low (compared to unaffected taxa). The main reason is that the pseudo-F statistics in RDA weights each taxon equally, instead of P weighing inversely with the error variance, as the LR statistic does. Using datasets of mesocosm experiments we found similar results for PRCs and GLMmv. This could be explained by the fact that in most studies affected and unaffected taxa showed a similar error variance. GLMmv indicated high single taxon deviances, where PRCs assigned low species scores. Therefore, GLMmv were able to identify responding taxa that would be missed by PRCs when only the first PRC axis is taken into account. This represents an advantage of modelling single taxon responses. Univariate statistics are directly obtained, whereas PRCs can only identify taxa that follow the global pattern. Note, however, that if more than one PRC axis is considered, more variable response patterns could be displayed, but this makes also interpretation more complex (Van den Brink and Ter Braak 1998; Daam et al. 2008). Moreover, it is recommended that a community-level analysis with PRCs should always be accompanied by a univariate analysis of single taxa to ensure that all responses of important taxa have been captured (Giddings et al. 2002; de Jong et al. 2008; Van den Brink 2013; EFSA PPR 2013). The difference between full rank and rank 1 fits is also present for taxon responses. Nevertheless, we used rank-1 species scores, as this is the preferred statistic used by ecotoxicologists to assess taxon responses. The results of PRCs can be displayed in an easy-tointerpret diagram, whereas GLMmv are lacking such a representation. Nevertheless, a similar graphical representation can be extracted from the results. Taxa can be P ordered by their deviance and the LR statistics from perweek tests can be plotted along a time axis which greatly resembles the response indicated by PRC. GLMmv are computationally much more demanding than PRCs. The 123 E. Szöcs et al. computations for a medium sized study (e.g. Study no. 10, 55 taxa and 14 sampling events) took 4 s for PRCs, whereas it took 18 min for GLMmv (on a Linux machine with 64-bit, 8 GB RAM and 2.27 GHz). RDA and PRCs are both based on the Euclidean distance between samples. As previously outlined, this distance measure is sensitive to double zeros in the data. Therefore, other distance measures that do not take the joint absence of taxa from samples into account (Asymmetric coefficients) are frequently used to analyse data from field studies. The mesocosm studies considered here, partly showed zero counts as high as those observed in field studies (see Supplemental Material). For example, in the study conducted by Szöcs et al. (2012) on the effects of salinisation and pesticides on macroinvertebrate communities in South Australian streams, on average, 78 % of taxa exhibited zero counts. Especially when the number of zero counts is high, asymmetric distance measures may result in a better performance. A PRC-like analysis, using any distance measure, can be performed by distance-based RDA (db-RDA) (Legendre and Anderson 1999; McArdle and Anderson 2001). In fact, results of db-RDA are identical to those of RDA when db-RDA is used with Euclidean distances. Asymmetric distance measures, like the percentage difference coefficient, have already been used to analyse data from mesocosm experiments, but only as univariate response (e.g. Van den Brink and Ter Braak 1998; SanchezBayo and Goka 2012). Alternatively, special transformations of community data allows using RDA with data containing many zeros (Legendre and Gallagher 2001). This method is known as transformation-based RDA (tbRDA). To our knowledge, both methods have not yet been applied to mesocosm experiments in ecotoxicology. However, they may have a better performance when zero counts are high. Hence, their suitability to analyse mesocosm experiments should be further explored. Liess and Beketov (2011) adapted a trait-based indicator for mesocosm application (SPEARmesocosm). In this index, the relative (log-transformed) abundance of species classified as potentially vulnerable is calculated based on a binary species classification system and can be analysed using univariate statistical techniques. Notwithstanding that aggregation of vulnerable taxa may be more powerful, this could not be tested here, as SPEARmesocosm was developed for lotic systems. However, a similar trait-based indicator for lentic systems is currently under development (personal communication; Roucaute, M.; INRA). In the past, the use of this method has been subject to debates (Van den Brink and Ter Braak 2012; Liess and Beketov 2012). We note, that instead of analysing classifications of species based on traits, the (weighted) multivariate trait response could also be analysed using PRCs or GLMmv. Aggregation of data results in a loss of information, since all complexity of the response is aggregated into a single variable. For instance, the taxon response is essential for an ecological and mechanistic interpretation of results. Aggregated endpoints loose this information because taxon responses cannot be directly derived. By contrast, techniques taking the multivariate nature of the data into account preserve most of the information in the data. They enable conclusions on the responses of taxa, with GLMmv providing a clearer overview of responsive taxa. Nevertheless, in situations of high variability among taxa, aggregated endpoints that group taxa by relevant traits that influence sensitivity and vulnerability to the test item (e.g. taxonomic group, voltinism, dispersal capacity, trophic classification), may be favourable, because such a grouping decreases variability. A limitation of the current study is that our comparison is based on 11 empirical mesocosm studies. Further simulation studies, comparing the methods based on datasets with known properties that mimic mesocosm data, are needed to examine the conditions under which the different methods are most suitable to analyse mesocosm experiments. Aggregated endpoints Implications for ecotoxicology All aggregated endpoints considered here performed considerably poorer than PRCs. Beketov et al. (2013) compared RDA with t-tests on total abundance by simulation and concluded that the latter has better power properties. In their simulation, abundance of all simulated taxa decreased uniformly due to treatment and the proportion of zero counts varied. However, a uniform decline of taxa is an oversimplification of treatment effects. In mesocosms, it is likely that some taxa may benefit from treatment because of biotic interactions or show no effect. Such effects would reduce the power of the total abundance endpoint, but would be captured by PRCs and GLMmv. This could explain discrepancies between this study and that of Beketov et al. (2013). 123 Data sets frequently encountered in community ecotoxicology are inherently not normally distributed, e.g., proportions (bounded between 0 and 1), abundances (positive integer data) or biomass (strictly positive) (Wang and Riffel 2011). GLMs offer a powerful and flexible parametric framework to analyse such non-normal data (O’Hara and Kotze 2010). However, they are rarely used in ecotoxicology. For example GLMs are only marginally discussed in the guideline on statistical analysis of ecotoxicological data (OECD 2006b). Instead, scientists often use data transformations to achieve normality and variance homogeneity or use non-parametric tests (Wang and Riffel Chemical-induced changes in macroinvertebrate communities 2011). GLMs may have considerable advantages, not only in the multivariate case (Stroup 2014). A large amount of data are generated for environmental risk assessment, but only a small fraction is made publicly available (Schäfer et al. 2013). Most raw data sets used in this paper were provided only upon personal request and most requests to industry remained unsuccessful. Note that data for Study no. 7 is also available as an example data in the Canoco computer package and the vegan R package. We argue to not let data go to waste (sensu Poisot et al. 2013) and make data publicly available. The interest of industry to protect their intellectual property could be considered by anonymizing properties of the substance. Ecotoxicology will undoubtedly benefit from open data. For example, it enhances transparency and, by combining data sets from different experiments, patterns may emerge that might be overlooked otherwise. When dealing with community data, statistical methods that take the multivariate nature of the data into account are advisable. Data aggregation methods are, despite their simplicity, an inappropriate tool to analyse mesocosm data for ERA. GLMmv are an alternative to the widely used PRCs, yielding to similar results regarding the community response and a better indication of responding taxa. Acknowledgments The authors gratefully acknowledge the contributions from scientists and technicians from the INRA Experimental Unit of Aquatic Ecology and Ecotoxicology and Ecotoxicology and Quality and Aquatic Research Group during mesocosm experiments. These experiments were conducted in programmes funded by the Interface Recherche-Expertise pour l’Evaluation du Risque en Ecotoxicologie coordinated by the Structure Scientifique Mixte INRA– Direction Générale de l’Alimentation (Study no. 8), by the Ministry of Ecology, Sustainable Development and Energy through its ‘‘Pesticides’’ research programme (Study Nos. 3, 4), and by TOTAL S.A. (Study Nos. 10, 11). A. Auber and Y. Bayona PhDs were funded through a grant from The Region Bretagne and INRA and by TOTAL S.A., respectively. The studies conducted at Wageningen UR (Study nos. 1, 7 and 9) were financially supported by the Dutch Ministry of Economic Affairs, as well as the contribution of Paul Van den Brink and Theo Brock to this paper (Project BO-20-002-001). The data sets of these studies are available upon request. Conflict of interest of interest. The authors declare that they have no conflict References Anderson MJ (2006) Distance based tests for homogeneity of multivariate dispersions. Biometrics 62:245–253 Anderson MJ, Crist TO, Chase JM, Vellend M, Inouye BD, Freestone AL, Sanders NJ, Cornell HV, Comita LS, Davies KF, Harrison SP, Kraft NJB, Stegen JC, Swenson NG (2011) Navigating the multiple meanings of beta diversity: a roadmap for the practicing ecologist. Ecol Lett 14:19–28 Auber A, Roucaute M, Togola A, Caquet T (2011) Structural and functional effects of conventional and low pesticide input crop- protection programs on benthic macroinvertebrate communities in outdoor pond mesocosms. Ecotoxicology 20:2042–2055 Bates D, Maechler M, Bolker B, Walker S (2014) lme4: linear mixedeffects models using Eigen and S4. R package version 1.1-0. https://github.com/lme4/lme4/ Bayona Y, Roucaute M, Cailleaud K, Lagadic L, Bassères A, Caquet T (2014) Effect of thiram and a petroleum distillate on freshwater macroinvertebrate communities in outdoor stream and pond mesocosms: I Study design, chemical fate and structural responses. Ecotoxicology (submitted) Beketov MA, Kattwinkel M, Liess M (2013) Statistics matter: data aggregation improves identification of community-level effects compared to a commonly used multivariate method. Ecotoxicology 22:1–10 Bolker B, Brooks M, Clark C, Geange S, Poulsen J, Stevens M, White J (2009) Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol Evol 24:127–135 Brock TCM, Arts GHP, ten Hulscher TEM, de Jong FMV, Luttik R, Roex EWM, Smit CE, van Vliet PJM (2011) Aquatic effect assessment for plant protection products; Dutch proposal that addresses the requirements of the Plant Protection Product Regulation and Water Framework Directive. Alterra Report 2235, Alterra, Wageningen Brock TCM, Hammers-Wirtz M, Hommen U et al (2014) The minimum detectable difference (MDD) and the interpretation of treatment-related effects of pesticides in experimental ecosystems. Environ Sci Pollut Res 22(2):1160–1174. doi:10.1007/ s11356-014-3398-2 Cañedo-Argüelles M, Bundschuh M, Gutiérrez-Cánovas C, Kefford BJ, Prat N, Trobajo R, Schäfer RB (2014) Effects of repeated salt pulses on ecosystem structure and functions in a stream mesocosm. Sci Total Environ 476–477:634–642 Caquet T, Hanson M, Roucaute M, Graham D, Lagadic L (2007) Influence of isolation on the recovery of pond mesocosms from the application of an insecticide. II Benthic macroinvertebrate responses. Environ Toxicol Chem 26:1280–1290 Cuppen JGM, Van den Brink PJ, Camps E, Uil KF, Brock TCM (2000) Impact of the fungicide carbendazim in freshwater microcosms. I. Water quality, breakdown of particulate organic matter and responses of macroinvertebrates. Aquat Toxicol 48:233–250 Daam MA, Van den Brink PJ, Nogueira AJA (2008) Impact of single and repeated applications of the insecticide chlorpyrifos on tropical freshwater plankton communities. Ecotoxicology 17: 756–771 De Jong FMW, Brock TCM, Foekema EM, Leeuwangh P (2008) Guidance for summarizing and evaluating aquatic micro- and mesocosm studies. RIVM Report 601506009/2008, RIVM EFSA PPR (2013) Guidance on tiered risk assessment for plant protection products for aquatic organisms in edge-of-field surface waters. EFSA Panel on Plant Protection Products and their Residues (PPR). Parma, Italy. EFSA J 11(7):3290 Ellison AM, Gotelli NJ, Inouye BD, Strong DR (2014) P values, hypothesis testing, and model selection: it’s déjà vu all over again. Ecology 95:609–610 Giddings JM, Brock TCM, Heger W, Heimbach F, Maund SJ, Norman S, Ratte H-T, Schäfers C, Streloke M (eds) (2002) Community-level aquatic system studies—interpretation criteria (CLASSIC). SETAC, Pensacola Ippolito A, Carolli M, Varolo E, Villa S, Vighi M (2012) Evaluating pesticide effects on freshwater invertebrate communities in alpine environment: a model ecosystem experiment. Ecotoxicology 21:2051–2067 Knauer K, Maise S, Thoma G, Hommen U, Gonzalez-Valero J (2005) Long-term variability of zooplankton populations in aquatic mesocosms. Environ Toxicol Chem 24:1182–1189 123 E. Szöcs et al. Legendre P, Anderson MJ (1999) Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecol Monogr 69:1–24 Legendre P, Gallagher ED (2001) Ecologically meaningful transformations for ordination of species data. Oecologia 129:271–280 Legendre P, Legendre L (2012) Numerical ecology. Elsevier, Amsterdam Legendre P, Oksanen J, ter Braak CJF (2011) Testing the significance of canonical axes in redundancy analysis. Methods Ecol Evol 2:269–277 Liess M, Beketov M (2011) Traits and stress: keys to identify community effects of low levels of toxicants in test systems. Ecotoxicology 20:1328–1340 Liess M, Beketov MA (2012) Rebuttal related to ‘‘Traits and Stress: Keys to identify community effects of low levels of toxicants in test systems’’ by Liess and Beketov (2011). Ecotoxicology 21:300–303 McArdle BH, Anderson MJ (2001) Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82:290–297 Nakagawa S, Schielzeth H (2013) A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods Ecol Evol 4:133–142 Newman MC, Clements WH (2008) Ecotoxicology : a comprehensive treatment. CRC Press, Boca Raton O’Hara RB, Kotze DJ (2010) Do not log-transform count data. Methods Ecol Evol 1:118–122 OECD (2006a) Guidance document on simulated freshwater lentic field tests (outdoor microcosms and mesocosms). No. 53 in Series on testing and assessment. OECD, Paris OECD (2006b) current approaches in the statistical analysis of ecotoxicity data: a guidance to application. No. 54 in Series on testing and assessment. OECD, Paris Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara RB, Simpson GL, Solymos P, Stevens MHH, Wagner H (2013) Vegan: Community Ecology Package. R package version 2.1-41. http://vegan.r-forge.r-project.org/ Poisot TE, Mounce R, Gravel D (2013) Moving toward a sustainable ecological science: don’t let data go to waste! Ideas Ecol Evol 6:11–19 Roessink I, Arts GHP, Belgers JDM, Bransen F, Maund SJ, Brock TCM (2005) Effects of lambda-cyhalothrin in two ditch microcosm systems of different trophic status. Environ Toxicol Chem 24:1684–1696 Sanchez-Bayo F, Goka K (2012) Evaluation of suitable endpoints for assessing the impacts of toxicants at the community level. Ecotoxicology 21:667–680 Sanderson H, Laird B, Brain R, Wilson CJ, Solomon KR (2009) Detectability of fifteen aquatic micro/mesocosms. Ecotoxicology 18:838–845 123 Schäfer RB, Bundschuh M, Focks A, von der Ohe PC (2013) Letter to the Editor. Environ Toxicol Chem 32:734–735 Simpson GL (2013) Permute: functions for generating restricted permutations of data. R package version 0.8-0. http://CRAN.Rproject.org/package=permute Stroup WW (2014) Rethinking the analysis of non-normal data in plant and soil science. Agron J 106:1–17 Szöcs E, Kefford BJ, Schäfer RB (2012) Is there an interaction of the effects of salinity and pesticides on the community structure of macroinvertebrates? Sci Total Environ 437:121–126 R Core Team (2013) R: a language and environment for statistical computing. Vienna, Austria. http://www.R-project.org/ Van den Brink PJ (2013) Assessing aquatic copulation and community-level risks of pesticides. Environ Toxicol Chem 32:972–973 Van den Brink PJ, ter Braak CJF (1998) Multivariate analysis of stress in experimental ecosystems by principal response curves and similarity analysis. Aquat Ecol 32:163–178 Van den Brink PJ, ter Braak CJF (1999) Principal response curves: analysis of time-dependent multivariate responses of biological community to stress. Environ Toxicol Chem 18:138–148 Van den Brink PJ, ter Braak CJF (2012) Response to ‘‘Traits and stress: keys to identify community effects of low levels of toxicants in test systems’’ by Liess and Beketov (2011). Ecotoxicology 21:297–299 Van den Brink PJ, van Wijngaarden RPA, Lucassen WGH, Brock TCM, Leeuwangh P (1996) Effects of the insecticide Dursban 4E (active ingredient chlorpyrifos) in outdoor experimental ditches: II. Invertebrate community responses and recovery. Environ Toxicol Chem 15:1143–1153 Van den Brink PJ, Hattink J, Bransen F, van Donk E, Brock TCM (2000) Impact of the fungicide carbendazim in freshwater microcosms. II. Zooplankton, primary producers and final conclusions. Aquat Toxicol 48:251–264 Van Leeuwen CJ, Vermeire TG (2007) Risk assessment of chemicals: an introduction. Springer, Netherlands Wang M, Riffel M (2011) Making the right conclusions based on wrong results and small sample sizes: interpretation of statistical tests in ecotoxicology. Ecotoxicol Environ Saf 74:684–692 Wang Y, Naumann U, Wright ST, Warton DI (2012) mvabund—an R package for model-based analysis of multivariate abundance data. Methods Ecol Evol 3:471–474 Warton D (2011) Regularized sandwich estimators for analysis of high-dimensional data using generalized estimating equations. Biometrics 67:116–123 Warton DI, Wright ST, Wang Y (2012) Distance-based multivariate analyses confound location and dispersion effects. Methods Ecol Evol 3:89–101 Zuur A, Ieno E, Elphick C (2010) A protocol for data exploration to avoid common statistical problems. Methods Ecol Evol 1:3–14