Ecotoxicology
DOI 10.1007/s10646-015-1421-0
Analysing chemical-induced changes in macroinvertebrate
communities in aquatic mesocosm experiments: a comparison
of methods
Eduard Szöcs • Paul J. Van den Brink • Laurent Lagadic • Thierry Caquet
Marc Roucaute • Arnaud Auber • Yannick Bayona • Matthias Liess •
Peter Ebke • Alessio Ippolito • Cajo J. F. ter Braak • Theo C. M. Brock •
Ralf B. Schäfer
•
Accepted: 17 January 2015
Ó Springer Science+Business Media New York 2015
Abstract Mesocosm experiments that study the ecological impact of chemicals are often analysed using the
multivariate method ‘Principal Response Curves’ (PRCs).
Recently, the extension of generalised linear models
(GLMs) to multivariate data was introduced as a tool to
analyse community data in ecology. Moreover, data aggregation techniques that can be analysed with univariate
statistics have been proposed. The aim of this study was to
compare their performance. We compiled macroinvertebrate abundance datasets of mesocosm experiments designed for studying the effect of various organic chemicals,
mainly pesticides, and re-analysed them. GLMs for multivariate data and selected aggregated endpoints were
compared to PRCs regarding their performance and
Electronic supplementary material The online version of this
article (doi:10.1007/s10646-015-1421-0) contains supplementary
material, which is available to authorized users.
potential to identify affected taxa. In addition, we analysed
the inter-replicate variability encountered in the studies.
Mesocosm experiments characterised by a higher taxa
richness of the community and/or lower taxonomic
resolution showed a greater inter-replicate variability,
whereas variability decreased the more zero counts were
encountered in the samples. GLMs for multivariate data
performed equally well as PRCs regarding the community
response. However, compared to first axis PRCs, GLMs
provided a better indication of individual taxa responding
to treatments, as separate models are fitted to each taxon.
Data aggregation methods performed considerably poorer
compared to PRCs. Multivariate community data, which
are generated during mesocosm experiments, should be
analysed using multivariate methods to reveal treatmentrelated community-level responses. GLMs for multivariate
data are an alternative to the widely used PRCs.
E. Szöcs (&) R. B. Schäfer
Institute for Environmental Sciences, University KoblenzLandau, Fortstraße 7, 76829 Landau, Germany
e-mail: szoecs@uni-landau.de
M. Liess
Department System Ecotoxicology, UFZ – Helmholtz Centre for
Environmental Research, Permoserstrasse 15, 04318 Leipzig,
Germany
P. J. Van den Brink T. C. M. Brock
Alterra, Wageningen University and Research Centre,
P.O. Box 47, 6700 AA Wageningen, The Netherlands
P. Ebke
Mesocosm GmbH, Neu-Ulrichstein 5, 35315 Homberg (Ohm),
Germany
P. J. Van den Brink
Department of Aquatic Ecology and Water Quality
Management, Wageningen University, Wageningen University
and Research Centre, P.O. Box 47, 6700 AA Wageningen,
The Netherlands
A. Ippolito
International Centre for Pesticides and Health Risk Prevention
(ICPS), University Hospital Luigi Sacco, Via GB Grassi, 74,
20157 Milan, Italy
L. Lagadic T. Caquet M. Roucaute A. Auber Y. Bayona
INRA, UMR0985 Ecologie et Santé des Ecosystèmes, Équipe
Écotoxicologie et Qualité des Milieux Aquatiques, Agrocampus
Ouest, 65 rue de Saint Brieuc, 35042 Rennes Cedex, France
C. J. F. ter Braak
Biometris, Wageningen University, Wageningen,
The Netherlands
123
E. Szöcs et al.
Keywords Mesocosms Principal Response Curves
Generalised linear models Multivariate analysis
Community-level effects
Introduction
Mesocosm experiments
Freshwater model ecosystems (hereafter mesocosms) are a
valuable tool for ecological risk assessment (ERA) since
they allow assessing contaminant effects on community
structure and functioning (Giddings et al. 2002; OECD
2006a; Van Leeuwen and Vermeire 2007). Mesocosms are
at an intermediate level between laboratory and field
studies, regarding ecological relevance and experimental
control (Newman and Clements 2008) and are usually
conducted if lower-tier studies indicate potential risks (de
Jong et al. 2008; EFSA PPR 2013).
Lentic and lotic mesocosms are used in ecotoxicology.
They allow simulating more or less natural freshwater communities and realistic exposure regimes so that both direct and
indirect treatment-related effects can be studied. In addition,
they also allow medium- to long-term observations so that
latency of effects and recovery can be studied (Brock et al.
2011). Communities in mesocosms are monitored multiple
times in control and treated systems before and after toxicant
exposure and therefore mesocosm experiments can evaluate
causality between treatments and effects.
Such experiments generate large and complex data sets:
multiple taxa are sampled repeatedly from many replicate
mesocosms (Van den Brink 2013). Analysing, summarising, and interpreting such data is challenging and special
statistical techniques are needed to assess effects at the
community level.
The standard tool to analyse community-level responses
in mesocosms is a multivariate technique termed Principal
Response Curves (PRCs; Van den Brink and Ter Braak
1999). Recently, generalised linear models (GLMs) were
proposed as a tool to analyse ecological community data
[generalised linear models for multivariate data (GLMmv),
Warton et al. 2012]. To date, however, this new method has
only been applied to mesocosm data in one study (CañedoArgüelles et al. 2014). Moreover, data aggregation techniques have been suggested to analyse such data (SanchezBayo and Goka 2012; Beketov et al. 2013).
We compared these three statistical methods by reanalysing data from several mesocosm studies. Our aim
was to evaluate these tools with respect to their ability to
detect effects and identification of responding taxa.
Moreover, we identified factors affecting inter-replicate
variability in mesocosm experiments.
123
Methods
Data sets
We compiled 11 datasets (published and unpublished, see
Table 1) from mesocosm studies. These datasets covered a
wide range of experimental designs: pond and stream
systems, single and repeated applications, single compounds and mixtures, as well as different sample sizes and
durations. Only macroinvertebrate communities were
considered in this study. Each of these studies was analysed
using PRCs, GLMmv and selected aggregated endpoints.
Inter-replicate variability
We measured inter-replicate variability as variation in
community structure. This can be expressed as dispersion
in multivariate space and quantified by using any distance
measure between observations (Anderson 2006). We
^ based on Eumeasured the multivariate dispersion (r)
clidean distance using log-transformed abundances (see
next section), as this distance metric is used in PRCs. For
the Euclidean distance the multivariate dispersion is equal
to the sum of taxon variances (Anderson et al. 2011).
Dispersions for the replicated mesocosms were calculated
for every treatment and at each sampling event. We note
that inter-replicate variability integrates various processes
that lead to variation (e.g. different community development, sampling error, observer bias).
We used linear mixed effect modelling (Bolker et al.
2009) to identify factors determining inter-replicate vari^ Two study level characteristics
ability (expressed as r).
(total number of taxa encountered in the study and duration
of the experiment) and three replicate level characteristics
[proportion of zero counts, number of replicates and
treatment (binary variable: control or treatment)] were used
as predictors of variability. Study and treatment within
study were used as random intercepts. R2 values were
calculated using the approach of Nakagawa and Schielzeth
(2013). Model assumptions were visually inspected using
diagnostic plots of residuals (Zuur et al. 2010).
Principal Response Curves
PRCs (Van den Brink and Ter Braak 1998, 1999) is the
most widely used method to analyse community-level responses from mesocosm experiments. Variation in such
experiments can be explained by three sources of variation:
(1) the time course during the experiment, (2) the treatment
and (3) the random (residual) variation between replicates.
PRC belongs to the multivariate methods of constrained
ordination and is a special form of redundancy analysis
Chemical-induced changes in macroinvertebrate communities
Table 1 Overview and source of the analysed mesocosm datasets
ID
Type
Chemical
Application
No. of
taxa
No. of
treatmentsa
No. of
samplings
No. replicates
per treatment
No.
replicates
control
Reference
1
Indoor
microcosms
Carbendazim
Repeated
86
6
7
2
2
Cuppen et al. (2000)
2
Outdoor ponds
–b
Single
27
5
12
2
4
pers. comm. Ebke, P.
3
Outdoor ponds
Mix
Repeated
69
3
32
3
4
Auber et al. (2011)c
4
Outdoor ponds
Mix
Repeated
69
3
32
3
4
Auber et al. (2011)c
5
Outdoor streams
Mix
Repeated
46
3
5
2
5
Ippolito et al. (2012)
6
Outdoor streams
Thiacloprid
Repeated
35
4
9
2/4d
10/4d
Liess and Beketov
(2011)
7
Outdoor ditches
Chlorpyrifos
Single
188
5
15
2
4
van den Brink et al.
(1996)
Caquet et al. (2007)
8
Outdoor ponds
Deltamethrin
Single
40
4
21
4/3
5
9
Outdoor ditches
k-Cyhalothrin
Repeated
75
6
4
2
2
Roessink et al. (2005)
10
Outdoor ponds
Thiram
Repeated
55
3
14
2
3
Bayona et al. (2014)c
11
Outdoor ponds
Hydrocarbon
Repeated
60
5
14
2
3
Bayona et al. (2014)c
a
Incl. control
b
Proprietary data
c
Split into two separate analyses, due to experimental design
d
Changed during experiment
(RDA), a multivariate extension of linear regression with
dimension reduction (Legendre and Legendre 2012). For
PRC a RDA model is fitted to the multivariate response
using treatment, time and their interaction as predictors.
However, there is usually little interest in the temporal
change due to overall community development during the
experiment. Therefore, the main effect of time is ‘partialled
out’. The resulting RDA axes (dimensions) display only the
variation that can be explained by treatment and the
treatment 9 time interaction, but do not display the general temporal trend.
PRCs are displayed in a specific diagram, from which
one can obtain three types of information that are of primary ecotoxicological interest (see Supplemental Material
for an overview):
i.
ii.
iii.
Deviations of treated communities from control
Displayed as the mean difference of site scores
between treatment and control on the first axis over
time.
Taxa responsible for the observed treatment-related
differences Species scores on the first axis indicate
the contribution of taxa to the observed pattern.
However, species scores should be interpreted with
caution and a low taxon weight does not necessarily
translate to a small response (Van den Brink and Ter
Braak 1999).
Test of significance Whether the first PRC axis
displays a statistically significant amount of
variation can be tested using permutations (Legendre
et al. 2011).
In a PRC usually only the first axis is explored; however, if responses are complex subsequent PRC axes should
also be explored. This allows a large variety of species
responses to be represented by a limited number of diagrams (Van den Brink and Ter Braak 1998).
PRC preserves the Euclidean distance between samples,
therefore, we ln(Ax ? 1) transformed the abundances,
where the term Ax was chosen to be 2 for the lowest
abundance value (x) greater than zero (Van den Brink et al.
2000). The significance of the overall treatment effect
(main effect and interaction) was tested using permutations
and the first eigenvalue (Van den Brink and Ter Braak
1998). Given that mesocosms are sampled multiple times
during the experiment, samples are not independent, which
we considered by restricting the possible permutations.
To evaluate the significance of the effect at each time
point (e.g. to determine possible community recovery after
a particular time point), separate RDAs using treatment as
predictor were applied to each sampling event. We used
1,000 permutations for all significance tests.
Generalized linear models for multivariate data
Count data, as usually recorded in mesocosm experiments,
are inherently not normally distributed: counts have a
lower bond at zero, are discrete values and show typically a
123
E. Szöcs et al.
mean–variance relationship. Therefore, other distributions
than the normal distribution should be used to model such
kind of data (Stroup 2014). GLMs can model different
types of distributions, the normal distribution being a
special case. Generalized Linear Models for multivariate
data (GLMmv) are the extension of GLMs to a multivariate
response. As demonstrated recently, these may have higher
statistical power compared to multivariate techniques like
RDA that commonly have been used by ecologists (Warton
et al. 2012). GLMmv generalizes the regression model approach of (partial) RDA, as it is also possible to specify a
normal response distribution. However, GLMmv differ
from PRCs in that they cannot perform dimension
reduction.
GLMmv fit separate GLMs to each taxon, where the
error distribution of the response is chosen by the type of
response. To make inference about the (multivariate)
community response these individual species models are
P
combined using the sum-of-Likelihood-Ratios ( LR)
statistic (Warton 2011; Warton et al. 2012). Contributions
of each taxon to the community response can be derived
from the deviance of the univariate GLM and univariate
responses are directly available. Model assumptions of
univariate models can be checked using residual plots (see
Supplemental Material for an overview).
We fitted GLMmv to the dataset using the negative binomial distribution, because it is more flexible than the
Poisson distribution and appropriate for count data (O’Hara
and Kotze 2010). Nested models were compared using a
Likelihood-Ratio test based on permutations to make inference about the community response. The model
y time þ treatment þ time treatment
incorporates a treatment 9 time interaction, therefore, allows the treatment effect to vary over time. For testing any
treatment related effect, this model was compared to a
model including only time as predictor
y time
Moreover, we examined the interaction by testing the
treatment effects for each sampling event. As with PRC
significance was tested using 1,000 restricted permutations.
Aggregated endpoints
Aggregated endpoints summarise the multivariate response
to a single response variable, which can be analysed using
univariate techniques. Sanchez-Bayo and Goka (2012)
studied four community endpoints: total abundance, taxonomic richness, a diversity index (Shannon–Wiener) and a
similarity index (percentage difference, i.e., Bray–Curtis
coefficient). Beketov et al. (2013) showed in a simulation
study that using the total abundance as response variable
123
had better statistical power than using RDA, especially if
inter-replicate variation was high.
Three community endpoints were calculated for each
sampling event and treatment:
i.
ii.
iii.
Total abundance (N) The sum of all individuals.
Additionally, the sum of log-transformed abundances (N (log)) was used, as this down-weights
highly abundant taxa.
Taxa richness (TR) The sum of taxa encountered in
the samples.
Diversity index (H’) Expressed as Shannon–Wiener
index.
Similarity indices were not considered in our study, as
they have already been compared with PRCs by Van den
Brink and Ter Braak (1998) and proven to be less sensitive
for detecting effects.
The overall treatment effect was assessed for these
endpoints by analysis of variance (ANOVA).
Comparison of methods
Performances of GLMmv and aggregated endpoints were
compared to PRCs to assess (a) whether the conclusions
drawn based on statistical significance are in agreement
and (b) their ability to reveal taxon responses. We compare
both by testing the overall treatment effect (treatment ? treatment 9 time interaction) in each study and
testing the treatment effect per sampling event (165 sampling events in 11 studies, see Table 1).
Ecotoxicologists are mainly interested in testing hypotheses and determining the no observed effect concentrations (Brock et al. 2014). Therefore, we based our
comparisons on p values related to the respective hypotheses. Although the use of p values is subject to debate
(e.g. Ellison et al. 2014), they provide a common currency
to compare the three investigated methods, as all methods
have been applied to the same datasets and tested the same
hypotheses.
The underlying characteristics of the real world data are
unknown. For convenience and because it represents the
current standard method in ERA, we defined PRC as the
reference method. We used contingency table statistics to
compare the performances of the other methods to PRCs.
False negative rates (FNR) were calculated, where the results of PRCs were assumed as reference (truth):
FNR ¼ FN=ðTP þ FN Þ
where TP = true positive, i.e., both, PRCs and the compared method showed a statistically significant effect;
FN = false negative, i.e., PRCs showed a statistically
significant effect, but the compared method did not.
Chemical-induced changes in macroinvertebrate communities
Software
Table 2 Mixed-effects modelling of the effects of study properties
on inter-replicate variability
All computations were performed using R [version 3.0.2 on
Linux, 64-bit (R Core Team 2013)]. Linear mixed effect
models were fitted using the lme4 package (Bates et al.
2014). PRCs were calculated using the vegan package
(Oksanen et al. 2013), GLMmv using the mvabund package
(Wang et al. 2012) and restricted permutations were created using the permute package (Simpson 2013).
The data cannot be provided because it is partly proprietary. However, a reproducible R tutorial on how to
analyse mesocosm data of Study no. 7 with the above
mentioned methods is provided in the Supplemental
Material.
Model parameters
b
[95 % CI]
Variance
Fixed effects
Intercept
% zero counts
5.908
-0.67
[5.263, 6.552]
[-0.798, -0.535]
Total no. taxa
3.213
[2.614, 3.811]
Day
0.172
[0.081, 0.263]
No. replicates
0.004
[-0.163, 0.179]
-0.118
[-0.596, 0.360]
Treatment
Random effects
Treatment in study
0.30
Study
0.81
Residual
0.86
Model statistics
Results
Most studies showed comparable inter-replicate variability
(median r^ = 5.08, median absolute deviation r^ = 1.19).
Study no. 7 showed greatest (median r^ = 14.19) and Study
no. 2 (median r^ = 2.89) the smallest variability. r^ decreased as more zero counts were recorded in the samples,
and increased slightly during the time course of the experiments (Table 2). However, the most important factor
determining variability was the total number of taxa
recorded in the studies. The inter-replicate variability increased with the number of taxa recorded. We found no
relationship between variability and the number of replicates as well as between control and treatment replicates
(Table 2 and Supplemental Material).
Results obtained by GLMmv were in agreement with
results from PRCs. Analysing the overall pattern by comparing the full with reduced models resulted in matching
conclusions in 10 of 11 studies (Fig. 1a). For Study no. 5
PRCs showed a statistically significant result, whereas
GLMmv did not. Except for this study, p-values obtained
from GLMmv were equal or lower to those obtained from
PRCs.
GLMmv showed high taxon deviances, where the PRCs
assigned low species scores (Fig. 2). Comparable outcomes
were found for GLMmv and RDA when applied per sampling event (Fig. 1b).
Data aggregation methods did not perform as well as
PRCs. Compared with the outcome of PRCs, FNR for data
aggregation methods were high (mean FNR = 48.6 ±
2.9 %). For total abundance, down-weighting abundant
taxa by using the logarithm improved the performance only
slightly (FNR = 46.5 %). Performance of the three investigated aggregation methods was best when taxa richness was used as the descriptor of macroinvertebrate
communities (FNR = 45.9 %).
R2LMMðmÞ
80.3 %
R2LMMðcÞ
92.2 %
AIC
1,845.6
BIC
1,885.7
N
640
NStudy
11
Ntreatment
47
Predictors have been scaled to zero mean and unit variance. All parameters were estimated using restricted maximum likelihood and
model assumptions were checked visually. Bold values indicate parameters with zero outside the confidence interval
CI confidence interval, R2LMMðmÞ marginal R2LMM , R2LMMðcÞ conditional
R2LMM , AIC Akaike Information Criterion, BIC Bayesian information
criterion
Discussion
Inter-replicate variability
The total number of taxa encountered in the studies was
most strongly associated with inter-replicate variability:
The smaller the number of taxa and/or the greater the
taxonomic resolution of identification, the greater the interreplicate variability. A greater number of taxa is likely
associated with greater community complexity. A
stochastic loss of species may lead to very different communities, depending on species functions and interactions
within the community. This might explain why more
complex communities showed a greater inter-replicate
variability.
More importantly, another explanation is the dispersion
measure. For Euclidean distances r^ increases as more taxa
are recorded within the community, explaining the great
123
E. Szöcs et al.
a
b
Fig. 1 p values analysing the 11 datasets with principal response
curves and multivariate generalized linear models. a p value of
GLMmv versus PRCs assessing overall treatment effect (treatment ? treatment 9 time interaction) in the 11 studies. The point
of equal performance at p = 0.001 comprises studies 1, 2, 6 and 8.
b p values of testing treatment effect per sampling event (165
sampling events in total, see Table 1). Black line represents the 1:1
line, dashed lines show an alpha-level of 0.05. Study ID refers to
Table 1 (Color figure online)
variability of the most species-rich mesocosm study (Study
no. 7). This is also enhanced by a different taxonomic
resolution between studies. A lower taxonomic resolution
leads to aggregation into higher levels and therefore to
fewer taxa in the samples.
Variability decreased as more zero counts were encountered in the samples. This could be attributed to the
underlying measure of multivariate dispersion. The Euclidean distance, as used in PRC, takes the joint absence of
a taxon from two observations into account. For example,
samples having no taxa in common may be more similar
than samples sharing the same taxa. This phenomenon is
widely known as the double-zero problem (Legendre and
Legendre 2012). Therefore, replicates with many zero
counts tend to be more similar and cause reduced variability. Distance measures that ignore double zeros might
be more appropriate in this case (Legendre and Legendre
2012).
Over the experiments time course variability increased
slightly. Similarly, using univariate population level
statistics, Sanderson et al. (2009) found an initial increase
of the variability in micro/mesocosm studies. Knauer et al.
(2005) used multivariate dispersion measures to analyse
inter-replicate variability of zooplankton over a period of
3 years. They found an increase of variability over time
only during 1 year and the variability between years was
greater than within a given year. Similarly, our results
suggest that this observed increase in variability is negligible as compared to other sources of variation (Table 2
and Supplemental Material).
Generalized linear models for multivariate data
123
We found that GLMmv showed a similar performance when
compared to PRCs (Fig. 1). Except for Study no. 5,
GLMmv provided comparable inferences for the overall
treatment effect test. The experimental setup in Study no. 5
was profoundly different from the setups of all other
studies considered: no pre-application data were available,
as samples were taken immediately after treatments (Ippolito et al. 2012). Mesocosm studies usually show a
characteristic pattern: diverging communities directly after
treatment followed by possible recovery during the time
course of the experiment, when exposure has ceased. The
study conducted by Ippolito et al. (2012) lacked this pattern
with a constant treatment effect over time. Therefore, the
interaction term may be ignored in this experiment (no
significant treatment 9 time interaction, p = 0.375).
However, we fitted the same model including the interaction term to each dataset for purpose of comparison.
Omitting the interaction term for Study no. 5 lead to a
statistically significant treatment effect on the communities
(p = 0.012).
An important difference between GLMmv and PRCs is
that the latter implies a dimension reduction. We considered in our study only the first PRC axis (rank 1),
whereas GLMmv work on full rank. Significance tests in
PRCs could be also done in full rank. This difference did
not affect our results, as the first PRC axis displayed the
maximum proportion of explained variance (see Supplemental Material).
Chemical-induced changes in macroinvertebrate communities
Fig. 2 Absolute PRC species scores compared to GLM deviances.
Both represent a measure of species response to treatment. Each point
represents a taxon and both measures represent the contribution of
this taxon to the community response. Deviances and species scores
were standardized within study. Colour indicates whether the
univariate GLMs showed a statistically significant treatment effect
(p values are unadjusted for multiple testing). Black line represents
the 1:1 line. Study ID refers to Table 1 (Color figure online)
Using simulations, Warton et al. (2012) showed that
GLMmv have better power properties than (full-rank) RDA
when error variance of the affected taxa is low (compared
to unaffected taxa). The main reason is that the pseudo-F
statistics in RDA weights each taxon equally, instead of
P
weighing inversely with the error variance, as the
LR
statistic does. Using datasets of mesocosm experiments we
found similar results for PRCs and GLMmv. This could be
explained by the fact that in most studies affected and
unaffected taxa showed a similar error variance.
GLMmv indicated high single taxon deviances, where
PRCs assigned low species scores. Therefore, GLMmv were
able to identify responding taxa that would be missed by
PRCs when only the first PRC axis is taken into account.
This represents an advantage of modelling single taxon
responses. Univariate statistics are directly obtained,
whereas PRCs can only identify taxa that follow the global
pattern. Note, however, that if more than one PRC axis is
considered, more variable response patterns could be
displayed, but this makes also interpretation more complex
(Van den Brink and Ter Braak 1998; Daam et al. 2008).
Moreover, it is recommended that a community-level
analysis with PRCs should always be accompanied by a
univariate analysis of single taxa to ensure that all responses of important taxa have been captured (Giddings
et al. 2002; de Jong et al. 2008; Van den Brink 2013; EFSA
PPR 2013). The difference between full rank and rank 1 fits
is also present for taxon responses. Nevertheless, we used
rank-1 species scores, as this is the preferred statistic used
by ecotoxicologists to assess taxon responses.
The results of PRCs can be displayed in an easy-tointerpret diagram, whereas GLMmv are lacking such a
representation. Nevertheless, a similar graphical representation can be extracted from the results. Taxa can be
P
ordered by their deviance and the LR statistics from perweek tests can be plotted along a time axis which greatly
resembles the response indicated by PRC. GLMmv are
computationally much more demanding than PRCs. The
123
E. Szöcs et al.
computations for a medium sized study (e.g. Study no. 10,
55 taxa and 14 sampling events) took 4 s for PRCs,
whereas it took 18 min for GLMmv (on a Linux machine
with 64-bit, 8 GB RAM and 2.27 GHz).
RDA and PRCs are both based on the Euclidean distance between samples. As previously outlined, this distance measure is sensitive to double zeros in the data.
Therefore, other distance measures that do not take the
joint absence of taxa from samples into account (Asymmetric coefficients) are frequently used to analyse data
from field studies. The mesocosm studies considered here,
partly showed zero counts as high as those observed in field
studies (see Supplemental Material). For example, in the
study conducted by Szöcs et al. (2012) on the effects of
salinisation and pesticides on macroinvertebrate communities in South Australian streams, on average, 78 % of
taxa exhibited zero counts.
Especially when the number of zero counts is high,
asymmetric distance measures may result in a better performance. A PRC-like analysis, using any distance measure, can be performed by distance-based RDA (db-RDA)
(Legendre and Anderson 1999; McArdle and Anderson
2001). In fact, results of db-RDA are identical to those of
RDA when db-RDA is used with Euclidean distances.
Asymmetric distance measures, like the percentage difference coefficient, have already been used to analyse data
from mesocosm experiments, but only as univariate response (e.g. Van den Brink and Ter Braak 1998; SanchezBayo and Goka 2012). Alternatively, special transformations of community data allows using RDA with data
containing many zeros (Legendre and Gallagher 2001).
This method is known as transformation-based RDA (tbRDA). To our knowledge, both methods have not yet been
applied to mesocosm experiments in ecotoxicology.
However, they may have a better performance when zero
counts are high. Hence, their suitability to analyse mesocosm experiments should be further explored.
Liess and Beketov (2011) adapted a trait-based indicator
for mesocosm application (SPEARmesocosm). In this index,
the relative (log-transformed) abundance of species classified as potentially vulnerable is calculated based on a
binary species classification system and can be analysed
using univariate statistical techniques. Notwithstanding
that aggregation of vulnerable taxa may be more powerful,
this could not be tested here, as SPEARmesocosm was developed for lotic systems. However, a similar trait-based
indicator for lentic systems is currently under development
(personal communication; Roucaute, M.; INRA). In the
past, the use of this method has been subject to debates
(Van den Brink and Ter Braak 2012; Liess and Beketov
2012). We note, that instead of analysing classifications of
species based on traits, the (weighted) multivariate trait
response could also be analysed using PRCs or GLMmv.
Aggregation of data results in a loss of information,
since all complexity of the response is aggregated into a
single variable. For instance, the taxon response is essential
for an ecological and mechanistic interpretation of results.
Aggregated endpoints loose this information because taxon
responses cannot be directly derived. By contrast, techniques taking the multivariate nature of the data into account preserve most of the information in the data. They
enable conclusions on the responses of taxa, with GLMmv
providing a clearer overview of responsive taxa. Nevertheless, in situations of high variability among taxa, aggregated endpoints that group taxa by relevant traits that
influence sensitivity and vulnerability to the test item (e.g.
taxonomic group, voltinism, dispersal capacity, trophic
classification), may be favourable, because such a grouping
decreases variability.
A limitation of the current study is that our comparison is
based on 11 empirical mesocosm studies. Further simulation
studies, comparing the methods based on datasets with
known properties that mimic mesocosm data, are needed to
examine the conditions under which the different methods
are most suitable to analyse mesocosm experiments.
Aggregated endpoints
Implications for ecotoxicology
All aggregated endpoints considered here performed considerably poorer than PRCs. Beketov et al. (2013) compared
RDA with t-tests on total abundance by simulation and
concluded that the latter has better power properties. In their
simulation, abundance of all simulated taxa decreased uniformly due to treatment and the proportion of zero counts
varied. However, a uniform decline of taxa is an oversimplification of treatment effects. In mesocosms, it is likely
that some taxa may benefit from treatment because of biotic
interactions or show no effect. Such effects would reduce the
power of the total abundance endpoint, but would be captured by PRCs and GLMmv. This could explain discrepancies between this study and that of Beketov et al. (2013).
123
Data sets frequently encountered in community ecotoxicology are inherently not normally distributed, e.g.,
proportions (bounded between 0 and 1), abundances
(positive integer data) or biomass (strictly positive) (Wang
and Riffel 2011). GLMs offer a powerful and flexible
parametric framework to analyse such non-normal data
(O’Hara and Kotze 2010). However, they are rarely used in
ecotoxicology. For example GLMs are only marginally
discussed in the guideline on statistical analysis of ecotoxicological data (OECD 2006b). Instead, scientists often
use data transformations to achieve normality and variance
homogeneity or use non-parametric tests (Wang and Riffel
Chemical-induced changes in macroinvertebrate communities
2011). GLMs may have considerable advantages, not only
in the multivariate case (Stroup 2014).
A large amount of data are generated for environmental
risk assessment, but only a small fraction is made publicly
available (Schäfer et al. 2013). Most raw data sets used in
this paper were provided only upon personal request and
most requests to industry remained unsuccessful. Note that
data for Study no. 7 is also available as an example data in
the Canoco computer package and the vegan R package.
We argue to not let data go to waste (sensu Poisot et al.
2013) and make data publicly available. The interest of
industry to protect their intellectual property could be
considered by anonymizing properties of the substance.
Ecotoxicology will undoubtedly benefit from open data.
For example, it enhances transparency and, by combining
data sets from different experiments, patterns may emerge
that might be overlooked otherwise.
When dealing with community data, statistical methods
that take the multivariate nature of the data into account are
advisable. Data aggregation methods are, despite their
simplicity, an inappropriate tool to analyse mesocosm data
for ERA. GLMmv are an alternative to the widely used
PRCs, yielding to similar results regarding the community
response and a better indication of responding taxa.
Acknowledgments The authors gratefully acknowledge the contributions from scientists and technicians from the INRA Experimental
Unit of Aquatic Ecology and Ecotoxicology and Ecotoxicology and
Quality and Aquatic Research Group during mesocosm experiments.
These experiments were conducted in programmes funded by the
Interface Recherche-Expertise pour l’Evaluation du Risque en Ecotoxicologie coordinated by the Structure Scientifique Mixte INRA–
Direction Générale de l’Alimentation (Study no. 8), by the Ministry
of Ecology, Sustainable Development and Energy through its ‘‘Pesticides’’ research programme (Study Nos. 3, 4), and by TOTAL S.A.
(Study Nos. 10, 11). A. Auber and Y. Bayona PhDs were funded
through a grant from The Region Bretagne and INRA and by TOTAL
S.A., respectively. The studies conducted at Wageningen UR (Study
nos. 1, 7 and 9) were financially supported by the Dutch Ministry of
Economic Affairs, as well as the contribution of Paul Van den Brink
and Theo Brock to this paper (Project BO-20-002-001). The data sets
of these studies are available upon request.
Conflict of interest
of interest.
The authors declare that they have no conflict
References
Anderson MJ (2006) Distance based tests for homogeneity of
multivariate dispersions. Biometrics 62:245–253
Anderson MJ, Crist TO, Chase JM, Vellend M, Inouye BD, Freestone
AL, Sanders NJ, Cornell HV, Comita LS, Davies KF, Harrison
SP, Kraft NJB, Stegen JC, Swenson NG (2011) Navigating the
multiple meanings of beta diversity: a roadmap for the practicing
ecologist. Ecol Lett 14:19–28
Auber A, Roucaute M, Togola A, Caquet T (2011) Structural and
functional effects of conventional and low pesticide input crop-
protection programs on benthic macroinvertebrate communities
in outdoor pond mesocosms. Ecotoxicology 20:2042–2055
Bates D, Maechler M, Bolker B, Walker S (2014) lme4: linear mixedeffects models using Eigen and S4. R package version 1.1-0.
https://github.com/lme4/lme4/
Bayona Y, Roucaute M, Cailleaud K, Lagadic L, Bassères A, Caquet
T (2014) Effect of thiram and a petroleum distillate on
freshwater macroinvertebrate communities in outdoor stream
and pond mesocosms: I Study design, chemical fate and
structural responses. Ecotoxicology (submitted)
Beketov MA, Kattwinkel M, Liess M (2013) Statistics matter: data
aggregation improves identification of community-level effects
compared to a commonly used multivariate method. Ecotoxicology 22:1–10
Bolker B, Brooks M, Clark C, Geange S, Poulsen J, Stevens M, White
J (2009) Generalized linear mixed models: a practical guide for
ecology and evolution. Trends Ecol Evol 24:127–135
Brock TCM, Arts GHP, ten Hulscher TEM, de Jong FMV, Luttik R,
Roex EWM, Smit CE, van Vliet PJM (2011) Aquatic effect
assessment for plant protection products; Dutch proposal that
addresses the requirements of the Plant Protection Product
Regulation and Water Framework Directive. Alterra Report
2235, Alterra, Wageningen
Brock TCM, Hammers-Wirtz M, Hommen U et al (2014) The
minimum detectable difference (MDD) and the interpretation of
treatment-related effects of pesticides in experimental ecosystems. Environ Sci Pollut Res 22(2):1160–1174. doi:10.1007/
s11356-014-3398-2
Cañedo-Argüelles M, Bundschuh M, Gutiérrez-Cánovas C, Kefford
BJ, Prat N, Trobajo R, Schäfer RB (2014) Effects of repeated salt
pulses on ecosystem structure and functions in a stream
mesocosm. Sci Total Environ 476–477:634–642
Caquet T, Hanson M, Roucaute M, Graham D, Lagadic L (2007)
Influence of isolation on the recovery of pond mesocosms from
the application of an insecticide. II Benthic macroinvertebrate
responses. Environ Toxicol Chem 26:1280–1290
Cuppen JGM, Van den Brink PJ, Camps E, Uil KF, Brock TCM
(2000) Impact of the fungicide carbendazim in freshwater
microcosms. I. Water quality, breakdown of particulate organic
matter and responses of macroinvertebrates. Aquat Toxicol
48:233–250
Daam MA, Van den Brink PJ, Nogueira AJA (2008) Impact of single
and repeated applications of the insecticide chlorpyrifos on
tropical freshwater plankton communities. Ecotoxicology 17:
756–771
De Jong FMW, Brock TCM, Foekema EM, Leeuwangh P (2008)
Guidance for summarizing and evaluating aquatic micro- and
mesocosm studies. RIVM Report 601506009/2008, RIVM
EFSA PPR (2013) Guidance on tiered risk assessment for plant
protection products for aquatic organisms in edge-of-field
surface waters. EFSA Panel on Plant Protection Products and
their Residues (PPR). Parma, Italy. EFSA J 11(7):3290
Ellison AM, Gotelli NJ, Inouye BD, Strong DR (2014) P values,
hypothesis testing, and model selection: it’s déjà vu all over
again. Ecology 95:609–610
Giddings JM, Brock TCM, Heger W, Heimbach F, Maund SJ,
Norman S, Ratte H-T, Schäfers C, Streloke M (eds) (2002)
Community-level aquatic system studies—interpretation criteria
(CLASSIC). SETAC, Pensacola
Ippolito A, Carolli M, Varolo E, Villa S, Vighi M (2012) Evaluating
pesticide effects on freshwater invertebrate communities in
alpine environment: a model ecosystem experiment. Ecotoxicology 21:2051–2067
Knauer K, Maise S, Thoma G, Hommen U, Gonzalez-Valero J (2005)
Long-term variability of zooplankton populations in aquatic
mesocosms. Environ Toxicol Chem 24:1182–1189
123
E. Szöcs et al.
Legendre P, Anderson MJ (1999) Distance-based redundancy
analysis: testing multispecies responses in multifactorial ecological experiments. Ecol Monogr 69:1–24
Legendre P, Gallagher ED (2001) Ecologically meaningful transformations for ordination of species data. Oecologia 129:271–280
Legendre P, Legendre L (2012) Numerical ecology. Elsevier,
Amsterdam
Legendre P, Oksanen J, ter Braak CJF (2011) Testing the significance
of canonical axes in redundancy analysis. Methods Ecol Evol
2:269–277
Liess M, Beketov M (2011) Traits and stress: keys to identify
community effects of low levels of toxicants in test systems.
Ecotoxicology 20:1328–1340
Liess M, Beketov MA (2012) Rebuttal related to ‘‘Traits and Stress:
Keys to identify community effects of low levels of toxicants in
test systems’’ by Liess and Beketov (2011). Ecotoxicology
21:300–303
McArdle BH, Anderson MJ (2001) Fitting multivariate models to
community data: a comment on distance-based redundancy
analysis. Ecology 82:290–297
Nakagawa S, Schielzeth H (2013) A general and simple method for
obtaining R2 from generalized linear mixed-effects models.
Methods Ecol Evol 4:133–142
Newman MC, Clements WH (2008) Ecotoxicology : a comprehensive
treatment. CRC Press, Boca Raton
O’Hara RB, Kotze DJ (2010) Do not log-transform count data.
Methods Ecol Evol 1:118–122
OECD (2006a) Guidance document on simulated freshwater lentic
field tests (outdoor microcosms and mesocosms). No. 53 in
Series on testing and assessment. OECD, Paris
OECD (2006b) current approaches in the statistical analysis of
ecotoxicity data: a guidance to application. No. 54 in Series on
testing and assessment. OECD, Paris
Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara
RB, Simpson GL, Solymos P, Stevens MHH, Wagner H (2013)
Vegan: Community Ecology Package. R package version 2.1-41.
http://vegan.r-forge.r-project.org/
Poisot TE, Mounce R, Gravel D (2013) Moving toward a sustainable
ecological science: don’t let data go to waste! Ideas Ecol Evol
6:11–19
Roessink I, Arts GHP, Belgers JDM, Bransen F, Maund SJ, Brock
TCM (2005) Effects of lambda-cyhalothrin in two ditch
microcosm systems of different trophic status. Environ Toxicol
Chem 24:1684–1696
Sanchez-Bayo F, Goka K (2012) Evaluation of suitable endpoints for
assessing the impacts of toxicants at the community level.
Ecotoxicology 21:667–680
Sanderson H, Laird B, Brain R, Wilson CJ, Solomon KR (2009)
Detectability of fifteen aquatic micro/mesocosms. Ecotoxicology
18:838–845
123
Schäfer RB, Bundschuh M, Focks A, von der Ohe PC (2013) Letter to
the Editor. Environ Toxicol Chem 32:734–735
Simpson GL (2013) Permute: functions for generating restricted
permutations of data. R package version 0.8-0. http://CRAN.Rproject.org/package=permute
Stroup WW (2014) Rethinking the analysis of non-normal data in
plant and soil science. Agron J 106:1–17
Szöcs E, Kefford BJ, Schäfer RB (2012) Is there an interaction of the
effects of salinity and pesticides on the community structure of
macroinvertebrates? Sci Total Environ 437:121–126
R Core Team (2013) R: a language and environment for statistical
computing. Vienna, Austria. http://www.R-project.org/
Van den Brink PJ (2013) Assessing aquatic copulation and community-level risks of pesticides. Environ Toxicol Chem 32:972–973
Van den Brink PJ, ter Braak CJF (1998) Multivariate analysis of
stress in experimental ecosystems by principal response curves
and similarity analysis. Aquat Ecol 32:163–178
Van den Brink PJ, ter Braak CJF (1999) Principal response curves:
analysis of time-dependent multivariate responses of biological
community to stress. Environ Toxicol Chem 18:138–148
Van den Brink PJ, ter Braak CJF (2012) Response to ‘‘Traits and
stress: keys to identify community effects of low levels of
toxicants in test systems’’ by Liess and Beketov (2011).
Ecotoxicology 21:297–299
Van den Brink PJ, van Wijngaarden RPA, Lucassen WGH, Brock
TCM, Leeuwangh P (1996) Effects of the insecticide Dursban
4E (active ingredient chlorpyrifos) in outdoor experimental
ditches: II. Invertebrate community responses and recovery.
Environ Toxicol Chem 15:1143–1153
Van den Brink PJ, Hattink J, Bransen F, van Donk E, Brock TCM
(2000) Impact of the fungicide carbendazim in freshwater
microcosms. II. Zooplankton, primary producers and final
conclusions. Aquat Toxicol 48:251–264
Van Leeuwen CJ, Vermeire TG (2007) Risk assessment of chemicals:
an introduction. Springer, Netherlands
Wang M, Riffel M (2011) Making the right conclusions based on
wrong results and small sample sizes: interpretation of statistical
tests in ecotoxicology. Ecotoxicol Environ Saf 74:684–692
Wang Y, Naumann U, Wright ST, Warton DI (2012) mvabund—an R
package for model-based analysis of multivariate abundance
data. Methods Ecol Evol 3:471–474
Warton D (2011) Regularized sandwich estimators for analysis of
high-dimensional data using generalized estimating equations.
Biometrics 67:116–123
Warton DI, Wright ST, Wang Y (2012) Distance-based multivariate
analyses confound location and dispersion effects. Methods Ecol
Evol 3:89–101
Zuur A, Ieno E, Elphick C (2010) A protocol for data exploration to
avoid common statistical problems. Methods Ecol Evol 1:3–14