Cerebral Cortex June 2011;21:1223–1230
doi:10.1093/cercor/bhq233
Advance Access publication November 10, 2010
FEATURE ARTICLE
Endogenous Auditory Spatial Attention Modulates Obligatory Sensory Activity in Auditory
Cortex
Alan J. Power¹, Edmund C. Lalor¹,² and Richard B. Reilly¹,²
¹Trinity Centre for Bioengineering and ²Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
Address correspondence to Edmund C. Lalor. Email: edlalor@tcd.ie.
Keywords: AESPA, auditory evoked potential, continuous and competing
stimuli, dichotic listening
Introduction
In natural everyday environments, we are typically bombarded
with a convoluted mixture of competing sound sources. The
auditory system has evolved, however, to be able to complete
the highly complex task of allowing us to focus, with relative
ease, on a constituent source among the combination of
sources reaching the ears. There are 2 broad classes of
attentional mechanisms: exogenous attention and endogenous
attention (e.g., Hopfinger and West 2006). Exogenous attention
is the attraction of attentional focus by an environmentally
salient stimulus, whereas endogenous attention is the self-directed focus of attention to a particular region or feature of
the environment. Extensive research investigating the intriguing phenomenon of endogenous auditory attention has been
carried out using the electroencephalogram (EEG) and the
magnetoencephalogram (Hillyard et al. 1973; Näätänen et al.
1992; Woldorff et al. 1993). These studies have prompted
debate as to which generators are affected by endogenous
attention. Hillyard et al. (1973), in their seminal study, found
the N1 wave of the auditory evoked potential (AEP) to be
enhanced by auditory selective attention. They suggested that
this was due to increased sensory processing of the attended
stimulus. This view is supported by others (Woldorff et al.
© The Author 2010. Published by Oxford University Press. All rights reserved.
For permissions, please e-mail: journals.permissions@oup.com
1993). However, the idea that obligatory sensory components
of the AEP are affected by endogenous attention has been
challenged by Näätänen and Picton (1987) and Näätänen et al.
(1992). They suggest that the N1 enhancement is likely due to
the engagement or enhancement of separate components of
the N1 wave that are not related to obligatory processes but to
a matching process between the sensory input and an
“attentional trace.” Although they suggest that under some
conditions, obligatory components may possibly be affected by
endogenous attention (Näätänen and Picton 1987), they remain
somewhat skeptical about obligatory sensory involvement in
endogenous auditory attention (Näätänen et al. 1992; Alho et al.
1994). Research into this question is ongoing (Bidet-Caulet
et al. 2007; Ross et al. 2010).
The majority of EEG studies investigating endogenous
auditory attention, such as those mentioned above, have
employed the standard event-related potential (ERP) technique. These studies, although informative, are somewhat
hampered by the fact that they use discrete stimuli that may
not be best suited to the investigation of endogenous attention.
Specifically, the assertion that attentional effects to discrete
onset stimuli are entirely due to endogenous top-down
attention is complicated by the exogenous attention-grabbing
effects of onset stimuli. It has been suggested, in the visual
domain, that endogenous and exogenous attention may be
separate systems (Hopfinger and West 2006). Whether these
proposed separate systems affect sensory processing in the
same manner, however, is still unknown, as is how they interact
when both are engaged. That said, although the
attention-grabbing nature of discrete stimuli is well established
in both visual (Jonides 1981) and auditory (Escera et al. 2000)
modalities, there have been very few studies that attempt to
eliminate the possible confounding exogenous attentional
influence on the investigation of endogenous attention.
The continuous nature of steady-state response (SSR) stimuli
likely minimizes the influence of exogenous attention on the
SSRs. Studies employing SSR to investigate sustained auditory
attention have had mixed results, however. One study using
simultaneously presented click trains at rates of
either 37 or 40 Hz found a transient N1 effect to stimulus onset
but no steady-state effect (Linden et al. 1987), which suggests
that sustained responses are not affected by endogenous
attention. A more recent study using amplitude modulated
tone bursts of 800 ms duration, however, found a steady-state
effect but no transient N1 effect (Ross et al. 2004), which
suggests that sustained responses are affected by endogenous
attention. The latter study did not employ a stimulus competition design, however, and used a visual task in the unattended
condition. Thus, we are unable to discern, from this
Downloaded from https://academic.oup.com/cercor/article-abstract/21/6/1223/349063 by guest on 30 October 2018
Endogenous attention is the self-directed focus of attention to
a region or feature of the environment. In this study, we assess the
effects of endogenous attention on temporally detailed responses to
continuous and competing auditory stimuli obtained using the novel
auditory evoked spread spectrum analysis (AESPA) method. There
is some debate as to whether an enhancement of sensory
processing is involved in endogenous attention. It has been
suggested that attentional effects are not due to increased sensory
activity but are due to engagement of separate temporally
overlapping nonsensory attention-related activity. In addition,
exogenous attention-grabbing mechanisms may confound
studies of endogenous attention.
Due to the nature of the AESPA method, the obtained responses
represent activity directly related to the stimulus envelope and thus
predominantly correspond to obligatory sensory processing. In
addition, the continuous nature of the stimuli minimizes exogenous
attentional influence. We found attentional modulations at ~136 ms
(during the Nc component of the AESPA response) and localized
this to auditory cortex. Although the involvement of separate
nonsensory attentional centers cannot be ruled out, these findings
clearly demonstrate that endogenous attention does modulate
obligatory sensory activity in auditory cortex.
Figure 1. An AESPA response obtained, at Fz, using a modulated broadband noise
carrier with equal power in the range 0–22.05 kHz.
1224 Endogenous Auditory Spatial Attention
d
Power et al.
continuous auditory streams concurrently, one to the left and
one to the right ear, and subjects are instructed to attend either
to the left or right.
Materials and Methods
Subjects and Data Acquisition
Seventeen subjects aged 21–33 (15 males) participated in the study.
The experiment was undertaken in accordance with the Declaration of
Helsinki. The Ethics Committee of the School of Psychology at Trinity
College Dublin approved the experimental procedures, and each
subject provided written informed consent. Subjects reported no
history of hearing impairment or neurological disorder. EEG data were
recorded from 130 electrode positions, filtered over the range 0–134
Hz, and digitized at the rate of 512 Hz using a BioSemi Active Two
system. EEG data were then digitally filtered off-line with a high-pass
filter, where the passband was above 2 Hz and with a –60 dB response at
1 Hz and a low-pass filter with passband below 35 Hz and a –50 dB
response at 45 Hz. The data at each channel were re-referenced to the
average of the responses at the left and right mastoids.
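The re-referencing step can be illustrated with a minimal sketch. The channel labels (`"Fz"`, `"M1"`, `"M2"`) and the dictionary layout are hypothetical choices made for illustration; the text specifies only that each channel is referenced to the average of the two mastoid signals.

```python
def rereference_to_mastoids(eeg, left_mastoid, right_mastoid):
    """Subtract the mean of the two mastoid signals from every channel.

    eeg maps a channel label to a list of samples (hypothetical layout).
    """
    # Sample-by-sample average of the left and right mastoid signals.
    ref = [(l + r) / 2.0
           for l, r in zip(eeg[left_mastoid], eeg[right_mastoid])]
    return {ch: [v - r for v, r in zip(sig, ref)]
            for ch, sig in eeg.items()}


# Toy data: two samples on three channels.
eeg = {"Fz": [3.0, 5.0], "M1": [1.0, 2.0], "M2": [3.0, 4.0]}
rereferenced = rereference_to_mastoids(eeg, "M1", "M2")
```

After re-referencing, the two mastoid channels are mirror images of each other, since their mean is zero by construction.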
Stimuli
The AESPA stimulus consists of a carrier stimulus amplitude modulated
by a spread spectrum signal. In this case, 2 root mean square (RMS)
normalized band-pass noise carriers of bandwidth 1 kHz centered at 1
kHz (LOW stream) and 5 kHz (HIGH stream), respectively, were
employed. These center frequencies were chosen on the basis that 1
kHz and 5 kHz tones are perceived with approximately the same
loudness (ISO:226) and also because they are far enough apart in
frequency that they are perceived separately. These carriers were then
modulated by independent spread spectrum signals, and the low- and
high-frequency streams were concurrently presented to the left and
right ears, respectively (Fig. 2a,b,c). The stimuli were
separated in location as well as in carrier center frequency for 2
reasons: 1) it is spatial attention that is under investigation and 2) if
both the left and right streams had identical carriers, the stimuli may
have been perceived as 1 auditory object varying in interaural intensity
difference as opposed to 2 spatially separate objects.
Figure 2. The stimulation setup. (a) LOW and HIGH streams were presented
dichotically. A segment of the (b) LOW and (c) HIGH frequency carrier stimuli,
respectively. Examples of (d) target and (e) distracter events.
experimental protocol, whether one competing stimulus
would result in increased activity relative to another. The
somewhat contradictory results suggest that the effects of
sustained endogenous auditory attention on competitive
continuous stimuli have yet to be adequately elucidated.
A further issue relates to the simultaneity of stimuli in most
ERP-based studies. In such studies, the discrete stimuli are not
presented truly simultaneously in the attended and unattended
channels (Hillyard et al. 1973; Näätänen et al. 1992). Thus, the
argument that the effects found are due to competing stimuli is
somewhat weakened. In an attempt to overcome this
drawback, paradigms where simultaneous stimulation is
employed have been carried out. A recent study assessed the
transient onset responses of subjects who were asked to detect
an occasional change in modulation frequency of amplitude
modulated sounds presented to one ear while ignoring
concurrent sounds in the other ear (Ross et al. 2010).
Interpretation of these results, however, is complicated by
the fact that any increase in the transient response to the
attended stimulus would be superimposed on the unaffected
(or even possibly inhibited) response to the unattended
stimulus, perhaps diluting any attention effect. Indeed,
researchers investigating auditory scene analysis have been
forced to go to great lengths to create the perception of
auditory streaming using discrete stimuli (Sussman et al. 1999;
Ritter et al. 2006; De Sanctis et al. 2008). Recently, however,
there has been a move toward employing more natural stimulus
paradigms to assess auditory function (Lalor et al. 2009; Kerlin
et al. 2010; Lalor and Foxe 2010). Lalor and Foxe (2010)
obtained temporally detailed responses to natural speech
stimuli and Kerlin et al. (2010) found attentional modulation
of activity in auditory cortex (AC) when using competing
speech stimuli and a template-matching analysis method.
In an attempt to further address some of the concerns
outlined above, we employ a novel method for obtaining
temporally detailed responses to continuous stimuli: The
auditory evoked spread spectrum analysis (AESPA) method
(Lalor et al. 2009). Figure 1 shows a typical AESPA response
obtained using a continuous amplitude modulated broadband
noise stimulus. The AESPA response represents obligatory
sensory activity directly related to the input stimulus (Lalor
et al. 2009). The AESPA method also allows for the extraction of
separate responses to simultaneously presented continuous
stimuli. Furthermore, the continuous nature of the stimuli
minimizes exogenous attentional influence. Exploiting this
method, we employed a cocktail party-like experimental
approach to investigate the effects of endogenous attention
on sensory processing in the auditory system. We present 2
The spread spectrum signals consisted of Gaussian noise with energy
uniformly distributed between 0 and 30 Hz. Taking into account the
logarithmic nature of auditory stimulus intensity perception, the values
of these modulating signals, x, were then mapped to the amplitude of
the audio stimulus, $x'$, using the following exponential relationship:
$$x' = 10^{2x},$$
and normalized to between 0 and 1. It was expected that this would
result in a more linear perception of audio intensity modulation.
Transitions between levels were smoothed by using a 5-ms ramp
consisting of half a period of a 100-Hz sine wave. Using the Nyquist
sampling theorem and given that EEG power above 30 Hz is low, the
modulation rate of each signal was set to be 60 Hz.
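The mapping described above can be sketched as follows. This is a minimal illustration, not the authors' stimulus code: it assumes that the Gaussian modulating signal is rescaled to the range 0 to 1 before the exponential mapping is applied (the text does not state the input range), it omits the 5-ms smoothing ramp, and the function name `make_envelope` and the seed are hypothetical.

```python
import random


def make_envelope(n_levels, seed=0):
    """Return n_levels modulation values normalized to [0, 1].

    Assumption (not stated in the text): the Gaussian modulating
    signal x is rescaled to [0, 1] before applying x' = 10**(2*x).
    """
    rng = random.Random(seed)
    # One independent Gaussian value per update. Independent samples
    # drawn at 60 Hz have a flat spectrum up to the 30 Hz Nyquist limit.
    x = [rng.gauss(0.0, 1.0) for _ in range(n_levels)]
    lo, hi = min(x), max(x)
    x = [(v - lo) / (hi - lo) for v in x]
    # Exponential mapping followed by normalization to between 0 and 1.
    mapped = [10.0 ** (2.0 * v) for v in x]
    m_lo, m_hi = min(mapped), max(mapped)
    return [(v - m_lo) / (m_hi - m_lo) for v in mapped]


MODULATION_RATE_HZ = 60
TRIAL_SECONDS = 120
envelope = make_envelope(MODULATION_RATE_HZ * TRIAL_SECONDS)
```

Under these assumptions the exponential mapping compresses most values toward the low end of the range, so the envelope spends most of its time at low amplitudes.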
Signal Processing
We obtain the AESPA by performing a linear least squares fit of the
response model
Quantification of Results
When calculating task performance, any response occurring within a 1 s
period after an event was considered to be a response to that event. We
calculated the percentage of correct responses, percentage of
responses to distracters in the attended stream, and also percentage
of responses to events in the to-be ignored stream, which are shown in
Table 1. To test the behavioral results statistically, they were submitted
to a 2 × 2 repeated measures analysis of variance (ANOVA) using
factors of stimulus (LOW-LEFT vs. HIGH-RIGHT) and event in the
attended stream (target vs. distracter).
The Global Field Power (GFP, Lehmann and Skrandies 1980) was
obtained for the grand average responses to each stimulus condition.
The GFP is a reference free measure of the response power over the
whole scalp. The GFP was used for preliminary visualization of the data
and to indicate possible periods of attentional modulation of the
responses. The periods of interest were obtained by submitting the
GFPs to running t-tests. A component was considered to be of interest if
responses were significantly different (P < 0.01) for a period of at least
11 consecutive samples (~20 ms). In order to investigate how different
areas are affected by attention, we also partitioned the scalp into 4
regions (Frontal, Central, Left Temporal, and Right Temporal) and
averaged over responses at electrodes within each region. Later AESPA
components (i.e., Nc and Pd) have been shown to have a very low signal-to-noise ratio in parietal–occipital regions (Lalor et al. 2009), and thus
this region was not included in the analysis. Figure 3 shows the responses
in the included regions as well as the GFPs for the HIGH-RIGHT and
LOW-LEFT responses when attended and unattended. In the GFP plots,
the areas identified as being of significant interest by the running t-tests
are shaded gray. Topographic maps of the Nc component when
attended and unattended are also shown, as are difference topographies.
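The two steps described in this paragraph, computing the GFP and applying the duration criterion to the running t-tests, can be sketched as below. The GFP is computed as the standard deviation across electrodes at each time sample, which is its standard definition. The helper names are hypothetical, and the p values are assumed to have been computed elsewhere.

```python
import math


def global_field_power(potentials):
    """potentials[t][e] is the potential at sample t, electrode e.

    GFP at each sample is the standard deviation across electrodes.
    """
    gfp = []
    for sample in potentials:
        mean = sum(sample) / len(sample)
        variance = sum((v - mean) ** 2 for v in sample) / len(sample)
        gfp.append(math.sqrt(variance))
    return gfp


def significant_runs(p_values, alpha=0.01, min_len=11):
    """Return (start, end) pairs, end exclusive, for every run of at
    least min_len consecutive samples with p < alpha."""
    runs, start = [], None
    for i, p in enumerate(p_values):
        if p < alpha:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the final sample.
    if start is not None and len(p_values) - start >= min_len:
        runs.append((start, len(p_values)))
    return runs


SAMPLE_RATE_HZ = 512
min_duration_ms = 11 / SAMPLE_RATE_HZ * 1000
```

At the 512 Hz sampling rate, 11 samples span about 21.5 ms, consistent with the ~20 ms criterion quoted in the text.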
Statistical differences in components identified as being possibly
affected by attention were tested using a repeated measures ANOVA.
The RMS values in a ~10 ms window around the component peak were
used in the statistical analysis. Topographic maps of affected
components were also obtained using the average amplitude in the
relevant windows. Component topographies were compared using the
topographic ANOVA (TANOVA) method (Murray et al. 2008). This
method assesses whether 2 topographies are statistically different using
a nonparametric randomization procedure. Given that topographic
maps reflect source configuration, this was done in order to assess
whether the attentional modulation of any component was due to an
engagement of additional generators or whether it was merely due to
enhanced activity of generators already engaged in the unattended
condition (i.e., if topographies are not statistically different, then we
can assert that they are likely due to the same generators). A further
more detailed investigation of the component generators was also
carried out using the BESA (5.2) source analysis package.
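The logic of the topographic randomization test can be sketched as follows. This is an illustrative outline under stated assumptions, not the implementation used in the study: topographies are average-referenced and scaled to unit GFP, dissimilarity is the root mean square difference between the normalized group-mean maps, and condition labels are randomly swapped within each subject. The function names, permutation count, and seed are hypothetical.

```python
import math
import random


def normalize_map(topo):
    """Average-reference a topography and scale it to unit GFP."""
    mean = sum(topo) / len(topo)
    centered = [v - mean for v in topo]
    gfp = math.sqrt(sum(v * v for v in centered) / len(centered))
    return [v / gfp for v in centered]


def dissimilarity(map_a, map_b):
    """RMS difference between two strength-normalized topographies."""
    a, b = normalize_map(map_a), normalize_map(map_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def group_mean(maps):
    """Electrode-wise mean over a list of subject topographies."""
    n = len(maps)
    return [sum(m[e] for m in maps) / n for e in range(len(maps[0]))]


def tanova(cond_a, cond_b, n_perm=1000, seed=0):
    """cond_a[s] and cond_b[s] are subject s's maps in each condition.

    Returns (observed dissimilarity, randomization p value).
    """
    rng = random.Random(seed)
    observed = dissimilarity(group_mean(cond_a), group_mean(cond_b))
    count = 0
    for _ in range(n_perm):
        perm_a, perm_b = [], []
        for a, b in zip(cond_a, cond_b):
            # Randomly swap the condition labels within each subject.
            if rng.random() < 0.5:
                a, b = b, a
            perm_a.append(a)
            perm_b.append(b)
        permuted = dissimilarity(group_mean(perm_a), group_mean(perm_b))
        if permuted >= observed:
            count += 1
    return observed, count / n_perm
```

Because the maps are scaled to unit GFP before comparison, a pure change in response strength with an unchanged spatial pattern yields a dissimilarity of zero, which is why a nonsignificant result is read as evidence for the same generator configuration.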
Table 1
Behavioral results
$$y(t) = w(\tau) * x(t) + \mathrm{noise},$$
where y(t) is the measured EEG response, x(t) is the amplitude
envelope of the stimulus, the symbol * indicates convolution, w(τ) is
the impulse-response function to the amplitude of the stimulus, and the
noise is assumed to be Gaussian (Lalor et al. 2009). In other words,
the AESPA response, w(τ), is analogous to a filter that describes how
the brain transforms the auditory input into the EEG output. Keeping
Condition     p(T) ± SD (%)     p(D) ± SD (%)     p(TBI) ± SD (%)
LOW-LEFT      82.89 ± 11.59     57.83 ± 26.81     1.79 ± 1.29
HIGH-RIGHT    86.75 ± 9.35      65.41 ± 26.84     1.41 ± 1.07
Note: Percentage of correct responses (p(T)), percentage of responses to distracters in the
attended stream (p(D)), and percentage of events responded to in the to-be-ignored stream
(p(TBI)) are shown for each condition. SD, standard deviation.
Cerebral Cortex June 2011, V 21 N 6 1225
Experimental Procedure and Tasks
While abstaining completely from eyeblinks is not possible for long
periods, subjects were instructed to keep the number of eyeblinks to
a minimum during each session. Subjects were also instructed to keep
all other types of motor activity to a minimum during testing. Testing
was carried out in a dark room, and in order to minimize eye
movements, subjects were asked to keep their eyes open and to fixate
on a small cross presented in the center of their visual field.
Each subject undertook 10 trials where they were asked to attend to
the HIGH stream in the right ear (attend-HIGH-RIGHT condition) and
10 trials where they were asked to attend to the LOW stream in the left
ear (attend-LOW-LEFT condition). Each trial was 120 s in duration. The
sequence of conditions was randomized for each subject. Stimuli were
presented at an intensity level deemed comfortable by the subject
before beginning the experiment.
In order to monitor each subject’s progress, targets and distracters
were inserted randomly in each stream. These events consisted of
a specific pattern of amplitude modulation imposed on the random
process. Targets consisted of a modulation level of –2.5 dBfs for 25.5 ms
followed by –12 dBfs for 16 ms followed by –2.5 dBfs for 25.5 ms, giving
a total length of 67 ms, whereas distracters consisted of a flat
modulation of –6 dBfs for 67 ms. dBfs refers to decibels full scale and
represents a dB value relative to the maximum modulation level for
each subject (see Fig. 2d,e). Although the events are embedded in the
stimulus, they are still distinguishable from the ongoing amplitude
modulations. This is because the events are generally
somewhat louder than the ongoing modulation, a consequence of
the exponential mapping outlined above, which restricts the
modulating waveform to spend ~90% of its time below –12 dBfs.
Furthermore, due to the random nature of the modulating waveform,
nonevent-related amplitudes exceeding –12 dBfs generally have a short
duration compared with the 67 ms duration of events.
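The event definitions above can be written out as a short worked example. It assumes that dBfs follows the amplitude convention, 20·log10 of the ratio to full scale; the text does not say whether an amplitude or a power convention was used, so the linear values below are illustrative only. The names `TARGET` and `DISTRACTER` are hypothetical.

```python
def dbfs_to_amplitude(db):
    """Convert dB full scale to a linear amplitude ratio.

    Assumes the amplitude convention: db = 20 * log10(ratio).
    """
    return 10.0 ** (db / 20.0)


# Each event is a list of (level in dBfs, duration in ms) segments,
# taken from the description in the text.
TARGET = [(-2.5, 25.5), (-12.0, 16.0), (-2.5, 25.5)]
DISTRACTER = [(-6.0, 67.0)]


def duration_ms(event):
    """Total duration of an event in milliseconds."""
    return sum(duration for _, duration in event)


target_levels = [dbfs_to_amplitude(db) for db, _ in TARGET]
distracter_level = dbfs_to_amplitude(-6.0)
```

Under this convention a target moves from roughly 0.75 of full scale down to roughly 0.25 and back, whereas a distracter holds steady at roughly 0.50, so the two event types have the same 67 ms duration and differ only in whether they contain a brief dip.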
Subjects were instructed to respond only when a target in the
attended stream was heard. Each trial contained a total of 24 events
(i.e., both targets and distracters). The proportion of targets and
distracters in each trial was randomly assigned ranging from 8 targets
(and therefore 16 distracters) to 16 targets (and therefore 8
distracters). On average, 48.75% of events were targets and 51.25%
were distracters. No event, either within or between streams, could
occur within 1 s of another and the maximum separation between
events within streams could not be more than 9 s. EEG was recorded
for later analysis where both the responses to the HIGH-RIGHT and
LOW-LEFT streams for each condition were extracted. The stimulation
paradigm and the stimuli are outlined in Figure 2.
this in mind, the time axis for the AESPA carries a different meaning
than the time axes in traditional ERP studies. Each point on the time
axis can be interpreted as being the relative time between the
continuous EEG and the continuous input intensity signal. Therefore,
the AESPA at –100 ms, for example, indexes the relationship between
the input intensity signal and the EEG 100 ms earlier; obviously this
should be 0. As another example, the AESPA at 100 ms indexes how the
input intensity signal affects the EEG 100 ms later.
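The least squares estimation and the interpretation of the time axis can be illustrated with a minimal sketch. It is not the authors' implementation: it solves the normal equations directly in plain Python, works in units of samples rather than milliseconds, and uses simulated noise-free data. The function names and the simulated response `true_w` are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def estimate_response(x, y, min_lag, max_lag):
    """Least squares estimate of w in y(t) = sum over lag of
    w(lag) * x(t - lag), for lags min_lag..max_lag in samples."""
    lags = list(range(min_lag, max_lag + 1))
    # Keep only time points where every lagged input sample exists.
    t_first = max(max_lag, 0)
    t_last = len(x) - 1 + min(min_lag, 0)
    rows = [[x[t - lag] for lag in lags] for t in range(t_first, t_last + 1)]
    targets = y[t_first:t_last + 1]
    n = len(lags)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           for i in range(n)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, targets)) for i in range(n)]
    return dict(zip(lags, solve(xtx, xty)))


# Simulated data: a known causal response applied to a random input.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(400)]
true_w = {0: 0.5, 1: -1.0, 2: 0.25}
y = [sum(w * x[t - lag] for lag, w in true_w.items() if t - lag >= 0)
     for t in range(len(x))]
w_hat = estimate_response(x, y, min_lag=-2, max_lag=4)
```

In this simulation the estimated weights at negative lags come out at zero, mirroring the point made above that the input cannot affect the EEG recorded before it.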
Results
Behavioral Results
In order to keep subjects highly engaged in the attentional task,
discriminating between targets and distracters in the attended
stream was deliberately made difficult. The difficulty of the task
was evidenced by the relatively high number of distracters to
which subjects responded (see Table 1). However, the fact that
subjects responded to a significantly greater percentage of
targets than distracters (main effect of event, F(1,16) = 28.088, P <
0.001) shows that subjects were capable of performing the
task. Furthermore, the very low percentage of events
responded to in the ignored stream indicates that attention
to the intended stream was taking place. Despite the fact that
we found a significant main effect of stimulus (F(1,16) = 14.203,
P = 0.002), due to a higher number of event responses when the
right ear was attended than when the left ear was attended,
there was no stimulus × event interaction (P > 0.05). This
suggests the ability to carry out the task (i.e., to distinguish
between targets and distracters) did not differ depending on
which ear was attended.
Electrophysiology Results
Based on the running t-tests carried out on the GFPs, we
identified the Nc and Pd components as being of particular
interest for further investigation. Nc and Pd were defined as the
Figure 3. Responses for indicated regions and GFPs for the stimuli presented at the left and right ears when attended and unattended (upper panels). Time intervals of interest
identified by running t-tests are shaded gray in the GFPs. Topographic maps of the Nc component when attended and unattended as well as difference topographies are shown in
lower panel.
stimulus type (P > 0.05 for both). This tells us that there was
no hemispheric bias due to the spatial nature of the stimuli.
The possibility that increased activity in the Nc component
window may be due to an engagement of additional nonobligatory generators, and not to increased sensory
activity, was investigated using the TANOVA method (Murray
et al. 2008). Topographies in attended and unattended
conditions for the LOW-LEFT response were not found to be
statistically different (LOW-LEFT attended vs. unattended: P =
0.5855). This suggested that the same generators were involved
in both attended and unattended conditions for the responses
to the LOW-LEFT stimulus. In the case of the HIGH-RIGHT
responses, however, the TANOVA did suggest that the topographies were statistically different between conditions (P =
0.0147). That said, the TANOVAs did not indicate statistically
dissimilar topographies between the LOW-LEFT unattended
and HIGH-RIGHT unattended responses (P = 0.3365) or
between the LOW-LEFT attended and HIGH-RIGHT attended
responses (P = 0.0874), suggesting that, regardless of stimulus,
similar generators were engaged.
To further investigate the location and strength of the Nc
generators and the possible attentional effects, a dipole analysis
was carried out on the Nc component. Starting with the Nc
component of the unattended LOW-LEFT responses, we
attempted to fit 2 symmetrical regional sources (a regional
source in BESA consists of 3 orthogonal dipoles at the same
location). Fitting the sources to a window encompassing the
whole Nc component (111–154 ms: identified from the GFPs)
placed the sources at Talairach coordinates x = –32.6, y = –16.8,
z = 13.6. This configuration accounted for 98.88% of the
variance in the fitted window, and the sources are located
within 1 cm of Heschl’s gyrus (HG). In fact, fixing the source
locations to the center of the auditory core (at Talairach x = –46,
y = –24, z = 12) only slightly reduced the variance explained to
98.12%. Applying this same model to the attended LOW-LEFT
responses accounted for 96.76% of the variance. Thus, bilateral
sources in AC provided an excellent model of the Nc
component in both the attended and unattended conditions.
This, coupled with the nonsignificant dissimilarity in topographies shown by the TANOVA, indicates that the same sources
were involved when the stimulus was attended and unattended. Following the same procedure for the HIGH-RIGHT
responses resulted in an initial localization of the unattended
Nc to Talairach coordinates x = –37.7, y = –24.9, z = 18.5 with
97.73% of the variance explained. Again, this is within 1 cm of
HG, and fixing the sources to the center of the auditory core as
before only slightly reduced the variance explained to 97.24%.
Applying this model to the attended HIGH-RIGHT responses
accounted for 93.36% of the variance. Although the model
accounted for slightly lower variance than in the case of the
LOW-LEFT stimuli, bilateral sources in AC again provided an
excellent model of the Nc component. Employing the same
model for the LOW-LEFT and HIGH-RIGHT responses is backed
up by the TANOVA results, which suggest that both the LOW-LEFT and HIGH-RIGHT stimuli employ similar generators when
unattended as well as when attended. Indeed, it is possible that
the lower signal power of the HIGH-RIGHT responses (indicated by the main effect of stimulus mentioned above)
may have played a part in TANOVA results that suggested
topographical differences between HIGH-RIGHT conditions.
These noisier responses may also account for the slightly lower
explained variance in the BESA model. Thus, the fact that
RMS amplitude in a ~10 ms interval around the mean latency of
each component of the grand averages of the LOW-LEFT and
HIGH-RIGHT responses (i.e., Nc: 136 ms and Pd: 208 ms, see
Fig. 3). In order to test the effects of attention for each
stimulation condition, we performed a 4-way 2 × 2 × 4 × 2
repeated measures ANOVA using factors of stimulation (LOW-LEFT stream vs. HIGH-RIGHT stream), attention (Attended vs.
Unattended), electrode region (left, right, central, and frontal),
and component (Nc vs. Pd). Greenhouse–Geisser corrections
were applied to the repeated measures factors where the
sphericity assumption was violated with the corrected degrees
of freedom reported.
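The amplitude measure entered into the ANOVA can be sketched as follows. The helper name `rms_in_window` and the toy response are hypothetical; the window is taken as 10 ms centered on the component latency, which is one reading of "a ~10 ms interval around the mean latency".

```python
import math


def rms_in_window(response, times_ms, center_ms, width_ms=10.0):
    """RMS amplitude of the response samples whose time stamps lie
    within width_ms centered on center_ms."""
    half = width_ms / 2.0
    window = [v for v, t in zip(response, times_ms)
              if abs(t - center_ms) <= half]
    return math.sqrt(sum(v * v for v in window) / len(window))


SAMPLE_RATE_HZ = 512
times_ms = [i * 1000.0 / SAMPLE_RATE_HZ for i in range(256)]

# Toy response with a constant amplitude of 2 (arbitrary units).
response = [2.0] * len(times_ms)
nc_rms = rms_in_window(response, times_ms, center_ms=136.0)
pd_rms = rms_in_window(response, times_ms, center_ms=208.0)
```

At 512 Hz a 10 ms window contains about 5 samples, so each RMS value summarizes only a handful of points around the component peak.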
First and most importantly, there was a main effect of
attention (F(1,16) = 27.75, P < 0.001), indicating that components
of the responses to the attended stream were enhanced. There
was also a significant effect of stimulus (F(1,16) = 54.004, P <
0.001). This is due to the fact that the later cortical responses
(i.e., Nc and Pd) to the LOW-LEFT stream are greater in
amplitude than responses to the HIGH-RIGHT stream. This is
likely due to the logarithmic nature of frequency representation in auditory cortex, that is, the higher the frequency the
smaller the amount of cortex devoted to it (Romani et al. 1982).
Using a wider carrier stimulus bandwidth for the higher
frequency stream may result in more similar HIGH and LOW
responses. There was no stimulus × attention interaction (P >
0.05), suggesting that both the HIGH-RIGHT and LOW-LEFT
streams were similarly affected by attention. A significant
attention × region × component interaction (F(1.98,31.71) = 7.636,
P = 0.002) was found. To further interrogate the components
driving this interaction, we performed separate 2-way 2 × 4
repeated measures ANOVAs on Nc and Pd with factors of
attention (attended vs. unattended) and region (left, right,
frontal, and central). In the case of Nc, we found a significant
effect of attention (F(1,16) = 20.28, P < 0.001) as well as
a significant attention × region interaction (F(3,48) = 3.44, P =
0.024). In the case of Pd, the effect of attention was not
significant (P > 0.05), although there was an attention × region
interaction (F(3,48) = 7.41, P < 0.001). In order to ascertain the
regions driving these interactions, post hoc t-tests were carried
out and identified the Nc effect to be driven by attention in left
(t(16) = 5.21, P < 0.001), right (t(16) = 4.1, P = 0.001), central (t(16) =
4.6, P < 0.001), and frontal (t(16) = 2.4, P = 0.027) regions. The Pd
interaction was driven by an effect in the frontal region (t(16) =
2.4, P = 0.027). Employing Bonferroni correction, however,
resulted in a significant Nc effect only in left, right, and central
regions but no Pd effect. This suggests that the Pd effect
indicated by the running t-test performed on the GFP is
marginal, whereas the Nc effect is robust. This is further
indicated by an attention × component interaction in the initial
4-way ANOVA that approached significance (F(1,16) = 3.138, P =
0.096).
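The effect of the Bonferroni correction on the Nc post hoc tests can be checked with a short worked example. It assumes a family-wise alpha of 0.05 corrected across the 4 regions, which the text does not state explicitly, and it enters "P < 0.001" as the upper bound 0.001. The function name is hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the labels whose p value is below alpha divided by the
    number of comparisons."""
    threshold = alpha / len(p_values)
    return [label for label, p in p_values.items() if p < threshold]


# Nc post hoc p values as reported; "P < 0.001" is entered as its
# upper bound, 0.001, which is conservative for this check.
nc_p = {"left": 0.001, "right": 0.001, "central": 0.001, "frontal": 0.027}
nc_survivors = bonferroni_significant(nc_p)
```

With 4 comparisons the corrected threshold is 0.0125, so the frontal effect (P = 0.027) no longer counts as significant while the left, right, and central effects remain, matching the outcome reported above. The frontal Pd effect has the same P value and so fails the same threshold.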
Just because both left and right regions are similarly affected
by attention does not rule out the possibility that responses
may be biased to one hemisphere over the other. Since it has
been suggested that spatial processing may be lateralized
(Zatorre and Penhune 2001; Spierer et al. 2009), we sought to
investigate whether this was the case here. The initial 4-way
ANOVA resulted in a stimulus × region interaction (F(3,48) =
10.02, P < 0.001). This allowed us to perform post hoc t-tests
on the LOW-LEFT and HIGH-RIGHT responses to inspect the
regional differences driving this interaction. We found no
difference between the right and left regions for either
the responses with the higher signal-to-noise ratio (i.e., the
responses to the LOW-LEFT stimulus) did not result in
statistically significant topographical differences, combined
with the fact that the Nc component in all stimulus conditions
is well explained by sources located in AC, suggests that the Nc
modulation results from an enhancement of the obligatory
sensory activity in AC and not the engagement of supplementary nonobligatory activity.
Discussion
In this study, we used continuous and simultaneous competing
stimuli to assess endogenous auditory attention in a cocktail
party-like environment. The use of continuous stimuli has
eliminated possible confounding effects associated with
exogenous attention and discrete stimulation. We found
a strong attentional effect on the Nc component in left and
right hemispheres as well as central areas. Since the AESPA
response primarily represents obligatory sensory processing
and since Nc generators have been localized to AC, we have
shown that sensory processing in AC is modulated by
endogenous top-down attention.
Our results are at odds with Näätänen’s attentional trace
hypothesis (Näätänen 1982), which suggests that obligatory
sensory components are not affected by attention. He proposes
that attention acts by way of an endogenous processing
negativity (PN), which is due to a comparison process between
the neural representation of the stimulus and the relevant
attentional trace. This PN overlaps and is superimposed on true
sensory processes giving the impression of modulation of
sensory activity in many cases. Näätänen et al. (1992) do
concede that in some instances simultaneous effects on
sensory activity cannot be ruled out entirely due to the
inability of current methods to distinguish between obligatory
sensory activity and overlapping voluntary activity. They remain
skeptical, however, of the involvement of sensory processes in
attention effects (Näätänen et al. 1992). Although we have
isolated an endogenous attention effect on obligatory sensory
processes, due to the nature of AESPA responses we are
precluded from investigating the undeniably important voluntary components, such as the PN, which are not well
synchronized to stimulus fluctuations. Our results do agree
with Hillyard et al. (1973), however, who suggest that sensory
processes are affected by attention. The results also agree with
the findings of Woldorff et al. (1993), who found attentional
modulation of activity localized to AC in the ranges 20--50 ms
and 80--130 ms and argue that these effects are due to
attentional modulation of sensory processes.
Recently, evidence has been emerging that the majority of
sound feature processing is achieved subcortically and that AC
represents sounds in terms of auditory objects (Nelken 2004).
Furthermore, it has been suggested that AC is involved in
sensory memory (Näätänen and Winkler 1999; Näätänen 2001;
Ulanovsky et al. 2003). A suggested mechanism for sensory
memory is stimulus-specific adaptation (SSA; Ulanovsky et al.
2003), which has been posited as a possible neuronal correlate
both of the decreased N1 to repeated stimuli and of the
mismatch negativity. SSA is the process by which neurons
decrease their responses to sequences of identical stimuli, i.e.,
the activity of neurons is affected by stimulus history.
Furthermore, activity of sustained responses has been shown
to be significantly affected by SSA (Ulanovsky et al. 2003) and
thus, due to the continuous nature of the AESPA stimulus, it is
likely that our responses primarily represent activity of the
subset of neurons that are least susceptible to SSA. That is to say
those cells most susceptible to SSA would contribute minimal
activity in response to a continuous stimulus, whereas those
least susceptible to SSA, that is, the neurons least involved in
sensory memory and most involved in feature processing,
would contribute most to the response. This would suggest
that the attention effect found here is due to enhanced feature
processing and not related to sensory memory representation.
Also, the fact that SSA has been shown to be more prominent
for sustained responses than for onset responses of neurons in
primary AC (Ulanovsky et al. 2003) may account for the smaller
size of the Nc components relative to the AEP-N1 component
(Lalor et al. 2009).
Although we only found attentional effects around 136 ms, it
is possible that earlier AESPA components, especially the Nb
and Pc components, may be affected by attention, but these
components are somewhat ill defined. This may be due to the
frequency content of the carrier stimuli: Previously, these
components were seen to be ill defined when a 1 kHz tone was
used as the carrier stimulus as opposed to broadband noise
(Lalor et al. 2009). A wider carrier stimulus bandwidth may
allow for a more detailed investigation of these components.
The lack of an early effect on well-defined components such as
Pa may be due to the nature of the task employed here. It has
been shown that the locus of attention is flexible and varies
depending on the processing stage most heavily loaded by the
task in question (Vogel et al. 2005; Kelly et al. 2008). We
employed a difficult event discrimination task (i.e., discrimination between targets and distracters), which is likely to load
later processes as opposed to simpler frequency deviant
identification tasks (e.g., Woldorff et al. 1993).
Recent efforts at assessing attention to simultaneously
presented stimuli have looked at transient responses to the
onset of amplitude-modulated sounds (Ross et al. 2010).
Attentional modulation was found as early as 143 ms and was
localized to AC. However, whether this is due to increased
sensory activity or a separate endogenous effect is not clear.
Furthermore, whether this effect has been diluted by unaffected or inhibited responses to the simultaneous unattended
stimulus is unclear. Thus, not only has employing the AESPA
method allowed for the investigation of endogenous attention
effects on obligatory sensory processes that are unaffected by
simultaneous voluntary components, it has also allowed for the
isolation of truly separate responses to concurrent stimuli. That
said, however, we can only assert that sensory processes are
affected by endogenous attention and cannot investigate the
effects that endogenous attention undoubtedly has on nonsensory
cognitive processes.
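To illustrate how separate responses to concurrent stimuli can in principle be isolated, the following is a minimal sketch and not the analysis pipeline used in this study. It assumes that each stimulus envelope drives the recorded signal through its own linear impulse response, and that ordinary least squares on lagged copies of both envelopes is an acceptable stand-in for the estimation step. All function names, the simulated envelopes, the kernel shape, and the noise level are hypothetical.

```python
import numpy as np


def lagged_design(envelope, n_lags):
    """Build an (n_samples, n_lags) matrix; column k is the envelope delayed by k samples."""
    n = len(envelope)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = envelope[:n - k]
    return X


def estimate_responses(env_a, env_b, eeg, n_lags):
    """Jointly fit one impulse response per stimulus by ordinary least squares."""
    X = np.hstack([lagged_design(env_a, n_lags), lagged_design(env_b, n_lags)])
    w, *_ = np.linalg.lstsq(X, eeg, rcond=None)
    return w[:n_lags], w[n_lags:]


def simulate(seed=0, n_samples=20000, n_lags=40, noise_sd=0.5):
    """Simulate one channel driven by two concurrent, independent envelopes (toy data)."""
    rng = np.random.default_rng(seed)
    env_a = rng.standard_normal(n_samples)  # hypothetical "attended" envelope
    env_b = rng.standard_normal(n_samples)  # hypothetical "unattended" envelope
    t = np.arange(n_lags)
    shape = np.exp(-0.5 * ((t - 17) / 4.0) ** 2)  # arbitrary negative-going bump
    h_a = -1.0 * shape  # assumed larger response for the attended stream
    h_b = -0.5 * shape  # assumed smaller response for the unattended stream
    eeg = lagged_design(env_a, n_lags) @ h_a + lagged_design(env_b, n_lags) @ h_b
    eeg = eeg + noise_sd * rng.standard_normal(n_samples)
    return env_a, env_b, eeg, h_a, h_b


if __name__ == "__main__":
    env_a, env_b, eeg, h_a, h_b = simulate()
    est_a, est_b = estimate_responses(env_a, env_b, eeg, n_lags=len(h_a))
    print("max abs error, stream A:", np.max(np.abs(est_a - h_a)))
    print("max abs error, stream B:", np.max(np.abs(est_b - h_b)))
```

Because the two simulated envelopes are statistically independent, the joint fit attributes each portion of the signal to the stimulus that produced it, which is the property that lets responses to simultaneous streams be compared directly in this toy setting.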
A recent attempt was made to examine attention in
a competing stimulus environment using an intricate paradigm
(Bidet-Caulet et al. 2007). The authors of that study assessed
both transient responses and SSRs from depth electrode
recordings in patients with epilepsy. While the results from
this study were encouraging, they exhibited a number of
inconsistencies: Responses to certain stimuli were enhanced
when attended in some conditions and reduced when attended
in other conditions, and no attention effects were found in
a significant number of subjects. Furthermore, the use of onsets
and SSRs precluded the assessment of the timing of attentional
enhancement during sustained attention. This study also
posited a dominant role for left hemisphere in attentional
selection with right hemisphere being inhibited as a function of
attentional load. This is at odds with our results that show
voluntary attentional enhancement effects over both left and
right hemispheres and no lateralization of responses. However,
further inspection of their results reveals attentional enhancement
in right hemisphere for a number of attentional
conditions, suggesting that it may be premature to draw strong
conclusions about the hemispheric specialization of auditory
attention in sensory areas.
There is much debate relating to lateralization of attentional
and spatial processing, with some studies favoring a right
hemisphere dominance (Tanaka et al. 1999; Spierer et al.
2009), others a left lateralization (Pinek et al. 1989), others
a bias to the hemisphere contralateral to the stimulus (Zatorre
et al. 1995), and others whole field neglect following unilateral
lesions, that is, no difference between left and right lesion
subjects (Zatorre and Penhune 2001). Indeed, in their study of
patients with unilateral temporal lobe excisions either
encroaching on or sparing HG, which encountered mixed
lateralization results, Zatorre and Penhune (2001) summed up the
observed variation succinctly: “the existence of individual
differences likely illustrates differential patterns of functional
lateralization.” Thus, the possibility of a predominant lateralization
of spatial processing in auditory cortex is still an open
question. Our results suggest that intensity processing is not
affected by the spatial properties of a stimulus. This does not rule
out the possibility of the involvement of spatially sensitive
centers, not represented in the intensity processing characterized
by the AESPA responses, which may be lateralized:
Frequency-specific interaural time differences, interaural level
differences as well as monaural amplitude spectrum cues are
thought to integrate in a nonlinear fashion to create a map of
space represented by location-specific neurons (Cohen and
Knudsen 1999). Thus, since the activity of these location-specific
neurons is not linearly related to the intensity of the stimulus,
it is unlikely to be accounted for in the AESPA response.
This would further back up our assertion that the AESPA
response is due to the activity of intensity processing neurons
only (Lalor et al. 2009). In light of this, the current results,
although enlightening as to the effect of endogenous attention
on intensity processing, are not directly comparable with
previous studies on the lateralization of spatial processing.
Furthermore, there has been much interest in the purported
separation of “what” and “where” streams in the auditory system
(Rauschecker and Tian 2000; Tian et al. 2001; Ahveninen et al.
2006). Given that amplitude modulation is the driving feature of
the stimulus employed here (i.e., a likely “what” feature), it may be
that the “what” stream is preferentially driven by the current
implementation of the AESPA method and thus activity of the
location-relevant “where” stream is not represented. Implementation
of a paradigm whereby the intensity of a stimulus is kept
constant but the location of the stimulus is modulated would
shed further light on this possibility.
Funding
Irish Research Council for Science, Engineering and Technology.
Notes
We thank Dr Simon P. Kelly for useful comments on the manuscript and
Dr Robert Whelan for assistance with the statistical analysis. Conflict of
Interest: None declared.
References
Ahveninen J, Jääskeläinen IP, Raij T, Bonmassar G, Devore S,
Hämäläinen M, Levänen S, Lin F-H, Sams M, Shinn-Cunningham BG,
et al. 2006. Task-modulated “what” and “where”
pathways in human auditory cortex. Proc Natl Acad Sci.
103:14608--14613.
Alho K, Teder W, Lavikainen J, Näätänen R. 1994. Strongly focused
attention and auditory event-related potentials. Biol Psychiatry.
38:73--90.
Bidet-Caulet A, Fischer C, Besle J, Aguera PE, Giard MH, Bertrand O.
2007. Effects of selective attention on the electrophysiological
representation of concurrent sounds in the human auditory cortex.
J Neurosci. 27:9252--9261.
Cohen YE, Knudsen EI. 1999. Maps versus clusters: different
representations of auditory space in the midbrain and forebrain. Trends
Neurosci. 22:128--135.
De Sanctis P, Ritter W, Molholm S, Kelly SP, Foxe JJ. 2008. Auditory
scene analysis: the interaction of stimulation rate and frequency
separation on pre-attentive grouping. Eur J Neurosci.
27:1271--1276.
Escera C, Alho K, Schröger E, Winkler I. 2000. Involuntary attention and
distractibility as evaluated with event-related potentials. Audiol
Neurootol. 5:151--166.
Hillyard SA, Hink RF, Schwent VL, Picton TW. 1973. Electrical signs of
selective attention in the human brain. Science. 182:177--180.
Hopfinger JB, West VM. 2006. Interactions between endogenous and
exogenous attention on cortical visual processing. Neuroimage.
31:774--789.
Jonides J. 1981. Voluntary versus automatic control over the mind’s eye
movement. Atten Perform. 9:187--203.
Kelly SP, Gomez-Ramirez M, Foxe JJ. 2008. Spatial attention modulates
initial afferent activity in human primary visual cortex. Cereb
Cortex. 18:2629--2636.
Kerlin JR, Shahin AJ, Miller LM. 2010. Attentional gain control of
ongoing cortical speech representations in a “Cocktail Party”.
J Neurosci. 30:620--628.
Lalor EC, Foxe JJ. 2010. Neural responses to uninterrupted natural
speech can be extracted with precise temporal resolution. Eur J
Neurosci. 31:189--193.
Lalor EC, Power AJ, Reilly RB, Foxe JJ. 2009. Resolving precise temporal
processing properties of the auditory system using continuous
stimuli. J Neurophysiol. 102:349--359.
Lehmann D, Skrandies W. 1980. Reference-free identification of
components of checkerboard-evoked multichannel potential fields.
Electroencephalogr Clin Neurophysiol. 48:609--621.
Linden RD, Picton TW, Hamel G, Campbell KB. 1987. Human auditory
steady-state evoked potentials during selective attention.
Electroencephalogr Clin Neurophysiol. 66:145--159.
Murray MM, Brunet D, Michel CM. 2008. Topographic ERP analyses:
a step-by-step tutorial review. Brain Topogr. 20:249--264.
Näätänen R. 1982. Processing negativity: an evoked-potential reflection
of selective attention. Psychol Bull. 92:605--640.
Näätänen R. 2001. “Primitive Intelligence” in the auditory cortex.
Trends Neurosci. 24:283--288.
Näätänen R, Picton T. 1987. The N1 wave of human electric and
magnetic response to sound: a review and analysis of component
structure. Psychophysiology. 24:375--425.
Näätänen R, Teder W, Alho K, Lavikainen J. 1992. Auditory attention and
selective input modulation: a topographical ERP study. Neuroreport.
3:493--496.
Näätänen R, Winkler I. 1999. The concept of auditory stimulus
representation in cognitive neuroscience. Psychol Bull.
125:826--859.
Nelken I. 2004. Processing of complex stimuli and natural scenes in the
auditory cortex. Curr Opin Neurobiol. 14:474--480.
Pinek B, Duhamel JR, Cavé C, Brouchon M. 1989. Audio-spatial deficits
in human: differential effects associated with left versus right
hemisphere parietal damage. Cortex. 25:175--186.
Rauschecker JP, Tian B. 2000. Mechanisms and streams for processing
of “what” and “where” in auditory cortex. Proc Natl Acad Sci U S A.
97:11800--11806.
Ritter W, De Sanctis P, Molholm S, Javitt DC, Foxe JJ. 2006.
Preattentively grouped tones do not elicit MMN with respect to
each other. Psychophysiology. 43:423--430.
Romani GL, Williamson SJ, Kaufman L. 1982. Tonotopic organization of
the human auditory cortex. Science. 216:1339--1340.
Ross B, Hillyard SA, Picton TW. 2010. Temporal dynamics of
selective attention during dichotic listening. Cereb Cortex.
20:1360--1371.
Ross B, Picton TW, Herdman AT, Hillyard SA, Pantev C. 2004. The effect
of attention on the auditory steady-state response. Neurol Clin
Neurophysiol. 22:1--4.
Spierer L, Bellmann-Thiran A, Maeder P, Murray MM, Clarke S. 2009.
Hemispheric competence for auditory spatial representation. Brain.
132:1953--1966.
Sussman E, Ritter W, Vaughan HG, Jr. 1999. An investigation of the
auditory streaming effect using event-related brain potentials.
Psychophysiology. 36:22--34.
Tanaka H, Hachisuka K, Ogata H. 1999. Sound lateralisation in patients
with left or right cerebral hemispheric lesions: relation with
unilateral visuospatial neglect. J Neurol Neurosurg Psychiatry.
67:481--486.
Tian B, Reser D, Durham A, Kustov A, Rauschecker JP. 2001. Functional
specialization in rhesus monkey auditory cortex. Science.
292:290--293.
Ulanovsky N, Las L, Nelken I. 2003. Processing of low-probability
sounds by cortical neurons. Nat Neurosci. 6:391--398.
Vogel EK, Woodman GF, Luck SJ. 2005. Pushing around the locus of
selection: evidence for the flexible-selection hypothesis. J Cogn
Neurosci. 17:1907--1922.
Woldorff M, Gallen CC, Hampson SA, Hillyard SA, Pantev C, Sobel D,
Bloom FE. 1993. Modulation of early sensory processing in human
auditory cortex during selective attention. Proc Natl Acad Sci U S A.
90:8722--8726.
Zatorre RJ, Penhune VB. 2001. Spatial localization after excision of
human auditory cortex. J Neurosci. 21:6321--6328.
Zatorre RJ, Ptito A, Villemure JG. 1995. Preserved auditory spatial
localization following cerebral hemispherectomy. Brain.
118:879--889.