Resource
https://doi.org/10.1038/s41593-021-00962-x
A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence
Emily J. Allen 1,2, Ghislain St-Yves3,17, Yihan Wu4, Jesse L. Breedlove3,18, Jacob S. Prince5,19,
Logan T. Dowdle 6,7, Matthias Nau 8, Brad Caron9,10, Franco Pestilli 11,12,13, Ian Charest14,15,
J. Benjamin Hutchinson16, Thomas Naselaris3,17,20 and Kendrick Kay 1,20 ✉
Extensive sampling of neural activity during rich cognitive phenomena is critical for robust understanding of brain function. Here we present the Natural Scenes Dataset (NSD), in which high-resolution functional magnetic resonance imaging responses to tens of thousands of richly annotated natural scenes were measured while participants performed a continuous recognition task. To optimize data quality, we developed and applied novel estimation and denoising techniques. Simple visual inspections of the NSD data reveal clear representational transformations along the ventral visual pathway. Further exemplifying the inferential power of the dataset, we used NSD to build and train deep neural network models that predict brain activity more accurately than state-of-the-art models from computer vision. NSD also includes substantial resting-state and diffusion data, enabling network neuroscience perspectives to constrain and enhance models of perception and memory. Given its unprecedented scale, quality and breadth, NSD opens new avenues of inquiry in cognitive neuroscience and artificial intelligence.
Neuroscience has an insatiable appetite for data. Many ongoing efforts to extensively sample brain activity1–3 and structure4–6 are motivated, in part, by the availability of new computational methods that make analysis of massive datasets feasible. Equally as important is the growing desire to understand how the brain coordinates complex sensory and motor behaviors and the realization that the neural networks supporting such behaviors span multiple scales, from single neurons to local circuits to whole systems. Understanding massive, complex networks will inevitably require commensurately massive amounts of data.

The need for massive data is especially acute in visual neuroscience, which is a model system for understanding brain function. The network that mediates our ability to flexibly and efficiently perceive the visual world occupies approximately one-third of human cerebral cortex7 and interconnects brain areas with profoundly different functional properties8. This network both encodes visual stimuli and interfaces visual representations into a cognitive context, including information about what one has already seen9, might see10 or is selectively attending11. Understanding vision thus means interrogating a high-dimensional, context-dependent neural network.

Given these considerations, it is clear that extensive experimental data providing access to whole-brain responses to complex stimuli are critical in the quest to understand the human visual system. The ideal dataset should include naturalistic stimuli: the visual system is distributed widely across the brain, and natural scenes, in addition to being ecologically relevant, are effective activators of the entire system12. Moreover, the ideal dataset should be large: to take full advantage of powerful data analysis and machine learning (ML) techniques that have recently become available, we need considerably more data than are currently available. How much? Modern ML methods used in computer vision to process natural scenes (for example, deep convolutional neural networks (CNNs)) require tens to hundreds of thousands of image samples for training13,14. A dataset that sampled brain activity at these scales would raise the exciting possibility of exploiting these methods to develop better models of how the brain processes natural scenes15–20 and would accelerate efforts to bridge cognitive neuroscience and artificial intelligence21.

In this paper, we present a dataset that achieves sampling at this ambitious scale. The NSD consists of high-resolution (1.8-mm) whole-brain 7T functional magnetic resonance imaging (fMRI) of eight carefully screened human participants who each viewed 9,000–10,000 color natural scenes (22,000–30,000 trials) during 30–40 scan sessions distributed over the course of 1 year. Aggregated across participants, NSD includes responses to 70,566 distinct natural scene images—this is more than an order of magnitude larger than similar datasets involving fMRI sampling of many images22–24. Moreover, as we show, the high quality of the NSD dataset makes it possible to leverage the full power of modern ML methods for developing better models of visual representation. Achieving high data quality was afforded, in part, by the use of ultra-high magnetic field strength (7T) to improve signal-to-noise ratio (SNR) over what is attained at lower field strengths25.
1Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, MN, USA. 2Department of Psychology, University of Minnesota, Minneapolis, MN, USA. 3Department of Neuroscience, Medical University of South Carolina, Charleston, SC, USA. 4Graduate Program in Cognitive Science, University of Minnesota, Minneapolis, MN, USA. 5Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA. 6Department of Neuroscience, Center for Magnetic Resonance Research (CMRR), University of Minnesota, Minneapolis, MN, USA. 7Department of Neurosurgery, Center for Magnetic Resonance Research (CMRR), University of Minnesota, Minneapolis, MN, USA. 8National Institute of Mental Health (NIMH), Bethesda, MD, USA. 9Program in Neuroscience, Indiana University, Bloomington, IN, USA. 10Program in Vision Science, Indiana University, Bloomington, IN, USA. 11Department of Psychology, University of Texas at Austin, Austin, TX, USA. 12Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA. 13Institute for Neuroscience, University of Texas at Austin, Austin, TX, USA. 14Center for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, UK. 15cerebrUM, Département de Psychologie, Université de Montréal, Montréal, QC, Canada. 16Department of Psychology, University of Oregon, Eugene, OR, USA. 17Present address: Department of Neuroscience, University of Minnesota, Minneapolis, MN, USA. 18Present address: Department of Psychology, University of Minnesota, Minneapolis, MN, USA. 19Present address: Department of Psychology, Harvard University, Cambridge, MA, USA. 20These authors jointly supervised this work: Thomas Naselaris, Kendrick Kay. ✉e-mail: kay@umn.edu
NSD incorporates several innovations in addition to its unprecedented scale and quality. To reconcile extensive sampling with a practical time commitment, we used an aggressive rapid event-related design. This drove the development of new analysis techniques that accurately compensate for the overlap of hemodynamic responses across successive trials. To ensure participant engagement and control cognitive state, we incorporated a continuous recognition task26 in which participants were instructed to indicate whether they have seen each presented image at any point in the past. In addition to making the experiment tolerable (and even somewhat interesting) for participants, the inclusion of this task makes the NSD, to our knowledge, the longest-term continuous recognition memory fMRI study in history and, thus, a likely source of new insights into long-term memory formation and the cognitive context of vision. Finally, to ensure the broad reach of the NSD dataset, we incorporated design input from a large network of collaborators with diverse scientific interests (for example, low-level vision, high-level vision, memory, connectivity and neuroanatomy) and technical expertise (for example, mapping, multivariate pattern analysis, encoding models, representational similarity analysis and neural network modeling). This input helped precipitate a carefully curated dataset with extensive auxiliary measures.

This paper provides a comprehensive description of the design, acquisition and preparation of the NSD dataset. In particular, we detail the state-of-the-art acquisition and analysis methods that we developed for the dataset and present comprehensive assessments that evidence the high quality of the data. We also present initial analyses of the NSD dataset, demonstrating the feasibility of using data-driven analyses to reveal insights into vision and memory. We expect that the NSD will serve as a valuable resource with widespread application in neuroscience and its intersection with artificial intelligence.
Results

Sampling thousands of images during continuous recognition. We obtained 73,000 color natural scenes from the richly annotated Microsoft Common Objects in Context (COCO) image dataset14, a dataset that is heavily used in the computer vision and ML communities. Our experimental design specified that each of eight participants would view 10,000 distinct images, and a special set of 1,000 images would be shared across participants (eight participants × 9,000 unique images + 1,000 shared images = 73,000 images). This sampling strategy was chosen to maximize the number of distinct images in the NSD while also facilitating investigations of similarities and differences in brain representations across individuals27. Each image would be presented three times to a given participant. Although this is a low number, we reasoned that three trials would be sufficient to produce robust responses given our use of ultra-high field (7T) fMRI. Furthermore, images would be presented using a rapid event-related design consisting of 4-s trials (Fig. 1a). This was done to maximize statistical power and to create an engaging experience for the participants. In addition, the continuous nature of task engagement—in contrast to slow event-related designs and block designs where engagement is likely to fluctuate—helps avoid unwanted respiratory variations28 and arousal-related confounds29.

The NSD experiment was split across 40 scan sessions for each participant (Fig. 1b). To control cognitive state and encourage deep processing of the images, participants were instructed to perform a continuous recognition task in which they reported whether the current image had been presented at any previous point in the experiment. We controlled the distributions of image presentations such that both short-term and long-term repetitions were probed (Extended Data Fig. 1a). Parameters were selected such that, even in the first scan session, images were not always new, and, even in the last scan session, images were not always old (Extended Data Fig. 1b).
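To make the design arithmetic above concrete, the following sketch reproduces the stimulus and trial counts using only numbers stated in this section and in the Fig. 1 legend (the per-session count of 750 stimulus trials comes from Fig. 1b):

```python
# NSD stimulus-allocation and trial-count arithmetic (numbers from the text/Fig. 1).
n_participants = 8
unique_per_participant = 9_000          # images seen by only one participant
shared_images = 1_000                   # special set shared across all participants

total_images = n_participants * unique_per_participant + shared_images
assert total_images == 73_000           # distinct images prepared for the NSD

images_per_participant = unique_per_participant + shared_images   # 10,000
repeats_per_image = 3                   # each image presented three times
trials_per_participant = images_per_participant * repeats_per_image  # 30,000

trials_per_session = 750                # 12 runs per session (Fig. 1b)
print(trials_per_participant / trials_per_session)   # 40.0 -> the planned 40 sessions
```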
Neuroimaging data collection on carefully selected participants. All fMRI data in the NSD were collected at 7T using a whole-brain, 1.8-mm, 1.6-s, gradient-echo, echo-planar imaging (EPI) pulse sequence. After verbally screening several potential participants with respect to basic eligibility criteria, we recruited 14 individuals to participate in an initial 7T fMRI screening session that involved population receptive field (pRF)30 and category functional localizer (fLoc)31 experiments. Based on data from this scan session, we ranked the 14 participants with respect to data quality. Specifically, we quantified BOLD variance explained in the pRF and fLoc experiments, behavioral performance in the pRF and fLoc experiments and two metrics of head motion, normalized these six measures and then averaged the measures (for details, see ‘Rankings from the 7T fMRI screening session’ in the Methods). We then invited the top eight individuals to participate in the full NSD experiment (all individuals accepted). This selection process was conducted to ensure the best possible data quality for the NSD. Analyses conducted after completion of the NSD experiment confirm that the ranking procedure successfully identified individuals who yield high-quality data and that data quality would have suffered substantially had we omitted the selection process (Fig. 2c).
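The exact normalization is specified in the Methods (‘Rankings from the 7T fMRI screening session’); purely for illustration, a minimal sketch of one way to implement ‘normalize the six measures and average’ is to z-score each measure across participants (the variable names and sign conventions here are assumptions of this sketch):

```python
import numpy as np

# measures: rows = 14 screened individuals; columns = six quality measures
# (pRF/fLoc variance explained, pRF/fLoc behavioral performance and two
# head-motion metrics), signed so that larger values indicate better quality.
rng = np.random.default_rng(0)
measures = rng.normal(size=(14, 6))     # placeholder data for illustration

z = (measures - measures.mean(axis=0)) / measures.std(axis=0)  # normalize
score = z.mean(axis=1)                  # average the six normalized measures
ranking = np.argsort(-score)            # best data quality first
top8 = ranking[:8]                      # invited to the full NSD experiment
```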
Data were collected from the eight NSD participants over the course of 1 year (Fig. 1c). Participants consistently engaged with the task: the average response rate across scan sessions was above 99% for all participants, and the response rate never dropped below 96% in any single scan session. Moreover, all participants exhibited successful recognition performance (Fig. 1d), issuing ‘old’ responses at a higher rate for previously presented images (blue and orange lines) than for novel images (yellow lines). The full NSD dataset includes a variety of anatomical neuroimaging measures (including T1, T2, diffusion, venogram and angiogram), functional neuroimaging measures (including the pRF and fLoc experiments, the NSD experiment, resting-state data and two additional experiments involving synthetic stimuli and visual imagery) and behavioral measures (Fig. 2a,b). In some fMRI sessions, physiological data (ten sessions per participant) and eye-tracking data (2–4 sessions per participant) were also collected. Analysis of the eye-tracking data indicates that participants were able to successfully maintain central fixation most of the time, with some variability in fixation performance across participants (Extended Data Fig. 4). Regarding the core NSD experiment, we completed the full set of 40 NSD scan sessions for four of the participants, but, owing to unforeseen summer absences and scheduled decommissioning of the 7T scanner, we completed 30–32 NSD scan sessions for each of the other participants. A full breakdown of data collection and analysis procedures is provided in Extended Data Figs. 2 and 3.

Stable high-resolution imaging across scan sessions. In our experience, although visual inspection is non-quantitative and somewhat subjective, it is still the most effective way to assess many common aspects of fMRI pre-processing32. Accordingly, we generated a comprehensive set of visualizations that detail the excellent quality of the raw and pre-processed NSD data.
[Figure 1 graphic. Panels: a, trial design (richly annotated images subtending 8.4°, central fixation dot, 3-s stimulus presentations and 1-s gaps; task: “Have you seen this image before?”); b, one NSD scan session (fieldmaps plus 12 NSD runs, 750 stimulus trials total, with occasional blank trials); c, timeline of NSD sessions (with and without resting-state) and additional sessions for participants 1–8 across days; d, percentage of ‘old’ responses across scan sessions for each participant, for easy trials (image is a repeat within the same scan session), hard trials (image is a repeat from a previous scan session) and novel trials (image is not a repeat).]
Fig. 1 | Design of the NSD experiment. a, Trial design. While maintaining central fixation, participants viewed sequences of color natural scenes and
judged whether each image had been previously shown at any point in the past. The scenes, taken from Microsoft’s COCO14, are richly annotated with
object information (as depicted). b, Run and session design. Each run lasted 5 min and consisted of 62 or 63 stimulus trials with occasional interspersed
blank trials. Each scan session consisted of 12 runs (750 stimulus trials). c, Timeline of 7T fMRI scan sessions. Each individual participated in an
initial screening session (prffloc), 30–40 NSD core sessions and two final sessions (nsdsynthetic and nsdimagery). The first NSD core session
corresponds to day 0. d, Behavioral performance. For each of three trial types, we quantify the percentage of trials on which the participant indicated
an ‘old’ response.
These include detailed inspections of raw time series data to confirm the presence of stimulus-evoked signals (Supplementary Fig. 3); movies that assess the co-registration of the different imaging modalities (for example, T1, T2 and EPI; Supplementary Video 1); movies that assess the manually edited cortical surface reconstructions generated using FreeSurfer (Supplementary Video 2); movies that assess the registration of the NSD participants to the fsaverage (Supplementary Video 3) and MNI (Supplementary Video 4) group spaces; movies that inspect raw and pre-processed EPI volumes (Supplementary Video 5); and movies that provide volume and surface visualizations of the stability of mean EPI intensity across sessions (Supplementary Videos 6 and 7 and Supplementary Fig. 4) and the stability of BOLD responses across sessions (Supplementary Videos 8 and 9). All movies are readily viewable online (https://osf.io/zyb3t/). The visualizations—in particular, Supplementary Video 9—indicate that the quality of the NSD data enables precision functional mapping33: activity patterns are fine-scale and highly reliable within individual participants, and these patterns are distinct across participants.
In addition to visual inspection, quantitative data quality metrics were computed for each NSD scan session. This was in fact done on a rolling basis as the data were acquired, allowing us to monitor data quality and provide performance bonuses to the participants. Inspecting the metrics, we see that temporal signal-to-noise ratio (tSNR) is stable across scan sessions for each participant (Fig. 2d, left). One participant, participant 8, exhibited low tSNR compared to the other participants; this can be attributed to higher levels of head motion for this participant (Fig. 2d, middle). We also observe that BOLD responses (quantified as median variance explained across voxels and runs by a simple ON–OFF general linear model (GLM)) are stable across scan sessions for each participant, although there is substantial variation in the strength of BOLD responses across participants (Fig. 2d, right).
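For reference, tSNR can be understood here in its standard sense (temporal mean of a voxel’s time series divided by its temporal standard deviation); a minimal sketch, assuming a 4-D fMRI array:

```python
import numpy as np

def tsnr(bold):
    """Temporal SNR per voxel: mean over time divided by s.d. over time.

    bold: 4-D array (x, y, z, time) of fMRI image intensities.
    """
    mean = bold.mean(axis=-1)
    std = bold.std(axis=-1)
    return np.where(std > 0, mean / std, 0.0)

# A session-level summary (as plotted in Fig. 2d) could then be,
# for example, the median tSNR across brain voxels.
```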
One feature that we implemented in the pre-processing of the fMRI data was to interpolate the data on a fine temporal grid and a fine spatial grid in the same steps used to correct for slice timing differences and spatial displacements (for example, head motion). This upsampling strategy preserves fine-scale detail that is present in the raw fMRI data due to the temporal jitter of the acquired fMRI volumes relative to the experimental paradigm and the spatial jitter of the acquired fMRI volumes relative to the anatomy of the brain32,34. An illustration of the benefits of upsampling is provided in Extended Data Fig. 5. This example highlights the existence of fine-scale detail in fMRI image intensities (Extended Data Fig. 5b, top row) as well as in BOLD responses extracted from the fMRI data (Extended Data Fig. 5b, bottom row, and Extended Data Fig. 5c). Notably, this fine-scale detail is replicable across different scan sessions (Extended Data Fig. 5c, bottom, and Extended Data Fig. 5d), indicating that the upsampled preparation reveals meaningful detail that is lost under a non-upsampled approach.
[Figure 2 graphic. Panels: a, auxiliary fMRI experiments (pRF, fLoc, resting-state); b, available measures (T1, T2, functional, diffusion, venogram, angiogram, high-resolution T2 of the hippocampus, behavior, physiology, eye-tracking, performance bonus); c, screening results (3T and 7T): noise ceiling (%) versus participant ranking from screening (measurement, linear fit, extrapolation) and fLoc variance explained (%); d, data quality metrics across scan sessions for participants 1–8 (tSNR, head motion as frame-wise displacement in mm, and BOLD response as ON–OFF R² in %), with insets for one sample run.]
Fig. 2 | Overview of acquired data. a, Auxiliary fMRI experiments. Data from the pRF and fLoc experiments were used to define retinotopic visual areas and
category-selective regions, respectively. Resting-state data were collected before and after the NSD runs in a subset of the NSD core sessions (totaling 100
or 180 min per participant). b, Available measures. Examples of the actual data are depicted. c, Participant selection. Data quality from the initial screening
session was used to rank a set of 14 participants. On the right is an illustration of one measure contributing to the ranking—specifically, variance explained
in the fLoc experiment (one slice per participant; identical color range). The inset compares the participant ranking against the b3 noise ceiling calculated
on the full NSD dataset (Fig. 3). A line fit to the eight NSD participants (gold dots) is extrapolated to predict noise ceilings for the individuals who were not
selected for participation in the NSD (red circles). d, Metrics of data quality (for details, see ‘Data quality metrics’ in the Methods). Results for individual
participants (thin colored lines) and the median across participants (thick black line) are shown. The insets show detail on tSNR and head motion for one
sample run (see Supplementary Figs. 1 and 2 for more information).
Extensive auxiliary measures to complement the NSD data. To enrich the fMRI data from the NSD experiment, we collected and prepared a large set of auxiliary measures. These measures include substantial amounts of resting-state data (minimum 100 min per participant), external physiological measures during the resting-state scan sessions, diffusion data and associated derivatives (white-matter tracts and structural connectivity matrices) and an extensive collection of manually defined regions of interest (ROIs), including retinotopic and category-selective areas as well as subregions of the thalamus and medial temporal lobe. Results and discussion of these resources can be found in Supplementary Note 1, Extended Data Figs. 6 and 7 and Supplementary Fig. 5.

Accurate estimation of single-trial fMRI response amplitudes. We performed a GLM analysis of the data from the NSD experiment to help streamline subsequent analyses of the data. The goal of the GLM was to obtain single-trial betas—that is, estimates of the fMRI response amplitude of each voxel to each trial conducted. Given the low SNR of fMRI and the overlap of the hemodynamic response from trial to trial, estimating accurate betas is a challenging endeavor. We thus developed a novel GLM approach consisting of three components. First, we used a library of hemodynamic response functions (HRFs) derived from an initial analysis of the dataset as an efficient and well-regularized method for estimating voxel-specific HRFs (Fig. 3a–c). Second, we adapted the GLMdenoise technique35 to the single-trial GLM framework, thereby enabling the use of data-driven nuisance regressors (Fig. 3d). Third, to address the challenge posed by highly correlated single-trial regressors, we developed an efficient implementation of ridge regression36 and used this to regularize and improve the accuracy of the betas (Fig. 3e).
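The actual implementation uses the fractional reparameterization of ridge regression from ref. 36; purely as a sketch of this third component, ordinary ridge-regularized estimation of single-trial betas for one voxel looks like this (illustrative, not the authors’ exact code):

```python
import numpy as np

def single_trial_betas(X, y, lam):
    """Ridge-regularized single-trial betas for one voxel.

    X   : (n_timepoints, n_trials) design matrix; each column is an
          HRF-convolved indicator for one trial. Columns overlap heavily
          because trials occur every 4 s, which is why regularization helps.
    y   : (n_timepoints,) voxel time series.
    lam : ridge penalty; lam = 0 recovers ordinary least squares.
    """
    n_trials = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_trials), X.T @ y)
```

In the actual pipeline, the HRF used to build X is selected per voxel from the HRF library, nuisance regressors identified by GLMdenoise are included, and the amount of regularization is tuned per voxel (Fig. 3e).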
[Figure 3 graphic. Panels: a,b, distributions of voxel-specific HRFs for individual participants (S1–S8) and the group average, residing on a unit sphere with axes corresponding to three PC time courses (PC1–PC3); c, HRF library time courses (BOLD signal, 0–30 s); d, number of GLMdenoise regressors per participant; e, ridge regression fractions for an example scan session; f, noise ceiling (%) maps on flattened and inflated cortical surfaces with labeled sulci, the nsdgeneral ROI, and regions of EPI signal dropout and cortical surface imperfections; g, median noise ceiling in the nsdgeneral ROI per participant for beta versions b1, b2 and b3.]
Fig. 3 | Improving SNR through novel response estimation and denoising methods. a–c, Library of HRFs. HRFs were estimated within a subspace spanned
by three PCs. Distributions of voxel-specific HRFs are shown for individual participants (a) and the group average (b). These distributions reside on the
unit sphere with coordinate axes corresponding to three PC time courses (b, inset). We defined a series of points on the unit sphere (cyan dots), and the
time courses associated with these points are used as the HRF library (c). d, GLMdenoise. Horizontal lines indicate the average number of GLMdenoise
regressors identified in a scan session (1.8-mm preparation; error bars indicate bootstrapped 68% confidence intervals). e, Ridge regression. Optimal ridge
regression fractions are shown for an example scan session (participant 5, nsd10, 1-mm preparation). f, Noise ceilings for the case where responses are
averaged across three trials. Results from individual participants (nativesurface preparation) were mapped to fsaverage and then averaged. Right inset
shows results thresholded at 15% on the inflated left hemisphere (see also Supplementary Video 10). g, Performance summary. Each bar indicates the
median noise ceiling across vertices in the nsdgeneral ROI. Calc, calcarine sulcus; CGS, cingulate sulcus; CoS, collateral sulcus; CS, central sulcus; IFRS,
inferior frontal sulcus; IPS, intraparietal sulcus; LS, lateral sulcus; OTS, occipitotemporal sulcus; PoCS, post-central sulcus; PrCS, precentral sulcus; SFRS,
superior frontal sulcus; STS, superior temporal sulcus.
To assess the efficacy of these various GLM techniques, we generated three versions of the betas, reflecting increasing sophistication (Extended Data Fig. 8a–c). Beta version 1 (b1) is the result of simply using a canonical HRF for all voxels. Beta version 2 (b2) is the result of fitting an HRF to each voxel using the library-of-HRFs approach. Beta version 3 (b3) uses the library-of-HRFs approach as with b2 but also adds the use of GLMdenoise and ridge regression in an attempt to improve the accuracy of the betas.

We quantified the quality of the different beta versions (b1, b2 and b3) by calculating noise ceilings for individual voxels. The noise ceiling is a measure of trial-to-trial reliability, quantifying the percentage of variance in a voxel’s responses that can be attributed to the stimulus and not to measurement noise (Methods). Surface maps of noise ceiling results reveal locations of reliable responses to the NSD stimuli: high noise ceilings are present in occipital cortex and extend into temporal and parietal cortex (Fig. 3f and Supplementary Video 10).
Notably, the maps reveal very large increases in noise ceilings from b1 to b2 to b3, indicating that the additional GLM techniques incorporated into b2 and b3 improve reliability of responses. Detailed quantifications show that these improvements are highly consistent across voxels and participants (Fig. 3g and Supplementary Fig. 6a) and that noise ceiling estimates are highly reliable (Supplementary Fig. 6b). For b3, the noise ceiling levels in visual cortex are, on average, 36% (calculated by computing the median across the nsdgeneral ROI and then averaging across participants). This means that a typical visual cortex voxel in the NSD dataset has associated with it a set of 10,000 responses (30,000 trials divided by 3 trials per image = 10,000 images), and a large percentage, 36%, of the variance in these 10,000 values is a signal that is, in theory, predictable. Expressed in terms of Pearson’s correlation (r), this is equivalent to a prediction accuracy of r = 0.60. Complementing the noise ceiling analysis, we also performed simple univariate analyses of the NSD betas (Extended Data Fig. 8d,e); these analyses show that the NSD dataset contains high response reliability across trials within a participant as well as high response reliability across participants.
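The precise estimator is given in the Methods; the sketch below illustrates the general logic under a standard signal-plus-noise model (the decomposition used here is an assumption of this illustration), together with the percentage-to-correlation conversion quoted above:

```python
import numpy as np

def noise_ceiling(betas, n_avg=3):
    """Sketch of a noise-ceiling estimate for one voxel.

    betas: (n_images, n_repeats) single-trial betas (3 repeats per image in NSD).
    n_avg: number of trials averaged when forming responses (3 here).
    Returns the percentage of variance attributable to the stimulus.
    """
    n_rep = betas.shape[1]
    noise_var = betas.var(axis=1, ddof=1).mean()       # trial-to-trial variability
    mean_var = betas.mean(axis=1).var(ddof=1)          # variance of trial averages
    signal_var = max(mean_var - noise_var / n_rep, 0)  # remove residual noise
    return 100 * signal_var / (signal_var + noise_var / n_avg)

# Converting a noise ceiling (% variance) into an equivalent Pearson r:
print(np.sqrt(36 / 100))   # 0.60, matching the value quoted in the text
```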
A massive increase in equivalent trials. To put the quality of the NSD data into perspective, we propose the concept of ‘equivalent trials’, which allows comparison of different datasets that vary in SNR and trial distribution (see Methods for details). The next largest data collection effort that is similar in nature to NSD is BOLD5000 (ref. 22). Using the same GLM analysis methods on both NSD and BOLD5000, we found that the SNR per trial is approximately 0.260 for the NSD and 0.187 for BOLD5000. Combining these values with the number of trials conducted in each dataset, we estimate that the total size of the NSD dataset is 213,000 trials × (0.260)² = 14,399 equivalent trials, whereas the total size of BOLD5000 is 18,870 trials × (0.187)² = 660 equivalent trials. Thus, using the metric of equivalent trials, the NSD can be viewed as 14,399/660 = ~22 times as large as the BOLD5000 dataset. This is a massive increase in statistical power. Note that even if we do not take into account the higher SNR per trial in the NSD dataset, the NSD still has substantially more participants (eight versus four), more trials per participant (26,625 versus 4,718, on average) and more hours of fMRI per participant (35.5 versus 13.7, on average) than BOLD5000.
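Because equivalent trials weight each trial by its squared per-trial SNR, the comparison above can be reproduced directly from the quoted numbers:

```python
def equivalent_trials(n_trials, snr_per_trial):
    # Each trial contributes in proportion to its squared per-trial SNR
    # (see Methods for the derivation of this metric).
    return n_trials * snr_per_trial ** 2

nsd = equivalent_trials(213_000, 0.260)       # ~14,399 equivalent trials
bold5000 = equivalent_trials(18_870, 0.187)   # ~660 equivalent trials
print(round(nsd / bold5000))                  # ~22x larger
```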
Successful recovery of retinotopy. Having demonstrated the quality of the NSD data, we now turn to example analyses that illustrate the rich scientific insights that can be derived from the data. As a simple starting example, we fit a voxel-wise pRF model that uses local contrast in the NSD images to account for the NSD betas. This simple model is expected to recover spatial tuning in early visual cortex where responses co-vary with stimulus energy37. Indeed, in all eight participants, high-quality maps of angle and eccentricity estimates are obtained in early visual cortex, and these estimates extend all the way to the fovea (Extended Data Fig. 9 and Supplementary Modeling Note 1). These results provide a check of the validity of the NSD betas. They also show that participants were able to maintain central fixation reliably enough to support detailed mapping of visual space. This finding is consistent with our analysis of the eye-tracking data (Extended Data Fig. 4).
Reliable and long-term recognition memory effects. The use of a continuous recognition task establishes the NSD as one of the largest datasets relevant to human memory. Despite the challenging nature of the task, we found that participants were able to successfully discriminate old images from new images (average d' across participants: 1.28, maximum: 1.47, minimum: 0.94). Furthermore, recognition memory remained above chance even at long time scales between repetitions (Fig. 4a). Specifically, for each session, we calculated a measure of recognition accuracy accounting for guessing (adjusted hit rate: hit rate minus false alarm rate) and binned this measure by the time since last exposure (considering only those trials involving a previously shown image). At the group level, participants exhibited performance levels greater than chance (adjusted hit rate > 0) in all measured intervals, ranging from 1 s to 1 year. At the level of individuals, all participants showed a positive adjusted hit rate in the longest time bin for which data are available for every participant (when binning on a log scale; seven of eight participants when binning on a linear scale). These results indicate that, from its behavioral component alone, NSD is powered to address questions concerning human memory spanning short (seconds) to relatively long (months) time scales.
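Adjusted hit rate is defined above as hit rate minus false alarm rate; d' is the standard signal-detection measure (difference of z-transformed hit and false alarm rates). A minimal sketch of both:

```python
from statistics import NormalDist

def adjusted_hit_rate(hits, misses, false_alarms, correct_rejections):
    """Recognition accuracy corrected for guessing; 0 corresponds to chance."""
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    return hit_rate - fa_rate

def d_prime(hit_rate, fa_rate):
    """Standard signal-detection sensitivity (rates must lie strictly in (0, 1))."""
    z = NormalDist().inv_cdf   # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```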
But what about neural effects? To assess whether recognition effects are present in the fMRI data, we performed two-sample t-tests contrasting NSD betas observed for hits with NSD betas observed for correct rejections (the so-called ‘old/new effect’38). We found highly consistent old/new effects at the level of individual scan sessions (Fig. 4b, top; see also Supplementary Fig. 7). Moreover, these effects occur in expected frontal and parietal regions39 and persist at the group level (Fig. 4b, bottom). The scale and statistical power afforded by the NSD dataset also provide additional insight. Whereas old/new effects are typically studied using group-level analyses, the quality of the NSD dataset reveals highly statistically significant results at the level of individual participants. Indeed, when pooling trials across all NSD scan sessions, several participants exhibited statistically significant activity differentiating hits and correct rejections in nearly the entire cerebral cortex (see results for a representative participant in Fig. 4b, top). Reminiscent of past datasets employing extensive sampling of individuals40, the current results suggest that the extent of cortex engaged by basic memory processes is much more widespread than previously appreciated, although a careful consideration of effect sizes would be important for a full understanding of the effect.
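A sketch of the voxel-wise contrast, assuming single-trial betas have already been grouped by behavioral outcome (array names and sizes are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
betas_hits = rng.normal(size=(500, 10_000))  # (n_hit_trials, n_voxels), placeholder
betas_cr = rng.normal(size=(700, 10_000))    # (n_correct_rejection_trials, n_voxels)

# Two-sample t-test per voxel contrasting 'hits' > 'correct rejections'.
t, p = stats.ttest_ind(betas_hits, betas_cr, axis=0)
significant = np.abs(t) > 3                  # the threshold used in Fig. 4b
```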
Rich stimulus sampling for probing brain representations. The NSD samples a large variety of natural scenes. To gain insight into the breadth of stimulus sampling available, we constructed representational dissimilarity matrices (RDMs) from the NSD betas and performed t-distributed stochastic neighbor embedding41 (t-SNE) to visualize the underlying representations. We computed t-SNE embeddings in different regions along the ventral visual pathway for an example participant (Fig. 5a). These embeddings reflect arrangements of stimuli that are driven by the overall similarity of multi-voxel activity patterns in the brain, independent of their anatomical organization within a given ROI. Visualizing the data in this way reveals intriguing patterns of semantic representation that are clearly visible by eye. For example, by color-coding the resulting embeddings according to animacy attributes (Fig. 5b), we found that, in posterior ventral temporal cortex (pVTC), there is a clear large-scale pattern progressing from images containing people (gray dots, lower left), to images containing animals (red dots, middle), to images containing inanimate objects (blue dots, upper right), whereas the pattern is not present in early visual areas V1, V2 and V3. This aspect of semantic representation is consistent with previous studies42,43.
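A sketch of the RDM construction and embedding, assuming trial-averaged betas for one ROI and using 1 − Pearson correlation as the dissimilarity (a common choice; the specific metric is an assumption of this sketch):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
responses = rng.normal(size=(1_000, 500))   # (n_images, n_voxels), placeholder data

rdm = 1 - np.corrcoef(responses)            # (n_images, n_images) dissimilarities
embedding = TSNE(
    n_components=2, metric="precomputed", init="random"
).fit_transform(rdm)                        # (n_images, 2) coordinates
# Coloring each point by its COCO animacy labels reproduces the style of Fig. 5b.
```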
[Figure 4 graphic. Panels: a, adjusted hit rate versus time between repetitions, binned on a linear scale (0–300 days) and a log scale (1 s to 1 year), per scan session and participant; b, t-value maps (‘hits’ > ‘correct rejections’) on the flattened left hemisphere with labeled sulci, for individual sessions (nsd05, nsd10, nsd15, nsd20), all sessions, and all sessions with shuffled labels, shown for a single participant and averaged across participants.]
Fig. 4 | Reliable and long-term recognition memory effects. a, Behavioral recognition effects. Adjusted hit rate indicates recognition accuracy accounting
for guessing (hit rate minus false alarm rate) and is binned by time between repetitions on a linear scale (left) or a log scale (right). Dashed line indicates
chance performance. Each dot in each bin summarizes relevant trials from one scan session. Black line indicates the mean across participants, with the
ribbon indicating ± 1 s.e.m. b, Neural recognition effects. We performed two-sample t-tests on NSD betas contrasting ‘hits’ > ‘correct rejections’. All
results are shown on a flattened left hemisphere fsaverage surface and thresholded at |t| > 3 (inset shows inflated surface). Tests were performed for trials
taken from individual NSD scan sessions (columns 1–4) as well as for trials pooled across all NSD scan sessions (column 5). In addition, we performed
a control in which trial labels in the pooled analysis were shuffled (column 6). Results for participant 1 (top row) and a simple average of results across
participants (bottom row) are shown. Calc, calcarine sulcus; CGS, cingulate sulcus; CoS, collateral sulcus; CS, central sulcus; IFRS, inferior frontal sulcus;
IPS, intraparietal sulcus; LS, lateral sulcus; OTS, occipitotemporal sulcus; PoCS, post-central sulcus; PrCS, precentral sulcus; SFRS, superior frontal sulcus;
STS, superior temporal sulcus.
Other intriguing patterns are also visible. In anterior ventral temporal cortex (aVTC), the animacy progression is present to some extent, but a different, more clustered representation emerges that presumably reflects more complex categorical and semantic clusters. Indeed, zooming in on small sections of the t-SNE embedding for aVTC reveals that these clusters contain images with relatively homogeneous semantic content (Fig. 5c): the blue cluster is dominated by images of round edible objects, whereas the gray cluster is dominated by images of people interacting with objects. Note that the clustering of semantically related images does not necessarily mean that these representations are truly semantic in the sense of being invariant or independent of visual features; the clustering could be driven by certain visual features that are diagnostic of object categories44. To tease apart these possibilities, additional detailed analyses would be necessary. Overall, these findings show how simple visual inspections of the NSD dataset can be used to generate hypotheses about visual representations in the human brain.
To further characterize brain representations using a quantitative analysis, we calculated how well brain RDMs are captured by a model RDM constructed from category labels in the COCO image dataset. Consistent with the clustering observed in the t-SNE embeddings, we found that categorical structure is pronounced in VTC compared to early visual areas (Fig. 5d). Finally, to assess the utility of the NSD for investigating similarities of brain representations across participants, we isolated images that were common across participants and created a second-order RDM that quantifies the similarity of brain RDMs across ROIs and participants (Fig. 5e). In this second-order RDM, we observed high levels of consistency in each ROI’s representation across participants (red outlines). We also observed distinct representations across ROIs, with the largest distinctions occurring between early visual areas and VTC. One noticeable finding is the existence of strong off-diagonal elements (white arrows); these elements indicate spatial noise correlations that are typical in fMRI and other neural measurement techniques. To counteract these noise correlations, one simple approach is to compare representations across ROIs using data from distinct trials45. To further summarize the second-order RDM, we computed the average correlation of brain RDMs across all ROI pairs, restricting this calculation to distinct participants to avoid the effects of spatial noise correlations (Fig. 5f). We observe that correlations are highest for brain RDMs from the same ROI (for example, a given participant’s V1 RDM is more correlated with other participants’ V1 RDMs compared to other ROIs), confirming consistencies in brain representations across participants (for a complementary univariate analysis of across-participant consistency, see Extended Data Fig. 8d,e).
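The second-order analysis can be summarized as correlating the vectorized upper triangles of first-order RDMs across ROIs and participants; a minimal sketch, following the restriction to distinct participants described above:

```python
import numpy as np

def upper_triangle(rdm):
    """Vectorize an RDM's upper triangle (excluding the diagonal)."""
    return rdm[np.triu_indices_from(rdm, k=1)]

def rdm_correlation(rdm_a, rdm_b):
    """One cell of the second-order RDM: similarity of two brain RDMs
    computed over the images shared across participants."""
    return np.corrcoef(upper_triangle(rdm_a), upper_triangle(rdm_b))[0, 1]

# For Fig. 5f-style summaries, average rdm_correlation over all pairs of
# *different* participants within an ROI, excluding same-participant pairs
# so that shared spatial noise correlations do not inflate the estimate.
```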
A brain-optimized neural network model of the visual system. One of the main motivations for the NSD was to amass sufficient sampling of brain activity to be able to drive data-hungry ML techniques. As an intriguing test case, we specifically investigated whether we could successfully use the scale of the NSD to train, from scratch, a deep CNN to accurately predict brain activity17. Adopting the framework of encoding models46, we took NSD betas from visual areas V1–hV4, divided these data into a training set (used for parameter tuning) and a validation set (used to assess prediction performance) and evaluated how accurately different computational models predict brain responses in the validation set based on the presented image.
[Figure 5 graphic. Panels: a, fsaverage ROIs (V1, V2, V3, pVTC, aVTC); b, t-SNE embeddings for each ROI, color-coded by image content (people, people + animals, people + inanimates, animals, animals + inanimates, inanimates); c, t-SNE embedding for aVTC with actual images depicted; d, correlation of category RDM with brain RDM across ROIs for participants 1–8 and the group average; e, second-order RDM (correlations) across ROIs and participants; f, average across-participant correlation of brain RDMs per ROI.]
Fig. 5 | Representational similarity analysis reveals transformations of representations along the ventral visual stream. a, Illustration of fsaverage ROIs
used for the representational similarity analysis. b, t-SNE embedding for each ROI in an example participant (participant 1). Each dot represents a distinct
image (total, 10,000). Using category labels from the COCO image dataset, we color each dot according to whether the associated image contains
particular combinations of people, animals and inanimates. c, t-SNE embedding for aVTC with actual images depicted. Insets highlight an inanimate cluster
(blue inset) and a cluster of people with inanimate objects (gray inset). d, Categorical brain representations. We plot the correlation between brain RDMs
and a model RDM constructed from category labels in the COCO dataset. Color-shaded regions indicate within-participant error (mean and standard
error across distinct groups of images), whereas the gray-shaded region indicates across-participant error (mean and standard error across participants).
e, Similarities of brain representations across ROIs and participants. Depicted are correlations across brain RDMs obtained for different ROIs and
participants. Thin white lines separate groups of eight participants. f, Quantitative summary. We summarize the results of e by averaging the upper triangle
of each group of 8 × 8 participants, reflecting the correlation of RDMs from different participants. Shaded regions indicate standard errors estimated by
bootstrapping participants with replacement.
[Figure 6 graphic. Panels: a, median validation accuracy (0–0.45) versus number of training samples (1,000–27,000) for participants S1, S2, S5 and S6, with per-participant noise ceilings, for Gabor, AlexNet, GNet (single-participant) and GNet (multi-participant) models; b, voxel-wise validation accuracy versus noise ceiling for S1 in V1, V2, V3 and hV4, with the median across voxels; c, surface maps of validation accuracy for S1, S2, S5 and S6, with V1, V2, V3 and hV4 outlined.]
Fig. 6 | Prediction of brain activity using a brain-optimized neural network. We used encoding models46 to predict voxel activity in V1–hV4. NSD betas
were divided into a training set (consisting of up to 9,000 images × 3 trials = 27,000 training samples per participant) and a validation set (consisting
of up to 1,000 images × 3 trials = 3,000 validation samples per participant), and the accuracy of different encoding models was quantified as the
voxel-wise correlation between model predictions and responses observed in the validation set. a, Performance as a function of amount of training data
used. Models include an encoding model based on AlexNet, which is a task-optimized neural network (blue); encoding models based on GNet, which is a
brain-optimized neural network trained using data from single participants (orange) or data from multiple participants (red); and a V1-like control model
based on Gabor filters (purple). Plotted lines and error bars indicate mean and standard deviation across results obtained from different bootstrap samples
of the data. b, Detailed view of the performance of the multi-participant GNet model for a representative participant. c, Surface maps depicting spatial
distribution of validation accuracy for the multi-participant GNet model.
The primary encoding model of interest is based on a new network that we refer to as ‘GNet’, a brain-optimized CNN whose parameters are trained using image–response pairings observed in the training set. For comparison, we also evaluated an encoding model based on AlexNet47, a task-optimized CNN whose parameters are pre-trained using explicit labels of objects taken from an image database. AlexNet has been previously shown to provide state-of-the-art performance in modeling visual responses15,19. Finally, we included a simple V1-like control model based on oriented Gabor filters24. Details of modeling procedures are provided in Supplementary Modeling Note 2 and Extended Data Fig. 10.
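As noted in the Fig. 6 legend, model accuracy is the voxel-wise correlation between predicted and observed validation responses; a minimal sketch of that evaluation:

```python
import numpy as np

def validation_accuracy(predicted, observed):
    """Voxel-wise Pearson correlation between model predictions and data.

    predicted, observed: (n_validation_trials, n_voxels) arrays.
    Returns one correlation per voxel; Fig. 6a plots the median across voxels.
    """
    p = predicted - predicted.mean(axis=0)
    o = observed - observed.mean(axis=0)
    return (p * o).sum(axis=0) / np.sqrt((p**2).sum(axis=0) * (o**2).sum(axis=0))
```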
Varying the amount of training data provided to the models, we found that the performance of the GNet-based encoding model is relatively poor when only small amounts of training data are available (Fig. 6a, orange arrows). This is expected because the feature extractors in GNet are not pre-trained and thus require data for tuning. However, when large amounts of training data are available, the GNet model exhibits an impressive increase in performance, achieving approximate parity with the AlexNet-based encoding model (Fig. 6a, blue arrows). Interestingly, when we trained a single GNet model using brain activity from multiple participants, we found that the model was able to outperform the AlexNet model (two-tailed paired t-test across participants, P = 0.013), albeit modestly (Fig. 6a, red arrows). Noticeably, the simple Gabor model accounts for substantial variance in the responses; nonetheless, the more complex CNN-based models provide additional predictive power, consistent with previous observations48. For additional insight into model performance, we compared voxel-wise performance levels of the GNet model to noise ceiling estimates (Fig. 6b). Across voxels, prediction accuracy is tightly correlated with the noise ceiling, suggesting that voxel-wise differences in prediction accuracy simply reflect differences in SNR. In addition, performance levels are close to, but do not reach, the noise ceiling. Finally, cortical surface maps indicate that voxel-wise performance levels vary across foveal and peripheral representations (Fig. 6c).
The demonstration that an encoding model based on a brain-optimized CNN (GNet) outperforms an encoding model based on a task-optimized CNN (AlexNet) is important for two reasons. First, it indicates that the NSD is large enough to successfully train a complex neural network architecture. Had the NSD dataset been smaller in scale or lower in quality, qualitatively different patterns of model performance would have been obtained (in Fig. 6a, compare orange arrows reflecting a few thousand trials to red arrows reflecting tens of thousands of trials). Second, the successful training of a brain-optimized CNN opens the possibility of new avenues of investigation into the nature of the features used in CNNs. It is an interesting open question whether the features learned by task-optimized networks like AlexNet are similar to, or diverge from, the features present in brain-optimized networks like GNet. In general, brain-optimized networks17 are a useful alternative to task-optimized networks16,20, as the narrowly defined tasks that task-optimized networks are typically trained to solve do not necessarily respect the diversity of functions supported by the human visual system49 nor necessarily match properties found in biological visual systems50.

Discussion

In the last several years, several large-scale neuroimaging datasets have been made publicly available for re-use (for example, refs. 5,33,51–53). Several distinguishing aspects of the present work set the NSD apart from past datasets. One is the unprecedented scale of the dataset. The NSD shares the motivation of recent ‘deep’ (or ‘precision’) neuroimaging efforts33,54–57 that are seeking to amass large amounts of data from individual subjects, as opposed to modest amounts of data from a large number of subjects. In this context of deep neuroimaging, the NSD is, to our knowledge, the most extensive fMRI data collection effort that has been performed to date. This can be gauged not only in terms of the number of hours of fMRI data acquisition per participant (30–40 h of data for each of eight participants on the core NSD experiment) and the high spatial resolution of the acquired data (1.8 mm) but also the wealth of additional measures beyond the core experiment, including substantial amounts of resting-state and diffusion data, physiological data and functional localizers. The availability of extensive measures provides the opportunity to build complete models of how individual brains support vision and memory58. Of course, the emphasis on depth in individuals comes at the cost of sampling fewer individuals; datasets emphasizing large numbers of individuals, such as the Human Connectome Project5, are better suited for studying variability in the general population and how psychological traits broadly relate to brain structure and function.

A second aspect is the unusually high quality of the data. Although the quality of neuroimaging data is more complex to assess than quantity, assessment of data quality is essential because MRI data have relatively low sensitivity and are prone to errors and artifacts. In particular, when acquiring massive datasets, there is a risk of accumulating unknown sources of noise and artifact. The work presented in this paper (and in the accompanying files in the data release) guards against this possibility by crafting a customized and highly optimized approach to pre-processing the NSD data and providing comprehensive documentation of the high data quality (see also Supplementary Note 2). Several factors likely contributed to the high data quality. These include (1) the use of ultra-high magnetic field strength (7T), which enhances BOLD contrast-to-noise ratio; (2) the screening, training and incentivization of participants; (3) the detailed inspection and supervision of data processing; and (4) the large network of collaborators who helped guide the design and trajectory of the dataset.

A third aspect of the present work lies in the novel analysis techniques developed for improved GLM analysis of fMRI time series data. These include (1) an efficient and robust method to estimate voxel-specific HRFs; (2) adaptation of the GLMdenoise technique35 to a single-trial GLM framework; and (3) development of ridge regression as an effective method for regularizing single-trial response estimates. These three techniques have been integrated into a toolbox that can be applied to other neuroimaging datasets and are the subject of a forthcoming paper. An important lesson stemming from our results is that well-executed data collection is important but not the only factor to consider: data preparation methods exert a major influence on the quality of a dataset and, hence, its scientific value. One can view improvements in data quality as equivalent to increases in data quantity, in the sense that analysis methods that reduce unwanted variability (noise) can be interpreted as increasing the effective amount of data collected35. Thus, by improving data quality, the methods introduced with the NSD are contributing to the massive scale of the dataset.

The NSD dataset has many potential applications. Given its extensive sampling of natural scenes (70,566 distinct images aggregated across eight participants) and high SNR, the dataset will be useful for investigating a variety of phenomena in low-, mid- and high-level vision. In addition, the memory component of the NSD experiment provides a unique opportunity to study the neural mechanisms of both short-term and long-term memory (ranging from seconds to many months) as well as potential interactions between vision and memory. From a methodological perspective, the repeated scanning of individuals using a consistent experimental manipulation (up to 40 scan sessions of the NSD experiment per participant) provides a unique opportunity for development and evaluation of neuroimaging pipelines. Finally, perhaps the most exciting use of the NSD is as a common dataset to bridge the disciplines of cognitive science, neuroscience and artificial intelligence21. As we have shown in the context of deep neural network modeling (Fig. 6), there are sufficient data in the NSD to successfully drive the training of neural network models with thousands of free parameters. This demonstration exemplifies how the NSD—with its large amounts of carefully curated fMRI data collected during a rich cognitive paradigm—enables data-driven approaches toward understanding the complexities of information processing in the brain.

Online content

Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41593-021-00962-x.

Received: 1 March 2021; Accepted: 12 October 2021; Published online: 16 December 2021

References

1. de Vries, S. E. J. et al. A large-scale standardized physiological survey reveals functional organization of the mouse visual cortex. Nat. Neurosci. 23, 138–151 (2020).
2. Siegle, J. H. et al. Survey of spiking in the mouse visual system reveals functional hierarchy. Nature 592, 86–92 (2021).
3. Stringer, C., Pachitariu, M., Steinmetz, N., Carandini, M. & Harris, K. D. High-dimensional geometry of population responses in visual cortex. Nature 571, 361–365 (2019).
4. Markram, H. et al. Reconstruction and simulation of neocortical microcircuitry. Cell 163, 456–492 (2015).
5. Van Essen, D. C. et al. The WU-Minn human connectome project: an overview. Neuroimage 80, 62–79 (2013).
6. Zheng, Z. et al. A complete electron microscopy volume of the brain of adult Drosophila melanogaster. Cell 174, 730–743 (2018).
7. Van Essen, D. C. et al. Mapping visual cortex in monkeys and humans using surface-based atlases. Vis. Res. 41, 1359–1378 (2001).
8. Grill-Spector, K. & Malach, R. The human visual cortex. Annu. Rev. Neurosci. 27, 649–677 (2004).
2. Siegle, J. H. et al. Survey of spiking in the mouse visual system reveals functional hierarchy. Nature 592, 86–92 (2021).
3. Stringer, C., Pachitariu, M., Steinmetz, N., Carandini, M. & Harris, K. D. High-dimensional geometry of population responses in visual cortex. Nature 571, 361–365 (2019).
4. Markram, H. et al. Reconstruction and simulation of neocortical microcircuitry. Cell 163, 456–492 (2015).
5. Van Essen, D. C. et al. The WU-Minn human connectome project: an overview. Neuroimage 80, 62–79 (2013).
6. Zheng, Z. et al. A complete electron microscopy volume of the brain of adult Drosophila melanogaster. Cell 174, 730–743 (2018).
7. Van Essen, D. C. et al. Mapping visual cortex in monkeys and humans using surface-based atlases. Vis. Res. 41, 1359–1378 (2001).
8. Grill-Spector, K. & Malach, R. The human visual cortex. Annu. Rev. Neurosci. 27, 649–677 (2004).
9. Wheeler, M. E., Petersen, S. E. & Buckner, R. L. Memory's echo: vivid remembering reactivates sensory-specific cortex. Proc. Natl Acad. Sci. USA 97, 11125–11129 (2000).
10. Breedlove, J. L., St-Yves, G., Olman, C. A. & Naselaris, T. Generative feedback explains distinct brain activity codes for seen and mental images. Curr. Biol. 30, 2211–2224 (2020).
11. Kay, K. N., Weiner, K. S. & Grill-Spector, K. Attention reduces spatial uncertainty in human ventral temporal cortex. Curr. Biol. 25, 595–600 (2015).
12. Huth, A. G., Nishimoto, S., Vu, A. T. & Gallant, J. L. A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron 76, 1210–1224 (2012).
13. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (University of Toronto, 2009).
14. Lin, T.-Y. et al. Microsoft COCO: common objects in context. European Conference on Computer Vision, https://link.springer.com/chapter/10.1007/978-3-319-10602-1_48, 740–755 (Springer, 2014).
15. Güçlü, U. & van Gerven, M. A. J. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. J. Neurosci. 35, 10005–10014 (2015).
16. Khaligh-Razavi, S.-M. & Kriegeskorte, N. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput. Biol. 10, e1003915 (2014).
17. Seeliger, K. et al. End-to-end neural system identification with neural information flow. PLoS Comput. Biol. 17, e1008558 (2021).
18. Stansbury, D. E., Naselaris, T. & Gallant, J. L. Natural scene statistics account for the representation of scene categories in human visual cortex. Neuron 79, 1025–1034 (2013).
19. St-Yves, G. & Naselaris, T. The feature-weighted receptive field: an interpretable encoding model for complex feature spaces. Neuroimage 180, 188–202 (2018).
20. Yamins, D. L. K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl Acad. Sci. USA 111, 8619–8624 (2014).
21. Naselaris, T. et al. Cognitive computational neuroscience: a new conference for an emerging discipline. Trends Cogn. Sci. 22, 365–367 (2018).
22. Chang, N. et al. BOLD5000, a public fMRI dataset while viewing 5000 visual images. Sci. Data 6, 49 (2019).
23. Horikawa, T. & Kamitani, Y. Generic decoding of seen and imagined objects using hierarchical visual features. Nat. Commun. 8, 15037 (2017).
24. Kay, K. N., Naselaris, T., Prenger, R. J. & Gallant, J. L. Identifying natural images from human brain activity. Nature 452, 352–355 (2008).
25. Triantafyllou, C. et al. Comparison of physiological noise at 1.5 T, 3 T and 7 T and optimization of fMRI acquisition parameters. Neuroimage 26, 243–250 (2005).
26. Brady, T. F., Konkle, T., Alvarez, G. A. & Oliva, A. Visual long-term memory has a massive storage capacity for object details. Proc. Natl Acad. Sci. USA 105, 14325–14329 (2008).
27. Haxby, J. V., Guntupalli, J. S., Nastase, S. A. & Feilong, M. Hyperalignment: modeling shared information encoded in idiosyncratic cortical topographies. eLife 9, e56601 (2020).
28. Power, J. D., Lynch, C. J., Adeyemo, B. & Petersen, S. E. A critical, event-related appraisal of denoising in resting-state fMRI studies. Cereb. Cortex 30, 5544–5559 (2020).
29. Roth, Z. N., Ryoo, M. & Merriam, E. P. Task-related activity in human visual cortex. PLoS Biol. 18, e3000921 (2020).
30. Benson, N. C. et al. The human connectome project 7 Tesla retinotopy dataset: description and population receptive field analysis. J. Vis. 18, 23 (2018).
31. Stigliani, A., Weiner, K. S. & Grill-Spector, K. Temporal processing capacity in high-level visual cortex is domain specific. J. Neurosci. 35, 12412–12424 (2015).
32. Kay, K. et al. A critical assessment of data quality and venous effects in sub-millimeter fMRI. Neuroimage 189, 847–869 (2019).
33. Gordon, E. M. et al. Precision functional mapping of individual human brains. Neuron 95, 791–807 (2017).
34. Kang, X., Yund, E. W., Herron, T. J. & Woods, D. L. Improving the resolution of functional brain imaging: analyzing functional data in anatomical space. Magn. Reson. Imaging 25, 1070–1078 (2007).
35. Kay, K. N., Rokem, A., Winawer, J., Dougherty, R. F. & Wandell, B. GLMdenoise: a fast, automated technique for denoising task-based fMRI data. Front. Neurosci. 7, 247 (2013).
36. Rokem, A. & Kay, K. Fractional ridge regression: a fast, interpretable reparameterization of ridge regression. Gigascience 9, giaa133 (2020).
37. Albrecht, D. G. & Hamilton, D. B. Striate cortex of monkey and cat: contrast response function. J. Neurophysiol. 48, 217–237 (1982).
38. Wagner, A. D., Shannon, B. J., Kahn, I. & Buckner, R. L. Parietal lobe contributions to episodic memory retrieval. Trends Cogn. Sci. 9, 445–453 (2005).
39. Spaniol, J. et al. Event-related fMRI studies of episodic encoding and retrieval: meta-analyses using activation likelihood estimation. Neuropsychologia 47, 1765–1779 (2009).
40. Gonzalez-Castillo, J. et al. Whole-brain, time-locked activation with simple tasks revealed using massive averaging and model-free analysis. Proc. Natl Acad. Sci. USA 109, 5487–5492 (2012).
41. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
42. Connolly, A. C. et al. The representation of biological classes in the human brain. J. Neurosci. 32, 2608–2618 (2012).
43. Naselaris, T., Stansbury, D. E. & Gallant, J. L. Cortical representation of animate and inanimate objects in complex natural scenes. J. Physiol. Paris 106, 239–249 (2012).
44. Long, B., Yu, C.-P. & Konkle, T. Mid-level visual features underlie the high-level categorical organization of the ventral stream. Proc. Natl Acad. Sci. USA 115, E9015–E9024 (2018).
45. Henriksson, L., Khaligh-Razavi, S.-M., Kay, K. & Kriegeskorte, N. Visual representations are dominated by intrinsic fluctuations correlated between areas. Neuroimage 114, 275–286 (2015).
46. Naselaris, T., Kay, K. N., Nishimoto, S. & Gallant, J. L. Encoding and decoding in fMRI. Neuroimage 56, 400–410 (2011).
47. Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html, 1097–1105 (2012).
48. Cadena, S. A. et al. Deep convolutional models improve predictions of macaque V1 responses to natural images. PLoS Comput. Biol. 15, e1006897 (2019).
49. Wang, A., Tarr, M. & Wehbe, L. Neural taskonomy: inferring the similarity of task-derived representations from brain activity. Advances in Neural Information Processing Systems 32, https://papers.nips.cc/paper/2019/hash/f490c742cd8318b8ee6dca10af2a163f-Abstract.html, 15475–15485 (2019).
50. Sinz, F. H., Pitkow, X., Reimer, J., Bethge, M. & Tolias, A. S. Engineering a less artificial intelligence. Neuron 103, 967–979 (2019).
51. Aliko, S., Huang, J., Gheorghiu, F., Meliss, S. & Skipper, J. I. A naturalistic neuroimaging database for understanding the brain using ecological stimuli. Sci. Data 7, 347 (2020).
52. Nastase, S. A., Liu, Y.-F., Hillman, H., Norman, K. A. & Hasson, U. Leveraging shared connectivity to aggregate heterogeneous datasets into a common response space. Neuroimage 217, 116865 (2020).
53. Taylor, J. R. et al. The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) data repository: structural and functional MRI, MEG, and cognitive data from a cross-sectional adult lifespan sample. Neuroimage 144, 262–269 (2017).
54. Bellec, P. & Boyle, J. A. Bridging the gap between perception and action: the case for neuroimaging, AI and video games. Preprint at https://psyarxiv.com/3epws (2019).
55. Pinho, A. L. et al. Individual Brain Charting, a high-resolution fMRI dataset for cognitive mapping. Sci. Data 5, 180105 (2018).
56. Poldrack, R. A. et al. Long-term neural and physiological phenotyping of a single human. Nat. Commun. 6, 8885 (2015).
57. Seeliger, K., Sommers, R. P., Güçlü, U., Bosch, S. E. & van Gerven, M. A. J. A large single-participant fMRI dataset for probing brain responses to naturalistic stimuli in space and time. Preprint at https://www.biorxiv.org/content/10.1101/687681v1 (2019).
58. Naselaris, T., Allen, E. & Kay, K. Extensive sampling for complete models of individual brains. Curr. Opin. Behav. Sci. 40, 45–51 (2021).

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature America, Inc. 2021
Methods

Participant recruitment. The NSD study was advertised to the University of Minnesota community. We sought to recruit right-handed individuals (18–65 years old) with no known cognitive deficits or color blindness and with normal or corrected-to-normal vision. Those who were interested in participating were contacted for a phone interview to explain the nature of the study and to screen them for eligibility. We discussed the long-term nature of the study, the time commitment that it would involve and the feasibility of traveling to the scanner on a regular basis. We paid attention to the communicativeness of potential participants and their general attitude toward study participation. Selecting participants whom we were confident would provide high-quality data was more important to us than obtaining a random sample of the general population. Based on the phone interviews, we invited 14 individuals whom we thought were strong candidates to participate in an initial 7T fMRI screening session. Of these, eight were selected to participate in the full NSD experiment.

Participants. Eight participants (two males and six females; age range, 19–32 years) participated in the NSD dataset (subj01–subj08). There were six additional participants (four males and two females; age range, 20–53 years) who participated in the initial 7T fMRI screening session but not in the remainder of data collection. No statistical methods were used to pre-determine the sample size; rather, our experimental approach58 emphasizes collecting extensive data from each participant, which enables the demonstration and replication of effects in individual participants. Participants were naive to the design of the NSD dataset. All participants had normal or corrected-to-normal visual acuity. Informed written consent was obtained from all participants, and the experimental protocol was approved by the University of Minnesota institutional review board. Participants were compensated at a rate of $30 per hour, plus performance bonuses. Additional participant information, including height, weight, handedness and visual acuity, was logged and is available online.

Individuals participated in several neuroimaging and behavioral data collection sessions (a full breakdown is provided in Extended Data Fig. 2). Neuroimaging included 3T structural scan sessions and 7T functional scan sessions. The 7T functional scan sessions included an initial screening session termed ‘prffloc’, referring to the pRF and fLoc experiments conducted in that session. The 7T sessions also included, for each participant, 30–40 sessions in which the main NSD experiment was conducted (‘nsd01’–‘nsd40’). These sessions are collectively termed the ‘NSD core’. In some of these sessions, resting-state data were acquired before and after the NSD experiment. Finally, the 7T sessions also included two sessions conducted after completion of the NSD core; these sessions, termed ‘nsdsynthetic’ and ‘nsdimagery’, involved measuring responses to synthetic stimuli and cognitive task manipulations (including mental imagery), respectively. The total number of 7T fMRI scan sessions was 43, 43, 35, 33, 43, 35, 43 and 33 for subj01–subj08, respectively. The average number of hours of resting-state fMRI conducted for each participant was 2.0 h, and the average number of hours of task-based fMRI conducted for each participant was 38.5 h. Each individual also participated in several behavioral assessments after scanning was complete. These included a variety of behavioral measures (‘nsdpostbehavior’), a final memory test (‘nsdmemory’) and an image-similarity assessment (‘nsdmeadows’).

MRI data acquisition. MRI data were collected at the Center for Magnetic Resonance Research at the University of Minnesota. Some data were collected using a combination of a 3T Siemens Prisma scanner and a standard Siemens 32-channel RF head coil. Most data were collected using a combination of a 7T Siemens Magnetom passively shielded scanner and a single-channel-transmit, 32-channel-receive RF head coil (Nova Medical). Illustrations of the different types of MRI data acquired are provided in Fig. 2b. Below, we summarize the scanning protocols (full protocol printouts are available online).

At 3T, we collected several anatomical measures (T1, T2, diffusion and angiogram). The motivation for collecting data at 3T was to ensure acquisition of T1 volumes with good gray-matter/white-matter contrast and homogeneity, which is difficult to achieve at ultra-high field59. To increase contrast-to-noise ratio and to enable assessment of reliability, we acquired several repetitions of T1-weighted and T2-weighted volumes. For each participant, we collected between six and ten scans of a whole-brain T1-weighted MPRAGE sequence (0.8-mm isotropic resolution, TR = 2,400 ms, TE = 2.22 ms, TI = 1,000 ms, flip angle 8°, bandwidth 220 Hz per pixel, no partial Fourier, in-plane acceleration factor (iPAT) 2, TA = 6.6 min per scan) and 2–3 scans of a whole-brain T2-weighted SPACE sequence (0.8-mm isotropic resolution, TR = 3,200 ms, TE = 563 ms, bandwidth 744 Hz per pixel, no partial Fourier, iPAT 2, TA = 6.0 min per scan). In addition to T1 and T2 data, we also acquired four high-angular-resolution, diffusion-weighted spin-echo EPI scans, using protocols from the Lifespan Human Connectome Project Development effort60. These protocols involved varying the number of diffusion directions and the phase encode direction (1.5-mm isotropic resolution, TR = 3,230 ms, TE = 89.20 ms, flip angle 78°, refocusing flip angle 160°, bandwidth 1,700 Hz per pixel, echo spacing 0.69 ms, partial Fourier 6/8, no iPAT, multi-band slice acceleration factor 4, TA = 5.6 min per scan for 99 directions, TA = 5.7 min per scan for 100 directions). The four scans included 99 directions AP (anterior-to-posterior phase encode direction), 99 directions PA (posterior-to-anterior phase encode direction), 100 directions AP and 100 directions PA. Diffusion volumes were acquired at b values of 0, 1,500 or 3,000 s mm−2. We also acquired an angiogram using a time-of-flight multi-slab 3D sequence (0.39 mm × 0.39 mm × 0.5 mm resolution, TR = 19.0 ms, TE = 2.91 ms, flip angle 18°, bandwidth 186 Hz per pixel, phase partial Fourier 6/8, slice partial Fourier 6/8, iPAT 2, TA = 5.5 min).

At 7T, we collected functional data and associated fieldmaps and a few additional anatomical measures (venogram and high-resolution T2). Functional data were collected using gradient-echo EPI at 1.8-mm isotropic resolution with whole-brain (including cerebellum) coverage (84 axial slices, slice thickness 1.8 mm, slice gap 0 mm, field-of-view 216 mm (FE) × 216 mm (PE), phase encode direction anterior-to-posterior, matrix size 120 × 120, TR = 1,600 ms, TE = 22.0 ms, flip angle 62°, echo spacing 0.66 ms, bandwidth 1,736 Hz per pixel, partial Fourier 7/8, iPAT 2, multi-band slice acceleration factor 3). The use of moderate spatial resolution capitalizes on the SNR benefits provided by ultra-high magnetic field strength. At the beginning of each 7T session, we acquired a short test EPI scan and adjusted the gain factor (FFT scale factor) accordingly to ensure large dynamic range while avoiding clipping. Empirical measurements indicate that the acoustic noise caused by the EPI sequence is 112 dBA; assuming a conservative noise reduction estimate of 26 dB for the earplugs that we used, the resulting noise level is 86 dBA, which can be safely endured for approximately 8–16 continuous hours according to guidelines from the National Institute for Occupational Safety and Health (1998) and the Occupational Safety and Health Administration (2009).

In addition to the EPI scans, the 7T sessions also included dual-echo fieldmaps for post hoc correction of EPI spatial distortion (same overall slice slab as the EPI data, 2.2 mm × 2.2 mm × 3.6 mm resolution, TR = 510 ms, TE1 = 8.16 ms, TE2 = 9.18 ms, flip angle 40°, bandwidth 301 Hz per pixel, partial Fourier 6/8, TA = 1.3 min per scan). Fieldmaps were periodically acquired over the course of each scan session to track changes in the magnetic field (details provided below).

In one of the 7T sessions held for each participant, we acquired a venogram using a susceptibility-weighted imaging 3D sequence (0.5625 mm × 0.5625 mm × 0.6 mm resolution, TR = 28 ms, TE = 21 ms, flip angle 17°, bandwidth 120 Hz per pixel, phase partial Fourier 6/8, slice partial Fourier 6/8, iPAT 3, TA = 10.1 min). This venogram could be useful for investigating the effect of vasculature on fMRI signals32. In addition, for the purposes of hippocampal segmentation, we acquired in one of the 7T sessions a high-resolution T2-weighted TSE scan (0.357 mm × 0.357 mm × 1.5 mm resolution, 56 oblique slices oriented perpendicular to the long axis of the hippocampus, field-of-view 160 mm (FE) × 156.4 mm (PE), TR = 16,000 ms, TE = 53 ms, bandwidth 100 Hz per pixel, no partial Fourier, iPAT 2, turbo factor 15, TA = 4.5 min).

In the prffloc 7T fMRI session, the acquisition structure was [F BWLL F BWLL F BWLL F], where F indicates a fieldmap, B indicates a multibar run of the pRF experiment (188 TRs), W indicates a wedgering run of the pRF experiment (188 TRs) and L indicates a run of the fLoc experiment (195 TRs). In the NSD 7T fMRI sessions, the acquisition structure was either [F NNNN F NNNN F NNNN F] or [F RNNNN F NNNN F NNNNR F], where F indicates a fieldmap, N indicates a run of the NSD experiment (188 TRs) and R indicates a resting-state run (188 TRs).
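To make the bracket notation concrete, the following minimal Python sketch (our illustration, not part of the released NSD code) expands a session template into an ordered run list and tallies the functional TRs implied by the run lengths stated above:

    # Sketch (not from the NSD code base): expand the session templates above.
    # F = fieldmap (not a functional run); B/W = pRF runs; L = fLoc run;
    # N = NSD run; R = resting-state run. TR counts are taken from the text.
    TRS = {'B': 188, 'W': 188, 'L': 195, 'N': 188, 'R': 188}

    def expand(template):
        runs = [c for c in template if c != ' ']
        n_func = sum(1 for c in runs if c != 'F')
        total_trs = sum(TRS.get(c, 0) for c in runs)
        return runs, n_func, total_trs

    for name, template in [('prffloc', 'F BWLL F BWLL F BWLL F'),
                           ('nsd (typical)', 'F NNNN F NNNN F NNNN F'),
                           ('nsd (with rest)', 'F RNNNN F NNNN F NNNNR F')]:
        runs, n_func, total_trs = expand(template)
        print(f'{name}: {n_func} functional runs, {total_trs} TRs')
    # -> prffloc: 12 functional runs, 2298 TRs
    # -> nsd (typical): 12 functional runs, 2256 TRs
    # -> nsd (with rest): 14 functional runs, 2632 TRs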
Stimulus display and scanner peripherals. Ear plugs were used to reduce acoustic noise experienced by the participants. To minimize head motion, we acquired a headcase61 for each of the eight NSD participants (Caseforge, http://caseforge.co) and deployed the headcases starting from the second NSD core scan session (nsd02). To ensure maximal participant comfort, only the posterior half of the headcases was used (omitting the anterior half). Standard foam padding was used to mitigate head motion before that point (prffloc and nsd01).

Stimuli were presented using a Cambridge Research Systems BOLDscreen 32 LCD monitor positioned at the head of the 7T scanner bed, placed flush against the scanner bore. We chose to use an LCD monitor because it delivers a sharp, high-quality image, in contrast to typical scanner setups involving projectors and backprojection screens. The monitor operated at a resolution of 1,920 pixels × 1,080 pixels at 120 Hz. The size of the full monitor image was 69.84 cm (width) × 39.29 cm (height). Participants viewed the monitor via a mirror mounted on the RF coil. The viewing distance was 5 cm from the participants’ eyes to the mirror plus 171.5 cm from the mirror to the monitor image, for a total of 176.5 cm. Measurements of the display spectral power density were obtained using a PR-655 spectroradiometer (Photo Research). The BOLDscreen is designed by the manufacturer to behave as a linear display device, and our measurements confirmed this to be the case.

We determined the maximum square extent visible in both eyes given the constraints of the RF coil to be 8.4° × 8.4° (714 pixels × 714 pixels). Thus, stimuli from the various experiments (for example, pRF, fLoc and NSD) were adjusted to fill 8.4° of visual angle (details provided below). At the beginning of each scan session, we made an effort to position the monitor in the same location relative to the scanner and to position the participant’s head and RF coil in the same location relative to the scanner. We also used a calibration square (8.4° in size) to determine any incidental horizontal or vertical offsets needed in that session for the participant to see the entire square in each eye, unobstructed. Given these efforts, we think that consistent and high-quality visual stimulation was achieved across scan sessions. Nonetheless, we caution that, due to limitations in positioning and/or potential drift over the course of a scan session, some slight occlusion of the corners of the 8.4° × 8.4° square extent might have occurred some of the time.
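As a quick consistency check on these numbers, a short sketch (ours, using only the display geometry stated above) verifies that 714 pixels subtends roughly 8.4° at the 176.5-cm viewing distance:

    import math

    # Display geometry stated above: 1,920 px across 69.84 cm; 176.5-cm viewing distance.
    cm_per_px = 69.84 / 1920
    stim_cm = 714 * cm_per_px                              # physical extent of a 714-px stimulus
    deg = 2 * math.degrees(math.atan(stim_cm / 2 / 176.5))
    print(round(deg, 2))                                   # -> 8.42, i.e., approximately 8.4 degrees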
A Mac Pro computer controlled stimulus presentation using code based on Psychophysics Toolbox 3.0.14 (refs. 62,63). Behavioral responses were recorded using a button box (Current Designs). In some scan sessions (nsd21–nsd30, the same sessions in which the primary set of resting-state data were acquired), physiological data were collected using a pulse oximeter and a respiratory belt (stock Siemens equipment). Care was taken to secure the oximeter with tape to the left index finger of the participant and to secure the respiratory belt snugly to the participant’s torso. Physiological data were carefully synchronized with the fMRI data and cropped but are not further analyzed in this paper.

In several scan sessions (see Extended Data Fig. 2 for details), eye-tracking was performed using an EyeLink 1000 system (SR Research) combined with a custom infrared illuminator mounted on the RF coil. Eye-tracking was performed for the left eye, and eye-tracking data were obtained at 2,000 Hz using the Pupil-CR centroid mode. We caution that the eye-tracking data are variable in quality, as achieving sufficient pupil contrast was often difficult given the constraints of the scanner setup. For information complementary to the eye-tracking data, we also captured video recordings of the eye-tracker computer display (Fig. 2b) using a cell phone secured to a mount. These video recordings are useful for checking the accuracy of the eye-tracker and provide information in scan sessions where pupil tracking and data acquisition failed completely. Details of pre-processing and analysis of eye-tracking data are provided in Supplementary Note 3.
Day-to-day acquisition procedures. Participants were scanned approximately once a week, with attempts to keep a regular weekly scan time. At the beginning of each session (starting at approximately nsd07), participants were asked to rate on a five-point scale how well they slept the night before, their mood, how hungry they were and their stress level. We also asked whether they had had caffeine in the past 3 h. At the end of each scan session, participants were asked to rate how comfortable they were during the session and to provide any general feedback they had about the session. These various measures, as well as any technical issues that arose during the session, were logged onto a spreadsheet (available online).

In the first several scan sessions, we emphasized the importance of fixation and performed simple tests before scanning in which we watched the participant’s eyes while they attempted to fixate and while they deliberately broke fixation. This was done to help participants understand what good fixation feels like. In every scan session, we reminded participants about the importance of fixation and about the correct mapping between buttons and responses.

During data collection, we monitored aspects of data quality, including overall image quality, head motion, quality of physiological data and behavioral performance. Between functional runs, we checked in with the participant to assess their energy level, enthusiasm and compliance. If we noticed any substantial drops in response rate, we politely notified the participant and offered short breaks before continuing.
To promote participant engagement and retention, participants were given the opportunity to earn monetary bonuses that gradually increased in size over the course of the NSD study. These bonuses were contingent on achieving certain performance levels on data quality metrics, such as head motion and response rate (details available online). Information regarding performance was supplied to participants in the form of a continually updated ‘leaderboard’ figure. We found that this figure greatly helped to motivate participants.

The NSD experiment. Basic design. In the NSD experiment, participants performed a long-term continuous recognition task while viewing a large number of color natural scenes. We chose this recognition task because it engages and challenges the observer and is unbiased with respect to the specific content of the images (unlike other tasks such as animacy judgment). In addition, it infuses the experiment with a rich memory dimension that is likely of interest to memory researchers. In total, 73,000 distinct images were prepared. We intended that the eight NSD participants would each view 10,000 distinct images presented three times each over the course of 40 scan sessions. We designated a special set of 1,000 images (chosen randomly from the full set of prepared images) as shared images that would be seen by all participants (referred to as the ‘shared1000’); all other images would be mutually exclusive across participants. The distribution of the three presentations of each image was tightly controlled, and participants were naive to both the number and distribution of the presentations. Note that, because some NSD participants completed only 30 of the 40 prescribed scan sessions, there are ultimately 515 images, out of the shared 1,000 images, that were viewed all three times by all eight participants (referred to as the ‘shared515’).

Images were presented using a 3-s ON/1-s OFF trial structure (Fig. 1a). In informal piloting, we found that this pacing made the recognition task feasible and not overly taxing. In addition, we reasoned that the relatively long stimulus duration would increase neural activity and that the rapidity of the design would allow more trials to be collected and, thereby, increase overall experimental power. Finally, we speculated that the 3/1 trial structure would yield a pleasant experience for participants, at least compared to slow event-related designs where most experimental time is spent viewing a blank screen.

Image preparation. The NSD stimuli are prepared as a single brick of RGB images with dimensionality 425 pixels × 425 pixels × 3 RGB channels × 73,000 images and unsigned 8-bit integer format.

Images were taken from Microsoft’s COCO image database14. COCO images are photographs collected from online repositories; each image is supplemented by a rich set of annotations (for example, boundary polygons around objects, natural language captions and body pose estimates). Of the 90 original COCO categories, a total of 80 COCO categories exist in the 73,000 NSD images. We used COCO images in the 2017 train/val split14 and restricted selection to the subset of images for which pixel-level annotations of ‘stuff’64 (for example, sky, land, wall and road) in addition to ‘things’ (for example, car, skateboard and hat) were available.

We selected only images whose smaller dimension (height or width) was at least 425 pixels. Where necessary, we squared image dimensions by cropping out pixels along the largest dimension. For example, if the original image was 425 × 585, we cropped away 160 pixels from the larger dimension, resulting in an image that is 425 × 425. The median number of pixels cropped per image was 160. After cropping, images were downsampled, if needed, to 425 × 425.

Cropping an image can change the way the viewer interprets it. We refer to this effect of cropping as ‘semantic loss’. To be able to take full advantage of the rich annotations available for the COCO images, we attempted to minimize semantic loss when cropping images. For landscape-oriented images, we selected among a center, left or right crop. For portrait-oriented images, we selected among a center, top or bottom crop (finer grids of cropping options had little effect on results). Selection of crops was carefully performed based on quantitative analysis and visual inspection (details provided in the NSD Data Manual).
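The following minimal sketch (our illustration; the actual NSD crop selection additionally relied on quantitative analysis of the COCO annotations and visual inspection, as noted above) enumerates the candidate square crops:

    import numpy as np

    def candidate_crops(img):
        # Candidate square crops described above: center/left/right for
        # landscape images, center/top/bottom for portrait images.
        h, w = img.shape[:2]
        s = min(h, w)
        if w >= h:
            offsets = {'center': (w - s) // 2, 'left': 0, 'right': w - s}
            return {name: img[:, off:off + s] for name, off in offsets.items()}
        offsets = {'center': (h - s) // 2, 'top': 0, 'bottom': h - s}
        return {name: img[off:off + s, :] for name, off in offsets.items()}

    crops = candidate_crops(np.zeros((585, 425, 3), dtype=np.uint8))
    print({name: c.shape for name, c in crops.items()})   # each crop is 425 x 425 x 3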
In addition to screening to minimize semantic loss, we implemented a screening procedure to remove duplicate images. Some of the COCO images are extremely similar to each other, differing only by a post-processing operation (that is, grayscaling or sharpening) or by a few frames in a motion-capture sequence. To remove these near-duplicates, we downsampled all images to 40 × 40 and then computed the correlation of grayscale pixel intensities between all image pairs. We manually inspected the image pairs with the 500 highest correlation values. Of these, 38 image pairs were observed to be near-duplicates. We randomly selected another image from the COCO dataset to replace one image in each near-duplicate pair. Finally, we screened captions for all images for indications of violent or salacious content. No images were deemed too offensive to include in the experiment.
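A sketch of this near-duplicate screen follows (our re-implementation; for simplicity it uses strided subsampling rather than proper image resizing and holds the full correlation matrix in memory, whereas a 73,000-image screen would be processed in blocks):

    import numpy as np

    def near_duplicate_candidates(images, n_top=500):
        # images: uint8 array of shape (n_images, H, W, 3) with H, W >= 40.
        gray = images.astype(np.float64).mean(axis=3)
        n, h, w = gray.shape
        small = gray[:, ::h // 40, ::w // 40][:, :40, :40]    # crude 40 x 40 downsampling
        flat = small.reshape(n, -1)
        flat -= flat.mean(axis=1, keepdims=True)
        flat /= flat.std(axis=1, keepdims=True)
        corr = flat @ flat.T / flat.shape[1]                  # pairwise Pearson correlations
        i, j = np.triu_indices(n, k=1)
        order = np.argsort(corr[i, j])[::-1][:n_top]          # highest-correlation pairs
        return list(zip(i[order], j[order], corr[i, j][order]))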
The distribution of ‘thing’ categories across the final images selected for the NSD was nearly identical to the distribution in the full COCO dataset. As a result, the ‘person’ category was over-represented; however, with a few exceptions, all 80 COCO object categories were displayed in at least 100 images to each participant. Note that images tend to depict more than one category, so that a given object category frequently appeared in the same image with other categories. For each participant’s images, at least 90% of the images contained two or more of the 80 COCO categories.

Distribution of image presentations. We determined the ordering of the 10,000 images × 3 trials = 30,000 trials in advance and kept the ordering fixed across participants. The idea is that these 10,000 images are treated as slots into which different NSD images are inserted. We designated the first 1,000 slots as corresponding to the special shared1000 images; the remaining 9,000 slots were filled with unique images for each participant. Note that because the trial ordering and repetition structure are identical across participants, the difficulty of the recognition task is similar across participants (up to the fact that some images might be more difficult to remember than others).

We controlled the distribution of image presentations to prevent the recognition task from becoming too difficult (and risking loss of participant morale). In the procedure, we conceptualized the task of determining the trial ordering as equivalent to placing image presentations on a circle that would eventually be cut and unraveled. The rationale for this circular design is to minimize the extent to which certain points in the experiment differ from others; of course, because the circle eventually becomes a line, there is some imperfection (see discussion below regarding ‘burn-in’ and ‘dead’ time). To determine presentation times, we created a circular probability distribution by mixing a von Mises distribution and a uniform distribution (Extended Data Fig. 1a). For each image, we positioned the peak of the von Mises distribution at a random position on the circle (that is, we randomly sampled the mean parameter from −180° to 180°) and then randomly sampled the three presentation times for that image from the mixture distribution. After completing the placement of all 30,000 trials, we cut the circle, unraveled it into a linear sequence of image presentations and divided this sequence into 40 consecutive segments corresponding to the 40 NSD scan sessions (750 trials per session). We chose specific parameters for the probability distribution: a von Mises distribution with a concentration parameter of 729 and a mixing ratio of 60% and 40% for the von Mises and uniform distributions, respectively.
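A minimal sketch of this sampling scheme (ours; the NSD’s actual placement code may differ in detail) using the parameters just stated:

    import numpy as np
    from scipy.stats import vonmises

    rng = np.random.default_rng(0)

    def presentation_angles(n_images=10000, n_reps=3, kappa=729, p_vm=0.6):
        # For each image, center a von Mises bump at a random point on the circle
        # and draw its repetitions from a 60%/40% von Mises/uniform mixture.
        mu = rng.uniform(-np.pi, np.pi, size=n_images)
        use_vm = rng.random((n_images, n_reps)) < p_vm
        vm = vonmises(kappa).rvs(size=(n_images, n_reps), random_state=rng) + mu[:, None]
        uni = rng.uniform(-np.pi, np.pi, size=(n_images, n_reps))
        angles = np.where(use_vm, vm, uni)
        return np.mod(angles + np.pi, 2 * np.pi) - np.pi      # wrap to [-pi, pi)

    # Cutting the circle at -pi and sorting all 30,000 draws yields the trial sequence.
    order = np.argsort(presentation_angles(), axis=None)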
This choice of parameters yields appealing properties. First, the distribution is relatively narrow (Extended Data Fig. 1a) and, therefore, ensures that there will be many trials involving an image that has been presented in the recent past (thus making the trials easy) while still allowing the probing of more distant memory events. Second, there is minimal ‘burn-in’ time at the beginning of the experiment: even in the first scan session, there is still a substantial number of trials involving old images (Extended Data Fig. 1b, blue line). Third, there is minimal ‘dead’ time at the end of the experiment: even in the last scan session, there is still a substantial number of trials involving new images (Extended Data Fig. 1b, blue line).

To provide a sense of the overall experimental design, we computed basic statistics on each NSD scan session. For a typical session, the total number of distinct images shown once, twice and all three times within that session is 437, 106 and 34, respectively (these numbers reflect the mean across scan sessions, rounding to the nearest integer).
Trial and run design. Each trial lasted 4 s and consisted of the presentation of an image for 3 s, followed by a 1-s gap. In total, 75 trials were conducted in a run; thus, each run lasted 300 s. The first three trials (12 s) and the last four trials (16 s) were blank trials. The remaining 68 trials were divided into 63 stimulus trials and five blank trials. The blank trials were randomly positioned in each run such that the minimum and maximum continuous number of stimulus trials was nine trials (36 s) and 14 trials (56 s), respectively (Fig. 1b). For even-numbered runs, the 63rd stimulus trial was designated to be a blank trial. In total, 12 NSD runs were collected in one NSD session, yielding a total of (63 + 62) × 6 = 750 stimulus trials. Moreover, this design was repeated for all 40 NSD sessions: 750 stimulus trials × 40 sessions = 30,000 stimulus trials. The temporal ordering of stimulus and blank trials was generated once and kept fixed across participants.

Note that the experimental design involves minimal trial jittering: for the most part, the time interval separating consecutive stimulus images is fixed at 1 s, although occasionally, due to blank trials, the time interval is 5 s. This design was intended to maximize statistical power and differs from conventional fMRI practice where intervals are often chosen randomly from a fixed range.
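To illustrate the run layout just described, here is a sketch (our illustration; the actual NSD placement procedure may differ in detail) that generates one run satisfying these constraints:

    import numpy as np

    rng = np.random.default_rng(1)

    def make_run(even_run=False):
        # 75 four-second trials: 3 leading blanks + 68 middle trials + 4 trailing
        # blanks. The middle holds 63 stimulus trials ('S') and 5 blanks, placed so
        # that every continuous stretch of stimulus trials has length 9-14.
        while True:
            stretches = rng.integers(9, 15, size=6)     # 6 stretches separated by 5 blanks
            if stretches.sum() == 63:
                break
        middle = []
        for k, n in enumerate(stretches):
            middle += ['S'] * int(n)
            if k < 5:
                middle.append('blank')
        if even_run:                                    # 63rd stimulus trial becomes a blank
            last_s = [i for i, t in enumerate(middle) if t == 'S'][62]
            middle[last_s] = 'blank'
        return ['blank'] * 3 + middle + ['blank'] * 4

    run = make_run()
    assert len(run) == 75 and run.count('S') == 63      # 75 trials x 4 s = 300 s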
Stimulus presentation and task. Because the BOLDscreen is calibrated to behave as a linear display device, we used a squaring luminance response when presenting the NSD experiment to simulate the typical viewing of digital images. At the time of presentation, the prepared NSD images were resized using linear interpolation from their native resolution of 425 pixels × 425 pixels to 714 pixels × 714 pixels to occupy 8.4° × 8.4° on the display. Throughout each run (including blank trials), a small semi-transparent red fixation dot with a black border (0.2° × 0.2°, 50% opacity) was present at the center of the stimuli (Fig. 1a). Stimuli were shown against a gray background with an RGB value of (127, 127, 127).
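In other words, pixel values are squared after normalization, emulating the gamma of roughly 2 under which digital images are conventionally viewed; a one-line sketch:

    import numpy as np

    def displayed_luminance(img_uint8):
        # The BOLDscreen is linear in luminance, so squaring normalized pixel
        # values reproduces the roughly gamma-2 appearance of typical displays.
        return (img_uint8.astype(np.float64) / 255.0) ** 2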
Participants were instructed to fixate the central dot and to press button 1 using the index finger of their right hand if the presented image was new—that is, if the image had never been presented before—or button 2 using the middle finger of their right hand if the presented image was old—that is, the image was identical to one that had been presented before, either in the current scan session or any previous scan session. Participants were additionally instructed to continue to fixate and wait for the next image in the event of blank trials.

Before the start of the NSD experiment, we showed the participants a version of the experiment involving cartoon images, for them to become familiarized with the feel and timing of the task. During the NSD experiment, minimal feedback was provided to the participants regarding their performance on the recognition task. Participants were blinded to the precise details of the NSD experiment (for example, total number of images and total number of presentations per image). Participants were informed only about their response rate (fraction of trials on which they successfully made a response) and a vague ‘performance metric’, which, unbeknownst to them, quantified their percent correct for easy trials (trials that involved the presentation of an image that had occurred earlier in the same scan session). We revealed the nature of the design in a debriefing session after the completion of the NSD experiment (details below).

Details on experiment timing. Stimulus presentation was locked to the refresh rate of the BOLDscreen monitor. Empirical measurements confirmed that the monitor refresh rate was nearly exactly 120 Hz: duration of runs was highly reliable, ranging from 299.95 s to 299.98 s. To compensate for the slight offset from 300 s, the fMRI data were pre-processed to achieve a sampling rate of 0.999878 s (high-resolution preparation) or 0.999878 s × (4/3) = 1.333171 s (standard-resolution preparation). For brevity, we refer to these numbers as 1.000 s and 1.333 s. Experimental runs were started by a trigger issued by the MR scanner. Due to input polling and monitor refresh, there was slight variability in the delay between trigger detection and the presentation of the first stimulus frame, ranging from 3 ms to 22 ms. We did not attempt to compensate for this delay.

Acquisition. Due to constraints on participant availability (including unplanned out-of-town absences in the summer of 2019) and due to constraints on scanner availability (the 7T scanner was decommissioned in November 2019), we did not complete the full NSD experiment for every participant. Fortunately, we were able to collect a sizable amount of data: 40, 40, 32, 30, 40, 32, 40 and 30 NSD sessions for subj01–subj08, respectively. In these collected data, each participant viewed 9,209–10,000 distinct images and participated in 22,500–30,000 trials. Aggregated across participants, the total number of distinct images shown was 70,566, and the total number of trials was 213,000.

Debriefing. After completion of the final memory test (details below), participants filled out a post-NSD questionnaire. This questionnaire probed topics such as strategies used for performing the NSD task and estimates for the number of images viewed and the number of image repetitions. After participants filled out this questionnaire, the design of the NSD experiment was revealed to them.

Other experiments. pRF experiment. We adapted the experiment used in the Human Connectome Project 7T Retinotopy Dataset30. Stimuli consisted of slowly moving apertures filled with a dynamic colorful texture (Fig. 2a). Apertures and textures were updated at a rate of 15 Hz. Two run types were used. The first, termed ‘multibar’, involves bars sweeping in multiple directions (same as RETBAR in the Human Connectome Project 7T Retinotopy Dataset). The second, termed ‘wedgering’, involves a combination of rotating wedges and expanding and contracting rings. Both run types included blank periods.

For consistency with the NSD experiment, stimuli were resized to fill a circular region with diameter 8.4°. Each run lasted 300 s (exact empirical timings were highly accurate and ranged between 299.95 s and 300.00 s). Throughout stimulus presentation, a small semi-transparent dot (0.2° × 0.2°) was present at the center of the stimuli. The color of the central dot switched randomly to one of three colors (black, white or red) every 1–5 s. Participants were instructed to maintain fixation on the dot and to press a button whenever the color of the dot changed. To further aid fixation, a semi-transparent fixation grid was superimposed on the stimuli and was present throughout the experiment65. A total of six runs (three multibar and three wedgering) were collected in the first 7T fMRI session (prffloc).

fLoc experiment. This experiment was developed by the Grill-Spector laboratory31 (stimuli and presentation code available at http://vpnl.stanford.edu/fLoc/). The experiment consisted of the presentation of grayscale images depicting different stimulus categories (Fig. 2a). There were ten categories, grouped into five stimulus domains: characters (word and number), bodies (body and limb), faces (adult and child), places (corridor and house) and objects (car and instrument). Stimuli were presented on a scrambled background (different backgrounds for different stimuli). Stimuli were presented in 4-s trials. In a trial, eight images from a given category were sequentially presented (image duration, 0.5 s). Each run included six presentations of each of the ten categories as well as blank trials (also of 4-s duration).

For consistency with the NSD experiment, stimuli were resized to fill a square region spanning 8.4° × 8.4° of visual extent. Each run lasted 300 s (exact empirical timings were highly accurate and ranged between 300.000 s and 300.002 s). Throughout stimulus presentation, a small red fixation dot was present at the center of the stimuli. Participants were instructed to maintain fixation on the dot and to press a button whenever they noticed an image in which only the background was present (‘oddball’ task). In total, six runs were collected in the first 7T fMRI session (prffloc).

Resting-state experiment. Stimuli consisted of a white fixation cross (0.5° × 0.5°) on a gray background (Fig. 2a). Each resting-state run lasted 300 s. In the second resting-state run held within a given scan session, the fixation cross turned red after 12 s had elapsed and remained red for 4 s before returning to white.

Resting-state data were acquired in several NSD core scan sessions: nsd21–nsd38 for subj01 and subj05 and nsd21–nsd30 for all other participants. Thus, a total of 100 min or 180 min of resting-state data were acquired for each participant. In each session, one resting-state run was acquired at the beginning of the session (before the NSD runs), and another resting-state run was acquired at the end of the session (after the NSD runs).

In the first resting-state run, participants were instructed to stay awake and fixate the cross but otherwise rest. In the second resting-state run, participants were additionally instructed to inhale deeply when the fixation cross turned red. This instructed breath was designed to aid analysis of the physiological data collected concomitantly with the resting-state data. Before each resting-state run, participants were asked to report their current sleepiness level using the Stanford Sleepiness Scale66 (1–7, where 1 is most active and 7 is most sleepy). After each resting-state run, participants were asked to report their sleepiness level during the run that had just completed.

After the last scan session involving resting-state data, participants filled out a post-resting-state questionnaire. This questionnaire queried what the participants were doing during the resting-state runs and whether they thought about the images from the NSD experiment.

Synthetic stimuli experiment (nsdsynthetic). After completion of the NSD experiment, we conducted an additional 7T fMRI scan session in which responses were measured to a variety of carefully controlled synthetic (non-naturalistic) stimuli while the participant performed either a fixation task or a one-back task. These data will be described and released in a forthcoming manuscript.
Visual imagery experiment (nsdimagery). After completion of the nsdsynthetic experiment, we conducted an additional 7T fMRI scan session in which responses were measured while participants engaged in visual imagery and other cognitive tasks. These data will be described and released in a forthcoming manuscript.

Additional behavioral measures (nsdpostbehavior, nsdmemory and nsdmeadows). Several behavioral assessments were conducted after completion of the NSD experiment. Some of these were relatively brief and included the following (nsdpostbehavior): open-ended questions regarding language ability; the Vividness of Visual Imagery Questionnaire67; the Test of Word Reading Efficiency68, including both Sight Word Efficiency and Phonemic Decoding Efficiency; the Cambridge Memory Test for Faces69; ultra-fast measurement of contrast sensitivity70; and an assessment of chromatic sensitivity (participants adjusted intensities of red, green and blue channels on the BOLDscreen display until minimal luminance flicker was perceived).

We also conducted a final memory test in which we collected various memory-related measures regarding the images shown to the participants during the NSD experiment (nsdmemory). These data will be described and released in a forthcoming manuscript.

Finally, using the web-based Meadows platform (http://meadows-research.com), we conducted an assessment of how the NSD participants perceive and interpret the NSD images (nsdmeadows). First, we selected a small set of images that maximally span semantic space. This was done by isolating the shared515 images; computing shifted inverse frequency sentence embeddings for the sentence captions provided by the COCO dataset71; and using a greedy approach to determine the subset of 100 images that maximize the average distance between each image’s embedding and its closest neighbor. We then asked participants to perform a Multiple Arrangements Task72 in which they arranged the 100 images, using a drag-and-drop interface, within a white circular arena according to the similarity of their content. Using an adaptive procedure, subsequent arrangements were conducted using subsets of the images to maximize information gain. This was done until 45 min had elapsed. Using a similar interface on Meadows, participants then provided valence and arousal ratings for the 100 images as well as three additional images pulled from the shared515 images. Ratings were performed separately for valence and arousal and were accomplished by freely arranging the images (delivered in small batches), using a drag-and-drop interface, along a one-dimensional axis ranging from low to high. This assessment took about 15 min.
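A compact sketch of the greedy selection step (our simplified variant, using farthest-point selection; the caption-embedding computation itself follows ref. 71 and is not shown):

    import numpy as np

    def greedy_spanning_subset(embeddings, k=100):
        # Greedily grow a subset whose members are far from their nearest chosen
        # neighbor (farthest-point selection over caption embeddings).
        d = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
        chosen = [int(np.argmax(d.sum(axis=1)))]        # seed with the most isolated item
        while len(chosen) < k:
            nearest = d[:, chosen].min(axis=1)          # distance to the closest chosen item
            nearest[chosen] = -np.inf                   # never re-pick an item
            chosen.append(int(np.argmax(nearest)))
        return chosen

    subset = greedy_spanning_subset(np.random.randn(515, 300), k=100)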
Overview of data analysis. We designed custom analysis strategies to maximize the quality of derived measures from the NSD data. Several methods are based on recent work32,73 where further details can be found. Data analysis and visualization were performed using custom code in MATLAB and Python as well as tools from various packages, such as FreeSurfer, SPM, FSL, ANTs74 and ITK-SNAP75. An archive of code used is provided online (https://github.com/cvnlab/nsddatapaper/), and specific code files are referenced in the text below.

A comprehensive schematic outlining the data analysis performed in this paper is provided in Extended Data Fig. 3. The analysis of the NSD data can be divided into three components: (1) pre-processing of the anatomical, diffusion and functional data; (2) time series analysis of the fMRI data to estimate trial-wise betas; and (3) further analyses of the trial-wise betas to answer specific scientific questions. The first two components produce the so-called ‘prepared data’ that are generally useful to the community, whereas the third component refers to analyses performed for the purposes of this paper (estimation of pRFs from the NSD data, univariate memory analysis, representational similarity analysis and brain-optimized neural network training). Data collection and analysis were not performed blinded to the conditions of the experiments. No data were excluded from analyses, with the exception of a few T1 volumes (2 of 52 volumes = 4%) and certain portions of the eye-tracking data that were corrupted by noise (11 of 160 eye-tracking runs = 7%).

The pre-processing approach that we designed for the NSD dataset prioritizes accuracy and preservation of information (for example, avoiding spatial smoothing). We avoid ‘baking in’ unnecessary assumptions (for example, aggressively removing signal fluctuations without careful assessment of validity), and we avoid assuming the accuracy of automated methods; care is taken to manually inspect each pre-processing step to ensure satisfactory results. Although we think our pre-processing is general and likely suitable for most downstream uses of the data, the raw data are also available for those who want to explore other pre-processing approaches, such as fmriprep76. We note several aspects of the NSD dataset that might render the dataset challenging from a pre-processing standpoint: the relatively high spatial resolution of the fMRI data (1.8 mm) places higher demands on spatial accuracy; the ultra-high field strength (7T) used for the fMRI data yields higher levels of EPI spatial distortion compared to lower field strengths; and the emphasis on many repeated scans of individuals heightens the importance of achieving consistent imaging results across scan sessions.
Pre-processing of MRI data. Details of the pre-processing of anatomical, functional and diffusion data are provided in Supplementary Notes 4 and 5. Functional data were pre-processed using one temporal resampling to correct for slice time differences and one spatial resampling to correct for head motion within and across scan sessions, EPI distortion and gradient non-linearities. Two versions of the functional data were prepared: a 1.8-mm standard-resolution preparation (temporal resolution, 1.333 s) and an upsampled 1.0-mm high-resolution preparation (temporal resolution, 1.000 s). Analyses of the pRF and fLoc experiments were used to define retinotopic and category-selective ROIs, respectively. Other ROIs were also defined, including an ‘nsdgeneral’ ROI indicating occipital regions generally responsive in the NSD experiment and a ‘corticalsulc’ ROI collection indicating major cortical sulci and gyri. Annotations for several of the corticalsulc ROIs are shown in Figs. 3f and 4b.

Data quality metrics. Several data quality metrics were calculated (export_runmetrics.m) and summarized in Figs. 1d and 2d. tSNR was computed from raw fMRI volumes (no pre-processing) by first de-trending the time series data from each voxel (quadratic polynomial fit) and then dividing the mean signal intensity by the standard deviation of signal intensity values (autoqc_fmri.m). We calculated the median tSNR across voxels within a simple brain mask (mean volume thresholded at 1/10th of the 99th percentile of values) and then computed the median across runs. Head motion was quantified by calculating frame-wise displacement77 based on motion parameter estimates (1.8-mm preparation). We calculated the mean frame-wise displacement across volumes in a run and then computed the median across runs. BOLD response was quantified by calculating the percentage of variance explained by a simple ON–OFF GLM model (1.8-mm preparation). We calculated the median variance explained across voxels within the nsdgeneral ROI and then computed the median across runs. (Additional details on the ON–OFF GLM can be found in the ‘GLMsingle algorithm’ section.) Response rate was quantified by calculating the percentage of trials for which the participant pressed a button and then computing the mean across runs. Behavioral performance was quantified by dividing trials into easy trials (trials for which the presented image had been previously presented in the same scan session), hard trials (trials for which the presented image had been previously presented but in a previous scan session) and novel trials (trials for which the presented image had never been previously presented) and then calculating, for each trial type, the percentage of trials on which the participant indicated an ‘old’ response.
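For concreteness, a sketch of the tSNR computation described above (ours; the released autoqc_fmri.m differs in implementation details):

    import numpy as np

    def tsnr_volume(data):
        # data: raw fMRI volume time series of shape (X, Y, Z, T).
        # Quadratically de-trend each voxel, then divide mean signal intensity
        # by the standard deviation of the de-trended residuals.
        T = data.shape[-1]
        basis = np.polynomial.polynomial.polyvander(np.arange(T), 2)
        ts = data.reshape(-1, T).astype(np.float64).T          # (T, n_voxels)
        coef, *_ = np.linalg.lstsq(basis, ts, rcond=None)
        resid = ts - basis @ coef
        return (ts.mean(axis=0) / resid.std(axis=0)).reshape(data.shape[:-1])

    def median_tsnr(data):
        mean_vol = data.mean(axis=-1)
        mask = mean_vol > 0.1 * np.percentile(mean_vol, 99)    # simple brain mask per the text
        return np.median(tsnr_volume(data)[mask])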
To identify EPI signal dropout regions (export_signaldropout.m), we divided the T2 volume (resampled to match the EPI data) by the mean EPI volume (1-mm preparation). The resulting volume is useful as it indicates which voxels have high signal intensity in the T2 but are corrupted by signal dropout in the EPI. We mapped the volume to the cortical surface (cubic interpolation; mean across depth), transformed the result to fsaverage and then used a data-driven threshold to mark atypically high values. Vertices marked in at least four of the eight participants are indicated in Fig. 3f. To visualize surface imperfections, we took the voxels that were marked in the 0.8-mm anatomical space (during the manual inspection of FreeSurfer surface imperfections), smoothed this binary volume with a 3D Gaussian with full width at half maximum of 2 mm, mapped the result to the cortical surface (cubic interpolation; maximum across depth) and then transformed the result to fsaverage. Vertices exceeding 0.01 in at least one of the eight participants are indicated in Fig. 3f.
Rankings from the 7T fMRI screening session. Six quality measures (pRF BOLD, fLoc BOLD, pRF behavior, fLoc behavior, raw motion and de-trended motion) were computed for each of the 14 individuals who participated in the screening session. BOLD quality was quantified as the percentage of voxels for which variance explained by modeling the fMRI time series data (either pRF model fitting or GLM model fitting) exceeded 20%. Behavior quality was quantified as described above. Motion was quantified by calculating the median voxel displacement relative to the reference volume used for motion correction, computing the median of this quantity across volumes and then computing the mean across runs. This motion quantification was performed using raw motion parameter estimates (thereby providing a measure of global head displacement over the course of the session) as well as using motion parameter estimates that are linearly de-trended within each run (thereby providing a measure of within-run head instability). Each of the six measures was linearly scaled to span the range 1–5, where 1 corresponds to the worst performance and 5 corresponds to the best performance observed across participants. Finally, the normalized measures were averaged to produce an overall ranking for each participant, as depicted in Fig. 2c.
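A sketch of this normalization (ours; we assume measures have already been oriented so that larger is better, for example by negating the motion measures):

    import numpy as np

    def screening_scores(measures):
        # measures: (n_participants, 6) array, larger = better for every column.
        lo = measures.min(axis=0)
        hi = measures.max(axis=0)
        scaled = 1 + 4 * (measures - lo) / (hi - lo)   # linearly map each measure to 1-5
        return scaled.mean(axis=1)                     # average into an overall score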
1. Fit a simple ON–OFF GLM. In this model, stimulus trials are treated as
voxels78 has several appealing features: it is efficient and can be executed with little
instances of a single experimental condition, and a canonical HRF is used
computational cost (and, hence, can accommodate the massive scale of the NSD);
(getcanonicalhrf.m). Thus, there is a single ‘ON–OFF’ predictor that attempts
and it invariably provides well-regularized HRF estimates. The second analysis
to capture signals driven by the experiment. The utility of this simple model
component is an adaptation of GLMdenoise to a single-trial GLM framework.
is to provide variance explained (R2) values that help indicate which voxels
GLMdenoise35 is a technique in which data-derived nuisance regressors are
carry experiment-driven signals.
identified and used to remove noise from—and, therefore, improve the accuracy
2. Fit a baseline single-trial GLM. In this model, each stimulus trial is modeled
of—beta estimates. The third component is an application of ridge regression79 as
separately using the canonical HRF. This model provides a useful baseline for
a method for dampening the noise inflation caused by correlated single-trial GLM
comparison.
predictors. To determine the optimal level of regularization for each voxel, we
3. Identify HRF for each voxel. We fit the data multiple times with a single-trial
make use of a recently developed efficient re-parameterization of ridge regression
GLM, each time using a different HRF from the library of HRFs. For each
called ‘fractional ridge regression’36.
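The core of this first step (pooling FIR time courses, extracting the top PCs via SVD and projecting onto the unit sphere) can be sketched as follows; the array shapes, placeholder data and variable names are our own illustrative assumptions, not the actual hrf_derivecanonicalpcs.m implementation.

```python
import numpy as np

# Illustrative: fir_timecourses is (voxels x timepoints), containing
# FIR-estimated responses (0-30 s) pooled across participants after
# selecting R2 > 10% voxels and resampling 20,000 voxels per participant.
rng = np.random.default_rng(0)
fir_timecourses = rng.standard_normal((160000, 31))  # placeholder data

# Singular value decomposition; rows of vt are the principal components.
u, s, vt = np.linalg.svd(fir_timecourses, full_matrices=False)
top3_pcs = vt[:3]  # (3 x timepoints) basis used for re-fitting

# Project each voxel's time course onto the PC basis and normalize to
# unit length (the 'unit sphere' representation used for the 2D histograms).
loadings = fir_timecourses @ top3_pcs.T
loadings /= np.linalg.norm(loadings, axis=1, keepdims=True)
```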
Cross-validation framework for single-trial GLM. The GLMdenoise and ridge regression analysis components of GLMsingle both require tuning of hyperparameters. To determine the optimal setting of hyperparameters, we use a cross-validation approach in which out-of-sample predictions are made for single-trial beta estimates, as opposed to time series data. This simplifies and reduces the computational requirements of the cross-validation procedure. Note that, because of cross-validation, although GLMsingle produces estimates of responses to single trials, it does require the existence of and information regarding repeated trials, that is, trials for which the stimulus is the same.
The first step of the cross-validation procedure is to analyze all of the available data using no regularization. In the case of GLMdenoise, this amounts to the inclusion of zero nuisance regressors; in the case of ridge regression, this amounts to the use of a shrinkage fraction of 1, indicating ordinary least squares regression. In both cases, the analysis produces a full set of unregularized single-trial betas (for example, in one NSD session, there are 750 single-trial betas distributed across 12 runs). The second step of the procedure is to perform a grid search over values of the hyperparameter (for example, number of nuisance regressors and shrinkage fraction). For each value, we assess how well the resulting beta estimates generalize to left-out runs. For example, in leave-one-run-out cross-validation, one run is held out as the validation run; stimuli that occur in both the training runs and the validation run are identified; and squared errors between the regularized beta estimates from the training runs and the unregularized beta estimates from the validation run are calculated. This procedure is iterated with each run serving as the validation run, and errors are summed across iterations.
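A rough sketch of the cross-validation error computation for one hyperparameter setting is shown below, assuming per-run numpy arrays of betas and stimulus labels; this illustrates the logic described above and is not the GLMsingle implementation.

```python
import numpy as np

def cv_error(reg_betas, unreg_betas, stim_ids):
    """Leave-one-run-out cross-validation over single-trial betas.
    reg_betas/unreg_betas: lists (one per run) of (trials x voxels) arrays
    holding regularized and unregularized estimates; stim_ids: list of 1D
    arrays of stimulus labels per run. Returns summed squared error."""
    n_runs = len(reg_betas)
    total = 0.0
    for val in range(n_runs):                      # validation run
        for r in (r for r in range(n_runs) if r != val):
            # stimuli shown in both the training run and the validation run
            shared = np.intersect1d(stim_ids[r], stim_ids[val])
            for s in shared:
                pred = reg_betas[r][stim_ids[r] == s]        # regularized (train)
                target = unreg_betas[val][stim_ids[val] == s]  # unregularized (val)
                total += np.sum((pred.mean(axis=0) - target.mean(axis=0)) ** 2)
    return total
```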
GLMsingle algorithm. Having described the essential aspects of the estimation framework above, we now turn to the steps in the GLMsingle algorithm. GLMsingle involves fitting several different GLM variants. Each variant includes polynomial regressors to characterize the baseline signal level: for each run, we include polynomials of degrees 0 through round(L/2), where L is the duration in minutes of the run.
1. Fit a simple ON–OFF GLM. In this model, stimulus trials are treated as instances of a single experimental condition, and a canonical HRF is used (getcanonicalhrf.m). Thus, there is a single 'ON–OFF' predictor that attempts to capture signals driven by the experiment. The utility of this simple model is to provide variance explained (R2) values that help indicate which voxels carry experiment-driven signals.
2. Fit a baseline single-trial GLM. In this model, each stimulus trial is modeled separately using the canonical HRF. This model provides a useful baseline for comparison.
3. Identify the HRF for each voxel. We fit the data multiple times with a single-trial GLM, each time using a different HRF from the library of HRFs. For each voxel, we identify which HRF provides the best fit to the data (highest variance explained) and inherit the single-trial betas associated with that HRF. Note that the final model for each voxel involves a single chosen HRF from the library (not a weighted sum of HRFs); a sketch of this selection step appears after this list.
4. Use GLMdenoise to determine nuisance regressors to include in the model. We define a pool of noise voxels (brain voxels that have low ON–OFF R2) and then perform principal component (PC) analysis on the time series data associated with these voxels. The top PCs are added one at a time to the GLM until cross-validation performance is maximized on average across voxels.
5. Use fractional ridge regression to regularize single-trial betas. With the nuisance regressors determined, we use fractional ridge regression (fracridge36) to estimate the single-trial betas, systematically evaluating different shrinkage fractions. For each voxel, in the context of a GLM that incorporates the specific HRF chosen for that voxel, cross-validation is used to select an optimal shrinkage fraction for that voxel. To mitigate bias on the overall scale of betas, we apply a post hoc scaling and offset on betas obtained for a given voxel to match, in a least squares sense, the unregularized betas obtained for that voxel.
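The per-voxel HRF selection in step 3 might be sketched as follows, assuming a hypothetical helper fit_glm_r2 that fits a single-trial GLM under a given HRF and returns per-voxel variance explained and betas; neither the helper nor the data layout comes from the GLMsingle codebase.

```python
import numpy as np

def choose_hrf_per_voxel(data, design, hrf_library, fit_glm_r2):
    """For each voxel, fit the single-trial GLM once per library HRF and
    keep the betas from the HRF with the highest variance explained.
    data: (time x voxels); design: trial onsets; hrf_library: list of HRFs.
    fit_glm_r2(data, design, hrf) -> (r2 per voxel, (trials x voxels) betas)."""
    n_voxels = data.shape[1]
    best_r2 = np.full(n_voxels, -np.inf)
    best_idx = np.zeros(n_voxels, dtype=int)
    best_betas = None
    for i, hrf in enumerate(hrf_library):
        r2, betas = fit_glm_r2(data, design, hrf)
        if best_betas is None:
            best_betas = betas.copy()
        improved = r2 > best_r2          # voxels better fit by this HRF
        best_r2[improved] = r2[improved]
        best_idx[improved] = i
        best_betas[:, improved] = betas[:, improved]
    return best_idx, best_betas          # chosen HRF index and inherited betas
```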
Application of GLMsingle to the NSD data. We used GLMsingle to analyze the time series data independently for each NSD scan session (glm_nsd.m). Major algorithmic parameters included the following: we evaluated up to ten nuisance regressors; we evaluated shrinkage fractions from 0.05 to 0.90 in increments of 0.05 and from 0.91 to 1 in increments of 0.01 (representing a finer grain for voxels with the best SNR); we performed six-fold cross-validation (consecutive pairs of runs) for Steps 4 and 5; and we used an ON–OFF R2 threshold of 5% in Step 4.
Three different versions of the single-trial betas were computed and saved. The first beta version (b1, 'betas_assumehrf') is the result of Step 2 and reflects the use of a canonical HRF. The second beta version (b2, 'betas_fithrf') is the result of Step 3 and reflects the result of voxel-wise HRF estimation. The third beta version (b3, 'betas_fithrf_GLMdenoise_RR') is the result of Step 5 and reflects the additional GLMdenoise and ridge regression procedures. Betas were converted to units of percent BOLD signal change by dividing amplitudes by the mean signal intensity observed at each voxel and multiplying by 100. Although we provide betas in units of percent signal change, we suggest that users might want to z-score the responses of each voxel within each scan session to eliminate potential non-stationarities and to equalize units across voxels.
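The suggested per-session z-scoring is straightforward in practice; a minimal sketch (assuming betas arranged as trials x voxels with a session label per trial) follows.

```python
import numpy as np

def zscore_by_session(betas, session):
    """z-score each voxel's betas within each scan session.
    betas: (trials x voxels); session: (trials,) session label per trial."""
    out = np.empty_like(betas, dtype=float)
    for s in np.unique(session):
        m = session == s
        mu = betas[m].mean(axis=0)
        sd = betas[m].std(axis=0, ddof=1)
        out[m] = (betas[m] - mu) / sd
    return out
```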
For user convenience, we created preparations of the single-trial betas in additional spaces other than the native 1.8-mm and 1.0-mm functional spaces. For the 'nativesurface' preparation, we performed cubic interpolation of the 1.0-mm betas onto each of the three cortical surface depths and averaged across depths (analysis_transformfsaverage.m). The result was then mapped using nearest neighbor interpolation to fsaverage space to create the 'fsaverage' preparation. For the 'MNI' preparation, we mapped the 1.0-mm betas to MNI space using cubic interpolation (analysis_transformMNI.m).

GLM analysis of the resting-state experiment. As an optional resource, we fit the time series data from the resting-state experiment using methods that parallel those used for the NSD experiment (glm_nsdresting.m). For each scan session involving resting-state, we took the two resting-state runs (first and last run acquired) and analyzed the data using the design matrix of the neighboring NSD runs and the same voxel-wise HRFs determined from analyzing the NSD runs in that scan session (this is analogous to beta version b2). Although there is no reason to think that spontaneous resting-state activity conforms to the 4-s trial structure of the NSD experiment, these resting-state betas might be useful as a direct comparison for the NSD betas.
Noise ceiling estimation. To obtain a measure of data quality, noise ceilings were estimated for the NSD betas (export_noiseceiling.m). The noise ceiling for a given voxel is defined as the maximum percentage of variance in the voxel's responses that can, in theory, be explained, given the presence of measurement noise. Our method for estimating the noise ceiling follows the general framework laid out in previous studies80,81. Several assumptions are made: (1) the signal contained in the voxel's response is determined solely by the presented image; (2) the variability of the signal across different images is Gaussian distributed; (3) the noise is Gaussian distributed with zero mean; and (4) the response to an image is equal to the signal plus noise. Given these assumptions, any observed response is a sample from a sum of Gaussian distributions:

RESP ∼ N(μ_signal, σ_signal) + N(0, σ_noise)

where RESP indicates the NSD beta observed on a given trial, μ_signal is the mean signal across different images, σ_signal is the standard deviation of the signal across different images and σ_noise is the standard deviation of the noise (for illustration of these concepts, see Extended Data Fig. 8c). Note that the first Gaussian distribution characterizes true signal variability, whereas the second Gaussian characterizes variability due to noise. Also, note that this framework treats response variability unrelated to the stimulus as 'noise', but such variability might, in fact, reflect 'signal' from the perspective of functional connectivity82.
To compute the noise ceiling, we first take the trial-wise NSD betas for each voxel and z-score these betas within each scan session. This simple normalization compensates for non-stationarities that might exist across sessions. We then calculate the variance of the betas across the three presentations of each image (using the unbiased estimator that normalizes by n − 1, where n is the sample size), average this variance across images and then compute the square root of the result. This produces an estimate of the noise standard deviation:

σ̂_noise = √(mean(β²_σ))

where β²_σ indicates the variance across the betas obtained for a given image. Next, given that the variance of the z-scored betas is 1, we estimate the signal standard deviation as follows:

σ̂_signal = √(|1 − σ̂²_noise|₊)

where |·|₊ indicates positive half-wave rectification. Finally, we simplify by calculating a single scalar quantity:

ncsnr = σ̂_signal / σ̂_noise

where ncsnr indicates the noise ceiling SNR.
Given the framework described above, the noise ceiling can be calculated as the amount of variance contributed by the signal expressed as a percentage of the total amount of variance in the data:

NC = 100 × σ²_signal / (σ²_signal + σ²_noise)

where NC indicates the noise ceiling. We would like to be able to calculate the noise ceiling based on the single scalar ncsnr. Moreover, because a researcher might want to average across multiple presentations of each image before attempting to explain the NSD betas, we would like a method for flexibly expressing the noise ceiling for different levels of trial averaging. With some algebra, it can be shown that the noise ceiling can be expressed as follows:

NC = 100 × ncsnr² / (ncsnr² + 1/n)

where n indicates the number of trials that are averaged together (see the NSD Data Manual for the derivation and additional details). We note that there is a direct relationship between the commonly used metric of split-half reliability and the noise ceiling: if a voxel has two sets of responses that reflect the same image presentations, then the correlation between the two sets of responses multiplied by 100 is equal to the noise ceiling for single-trial responses expressed in percent variance explained.
Using the above methods, we calculated noise ceilings for each of the beta versions and for each of various spatial preparations (1.8-mm, 1-mm, fsaverage and nativesurface). For simplicity, noise ceiling estimates were calculated using betas associated with images with all three presentations available. To assess stability, we also computed split-half noise ceiling estimates. This was achieved by splitting the available images into two mutually exclusive groups and computing noise ceiling estimates independently for each group. The noise ceiling results shown in Fig. 3f,g and Supplementary Fig. 6 were computed assuming n = 3, reflecting the scenario in which trial-wise betas are averaged across three trials for each image. The noise ceiling results shown in Fig. 6a,b were computed assuming n = 1 and are expressed in correlation units (square root of percent variance explained).
We include a few important notes as follows. Even though the NSD consists of only up to three trials for a given image, the estimate of response variability for each voxel (that is, the noise standard deviation) is averaged across a very large number of images, thus stabilizing the noise ceiling estimate. Also, note that our noise ceiling metric refers to activity levels in individual voxels in individual participants. It is thus quite different from, for example, noise ceiling metrics computed for group average representational dissimilarity matrices83. The latter are more abstracted away from the data given that they summarize properties observed across a collection of voxels, reflect second-order computations on activity levels and not activity levels themselves and probe responses at the group level and not at the individual level.
Calculation of equivalent trials. To provide a common basis for comparing different datasets, we define the number of equivalent trials present in a dataset as N × ncsnr², where N indicates the number of trials conducted and ncsnr is the noise ceiling SNR (as defined above). The assumptions here are that (1) every trial has equal value, irrespective of whether it is used to measure brain responses to an image that has already been shown or a new image (for example, two trials for one image is equivalent to one trial for two distinct images); and (2) increases in SNR are equivalent to the collection of additional trials. For an illustrative example of the second assumption, suppose an experimenter chooses to improve SNR by averaging the response to a given image across p repetitions of that image. This effectively reduces the noise standard deviation by a factor of √p, and ncsnr will thus increase by a factor of √p. Alternatively, the experimenter could choose to not average and instead use the p trials as is. In the former case, the number of equivalent trials is 1 × (√p × ncsnr)² = p × ncsnr², whereas, in the latter case, the number of equivalent trials is p × ncsnr². Thus, the two cases correspond to the same number of equivalent trials.
We conducted an auxiliary analysis that directly compares the NSD against the BOLD5000 dataset22. The goal of this analysis was to calculate a summary ncsnr value for each dataset, so that the number of equivalent trials can be calculated. For fair comparison, both NSD and BOLD5000 were analyzed using the same GLM methods described in this paper (beta version b3). We then defined a common brain region on which data quality can be compared. This was done by transforming the nsdgeneral ROI to MNI space and then mapping the resulting MNI mask to each participant in the two datasets. Finally, we computed the median ncsnr observed across voxels in the mask in each participant.
The median ncsnr, averaged across participants, was 0.260 for the NSD (averaged across the first four NSD participants) and 0.187 for BOLD5000 (averaged across the four participants in BOLD5000). This indicates that, despite the longer time duration allocated per trial in BOLD5000 (10 s) compared to the NSD (4 s), the quality of a single-trial beta in the NSD is higher than that in BOLD5000. Specifically, one NSD trial is approximately equivalent to (0.260)²/(0.187)² = 1.93 BOLD5000 trials. This increase in quality is likely due, in part, to the screening of participants and the ultra-high magnetic field strength (7T) used in the NSD. Note that the ncsnr metric quantifies the SNR per trial and is expected to be unbiased with respect to the number of repeated trials used to calculate it. Thus, although the exact number of trials per image is different in the NSD and BOLD5000 datasets, the ncsnr values can still be directly compared.
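The equivalent-trials arithmetic can be verified directly from the reported ncsnr values; the following short sketch reproduces the NSD versus BOLD5000 comparison.

```python
def equivalent_trials(n_trials, ncsnr):
    """Equivalent trials = N x ncsnr^2 (see definition above)."""
    return n_trials * ncsnr**2

# Reported median ncsnr values (averaged across participants)
ncsnr_nsd, ncsnr_b5000 = 0.260, 0.187

# One NSD trial expressed in BOLD5000 trials:
ratio = equivalent_trials(1, ncsnr_nsd) / equivalent_trials(1, ncsnr_b5000)
print(round(ratio, 2))  # ~1.93
```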
Univariate analysis of memory recognition. For this analysis (results shown in Fig. 4b), we used version 3 of the NSD betas (b3) in the fsaverage preparation. Betas for each surface vertex were kept in percent signal change units. Using the behavioral responses, we identified trials involving hits (participants responded 'old' to a previously presented image) and trials involving correct rejections (participants responded 'new' to a novel image). Then, for each participant, we calculated two-sample t-values at each surface vertex. This was done both for trials pooled within individual NSD scan sessions as well as for trials pooled across all sessions.
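As a sketch of this contrast, the per-vertex two-sample t-values could be computed as follows; the data layout (trials x vertices with boolean trial masks) is an assumption for illustration.

```python
from scipy import stats

def memory_tvalues(betas, hit, cr):
    """Two-sample t-values (hits vs. correct rejections) per vertex.
    betas: (trials x vertices); hit/cr: boolean masks over trials."""
    tvals, _ = stats.ttest_ind(betas[hit], betas[cr], axis=0)
    return tvals
```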
Representational similarity analysis. For this analysis (results shown in Fig. 5), we used version 3 of the NSD betas (b3) in the fsaverage preparation. Betas for each surface vertex were z-scored within each scan session, concatenated across sessions and averaged across repeated trials for each distinct image. To support the representational similarity analysis84, we defined a set of ROIs (V1, V2, V3, pVTC and aVTC) on the fsaverage surface. This was done by mapping the manually defined V1, V2 and V3 from each participant to fsaverage, averaging across participants and using the result to guide the definition of group-level ROIs. We also defined a posterior and anterior division of ventral temporal cortex (pVTC and aVTC, respectively) based on anatomical criteria. For each participant, we extracted betas for vertices within each ROI (concatenating across hemispheres). We then computed Pearson's correlation between beta patterns across all possible pairs of images. This yielded RDMs with rows and columns indexing distinct images (for example, the RDMs for participant 1 have dimensionality 10,000 × 10,000, with correlations corresponding to 49,995,000 possible pairs).
To help visualize and interpret these large dissimilarity matrices, we performed t-SNE embedding41,85 using a perplexity level of 100 (Fig. 5b,c). This projects the high-dimensional representations onto a 2D plane such that the distance of a given pair on the plane reflects that pair's distance in the high-dimensional representation as accurately as possible. To verify the strong categorical structure visible in pVTC and aVTC (Fig. 5b), we quantified the similarity of the brain RDMs to a model RDM constructed from the category labels in the COCO dataset. Specifically, we constructed an RDM from a binary matrix indicating the presence or absence of each of the 80 COCO categories (cosine distance metric) and correlated this model RDM with each brain RDM. This process was performed for mutually exclusive groups of 100 images drawn from all images presented three times to a given participant (the number of groups was 100, 100, 62, 54, 100, 62, 100 and 54 for the eight participants, respectively). We calculated the mean and standard error across results obtained for different groups of images (Fig. 5d). Finally, we investigated similarity of brain representations across ROIs and participants. This was done by isolating the shared515 images, constructing brain RDMs for these images and correlating brain RDMs across ROIs and participants. The resulting second-order RDM is shown in Fig. 5e, with further quantification of this matrix shown in Fig. 5f.
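To make the RDM construction and model comparison concrete, here is a minimal sketch assuming trial-averaged beta patterns (images x vertices) and a binary COCO category matrix (images x 80); note that the sketch stores correlation distance (1 − r), one common RDM convention, and is not the authors' code.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import pearsonr

def brain_rdm(patterns):
    """1 - Pearson correlation between all pairs of image patterns.
    patterns: (images x vertices)."""
    return squareform(pdist(patterns, metric='correlation'))

def category_model_rdm(category_matrix):
    """Cosine-distance RDM from a binary (images x categories) matrix."""
    return squareform(pdist(category_matrix, metric='cosine'))

def rdm_similarity(rdm_a, rdm_b):
    """Correlate the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return pearsonr(rdm_a[iu], rdm_b[iu])[0]
```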
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The NSD dataset is freely available at http://naturalscenesdataset.org. The data are hosted in the cloud, allowing researchers to exploit high-performance cloud computing to efficiently analyze the dataset. We provide both raw data in BIDS format86 and prepared data files, along with extensive technical documentation in the NSD Data Manual. To ensure strict validation for an upcoming Algonauts prediction challenge87, the initial public release will withhold the last three NSD scan sessions from each participant (approximately 8.4% of the NSD data). Images used for the NSD were taken from the Common Objects in Context database14 (https://cocodataset.org).

Code availability
We provide an archive of code used in this study (https://github.com/cvnlab/nsddatapaper/) as well as utility functions for working with the prepared NSD data (https://github.com/cvnlab/nsdcode/). Custom algorithms developed for this study include GLMsingle (https://github.com/cvnlab/GLMsingle/) and fracridge (https://github.com/nrdg/fracridge/). Example scripts demonstrating scientific analyses of the NSD data are available (https://github.com/cvnlab/nsdexamples/); these scripts might be useful for teaching purposes.

References
59. Polimeni, J. R., Renvall, V., Zaretskaya, N. & Fischl, B. Analysis strategies for high-resolution UHF-fMRI data. Neuroimage 168, 296–320 (2018).
60. Harms, M. P. et al. Extending the Human Connectome Project across ages: imaging protocols for the Lifespan Development and Aging projects. Neuroimage 183, 972–984 (2018).
61. Power, J. D. et al. Customized head molds reduce motion during resting state fMRI scans. Neuroimage 189, 141–149 (2019).
62. Brainard, D. H. The Psychophysics Toolbox. Spat. Vis. 10, 433–436 (1997).
63. Pelli, D. G. The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spat. Vis. 10, 437–442 (1997).
64. Caesar, H., Uijlings, J. & Ferrari, V. COCO-Stuff: Thing and Stuff classes in context. In IEEE/CVF Conf. Computer Vision and Pattern Recognition 1209–1218 (2018). https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00132
65. Schira, M. M., Tyler, C. W., Breakspear, M. & Spehar, B. The foveal confluence in human visual cortex. J. Neurosci. 29, 9050–9058 (2009).
66. Shahid, A., Wilkinson, K., Marcu, S. & Shapiro, C. M. Stanford Sleepiness Scale (SSS). In STOP, THAT and One Hundred Other Sleep Scales (eds Shahid, A., Wilkinson, K., Marcu, S. & Shapiro, C. M.) 369–370 (Springer, 2012).
67. Marks, D. F. Visual imagery differences in the recall of pictures. Br. J. Psychol. 64, 17–24 (1973).
68. Torgesen, J. K., Wagner, R. & Rashotte, C. TOWRE-2: Test of Word Reading Efficiency (Pearson, 2012).
69. Duchaine, B. & Nakayama, K. The Cambridge Face Memory Test: results for neurologically intact individuals and an investigation of its validity using inverted face stimuli and prosopagnosic participants. Neuropsychologia 44, 576–585 (2006).
70. Tardif, J., Watson, M., Giaschi, D. & Gosselin, F. Measuring the contrast sensitivity function in just three clicks. J. Vis. 16, 966 (2016).
71. Arora, S., Liang, Y. & Ma, T. A simple but tough-to-beat baseline for sentence embeddings. https://openreview.net/pdf?id=SyK00v5xx (2017).
72. Kriegeskorte, N. & Mur, M. Inverse MDS: inferring dissimilarity structure from multiple item arrangements. Front. Psychol. 3, 245 (2012).
73. Kay, K., Jamison, K. W., Zhang, R.-Y. & Uğurbil, K. A temporal decomposition method for identifying venous effects in task-based fMRI. Nat. Methods 17, 1033–1039 (2020).
74. Avants, B. B. et al. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage 54, 2033–2044 (2011).
75. Yushkevich, P. A. et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage 31, 1116–1128 (2006).
76. Esteban, O. et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat. Methods 16, 111–116 (2019).
77. Power, J. D., Barnes, K. A., Snyder, A. Z., Schlaggar, B. L. & Petersen, S. E. Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. Neuroimage 59, 2142–2154 (2012).
78. Handwerker, D. A., Gonzalez-Castillo, J., D'Esposito, M. & Bandettini, P. A. The continuing challenge of understanding and modeling hemodynamic variation in fMRI. Neuroimage 62, 1017–1023 (2012).
79. Hoerl, A. E. & Kennard, R. W. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970).
80. Kay, K. N., Winawer, J., Mezer, A. & Wandell, B. Compressive spatial summation in human visual cortex. J. Neurophysiol. 110, 481–494 (2013).
81. Lage-Castellanos, A., Valente, G., Formisano, E. & De Martino, F. Methods for computing the maximum performance of computational models of fMRI responses. PLoS Comput. Biol. 15, e1006397 (2019).
82. Biswal, B., Yetkin, F. Z., Haughton, V. M. & Hyde, J. S. Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magn. Reson. Med. 34, 537–541 (1995).
83. Nili, H. et al. A toolbox for representational similarity analysis. PLoS Comput. Biol. 10, e1003553 (2014).
84. Kriegeskorte, N., Mur, M. & Bandettini, P. Representational similarity analysis – connecting the branches of systems neuroscience. Front. Syst. Neurosci. 2, 4 (2008).
85. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
86. Gorgolewski, K. J. et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci. Data 3, 1–9 (2016).
87. Cichy, R. M., Roig, G. & Oliva, A. The Algonauts Project. Nat. Mach. Intell. 1, 613 (2019).

Acknowledgements
We thank the NSD participants for their time and endurance; E. Aminoff, J. Pyles, M. Tarr, M. Hebart and C. Baker for advice on experimental design and data collection; J. Power and A. Schapiro for consultation on resting-state and physiological data; V. Carr and R. Olsen for consultation on hippocampal subfield scanning protocols; A. Grant for assistance with scanner peripherals; F. Gosselin and J. Tardif for contrast sensitivity analysis; B. Klimes-Dougan and K. Cullen for designing the valence/arousal assessment; W. Guo for segmentations of the medial temporal lobe; M. Arcaro, A. Bratch, D. Finzi, A. White and J. Winawer for assistance with ROI definition; C. Gorgolewski and R. Poldrack for discussion of BIDS and data sharing; R. Cichy, E. Yacoub, K. Grill-Spector, K. Jamison, A. Rokem, A. Huth, S. Anzellotti, N. Kriegeskorte and J. Winawer for general discussions; and K. Ugurbil for overall project advice. We also thank our NSD collaborators for shaping the trajectory of the project. This work was supported by NSF CRCNS grants IIS-1822683 (K.K.) and IIS-1822929 (T.N.); NIH grants P41 EB015894, P30 NS076408, S10 RR026783 and S10 OD017974-01, the W. M. Keck Foundation and the NIMH Intramural Research Program ZIAMH002909 (M.N.); and NSF BCS-1734853, NIH NIBIB R01EB030896, NIH NIBIB R01EB029272 and NIH IIS-1912270 (F.P.).

Author contributions
E.J.A. collected the neuroimaging data, coordinated the data collection effort and performed manual brain segmentations. G.S.-Y. performed neural network analyses. Y.W. performed participant recruitment, assisted with scanning and prepared eye-tracking videos. J.L.B. assisted in data analysis. J.S.P. performed the equivalent trials analysis on the NSD and BOLD5000. L.T.D. organized and prepared data in BIDS format. M.N. analyzed the eye-tracking data. B.C. and F.P. analyzed the diffusion data. I.C. performed representational similarity analyses. J.B.H. analyzed the behavioral data. K.K. and T.N. conceived of the project and designed the main experiment. J.B.H. and I.C. designed the nsdmeadows and nsdmemory behavioral assessments. K.K. developed analysis methods, analyzed the neuroimaging data and directed the overall project. K.K., T.N., E.J.A., M.N., B.C., F.P., I.C. and J.B.H. wrote the paper. All authors discussed and edited the manuscript.

Competing interests
The authors declare no competing financial interests.
Additional information
Extended data is available for this paper at https://doi.org/10.1038/s41593-021-00962-x.
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41593-021-00962-x.
Correspondence and requests for materials should be addressed to Kendrick Kay.
Peer review information Nature Neuroscience thanks Evan Gordon, Andrew Zalesky, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at www.nature.com/reprints.
Extended Data Fig. 1 | Design of the NSD experiment. a, Image presentations. Each of 10,000 distinct images was placed 3 times on a circle according
to a probability distribution created by mixing a relatively narrow von Mises distribution and a uniform distribution. The resulting image sequence was
divided into 40 equally-sized segments for the 40 NSD scan sessions. b, Basic statistics of image repetitions. We define a 'novel trial' as a trial involving an image never shown before, an 'old trial' as a trial that is not a novel trial, and an 'easy trial' as an old trial for which the presented image had been shown previously in the same scan session.
Extended Data Fig. 2 | Overview of data collection. This table summarizes the overall NSD data collection effort. Structural and diffusion MRI data were
collected at 3T. Functional MRI data were collected at 7T. The breakdown of the 7T fMRI scan sessions is indicated: for example, subject 2 participated in
1 (prffloc) + 40 (nsd01–nsd40) + 1 (nsdsynthetic) + 1 (nsdimagery) = 43 7T fMRI scan sessions. Additional behavioral data were acquired outside of the
scanner (nsdpostbehavior, nsdmemory, nsdmeadows). Note that scan sessions were occasionally split across multiple magnet entries (see aquamarine
and yellow cells). For simplicity, we treat these cases as if they represent single scan sessions.
Extended Data Fig. 3 | Overview of data analysis. Analyses conducted in this paper can be divided into three parts. Part 1 consists of pre-processing,
in which raw functional, anatomical, diffusion, and eyetracking data are transformed into various useful intermediate outcomes. In addition, coordinate
transformations between various spaces are estimated and incorporated into the nsd_mapdata utility. Part 2 consists of analyses of the pre-processed
fMRI data. The GLMsingle algorithm introduced in this paper is used to analyze the fMRI data from the NSD experiment (Part 2a), and standard methods
are used to analyze the fMRI data from the pRF and fLoc experiments (Part 2b). Part 3 consists of specific scientific analyses demonstrated in this paper
that make use of the data prepared in Parts 1 and 2. Given the extensive data preparation procedures (Parts 1–2), it is useful to comment on which aspects
are fairly typical in MRI processing and which are more customized or unique to the present work. With respect to the pre-processing steps in Part 1,
the general outcomes that these steps achieve are typical in MRI and are necessary for basic interpretation of the data. For example, small shifts in head
position over the course of a scan session necessitate some motion compensation in order to interpret the signal from a given voxel in terms of a single
brain location. The specific methods by which we execute these pre-processing steps may differ from what is performed in commonly used software
packages (for example, SPM, FSL, AFNI). However, the outcomes are similar at a conceptual level: for example, the fMRI data are pre-processed using
temporal interpolation of voxel-wise time-series data and spatial interpolation of brain volumes. With respect to the additional preparation procedures
in Part 2, the procedures in Part 2b are fairly typical analyses used to functionally localize brain regions. More customized and unique to the present
work are the procedures in Part 2a, which are designed to improve the accuracy of single-trial fMRI amplitude estimates. We provide evidence that these
procedures do in fact perform as intended (see Fig. 3 and Extended Data Fig. 8).
Extended Data Fig. 4 | Eyetracking results. a, Pre-processing of eyetracking data. Blinks and tracking noise were removed, followed by linear detrending,
median-centering, downsampling, and smoothing. Runs with less than 1/3 valid samples after these cleaning procedures were excluded from further
analysis (see Supplementary Note 5). Shown are results for an example run (subject 1, nsd31 scan session, run 6). Pre-processing reduced noise without
obscuring potential eye movements. b, Fraction of time during which deviation from central fixation was less than a specific threshold. Results are
shown for a range of thresholds (left) and for a threshold of 1° (right). c, 2D histograms of gaze positions. The main images show histogram results on a
linear scale; the inset images show results on a log scale. To summarize the results, we overlay a gray ellipse marking the central 90% of a multivariate
2D Gaussian distribution that has been fit to the gaze positions, as well as a blue circle containing 90% of the gaze positions. Both the parametric and
non-parametric approaches yield similar results and indicate that gaze positions of all subjects clustered around central fixation. The level of precision
varied across subjects. The number of usable eyetracking runs for each subject is indicated by the white text. d, Example of accurate fixation behavior
(subject 1, nsd31 scan session, run 8). Shown are pre-processed vertical gaze coordinates (top left), normalized pupil area (bottom left), and a 2D scatter
plot of gaze positions (right). e, Example of eye movements (subject 5, nsd29 scan session, run 11). Same format as d. Notice that eye movements
manifest as staircase structure in the vertical gaze coordinates and as dispersed gaze positions in the scatter plot. f, Trial-wise time-resolved analysis.
Relative to stimulus trial onsets, we plot the across-trial median deviation from central fixation (top), as well as the across-trial median pupil size after
mean-centering the pupil size within each trial (bottom). Results for subjects 3 and 8 are not available for this analysis. Overall, the results show that
subjects were able to maintain fixation most of the time: gaze positions were within 1° of central fixation 68–97% of the time (see b). Three subjects
are worth further discussion. Subject 4 exhibited eye movements after stimulus onset (see f, top); however, this is of minor concern given that these
movements were small. Subject 5 exhibited more substantial eye movements (see c, e, and f); we suggest exclusion of this subject from analyses of
the NSD fMRI data that are contingent on strict central fixation. Finally, while our results indicate fixation instability for subject 8 (see b and c), careful
inspection of the eyetracking video recordings (available online) suggests this reflects pupil tracking noise rather than actual eye movements made by
the subject.
Extended Data Fig. 5 | Improvements in spatial detail through upsampling. a, Comparison of approaches. For an example coronal slice in Subject 1, we
compare the non-upsampled 1.8-mm preparation of the data (left), the upsampled 1-mm preparation of the data (right), and a version of the 1.8-mm
results that has been post-hoc upsampled to 1-mm resolution to enable direct comparison (middle). Two quantities are shown: mean signal intensity and
variance explained by an ON-OFF GLM model. b, Zoomed view of white rectangle marked in a. c, Profile view of blue dotted horizontal line marked in
b. Error bars in the bottom plot indicate ± 1 SEM across 40 scan sessions (error bars are small and nearly invisible). d, Timecourse estimates for voxels
marked by orange arrowheads at the bottom of c. Each colored trace corresponds to an estimate of the hemodynamic timecourse for a single voxel in one
NSD scan session from the upsampled 1-mm data preparation. The beginning of the timecourses (first vertical line) corresponds to the onset of the 3-s
image presentation. The results shown in this figure support the idea that the upsampled data preparation preserves fine-scale spatial detail that is lost
(blurred away) under a non-upsampled data preparation. While the effects are small, preserving as much detail as possible may be critical for certain
neuroscientific questions.
Extended Data Fig. 6 | Reliable diffusion derivatives facilitate investigation of white-matter connectivity. a, Fractional anisotropy (FA). The left shows
tractography and FA results for the optic radiation identified in subject 7. The right shows reliability of FA results for 61 white-matter tracts identified using
the atlas from Bullock et al.114. For other measures, see Supplementary Fig. 5c–e. b, Structural connectivity. Using 43 visual areas × 2 hemispheres = 86
regions from the HCP-MMP1 atlas109 (left), we construct group-average connectivity matrices indicating the density of fibers connecting pairs of regions
(right). c, Quantitative summary. Each dot represents fiber density between a pair of regions (as in b). Dot colors reflect different region pairs but are
otherwise arbitrary. Group-average results (main figure) and results for an individual subject (inset) are shown.
Extended Data Fig. 7 | Regions of interest (ROIs) provided with NSD. A variety of ROIs were defined based on auxiliary fMRI experiments (pRF, fLoc).
In a–c, we show example results for subject 3, right hemisphere. a, Early visual areas. Results are shown on FreeSurfer’s sphere surface as well as in the
0.8-mm anatomical volume space. b, Eccentricity-based regions. Similar format to a. Note that the total stimulus extent is 8.4° × 8.4° in the pRF, fLoc,
and NSD experiments. c, Face-selective regions. Regions were defined based on t-values computed for the contrast of faces against all other categories.
Results are shown on FreeSurfer’s inflated surface as well as in the 0.8-mm anatomical space. d, Probabilistic maps of ROI locations. For each of three
example ROIs, we map the location of the ROI in each subject to fsaverage and then compute, for each fsaverage vertex, the fraction of subjects labeled at
that vertex. Notice there is reasonable consistency across subjects in fsaverage space.
Extended Data Fig. 8 | Detailed visualization of NSD betas. We prepared three beta versions (b1, b2, b3) reflecting GLM analyses of increasing
sophistication. a, Inspection of NSD betas. The full set of estimated single-trial responses (1.8-mm preparation, beta version b1) is shown for voxels in
subject 1 right hemisphere region of interest (ROI) FFA-1 (fusiform face area subdivision 1). We observe horizontal stripes, indicative of gross variation in
percent BOLD signal change across voxels. b, Zoomed view of one scan session. Shown are all three beta versions, as well as the result of z-scoring betas
within each scan session (in general, we suggest that users may wish to z-score each voxel’s responses within each scan session in order to eliminate
potential non-stationarities and to equalize units across voxels). The different beta versions generally resemble one another (left column), implying that
the variations in GLM methods do not drastically change the data. Vertical stripes visible in the visualizations tend to decrease from b1 to b2, suggesting
that fitting voxel-wise HRFs reduces artifacts. Vertical stripes also tend to decrease from b2 to b3, which might reflect the reduction of correlated noise
achieved by GLMdenoise. c, Detailed inspection of one voxel. To assess the reliability of evoked responses, we group trials according to the image
presented. The estimated signal standard deviation (σ_signal) and noise standard deviation (σ_noise) are illustrated at the right of each subplot. Notice that
b2 and b3 reduce variability of betas across the 3 trials associated with each image. d, Response reliability. Here we plot single-trial responses observed
in two example ROIs (1.8-mm preparation, beta version b2, right hemisphere FFA-1 and PPA (parahippocampal place area), response averaged across
voxels in each ROI), showing the first 50 of the shared515 images. The left column shows responses for different trials in subject 1; the right column shows
trial-averaged responses in different subjects. Lines connecting consecutive images are used to aid visualization but do not indicate specific temporal
relationships between images. Thick black lines indicate the mean across trials (left) or subjects (right). Notice that reliability is reasonably high both
within and across subjects. e, Quantitative summary. To summarize results shown in d, we plot the correlation between responses to the shared515 images
across all trials and all subjects. Thin white horizontal and vertical lines separate different subjects (each having 3 trials). Notice there is high reliability
within each ROI, and responses are highly dissimilar across ROIs. The strong off-diagonal elements (white arrows) indicate the presence of spatial noise
correlations that occur on individual trials, which is typical in fMRI45. Noise correlations likely reflect a combination of measurement noise (for example,
head motion) and real neural activity variability (for example, arousal effects). In some cases, correlations are larger across subjects than within subjects;
one explanation is that there is, to some degree, a common ROI representation and a noisy measurement of this representation obtained in one subject
might actually be better correlated with a less noisy measurement of this representation obtained in a different subject. Also, the results indicate the
existence of temporal ordering effects (for example, trial 1 in a given subject tends to be more correlated with trial 1 in other subjects as opposed to trials 2
or 3). This likely indicates the presence of adaptation- and/or memory-related effects in the NSD data, given that the temporal ordering of trials was fixed
across subjects.
Extended Data Fig. 9 | Angle and eccentricity estimates from the NSD data. Here we show results from the analysis of the pRF experiment and results
from an analogous analysis performed on trial-averaged NSD betas (see Supplementary Modeling Note 1 for details). Each panel shows an occipital
view of FreeSurfer’s sphere surface, and white lines indicate borders of visual areas V1–hV4 (defined based on results of the pRF experiment). Angle and
eccentricity estimates are plotted using the same colormaps as in Benson et al.30. We also plot the amount of time-series variance explained in the pRF
data (variance relative to the mean signal level) and the amount of variance explained in the NSD betas (variance relative to 0% BOLD signal change).
Clear retinotopic maps in early visual cortex are visible in the NSD results, including robust angle estimates even in foveal regions. In addition, there is
high consistency of retinotopic estimates across the pRF and NSD datasets. There is some discrepancy in absolute eccentricity estimates at peripheral
locations; this is likely due to technical differences in how modeling procedures behave for voxels near the stimulus edge.
Extended Data Fig. 10 | Design of AlexNet- and GNet-based encoding models. a, Illustration of an encoding model that predicts brain activity in a given voxel (r_t^v) in response to images (x_t). Images are passed to nonlinear feature extractors, η_l (trapezoids), that output feature maps (grey cuboids). Feature maps are grouped, passed through an element-wise nonlinearity, f(·), and then multiplied pixel-wise by a spatial pooling field (g^1,…,g^N, where superscripts index distinct groups of feature maps) that determines the region of visual space that drives voxel activity. The weighted pixel values in each feature map are then summed, reducing each feature map to a scalar value. These scalar values are concatenated across all feature maps, forming a single feature vector that is passed through another element-wise nonlinearity (left black rectangle) and then weighted by a set of feature weights, w (right black rectangle), to yield predicted voxel activity. Note that for each type of encoding model (for example, AlexNet-based encoding model, GNet-based encoding model), the feature extractors are identical for all voxels, but the spatial pooling fields and feature weights are optimized and may vary across voxels. For the AlexNet-based encoding model, the feature extractors were pre-specified, the spatial pooling fields were optimized via line search, and the feature weights w were optimized via ridge regression. For the GNet-based encoding model, stochastic gradient descent with early stopping was used to optimize the parameters of the feature extractors η_l, the spatial pooling fields g^1,…,g^N, and the feature weights w. b, Illustration of spatial pooling fields.
For the AlexNet model, a single isotropic 2D Gaussian pooling field (middle) selected from a set of candidates (right) was applied to all feature maps.
For the GNet model, an independent, flexible pooling field (left) was applied to each group of feature maps. Applying flexible pooling fields to AlexNet
leads to lower prediction accuracy overall, so we present the version that uses isotropic 2D Gaussian fields. c, Comparative architecture of AlexNet and
GNet. AlexNet and GNet are both deep convolutional neural networks, but differ in the types and sequencing of layers (rows of the table). The first three
layers are the same for both networks and correspond to the first three layers of an AlexNet trained to classify objects in the ImageNet dataset. For both
networks, these shared ‘pre-filtering’ layers are followed by sequences of convolutional layers (rows labeled ‘conv’; values indicate feature depth and
convolutional filter resolution; ‘str’ = filter stride, ‘pad’ = convolutional padding), max-pooling layers (‘maxpool’), batch-normalization and weight-dropout
layers (‘batchnorm + dropout’), adaptive averaging layers (‘adaptive avg’), and fully-connected layers (‘fully con.’; value indicates number of units). Feature
maps in the convolutional or fully connected layers (indicated by red arrows; resolution of the feature maps in parentheses) are used as predictors of brain
activity in the context of an encoding model (see a).
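To make the forward pass described in panel a concrete, here is a minimal sketch of the computation from grouped feature maps to predicted voxel activity; all shapes, names and the choice of nonlinearity are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def encoding_model_predict(feature_maps, pooling_fields, feature_weights,
                           f=np.tanh):
    """Predict one voxel's activity from grouped feature maps.
    feature_maps: list over groups; each group is an (n_maps, H, W) array
      produced by a feature extractor. pooling_fields: list over groups;
      each is an (H, W) spatial pooling field. feature_weights: flat
      (total_n_maps,) weight vector w. f: element-wise nonlinearity."""
    pooled = []
    for maps, g in zip(feature_maps, pooling_fields):
        # nonlinearity, then pixel-wise weighting by the pooling field,
        # then summation over pixels -> one scalar per feature map
        pooled.append((f(maps) * g).sum(axis=(1, 2)))
    feature_vector = np.concatenate(pooled)
    # second element-wise nonlinearity, then weighted sum -> prediction
    return feature_weights @ f(feature_vector)
```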
Corresponding author(s): Kendrick Kay
Last updated by author(s): Oct 10, 2021
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
The checklist items are as follows:
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided (only common tests should be described solely by name; describe more complex techniques in the Methods section)
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted (give P values as exact values whenever suitable)
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Software and code
Policy information about availability of computer code
Data collection Psychophysics Toolbox 3.0.14, MATLAB R2018a, Meadows web-based platform (http://meadows-research.com)
Data analysis MATLAB (https://www.mathworks.com/products/matlab.html), Python (https://www.python.org), SPM5 (https://www.fil.ion.ucl.ac.uk/spm/),
FSL 5.0.7, 5.0.9, 6.0.3 (https://fsl.fmrib.ox.ac.uk/), FreeSurfer 6.0 and 7.0 (https://surfer.nmr.mgh.harvard.edu), GLMdenoise 1.4 (https://
github.com/cvnlab/GLMdenoise), analyzePRF 1.2 (https://github.com/cvnlab/analyzePRF/), GLMsingle 0.9 (https://github.com/cvnlab/
GLMsingle), fracridge 1.3 (https://github.com/nrdg/fracridge), ANTs 2.1.0 (http://stnava.github.io/ANTs/), MRTrix 3.0 (https://
www.mrtrix.org), Dipy 1.1 (https://dipy.org), Vistasoft master branch (https://github.com/vistalab/vistasoft), Connectome Workbench 1.4.2
(https://github.com/Washington-University/workbench), PyTorch 3 (https://pytorch.org)
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability
The NSD dataset is freely available at http://naturalscenesdataset.org. The data are hosted in the cloud, allowing researchers to exploit high-performance cloud computing to efficiently analyze the dataset. We provide both raw data in BIDS format (Gorgolewski et al., 2016) and prepared data files, along with extensive technical documentation in the NSD Data Manual. To ensure strict validation for an upcoming Algonauts prediction challenge (Cichy et al., 2019), the initial public release will withhold the last three NSD scan sessions from each participant (about 8.4% of the NSD data). Images used for NSD were taken from the Common Objects in Context database (Lin et al., 2014) (https://cocodataset.org).
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences (applicable; see 'Life sciences study design' below) | Behavioural & social sciences | Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Life sciences study design
All studies must disclose on these points even when the disclosure is negative.
Sample size This study collects massive amounts of data in individual subjects. Analyses demonstrated in this paper are conducted primarily at the within-
subject level, demonstrating the precision and robustness of the data collected. For group-level analyses, the number of subjects used for
NSD (n = 8) is sufficiently large to provide some power for statistical inference. The sample size (n = 8) was chosen based on consideration of
guarding against incidental findings that occur only in a few individuals and based on consideration of subject burden (if all images had been
presented to a single subject, data collection would have extended for 8 years).
Data exclusions We implemented a subject-selection procedure in which the best 8 subjects out of a pool of 14 potential subjects (on the basis of criteria such as
head motion and BOLD signal strength) were selected for full NSD data acquisition. This was done to optimize the quality of the NSD dataset.
For the neural network analysis, due to computational memory limitations, we used data from the best 4 out of the 8 NSD subjects in terms of
signal-to-noise ratio (SNR); this analysis is intended primarily to demonstrate proof of concept, and the SNR-based selection is not expected to
incur significant inferential biases. Due to image artifacts, we excluded 2/52 (4%) of the acquired T1-weighted volumes (excluded volumes
were from Subject 8). Due to eyetracking noise, for the eyetracking results shown in Extended Data Figure 4, we excluded 1/24 (4%), 1/24
(4%), 7/8 (88%), 0/24 (0%), 0/24 (0%), 0/24 (0%), 0/24 (0%), and 2/8 (25%) of the acquired eyetracking runs from the 8 subjects, respectively
(in aggregate: 11/160 (7%)).
Replication This resource paper describes extensive quality checks on the data acquired from the 8 NSD subjects. We provide substantial evidence that
high-quality data were obtained from all subjects. Sufficient data were obtained such that we were able to demonstrate effects at the level of
individual subjects and replicate effects across multiple subjects. Note that some subjects fare better on certain quality metrics (e.g. head
motion) than others. In addition, there is some variation in the total amount of data collected across subjects (e.g. between 30–40 core NSD
scan sessions were acquired for each subject).
Randomization All participants engaged in the same set of experiments. However, somewhat non-overlapping sets of stimuli were chosen for each subject.
The allocation of stimuli to different subjects was done randomly from a fixed set of images pulled from the Microsoft COCO database. Given
the large scale of stimulus sampling (e.g. 9,000–10,000 unique images were shown to each subject), it is likely that although the exact same
images are not shown to each subject, the same general types of stimulus features are well sampled for each subject.
Blinding Blinding is not relevant to this study given that there is little that the investigators could have done to bias the nature of the recorded data
and given that the participants do not belong to any discrete groupings.
Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.
Materials & experimental systems: antibodies (n/a); eukaryotic cell lines (n/a); palaeontology and archaeology (n/a); animals and other organisms (n/a); human research participants (involved in the study); clinical data (n/a); dual use research of concern (n/a).
Methods: ChIP-seq (n/a); flow cytometry (n/a); MRI-based neuroimaging (involved in the study).
Human research participants
Policy information about studies involving human research participants
Population characteristics The sample comprised 2 males and 6 females. All participants were healthy young adults aged 19–32 years, and
all provided informed written consent. Participants were compensated at a rate of $30 per hour, plus performance bonuses.
Recruitment Participants were recruited through advertisements to the local community and were screened based on their ability to
participate in this long-term neuroimaging study. In addition, we selected participants based on data quality from an initial 7T fMRI
session. This selection does induce a bias towards individuals with low head motion, high cognitive performance, and strong BOLD
responses. NSD is intended primarily to provide a massive dataset informing studies of the basic mechanisms of vision and memory; it
does not represent an unbiased sampling of the human population.
Ethics oversight University of Minnesota Institutional Review Board
Note that full information on the approval of the study protocol must also be provided in the manuscript.
Magnetic resonance imaging
Experimental design
Design type The core NSD experiment is task-based and has an event-related design. The pRF experiment is task-based and has a
continuous design. The fLoc experiment is task-based and has an event-related design. We also collected resting-state
data, as well as structural and diffusion data.
Design specifications In the core NSD experiment, images were presented for 3 s, each followed by a gap of at least 1 s before the next
trial. Many thousands of distinct images were presented over the course of many scan sessions, with each distinct
image presented at most 3 times.
Behavioral performance measures Button presses and associated reaction times for each trial in the NSD experiment were recorded. To
ensure high data quality, we monitored basic response metrics such as response rate. We quantified recognition
performance in the NSD experiment using signal detection theory.
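As an illustration of this kind of signal-detection quantification (a minimal sketch of ours, not the authors' code), recognition sensitivity d' can be computed from hit and false-alarm counts:

from statistics import NormalDist

# d' for old/new recognition judgments, with a log-linear correction to
# guard against hit or false-alarm rates of exactly 0 or 1.
def d_prime(hits, misses, false_alarms, correct_rejections):
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

print(d_prime(hits=80, misses=20, false_alarms=10, correct_rejections=90))  # ~2.1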
Acquisition
Imaging type(s) Functional, structural, diffusion, venogram, angiogram
Field strength 7T and 3T
Sequence & imaging parameters The primary fMRI sequence involved gradient-echo EPI: FOV 216 mm × 216 mm, matrix size 120 × 120,
slice thickness 1.8 mm, axial orientation, TR 1.6 s, TE 22.0 ms, flip angle 62°.
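As a quick arithmetic check (illustrative only), these parameters imply an in-plane resolution that matches the slice thickness, i.e., 1.8-mm isotropic voxels:

# In-plane resolution = FOV / matrix size.
fov_mm, matrix = 216, 120
print(fov_mm / matrix)  # 1.8 (mm), matching the 1.8-mm slice thickness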
Area of acquisition Whole-brain scans including the cerebellum
Diffusion MRI Used
Parameters 99–100 directions; b-values of 0, 1,500, and 3,000 s/mm²; no cardiac gating
Preprocessing
Preprocessing software A combination of custom MATLAB and Python code, FreeSurfer 6, and selected tools from SPM, FSL, ANTs, and MRTrix3.
Normalization The NSD data were prepared in a variety of spaces, including subject-native space and atlas spaces (MNI, fsaverage). Some of
the data demonstrations in this paper show results in subject-native spaces; some show results that reflect averaging in
atlas spaces.
Normalization template For preparation of data in atlas spaces, the MNI152 and fsaverage templates were used.
Noise and artifact removal For the GLM preparation of the NSD data, the data-driven denoising method GLMdenoise and the statistical
technique of ridge regression were used. These methods can account for a variety of noise sources (e.g., physiological noise, head motion,
scanner artifacts, and effects of collinearity). A version of the GLM results that omits these noise-removal methods is also provided.
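As a hedged illustration of the ridge-regression component (a minimal sketch of ours; the released betas come from the GLMdenoise/ridge pipeline, not from this code), regularized trial-wise amplitudes can be obtained by solving the penalized normal equations:

import numpy as np

# Ridge-regularized GLM: solve (X'X + lam*I) b = X'y, which shrinks estimates
# of collinear trial regressors and thereby stabilizes trial-wise amplitudes.
def ridge_betas(X, y, lam):
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 100))                           # time points x trial regressors
y = X @ rng.standard_normal(100) + rng.standard_normal(500)   # synthetic voxel time series
betas = ridge_betas(X, y, lam=10.0)                           # one amplitude per trial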
Volume censoring No censoring was performed.
Statistical modeling & inference
Model type and settings Trial-wise fMRI response amplitudes were estimated for individual voxels in individual subjects. A variety of
analyses were then performed on these response amplitudes, including univariate and multivariate analyses, representational similarity
analysis, and encoding models.
Effect(s) tested We conducted rich sampling of the brain's responses to a large number of complex natural scenes. The resulting
measurements can now be used to test and explore a variety of scientific hypotheses.
Type of analysis: Both (whole brain and ROI-based)
Anatomical location(s) Atlas-based regions of interest were incorporated into this resource for user convenience. In addition, a
number of manually defined regions of interest based on both functional and anatomical criteria were created.
Statistic type for inference (see Eklund et al. 2016) This paper provides a resource in which data from all voxels are processed and made
available. Thus, thresholding and specific inferential claims are largely not applicable here.
Correction Not applicable, as voxel-wise statistical significance inferences are not a primary focus of this paper.
Models & analysis. Involved in the study: multivariate modeling and predictive analysis. Not applicable: functional and/or effective
connectivity; graph analysis.
Multivariate modeling and predictive analysis For pRF estimation, we used the local contrast of NSD images to predict NSD betas through
a simple pRF model, using nonlinear optimization to determine parameters for each voxel/vertex. For representational
similarity analysis, we constructed representational dissimilarity matrices by correlating multivariate brain activity
patterns. For neural network modeling, we used either pre-trained image-computable models (AlexNet, a Gabor
model) or brain-optimized image-computable models (GNet). These models were trained on a set of training data
(the non-shared NSD images) and validated on a separate set of validation data (the shared NSD images).
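To illustrate the RSA step described here (a minimal sketch of ours, not the authors' code), a representational dissimilarity matrix can be built as 1 minus the Pearson correlation between multivoxel activity patterns:

import numpy as np

# RDM from condition-by-voxel activity patterns: entry (i, j) is the
# correlation distance between the patterns evoked by conditions i and j.
rng = np.random.default_rng(0)
patterns = rng.standard_normal((50, 1000))  # conditions x voxels (placeholder data)
rdm = 1 - np.corrcoef(patterns)             # 50 x 50 dissimilarity matrix
assert rdm.shape == (50, 50) and np.allclose(np.diag(rdm), 0)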