Electroencephalography-to-Text Decoding
Hamza Amrani, Daniela Micucci, Paolo Napoletano
University of Milano - Bicocca, Milan, Italy
{hamza.amrani, daniela.micucci, paolo.napoletano}@unimib.it
arXiv:2312.09430v1 [eess.SP] 15 Nov 2023
Figure 1: The workflow of the proposed method involves several steps. Firstly, the raw EEG signals corresponding to each word are input into the Brain module. This module extracts subject-dependent features, which are subsequently used by a Language module, based on BART, suitably trained for sentence generation. The resulting sentence is further refined using GPT-4 APIs to produce the final output. In the example, the ground truth is: He is a prominent member of the Bush family, the younger brother of President George W. Bush; the final sentence predicted by our model is: He was a member of the American Bush family, brother of President George W. Bush. Bold font marks exact matches between the ground truth and the estimated sentence.
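The GPT-4 refinement step in the caption above can be approximated with a prompt-based call. The sketch below is illustrative: the prompt wording, the `gpt-4` model name, and the use of the OpenAI chat-completions client are assumptions, since the paper only states that GPT-4 APIs refine the decoded sentence.

```python
def build_refinement_prompt(decoded: str) -> str:
    """Build an instruction asking GPT-4 to clean up a noisy decoded sentence.
    The exact wording is a hypothetical stand-in for the paper's prompt."""
    return (
        "The following sentence was decoded from EEG signals and may contain "
        "repeated or wrong words. Rewrite it as a single fluent English "
        f"sentence, preserving its meaning:\n\n{decoded}"
    )

def refine(decoded: str) -> str:
    """Send the prompt to GPT-4 and return the refined sentence
    (requires the `openai` package and an API key)."""
    from openai import OpenAI  # imported lazily so the prompt builder is standalone
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": build_refinement_prompt(decoded)}],
    )
    return response.choices[0].message.content

# Example from Figure 1 (network call commented out):
# refine("was the member member of the American family. "
#        "and younger brother of President George W. Bush")
```

On the Figure 1 example, such a call would map the raw prediction to a fluent sentence close to the reported refined output.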
41.35% BLEU-1 and 30.69% ROUGE-F on the ZuCo dataset. Nevertheless, the unexplored impact of embedding EEG signals within language models raises questions about the optimal approach for enhancing decoding performance. Furthermore, while the analysis of EEG signals is a valuable way of studying brain activity, the interpretation of these signals can be influenced by subjectivity (Jeng et al. 2020). A recent study on the EEG-to-Text decoding task (Feng, Feng, and Qin 2023) argued that the task is considerably challenged by an EEG representation that varies across individual subjects and a text representation influenced by semantics. Lastly, current evaluation metrics that primarily focus on syntactic aspects do not adequately capture semantics, resulting in limited comprehensibility.

In this paper, we present an end-to-end deep learning framework for non-invasive brain recordings that uses pre-trained language models for open vocabulary EEG-to-Text decoding. Firstly, our end-to-end architecture incorporates a representation learning module for raw EEG encoding, a language modeling module based on BART (Lewis et al. 2019), and a GPT-4 (OpenAI 2023) refinement module that enhances the comprehensibility of the generated sentences. The representation learning module includes a subject layer, which takes into account the subjectivity of EEG signals, and a multi-layer transformer encoder that extracts latent brain representations, which are then aligned with language token embeddings. Secondly, we use BERTScore (Zhang et al. 2019) in the evaluation, which incorporates semantic judgment at the sentence level, resulting in a more comprehensive evaluation that is closer to human perception. Thirdly, we conducted an ablation study to analyze and distinguish the contributions of each module within our proposal, providing valuable insights for future research. To demonstrate the efficacy of our approach, comprehensive evaluations are conducted on two publicly available datasets, ZuCo v1.0 and v2.0 (Hollenstein et al. 2018, 2019), comprising EEG recordings from 30 subjects actively engaged in natural reading tasks. The results achieved by our proposal, on previously unseen sentences, are a BLEU-1 score of 42.75%, a ROUGE-1-F (Lin 2004) of 33.28%, and a BERTScore-F of 53.86%, surpassing the previous state-of-the-art results by 3.38%, 8.43%, and 6.31%, respectively. Our code is available for public access at: https://github.com/hamzaamrani/EEG-to-Text-Decoding

Related Work
Related work on brain-to-speech and brain-to-text decoding can be categorized into three classes according to the features they capture: motor imagery based, overt speech based, and inner speech based. Different BCI devices have been explored, encompassing electroencephalography (EEG), electrocorticography (ECoG), and functional magnetic resonance imaging (fMRI).

Motor imagery based systems, such as point-and-click (Jarosiewicz et al. 2015; Pandarinath et al. 2017; Lee et al. 2018) and imaginary handwriting (Willett et al. 2021), achieve high accuracy but relatively low typing rates.

Overt speech based methods for decoding or synthesizing speech show a faster communication rate. These methods require subjects to physically speak during neural recording (Anumanchipalli, Chartier, and Chang 2019; Makin, Moses, and Chang 2020) or to imagine the physical pronunciation of the sentence (Moses et al. 2021; Willett et al. 2023). Such approaches make the decoding system language-dependent, since the same concept may have completely distinct pronunciations in different languages.

Inner speech based approaches try to sidestep language articulation dependencies by decoding language from imagined speech and read text (Brigham and Kumar 2010; Panachakel and Ramakrishnan 2021; Wang and Ji 2022; Nieto et al. 2022; Défossez et al. 2023; Tang et al. 2023). A major limitation of most of the approaches discussed is the constraint of small closed vocabularies, with a low and limited number of unique words (Pereira et al. 2018; Dash, Ferrari, and Wang 2020; Moses et al. 2021).

In addition, most current approaches (Willett et al. 2021, 2023; Défossez et al. 2023) for language communication
use invasive devices (such as ECoG) or less accessible non-invasive devices (such as fMRI). This makes it challenging to collect large-scale datasets and to implement approaches that help people with paralysis who can no longer speak. Nevertheless, recent studies attempt to decode inner speech by using both open vocabularies and non-invasive devices (Wang and Ji 2022; Défossez et al. 2023; Duan et al. 2023).

Our work opens the door to similar studies of inner speech brain-to-text decoding. We investigate the representation learning of EEG signals, inter-subject variability, human judgment at the sentence level of generated sentences, and the use of pre-trained language models.

Method
We aim to decode neural activity from a time series of high-dimensional brain signals recorded with non-invasive electroencephalography during the natural reading of English sentences. We first define the general task of open vocabulary EEG-to-Text decoding and then introduce the proposed end-to-end architecture.

Open Vocabulary EEG-to-Text Decoding
Let us define a sequence of word-level raw EEG signals as X ∈ R^{C×T}, with C the number of EEG channels and T the number of time steps. These EEG signals are a reflection of the recorded brain activity of a specific subject s, drawn from the set S of distinct subjects. An EEG-decoding task is the task of predicting the corresponding text sentence Y in a sequence-to-sequence framework. Each text sentence Y is composed of English tokens y_n ∈ V from an open vocabulary V. During the training phase, the EEG-subject-text pairs can come from various subjects and various categories of reading materials.

Thus, a supervised EEG-to-Text decoding task consists in finding a decoding function f : R^{C×T} × S → V such that f predicts Y given X and s. We denote by Y = f(X, s) the decoded/predicted text sentence from the brain signals. Searching for f, the task is to maximize the probability of the decoded text sentence Y:

p(Y | X) = \prod_{n=1}^{N} p(y_n ∈ V | X, y_{<n})    (1)

where N is the length of the text sentence Y, and y_n is the n-th token of Y.

Proposed Architecture
An overview of the proposed architecture is given in Figure 1 (refer to Appendix A for a detailed overview of the architecture). It is composed of two main components: 1) a Brain module that implements a representation learning approach for EEG encoding; and 2) a Language Modeling module, based on BART to produce EEG-to-Text sentences and on GPT-4 for sentence-level refinement. The training process is composed of two stages. An overview of the end-to-end architecture is presented in Figure 2, where dashed boxes correspond to the modules of the architecture that undergo training, while solid boxes represent the module that remains untrained. We first detail the specifics of the training stages; then we offer a more comprehensive breakdown of each module included in our architecture.

Training Stage 1 We initiate training with the Brain module: word-level EEG signals are aligned with word tokens, as encoded by a locked, pre-trained BART language model, using a Mean Square Error (MSE) loss. This stage incorporates a learnable features module designed to account for EEG encoding and subjectivity. The outcome of this training stage is EEG subject-dependent features. The alignment is done by mapping the learned EEG representation Z onto the BART token embeddings BART_enc^{te}, using the MSE regression loss L_MSE(BART_enc^{te}, Z):

\min_{f_{brain}} L_MSE(BART_enc^{te}, f_{brain}(X))    (2)

Training Stage 2 The subsequent step involves fine-tuning a pre-trained language model based on BART, aimed at generating word sequences through a cross-entropy loss. As in Wang and Ji (2022), we use the mapped embedded brain representation Z directly as initial word embeddings to feed into the pre-trained encoder-decoder language model BART (Lewis et al. 2019). The high-level idea is that we treat each embedded EEG representation as a word-level representation and leverage a pre-trained language model to decode it into real human language (English), as in traditional machine translation tasks. The last hidden states from the BART decoder are then fed into a multi-layer perceptron (MLP) to generate English tokens y_n from the BART vocabulary V. During training, the objective is to minimize the text reconstruction cross-entropy loss, defined as follows:

L_rec = − \sum_{n=1}^{N} \log p(y_n ∈ V)    (3)

Learnable Features Module This module is included in the Brain module and is used for extracting subject-dependent brain features from the raw EEG signals. Given a sequence of word-level raw EEG signals X = {x_0, x_1, ..., x_M} ∈ R^{C×T} and the corresponding subject s ∈ S, we first use a deep neural network f_brain to get the latent subject-dependent brain representation Z = {z_0, z_1, ..., z_M} = f_brain(X). This architecture (Figure 3) consists of (1) a learnable EEG feature block, followed by (2) a subject layer to leverage inter-subject variability, which feeds (3) a multi-layer transformer encoder named BTE (Brain Transformer Encoder), and then (4) a multi-layer perceptron.

The brain data is first fed to a bi-directional Gated Recurrent Unit (GRU) (Cho et al. 2014), which reads the multivariate time series input in both forward and backward directions to extract learnable EEG features. The use of a GRU allows dynamically addressing the different lengths of word-level raw EEG signals. We then apply a fully-connected layer to the concatenated forward and backward outputs. Similarly to Défossez et al. (2023), we then add a point-wise 1D convolution (kernel size of 1) without activation.
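Under stated assumptions, the Brain module pipeline (bi-GRU, fully-connected merge, point-wise convolution, subject layer, transformer encoder, MLP) could be sketched in PyTorch as follows. This is an illustrative sketch, not the authors' implementation: hidden sizes, layer counts, the 1024-dimensional BART-large target space, and the multiplicative per-subject embedding used as the subject layer (in the spirit of Défossez et al. 2023) are all assumptions.

```python
import torch
import torch.nn as nn

class BrainModule(nn.Module):
    """Sketch of the subject-dependent EEG encoder.
    All hyper-parameters below are illustrative assumptions."""

    def __init__(self, n_channels=104, d_model=512, n_subjects=30):
        super().__init__()
        # (1) learnable EEG feature block: bi-directional GRU over time
        self.gru = nn.GRU(n_channels, d_model, batch_first=True,
                          bidirectional=True)
        self.fc = nn.Linear(2 * d_model, d_model)        # merge both directions
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=1)  # point-wise, no activation
        # (2) subject layer: one learned scaling vector per subject (assumed form)
        self.subject_scale = nn.Embedding(n_subjects, d_model)
        # (3) Brain Transformer Encoder (BTE)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.bte = nn.TransformerEncoder(layer, num_layers=6)
        # (4) MLP projecting into the BART embedding space (1024 for BART-large)
        self.mlp = nn.Sequential(nn.Linear(d_model, 1024), nn.GELU(),
                                 nn.Linear(1024, 1024))

    def forward(self, x, subject_id):
        # x: (batch, time, channels) word-level raw EEG; subject_id: (batch,)
        h, _ = self.gru(x)                      # (batch, time, 2*d_model)
        h = self.fc(h)                          # (batch, time, d_model)
        h = self.conv(h.transpose(1, 2)).transpose(1, 2)
        h = h * self.subject_scale(subject_id).unsqueeze(1)  # subject layer
        h = self.bte(h)
        return self.mlp(h.mean(dim=1))          # one embedding per word window

z = BrainModule()(torch.randn(2, 20, 104), torch.tensor([0, 1]))
print(z.shape)  # torch.Size([2, 1024])
```

The 104 input channels reflect the ZuCo montage after removing the all-zero Cz channel (Appendix B); the mean-pooling that produces one vector per word window is a simplifying assumption.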
Figure 2: Overview of the proposed end-to-end architecture for open vocabulary EEG-to-Text decoding. Firstly, a sequence of
word-level raw EEG signals is fed to the Brain module to extract deep-embedded representations for raw EEG encoding. Then,
we use a Language Modeling (LM) module to generate EEG-to-Text sentences by leveraging the pre-trained language model
BART.
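In code, the two training objectives reduce to standard losses: an MSE alignment against frozen BART token embeddings in Stage 1, and a text-reconstruction cross-entropy in Stage 2. The sketch below is illustrative; the use of HuggingFace's `BartForConditionalGeneration` with `inputs_embeds`, and all tensor shapes, are assumptions rather than the authors' published code.

```python
import torch
import torch.nn.functional as F

def stage1_loss(z, token_embeddings):
    """Stage 1: align Brain-module outputs Z with frozen BART token
    embeddings via MSE regression; only the Brain module trains."""
    return F.mse_loss(z, token_embeddings.detach())

def stage2_loss(logits, target_ids):
    """Stage 2: text-reconstruction cross-entropy over the BART
    vocabulary, computed token by token."""
    return F.cross_entropy(logits.view(-1, logits.size(-1)),
                           target_ids.view(-1))

def training_step(brain, bart, eeg, subject_id, token_ids, stage):
    """One optimisation step. `bart` is assumed to be a HuggingFace
    BartForConditionalGeneration; Z replaces the word embeddings via
    `inputs_embeds`, mirroring the description in the Method section."""
    z = brain(eeg, subject_id)                        # (batch, words, 1024)
    if stage == 1:
        ref = bart.get_input_embeddings()(token_ids)  # frozen alignment target
        return stage1_loss(z, ref)
    out = bart(inputs_embeds=z, labels=token_ids)     # CE computed internally
    return out.loss

# The plain loss functions can be exercised without downloading BART:
z = torch.zeros(2, 5, 8)
ref = torch.ones(2, 5, 8)
print(float(stage1_loss(z, ref)))  # 1.0
```

With uniform (all-zero) logits over a vocabulary of size |V|, `stage2_loss` returns log |V|, which is a quick sanity check for the cross-entropy wiring.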
Figure 3: The Learnable Features Module, comprising a learnable EEG feature block (with point-wise 1D convolution), a Subject Layer, and a multi-layer Brain Transformer Encoder (BTE) with Add & norm and FFN sub-blocks.
As we demonstrate subsequently in the ablation study, opting for a learnable network, rather than directly handling pre-computed features, is beneficial.
Table 3: Open Vocabulary EEG-to-Text decoding examples on ZuCo unseen test sentences. We report both predictions from
our model, with and without GPT-4 sentence refinement. (1-3) are in NR v1.0, v2.0. (4) is in SR v1.0. Bold means exact match,
Italic indicates semantic similarity. Underline denotes error match.
(1) Ground truth He is a prominent member of the Bush family, the younger brother of President George W. Bush...
(Wang and Ji 2022) was a former member of the American family, and son brother of President George W. Bush...
Prediction was the member member of the American family. and younger brother of President George W. Bush
Prediction + GPT-4 He was a member of the American Bush family, brother of President George W. Bush. . .
(2) Ground truth Raymond Arrieta (born March 26, 1965 in San Juan, Puerto Rico) is considered by many to be one
of Puerto Rico’s greatest comedians.
(Wang and Ji 2022) mond wasaga,19 in 17, 18) New Francisco, Puerto Rico) is a one many to be the of the Rico’s greatest
poets.
Prediction mond wasaga (born April 17, 1946) New Francisco, Puerto Rico) is a one many to be the of the Rico’s most artists.
Prediction + GPT-4 Ramon Wasaga (born April 17, 1946, in New Francisco, Puerto Rico) is one of the many to be considered
as one of the most prominent artists of Puerto Rico.
(3) Ground truth Following the 1980 presidential election, Bush and his family moved to Miami-Dade County, Florida.
(Wang and Ji 2022) the deaths election, the was his wife moved to California,Dade County, Florida
Prediction the wars election, Bush was his wife moved to Florida,Dade County, Florida.
Prediction + GPT-4 After the war’s election, Bush and his wife moved to Dade County, Florida.
(4) Ground truth It’s not a particularly good film, but neither is it a monsterous one.
(Wang and Ji 2022) was a a bad good story, but it is it bad bad. one.
Prediction ’s a a bad good movie, but it is it bad bad. one.
Prediction + GPT-4 It’s a bad good movie, but is it a bad one.
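The word-level agreement reported in Table 3 can be quantified with unigram precision, the core of BLEU-1 (Papineni et al. 2002). The minimal sketch below omits the brevity penalty and smoothing that standard toolkits such as NLTK apply, so it is an approximation of the metric used in the paper.

```python
from collections import Counter

def bleu1(reference: str, hypothesis: str) -> float:
    """Clipped unigram precision, the core of BLEU-1.
    Standard implementations add a brevity penalty and smoothing;
    this minimal version omits both."""
    ref = Counter(reference.lower().split())
    hyp = hypothesis.lower().split()
    if not hyp:
        return 0.0
    clipped = sum(min(c, ref[w]) for w, c in Counter(hyp).items())
    return clipped / len(hyp)

# Example (1) from Table 3, truncated to the shared prefix:
gt = "He is a prominent member of the Bush family"
pred = "He was a member of the American Bush family"
print(round(bleu1(gt, pred), 2))  # 0.78
```

Seven of the nine predicted unigrams appear in the reference, giving 7/9 ≈ 0.78, which illustrates why the refined predictions score well on BLEU-1 despite local word errors.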
Figure 4: t-SNE visualization of EEG embedded representations of sentences in the training set, which are (a) original EEG
representations and (b) generated by the Brain module of our architecture. Distinct colors mean different subjects. Each dot
represents a sentence. The red triangle represents the EEG embedded representations corresponding to the same sentence "With his interest in race cars, he formed a second company, the Henry Ford Company".
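A projection like Figure 4 can be produced with scikit-learn's t-SNE. The sketch below uses synthetic stand-in embeddings with a per-subject offset (the subject count, sentence count, and dimensionality are illustrative assumptions, not the paper's data).

```python
import numpy as np

def sentence_embeddings(n_subjects=18, n_sentences=50, dim=1024, seed=0):
    """Synthetic stand-in for the Brain-module sentence embeddings of
    Figure 4: one vector per (subject, sentence), with a per-subject
    offset so subject clusters are visible. For illustration only."""
    rng = np.random.default_rng(seed)
    offsets = rng.normal(scale=3.0, size=(n_subjects, 1, dim))
    z = rng.normal(size=(n_subjects, n_sentences, dim)) + offsets
    labels = np.repeat(np.arange(n_subjects), n_sentences)
    return z.reshape(-1, dim), labels

def tsne_plot(z, labels):
    """Project to 2-D with t-SNE and colour by subject
    (requires scikit-learn and matplotlib)."""
    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt
    xy = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(z)
    plt.scatter(xy[:, 0], xy[:, 1], c=labels, s=4, cmap="tab20")
    plt.savefig("tsne_subjects.png")

z, labels = sentence_embeddings()
print(z.shape, labels.shape)  # (900, 1024) (900,)
```

With real Brain-module outputs in place of the synthetic array, the same two calls reproduce the kind of per-subject clustering shown in the figure.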
We verified that the Brain Transformer Encoder provides higher decoding performance. Finally, to test whether our model effectively leverages the pre-trained BART model, we trained it without fine-tuning the BART model weights. As reported, decoding performance decreases notably, by up to 14.25%. This loss confirms the benefit of fine-tuning the BART model.

We also show the hypothetical upper limit for EEG-to-Text decoding when no errors are made in mapping EEG signals to word tokens. Separately from our model, we fine-tuned BART on eye-tracking fixation words only, without considering the raw EEG signals, to reconstruct the original text sentence. It outperforms our proposed architecture by about 30% in terms of BLEU-1, 37% in terms of ROUGE-1-F, and 15% in terms of BERTScore-F. The obtained results reveal two challenges within the EEG-to-Text decoding task. The first pertains to the model's capacity to establish a dependable EEG feature representation for the word tokens. The second involves the faithful reconstruction of the sentence. This experiment highlights that, of these two challenges, the foremost is the ability to learn an effective representation of the EEG signals, an observation that points towards the direction of future research efforts.

Ethical Implications
While the recent advancements in utilizing brain-computer interfaces and artificial intelligence to decode neural activity into text hold significant potential in aiding individuals with communication deficits, ethical considerations and societal impact must be carefully addressed. The scientific community must maintain vigilance and ensure that such systems are not employed without the informed and declared consent of the participants. Fortunately, the current nature of acquiring EEG and MEG (magnetoencephalography) signals requires participant awareness, unlike other biomarkers such as DNA or facial features. Additionally, the susceptibility of these signals to corruption by muscle movements, such as teeth clenching or eye blinks, provides a possible safeguard against unauthorized acquisition and misuse. Furthermore, it is critical to acknowledge the potential risk associated with the high subjectivity of neural signals, which, even in the absence of participant awareness, could compromise mental privacy.

We strongly believe that promoting and encouraging open science practices remains essential for responsibly assessing the potential risks and benefits associated with BCI and AI technologies in this domain.

Conclusions and Future Works
In this paper, we presented an end-to-end deep learning framework for the open vocabulary EEG-to-Text decoding task. By leveraging a subject-dependent representation learning module, a pre-trained BART language model, and a GPT-4 sentence refinement module, this study offers a comprehensive solution that not only enhances decoding performance but also addresses the human comprehensibility of the decoded output. The incorporation of BERTScore as an evaluation metric has enabled a more holistic assessment, capturing not only syntactic accuracy but also human understanding at the sentence level. Moreover, the ablation study allowed us to understand the contribution of each component to the proposed architecture. This in-depth analysis not only validates the efficacy of each module but also provides a roadmap for further research, guiding the development of refined and optimized approaches in the future.
The empirical validation on two publicly available datasets demonstrates the effectiveness of the proposed architecture, achieving a BLEU-1 score of 42.75%, a ROUGE-1-F of 33.28%, and a BERTScore-F of 53.86%, outperforming the previous state-of-the-art results by 3.38%, 8.43%, and 6.31%, respectively. When looking at larger n-gram ratings (BLEU-2, -3, -4), there is an improvement of 7.24%, 12.5%, and 16.30%, respectively. Our results show that the use of raw EEG signals leads to improved results, demonstrating the effectiveness of modern representation learning approaches in neuroscience.

In summary, this research not only fills critical voids in the EEG decoding landscape but also shows the way for future investigations. By combining advanced neural network architectures with sophisticated evaluation methodologies, the study pushes the boundaries of EEG-to-text decoding and encourages continued innovation in the pursuit of more accurate and human-aligned results.

One future direction is to improve the quality of the generated embedded representations by taking into account inter-subject variability, so as to increase the ability of the model to generalize across individuals. Furthermore, ethical considerations need to be at the forefront as we move forward. Ensuring privacy, establishing clear guidelines for consent, and considering the potential long-term effects of this technology on users are critical.

Acknowledgement
This work was partially funded by the National Plan for NRRP Complementary Investments (PNC, established with the decree-law 6 May 2021, n. 59, converted by law n. 101 of 2021) in the call for the funding of research initiatives for technologies and innovative trajectories in the health and care sectors (Directorial Decree n. 931 of 06-06-2022) - project n. PNC0000003 - AdvaNced Technologies for Human-centrEd Medicine (project acronym: ANTHEM). This work reflects only the authors' views and opinions; neither the Ministry for University and Research nor the European Commission can be considered responsible for them.

References
Anumanchipalli, G. K.; Chartier, J.; and Chang, E. F. 2019. Speech synthesis from neural decoding of spoken sentences. Nature, 568(7753): 493–498.
Ba, J. L.; Kiros, J. R.; and Hinton, G. E. 2016. Layer normalization. arXiv preprint arXiv:1607.06450.
Brigham, K.; and Kumar, B. V. 2010. Imagined speech classification with EEG signals for silent communication: a preliminary investigation into synthetic telepathy. In 2010 4th International Conference on Bioinformatics and Biomedical Engineering, 1–4. IEEE.
Broderick, M. P.; Anderson, A. J.; Di Liberto, G. M.; Crosse, M. J.; and Lalor, E. C. 2018. Electrophysiological correlates of semantic dissimilarity reflect the comprehension of natural, narrative speech. Current Biology, 28(5): 803–809.
Caucheteux, C.; and King, J.-R. 2022. Brains and algorithms partially converge in natural language processing. Communications Biology, 5(1): 134.
Cho, K.; Van Merriënboer, B.; Bahdanau, D.; and Bengio, Y. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259.
Dash, D.; Ferrari, P.; and Wang, J. 2020. Decoding imagined and spoken phrases from non-invasive neural (MEG) signals. Frontiers in Neuroscience, 14: 290.
Défossez, A.; Caucheteux, C.; Rapin, J.; Kabeli, O.; and King, J.-R. 2023. Decoding speech perception from non-invasive brain recordings. Nature Machine Intelligence, 1–11.
Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
Duan, Y.; Zhou, J.; Wang, Z.; Wang, Y.-K.; and Lin, C.-T. 2023. DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation. arXiv preprint arXiv:2309.14030.
Feng, X.; Feng, X.; and Qin, B. 2023. Semantic-aware Contrastive Learning for Electroencephalography-to-Text Generation with Curriculum Learning. arXiv preprint arXiv:2301.09237.
Gauthier, J.; and Ivanova, A. 2018. Does the brain represent words? An evaluation of brain decoding studies of language understanding. arXiv preprint arXiv:1806.00591.
Hendrycks, D.; and Gimpel, K. 2016. Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415.
Hollenstein, N.; Rotsztejn, J.; Troendle, M.; Pedroni, A.; Zhang, C.; and Langer, N. 2018. ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading. Scientific Data, 5(1): 1–13.
Hollenstein, N.; Troendle, M.; Zhang, C.; and Langer, N. 2019. ZuCo 2.0: A dataset of physiological recordings during natural reading and annotation. arXiv preprint arXiv:1912.00903.
Huth, A. G.; De Heer, W. A.; Griffiths, T. L.; Theunissen, F. E.; and Gallant, J. L. 2016. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532(7600): 453–458.
Jarosiewicz, B.; Sarma, A. A.; Bacher, D.; Masse, N. Y.; Simeral, J. D.; Sorice, B.; Oakley, E. M.; Blabe, C.; Pandarinath, C.; Gilja, V.; et al. 2015. Virtual typing by people with tetraplegia using a self-calibrating intracortical brain-computer interface. Science Translational Medicine, 7(313): 313ra179.
Jeng, P.-Y.; Wei, C.-S.; Jung, T.-P.; and Wang, L.-C. 2020. Low-dimensional subject representation-based transfer learning in EEG decoding. IEEE Journal of Biomedical and Health Informatics, 25(6): 1915–1925.
Lee, M.-H.; Williamson, J.; Won, D.-O.; Fazli, S.; and Lee, S.-W. 2018. A high performance spelling system based on EEG-EOG signals with visual feedback. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26(7): 1443–1459.
Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; and Zettlemoyer, L. 2019. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
Lin, C.-Y. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, 74–81.
Makin, J. G.; Moses, D. A.; and Chang, E. F. 2020. Machine translation of cortical activity to text with an encoder–decoder framework. Nature Neuroscience, 23(4): 575–582.
Moses, D. A.; Metzger, S. L.; Liu, J. R.; Anumanchipalli, G. K.; Makin, J. G.; Sun, P. F.; Chartier, J.; Dougherty, M. E.; Liu, P. M.; Abrams, G. M.; et al. 2021. Neuroprosthesis for decoding speech in a paralyzed person with anarthria. New England Journal of Medicine, 385(3): 217–227.
Nieto, N.; Peterson, V.; Rufiner, H. L.; Kamienkowski, J. E.; and Spies, R. 2022. Thinking out loud, an open-access EEG-based BCI dataset for inner speech recognition. Scientific Data, 9(1): 52.
OpenAI. 2023. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774.
Panachakel, J. T.; and Ramakrishnan, A. G. 2021. Decoding covert speech from EEG: a comprehensive review. Frontiers in Neuroscience, 15: 392.
Pandarinath, C.; Nuyujukian, P.; Blabe, C. H.; Sorice, B. L.; Saab, J.; Willett, F. R.; Hochberg, L. R.; Shenoy, K. V.; and Henderson, J. M. 2017. High performance communication by people with paralysis using an intracortical brain-computer interface. eLife, 6: e18554.
Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311–318.
Pereira, F.; Lou, B.; Pritchett, B.; Ritter, S.; Gershman, S. J.; Kanwisher, N.; Botvinick, M.; and Fedorenko, E. 2018. Toward a universal decoder of linguistic meaning from brain activation. Nature Communications, 9(1): 963.
Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; and Liu, P. J. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1): 5485–5551.
Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C. D.; Ng, A. Y.; and Potts, C. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1631–1642.
Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1): 1929–1958.
Tang, J.; LeBel, A.; Jain, S.; and Huth, A. G. 2023. Semantic reconstruction of continuous language from non-invasive brain recordings. Nature Neuroscience, 1–9.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. Advances in Neural Information Processing Systems, 30.
Wang, C.; Subramaniam, V.; Yaari, A. U.; Kreiman, G.; Katz, B.; Cases, I.; and Barbu, A. 2023. BrainBERT: Self-supervised representation learning for intracranial recordings. arXiv preprint arXiv:2302.14367.
Wang, Z.; and Ji, H. 2022. Open vocabulary electroencephalography-to-text decoding and zero-shot sentiment classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, 5350–5358.
Willett, F. R.; Avansino, D. T.; Hochberg, L. R.; Henderson, J. M.; and Shenoy, K. V. 2021. High-performance brain-to-text communication via handwriting. Nature, 593(7858): 249–254.
Willett, F. R.; Kunz, E. M.; Fan, C.; Avansino, D. T.; Wilson, G. H.; Choi, E. Y.; Kamdar, F.; Hochberg, L. R.; Druckmann, S.; Shenoy, K. V.; et al. 2023. A high-performance speech neuroprosthesis. bioRxiv, 2023–01.
Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K. Q.; and Artzi, Y. 2019. BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675.
Appendix
A - Architecture
A detailed overview of the architecture is given in Figure 5. It is composed of two main components: 1) a Brain module that
implements a representation learning approach for EEG encoding; and 2) a Language Modeling module based on BART to
produce EEG-to-Text sentences and on GPT-4 for sentence-level refinement.
B - Dataset
In the ZuCo dataset (Hollenstein et al. 2018, 2019), we follow the steps of Hollenstein et al. to perform data pre-processing on the raw EEG signals, leading to 105 EEG channels from the scalp recordings. The full list of EEG
channels: E2, E3, E4, E5, E6, E7, E9, E10, E11, E12, E13, E15, E16, E18, E19, E20, E22, E23, E24, E26, E27, E28, E29, E30,
E31, E33, E34, E35, E36, E37, E38, E39, E40, E41, E42, E43, E44, E45, E46, E47, E50, E51, E52, E53, E54, E55, E57, E58,
E59, E60, E61, E62, E64, E65, E66, E67, E69, E70, E71, E72, E74, E75, E76, E77, E78, E79, E80, E82, E83, E84, E85, E86,
E87, E89, E90, E91, E92, E93, E95, E96, E97, E98, E100, E101, E102, E103, E104, E105, E106, E108, E109, E110, E111,
E112, E114, E115, E116, E117, E118, E120, E121, E122, E123, E124, Cz.
In this paper, the Cz EEG channel has been removed as it consists of all zeros.
C - Decoding accuracy results by subject
We report open vocabulary EEG-to-Text decoding results for each subject (see Table 4). The results show a significant difference between the subjects of v1.0 and v2.0 of the dataset. The v2.0 subjects achieve a BLEU-1 score of 47.13%, a ROUGE-1-F of 40.16%, and a BERTScore-F of 57.35%, while the v1.0 subjects obtain a BLEU-1 score of 39.39%, a ROUGE-1-F of 28.16%, and a BERTScore-F of 51.28%, leading to relative increments of 19.64%, 42.61%, and 11.83%, respectively.
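The relative increments quoted above follow directly from the per-version scores; as a quick check:

```python
def relative_gain(new: float, old: float) -> float:
    """Percentage improvement of `new` over `old`."""
    return 100.0 * (new - old) / old

# v2.0 vs v1.0 scores: (BLEU-1, ROUGE-1-F, BERTScore-F)
pairs = [(47.13, 39.39), (40.16, 28.16), (57.35, 51.28)]
print([round(relative_gain(n, o), 2) for n, o in pairs])
# [19.65, 42.61, 11.84]
```

This reproduces the quoted increments of 19.64%, 42.61%, and 11.83% up to rounding convention.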
Table 4: Open Vocabulary EEG-to-Text decoding model evaluation on ZuCo datasets by each subject.
Table 5: Open Vocabulary EEG-to-Text decoding examples on ZuCo unseen test sentences, with and without GPT-4 sentence
refinement.
(1) Ground truth An amateurish, quasi-improvised acting exercise shot on ugly digital video.
Prediction interesting actor, un-religiousprovised film performance, through a, video.
Prediction + GPT-4 Interesting actor, un-religious, improvised film performance, through a video.
(2) Ground truth Viewed as a comedy, a romance, a fairy tale, or a drama, there’s nothing remotely
triumphant about this motion picture.
Prediction the from a kind of it satire, and love tale, and a love, it’s a quite funny about it film picture.
Prediction + GPT-4 From a kind of satire, it’s a love tale and quite a funny film picture about love.
(3) Ground truth It’s solid and affecting and exactly as thought-provoking as it should be.
Prediction ’s a, well. it what it-provoking as the sounds be.
Prediction + GPT-4 Well, it’s as provoking as it sounds, what a be.
(4) Ground truth It’s a head-turner – thoughtfully written, beautifully read and, finally, deeply humanizing.
Prediction s a greatyscing, a to crafted, well acted, well most, a moving..
Prediction + GPT-4 It’s a great, most moving, well-crafted and well-acted scene.
(5) Ground truth “The Kid Stays in the Picture” is a great story, terrifically told by the man who wrote
it but this Cliff Notes edition is a cheat.
Prediction The movie”ays in the House” is a film movie about andally funny by a young
who wrote it. also ish version is a little.
Prediction + GPT-4 ”The movie, ’Days in the House’, is a film about a young man who wrote it.
It’s also randomly funny. The British version is a little different.
(6) Ground truth Fans of the TV series will be disappointed, and everyone else will be slightly bored.
Prediction of the film series will recognize familiar to but the will will be happy disappointed.
Prediction + GPT-4 of the film series will recognize familiar to but the will be happy disappointed.
(7) Ground truth Wedding feels a bit anachronistic
Prediction alting bells like little likeachronistic,
Prediction + GPT-4 alting bells like little likeachronistic.
(8) Ground truth But what’s nice is that there’s a casual intelligence that permeates the script.
Prediction he’s most about that it’s a sense, to’sates the film.
Prediction + GPT-4 He’s most about that. It’s a sense to states the film.
(9) Ground truth An important movie, a reminder of the power of film to move us and to make us examine our values.
Prediction interesting part about but must of the importance of the to shape people. of make us think our lives.
Prediction + GPT-4 interesting part about but must of the importance of the to shape people. of make us think our lives.
(10) Ground truth Jeb Bush was born in Midland, Texas, where his father was running an oil drilling company.
Prediction uan Bush was born in Newland, Texas, and his father was a a insurance company company.
Prediction + GPT-4 Juan Bush was born in Newland, Texas, and his father was an insurance company owner.