
Deep Representation Learning for Open Vocabulary Electroencephalography-to-Text Decoding
Hamza Amrani, Daniela Micucci, Paolo Napoletano
University of Milano - Bicocca, Milan, Italy
{hamza.amrani, daniela.micucci, paolo.napoletano}@unimib.it
arXiv:2312.09430v1 [eess.SP] 15 Nov 2023

Abstract

Previous research has demonstrated the potential of using pre-trained language models for decoding open vocabulary Electroencephalography (EEG) signals captured through a non-invasive Brain-Computer Interface (BCI). However, the impact of embedding EEG signals in the context of language models and the effect of subjectivity remain unexplored, leading to uncertainty about the best approach to enhance decoding performance. Additionally, current evaluation metrics used to assess decoding effectiveness are predominantly syntactic and do not provide insights into the comprehensibility of the decoded output for human understanding. We present an end-to-end deep learning framework for non-invasive brain recordings that brings modern representational learning approaches to neuroscience. Our proposal introduces the following innovations: 1) an end-to-end deep learning architecture for open vocabulary EEG decoding, incorporating a subject-dependent representation learning module for raw EEG encoding, a BART language model, and a GPT-4 sentence refinement module; 2) a more comprehensive sentence-level evaluation metric based on the BERTScore; 3) an ablation study that analyses the contributions of each module within our proposal, providing valuable insights for future research. We evaluate our approach on two publicly available datasets, ZuCo v1.0 and v2.0, comprising EEG recordings of 30 subjects engaged in natural reading tasks. Our model achieves a BLEU-1 score of 42.75%, a ROUGE-1-F of 33.28%, and a BERTScore-F of 53.86%, outperforming the previous state-of-the-art methods by 3.38%, 8.43%, and 6.31%, respectively.

Introduction

The integration of deep learning into neuroscience is advancing rapidly. Over the past decades, Brain-Computer Interfaces (BCIs) have made significant improvements in decoding natural language from brain recordings to restore communication to people who have lost the ability to speak (Willett et al. 2021; Moses et al. 2021). Although effective, these approaches require invasive neurosurgery, making them difficult for most other uses.

Decoding methods that use non-invasive recordings could be more widely adopted, offering significant potential for application in both restorative and augmentative applications. Non-invasive brain recordings can capture multiple types of linguistic information (Huth et al. 2016; Broderick et al. 2018; Caucheteux and King 2022), but previous attempts to use this information have been limited to decoding sentences and words in small closed vocabularies (Pereira et al. 2018; Dash, Ferrari, and Wang 2020; Moses et al. 2021), not clarifying whether current non-invasive recordings have the spatial and temporal resolution necessary for decoding natural language. In addition, existing approaches cannot decode semantically close words.

Interestingly, previous works (Gauthier and Ivanova 2018; Caucheteux and King 2022) demonstrate that the human brain encodes language into higher-dimensional semantic representations. This is similar to how modern pre-trained language models, such as BERT (Devlin et al. 2018), BART (Lewis et al. 2019), T5 (Raffel et al. 2020), and GPT-4 (OpenAI 2023), encode words into contextualized semantic embedded representations in Natural Language Processing (NLP). Thanks to their transfer learning abilities, diverse recent NLP downstream tasks, such as sequence classification, text generation, and question answering, have reached substantial improvements. Likewise, various studies (Wang and Ji 2022; Wang et al. 2023; Tang et al. 2023) experimented with combining brain signal decoding with NLP models to produce semantic brain-embedded representations. They demonstrate the ability of NLP models to extract semantic features that capture the meaning of input brain recordings.

The study by Wang et al. (Wang and Ji 2022) is the first to prove the potential of employing pre-trained language models, such as BART, to decode open vocabulary Electroencephalography (EEG) signals captured through a non-invasive Brain-Computer Interface (BCI). The processing pipeline suggested by the authors takes as input the EEG features from the ZuCo dataset (Hollenstein et al. 2018, 2019). These pre-computed EEG features are subsequently adjusted using a transformer encoder before being input into the BART model. The BART model is then fine-tuned to effectively suit the task of decoding EEG-to-Text. Recently, Duan et al. (Duan et al. 2023) presented DeWave, a framework that allows for the decoding of brain dynamics into natural language without the need for eye-tracking fixations or event markers. DeWave uses a quantized variational encoder to derive discrete codex encodings and align them with a pre-trained language model. DeWave has shown superior performance compared to the state-of-the-art methods, surpassing the baseline by 3.06% and 1.9%, respectively, achieving 41.35% BLEU-1 and 30.69% ROUGE-F on the ZuCo dataset.
Figure 1: The workflow of the proposed method involves several steps. Firstly, the raw EEG signals corresponding to each word are input into the Brain module. This module extracts subject-dependent features, which are subsequently utilized by a Language Module based on BART, suitably trained for sentence generation. The resulting sentence is further refined using GPT-4 APIs to produce the final output. In the example, the ground truth is: He is a prominent member of the Bush family, the younger brother of President George W. Bush; the final sentence predicted by our model is: He was a member of the American Bush family, brother of President George W. Bush. Bold font refers to the exact match between the ground truth and the estimated sentence.

Nevertheless, the unexplored impact of embedding EEG signals within language models raises questions about the optimal approach for enhancing decoding performance. Furthermore, while the analysis of EEG signals is a valuable way of studying brain activity, the interpretation of these signals can indeed be influenced by subjectivity (Jeng et al. 2020). A recent study by Feng et al. (Feng, Feng, and Qin 2023) on the EEG-to-Text decoding task argued that this task is considerably challenged by the EEG representation, which varies with individual subjects, and by the text representation, which is influenced by semantics. Lastly, current evaluation metrics that primarily focus on syntactic aspects do not adequately capture the semantics, resulting in limited comprehensibility.

In this paper, we present an end-to-end deep learning framework for non-invasive brain recordings that uses pre-trained language models for open vocabulary EEG-to-text decoding. Firstly, our end-to-end deep learning architecture for open vocabulary EEG decoding incorporates a representation learning module for raw EEG encoding, a language modeling module based on BART (Lewis et al. 2019), and a GPT-4 (OpenAI 2023) refinement module, enhancing the comprehensibility of the generated sentences. The representation learning module includes a subject layer, which permits taking into account the subjectivity of EEG signals, and a multi-layer transformer encoder that extracts latent brain representations, which are then aligned to language token embeddings. Second, we use the BERTScore (Zhang et al. 2019) in the evaluation, which incorporates semantic judgment at the sentence level, resulting in a more comprehensive evaluation that is closer to human perception. Thirdly, we conducted an ablation study to analyze and distinguish the contributions of each module within our proposal, providing valuable insights for future research work.

To demonstrate the efficacy of our approach, comprehensive evaluations are conducted on two publicly available datasets, ZuCo v1.0 and v2.0 (Hollenstein et al. 2018, 2019), comprising EEG recordings from 30 subjects actively engaged in natural reading tasks. The results achieved by our proposal, on previously unseen sentences, are a BLEU-1 score of 42.75%, a ROUGE-1-F (Lin 2004) of 33.28%, and a BERTScore-F of 53.86%, surpassing the previous state-of-the-art results by 3.38%, 8.43%, and 6.31%, respectively. Our code is available for public access at: https://github.com/hamzaamrani/EEG-to-Text-Decoding

Related Work

Related work on brain-to-speech and brain-to-text decoding can be categorized into three methods by the features they capture: motor imagery based, overt speech based, and inner speech based. Different BCI devices have been explored, encompassing Electroencephalography (EEG), Electrocorticography (ECoG), and functional Magnetic Resonance Imaging (fMRI).

Motor imagery based systems, such as, for instance, point-and-click (Jarosiewicz et al. 2015; Pandarinath et al. 2017; Lee et al. 2018) and imaginary handwriting (Willett et al. 2021), have high accuracy but moderately low typing rate. Overt speech based methods for decoding or synthesizing speech show a faster communication rate. These methods require subjects to physically speak during neural recording (Anumanchipalli, Chartier, and Chang 2019; Makin, Moses, and Chang 2020) or to imagine the physical pronunciation of the sentence (Moses et al. 2021; Willett et al. 2023). These approaches make the decoding language-dependent, since the same concept may have completely distinct pronunciations in different languages. Inner speech based approaches try to address language articulation dependencies by decoding language from imagined speech and read text (Brigham and Kumar 2010; Panachakel and Ramakrishnan 2021; Wang and Ji 2022; Nieto et al. 2022; Défossez et al. 2023; Tang et al. 2023).

A major limitation for most of the approaches discussed is the constraint of using small closed vocabularies, with a low and limited number of unique words (Pereira et al. 2018; Dash, Ferrari, and Wang 2020; Moses et al. 2021). In addition, most current approaches (Willett et al. 2021, 2023; Défossez et al. 2023) for language communication use invasive devices (such as ECoG) or less accessible non-invasive devices (such as fMRI). This makes it challenging to collect large-scale datasets and implement approaches to help people with paralysis who can no longer speak. Nevertheless, recent studies attempt to decode inner speech by using both open vocabularies and non-invasive devices (Wang and Ji 2022; Défossez et al. 2023; Duan et al. 2023).

Our work opens the doors for similar studies of inner speech brain-to-text decoding. We investigate the representation learning of EEG signals, the inter-subject variability, the human judgment at the sentence level of generated sentences, and the use of pre-trained language models.

Method

We aim to decode neural activity from a time series of high-dimensional brain signals recorded with non-invasive electroencephalography during the natural reading of English sentences. We first define the general task of open vocabulary EEG-to-Text decoding and then introduce the proposed end-to-end architecture.

Open Vocabulary EEG-to-Text Decoding

Let us define a sequence of word-level raw EEG signals as X ∈ R^{C×T}, with C the number of EEG channels and T the number of time steps. These EEG signals are a reflection of the recorded brain activity for a specific subject denoted as s, drawn from the set S consisting of various distinct subjects. An EEG-decoding task is the task of predicting the corresponding text sentence Y in a sequence-to-sequence framework. Each text sentence Y is composed of English tokens y_n ∈ V from an open vocabulary V. During the training phase, the EEG-subject-Text pairs can come from various subjects and various categories of reading materials.

Thus, a supervised EEG-to-Text decoding task consists in finding a decoding function f : R^{C×T} × S → V, such that f predicts Y given X and s. We denote by Y = f(X, s) the decoded/predicted text sentence from the brain signals. Searching for f, the task is to maximize the probability of the decoded text sentence Y:

    p(Y | X) = ∏_{n=1}^{N} p(y_n ∈ V | X, y_{<n})        (1)

where N is the length of the text sentence Y, and y_n is the n-th token of Y.

Proposed Architecture

An overview of the proposed architecture is given in Figure 1 (refer to Appendix A for a detailed overview of the architecture). It is composed of two main components: 1) a Brain module that implements a representation learning approach for EEG encoding; and 2) a Language Modeling module based on BART to produce EEG-to-Text sentences and on GPT-4 for sentence-level refinement. The training process is composed of two stages. An overview of the end-to-end architecture is presented in Figure 2, where dashed boxes correspond to the modules of the architecture that undergo training, while solid boxes represent the module that remains untrained. We start by detailing the specifics of the training stages. Then we offer a more comprehensive breakdown of each module included in our architecture.

Training Stage 1 We initiate training with the Brain module: word-level EEG signals are aligned with word tokens, as encoded by a locked, pre-trained BART language model, utilizing a Mean Square Error (MSE) loss. This stage incorporates a learnable features module designed to account for EEG encoding and subjectivity. The outcome of this training stage yields EEG subject-dependent features. The alignment procedure is done by mapping the learned EEG representation Z into the BART token embeddings BART_enc^te, using the MSE regression loss L_MSE(BART_enc^te, Z):

    min_{f_brain} L_MSE(BART_enc^te, f_brain(X))        (2)
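For illustration, a minimal sketch of this alignment stage is given below. It assumes the HuggingFace implementation of BART and a Brain module exposing the interface described above; the checkpoint name, the naive word-to-token pairing, and all variable and function names are our assumptions, not the authors' released code.

```python
# Sketch of Training Stage 1: align Brain-module outputs with the locked BART
# token embeddings via an MSE loss. Illustrative only.
import torch
import torch.nn.functional as F
from transformers import BartTokenizerFast, BartModel

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-large")
bart = BartModel.from_pretrained("facebook/bart-large")
for p in bart.parameters():            # lock the pre-trained language model
    p.requires_grad = False

def stage1_loss(brain_module, raw_eeg, subject_ids, words):
    # raw_eeg: word-level EEG segments; subject_ids: per-sample subject index
    z = brain_module(raw_eeg, subject_ids)                   # (batch, n_words, d_model)
    tokens = tokenizer(words, return_tensors="pt",
                       is_split_into_words=True, padding=True)
    with torch.no_grad():
        te = bart.encoder.embed_tokens(tokens["input_ids"])  # BART token embeddings
    te = te[:, :z.size(1), :]          # naive word/token pairing for illustration
    return F.mse_loss(z, te)           # L_MSE(BART_enc^te, f_brain(X))
```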
Training Stage 2 Moving on, the subsequent step involves fine-tuning a pre-trained language model based on BART, aimed at generating word sequences through the utilization of a cross-entropy loss. As in Wang et al. (Wang and Ji 2022), we use the mapped embedded brain representation Z directly as initial word embeddings to feed into the pre-trained encoder-decoder language model BART (Lewis et al. 2019). The high-level idea here is that we consider each embedded EEG representation as a word-level representation, and leverage a pre-trained language model to decode to real human language (English), like traditional machine translation tasks. Then, the last hidden states from the BART decoder are fed into a multi-layer perceptron (MLP) to generate English tokens y_n from the BART vocabulary V.

During the training, the objective is to minimize the text reconstruction cross-entropy loss, defined as follows:

    L_rec = − Σ_{n=1}^{N} log p(y_n ∈ V)        (3)
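A minimal sketch of this stage is shown below, assuming HuggingFace's BartForConditionalGeneration accepts the aligned EEG representations through its `inputs_embeds` argument; variable names and the training-loop details are illustrative assumptions.

```python
# Sketch of Training Stage 2: feed aligned EEG representations into BART and
# minimize the text-reconstruction cross-entropy. Illustrative only.
import torch
from transformers import BartTokenizerFast, BartForConditionalGeneration

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

def stage2_step(brain_module, raw_eeg, subject_ids, target_sentences, optimizer):
    z = brain_module(raw_eeg, subject_ids)            # (batch, n_words, 1024); Brain module frozen here
    labels = tokenizer(target_sentences, return_tensors="pt", padding=True).input_ids
    labels[labels == tokenizer.pad_token_id] = -100   # ignore padding in the loss
    out = model(inputs_embeds=z,
                attention_mask=torch.ones(z.shape[:2], dtype=torch.long),
                labels=labels)                        # cross-entropy over the BART vocabulary
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```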

Figure 2: Overview of the proposed end-to-end architecture for open vocabulary EEG-to-Text decoding. Firstly, a sequence of
word-level raw EEG signals is fed to the Brain module to extract deep-embedded representations for raw EEG encoding. Then,
we use a Language Modeling (LM) module to generate EEG-to-Text sentences by leveraging the pre-trained language model
BART.

Learnable Features Module This module is included in the Brain module and it is used for extracting subject-dependent brain features from the raw EEG signals. Given a sequence of word-level raw EEG signals X = {x_0, x_1, ..., x_M} ∈ R^{C×T} and the corresponding subject s ∈ S, we first use a deep neural network f_brain to get the latent subject-dependent brain representation Z = {z_0, z_1, ..., z_M} = f_brain(X). This architecture (Figure 3) consists of (1) a learnable EEG feature block, followed (2) by a subject layer to leverage inter-subject variability, which is input to (3) a multi-layer transformer encoder named BTE (Brain Transformer Encoder), and then to (4) a multi-layer perceptron.

The brain data is first fed to a bi-directional Gated Recurrent Unit (GRU) (Cho et al. 2014), which reads the multivariate time series input in both forward and backward directions to extract learnable EEG features. The use of the GRU allows us to dynamically handle the different lengths of the word-level raw EEG signals. We then apply a fully-connected layer to the concatenated forward and backward outputs. Similarly to (Défossez et al. 2023), we then add a 1x1 point-wise convolution (with a kernel size of 1) without activation and a number D of output channels. To leverage inter-subject variability, we learn a row vector r_s ∈ R^D for each subject s ∈ S and apply it along the channel dimension. We then apply a multi-layer transformer encoder (Vaswani et al. 2017) BTE with L layers, each with H attention heads and intermediate hidden dimension d_h. The inputs to the first layer, BTE_in^0, are produced using a weight matrix W_in ∈ R^{d_h×l} and combined with a learnable 1D position embedding P (Dosovitskiy et al. 2020), which is randomly initialized. Each layer applies self-attention with causal attention masking and a feed-forward layer to the input, with layer normalization (Ba, Kiros, and Hinton 2016) and dropout (Srivastava et al. 2014) being applied after. The outputs BTE_out^j of the j-th layer become the inputs to the (j+1)-th layer. Then, the final outputs BTE_out^L are fed into a residual MLP network, composed of two fully connected layers, obtaining the latent brain representations z_m. As we will demonstrate subsequently in the ablation study, opting to process the raw EEG signals using a recurrent neural network, rather than directly handling pre-computed features x_n as performed by Wang et al. (Wang and Ji 2022), facilitates the extraction of subject-dependent nuances present in the brain recordings. These distinctive characteristics would otherwise remain entirely overlooked.

Figure 3: The Learnable features module consists of (1) a learnable EEG feature block, (2) a subject layer to leverage inter-subject variability, (3) a multi-layer transformer (Brain Transformer Encoder), and (4) an MLP.
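A condensed PyTorch sketch of such a module is given below. The class name, the pooling over time, and other minor choices are our assumptions; layer sizes follow the Architecture Details section, but this is not the authors' code.

```python
# Condensed sketch of the Learnable Features Module:
# bi-GRU -> FC -> 1x1 conv -> subject vector -> Brain Transformer Encoder -> residual MLP.
import torch
import torch.nn as nn

class BrainModule(nn.Module):
    def __init__(self, n_channels=105, n_subjects=30, d_conv=64,
                 d_model=1024, n_layers=12, n_heads=8, d_ff=4096):
        super().__init__()
        self.gru = nn.GRU(n_channels, 512, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 512, 1024)                  # concatenated directions
        self.pointwise = nn.Conv1d(1024, d_conv, kernel_size=1)   # 1x1 conv, no activation
        self.subject = nn.Embedding(n_subjects, d_conv)     # one row vector r_s per subject
        self.proj = nn.Linear(d_conv, d_model)              # W_in
        self.pos = nn.Parameter(torch.randn(1, 512, d_model) * 0.02)  # learnable 1D positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads, d_ff,
                                           activation="gelu", batch_first=True)
        self.bte = nn.TransformerEncoder(layer, n_layers)   # Brain Transformer Encoder
        self.mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                 nn.Linear(d_model, d_model))

    def forward(self, eeg_words, subject_ids):
        # eeg_words: (batch, n_words, time, channels) word-level raw EEG segments
        b, w, t, c = eeg_words.shape
        h, _ = self.gru(eeg_words.reshape(b * w, t, c))     # (b*w, t, 1024)
        h = self.fc(h.mean(dim=1)).reshape(b, w, -1)        # pool over time (simplification)
        h = self.pointwise(h.transpose(1, 2)).transpose(1, 2)        # (b, w, d_conv)
        h = h * self.subject(subject_ids).unsqueeze(1)      # apply r_s along channel dim
        h = self.proj(h) + self.pos[:, :w, :]
        mask = torch.triu(torch.ones(w, w, dtype=torch.bool), diagonal=1)  # causal mask
        z = self.bte(h, mask=mask)
        return z + self.mlp(z)                              # residual MLP head

# Usage: z = BrainModule()(torch.randn(2, 8, 50, 105), torch.tensor([0, 3]))
```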
Sentence Refinement during Inference During the inference phase, we propose the use of the pre-trained language model GPT-4 (OpenAI 2023), via APIs, on top of the generated text sentence Y. This results in significant improvements in text comprehensibility, as well as a reduction in grammatical errors and repetitive words, enhancing the utility and effectiveness of the generated text sentence. The prompt used for the refinement is as follows:

As a text reconstructor, your task is to restore corrupted sentences to their original form while making minimum changes. You should adjust the spaces and punctuation marks as necessary. Do not introduce any additional information. If you are unable to reconstruct the text, respond with [False]. Reconstruct the following text: [text sentence Y].

Experiments

Data

We use the Zurich Cognitive Language Processing Corpus (ZuCo) (Hollenstein et al. 2018, 2019) datasets, which contain simultaneous electroencephalography and eye-tracking (ET) data recorded from natural reading tasks. The reading tasks include Normal Reading (NR) and Task-Specific Reading (TSR).
Table 1: ZuCo datasets statistics for each reading task. NR stands for Normal Reading, while TSR stands for Task-Specific Reading.

Reading Task    #Sentences    #Train    #Val    #Test
NR v1.0         300           3,609     467     456
NR v2.0         349           2,645     343     350
TSR v1.0        407           4,456     522     601

The reading corpus of ZuCo is from movie reviews (Socher et al. 2013) and Wikipedia articles. We used data from all the subjects in ZuCo v1.0 and v2.0 (12 and 18, respectively). For the EEG recordings, high-density data were recorded at a sampling rate of 500 Hz with a band-pass of 0.1 to 100 Hz, using a 128-channel EEG Geodesic Hydrocel system (Electrical Geodesics). The recording reference was set at electrode Cz. We follow the steps of Hollenstein et al. (Hollenstein et al. 2018, 2019) to perform data pre-processing on the raw EEG signals, leading to 105 EEG channels from the scalp recordings.

In this paper, we use concatenated sequences of word-level raw EEG signals, which were synchronized with ET fixations. We split each reading task's data (by unique sentences) into train, validation, and test sets (80%, 10%, 10%), as done by Wang et al. (Wang and Ji 2022). The sentences in the test set are totally unseen. Table 1 shows the statistics of each reading task's data. Please refer to Appendix B for a detailed description of the electrodes used.
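The sentence-level split can be illustrated with the short sketch below; splitting by unique sentence guarantees that no test sentence, from any subject, is seen during training. The data structure and helper name are hypothetical.

```python
# Illustrative 80/10/10 split by unique sentence (not the authors' code).
import random

def split_by_unique_sentence(samples, seed=42):
    # samples: list of dicts like {"sentence": str, "subject": str, "eeg": ...}
    sentences = sorted({s["sentence"] for s in samples})
    random.Random(seed).shuffle(sentences)
    n = len(sentences)
    train_s = set(sentences[: int(0.8 * n)])
    val_s = set(sentences[int(0.8 * n): int(0.9 * n)])
    train = [s for s in samples if s["sentence"] in train_s]
    val = [s for s in samples if s["sentence"] in val_s]
    test = [s for s in samples if s["sentence"] not in train_s | val_s]
    return train, val, test
```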
Training Details

Architecture Details For the brain module, we set the GRU layer size to 512 and the fully connected layer to 1024. The 1D convolution maps to 64 channels and the 1D subject vector size is set to 64. The BTE has 12 layers and 8 attention heads, with an intermediate hidden dimension of 4096 and GELU activations (Hendrycks and Gimpel 2016). The last hidden states of the BTE are projected onto a feature space of 1024. Then, we use the large version of BART, with 12 layers for the encoder and decoder, 8 attention heads, and an intermediate hidden dimension of 4096. For GPT-4, we use OpenAI's APIs and the model version gpt-4.

Optimization Settings During training, we use the SGD optimizer with a cyclical learning rate set with 5e-7 and 5e-5 as initial and upper values to update the model parameters. The batch size is set to 1 during the mapping between brain and word embeddings, and then 8 during the training phase. The number of epochs is set to 25. During the training phase, we freeze the brain module weights. During inference, we use the model parameters of the best checkpoint based on the performance on the validation set.
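A minimal sketch of this optimizer setup is shown below, assuming torch.optim's CyclicLR scheduler; momentum and step sizes are illustrative assumptions rather than reported values.

```python
# Sketch of SGD with a cyclical learning rate between 5e-7 and 5e-5.
import torch

def make_optimizer(model, steps_per_epoch=1000):
    optimizer = torch.optim.SGD(model.parameters(), lr=5e-7, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CyclicLR(
        optimizer, base_lr=5e-7, max_lr=5e-5,
        step_size_up=steps_per_epoch // 2, mode="triangular")
    return optimizer, scheduler

# Inside the training loop, after each batch:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```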
For our architecture implementation, we use the PyTorch (https://github.com/pytorch/pytorch) and Transformers (HuggingFace, https://github.com/huggingface/transformers) libraries. Both Stage 1 and Stage 2 were trained on a workstation equipped with Ubuntu 22.04, 32 GB of RAM and 2 Nvidia GeForce GTX 1070 GPUs with 8 GB of memory.

Evaluation In our experiments, we use the BLEU and ROUGE metrics (Papineni et al. 2002; Lin 2004) to measure the number of words shared by two sequences. However, lexical congruence may not fully encapsulate semantic similarity, due to lexical variations denoting similar meanings. To this end, we use BERTScore (Zhang et al. 2019), an approach that uses machine learning to capture the semantic similarity between two sequences by leveraging advanced language representations derived from the BERT model (Devlin et al. 2018). BERTScore allows the integration of semantic similarity at the sentence level, leading to a more comprehensive evaluation that aligns with human perception.

Results

Improving Decoding Accuracy

We compared our architecture with the current state-of-the-art models by Wang et al. (Wang and Ji 2022) and Duan et al. (Duan et al. 2023). As shown in Table 2, our proposal achieves a BLEU-1 score of 42.75%, a ROUGE-1-F of 33.28%, and a BERTScore-F of 53.86%, showing an improvement over the state of the art by 3.38%, 8.43%, and 6.31%, respectively. For larger n-gram evaluation, we obtain BLEU-{2,3,4} scores of 25.90%, 15.66%, and 9.56% respectively, leading to an increase of 7.24%, 12.5%, and 16.30%. Our decoding embeddings resulted in higher performance for each metric, demonstrating the positive impact of learning embedded EEG representations and exploiting inter-subject variability. In Appendix C we report the obtained results of our architecture for each subject. The results show a significant difference between v1.0 and v2.0 participants. On average, v2.0 participants outperform v1.0 participants by 19.64%, 42.61%, and 11.83% for BLEU-1, ROUGE-1-F, and BERTScore-F respectively.

In addition to numerical results, we report decoding examples of generated EEG-to-Text sentences compared to the ground truth and the state of the art, with and without GPT-4 sentence refinement (Table 3). We observe that our model is sometimes able to precisely capture named entities that do not exist in the training set. "George W. Bush" in (1) and "Puerto Rico" in (2) are correctly decoded, while "presidential election" in (3) is incorrectly decoded. Compared to (Wang and Ji 2022), our model results in significant improvements in text comprehensibility, as well as a reduction in grammatical errors and repetitive words, as shown in example (4). Please refer to Appendix D for additional decoding examples of generated EEG-to-Text sentences.

The complexity of open vocabulary EEG decoding tasks arises from the high dimensionality, inter-subjectivity, and variability of EEG data, coupled with the intrinsic difficulties associated with the language decoding capabilities of AI-based language models. Our improvements represent significant progress in overcoming these multiple challenges and suggest a promising direction for future research in non-invasive brain decoding.
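The sentence-level scores reported in Table 2 can be computed with standard Python packages; the sketch below assumes the nltk, rouge-score, and bert-score libraries, which the paper does not explicitly name, so treat it as one possible implementation.

```python
# Sketch of per-sentence BLEU-1, ROUGE-1-F, and BERTScore-F computation.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from bert_score import score as bert_score

def evaluate_pair(reference: str, prediction: str):
    smooth = SmoothingFunction().method1
    bleu1 = sentence_bleu([reference.split()], prediction.split(),
                          weights=(1, 0, 0, 0), smoothing_function=smooth)
    rouge1 = rouge_scorer.RougeScorer(["rouge1"]).score(reference, prediction)["rouge1"]
    P, R, F = bert_score([prediction], [reference], lang="en")
    return {"BLEU-1": bleu1,
            "ROUGE-1-F": rouge1.fmeasure,
            "BERTScore-F": F.item()}
```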
Table 2: Open Vocabulary EEG-to-Text decoding model evaluation on the ZuCo datasets. We compare our architecture (without GPT-4 sentence refinement, since it is used only during the inference phase) with the current state of the art by using three distinct metrics: BLEU-N (N = 1, 2, 3, 4), ROUGE-1 (Precision, Recall, and F1 scores), and BERTScore (Precision, Recall, and F1 scores). We also report ablations and the hypothetical upper limit for BART with fixation words when no errors are made in mapping EEG signals to token words. Bold numbers indicate the best result; underlined numbers indicate the second-best result.

Method    BLEU-1    BLEU-2    BLEU-3    BLEU-4    ROUGE-1-R    ROUGE-1-P    ROUGE-1-F    BERTScore-P    BERTScore-R    BERTScore-F    (all values in %, higher is better)
(Wang and Ji 2022) 40.1 23.1 12.5 6.8 28.8 31.7 30.1 48.84 52.71 50.66
(Duan et al. 2023) 41.35 24.15 13.92 8.22 28.82 33.71 30.69 - - -
Our Architecture 42.75 25.90 15.66 9.56 30.60 36.71 33.28 52.62 55.26 53.86
w/o subject layer 41.51 24.41 14.31 8.38 29.22 35.40 31.92 51.09 53.93 52.43
w/o language alignment 41.30 24.50 14.14 8.40 29.16 35.76 32.00 50.82 53.62 52.16
w/o BTE 35.51 20.51 12.61 8.98 25.62 26.38 25.83 46.44 50.52 48.34
w/o BART finetuning 28.50 14.35 7.01 3.38 21.32 23.07 22.03 39.67 47.90 43.13
BART with fixation words 72.45 62.16 53.80 46.84 67.16 75.25 70.65 66.72 74.47 69.89

Table 3: Open Vocabulary EEG-to-Text decoding examples on ZuCo unseen test sentences. We report both predictions from
our model, with and without GPT-4 sentence refinement. (1-3) are in NR v1.0, v2.0. (4) is in SR v1.0. Bold means exact match,
Italic indicates semantic similarity. Underline denotes error match.

(1) Ground truth He is a prominent member of the Bush family, the younger brother of President George W. Bush...
(Wang and Ji 2022) was a former member of the American family, and son brother of President George W. Bush...
Prediction was the member member of the American family. and younger brother of President George W. Bush
Prediction + GPT-4 He was a member of the American Bush family, brother of President George W. Bush. . .
(2) Ground truth Raymond Arrieta (born March 26, 1965 in San Juan, Puerto Rico) is considered by many to be one
of Puerto Rico’s greatest comedians.
(Wang and Ji 2022) mond wasaga,19 in 17, 18) New Francisco, Puerto Rico) is a one many to be the of the Rico’s greatest
poets.
Prediction mond wasaga (born April 17, 1946) New Francisco, Puerto Rico) is a one many to be the of the Rico’s most artists.
Prediction + GPT-4 Ramon Wasaga (born April 17, 1946, in New Francisco, Puerto Rico) is one of the many to be considered
as one of the most prominent artists of Puerto Rico.
(3) Ground truth Following the 1980 presidential election, Bush and his family moved to Miami-Dade County, Florida.
(Wang and Ji 2022) the deaths election, the was his wife moved to California,Dade County, Florida
Prediction the wars election, Bush was his wife moved to Florida,Dade County, Florida.
Prediction + GPT-4 After the war’s election, Bush and his wife moved to Dade County, Florida.
(4) Ground truth It’s not a particularly good film, but neither is it a monsterous one.
(Wang and Ji 2022) was a a bad good story, but it is it bad bad. one.
Prediction ’s a a bad good movie, but it is it bad bad. one.
Prediction + GPT-4 It’s a bad good movie, but is it a bad one.

Embedding Visualization We provide a visual comparison via t-distributed stochastic neighbor embedding (t-SNE) between the pre-calculated EEG features (Figure 4, left) as used by Wang et al. (Wang and Ji 2022), and the EEG embedded representations obtained by the proposed Brain module (Figure 4, right). Distinct colors refer to different subjects. Each dot represents a sentence. The red triangle represents the EEG embedded representations corresponding to the same sentence "With his interest in race cars, he formed a second company, the Henry Ford Company." We can observe that our learned EEG representations of sentences from the same subject are much more grouped compared with the pre-calculated EEG representations, denoting the capacity of our latent space to model EEG subjectivity.
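Such a visualization can be produced with scikit-learn; the sketch below is illustrative, and the input arrays and t-SNE hyperparameters are assumptions.

```python
# Sketch of the t-SNE visualization of sentence-level EEG embeddings.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_embeddings(sentence_embeddings: np.ndarray, subject_ids: np.ndarray):
    # sentence_embeddings: (n_sentences, d) pooled EEG representations
    coords = TSNE(n_components=2, perplexity=30, init="pca",
                  random_state=0).fit_transform(sentence_embeddings)
    plt.scatter(coords[:, 0], coords[:, 1], c=subject_ids, cmap="tab20", s=8)
    plt.title("t-SNE of EEG embedded representations (color = subject)")
    plt.show()
```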
Ablations

Our ablations highlight the importance of (1) the subject layer, (2) the language alignment, (3) the use of the Brain Transformer Encoder, and (4) the BART fine-tuning (Table 2). First, a model trained to generate EEG-to-Text sentences without the use of the subject layer achieves lower decoding accuracy on average across datasets, that is, about 1-1.5% lower than our model. While modest, these scores show the positive effect of leveraging inter-subject variability. Second, we show the effect of using language alignment with the MSE loss. The results show small differences, especially in the BLEU and ROUGE scores; for BERTScore we see small improvements. Thirdly, sentence generation without the Brain Transformer Encoder shows a significant drop in performance compared to our model: for BLEU-1 the decrease is 7.24%, while for BLEU-2 it is 5.39%, and ROUGE-1-F and BERTScore-F lose 7.45% and 5.52%, respectively.

Figure 4: t-SNE visualization of EEG embedded representations of sentences in the training set: (a) the original EEG representations and (b) those generated by the Brain module of our architecture. Distinct colors mean different subjects. Each dot represents a sentence. The red triangle represents the EEG embedded representations corresponding to the same sentence "With his interest in race cars, he formed a second company, the Henry Ford Company".

We verified that the Brain Transformer Encoder provides higher decoding performance. Finally, to test whether our model effectively leverages the pre-trained BART model, we trained it without fine-tuning the BART model weights. As reported, decoding performance decreases notably, by up to 14.25%. This drop confirms the benefit of fine-tuning the BART model.

Then, we also show the hypothetical upper limit for EEG-to-Text decoding when no errors are made in mapping EEG signals to token words. Separately from our model, we fine-tuned BART on only the eye-tracking fixation words, without considering the raw EEG signals, to reconstruct the original text sentence. It outperforms our proposed architecture by about 30% in terms of BLEU-1, 37% in terms of ROUGE-1-F, and 15% in terms of BERTScore-F. The obtained results reveal the existence of two challenges within the EEG-to-Text decoding task. The initial challenge pertains to the model's capacity to establish a dependable EEG-feature representation for the word tokens. The subsequent challenge involves the faithful reconstruction of the sentence. This experiment highlights that, between these two challenges, the foremost one is undoubtedly the ability to discern an efficacious representation of the EEG signals. This observation thereby points towards the direction of future research efforts.

Ethical Implications

While the recent advancements in utilizing brain-computer interfaces and artificial intelligence to decode neural activity into text hold significant potential in aiding individuals with communication deficits, ethical considerations and societal impact must be carefully addressed. The scientific community must maintain vigilance and ensure that such systems are not employed without the informed and declared consent of the participants. Fortunately, the current nature of acquiring EEG and MEG (Magnetoencephalography) signals requires participant awareness, unlike other biomarkers such as DNA or facial features. Additionally, the susceptibility of these signals to corruption by muscle movements, such as teeth clenching or eye blinks, provides a possible precaution against unauthorized acquisition and misuse. Furthermore, it is critical to acknowledge the potential risk associated with the high subjectivity of neural signals, even in the absence of participant awareness, which could compromise mental privacy.

We strongly believe that promoting and encouraging open science practices remains essential for responsibly assessing the potential risks and benefits associated with BCI and AI technologies in this domain.

Conclusions and Future Works

In this paper, we present an end-to-end deep learning framework for the open vocabulary EEG-to-Text decoding task. By leveraging a subject-dependent representation learning module, a pre-trained BART language model, and a GPT-4 sentence refinement module, this study offers a comprehensive solution that not only enhances decoding performance but also delves into the human comprehensibility of the decoded output. The incorporation of the BERTScore as an evaluation metric has enabled a more holistic assessment, capturing not only syntactic accuracy but also taking into account human understanding at the sentence level. Moreover, the conducted ablation study permitted us to understand the contribution of each component to the proposed architecture. This in-depth analysis not only validates the efficacy of each module but also provides a roadmap for further research, guiding the development of refined and optimized approaches in the future.
The empirical validation on two publicly available datasets demonstrates the effectiveness of the proposed architecture, achieving a BLEU-1 score of 42.75%, a ROUGE-1-F of 33.28%, and a BERTScore-F of 53.86%, outperforming the previous state-of-the-art results by 3.38%, 8.43%, and 6.31%, respectively. When looking at larger n-gram ratings (BLEU-2, 3, 4), there is an improvement of 7.24%, 12.5%, and 16.30%, respectively. Our results show that the use of raw EEG signals leads to improved results, demonstrating the effectiveness of modern representational learning approaches in neuroscience.

In summary, this research not only fills critical voids in the EEG decoding landscape but also shows the way for future investigations. By combining advanced neural network architectures with sophisticated evaluation methodologies, the study pushes the boundaries of EEG-to-text decoding and encourages continued innovation in the pursuit of more accurate and human-aligned results.

One future direction is to improve the quality of the generated embedded representations by taking into account inter-subject variability, so as to increase the ability of the model to generalize across individuals. Furthermore, ethical considerations need to be at the forefront as we move forward. Ensuring privacy, establishing clear guidelines for consent, and considering the potential long-term effects of this technology on users are critical.

Acknowledgement

This work was partially funded by the National Plan for NRRP Complementary Investments (PNC, established with the decree-law 6 May 2021, n. 59, converted by law n. 101 of 2021) in the call for the funding of research initiatives for technologies and innovative trajectories in the health and care sectors (Directorial Decree n. 931 of 06-06-2022) - project n. PNC0000003 - AdvaNced Technologies for Human-centrEd Medicine (project acronym: ANTHEM). This work reflects only the authors' views and opinions; neither the Ministry for University and Research nor the European Commission can be considered responsible for them.

References

Anumanchipalli, G. K.; Chartier, J.; and Chang, E. F. 2019. Speech synthesis from neural decoding of spoken sentences. Nature, 568(7753): 493-498.

Ba, J. L.; Kiros, J. R.; and Hinton, G. E. 2016. Layer normalization. arXiv preprint arXiv:1607.06450.

Brigham, K.; and Kumar, B. V. 2010. Imagined speech classification with EEG signals for silent communication: a preliminary investigation into synthetic telepathy. In 2010 4th International Conference on Bioinformatics and Biomedical Engineering, 1-4. IEEE.

Broderick, M. P.; Anderson, A. J.; Di Liberto, G. M.; Crosse, M. J.; and Lalor, E. C. 2018. Electrophysiological correlates of semantic dissimilarity reflect the comprehension of natural, narrative speech. Current Biology, 28(5): 803-809.

Caucheteux, C.; and King, J.-R. 2022. Brains and algorithms partially converge in natural language processing. Communications Biology, 5(1): 134.

Cho, K.; Van Merriënboer, B.; Bahdanau, D.; and Bengio, Y. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259.

Dash, D.; Ferrari, P.; and Wang, J. 2020. Decoding imagined and spoken phrases from non-invasive neural (MEG) signals. Frontiers in Neuroscience, 14: 290.

Défossez, A.; Caucheteux, C.; Rapin, J.; Kabeli, O.; and King, J.-R. 2023. Decoding speech perception from non-invasive brain recordings. Nature Machine Intelligence, 1-11.

Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.

Duan, Y.; Zhou, J.; Wang, Z.; Wang, Y.-K.; and Lin, C.-T. 2023. DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation. arXiv preprint arXiv:2309.14030.

Feng, X.; Feng, X.; and Qin, B. 2023. Semantic-aware Contrastive Learning for Electroencephalography-to-Text Generation with Curriculum Learning. arXiv preprint arXiv:2301.09237.

Gauthier, J.; and Ivanova, A. 2018. Does the brain represent words? An evaluation of brain decoding studies of language understanding. arXiv preprint arXiv:1806.00591.

Hendrycks, D.; and Gimpel, K. 2016. Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415.

Hollenstein, N.; Rotsztejn, J.; Troendle, M.; Pedroni, A.; Zhang, C.; and Langer, N. 2018. ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading. Scientific Data, 5(1): 1-13.

Hollenstein, N.; Troendle, M.; Zhang, C.; and Langer, N. 2019. ZuCo 2.0: A dataset of physiological recordings during natural reading and annotation. arXiv preprint arXiv:1912.00903.

Huth, A. G.; De Heer, W. A.; Griffiths, T. L.; Theunissen, F. E.; and Gallant, J. L. 2016. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532(7600): 453-458.

Jarosiewicz, B.; Sarma, A. A.; Bacher, D.; Masse, N. Y.; Simeral, J. D.; Sorice, B.; Oakley, E. M.; Blabe, C.; Pandarinath, C.; Gilja, V.; et al. 2015. Virtual typing by people with tetraplegia using a self-calibrating intracortical brain-computer interface. Science Translational Medicine, 7(313): 313ra179.

Jeng, P.-Y.; Wei, C.-S.; Jung, T.-P.; and Wang, L.-C. 2020. Low-dimensional subject representation-based transfer learning in EEG decoding. IEEE Journal of Biomedical and Health Informatics, 25(6): 1915-1925.

Lee, M.-H.; Williamson, J.; Won, D.-O.; Fazli, S.; and Lee, S.-W. 2018. A high performance spelling system based on EEG-EOG signals with visual feedback. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26(7): 1443-1459.

Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; and Zettlemoyer, L. 2019. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.

Lin, C.-Y. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, 74-81.

Makin, J. G.; Moses, D. A.; and Chang, E. F. 2020. Machine translation of cortical activity to text with an encoder-decoder framework. Nature Neuroscience, 23(4): 575-582.

Moses, D. A.; Metzger, S. L.; Liu, J. R.; Anumanchipalli, G. K.; Makin, J. G.; Sun, P. F.; Chartier, J.; Dougherty, M. E.; Liu, P. M.; Abrams, G. M.; et al. 2021. Neuroprosthesis for decoding speech in a paralyzed person with anarthria. New England Journal of Medicine, 385(3): 217-227.

Nieto, N.; Peterson, V.; Rufiner, H. L.; Kamienkowski, J. E.; and Spies, R. 2022. Thinking out loud, an open-access EEG-based BCI dataset for inner speech recognition. Scientific Data, 9(1): 52.

OpenAI. 2023. GPT-4 Technical Report. arXiv, abs/2303.08774.

Panachakel, J. T.; and Ramakrishnan, A. G. 2021. Decoding covert speech from EEG - a comprehensive review. Frontiers in Neuroscience, 15: 392.

Pandarinath, C.; Nuyujukian, P.; Blabe, C. H.; Sorice, B. L.; Saab, J.; Willett, F. R.; Hochberg, L. R.; Shenoy, K. V.; and Henderson, J. M. 2017. High performance communication by people with paralysis using an intracortical brain-computer interface. eLife, 6: e18554.

Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311-318.

Pereira, F.; Lou, B.; Pritchett, B.; Ritter, S.; Gershman, S. J.; Kanwisher, N.; Botvinick, M.; and Fedorenko, E. 2018. Toward a universal decoder of linguistic meaning from brain activation. Nature Communications, 9(1): 963.

Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; and Liu, P. J. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1): 5485-5551.

Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C. D.; Ng, A. Y.; and Potts, C. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1631-1642.

Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1): 1929-1958.

Tang, J.; LeBel, A.; Jain, S.; and Huth, A. G. 2023. Semantic reconstruction of continuous language from non-invasive brain recordings. Nature Neuroscience, 1-9.

Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. Advances in Neural Information Processing Systems, 30.

Wang, C.; Subramaniam, V.; Yaari, A. U.; Kreiman, G.; Katz, B.; Cases, I.; and Barbu, A. 2023. BrainBERT: Self-supervised representation learning for intracranial recordings. arXiv preprint arXiv:2302.14367.

Wang, Z.; and Ji, H. 2022. Open vocabulary electroencephalography-to-text decoding and zero-shot sentiment classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, 5350-5358.

Willett, F. R.; Avansino, D. T.; Hochberg, L. R.; Henderson, J. M.; and Shenoy, K. V. 2021. High-performance brain-to-text communication via handwriting. Nature, 593(7858): 249-254.

Willett, F. R.; Kunz, E. M.; Fan, C.; Avansino, D. T.; Wilson, G. H.; Choi, E. Y.; Kamdar, F.; Hochberg, L. R.; Druckmann, S.; Shenoy, K. V.; et al. 2023. A high-performance speech neuroprosthesis. bioRxiv, 2023-01.

Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K. Q.; and Artzi, Y. 2019. BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675.
Appendix
A - Architecture
A detailed overview of the architecture is given in Figure 5. It is composed of two main components: 1) a Brain module that
implements a representation learning approach for EEG encoding; and 2) a Language Modeling module based on BART to
produce EEG-to-Text sentences and on GPT-4 for sentence-level refinement.
Figure 5: End-to-end architecture for open vocabulary EEG-to-Text decoding.

B - Dataset EEG electrodes
In the ZuCo dataset (Hollenstein et al. 2018, 2019), we follow the pre-processing steps of Hollenstein et al. (Hollenstein et al. 2018, 2019) on the raw EEG signals, leading to 105 EEG channels from the scalp recordings. The full list of EEG
channels: E2, E3, E4, E5, E6, E7, E9, E10, E11, E12, E13, E15, E16, E18, E19, E20, E22, E23, E24, E26, E27, E28, E29, E30,
E31, E33, E34, E35, E36, E37, E38, E39, E40, E41, E42, E43, E44, E45, E46, E47, E50, E51, E52, E53, E54, E55, E57, E58,
E59, E60, E61, E62, E64, E65, E66, E67, E69, E70, E71, E72, E74, E75, E76, E77, E78, E79, E80, E82, E83, E84, E85, E86,
E87, E89, E90, E91, E92, E93, E95, E96, E97, E98, E100, E101, E102, E103, E104, E105, E106, E108, E109, E110, E111,
E112, E114, E115, E116, E117, E118, E120, E121, E122, E123, E124, Cz.
In this paper, the Cz EEG channel has been removed as it consists of all zeros.
C - Decoding accuracy results by subject
We report open vocabulary EEG-to-Text decoding results for each subject (see Table 4). The results show a significant difference between subjects from v1.0 and v2.0 of the dataset. The v2.0 results achieve a BLEU-1 score of 47.13%, a ROUGE-1-F of 40.16%, and a BERTScore-F of 57.35%, while the v1.0 results obtain a BLEU-1 score of 39.39%, a ROUGE-1-F of 28.16%, and a BERTScore-F of 51.28%, corresponding to relative increments of 19.64%, 42.61%, and 11.83%, respectively.

Table 4: Open Vocabulary EEG-to-Text decoding model evaluation on ZuCo datasets by each subject.

Subject    ZuCo    BLEU-1    BLEU-2    BLEU-3    BLEU-4    ROUGE-1-R    ROUGE-1-P    ROUGE-1-F    BERTScore-P    BERTScore-R    BERTScore-F    (all values in %, higher is better)
ZAB v1.0 39.38 22.11 11.92 6.61 25.94 30.92 28.11 49.88 52.62 51.16
ZDM v1.0 39.45 22.24 12.02 6.67 25.93 30.94 28.11 50.00 52.73 51.28
ZDN v1.0 39.06 21.93 11.81 6.63 26.12 31.25 28.35 49.80 52.45 51.04
ZGW v1.0 39.79 22.57 12.27 6.92 26.08 30.98 28.22 50.34 53.07 51.62
ZJM v1.0 39.27 21.99 11.97 6.67 25.94 30.96 28.12 49.73 52.46 51.00
ZJN v1.0 39.76 22.52 12.49 7.05 26.51 31.48 28.68 50.37 53.07 51.64
ZJS v1.0 39.22 22.49 12.23 6.82 25.66 30.29 27.69 50.47 53.23 51.76
ZKB v1.0 39.38 22.11 11.92 6.61 25.94 30.92 28.11 49.88 52.62 51.16
ZKH v1.0 39.32 22.01 11.86 6.60 26.00 31.00 28.18 49.86 52.60 51.14
ZKW v1.0 39.38 22.11 11.92 6.61 25.94 30.92 28.11 49.88 52.62 51.16
ZMG v1.0 39.29 22.22 12.02 6.70 25.95 30.93 28.12 49.93 52.68 51.22
ZPH v1.0 39.38 22.11 11.92 6.61 25.94 30.92 28.11 49.88 52.62 51.16
YSD v2.0 47.22 30.64 20.04 12.94 36.80 44.47 40.19 56.10 58.63 57.29
YFS v2.0 47.09 30.87 20.29 13.15 36.76 44.65 40.24 56.21 58.68 57.37
YMD v2.0 47.22 30.64 20.04 12.94 36.80 44.47 40.19 56.10 58.63 57.29
YAC v2.0 46.88 30.25 19.92 12.90 36.59 44.62 40.14 56.22 58.52 57.30
YFR v2.0 45.82 29.23 19.09 12.21 35.91 42.64 38.91 56.51 59.13 57.74
YHS v2.0 47.22 30.55 20.00 12.92 36.80 44.41 40.16 56.06 58.60 57.26
YLS v2.0 47.22 30.64 20.04 12.94 36.80 44.47 40.19 56.10 58.63 57.29
YDG v2.0 47.22 30.64 20.04 12.94 36.80 44.47 40.19 56.10 58.63 57.29
YRH v2.0 47.22 30.64 20.04 12.94 36.80 44.47 40.19 56.10 58.63 57.29
YRK v2.0 47.22 30.64 20.04 12.94 36.80 44.47 40.19 56.10 58.63 57.29
YMS v2.0 47.22 30.64 20.04 12.94 36.80 44.47 40.19 56.10 58.63 57.29
YIS v2.0 47.22 30.64 20.04 12.94 36.80 44.47 40.19 56.10 58.63 57.29
YTL v2.0 47.22 30.64 20.04 12.94 36.80 44.47 40.19 56.10 58.63 57.29
YSL v2.0 47.52 31.00 20.34 13.20 37.23 44.98 40.65 56.54 59.02 57.71
YRP v2.0 47.22 30.64 20.04 12.94 36.80 44.47 40.19 56.10 58.63 57.29
YAG v2.0 47.22 30.64 20.04 12.94 36.80 44.47 40.19 56.10 58.63 57.29
YDR v2.0 47.16 30.63 20.23 13.17 37.00 44.70 40.40 56.31 58.74 57.45
YAK v2.0 47.22 30.64 20.04 12.94 36.80 44.47 40.19 56.10 58.63 57.29
Average v1.0 39.39 22.2 12.03 6.71 26.0 30.96 28.16 50.0 52.73 51.28
v2.0 47.13 30.57 20.02 12.93 36.77 44.42 40.16 56.17 58.68 57.35
v1.0 + v2.0 42.75 25.90 15.66 9.56 30.60 36.71 33.28 52.62 55.26 53.86
D - Decoding Examples
We report additional decoding examples of generated EEG-to-Text sentences (see Table 5), with and without GPT-4 sentence
refinement. The prompt used for the GPT-4 sentence refinement is as follows:
As a text reconstructor, your task is to restore corrupted sentences to their original form while making minimum changes. You
should adjust the spaces and punctuation marks as necessary. Do not introduce any additional information. If you are unable
to reconstruct the text, respond with [False]. Reconstruct the following text: [text sentence Y ].
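For reference, the refinement call can be sketched as below, assuming the OpenAI Python client (openai>=1.0) and a valid API key; the exact client, endpoint, and decoding parameters used by the authors are not specified beyond "GPT-4 via APIs", so this is only one plausible implementation.

```python
# Sketch of the GPT-4 sentence refinement step using the prompt above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

REFINE_PROMPT = (
    "As a text reconstructor, your task is to restore corrupted sentences to "
    "their original form while making minimum changes. You should adjust the "
    "spaces and punctuation marks as necessary. Do not introduce any additional "
    "information. If you are unable to reconstruct the text, respond with [False]. "
    "Reconstruct the following text: {sentence}"
)

def refine_sentence(decoded_sentence: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": REFINE_PROMPT.format(sentence=decoded_sentence)}],
        temperature=0,
    )
    refined = response.choices[0].message.content.strip()
    return decoded_sentence if refined == "[False]" else refined
```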

Table 5: Open Vocabulary EEG-to-Text decoding examples on ZuCo unseen test sentences, with and without GPT-4 sentence
refinement.

(1) Ground truth An amateurish, quasi-improvised acting exercise shot on ugly digital video.
Prediction interesting actor, un-religiousprovised film performance, through a, video.
Prediction + GPT-4 Interesting actor, un-religious, improvised film performance, through a video.
(2) Ground truth Viewed as a comedy, a romance, a fairy tale, or a drama, there’s nothing remotely
triumphant about this motion picture.
Prediction the from a kind of it satire, and love tale, and a love, it’s a quite funny about it film picture.
Prediction + GPT-4 From a kind of satire, it’s a love tale and quite a funny film picture about love.
(3) Ground truth It’s solid and affecting and exactly as thought-provoking as it should be.
Prediction ’s a, well. it what it-provoking as the sounds be.
Prediction + GPT-4 Well, it’s as provoking as it sounds, what a be.
(4) Ground truth It’s a head-turner – thoughtfully written, beautifully read and, finally, deeply humanizing.
Prediction s a greatyscing, a to crafted, well acted, well most, a moving..
Prediction + GPT-4 It’s a great, most moving, well-crafted and well-acted scene.
(5) Ground truth “The Kid Stays in the Picture” is a great story, terrifically told by the man who wrote
it but this Cliff Notes edition is a cheat.
Prediction The movie”ays in the House” is a film movie about andally funny by a young
who wrote it. also ish version is a little.
Prediction + GPT-4 ”The movie, ’Days in the House’, is a film about a young man who wrote it.
It’s also randomly funny. The British version is a little different.
(6) Ground truth Fans of the TV series will be disappointed, and everyone else will be slightly bored.
Prediction of the film series will recognize familiar to but the will will be happy disappointed.
Prediction + GPT-4 of the film series will recognize familiar to but the will be happy disappointed.
(7 ) Ground truth Wedding feels a bit anachronistic
Prediction alting bells like little likeachronistic,
Prediction + GPT-4 alting bells like little likeachronistic.
(8) Ground truth But what’s nice is that there’s a casual intelligence that permeates the script.
Prediction he’s most about that it’s a sense, to’sates the film.
Prediction + GPT-4 He’s most about that. It’s a sense to states the film.
(9) Ground truth An important movie, a reminder of the power of film to move us and to make us examine our values.
Prediction interesting part about but must of the importance of the to shape people. of make us think our lives.
Prediction + GPT-4 interesting part about but must of the importance of the to shape people. of make us think our lives.
(10) Ground truth Jeb Bush was born in Midland, Texas, where his father was running an oil drilling company.
Prediction uan Bush was born in Newland, Texas, and his father was a a insurance company company.
Prediction + GPT-4 Juan Bush was born in Newland, Texas, and his father was an insurance company owner.
