Joint processing of linguistic properties in brains and language models, Subba Reddy Oota, Manish Gupta and Mariya Toneva, NeurIPS-2023
21st year_narratives_listening_dataset
21st year dataset statistics:
- 18 subjects
- fMRI brain recordings
- 8267 words
- 2226 TRs (TR = repetition time)
- TR = 1.5 secs
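As a quick sanity check on the statistics above, the scan length and average word rate follow directly from the listed numbers. This is only a sketch of the arithmetic: the variable names are illustrative, and it assumes all 2226 TRs belong to a single continuous recording.

```python
# Values taken from the dataset statistics above.
n_trs = 2226        # number of fMRI volumes (TRs)
tr_seconds = 1.5    # repetition time in seconds
n_words = 8267      # words in the story

# Total scan duration implied by the TR count.
scan_seconds = n_trs * tr_seconds
minutes, seconds = divmod(scan_seconds, 60)

# Average number of words falling within one TR.
words_per_tr = n_words / n_trs

print(f"{scan_seconds:.0f} s = {int(minutes)} min {seconds:.0f} s")  # 3339 s = 55 min 39 s
print(f"{words_per_tr:.2f} words per TR on average")                 # 3.71 words per TR on average
```

Because several words fall inside each TR, word-level features generally have to be aggregated or resampled to the TR grid before they can be compared with the fMRI signal.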
How to download the 21st year dataset
- Datalad can be installed using pip
python -m pip install datalad
- It is highly recommended to configure Git before using DataLad. Set both 'user.name' and 'user.email' configuration variables.
- git config --global user.name "username"
- git config --global user.email emailid
- git-annex installation is required for downloading the dataset
sudo apt-get install git-annex
Download the dataset using datalad
datalad install https://datasets.datalad.org/labs/hasson/narratives/derivatives/afni-nosmooth
Download each subject's data (in fsaverage6 space) using the bash script
cd afni-nosmooth
bash download_data.sh
python brain_data_21styear_fsaverage6.py
Extract stimuli representations using the BERT model with context length 20
- Narratives 21st-year Dataset
python extract_features_words.py --input_file ./Narratives/21styear_align.csv --model bert-base --sequence_length 20 --output_file bert_conext20_21styear
To build a voxelwise encoding model for different stimuli representations
- Five arguments are passed as input, in this order:
- subject number
- number of layers
- stimulus vector file
- context length
- output directory
cd brain_predictions
python brain_predictions_21styear_text.py 1 12 bert_conext20_21styear.npy 20 output_predictions
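For readers new to voxelwise encoding models, the idea can be sketched in a few lines of NumPy. This is not the code in brain_predictions_21styear_text.py. It assumes ridge regression (a common choice for encoding models) and Pearson correlation as the per-voxel score, and it runs on random toy data. The helper names `fit_ridge` and `voxelwise_pearson` are hypothetical.

```python
import numpy as np

def fit_ridge(X_train, Y_train, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X_train.shape[1]
    gram = X_train.T @ X_train + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X_train.T @ Y_train)

def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation computed separately for each voxel (column)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

# Toy data standing in for stimulus features (TRs x features)
# and brain responses (TRs x voxels); sizes are illustrative only.
rng = np.random.default_rng(0)
n_trs, n_features, n_voxels = 200, 16, 5
X = rng.standard_normal((n_trs, n_features))
W_true = rng.standard_normal((n_features, n_voxels))
Y = X @ W_true + 0.1 * rng.standard_normal((n_trs, n_voxels))

# Fit on the first 150 TRs, evaluate on the held-out 50 TRs.
split = 150
W = fit_ridge(X[:split], Y[:split], alpha=1.0)
scores = voxelwise_pearson(Y[split:], X[split:] @ W)
print(scores.shape)  # (5,) : one correlation per voxel
```

One model is fitted per voxel (here all at once, as columns of `W`), and held-out correlation indicates how well a given stimulus representation predicts that voxel's response. In practice the regularization strength `alpha` would typically be chosen by cross-validation rather than fixed.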
@inproceedings{oota2022joint,
title={Joint processing of linguistic properties in brains and language models},
author={Oota, Subba Reddy and Gupta, Manish and Toneva, Mariya},
booktitle={Proceedings of the Thirty-seventh Conference on Neural Information Processing Systems},
year={2023}
}
