Code for the paper "Vectorizing string entries for data processing on tables: when are larger language models better?"
Edited datasets can be downloaded at https://figshare.com/articles/dataset/Datasets_with_text_entries/24879042. Links to the original datasets can be found in the paper.
Computations can be launched using the files in the scripts directory whose names start with launch_ and encode. These scripts are written for a SLURM cluster and should be adapted to your own cluster by changing the executor settings.
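To see which entry points are available before adapting them, the launch_ and encode files can be listed with a few lines of Python. This is a minimal sketch, not part of the repository: the function name find_entry_points is hypothetical, and it assumes it is run from the repository root so that the scripts directory is found at the relative path "scripts".

```python
from pathlib import Path


def find_entry_points(scripts_dir="scripts", prefixes=("launch_", "encode")):
    """Return the sorted names of files in scripts_dir starting with a given prefix.

    Hypothetical helper: the default directory and prefixes follow the
    description above; nothing else about the repository layout is assumed.
    """
    root = Path(scripts_dir)
    if not root.is_dir():
        # Directory not found (e.g. run from the wrong location): nothing to list.
        return []
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.name.startswith(tuple(prefixes))
    )


if __name__ == "__main__":
    for name in find_entry_points():
        print(name)
```

Each listed script still needs its executor settings checked against your cluster before it is run.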
Figures can be reproduced using the final_plot.ipynb notebook in the scripts folder.