Hyperbolic nanoGPT

This project explores whether hyperbolic geometry benefits language models by replacing individual components of nanoGPT with hyperbolic counterparts. The hypothesis is that relationships in language might be better represented in hyperbolic space than in Euclidean space.

Current Modifications

The following components can be switched between Euclidean and hyperbolic versions:

Language Model Head (`model/model.py` -> `LorentzMLR` class)

(currently slightly outperforms the original Euclidean head, but needs further study)

Attention Layer (`model/model.py` -> `HyperbolicSelfAttention` class)

(currently unstable but learns some curvatures)

TBD:

Embeddings
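As background for the hyperbolic components listed above, here is a minimal sketch of the Lorentz (hyperboloid) model that the `LorentzMLR` name refers to. It is not code from this repository. The function names (`lorentz_inner`, `lift`, `lorentz_distance`) and the convention that the curvature is `-c` with `c > 0` (points satisfy `<x, x>_L = -1/c`) are assumptions made for illustration; the repository may parameterize curvature differently.

```python
import math


def lorentz_inner(x, y):
    """Minkowski inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lift(v, c=1.0):
    """Place a Euclidean vector v on the hyperboloid <x, x>_L = -1/c.

    The first ("time") coordinate is solved from the constraint.
    Assumed convention: curvature is -c with c > 0.
    """
    x0 = math.sqrt(1.0 / c + sum(a * a for a in v))
    return [x0] + list(v)


def lorentz_distance(x, y, c=1.0):
    """Geodesic distance between two points on the hyperboloid of curvature -c."""
    # On the manifold -c * <x, y>_L >= 1; clamp to guard against rounding error.
    arg = max(1.0, -c * lorentz_inner(x, y))
    return math.acosh(arg) / math.sqrt(c)


origin = lift([0.0, 0.0])
p = lift([3.0, 4.0])
print(lorentz_distance(origin, p))
```

Under this convention, distances from the origin grow only logarithmically with the Euclidean norm of the lifted vector.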

Installation

git clone https://github.com/Alex2034/hyp-nanogpt
cd hyp-nanogpt
conda env create -f env.yaml
conda activate hyp-nanogpt

Experiment Scripts

For convenience, we provide shell scripts to run multiple experiments:

run_hyp.sh - Runs experiments with adjustable components
run_euc.sh - Runs baseline Euclidean experiments
run_single.sh - Runs a single experiment

Key Parameters

head_mode: Choose between 'hyp' (Hyperbolic) or 'euc' (Euclidean) for the LM head
attn_mode: Choose between 'hyp' (Hyperbolic) or 'euc' (Euclidean) for the attention layer
curvature: Initial curvature value for hyperbolic space (if using hyperbolic components)
k_lr: Learning rate for the curvature parameter (set to 0 to keep curvature fixed)
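The way these four parameters interact can be sketched as a small configuration object. This is a hypothetical illustration, not the repository's actual interface: the class name `HypConfig`, the default values, and the helper properties are all assumptions, and the real scripts may pass these settings differently.

```python
from dataclasses import dataclass


@dataclass
class HypConfig:  # hypothetical name, not taken from the repository
    head_mode: str = "euc"   # 'hyp' or 'euc' for the LM head
    attn_mode: str = "euc"   # 'hyp' or 'euc' for the attention layer
    curvature: float = 1.0   # initial curvature (default is an assumption)
    k_lr: float = 0.0        # learning rate for curvature; 0 keeps it fixed

    def __post_init__(self):
        # Reject anything other than the two documented modes.
        for name in ("head_mode", "attn_mode"):
            value = getattr(self, name)
            if value not in ("hyp", "euc"):
                raise ValueError(f"{name} must be 'hyp' or 'euc', got {value!r}")
        if self.k_lr < 0:
            raise ValueError("k_lr must be non-negative")

    @property
    def uses_hyperbolic(self):
        """True if at least one component is hyperbolic."""
        return "hyp" in (self.head_mode, self.attn_mode)

    @property
    def curvature_is_learned(self):
        """Curvature only matters, and is only trained, for hyperbolic components."""
        return self.uses_hyperbolic and self.k_lr > 0


cfg = HypConfig(head_mode="hyp", attn_mode="euc", curvature=1.0, k_lr=0.0)
print(cfg.uses_hyperbolic, cfg.curvature_is_learned)
```

In this sketch, `curvature` and `k_lr` have no effect when both modes are 'euc', which mirrors the "if using hyperbolic components" note above.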

Acknowledgements

kellerjordan/nanoGPT for the baseline implementation
karpathy/nanoGPT for the original nanoGPT
kschwethelm/HyperbolicCV for the LorentzMLR code
geoopt for the Riemannian optimization code

