Forked from kellerjordan/nanoGPT
This project explores the benefits of hyperbolic geometry in language models by modifying various components of nanoGPT. The hypothesis is that relationships in language might be better represented in hyperbolic space than in Euclidean space.
The following components can be switched between Euclidean and hyperbolic versions:
- LM head (currently slightly outperforms the original, but needs further study)
- Attention (currently unstable, but learns some curvatures)
- Embeddings
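
To make the Euclidean/hyperbolic switch concrete, here is a minimal Python sketch of how a similarity score could change between the two geometries. It is not the repository's code. It assumes the Lorentz (hyperboloid) model, which the use of LorentzMLR suggests but which is not stated here, and a positive parameter k such that the curvature is -k. All function names are hypothetical.

```python
import math


def lift_to_hyperboloid(x, k=1.0):
    """Map a Euclidean vector x onto the hyperboloid of curvature -k (assumed convention)."""
    # Points on the hyperboloid satisfy -t^2 + |x|^2 = -1/k, so t = sqrt(1/k + |x|^2).
    t = math.sqrt(1.0 / k + sum(v * v for v in x))
    return [t] + list(x)


def lorentz_inner(p, q):
    """Minkowski inner product: -p0*q0 + sum_i p_i*q_i."""
    return -p[0] * q[0] + sum(a * b for a, b in zip(p[1:], q[1:]))


def lorentz_distance(p, q, k=1.0):
    """Geodesic distance between two points on the hyperboloid of curvature -k."""
    # The clamp guards against arguments slightly below 1 caused by rounding.
    arg = max(1.0, -k * lorentz_inner(p, q))
    return math.acosh(arg) / math.sqrt(k)


def score(x, w, mode="euc", k=1.0):
    """Similarity of x and w: dot product ('euc') or negative hyperbolic distance ('hyp')."""
    if mode == "euc":
        return sum(a * b for a, b in zip(x, w))
    if mode == "hyp":
        p = lift_to_hyperboloid(x, k)
        q = lift_to_hyperboloid(w, k)
        return -lorentz_distance(p, q, k)
    raise ValueError(f"unknown mode: {mode!r}")
```

In the 'hyp' branch a larger score means the two points are closer on the hyperboloid. An actual hyperbolic LM head such as LorentzMLR computes logits differently, so treat this only as an illustration of the geometry.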
git clone https://github.com/Alex2034/hyp-nanogpt
cd hyp-nanogpt
conda env create -f env.yaml
conda activate hyp-nanogpt
For convenience, we provide shell scripts to run multiple experiments:
- run_hyp.sh: Runs experiments with adjustable components
- run_euc.sh: Runs baseline Euclidean experiments
- run_single.sh: Useful for single experiment runs
- head_mode: Choose between 'hyp' (hyperbolic) or 'euc' (Euclidean) for the LM head
- attn_mode: Choose between 'hyp' or 'euc' for the attention
- curvature: Initial curvature value for hyperbolic space (if using hyperbolic components)
- k_lr: Learning rate for the curvature parameter (set to 0 to keep curvature fixed)
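
As a rough illustration of how these four parameters interact, the sketch below groups them in a small Python dataclass. The class name, the default values, and the helper function are hypothetical and are not taken from the repository; only the parameter names and their described meanings come from the list above.

```python
from dataclasses import dataclass

VALID_MODES = ("euc", "hyp")


@dataclass
class GeometryConfig:  # hypothetical container, not the repository's config class
    head_mode: str = "euc"
    attn_mode: str = "euc"
    curvature: float = 1.0  # illustrative default for the initial curvature value
    k_lr: float = 0.0       # illustrative default; 0 keeps the curvature fixed

    def __post_init__(self):
        # Reject anything other than the two documented mode strings.
        for name in ("head_mode", "attn_mode"):
            value = getattr(self, name)
            if value not in VALID_MODES:
                raise ValueError(f"{name} must be one of {VALID_MODES}, got {value!r}")
        if self.k_lr < 0:
            raise ValueError("k_lr must be non-negative")

    @property
    def uses_hyperbolic(self):
        return "hyp" in (self.head_mode, self.attn_mode)

    @property
    def curvature_is_trainable(self):
        # Curvature only matters for hyperbolic components, and k_lr == 0 freezes it.
        return self.uses_hyperbolic and self.k_lr > 0


def sgd_step_on_curvature(cfg, grad):
    """Return the curvature after one plain gradient step; unchanged when it is fixed."""
    if not cfg.curvature_is_trainable:
        return cfg.curvature
    return cfg.curvature - cfg.k_lr * grad
```

For example, `GeometryConfig(head_mode="hyp", k_lr=0.0)` describes a hyperbolic LM head whose curvature stays at its initial value, while a positive `k_lr` lets the curvature move during training. The plain gradient step stands in for whatever optimizer the training script actually uses.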
- kellerjordan/nanoGPT for the baseline implementation
- karpathy/nanoGPT for the original nanoGPT
- kschwethelm/HyperbolicCV for the LorentzMLR code
- geoopt for the Riemannian optimization code