Notebooks are beginner friendly. Read our guide: add your dataset, click "Run All", and export your trained model to GGUF, Ollama, vLLM or Hugging Face.
Unsloth supports | Performance | Memory use |
---|---|---|
gpt-oss (20B) | 1.5x faster | 70% less |
Gemma 3n (4B) | 1.5x faster | 50% less |
Qwen3 (14B) | 2x faster | 70% less |
gpt-oss (20B): GRPO | 2x faster | 80% less |
Qwen2.5-VL (7B): GSPO | 1.5x faster | 80% less |
Phi-4 (14B) | 2x faster | 70% less |
Llama 3.2 Vision (11B) | 2x faster | 50% less |
Llama 3.1 (8B) | 2x faster | 70% less |
Mistral v0.3 (7B) | 2.2x faster | 75% less |
Orpheus-TTS (3B) | 1.5x faster | 50% less |
- See all our notebooks for: Kaggle, GRPO, TTS & Vision
- See all our models and all our notebooks
- See detailed documentation for Unsloth here
pip install unsloth
For Windows, `pip install unsloth` works only if you have PyTorch installed. Read our Windows Guide.
Use our official Unsloth Docker image `unsloth/unsloth`. Read our Docker Guide.
For RTX 50x, B200 and 6000 GPUs, simply do `pip install unsloth`. Read our Blackwell Guide for more details.
- Docker: Use Unsloth with no setup & environment issues with our new image. Guide • Docker image
- gpt-oss RL: Introducing the fastest possible inference for gpt-oss RL! Read blog
- Vision RL: You can now train VLMs with GRPO or GSPO in Unsloth! Read guide
- Memory-efficient RL: We're introducing even more memory-efficient RL. Our new kernels & algos allow faster RL with 50% less VRAM & 10× more context. Read blog
- gpt-oss by OpenAI: For details on Unsloth Flex Attention, long-context training and bug fixes, read our guide. 20B works on a 14GB GPU and 120B on 65GB VRAM. gpt-oss uploads.
- Gemma 3n by Google: Read Blog. We uploaded GGUFs, 4-bit models.
- Text-to-Speech (TTS) is now supported, including `sesame/csm-1b` and STT `openai/whisper-large-v3`.
- Qwen3 is now supported. Qwen3-30B-A3B fits on 17.5GB VRAM.
- Introducing Dynamic 2.0 quants that set new benchmarks on 5-shot MMLU & Aider Polyglot.
- EVERYTHING is now supported - all models (TTS, BERT, Mamba), FFT, etc. Multi-GPU coming soon. Enable FFT with `full_finetuning = True` and 8-bit with `load_in_8bit = True`.
Click for more news
- 📣 DeepSeek-R1 - run or fine-tune the models with our guide. All model uploads: here.
- 📣 Introducing Long-context Reasoning (GRPO) in Unsloth. Train your own reasoning model with just 5GB VRAM. Transform Llama, Phi, Mistral etc. into reasoning LLMs!
- 📣 Introducing Unsloth Dynamic 4-bit Quantization! We dynamically opt not to quantize certain parameters and this greatly increases accuracy while only using <10% more VRAM than BnB 4-bit. See our collection on Hugging Face here.
- 📣 Llama 4 by Meta, including Scout & Maverick are now supported.
- 📣 Phi-4 by Microsoft: We also fixed bugs in Phi-4 and uploaded GGUFs, 4-bit.
- 📣 Vision models now supported! Llama 3.2 Vision (11B), Qwen 2.5 VL (7B) and Pixtral (12B) 2409
- 📣 Llama 3.3 (70B), Meta's latest model is supported.
- 📣 We worked with Apple to add Cut Cross Entropy. Unsloth now supports 89K context for Meta's Llama 3.3 (70B) on an 80GB GPU - 13x longer than HF+FA2. For Llama 3.1 (8B), Unsloth enables 342K context, surpassing its native 128K support.
- 📣 We found and helped fix a gradient accumulation bug! Please update Unsloth and transformers.
- 📣 We cut memory usage by a further 30% and now support 4x longer context windows!
Type | Links |
---|---|
📚 Documentation & Wiki | Read Our Docs |
Twitter (X) | Follow us on X |
💾 Installation | Pip install |
🔮 Our Models | Unsloth Releases |
✍️ Blog | Read our Blogs |
Reddit | Join our Reddit |
- Supports full-finetuning, pretraining, 4-bit, 16-bit and 8-bit training
- Supports all models including TTS, multimodal, BERT and more! Any model that works in transformers, works in Unsloth.
- The most efficient library for Reinforcement Learning (RL), using 80% less VRAM. Supports GRPO, GSPO, DrGRPO, DAPO etc.
- 0% loss in accuracy - no approximation methods - all exact.
- All kernels written in OpenAI's Triton language. Manual backprop engine.
- Supports NVIDIA (since 2018), AMD and Intel GPUs. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc)
- Works on Linux, WSL and Windows
- If you trained a model with 🦥Unsloth, you can use this cool sticker!
You can also see our documentation for more detailed installation and updating instructions here.
Unsloth does not support Python 3.14. Use 3.13 or lower.
Install with pip (recommended) for Linux devices:
pip install unsloth
To update Unsloth:
pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo
See here for advanced pip install instructions.
- Install NVIDIA Video Driver: You should install the latest version of your GPU's driver. Download drivers here: NVIDIA GPU Driver.
- Install Visual Studio C++: You will need Visual Studio with C++ installed. By default, C++ is not installed with Visual Studio, so make sure you select all of the C++ options. Also select options for the Windows 10/11 SDK. For detailed instructions with options, see here.
- Install CUDA Toolkit: Follow the instructions to install the CUDA Toolkit.
- Install PyTorch: You will need the version of PyTorch that is compatible with your CUDA drivers, so make sure to select it carefully. Install PyTorch.
- Install Unsloth:
pip install unsloth
To run Unsloth directly on Windows:
- Install Triton from this Windows fork and follow the instructions here (be aware that the Windows fork requires PyTorch >= 2.4 and CUDA 12)
- In the `SFTConfig`, set `dataset_num_proc=1` to avoid a crashing issue:
SFTConfig(
dataset_num_proc=1,
...
)
For advanced installation instructions or if you see weird errors during installation:
First try creating an isolated environment, then `pip install unsloth`:
python -m venv unsloth
source unsloth/bin/activate
pip install unsloth
- Install `torch` and `triton`. Go to https://pytorch.org to install them, for example `pip install torch torchvision torchaudio triton`.
- Confirm that CUDA is installed correctly. Try `nvcc`. If that fails, you need to install `cudatoolkit` or CUDA drivers.
- Install `xformers` manually via:
pip install ninja
pip install -v --no-build-isolation -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
Check that `xformers` succeeded with `python -m xformers.info`; see https://github.com/facebookresearch/xformers. Another option is to install `flash-attn` for Ampere GPUs and skip `xformers` entirely.
- For GRPO runs, you can try installing `vllm` and seeing if `pip install vllm` succeeds.
- Double check that your versions of Python, CUDA, cuDNN, `torch`, `triton`, and `xformers` are compatible with one another. The PyTorch Compatibility Matrix may be useful; a quick version-check sketch follows this list.
- Finally, install `bitsandbytes` and check it with `python -m bitsandbytes`.
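A quick way to do that check (a minimal sketch, assuming only `torch` is already installed; any missing package is simply reported):
```python
# Environment check: print the versions the troubleshooting steps above refer to.
import importlib, platform, torch

print("Python :", platform.python_version())
print("PyTorch:", torch.__version__)
print("CUDA   :", torch.version.cuda)            # CUDA version PyTorch was built with
print("cuDNN  :", torch.backends.cudnn.version())
print("GPU    :", torch.cuda.get_device_capability() if torch.cuda.is_available() else "no GPU")

for pkg in ("triton", "xformers", "bitsandbytes", "vllm"):
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg:12s}: {getattr(mod, '__version__', 'installed')}")
    except ImportError:
        print(f"{pkg:12s}: not installed")
```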
⚠️ Only use Conda if you have it. If not, use pip. Select either `pytorch-cuda=11.8` or `pytorch-cuda=12.1` for CUDA 11.8 or CUDA 12.1. We support `python=3.10,3.11,3.12`.
conda create --name unsloth_env \
python=3.11 \
pytorch-cuda=12.1 \
pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
-y
conda activate unsloth_env
pip install unsloth
If you're looking to install Conda in a Linux environment, read here, or run the below 🔽
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
⚠️ Do **NOT** use this if you have Conda.
Pip is a bit more complex since there are dependency issues. The pip command is different for `torch 2.2,2.3,2.4,2.5` and CUDA versions.
For other torch versions, we support `torch211`, `torch212`, `torch220`, `torch230` and `torch240`, and for CUDA versions, we support `cu118`, `cu121` and `cu124`. For Ampere devices (A100, H100, RTX 3090) and above, use `cu118-ampere`, `cu121-ampere` or `cu124-ampere`.
For example, if you have `torch 2.4` and `CUDA 12.1`, use:
pip install --upgrade pip
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
Another example, if you have `torch 2.5` and `CUDA 12.4`, use:
pip install --upgrade pip
pip install "unsloth[cu124-torch250] @ git+https://github.com/unslothai/unsloth.git"
And other examples:
pip install "unsloth[cu121-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch250] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu124-ampere-torch250] @ git+https://github.com/unslothai/unsloth.git"
Or, run the below in a terminal to get the optimal pip installation command:
wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python -
Or, run the below manually in a Python REPL:
try: import torch
except: raise ImportError('Install torch via `pip install torch`')
from packaging.version import Version as V
import re
v = V(re.match(r"[0-9\.]{3,}", torch.__version__).group(0))
cuda = str(torch.version.cuda)
is_ampere = torch.cuda.get_device_capability()[0] >= 8
USE_ABI = torch._C._GLIBCXX_USE_CXX11_ABI
if cuda not in ("11.8", "12.1", "12.4", "12.6", "12.8"): raise RuntimeError(f"CUDA = {cuda} not supported!")
if v <= V('2.1.0'): raise RuntimeError(f"Torch = {v} too old!")
elif v <= V('2.1.1'): x = 'cu{}{}-torch211'
elif v <= V('2.1.2'): x = 'cu{}{}-torch212'
elif v < V('2.3.0'): x = 'cu{}{}-torch220'
elif v < V('2.4.0'): x = 'cu{}{}-torch230'
elif v < V('2.5.0'): x = 'cu{}{}-torch240'
elif v < V('2.5.1'): x = 'cu{}{}-torch250'
elif v <= V('2.5.1'): x = 'cu{}{}-torch251'
elif v < V('2.7.0'): x = 'cu{}{}-torch260'
elif v < V('2.7.9'): x = 'cu{}{}-torch270'
elif v < V('2.8.0'): x = 'cu{}{}-torch271'
elif v < V('2.8.9'): x = 'cu{}{}-torch280'
else: raise RuntimeError(f"Torch = {v} too new!")
if v > V('2.6.9') and cuda not in ("11.8", "12.6", "12.8"): raise RuntimeError(f"CUDA = {cuda} not supported!")
x = x.format(cuda.replace(".", ""), "-ampere" if is_ampere else "")
print(f'pip install --upgrade pip && pip install "unsloth[{x}] @ git+https://github.com/unslothai/unsloth.git"')
You can use our pre-built Docker container with all dependencies to use Unsloth instantly with no setup required. Read our guide.
This container requires installing NVIDIA's Container Toolkit.
docker run -d -e JUPYTER_PASSWORD="mypassword" \
-p 8888:8888 -p 2222:22 \
-v $(pwd)/work:/workspace/work \
--gpus all \
unsloth/unsloth
Access Jupyter Lab at http://localhost:8888 and start fine-tuning!
- Go to our official Documentation for running models, saving to GGUF, checkpointing, evaluation and more!
- Read our Guides for: Fine-tuning, Reinforcement Learning, Text-to-Speech (TTS), Vision and any model.
- We support Hugging Face's transformers, TRL, Trainer, Seq2SeqTrainer and PyTorch code.
Unsloth example code to fine-tune gpt-oss-20b:
from unsloth import FastLanguageModel, FastModel
import torch
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
max_seq_length = 2048 # Supports RoPE Scaling internally, so choose any!
# Get LAION dataset
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files = {"train" : url}, split = "train")
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
"unsloth/gpt-oss-20b-unsloth-bnb-4bit", #or choose any model
] # More models at https://huggingface.co/unsloth
model, tokenizer = FastModel.from_pretrained(
model_name = "unsloth/gpt-oss-20b",
max_seq_length = max_seq_length, # Choose any for long context!
load_in_4bit = True, # 4-bit quantization. False = 16-bit LoRA.
load_in_8bit = False, # 8-bit quantization
load_in_16bit = False, # [NEW!] 16-bit LoRA
full_finetuning = False, # Set True for full fine-tuning.
# token = "hf_...", # use one if using gated models
)
# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
model,
r = 16,
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",],
lora_alpha = 16,
lora_dropout = 0, # Supports any, but = 0 is optimized
bias = "none", # Supports any, but = "none" is optimized
# [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
random_state = 3407,
max_seq_length = max_seq_length,
use_rslora = False, # We support rank stabilized LoRA
loftq_config = None, # And LoftQ
)
trainer = SFTTrainer(
model = model,
train_dataset = dataset,
tokenizer = tokenizer,
args = SFTConfig(
max_seq_length = max_seq_length,
per_device_train_batch_size = 2,
gradient_accumulation_steps = 4,
warmup_steps = 10,
max_steps = 60,
logging_steps = 1,
output_dir = "outputs",
optim = "adamw_8bit",
seed = 3407,
),
)
trainer.train()
# Go to https://docs.unsloth.ai for advanced tips like
# (1) Saving to GGUF / merging to 16bit for vLLM
# (2) Continued training from a saved LoRA adapter
# (3) Adding an evaluation loop / OOMs
# (4) Customized chat templates
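For the export step mentioned in the comments, the save helpers look roughly like this (a minimal sketch based on the `save_pretrained_merged` / `save_pretrained_gguf` helpers in the docs; output paths and the quantization method are illustrative, so check docs.unsloth.ai for the exact options):
```python
# Illustrative export sketch, reusing `model` and `tokenizer` from the example above.

# Merge the LoRA adapter into 16-bit weights for vLLM / Hugging Face:
model.save_pretrained_merged("outputs/merged_16bit", tokenizer, save_method = "merged_16bit")

# Export to GGUF for llama.cpp / Ollama (quantization method is illustrative):
model.save_pretrained_gguf("outputs/gguf", tokenizer, quantization_method = "q4_k_m")
```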
RL including GRPO, GSPO, DrGRPO, DAPO, PPO, Reward Modelling, Online DPO all work with Unsloth. List of RL notebooks:
- gpt-oss GSPO notebook: Link
- Qwen2.5-VL GSPO notebook: Link
- Advanced Qwen3 GRPO notebook: Link
- ORPO notebook: Link
- DPO Zephyr notebook: Link
- KTO notebook: Link
- SimPO notebook: Link
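As a rough orientation (the notebooks above are the reference), a GRPO run with Unsloth plus TRL looks roughly like the sketch below; the model name, toy prompts and length-based reward are purely illustrative assumptions:
```python
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import Dataset

# Illustrative model choice; any Unsloth-supported model works.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-4B",
    max_seq_length = 1024,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
    model, r = 16, lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# GRPO only needs prompts; the reward function below scores the sampled completions.
dataset = Dataset.from_dict({"prompt": ["What is 2 + 2?", "Name a prime number."] * 64})

def reward_short(completions, **kwargs):
    # Toy reward: prefer shorter completions (replace with your own scoring).
    return [-float(len(c)) for c in completions]

trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [reward_short],
    args = GRPOConfig(
        output_dir = "grpo_outputs",
        per_device_train_batch_size = 4,
        num_generations = 4,           # must divide the generation batch size
        max_completion_length = 64,
        max_steps = 30,
    ),
    train_dataset = dataset,
)
trainer.train()
```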
- For our most detailed benchmarks, read our Llama 3.3 Blog.
- Benchmarking of Unsloth was also conducted by 🤗Hugging Face.
We tested using the Alpaca Dataset, a batch size of 2, gradient accumulation steps of 4, rank = 32, and applied QLoRA on all linear layers (q, k, v, o, gate, up, down):
Model | VRAM | 🦥 Unsloth speed | 🦥 VRAM reduction | 🦥 Longer context | 😊 Hugging Face + FA2 |
---|---|---|---|---|---|
Llama 3.3 (70B) | 80GB | 2x | >75% | 13x longer | 1x |
Llama 3.1 (8B) | 80GB | 2x | >70% | 12x longer | 1x |
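For reference, the setup described above corresponds roughly to the following configuration (a minimal sketch, not the exact benchmark script; the model name and the placeholder dataset are illustrative):
```python
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import Dataset

# Illustrative 4-bit load; the benchmarks used Llama 3.1 (8B) and Llama 3.3 (70B).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B", load_in_4bit = True,
)

# QLoRA on all linear layers (q, k, v, o, gate, up, down) with rank 32.
model = FastLanguageModel.get_peft_model(
    model, r = 32, lora_alpha = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# Placeholder dataset; the benchmarks used the Alpaca dataset.
dataset = Dataset.from_dict({"text": ["Example training text."] * 8})

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        per_device_train_batch_size = 2,   # batch size of 2
        gradient_accumulation_steps = 4,   # gradient accumulation steps of 4
        max_steps = 10,
        output_dir = "benchmark_outputs",
    ),
)
```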
We tested Llama 3.1 (8B) Instruct and did 4bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 with a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long context finetuning workloads.
GPU VRAM | 🦥Unsloth context length | Hugging Face + FA2 |
---|---|---|
8 GB | 2,972 | OOM |
12 GB | 21,848 | 932 |
16 GB | 40,724 | 2,551 |
24 GB | 78,475 | 5,789 |
40 GB | 153,977 | 12,264 |
48 GB | 191,728 | 15,502 |
80 GB | 342,733 | 28,454 |
We tested Llama 3.3 (70B) Instruct on an 80GB A100 and did 4bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 with a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long context finetuning workloads.
GPU VRAM | 🦥Unsloth context length | Hugging Face + FA2 |
---|---|---|
48 GB | 12,106 | OOM |
80 GB | 89,389 | 6,916 |
You can cite the Unsloth repo as follows:
@software{unsloth,
author = {Daniel Han and Michael Han and Unsloth team},
title = {Unsloth},
url = {http://github.com/unslothai/unsloth},
year = {2023}
}
- The llama.cpp library that lets users save models with Unsloth
- The Hugging Face team and their libraries: transformers and TRL
- The PyTorch and TorchAO teams for their contributions
- Erik for his help adding Apple's ML Cross Entropy in Unsloth
- Etherl for adding support for TTS, diffusion and BERT models
- And of course for every single person who has contributed or has used Unsloth!