Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA RTX 6000 Ada Generation, compute capability 8.9, VMM: yes
Device 1: NVIDIA RTX 6000 Ada Generation, compute capability 8.9, VMM: yes
version: 5433 (759e37b)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-cli
Command line
Problem description & steps to reproduce
Error
gguf-py/gguf/lazy.py:217: RuntimeWarning: overflow encountered in cast
Description
Good morning. I've used Axolotl to train a 32B LoRA and then merged it back into a single full-size model. There's a discussion of this at axolotl-ai-cloud/axolotl#2705. The fine-tuning itself went fine (the configuration is below), but when I try to convert the merged model to GGUF, the converter throws the runtime warning above, and quantizing the resulting model then fails entirely.
At first I thought this might be hardware, maybe a bad disk. But I think I was wrong, because I bought a new NVMe drive and started fresh, including a fresh build of llama.cpp, and nothing fixed it.
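In case it helps with diagnosis, here is a rough sketch (my own, not part of llama.cpp or axolotl) of the kind of check I could run over the merged weights, assuming the merge wrote safetensors shards and that torch and safetensors are installed; the path is just the one from my config:

# Rough sketch: scan the merged safetensors shards for values that cannot
# survive the cast down to float16 during GGUF conversion.
# Paths and filenames below are just my local ones.
import glob

import torch
from safetensors import safe_open

FP16_MAX = torch.finfo(torch.float16).max  # 65504.0

for shard in sorted(glob.glob("./no_git/17_qwenmspe_changes/merged/*.safetensors")):
    with safe_open(shard, framework="pt", device="cpu") as f:
        for name in f.keys():
            t = f.get_tensor(name).float()
            bad_nonfinite = int((~torch.isfinite(t)).sum())
            bad_overflow = int((t.abs() > FP16_MAX).sum())
            if bad_nonfinite or bad_overflow:
                print(f"{shard}:{name}: {bad_nonfinite} non-finite, "
                      f"{bad_overflow} values above the float16 range")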
Axolotl Config:
# Originally taken from: https://github.com/axolotl-ai-cloud/axolotl/blob/a27b909c5c1c2c561a8d503024b89afcce15226f/examples/qwen3/32b-qlora.yaml
base_model: Qwen/Qwen2.5-32B
plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
strict: false
chat_template: qwen_25
datasets:
- path: ./no_git/17_suggested_changes_with_new.csv
  type: alpaca
val_set_size: 0
eval_sample_packing: true
output_dir: ./no_git/17_qwenmspe_changes/
dataset_prepared_path: last_run_prepared
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
load_in_4bit: true
adapter: qlora
lora_r: 32 # up from 16
lora_alpha: 64 # Up from 32
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- down_proj
- up_proj
lora_mlp_kernel: true
lora_qkv_kernel: true
lora_o_kernel: true
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 10
optimizer: adamw_torch_4bit
lr_scheduler: cosine
learning_rate: 0.0002
bf16: auto
tf32: true
gradient_checkpointing: offload
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 1
flash_attention: true
warmup_steps: 50 # From 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.01 # up from 0
special_tokens:
use_tensorboard: true
Commands for merge and after:
python3 -m axolotl.cli.merge_lora 17-train-qwen32b-lora.yaml --lora_model_dir="./no_git/17_qwenmspe_changes"
# v-- Breaks here with the warning I mentioned.
python3 convert_hf_to_gguf.py ~/path/to/17_qwenmspe_changes/merged/
llama-quantize ~/path/to/17_qwenmspe_changes/merged/Merged-33B-F16.gguf ~/path/to/17_qwenmspe_changes/msom-qwen32b-mspe-suggestions-Q5.gguf Q5_0
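For what it's worth, the warning text itself comes from numpy: it fires when a float32 value outside the float16 range is cast down during conversion. A tiny illustration (mine, not taken from gguf-py):

# Minimal illustration of what triggers the same RuntimeWarning:
# casting a float32 value outside the float16 range.
import numpy as np

x = np.array([1.0e6], dtype=np.float32)  # float16 tops out around 65504
y = x.astype(np.float16)                 # RuntimeWarning: overflow encountered in cast
print(y)                                 # [inf]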
I'm at a bit of a loss as to what could have caused all this. I've had successful runs in the past, so something else must be going on now.
First Bad Commit
No response
Relevant log output
Thanks for the reply and help. I'm in the middle of another test, checking whether it was something in my original training versus this one (this run was based on a much earlier iteration from before I hit the issue). That old version ran fine; this one didn't. It's training now, and if it throws a similar error I'll try that as well. I'll update today with the results. Thanks again.
I did some more testing. I think this comes down to a bad drive in the machine rather than anything wrong with the code; two different sets of tests point that way. Closing the issue. Thanks @slaren for the help. I'll keep those options in mind if I need them in the future.