Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA RTX 6000 Ada Generation, compute capability 8.9, VMM: yes
Device 1: NVIDIA RTX 6000 Ada Generation, compute capability 8.9, VMM: yes
version: 5433 (759e37b)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-cli
Command line
Problem description & steps to reproduce
Error
gguf-py/gguf/lazy.py:217: RuntimeWarning: overflow encountered in cast
Description
Good morning. I've used Axolotl to train a 32B LoRA and then merged it back into a single full-size model. There's a discussion of this at axolotl-ai-cloud/axolotl#2705. The fine-tuning itself went fine (the configuration is below), but when I try to convert the merged model to GGUF, the converter throws the runtime warning above, and quantizing the resulting model then fails entirely.
At first I thought this might be hardware, maybe a bad disk. But I think I was wrong, because I bought a new NVMe drive and started fresh, including a fresh build of llama.cpp, and nothing fixed it.
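In case it helps with diagnosis, here is a rough sketch (my own, not part of llama.cpp or axolotl) of the kind of check I could run over the merged weights, assuming the merge wrote safetensors shards and that torch and safetensors are installed; the path is just the one from my config:

# Rough sketch: scan the merged safetensors shards for values that cannot
# survive the cast down to float16 during GGUF conversion.
# Paths and filenames below are just my local ones.
import glob

import torch
from safetensors import safe_open

FP16_MAX = torch.finfo(torch.float16).max  # 65504.0

for shard in sorted(glob.glob("./no_git/17_qwenmspe_changes/merged/*.safetensors")):
    with safe_open(shard, framework="pt", device="cpu") as f:
        for name in f.keys():
            t = f.get_tensor(name).float()
            bad_nonfinite = int((~torch.isfinite(t)).sum())
            bad_overflow = int((t.abs() > FP16_MAX).sum())
            if bad_nonfinite or bad_overflow:
                print(f"{shard}:{name}: {bad_nonfinite} non-finite, "
                      f"{bad_overflow} values above the float16 range")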
Axolotl Config:
# Originally taken from: https://github.com/axolotl-ai-cloud/axolotl/blob/a27b909c5c1c2c561a8d503024b89afcce15226f/examples/qwen3/32b-qlora.yaml
base_model: Qwen/Qwen2.5-32B
plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
strict: false
chat_template: qwen_25
datasets:
- path: ./no_git/17_suggested_changes_with_new.csv
  type: alpaca
val_set_size: 0
eval_sample_packing: true
output_dir: ./no_git/17_qwenmspe_changes/
dataset_prepared_path: last_run_prepared
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
load_in_4bit: true
adapter: qlora
lora_r: 32 # up from 16
lora_alpha: 64 # Up from 32
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- down_proj
- up_proj
lora_mlp_kernel: true
lora_qkv_kernel: true
lora_o_kernel: true
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 10
optimizer: adamw_torch_4bit
lr_scheduler: cosine
learning_rate: 0.0002
bf16: auto
tf32: true
gradient_checkpointing: offload
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 1
flash_attention: true
warmup_steps: 50 # From 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.01 # up from 0
special_tokens:
use_tensorboard: true
Commands for merge and after:
python3 -m axolotl.cli.merge_lora 17-train-qwen32b-lora.yaml --lora_model_dir="./no_git/17_qwenmspe_changes"
# v-- Breaks here with the warning I mentioned.
python3 convert_hf_to_gguf.py ~/path/to/17_qwenmspe_changes/merged/
llama-quantize ~/path/to/17_qwenmspe_changes/merged/Merged-33B-F16.gguf ~/path/to/17_qwenmspe_changes/msom-qwen32b-mspe-suggestions-Q5.gguf Q5_0
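For what it's worth, the warning text itself comes from numpy: it fires when a float32 value outside the float16 range is cast down during conversion. A tiny illustration (mine, not taken from gguf-py):

# Minimal illustration of what triggers the same RuntimeWarning:
# casting a float32 value outside the float16 range.
import numpy as np

x = np.array([1.0e6], dtype=np.float32)  # float16 tops out around 65504
y = x.astype(np.float16)                 # RuntimeWarning: overflow encountered in cast
print(y)                                 # [inf]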
I'm at a bit of a loss as to what could have caused all this. I've had successful runs in the past, so something else must be going on now.
First Bad Commit
No response
Relevant log output
Thanks for the reply and help. I'm in the middle of another test, checking whether it was something in my original training versus this one (this run was based on a much earlier iteration from before I hit the issue). That old version ran fine; this one didn't. It's training now, and if it throws a similar error I'll try that as well. I'll update today with the results. Thanks again.
I did some more testing. I think this comes down to a bad drive in the machine rather than anything wrong with the code; two different sets of tests point that way. Closing the issue. Thanks @slaren for the help. I'll keep those options in mind if I need them in the future.