Prerequisites
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
Attempted to load "Phi-3-mini-4k-instruct-q4.gguf" with the latest (0.2.84) release. Expected the load to succeed.
Current Behavior
Load fails:

```
<snip>
llama_model_loader: - type q5_K: 32 tensors
llama_model_loader: - type q6_K: 17 tensors
llama_model_load: error loading model: error loading model hyperparameters: key not found in model: phi3.attention.sliding_window
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
  File "C:\Users\riedgar\source\repos\guidance\llama_load_error.py", line 10, in <module>
    llama_model = llama_cpp.Llama(downloaded_file, logits_all=True, verbose=True, n_ctx=4096)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\riedgar\AppData\Local\miniconda3\envs\guidance-312\Lib\site-packages\llama_cpp\llama.py", line 372, in __init__
    _LlamaModel(
  File "C:\Users\riedgar\AppData\Local\miniconda3\envs\guidance-312\Lib\site-packages\llama_cpp\_internals.py", line 55, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: C:\Users\riedgar\.cache\huggingface\hub\models--microsoft--Phi-3-mini-4k-instruct-gguf\snapshots\999f761fe19e26cf1a339a5ec5f9f201301cbb83\Phi-3-mini-4k-instruct-q4.gguf
Exception ignored in: <function Llama.__del__ at 0x0000029C3603D4E0>
Traceback (most recent call last):
  File "C:\Users\riedgar\AppData\Local\miniconda3\envs\guidance-312\Lib\site-packages\llama_cpp\llama.py", line 2089, in __del__
    if self._lora_adapter is not None:
       ^^^^^^^^^^^^^^^^^^
AttributeError: 'Llama' object has no attribute '_lora_adapter'
```
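The secondary `AttributeError` arises because `Llama.__del__` runs even when `__init__` raised before `_lora_adapter` was ever assigned. A minimal sketch of the pattern (the class and attribute here are illustrative, not the library's actual code), including the usual `getattr` guard that avoids it:

```python
class Loader:
    def __init__(self, path):
        # The attribute is only assigned after validation succeeds, so a
        # failed load leaves the instance without it when __del__ runs.
        if not path:
            raise ValueError(f"Failed to load model from file: {path}")
        self._lora_adapter = None

    def __del__(self):
        # getattr with a default guards against partially-initialized objects;
        # a bare self._lora_adapter access would raise AttributeError here.
        if getattr(self, "_lora_adapter", None) is not None:
            print("freeing adapter")


try:
    Loader("")  # __init__ raises; __del__ on the half-built object is safe
except ValueError:
    pass
```

With the plain `if self._lora_adapter is not None:` check from the traceback, the same scenario would emit "Exception ignored in ... `__del__`" instead.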
This was working with previous llama-cpp-python versions.
Environment and Context
- Physical (or virtual) hardware you are using, e.g. for Linux:
  i7-13800H (for Windows); GitHub-provided runner machines (for Linux)
- Operating System, e.g. for Linux:
  Windows and Linux
- SDK version, e.g. for Linux:
  Python 3.12.3
Failure Information (for bugs)
Steps to Reproduce
```python
import llama_cpp
from huggingface_hub import hf_hub_download

repo_id = "microsoft/Phi-3-mini-4k-instruct-gguf"
filename = "Phi-3-mini-4k-instruct-q4.gguf"

downloaded_file = hf_hub_download(repo_id=repo_id, filename=filename)
llama_model = llama_cpp.Llama(downloaded_file, logits_all=True, verbose=True, n_ctx=4096)
assert llama_model is not None
```
Running this produces the stack trace shown above.
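For context, the log shows llama.cpp treating `phi3.attention.sliding_window` as a required hyperparameter key, which older GGUF conversions of Phi-3 evidently do not contain. A minimal Python sketch of that required-key behavior (the helper function and metadata values are hypothetical; only the key names come from the log above):

```python
def require_key(metadata: dict, key: str):
    """Mimic llama.cpp's required-hyperparameter lookup: a missing key aborts the load."""
    if key not in metadata:
        raise ValueError(f"key not found in model: {key}")
    return metadata[key]


# Metadata as an older Phi-3 GGUF conversion might expose it (illustrative values).
old_gguf = {
    "phi3.context_length": 4096,
    "phi3.embedding_length": 3072,
}

try:
    require_key(old_gguf, "phi3.attention.sliding_window")
except ValueError as e:
    print(e)  # key not found in model: phi3.attention.sliding_window
```

If that is the cause, either the GGUF file on the Hub needs reconverting with the newer metadata key, or the loader needs to treat the key as optional for older files.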