## Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
## Expected Behavior
I am trying to run the most basic examples from the docs. First, I load a model that I've downloaded:
```python
from llama_cpp import Llama

if __name__ == "__main__":
    llm = Llama(
        model_path="models/qwen2-0_5b-instruct-q4_0.gguf",
        verbose=False,
    )
    output = llm(
        "Q: Name the planets in the solar system? A: ",
        max_tokens=32,
        stop=["Q:", "\n"],
        echo=True,
    )
    print(output)
```
And I am also trying to load a model directly from the Hugging Face Hub:
```python
from llama_cpp import Llama

if __name__ == "__main__":
    llm = Llama.from_pretrained(
        repo_id="Qwen/Qwen2-0.5B-Instruct-GGUF",
        filename="*q4_0.gguf",
        verbose=False,
    )
```
I expect these basic examples to run without error and produce some reasonable-looking output.
## Current Behavior
In both cases, whether using the pre-downloaded model or pulling from the Hugging Face Hub, I get the following exception:
```
Exception ignored in: <function Llama.__del__ at 0x14b2e5940>
Traceback (most recent call last):
  File ".../.venv/lib/python3.11/site-packages/llama_cpp/llama.py", line 2205, in __del__
  File ".../.venv/lib/python3.11/site-packages/llama_cpp/llama.py", line 2202, in close
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 609, in close
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 601, in __exit__
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 586, in __exit__
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 360, in __exit__
  File ".../.venv/lib/python3.11/site-packages/llama_cpp/_internals.py", line 75, in close
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 609, in close
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 601, in __exit__
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 586, in __exit__
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 469, in _exit_wrapper
  File ".../.venv/lib/python3.11/site-packages/llama_cpp/_internals.py", line 69, in free_model
TypeError: 'NoneType' object is not callable
```
This appears to be the line raising the exception: `llama_cpp/_internals.py`, line 69 at commit `99f2ebf`.
With the pre-downloaded model, I see the printed model output before the exception; when pulling from the Hugging Face Hub, I get the exception immediately.
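This looks like the classic interpreter-shutdown ordering problem: cleanup deferred to `__del__` runs during teardown, after CPython may have already cleared some module globals to `None`, so a cleanup callback that looks up a module-level name at call time ends up invoking `None`. Here is a minimal, hypothetical sketch of that pattern (this is not llama-cpp-python's actual code, and whether it reproduces the error depends on the interpreter's teardown order):

```python
import contextlib

_native_free = print  # stand-in for a ctypes binding such as llama_free_model


class Model:
    def __init__(self):
        self._stack = contextlib.ExitStack()

        def free_model():
            # _native_free is resolved when the callback runs; if the module's
            # globals were already cleared during interpreter shutdown, this
            # calls None and raises TypeError: 'NoneType' object is not callable.
            _native_free("freeing native model handle")

        self._stack.callback(free_model)

    def close(self):
        self._stack.close()

    def __del__(self):
        self.close()


model = Model()
# No explicit model.close(): cleanup is deferred to __del__ at interpreter
# exit, which may run after the module's globals have been torn down.
```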
## Environment and Context
- macOS 15.4 running on M3 Apple Silicon
- Python 3.11.11
- GNU Make 3.81
- llama-cpp-python @ `99f2ebf` (the latest from `main` as of 2025-04-11)
## Steps to Reproduce
- Copy either test script from above into `test-llama.py`.
- Run `python test-llama.py`.
This seems to be an issue specific to the Python bindings, so I did not try building llama.cpp directly.
This issue is possibly related to #1442.
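For what it's worth, closing the model explicitly before interpreter exit appears to sidestep the problem, since native resources are then freed while `llama_cpp`'s globals are still intact. A sketch using `contextlib.closing` around the `close()` method that the traceback shows exists on `Llama` (I have not checked whether `Llama` also supports the `with` statement directly):

```python
import contextlib

from llama_cpp import Llama

if __name__ == "__main__":
    # contextlib.closing calls llm.close() on exit, so cleanup runs here
    # rather than in __del__ during interpreter shutdown.
    with contextlib.closing(
        Llama(
            model_path="models/qwen2-0_5b-instruct-q4_0.gguf",
            verbose=False,
        )
    ) as llm:
        output = llm(
            "Q: Name the planets in the solar system? A: ",
            max_tokens=32,
            stop=["Q:", "\n"],
            echo=True,
        )
        print(output)
```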