## Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
## Expected Behavior
I am trying to run the most basic examples from the docs. First, I load a model that I've downloaded:
```python
from llama_cpp import Llama

if __name__ == "__main__":
    llm = Llama(
        model_path="models/qwen2-0_5b-instruct-q4_0.gguf",
        verbose=False,
    )
    output = llm(
        "Q: Name the planets in the solar system? A: ",
        max_tokens=32,
        stop=["Q:", "\n"],
        echo=True,
    )
    print(output)
```
And I am also trying to load a model directly from the Hugging Face Hub:
```python
from llama_cpp import Llama

if __name__ == "__main__":
    llm = Llama.from_pretrained(
        repo_id="Qwen/Qwen2-0.5B-Instruct-GGUF",
        filename="*q4_0.gguf",
        verbose=False,
    )
```
I expect these basic examples to run without error and produce some reasonable-looking output.
## Current Behavior
In both cases, whether using the pre-downloaded model or pulling from the Hugging Face Hub, I get the following exception:
```
Exception ignored in: <function Llama.__del__ at 0x14b2e5940>
Traceback (most recent call last):
  File ".../.venv/lib/python3.11/site-packages/llama_cpp/llama.py", line 2205, in __del__
  File ".../.venv/lib/python3.11/site-packages/llama_cpp/llama.py", line 2202, in close
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 609, in close
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 601, in __exit__
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 586, in __exit__
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 360, in __exit__
  File ".../.venv/lib/python3.11/site-packages/llama_cpp/_internals.py", line 75, in close
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 609, in close
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 601, in __exit__
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 586, in __exit__
  File ".../.pyenv/versions/3.11.11/lib/python3.11/contextlib.py", line 469, in _exit_wrapper
  File ".../.venv/lib/python3.11/site-packages/llama_cpp/_internals.py", line 69, in free_model
TypeError: 'NoneType' object is not callable
```
This appears to be the line raising the exception: `llama_cpp/_internals.py`, line 69 at commit `99f2ebf`.
With the pre-downloaded model, I see the printed model output before the exception; when pulling from the Hugging Face Hub, I get the exception immediately.
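This looks like the classic interpreter-shutdown ordering problem: cleanup deferred to `__del__` runs during teardown, after CPython may have already cleared some module globals to `None`, so a cleanup callback that looks up a module-level name at call time ends up invoking `None`. Here is a minimal, hypothetical sketch of that pattern (this is not llama-cpp-python's actual code, and whether it reproduces the error depends on the interpreter's teardown order):

```python
import contextlib

_native_free = print  # stand-in for a ctypes binding such as llama_free_model


class Model:
    def __init__(self):
        self._stack = contextlib.ExitStack()

        def free_model():
            # _native_free is resolved when the callback runs; if the module's
            # globals were already cleared during interpreter shutdown, this
            # calls None and raises TypeError: 'NoneType' object is not callable.
            _native_free("freeing native model handle")

        self._stack.callback(free_model)

    def close(self):
        self._stack.close()

    def __del__(self):
        self.close()


model = Model()
# No explicit model.close(): cleanup is deferred to __del__ at interpreter
# exit, which may run after the module's globals have been torn down.
```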
## Environment and Context
- macOS 15.4 running on M3 Apple Silicon
- Python 3.11.11
- GNU Make 3.81
- llama-cpp-python @ `99f2ebf` (the latest from `main` as of 2025-04-11)
## Steps to Reproduce
- Copy either test script from above into `test-llama.py`.
- Run `python test-llama.py`.
This seems to be an issue specific to the Python bindings, so I did not try building llama.cpp directly.
This issue is possibly related to #1442.
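For what it's worth, closing the model explicitly before interpreter exit appears to sidestep the problem, since native resources are then freed while `llama_cpp`'s globals are still intact. A sketch using `contextlib.closing` around the `close()` method that the traceback shows exists on `Llama` (I have not checked whether `Llama` also supports the `with` statement directly):

```python
import contextlib

from llama_cpp import Llama

if __name__ == "__main__":
    # contextlib.closing calls llm.close() on exit, so cleanup runs here
    # rather than in __del__ during interpreter shutdown.
    with contextlib.closing(
        Llama(
            model_path="models/qwen2-0_5b-instruct-q4_0.gguf",
            verbose=False,
        )
    ) as llm:
        output = llm(
            "Q: Name the planets in the solar system? A: ",
            max_tokens=32,
            stop=["Q:", "\n"],
            echo=True,
        )
        print(output)
```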