Allow Llama objects to be freed earlier again · notwa/llama-cpp-python@a79c7ff · GitHub
Commit a79c7ff

Allow Llama objects to be freed earlier again
Commit 9018270 introduced a cyclic dependency within Llama objects. That change causes old models to linger in memory longer than necessary, creating memory bloat in most applications that switch between models at runtime. This patch simply removes the problematic line, allowing models to deallocate promptly without relying on the cyclic garbage collector. One might also consider combining `weakref.ref` with a `@property` if the `llama` attribute absolutely must be exposed on the tokenizer class.
1 parent 63b0c37 commit a79c7ff
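The `weakref.ref` plus `@property` approach suggested in the commit message could look roughly like the sketch below. The `Llama` stand-in class is hypothetical and only illustrates the pattern; it is not the real `llama_cpp.Llama`. The key point is that a weak reference does not participate in a reference cycle, so in CPython the model is freed as soon as the last strong reference goes away.

```python
import weakref

class Llama:
    """Hypothetical stand-in for llama_cpp.Llama, for illustration only."""
    def __init__(self):
        self._model = object()  # placeholder for the loaded model

class LlamaTokenizer:
    def __init__(self, llama):
        # Hold only a weak reference so the tokenizer does not keep
        # the Llama object (and its model weights) alive.
        self._llama_ref = weakref.ref(llama)
        self._model = llama._model

    @property
    def llama(self):
        # Dereference the weakref; returns None once the Llama
        # object has been freed.
        return self._llama_ref()

llm = Llama()
tok = LlamaTokenizer(llm)
assert tok.llama is llm    # strong reference still exists

del llm                    # last strong reference dropped
assert tok.llama is None   # tokenizer did not keep the model alive
```

With this design the public `llama` attribute survives, but deleting the last external reference to the model deallocates it immediately under CPython's reference counting, which is the behavior this commit restores.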

File tree

1 file changed: +0 −1 lines changed

llama_cpp/llama_tokenizer.py

Lines changed: 0 additions & 1 deletion
@@ -27,7 +27,6 @@ def detokenize(
 
 
 class LlamaTokenizer(BaseLlamaTokenizer):
     def __init__(self, llama: llama_cpp.Llama):
-        self.llama = llama
         self._model = llama._model  # type: ignore
 
     def tokenize(

0 commit comments

Comments
 (0)
0