Crash due to "The current task is not holding this lock" when {"stream": true} #1861
Closed
@Ian321


Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Using the API provided by llama_cpp.server with stream set to true should not crash.

Current Behavior

The server throws an unhandled exception while sending the final segment of the stream. The process does not exit, but it no longer serves subsequent connections.

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

$ lscpu

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          43 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   16
  On-line CPU(s) list:    0-15
Vendor ID:                AuthenticAMD
  Model name:             AMD Ryzen 7 1700X Eight-Core Processor
    CPU family:           23
    Model:                1
    Thread(s) per core:   2
    Core(s) per socket:   8
    Socket(s):            1
    Stepping:             1
    Frequency boost:      enabled
    CPU(s) scaling MHz:   65%
    CPU max MHz:          3400.0000
    CPU min MHz:          2200.0000
    BogoMIPS:             6790.09
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev
Virtualization features:  
  Virtualization:         AMD-V
Caches (sum of all):      
  L1d:                    256 KiB (8 instances)
  L1i:                    512 KiB (8 instances)
  L2:                     4 MiB (8 instances)
  L3:                     16 MiB (2 instances)
NUMA:                     
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-15
Vulnerabilities:          
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Mitigation; untrained return thunk; SMT vulnerable
  Spec rstack overflow:   Mitigation; Safe RET
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Retpolines; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected
  • Operating System, e.g. for Linux:

$ uname -a

Linux gaia 6.11.8-1-default #1 SMP PREEMPT_DYNAMIC Thu Nov 14 12:54:01 UTC 2024 (099023b) x86_64 x86_64 x86_64 GNU/Linux

openSUSE Tumbleweed

  • SDK version, e.g. for Linux:
$ python3 --version
Python 3.11.11
$ make --version
GNU Make 4.4.1
Built for x86_64-suse-linux-gnu
Copyright (C) 1988-2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ g++ --version
g++ (SUSE Linux) 13.3.1 20240807 [revision 9d368828bd4d04ce507e02a581be850fca849fae]
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Failure Information (for bugs)

Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

cd $(mktemp -d)
python3.11 -m venv --upgrade-deps .venv
source .venv/bin/activate
pip install "llama-cpp-python[server] @ git+https://github.com/abetlen/llama-cpp-python"
python3 -m llama_cpp.server --model ~/Downloads/llama-2-7b-chat.Q5_K_M.gguf # or any other model

# Then in a second shell run:
curl -X 'POST'   'http://127.0.0.1:8000/v1/completions'   -H 'accept: application/json'   -H 'Content-Type: application/json'   -d '{
  "prompt": "\n\n### Instructions:\nWhat is the capital of France?\n\n### Response:\n",
  "stop": [
    "\n",
    "###"
  ],
  "stream": true
}'
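
For convenience, an equivalent streaming client in Python (a hypothetical sketch, not part of the original report; it assumes the requests package is installed and the server is listening on the default port 8000):

# Hypothetical Python equivalent of the curl request above.
# Assumes the `requests` package is installed; reads the SSE
# response line by line instead of buffering it.
import requests

payload = {
    "prompt": "\n\n### Instructions:\nWhat is the capital of France?\n\n### Response:\n",
    "stop": ["\n", "###"],
    "stream": True,
}

with requests.post(
    "http://127.0.0.1:8000/v1/completions",
    json=payload,
    stream=True,  # keep the connection open and read chunks as they arrive
) as resp:
    for line in resp.iter_lines():  # each SSE event arrives as a `data: ...` line
        if line:
            print(line.decode("utf-8"))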

Failure Logs

Please include any relevant log snippets or files. If it works under one configuration but not under another, please provide logs for both configurations and their corresponding outputs so it is easy to see where behavior changes.

Also, please try to avoid using screenshots if at all possible. Instead, copy/paste the console output and use GitHub's markdown to cleanly format your logs for easy readability.

Server log:

INFO:     Started server process [51982]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://gaia:8000 (Press CTRL+C to quit)
INFO:     127.0.0.1:38422 - "POST /v1/completions HTTP/1.1" 200 OK
llama_perf_context_print:        load time =    2352.28 ms
llama_perf_context_print: prompt eval time =       0.00 ms /    25 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     7 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =    3552.87 ms /    32 tokens
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/sse_starlette/sse.py", line 289, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/sse_starlette/sse.py", line 278, in wrap
    await func()
  File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/sse_starlette/sse.py", line 228, in listen_for_disconnect
    message = await receive()
              ^^^^^^^^^^^^^^^
  File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 531, in receive
    await self.message_event.wait()
  File "/usr/lib64/python3.11/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fadac98c890

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
  |     result = await app(  # type: ignore[func-returns-value]
  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
  |     return await self.app(scope, receive, send)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
  |     await super().__call__(scope, receive, send)
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette/applications.py", line 113, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
  |     raise exc
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
  |     await self.app(scope, receive, _send)
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
  |     await self.app(scope, receive, send)
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette_context/middleware/raw_middleware.py", line 92, in __call__
  |     await self.app(scope, receive, send_wrapper)
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     raise exc
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette/routing.py", line 715, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette/routing.py", line 735, in app
  |     await route.handle(scope, receive, send)
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette/routing.py", line 288, in handle
  |     await self.app(scope, receive, send)
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette/routing.py", line 76, in app
  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     raise exc
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/starlette/routing.py", line 74, in app
  |     await response(scope, receive, send)
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/sse_starlette/sse.py", line 275, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/anyio/_backends/_asyncio.py", line 815, in __aexit__
  |     raise BaseExceptionGroup(
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/llama_cpp/server/app.py", line 185, in get_event_publisher
    |     await on_complete()
    |   File "/usr/lib64/python3.11/contextlib.py", line 687, in aclose
    |     await self.__aexit__(None, None, None)
    |   File "/usr/lib64/python3.11/contextlib.py", line 745, in __aexit__
    |     raise exc_details[1]
    |   File "/usr/lib64/python3.11/contextlib.py", line 728, in __aexit__
    |     cb_suppress = await cb(*exc_details)
    |                   ^^^^^^^^^^^^^^^^^^^^^^
    |   File "/usr/lib64/python3.11/contextlib.py", line 217, in __aexit__
    |     await anext(self.gen)
    |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/llama_cpp/server/app.py", line 86, in get_llama_proxy
    |     llama_inner_lock.release()
    |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/anyio/_core/_synchronization.py", line 241, in release
    |     self._lock.release()
    |   File "/tmp/tmp.mimYBwBzDw/.venv/lib64/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1874, in release
    |     raise RuntimeError("The current task is not holding this lock")
    | RuntimeError: The current task is not holding this lock
    +------------------------------------

The client-side curl output for the same request:
data: {"id": "cmpl-3dee3c42-303f-4442-9f37-f3d8ef059d52", "object": "text_completion", "created": 1733954334, "model": "/home/ignaz/Downloads/llama-2-7b-chat.Q5_K_M.gguf", "choices": [{"text": "The", "index": 0, "logprobs": null, "finish_reason": null}]}

data: {"id": "cmpl-3dee3c42-303f-4442-9f37-f3d8ef059d52", "object": "text_completion", "created": 1733954334, "model": "/home/ignaz/Downloads/llama-2-7b-chat.Q5_K_M.gguf", "choices": [{"text": " capital", "index": 0, "logprobs": null, "finish_reason": null}]}

data: {"id": "cmpl-3dee3c42-303f-4442-9f37-f3d8ef059d52", "object": "text_completion", "created": 1733954334, "model": "/home/ignaz/Downloads/llama-2-7b-chat.Q5_K_M.gguf", "choices": [{"text": " of", "index": 0, "logprobs": null, "finish_reason": null}]}

data: {"id": "cmpl-3dee3c42-303f-4442-9f37-f3d8ef059d52", "object": "text_completion", "created": 1733954334, "model": "/home/ignaz/Downloads/llama-2-7b-chat.Q5_K_M.gguf", "choices": [{"text": " France", "index": 0, "logprobs": null, "finish_reason": null}]}

data: {"id": "cmpl-3dee3c42-303f-4442-9f37-f3d8ef059d52", "object": "text_completion", "created": 1733954334, "model": "/home/ignaz/Downloads/llama-2-7b-chat.Q5_K_M.gguf", "choices": [{"text": " is", "index": 0, "logprobs": null, "finish_reason": null}]}

data: {"id": "cmpl-3dee3c42-303f-4442-9f37-f3d8ef059d52", "object": "text_completion", "created": 1733954334, "model": "/home/ignaz/Downloads/llama-2-7b-chat.Q5_K_M.gguf", "choices": [{"text": " Paris", "index": 0, "logprobs": null, "finish_reason": null}]}

data: {"id": "cmpl-3dee3c42-303f-4442-9f37-f3d8ef059d52", "object": "text_completion", "created": 1733954334, "model": "/home/ignaz/Downloads/llama-2-7b-chat.Q5_K_M.gguf", "choices": [{"text": ".", "index": 0, "logprobs": null, "finish_reason": null}]}

data: {"id": "cmpl-3dee3c42-303f-4442-9f37-f3d8ef059d52", "object": "text_completion", "created": 1733954334, "model": "/home/ignaz/Downloads/llama-2-7b-chat.Q5_K_M.gguf", "choices": [{"text": "", "index": 0, "logprobs": null, "finish_reason": null}]}

data: {"id": "cmpl-3dee3c42-303f-4442-9f37-f3d8ef059d52", "object": "text_completion", "created": 1733954334, "model": "/home/ignaz/Downloads/llama-2-7b-chat.Q5_K_M.gguf", "choices": [{"text": "", "index": 0, "logprobs": null, "finish_reason": "stop"}]}

data: [DONE]

curl: (18) transfer closed with outstanding read data remaining
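
The root cause appears to be that anyio's Lock is task-bound: release() must be called by the same task that acquired the lock. sse_starlette drives the response generator inside an anyio task group, so the cleanup of get_llama_proxy (which calls llama_inner_lock.release()) can run in a different task from the one that acquired the lock, producing exactly this RuntimeError. A minimal sketch of the underlying failure mode, independent of llama-cpp-python (assumes only that anyio is installed):

# Minimal sketch: releasing an anyio.Lock from a task that does not own it.
# Running this reproduces the same RuntimeError seen in the traceback above,
# surfaced by the task group as an ExceptionGroup.
import anyio


async def release_from_child(lock: anyio.Lock) -> None:
    # This task never acquired the lock, so release() raises:
    # RuntimeError: The current task is not holding this lock
    lock.release()


async def main() -> None:
    lock = anyio.Lock()
    await lock.acquire()  # the parent task becomes the owner
    async with anyio.create_task_group() as tg:
        tg.start_soon(release_from_child, lock)


anyio.run(main)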
