Misc. bug: Server does not always cancel requests for disconnected connections · Issue #13262 · ggml-org/llama.cpp
Open
@CyberShadow

Description


Name and Version

$ ./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
version: 0 (unknown)
built with gcc (GCC) 13.3.0 for x86_64-unknown-linux-gnu

(Actually version 5161)

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./llama-server -m gemma-3-27b-pt-q4_0.gguf -ngl 9999 --host 127.0.0.1 --port 8000 --threads-http 1

# ...

curl -v --request POST \
    --url http://127.0.0.1:8000/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Five, Four, Three, Two, One, '$RANDOM'\n\n\n\nThe countdown","n_predict": 256, "n_probs":10, "temperature":0,"stream":true}'

Problem description & steps to reproduce

It looks like the server sometimes still generates responses for queued HTTP requests whose client has since disconnected.

I can reproduce the problem as follows:

  1. Start the server
  2. Start the curl command above. The key aspect is that it must be long-running (i.e. n_predict is high).
  3. While it's still running, in another terminal, start and then immediately cancel (with Ctrl+C) the same command a few times, in quick succession.
  4. Start the curl command once more.
  5. Cancel the original curl command in step 2.

Expected behavior: The server should immediately start replying to the request from step 4.
Actual behavior: The server seems to hang, because it is still pointlessly generating replies for the canceled requests from step 3.
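
The manual steps above could be scripted roughly as follows. This is only a sketch under stated assumptions, not a verified reproducer: the helper names (build_payload, start_request, reproduce) and the URL and CANCEL_COUNT variables are hypothetical, "a few times" is arbitrarily taken to be 3, and the canceled requests are stopped with kill (SIGTERM) rather than Ctrl+C, on the assumption that the server sees a closed connection either way. It also assumes bash, GNU sleep (for fractional delays), and a server already started with the command line above.

```shell
#!/usr/bin/env bash
# Hypothetical helper script: none of these names come from llama.cpp.
URL="${URL:-http://127.0.0.1:8000/completion}"
CANCEL_COUNT="${CANCEL_COUNT:-3}"   # "a few times" in step 3; 3 is an arbitrary choice

# Print the JSON body from the report; $1 takes the place of $RANDOM.
build_payload() {
    printf '%s' '{"prompt": "Five, Four, Three, Two, One, '"$1"'\n\n\n\nThe countdown","n_predict": 256, "n_probs":10, "temperature":0,"stream":true}'
}

# Start one streaming request in the background and record curl's PID.
start_request() {
    local payload
    payload=$(build_payload "$RANDOM")
    curl -s -N --request POST --url "$URL" \
        --header "Content-Type: application/json" \
        --data "$payload" > /dev/null &
    REQUEST_PID=$!
}

reproduce() {
    local first last
    start_request; first=$REQUEST_PID        # step 2: long-running request
    sleep 1
    for _ in $(seq "$CANCEL_COUNT"); do      # step 3: start, then cancel right away
        start_request
        sleep 0.2
        kill "$REQUEST_PID" 2>/dev/null
    done
    start_request; last=$REQUEST_PID         # step 4: the request we care about
    kill "$first" 2>/dev/null                # step 5: cancel the original request
    SECONDS=0
    wait "$last"
    # Total duration is only a rough proxy for "how long until the server replied".
    echo "step-4 request finished after ${SECONDS}s"
}
```

After sourcing the script, running reproduce performs steps 2 to 5. If the behavior described here occurs, the reported duration should be noticeably longer than for a single request sent to an idle server.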

I tried to force the server to handle one request at a time with the --threads-http 1 option, but it doesn't seem to make a difference.

First Bad Commit

This seems to be a regression, but it was introduced about a year ago, so the exact change which introduced it is probably not relevant.

Relevant log output
