Misc. bug: ./llama-server API max_completion_tokens Parameter Not Working #13700

Open
ZV-Liu opened this issue May 22, 2025 · 2 comments

ZV-Liu commented May 22, 2025

Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
error: invalid argument: -V

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./llama-server -m /output/Qwen3-8B-Q4_K_M.gguff --host 0.0.0.0 --port 8080

Problem description & steps to reproduce

(The problem description was provided as a screenshot in the original issue; not reproduced here.)
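Since the screenshot is unavailable, the following is only a sketch of a likely reproduction based on the issue title (the endpoint, prompt, and token limit are assumptions): a request to llama-server's OpenAI-compatible chat endpoint that passes max_completion_tokens, where the generated reply is not capped at the requested number of tokens.

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Tell me a long story."}],
        "max_completion_tokens": 50
      }'
# Expected: the completion is limited to roughly 50 tokens.
# Observed (per the report): the limit appears to be ignored.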

First Bad Commit

No response

Relevant log output

@prd-tuong-nguyen

I think it should be "max_tokens": 50
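A minimal sketch of that suggestion, reusing the assumed request from above: renaming the field to max_tokens, which llama-server's OpenAI-compatible layer reportedly honors, should cap the completion length.

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Tell me a long story."}],
        "max_tokens": 50
      }'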

ZV-Liu commented May 22, 2025

max_tokens has been deprecated by the OpenAI API in favor of max_completion_tokens
