Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
error: invalid argument: -V

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./llama-server -m /output/Qwen3-8B-Q4_K_M.gguf --host 0.0.0.0 --port 8080

Problem description & steps to reproduce

First Bad Commit

No response

Relevant log output
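A note on the `error: invalid argument: -V` line above: in llama.cpp builds I'm aware of, version and build info is printed with `--version`, not `-V` (lowercase `-v` is typically verbose logging), so the check for the "Name and Version" field would look roughly like this sketch:

```sh
# Hedged sketch: --version prints build info in recent llama.cpp builds;
# -V is not a recognized flag, which matches the error reported above.
./llama-server --version
```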
I think it should be "max_tokens": 50.
max_tokens has been deprecated by the OpenAI API.
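To illustrate the suggestion above, here is a minimal request sketch (not the reporter's exact request), assuming the server from the reported command line is reachable on localhost:8080; llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint, and "max_tokens" caps the number of generated tokens:

```sh
# Sketch: assumes llama-server is listening on localhost:8080 as started above.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 50
  }'
```

On OpenAI's own chat completions API, max_tokens has been superseded by max_completion_tokens; whether a given llama.cpp build also accepts the newer name depends on its version.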