batch : auto-gen positions + verify multi-sequence input #14177


Merged
merged 6 commits into master from gg/batch-mutli-seq-id-verify on Jun 15, 2025

Conversation

ggerganov (Member) commented on Jun 13, 2025
  • Auto-generate input positions when they are missing.
  • Sanitize input batches with multiple sequences per token.
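
Roughly, the two behaviors above could look like the following. This is a minimal sketch under stated assumptions, not the actual llama.cpp implementation; the token_in struct, the sanitize_batch() helper, and the seq_last map are hypothetical names for illustration:

// Minimal sketch (hypothetical, not the llama.cpp code) of batch preparation
// that (1) auto-generates positions when the caller provides none, and
// (2) verifies that each sequence continues from the last position already
// stored in the memory (KV cache).
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

using llama_pos    = int32_t;
using llama_seq_id = int32_t;

struct token_in {
    llama_pos    pos;    // -1 stands in for "position not provided"
    llama_seq_id seq_id; // one sequence id per token, for simplicity
};

// seq_last[s] holds the last position stored in the memory for sequence s
// (-1 when the sequence is empty).
static bool sanitize_batch(std::vector<token_in> & batch,
                           std::unordered_map<llama_seq_id, llama_pos> & seq_last) {
    for (auto & t : batch) {
        llama_pos & last = seq_last.try_emplace(t.seq_id, -1).first->second;
        if (t.pos < 0) {
            // auto-generate: continue right after what the memory holds
            t.pos = last + 1;
        } else if (t.pos != last + 1) {
            // verify: reject input that does not continue the sequence
            fprintf(stderr, "sequence %d does not start from the last position stored in the memory\n",
                    (int) t.seq_id);
            return false;
        }
        last = t.pos;
    }
    return true;
}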

@ggerganov changed the title from "batch : verify multi-sequence input batches" to "batch : auto-gen positions + verify multi-sequence input" on Jun 14, 2025
@ggerganov marked this pull request as ready for review on Jun 14, 2025 08:04
@ggerganov merged commit b9912ac into master on Jun 15, 2025
1 check passed
@ggerganov deleted the gg/batch-mutli-seq-id-verify branch on Jun 15, 2025 06:18
broadbit-hu commented on Jun 16, 2025

This commit appears to have broken the --batch-size 1 workaround for multimodal, see: #13694 (comment)

The last working release is: https://github.com/ggml-org/llama.cpp/releases/tag/b5664

Test call:

./build/bin/llama-mtmd-cli -m ../models/Qwen2.5-VL-7B-Instruct-Q8_0.gguf --mmproj ../models/mmproj-Qwen2.5-VL-7B-Instruct-f16.gguf --image ../4.png -p "Please first output bbox coordinates and colors of every rectangle in this image in JSON format, and then answer how many rectangles are there in the image." --seed 1 -ngl 99 --temp 0.0 -c 20000 -b 1

Result of release https://github.com/ggml-org/llama.cpp/releases/tag/b5666:

encoding image slice...
image slice encoded in 1396 ms
decoding image batch 1/1369, n_tokens_batch = 1
image decoded (batch 1/1369) in 3 ms
decoding image batch 2/1369, n_tokens_batch = 1
failed to decode image
init: sequence 0 does not start from the last position stored in the memory
decode: failed to initialize batch
failed to decode image
failed to eval chunk 1
llama_decode: failed to decode, ret = -1
Unable to eval prompt

Same results with -b 1360 or less...

Result of release https://github.com/ggml-org/llama.cpp/releases/tag/b5664:

encoding image slice...
image slice encoded in 1418 ms
decoding image batch 1/1369, n_tokens_batch = 1
image decoded (batch 1/1369) in 2 ms
decoding image batch 2/1369, n_tokens_batch = 1
image decoded (batch 2/1369) in 15 ms
decoding image batch 3/1369, n_tokens_batch = 1
image decoded (batch 3/1369) in 13 ms
decoding image batch 4/1369, n_tokens_batch = 1
image decoded (batch 4/1369) in 14 ms
decoding image batch 5/1369, n_tokens_batch = 1
image decoded (batch 5/1369) in 14 ms
...
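
If the new check works along the lines of the sketch in the PR description, the -b 1 failure would arise whenever a single-token image batch supplies an explicit position that does not continue from the last position stored for sequence 0. A hypothetical reproduction, continuing that sketch:

// Hypothetical repro using the sketch above: sequence 0 already ends at
// position 0 in the memory, and the next single-token batch skips ahead,
// so the batch is rejected with the same message seen in the log.
int main() {
    std::unordered_map<llama_seq_id, llama_pos> seq_last = {{0, 0}};
    std::vector<token_in> batch = {{/*pos=*/2, /*seq_id=*/0}}; // expected pos 1
    if (!sanitize_batch(batch, seq_last)) {
        // prints: sequence 0 does not start from the last position stored in the memory
        return 1;
    }
    return 0;
}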
