sampling: make top_n_sigma no-op at <=0 rather than <0 #13345

DocShotgun · 2025-05-06T17:18:48Z

This changes the behavior of the recently-added top_n_sigma sampler to a short-circuit no-op state at values <= 0 rather than < 0. The rationale for this change is as follows:

* The current behavior of top_n_sigma == 0 is redundant: it is a roundabout way to achieve greedy decoding, which can already be specified by other means, e.g. top_k == 1.
* top_n_sigma == 0 represents a no-op rather than greedy decoding in other existing tooling (e.g. text-generation-webui, aphrodite-engine, koboldcpp, YALS), so this change keeps the interface consistent for frontend developers.
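
To make the intended semantics concrete, here is a minimal standalone sketch of a top-n-sigma cutoff with the `<= 0` guard. It is not the code in src/llama-sampling.cpp: the function name `top_n_sigma_filter` and the plain `std::vector<float>` interface are assumptions made for illustration, and the cutoff rule (mask every logit below `max - n * stddev`) is the usual description of the technique rather than a copy of the actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper for illustration only (not the llama.cpp API).
// Masks every logit below (max - n * stddev) to -infinity.
// With n <= 0 the function returns early and leaves the logits untouched.
void top_n_sigma_filter(std::vector<float> & logits, float n) {
    if (n <= 0.0f || logits.empty()) {
        return; // no-op state proposed by this PR
    }

    const float max_logit = *std::max_element(logits.begin(), logits.end());

    // population mean and standard deviation of the logits
    float mean = 0.0f;
    for (float l : logits) {
        mean += l;
    }
    mean /= static_cast<float>(logits.size());

    float var = 0.0f;
    for (float l : logits) {
        var += (l - mean) * (l - mean);
    }
    const float sigma = std::sqrt(var / static_cast<float>(logits.size()));

    const float threshold = max_logit - n * sigma;
    for (float & l : logits) {
        if (l < threshold) {
            l = -std::numeric_limits<float>::infinity();
        }
    }
}
```

Under the previous `< 0` guard, n == 0 would reach the cutoff with a threshold equal to the maximum logit, which leaves only the top candidate and is therefore equivalent to greedy decoding.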

CISC · 2025-05-06T17:34:54Z

~~Do you know the rationale for koboldcpp also checking for cur_p->size <= 1?~~

Well, looking closer I see why, so perhaps add that too?

src/llama-sampling.cpp

* avoid running nsigma when only a single candidate remains Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

CISC · 2025-05-06T18:10:42Z

Ouch, test-sampling fails, can you look into why?

DocShotgun · 2025-05-06T18:42:45Z

> Ouch, test-sampling fails, can you look into why?

I took a look at it, and as far as I can tell, it's because this line:

llama.cpp/tests/test-sampling.cpp

Line 363 in 91a86a6

test_top_n_sigma({0.1f, 0.2f, 0.3f, 0.4f}, {1.0f, 0.0f, 0.0f, 0.0f}, 0.00f);

explicitly checks for top_n_sigma == 0 leading to greedy decoding, and that behavior is changed by this PR.

If I change the test to check for no-op instead, it passes:

test_top_n_sigma({0.1f, 0.2f, 0.3f, 0.4f}, {0.4f, 0.3f, 0.2f, 0.1f}, 0.00f);
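
As a rough illustration of why the expected values change, the sketch below reproduces both outcomes outside the test suite. Everything in it is an assumption for illustration: `probs_after_nsigma` is a hypothetical helper (not `test_top_n_sigma`), the `noop_at_zero` flag only switches between the old `< 0` and new `<= 0` guards, and it assumes the test feeds the sampler the logarithms of the input probabilities and compares the resulting probabilities in descending order.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Hypothetical helper for illustration only.
// Converts probabilities to logits, applies a top-n-sigma cutoff,
// renormalizes, and returns the probabilities sorted in descending order.
// noop_at_zero == false mimics the old guard (n < 0),
// noop_at_zero == true  mimics the new guard (n <= 0).
std::vector<float> probs_after_nsigma(const std::vector<float> & probs, float n, bool noop_at_zero) {
    std::vector<float> logits;
    for (float p : probs) {
        logits.push_back(std::log(p));
    }

    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<bool> keep(logits.size(), true);

    const bool skip = noop_at_zero ? (n <= 0.0f) : (n < 0.0f);
    if (!skip) {
        float mean = 0.0f;
        for (float l : logits) {
            mean += l;
        }
        mean /= static_cast<float>(logits.size());

        float var = 0.0f;
        for (float l : logits) {
            var += (l - mean) * (l - mean);
        }
        const float sigma = std::sqrt(var / static_cast<float>(logits.size()));

        const float threshold = max_logit - n * sigma;
        for (size_t i = 0; i < logits.size(); ++i) {
            keep[i] = logits[i] >= threshold;
        }
    }

    // softmax over the surviving candidates; masked ones get probability 0
    std::vector<float> out(logits.size(), 0.0f);
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        if (keep[i]) {
            out[i] = std::exp(logits[i] - max_logit);
            sum += out[i];
        }
    }
    for (float & o : out) {
        o /= sum;
    }

    std::sort(out.begin(), out.end(), std::greater<float>());
    return out;
}
```

Calling it with `{0.1f, 0.2f, 0.3f, 0.4f}` and n == 0 gives a single surviving candidate under the old guard and the unchanged distribution (in descending order) under the new one, matching the two expected vectors above.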

* adjust the sampling test to reflect top_n_sigma == 0 behaving as no-op rather than greedy decoding

CISC · 2025-05-06T19:14:22Z

> If I change the test to check for no-op instead, it passes

Great, can you also add a comment explaining why it was changed?

sampling: make nsigma == 0 a no-op

6d6877d

DocShotgun mentioned this pull request May 6, 2025

Updates/fixes for llama.cpp textgen settings SillyTavern/SillyTavern#3961

Merged

1 task

CISC requested changes May 6, 2025

src/llama-sampling.cpp (outdated, resolved)

sampling: short-circuit nsigma when cur_p <= 1

62e51ab

* avoid running nsigma when only a single candidate remains Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

CISC approved these changes May 6, 2025

sampling: fix top_n_sigma == 0 test to reflect new behavior

699256a

* adjust the sampling test to reflect top_n_sigma == 0 behaving as no-op rather than greedy decoding

github-actions bot added the testing Everything test related label May 6, 2025

sampling: additional context to top_n_sigma test change

91644d8

CISC merged commit ffc7272 into ggml-org:master May 6, 2025
45 checks passed
