feat: add support for min_p (resolve #1142) #1825
Conversation
@pdevine Since you're around 🙂 may I ask you to look into this one? It's not a priority, but some people would like to use it in ollama. For context: https://www.reddit.com/r/LocalLLaMA/comments/17vonjo/your_settings_are_probably_hurting_your_model_why/ |
I definitely would like to use it. |
I patched this into my local build several days ago and it's been working great. I honestly believe min_p should be the go-to sampler for a lot of use cases. |
@jmorganca any chance of merging this PR? It's only a few lines of code. Nothing that would break ollama. |
@mxyng Would you mind reviewing this one? The intent is to add a new parameter called min_p. Please see this thread for an explanation of what min_p does and why it matters: ggml-org/llama.cpp#3841 |
Please get this done. It would be a great benefit to the community. Thank you for the effort in advance. |
Hi there, would it be possible to rebase this PR? Sorry for the delay all 🥺 |
@jmorganca Hey, I've rebased it onto current main. Concerning the second commit, are you ok with exposing min_p in the OpenAI-compatible API as well? |
+1 please expose min_p! |
Does this one need to be rebased again? Need this sampler. I can do a rebase if needed. |
@jmorganca Hi, just an alert to let you know that it has been rebased. :) It definitely looks like a very smart sampling method, after I read the author's thread. It gives token variety while maintaining coherence. I'd love to start using it. |
I had to switch to LM Studio for now (it supports minP), but I'm really looking forward to trying out minP in Open-WebUI whenever you get around to merging this PR. |
I've reviewed the code and it looks good. I also built and tested it locally.
I did the same for my primary writing models (Miqu and WizardLM2), and I use ollama for small agent models. It's a useful combination, though I would still prefer min_p exposed in ollama... as it should be. |
Just FYI for people who use min_p: I've been keeping a branch current with ollama/main, with this PR applied. Here's the link: https://github.com/FellowTraveler/ollama/tree/min_p After it's built, I close Ollama from the systray (since I already had it running). |
@jmorganca as you requested, this was rebased a while ago now; mind taking a look again? |
This has been pending for the last 6 months. From the looks of it, it's pretty stable, since people have been building and using it. |
@jmorganca I've read through the PR twice. The code looks good to me. Let's merge it? |
I'm going to keep bringing this PR to your attention, @jmorganca |
For anyone curious (including @jmorganca), here is the min_p paper: https://arxiv.org/abs/2407.01082 |
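For readers who would rather see the rule than read the paper, here is a minimal sketch of min_p filtering in Go. It is purely illustrative, assuming a plain probability table; the real llama.cpp/Ollama sampler works on logits inside the sampling chain, and the token probabilities below are made up.

```go
package main

import (
	"fmt"
	"math"
)

// minPFilter keeps only the tokens whose probability is at least
// minP times the probability of the single most likely token, then
// re-normalizes the survivors. This is the core idea of min_p sampling.
func minPFilter(probs map[string]float64, minP float64) map[string]float64 {
	maxProb := 0.0
	for _, p := range probs {
		maxProb = math.Max(maxProb, p)
	}

	kept := make(map[string]float64)
	sum := 0.0
	for tok, p := range probs {
		if p >= minP*maxProb {
			kept[tok] = p
			sum += p
		}
	}
	for tok := range kept {
		kept[tok] /= sum // re-normalize so the survivors form a distribution again
	}
	return kept
}

func main() {
	probs := map[string]float64{"the": 0.55, "a": 0.30, "zebra": 0.02, "qux": 0.001}
	// With min_p = 0.05 the cutoff is 0.05 * 0.55 = 0.0275, so only "the" and "a" survive.
	fmt.Println(minPFilter(probs, 0.05))
}
```

Unlike a fixed top_p cutoff, the threshold scales with the model's confidence: when the top token is very likely, only close competitors survive; when the distribution is flat, many candidates remain.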
@jmorganca Min_p would make Ollama usable and an actual competitor to others... |
I think pinging him every day about it is a little bit excessive |
I don't think so. I think having a PR that has been kept alive, tested by users, and includes docs sit unmerged for 6 months is a bit excessive. I am integrating ollama into my project, where every single other backend supports min_p. Good news, I guess? |
I find it positive that this gained some traction recently. The absence of min_p sampling has been a huge downside of Ollama for local LM researchers.
This doesn't appear like a concern to me. Does the official OpenAI API support mirostat, TFS, typical sampling, or even repetition penalty? Since Ollama prefers to hide as much control over sampling from the end user as possible, in the best-case scenario 99% of users would just use parameters that are prebaked in the modelfile. Power users who want ultimate control over the sampling process should not be limited by the arbitrary constraints of a third-party API. |
Very valid points. |
Best time to merge this PR was 7 months ago. |
Stop avoiding adding min_p; it's the greatest of all samplers. It needed to be added yesterday. |
Hey @jmorganca just a reminder that you wanted to have a look at this PR |
This is just sad and ridiculous: half a year of begging to merge a simple PR. |
openai/openai.go (Outdated)
@@ -63,6 +63,7 @@ type ChatCompletionRequest struct {
 	FrequencyPenalty *float64 `json:"frequency_penalty"`
 	PresencePenalty  *float64 `json:"presence_penalty_penalty"`
 	TopP             *float64 `json:"top_p"`
+	MinP             *float64 `json:"min_p"`
Let's omit min_p from the OpenAI API for now: there are a few parameters that are not in the OpenAI API that should be added at some point (num_ctx, etc.), but I don't want to break the API if we choose to do this differently later (e.g. nest these non-standard options under options).
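To make the nesting idea above concrete, here is a hypothetical sketch of what grouping non-standard fields under an options object could look like. The struct and field names are illustrative assumptions, not an agreed-upon Ollama design.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical sketch: non-standard sampling fields nested under an
// "options" object instead of sitting at the OpenAI request root.
type ExtraOptions struct {
	MinP   *float64 `json:"min_p,omitempty"`
	NumCtx *int     `json:"num_ctx,omitempty"`
}

type ChatCompletionRequest struct {
	TopP    *float64      `json:"top_p,omitempty"`
	Options *ExtraOptions `json:"options,omitempty"` // non-standard extensions live here
}

func main() {
	minP := 0.05
	req := ChatCompletionRequest{Options: &ExtraOptions{MinP: &minP}}
	out, _ := json.Marshal(req)
	fmt.Println(string(out)) // {"options":{"min_p":0.05}}
}
```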
Can you share your concerns about the presence of extra parameters?
I have heard you say this several times, yet I can't see how this can break OpenAI-compatible clients: ones that adhere strictly to the OpenAI schema would simply never send it, and therefore won't break. Am I missing something?
Instead of re-inventing the wheel with options, check out how other server implementations add parameters that are missing from the OpenAI API. From what I have seen, this is how extra parameters are implemented (like in this PR): just added to the root, as if OpenAI supported them.
VLLM
Aphrodite
Oobabooga
Sorry all for the delay, and thanks for the follow-ups. @Robitx looks good; however, would it be possible to omit it from the OpenAI-compatible API? I'm not sure yet how we should "extend" the OpenAI-compatible API and would prefer to bring that in later when we do. Once that's done we can get this merged and into the next release of Ollama. |
I'm not a developer, but as a frequent API user, I don't fully understand the issue with extending it here. One approach could be to follow KoboldCPP's lead and not extend the OpenAI API. However, in my humble and most professional (/s) opinion, why not add it as an optional parameter? If it's set, great; if not, just ignore it. |
@jmorganca I think I understand your original concern. It seems like Ollama supports emulating both an OpenAI API and the regular Ollama API? And I see that this pull request was previously modifying the OpenAI-compatible API as well. Well, the PR author has now force-pushed an update which no longer changes it. Does this mean that OpenWebUI will still be able to use min_p through the regular Ollama API? If so, it seems ready to merge. :) PS: Really looking forward to using min_p! |
@jmorganca done. @Arcitec if OpenAI ever decides to add min_p to their API, they are bound by the established conventions; as @kubernetes-bad pointed out with examples, everyone is following this convention for similar optional parameters, and I don't see OpenAI needlessly breaking the API for existing customers any time soon. But the relevant commit keeping that convention is gone, and whoever is willing can try to fight this battle in another PR after this one gets merged. |
Well then guys! See you in 7 months when it gets merged |
Well, I found the answer to my question in the first line of openai/openai.go (line 1 in f3d7a48). So yes, Ollama has emulation of the OpenAI API to allow OpenAI GUI frontends to connect to Ollama instead. Therefore it didn't make sense to add min_p to that emulation layer. And with that removal/fix, thank you @jmorganca for the merge! I look forward to using min_p. Edit: If anyone is confused by what I'm saying, this doesn't affect Ollama's own API, which now supports min_p. |
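For anyone who wants to try it once they are running a build that includes this PR, here is a hedged sketch of passing min_p through Ollama's own /api/generate endpoint from Go. It assumes the default localhost port and a locally pulled model named "llama3"; the model name and the min_p value are just examples.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Sampling parameters go under "options", alongside temperature, top_p, etc.
	body, _ := json.Marshal(map[string]any{
		"model":  "llama3", // example model name; use whatever you have pulled
		"prompt": "Write one sentence about token sampling.",
		"stream": false,
		"options": map[string]any{
			"temperature": 1.0,
			"min_p":       0.05, // keep tokens with p >= 5% of the top token's probability
		},
	})

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```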
The heavy lifting was done by ggml-org/llama.cpp#3841; this PR just makes the option accessible.