feat: add support for min_p (resolve #1142) by Robitx · Pull Request #1825 · ollama/ollama · GitHub

feat: add support for min_p (resolve #1142) #1825


Merged
merged 1 commit into ollama:main on Jul 27, 2024

Conversation

@Robitx (Contributor) commented Jan 6, 2024

Heavy lifting was done by ggml-org/llama.cpp#3841; this PR just makes the option accessible.
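For anyone who wants to try it once merged: a minimal usage sketch, assuming a build that includes this change and an Ollama server on the default port; the model name and payload details are illustrative, not taken from this PR.

```go
// Minimal sketch: set min_p through the "options" map of Ollama's
// native /api/generate endpoint. Model name is just an example.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	payload, err := json.Marshal(map[string]any{
		"model":  "llama3",
		"prompt": "Why is the sky blue?",
		"stream": false,
		"options": map[string]any{
			"min_p": 0.05, // drop tokens with p < 0.05 * p(top token)
		},
	})
	if err != nil {
		panic(err)
	}
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```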

@Robitx (Contributor, Author) commented Jan 16, 2024

@pdevine Since you're around 🙂 may I ask you to look into this one?

It's not a priority, but some people would like to use it in ollama. For context: https://www.reddit.com/r/LocalLLaMA/comments/17vonjo/your_settings_are_probably_hurting_your_model_why/
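For anyone skimming, the idea behind min_p: keep only the tokens whose probability is at least min_p times the probability of the single most likely token, then sample from what's left. A toy sketch of that filter (illustrative only - Ollama's actual implementation lives in llama.cpp):

```go
// Toy illustration of min_p filtering, not llama.cpp's actual code:
// keep token i iff probs[i] >= minP * max(probs), then renormalize.
package main

import "fmt"

func minPFilter(probs []float64, minP float64) []float64 {
	maxP := 0.0
	for _, p := range probs {
		if p > maxP {
			maxP = p
		}
	}
	cutoff := minP * maxP

	filtered := make([]float64, len(probs))
	total := 0.0
	for i, p := range probs {
		if p >= cutoff {
			filtered[i] = p
			total += p
		}
	}
	for i := range filtered {
		filtered[i] /= total // dropped tokens stay at 0
	}
	return filtered
}

func main() {
	probs := []float64{0.50, 0.25, 0.15, 0.07, 0.03}
	// min_p = 0.2 gives cutoff 0.2 * 0.50 = 0.10, dropping the last two tokens.
	fmt.Println(minPFilter(probs, 0.2))
}
```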

@suhr commented Jan 16, 2024

I definitely would like to use it.

@nathanpbell commented

I patched this into my local build several days ago and it's been working great. I honestly believe min_p should be the go-to sampler for a lot of use cases.


@Robitx (Contributor, Author) commented Jan 20, 2024

@jmorganca

[screenshot]

@JoseConseco commented

@jmorganca any chance of merging this PR? It's only a few lines of code - nothing that would break ollama.

@mherrmann3 mentioned this pull request Feb 22, 2024
@isgallagher commented Feb 28, 2024

@mxyng Would you mind reviewing this one? The intent is to add a new parameter called min_p. For an explanation of what min_p does and why it matters, see this thread: ggml-org/llama.cpp#3841

@twalderman commented

Please get this done. It would be a great benefit to the community. Thank you in advance for the effort.

@jmorganca (Member) commented

Hi there, would it be possible to rebase this PR? Sorry for the delay all 🥺

@Robitx (Contributor, Author) commented May 9, 2024

@jmorganca Hey, I've rebased it onto the current main (llm/dyn_ext_server.go morphed into llm/server.go; types/model/file_test.go added).

Concerning the second commit: are you OK with exposing min_p as an optional flag in the OpenAI API? The official OpenAI API doesn't support it yet, but it doesn't break normal usage, and I think people who want to use min_p would appreciate it.

@morpheus2448 commented

+1 please expose min_p!

@FellowTraveler (Contributor) commented

Does this one need to be rebased again? I need this sampler, and I can do the rebase if needed.

@Arcitec commented Jun 1, 2024

@jmorganca Hi, just an alert to let you know that it has been rebased. :)

It definitely looks like a very smart sampling method, after I read the author's thread.

It gives token variety while maintaining coherence. I'd love to start using min_p! :)

@FellowTraveler (Contributor) commented

I had to switch to LM Studio for now (it supports minP), but I'm really looking forward to trying out minP in Open-WebUI whenever you get around to merging this PR.

@FellowTraveler (Contributor) left a review comment

I've reviewed the code and it looks good. I also built and tested it locally.

@twalderman commented

> I had to switch to LM Studio for now (it supports minP), but I'm really looking forward to trying out minP in Open-WebUI whenever you get around to merging this PR.

I did the same for my primary writing models (Miqu and WizardLM2), using ollama for the small agent models. It's a useful combination, though I would still prefer min_p exposed in ollama... as it should be.

@FellowTraveler (Contributor) commented Jun 25, 2024

Just FYI for people who use min_p: I've been keeping a branch current with ollama/main, with this min_p PR rebased on top of it. It's easy enough to build.

Here's the link: https://github.com/FellowTraveler/ollama/tree/min_p
Clone (https): git clone https://github.com/FellowTraveler/ollama.git
Clone (ssh): git clone git@github.com:FellowTraveler/ollama.git
To build on Mac: cd ollama && git checkout min_p && git submodule update --init --recursive && ./scripts/build_darwin.sh

After it's built, close Ollama from the systray (if you already had it running).
Then install it with: cd dist && unzip Ollama-darwin.zip && open Ollama.app
A dialog will pop up saying: "Ollama runs best from the Applications folder, you want me to move it?"
Click yes and you're off to the races.
PS: if you instead move it to the Applications folder by hand, it will NOT work. This is the only way it works for me.

@tannisroot commented Jun 28, 2024

@jmorganca as you requested, this was rebased a while ago; mind taking a look again?

@kubernetes-bad commented

This has been pending for the last 6 months. From the looks of it, it's pretty stable, since people have been building and using it.
@jmorganca what's the problem with merging it?

@Arcitec commented Jul 11, 2024

@jmorganca I've read through the PR twice. The code looks good to me. Let's merge it?

@kubernetes-bad commented

I'm going to keep this PR on your radar, @jmorganca.
This is important.

@kubernetes-bad commented

For anyone curious (including @jmorganca), here is the min_p paper: https://arxiv.org/abs/2407.01082

@LumiWasTaken commented Jul 14, 2024

@jmorganca min_p would make Ollama usable and an actual competitor to the others...

@tannisroot commented

I think pinging him every day about it is a little bit excessive.

@kubernetes-bad commented

I don’t think so.

What I think is a bit excessive is having a PR that has been kept alive, tested by users, and shipped with docs for 6 months.

I am integrating ollama into my project, where every single other backend supports min_p except Ollama - for no good reason.
For what it's worth, I also contacted them on Discord; the response was that it hadn't been added yet due to potential issues with staying compatible with the official OpenAI API, and I was told it's going to be looked at again soon.

Good news, I guess?

@Cohee1207 commented

I find it positive that this gained some traction recently. The absence of min_p sampling has been a huge downside of Ollama for local LM researchers.

> the response was that it hadn't been added yet due to potential issues with staying compatible with the official OpenAI API, and I was told it's going to be looked at again soon.

This doesn't look like a real concern to me. Does the official OpenAI API support mirostat, TFS, typical sampling, or even repetition penalty? Since Ollama prefers to hide as much sampling control from the end user as possible, in the best-case scenario 99% of users would just use the parameters prebaked in the modelfile. Power users who want ultimate control over the sampling process should not be limited by the arbitrary constraints of a third-party API.

@kubernetes-bad commented

Very valid points.
@jmorganca what do you think?

@kubernetes-bad commented

The best time to merge this PR was 7 months ago.
The second best time is now.
@jmorganca

@ghost commented Jul 20, 2024

Stop avoiding adding min_p. It's the greatest of all samplers; it needed to be added yesterday.

@kubernetes-bad commented

Hey @jmorganca just a reminder that you wanted to have a look at this PR soon.

@Robitx (Contributor, Author) commented Jul 27, 2024

This is just sad and ridiculous - half a year of begging to get a simple PR merged.

openai/openai.go (Outdated)

```diff
@@ -63,6 +63,7 @@ type ChatCompletionRequest struct {
 	FrequencyPenalty *float64 `json:"frequency_penalty"`
 	PresencePenalty  *float64 `json:"presence_penalty_penalty"`
 	TopP             *float64 `json:"top_p"`
+	MinP             *float64 `json:"min_p"`
```
A project Member left a review comment:

Let's omit min_p from the OpenAI API for now: there are a few parameters not in the OpenAI API that should be added at some point (num_ctx, etc.), but I don't want to break the API if we choose to do this differently later (e.g. nest these non-standard options under options).
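To make the nesting idea concrete: a purely hypothetical sketch of what "nest these non-standard options under options" could look like - the field and key names below are assumptions, not anything this PR or Ollama implements.

```go
// Hypothetical shape for nesting non-standard options; names are
// illustrative assumptions, not real Ollama code.
package openai

type ChatCompletionRequest struct {
	// ...standard OpenAI-compatible fields (model, messages, top_p, ...)...

	// Ollama-specific extensions grouped under a single non-standard key,
	// so the root of the payload stays identical to OpenAI's schema.
	Options *OllamaOptions `json:"options,omitempty"`
}

type OllamaOptions struct {
	MinP   *float64 `json:"min_p,omitempty"`
	NumCtx *int     `json:"num_ctx,omitempty"`
}
```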

@kubernetes-bad replied Jul 27, 2024

Can you share your concerns about the presence of extra parameters?
I have heard you say this several times, yet I can't see how this could break OpenAI-compatible clients: ones that adhere strictly to the OpenAI schema would simply never send it, and therefore won't break (see the sketch below). Am I missing something?
Instead of re-inventing the wheel with options, check out how other server implementations add parameters that are missing from the OpenAI API. From what I have seen, this is the way extra parameters are implemented (like in this PR) - just added to the root, as if OpenAI supported it:

  • vLLM
  • Aphrodite
  • Oobabooga
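A toy demonstration of the "strict clients won't break" point above, assuming the server decodes requests into optional pointer fields (as the diff in this review does): a client that never sends min_p leaves the field nil, so there is nothing to break.

```go
// Toy demonstration: an optional root-level field can't break strict
// OpenAI clients, because an unsent "min_p" simply decodes to nil.
package main

import (
	"encoding/json"
	"fmt"
)

type chatRequest struct {
	TopP *float64 `json:"top_p"`
	MinP *float64 `json:"min_p"` // extra, non-OpenAI field
}

func main() {
	var strict, extended chatRequest

	// A strict OpenAI client never sends min_p...
	_ = json.Unmarshal([]byte(`{"top_p":0.9}`), &strict)
	// ...while an extended client opts in explicitly.
	_ = json.Unmarshal([]byte(`{"top_p":0.9,"min_p":0.05}`), &extended)

	fmt.Println(strict.MinP == nil) // true: nothing sent, nothing to handle
	fmt.Println(*extended.MinP)     // 0.05
}
```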

@jmorganca (Member) commented

Sorry for the delay all, and thanks for the follow-ups. @Robitx this looks good; however, would it be possible to omit it from the OpenAI-compatible API? I'm not sure yet how we should "extend" the OpenAI-compatible API and would prefer to bring that in later when we do. Once that's done we can get this merged and into the next release of Ollama.

@LumiWasTaken commented Jul 27, 2024

> Sorry for the delay all, and thanks for the follow-ups. @Robitx this looks good; however, would it be possible to omit it from the OpenAI-compatible API? I'm not sure yet how we should "extend" the OpenAI-compatible API and would prefer to bring that in later when we do. Once that's done we can get this merged and into the next release of Ollama.

I'm not a developer, but as a frequent API user I don't fully understand the issue with extending it here. One approach could be to follow KoboldCPP's lead and not extend the OpenAI API. However, in my humble and most professional (/s) opinion, why not add it as an optional parameter? If it's set, great; if not, just ignore it.

@Arcitec commented Jul 27, 2024

@jmorganca I think I understand your original concern. It seems Ollama exposes both an OpenAI API emulation and the regular Ollama API?

And I see that this pull request previously modified openai/openai.go, adding the min_p parameter to the OpenAI API emulation - a parameter which of course doesn't exist on the real OpenAI API.

Well, the PR author has now force-pushed an update which no longer changes openai.go.

Does this mean that OpenWebUI will still be able to use min_p? It seems like that's handled by llm/server.go?

If so, it seems ready to merge. :)

PS: Really looking forward to using min_p to get more varied output. :)

@Robitx (Contributor, Author) commented Jul 27, 2024

@jmorganca done

@Arcitec if OpenAI ever decides to add min_p to their API, they are bound by their established conventions:

  • snake_case (top_p => min_p)
  • optional flags whose defaults don't break current usage (min_p = 0)
  • exposure at the same level as other options in the same category (the root of the payload)

As @kubernetes-bad pointed out with examples, everyone follows this convention for similar optional parameters, and I don't see OpenAI needlessly breaking their API for existing customers any time soon.

But the relevant commit keeping the convention is gone; whoever is willing can fight this battle in another PR after this one gets merged.

@Robitx requested a review from jmorganca July 27, 2024 19:57
@LumiWasTaken commented

Well then, guys! See you in 7 months when it gets merged.

@jmorganca merged commit f3d7a48 into ollama:main Jul 27, 2024
12 checks passed
@Arcitec commented Jul 27, 2024

Well, I found the answer to my question in the first line of openai.go:

// openai package provides middleware for partial compatibility with the OpenAI REST API

So yes, Ollama has emulation of the OpenAI API to allow OpenAI GUI frontends to connect to Ollama instead.

Therefore it didn't make sense to add min_p to the OpenAI emulation, since it doesn't exist on the real OpenAI API.

And with that removal/fix, thank you @jmorganca for the merge! I look forward to using min_p! :)

Edit: If anyone is confused by what I'm saying, this doesn't affect Ollama's own API, which now supports min_p. :) It was just removed from the OpenAI emulation, where it didn't make sense to add it anyway.
