feat: add support for min_p (resolve #1142) #1825
Conversation
@pdevine Since you're around 🙂 may I ask you to look into this one? It's not a priority, but some people would like to use it in ollama. For context: https://www.reddit.com/r/LocalLLaMA/comments/17vonjo/your_settings_are_probably_hurting_your_model_why/ |
I definitely would like to use it. |
I patched this into my local build several days ago and it's been working great. I honestly believe min_p should be the go-to sampler for a lot of use cases. |
@jmorganca any chance of merging this PR? It's only a few lines of code. Nothing that would break ollama. |
@mxyng Would you mind reviewing this one? The intent is to add a new parameter called min_p. Please see this thread for an explanation of what min_p does and why it matters: ggml-org/llama.cpp#3841 |
Please get this done. It would be a great benefit to the community. Thank you for the effort in advance. |
Hi there, would it be possible to rebase this PR? Sorry for the delay all 🥺 |
@jmorganca Hey, I've rebased it onto current main. Concerning the second commit, are you ok with exposing min_p in the OpenAI-compatible API as well? |
+1 please expose min_p! |
Does this one need to be rebased again? Need this sampler. I can do a rebase if needed. |
@jmorganca Hi, just an alert to let you know that it has been rebased. :) It definitely looks like a very smart sampling method, after I read the author's thread. It gives token variety while maintaining coherence. I'd love to start using it. |
I had to switch to LM Studio for now (it supports minP), but I'm really looking forward to trying out minP in Open-WebUI whenever you get around to merging this PR. |
I've reviewed the code and it looks good. I also built and tested it locally.
I did the same for my primary writing models (Miqu and WizardLM2), and I use ollama for small agent models. It's a useful combination, though I would still prefer min_p exposed in ollama... as it should be. |
Just FYI for people who use min_p: I've been keeping a branch current with ollama/main, with this PR applied. Here's the link: https://github.com/FellowTraveler/ollama/tree/min_p After it's built, I close Ollama from the systray (since I already had it running). |
@jmorganca as you requested, this was rebased a while ago now; mind taking a look again? |
This has been pending for the last 6 months. From the looks of it, it's pretty stable, since people have been building and using it. |
@jmorganca I've read through the PR twice. The code looks good to me. Let's merge it? |
I'm going to keep bringing this PR to your attention, @jmorganca |
For anyone curious (including @jmorganca), here is the min_p paper: https://arxiv.org/abs/2407.01082 |
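For readers who would rather see the rule than read the paper, here is a minimal sketch of min_p filtering in Go. It is purely illustrative, assuming a plain probability table; the real llama.cpp/Ollama sampler works on logits inside the sampling chain, and the token probabilities below are made up.

```go
package main

import (
	"fmt"
	"math"
)

// minPFilter keeps only the tokens whose probability is at least
// minP times the probability of the single most likely token, then
// re-normalizes the survivors. This is the core idea of min_p sampling.
func minPFilter(probs map[string]float64, minP float64) map[string]float64 {
	maxProb := 0.0
	for _, p := range probs {
		maxProb = math.Max(maxProb, p)
	}

	kept := make(map[string]float64)
	sum := 0.0
	for tok, p := range probs {
		if p >= minP*maxProb {
			kept[tok] = p
			sum += p
		}
	}
	for tok := range kept {
		kept[tok] /= sum // re-normalize so the survivors form a distribution again
	}
	return kept
}

func main() {
	probs := map[string]float64{"the": 0.55, "a": 0.30, "zebra": 0.02, "qux": 0.001}
	// With min_p = 0.05 the cutoff is 0.05 * 0.55 = 0.0275, so only "the" and "a" survive.
	fmt.Println(minPFilter(probs, 0.05))
}
```

Unlike a fixed top_p cutoff, the threshold scales with the model's confidence: when the top token is very likely, only close competitors survive; when the distribution is flat, many candidates remain.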
@jmorganca Min_p would make Ollama usable and an actual competitor to others... |
I think pinging him every day about it is a little bit excessive |
I don't think so. I think having a PR that has been kept alive, tested by users, and includes docs sit unmerged for 6 months is a bit excessive. I am integrating ollama into my project, where every single other backend supports min_p. Good news, I guess? |
I find it positive that this gained some traction recently. The absence of min_p sampling has been a huge downside of Ollama for local LM researchers.
This doesn't appear like a concern to me. Does the official OpenAI API support mirostat, TFS, typical sampling, or even repetition penalty? Since Ollama prefers to hide as much control over sampling from the end user as possible, in the best-case scenario 99% of users would just use parameters that are prebaked in the modelfile. Power users who want ultimate control over the sampling process should not be limited by the arbitrary constraints of a third-party API. |
Very valid points. |
Best time to merge this PR was 7 months ago. |
Stop avoiding adding min_p; it's the greatest of all samplers. It needed to be added yesterday. |
Hey @jmorganca just a reminder that you wanted to have a look at this PR |
This is just sad and ridiculous: half a year of begging to merge a simple PR. |
openai/openai.go (Outdated)
@@ -63,6 +63,7 @@ type ChatCompletionRequest struct {
 	FrequencyPenalty *float64 `json:"frequency_penalty"`
 	PresencePenalty  *float64 `json:"presence_penalty_penalty"`
 	TopP             *float64 `json:"top_p"`
+	MinP             *float64 `json:"min_p"`
Let's omit min_p from the OpenAI API for now: there are a few parameters that are not in the OpenAI API that should be added at some point (num_ctx, etc.), but I don't want to break the API if we choose to do this differently later (e.g. nest these non-standard options under options).
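To make the nesting idea above concrete, here is a hypothetical sketch of what grouping non-standard fields under an options object could look like. The struct and field names are illustrative assumptions, not an agreed-upon Ollama design.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical sketch: non-standard sampling fields nested under an
// "options" object instead of sitting at the OpenAI request root.
type ExtraOptions struct {
	MinP   *float64 `json:"min_p,omitempty"`
	NumCtx *int     `json:"num_ctx,omitempty"`
}

type ChatCompletionRequest struct {
	TopP    *float64      `json:"top_p,omitempty"`
	Options *ExtraOptions `json:"options,omitempty"` // non-standard extensions live here
}

func main() {
	minP := 0.05
	req := ChatCompletionRequest{Options: &ExtraOptions{MinP: &minP}}
	out, _ := json.Marshal(req)
	fmt.Println(string(out)) // {"options":{"min_p":0.05}}
}
```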
Can you share your concerns about the presence of extra parameters?
I have heard you say this several times, yet I can't see how this can break OpenAI-compatible clients: ones that adhere strictly to the OpenAI schema would simply never send it, and therefore won't break. Am I missing something?
Instead of re-inventing the wheel with options, check out how other server implementations add parameters that are missing from the OpenAI API. From what I have seen, this is how extra parameters are implemented (like in this PR): just added to the root, as if OpenAI supported them.
VLLM
Aphrodite
Oobabooga
Sorry all for the delay, and thanks for the follow-ups. @Robitx looks good; however, would it be possible to omit it from the OpenAI-compatible API? I'm not sure yet how we should "extend" the OpenAI-compatible API and would prefer to bring that in later when we do. Once that's done we can get this merged and into the next release of Ollama. |
I'm not a developer, but as a frequent API user, I don't fully understand the issue with extending it here. One approach could be to follow KoboldCPP's lead and not extend the OpenAI API. However, in my humble and most professional (/s) opinion, why not add it as an optional parameter? If it's set, great; if not, just ignore it. |
@jmorganca I think I understand your original concern. It seems like Ollama supports emulating both an OpenAI API and the regular Ollama API? And I see that this pull request was previously modifying the OpenAI-compatible API as well. Well, the PR author has now force-pushed an update which no longer changes it. Does this mean that OpenWebUI will still be able to use min_p through the regular Ollama API? If so, it seems ready to merge. :) PS: Really looking forward to using min_p! |
@jmorganca done. @Arcitec if OpenAI ever decides to add min_p to their API, they are bound by the established conventions; as @kubernetes-bad pointed out with examples, everyone is following this convention for similar optional parameters, and I don't see OpenAI needlessly breaking the API for existing customers any time soon. But the relevant commit keeping that convention is gone, and whoever is willing can try to fight this battle in another PR after this one gets merged. |
Well then guys! See you in 7 months when it gets merged |
Well, I found the answer to my question in the first line of openai/openai.go (line 1 in f3d7a48). So yes, Ollama has emulation of the OpenAI API to allow OpenAI GUI frontends to connect to Ollama instead. Therefore it didn't make sense to add min_p to that emulation layer. And with that removal/fix, thank you @jmorganca for the merge! I look forward to using min_p. Edit: If anyone is confused by what I'm saying, this doesn't affect Ollama's own API, which now supports min_p. |
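For anyone who wants to try it once they are running a build that includes this PR, here is a hedged sketch of passing min_p through Ollama's own /api/generate endpoint from Go. It assumes the default localhost port and a locally pulled model named "llama3"; the model name and the min_p value are just examples.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Sampling parameters go under "options", alongside temperature, top_p, etc.
	body, _ := json.Marshal(map[string]any{
		"model":  "llama3", // example model name; use whatever you have pulled
		"prompt": "Write one sentence about token sampling.",
		"stream": false,
		"options": map[string]any{
			"temperature": 1.0,
			"min_p":       0.05, // keep tokens with p >= 5% of the top token's probability
		},
	})

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```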
The heavy lifting was done by ggml-org/llama.cpp#3841; this PR just makes the option accessible.