Add proper implementation of ollama's /api/chat by R-Dson · Pull Request #13777 · ggml-org/llama.cpp


Closed
R-Dson wants to merge 1 commit

Conversation

@R-Dson (Contributor) commented May 25, 2025

Apparently, the /api/chat endpoint that Ollama provides is its equivalent of the expected /chat/completions. These changes make the response of /api/chat behave the same way.

Here is an example of the response from Ollama on /api/chat

curl http://localhost:11434/api/chat -s -d '{
   "model": "gemma3:1b",
   "messages": [
     {
       "role": "user",
       "content": "Hello! only echo the same as response."
     }
   ]
 }' 
{"model":"gemma3:1b","created_at":"2025-05-25T13:59:34.988660245Z","message":{"role":"assistant","content":"Hello"},"done":false}
{"model":"gemma3:1b","created_at":"2025-05-25T13:59:35.06925362Z","message":{"role":"assistant","content":"!"},"done":false}
{"model":"gemma3:1b","created_at":"2025-05-25T13:59:35.145075198Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":2201092987,"load_duration":955128944,"prompt_eval_count":18,"prompt_eval_duration":1088166698,"eval_count":3,"eval_duration":157155494}

And with streaming off

curl http://localhost:11434/api/chat -s -d '{
   "model": "gemma3:1b",
   "messages": [
     {
       "role": "user",
       "content": "Hello! only echo the same as response."
     }
   ], 
   "stream": false
 }'
{"model":"gemma3:1b","created_at":"2025-05-25T14:01:11.931155269Z","message":{"role":"assistant","content":"Hello!"},"done_reason":"stop","done":true,"total_duration":257075482,"load_duration":32849085,"prompt_eval_count":18,"prompt_eval_duration":76913200,"eval_count":3,"eval_duration":146920846}

With the updated code, the responses look like this:

curl http://localhost:8099/api/chat -s -d '{
   "model": "llama-server",
   "messages": [
     {
       "role": "user",
       "content": "Hello! only echo the same as response."
     }
   ]
 }' 
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":"Hello"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":"!"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":" only"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":" echo"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":" the"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":" same"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":" as"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":" response"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":"."},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":""},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":1351000000,"load_duration":766000000,"prompt_eval_count":24,"prompt_eval_duration":766000000,"eval_count":10,"eval_duration":585000000,"prompt_tokens":24,"completion_tokens":10,"total_tokens":34,"id_slot":0,"id":"chatcmpl-LvYe3EkCxX6Kd17WkvkQFX5Vxk9Wa16f","system_fingerprint":"b5485-de2ef53a","object":"chat.completion"}

Stream off:

curl http://localhost:8099/api/chat -s -d '{
   "model": "llama-server",
   "messages": [
     {
       "role": "user",
       "content": "Hello! only echo the same as response."
     }
   ], "stream": false
 }'
{"model":"gemma3:1b","created_at":1748181832,"message":{"role":"assistant","content":"Hello!"},"done_reason":"stop","done":true,"total_duration":149000000,"load_duration":51000000,"prompt_eval_count":24,"prompt_eval_duration":51000000,"eval_count":3,"eval_duration":98000000,"prompt_tokens":24,"completion_tokens":3,"total_tokens":27,"id_slot":0,"id":"chatcmpl-9PIo6ot5rkDHmef023S8jOAonpzV30jH","system_fingerprint":"b5485-de2ef53a","object":"chat.completion"}

Not exactly the same, as the created_at date format is different, but it should be close enough. Since it is a string, we could also convert it to Ollama's format or simply use a dummy value like "".
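
If matching ollama's string format turns out to matter, the conversion could look roughly like this (untested sketch: the helper name is made up, it assumes POSIX gmtime_r, and it drops the sub-second part):

#include <ctime>
#include <string>

// Format a Unix timestamp as the RFC 3339 UTC string that ollama uses for
// "created_at", e.g. "2025-05-25T13:59:34Z" (sub-second precision omitted).
static std::string rfc3339_utc(time_t t) {
    std::tm tm_utc{};
    gmtime_r(&t, &tm_utc);  // POSIX; Windows would use gmtime_s instead
    char buf[32];
    std::strftime(buf, sizeof(buf), "%Y-%m-%dT%H:%M:%SZ", &tm_utc);
    return std::string(buf);
}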

This change can help in cases where the OpenAI-compatible API is not supported by a client, but the Ollama API is.

@ngxson (Collaborator) commented May 26, 2025

Could you explain why this API is needed? AFAIK this API is specific to ollama and OAI doesn't have it (they moved to the new Response API instead)

@@ -77,6 +77,7 @@ enum oaicompat_type {
     OAICOMPAT_TYPE_CHAT,
     OAICOMPAT_TYPE_COMPLETION,
     OAICOMPAT_TYPE_EMBEDDING,
+    OAICOMPAT_TYPE_API_CHAT
@ngxson (Collaborator) commented May 26, 2025

If OAI does not support this API, then having the OAICOMPAT prefix here will be very confusing for other contributors who don't know much about the story of ollama.

Tbh, I think this is not very necessary, as most applications nowadays support the OAI-compat API. If they don't, you can use a proxy to convert the API; I bet someone has already made one.
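
Something minimal along these lines would probably be enough for the non-streaming case (a rough, untested sketch, not an existing project; it assumes cpp-httplib and nlohmann/json, hard-codes llama-server on localhost:8080, and all names here are made up):

#include <httplib.h>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

int main() {
    httplib::Server proxy;

    // Accept ollama-style /api/chat requests and forward them to
    // llama-server's OAI-compatible /v1/chat/completions endpoint.
    proxy.Post("/api/chat", [](const httplib::Request & req, httplib::Response & res) {
        json in = json::parse(req.body);
        json oai_req = {
            {"model",    in.value("model", "llama-server")},
            {"messages", in["messages"]},
            {"stream",   false},
        };

        httplib::Client upstream("localhost", 8080); // where llama-server runs
        auto oai_res = upstream.Post("/v1/chat/completions", oai_req.dump(), "application/json");
        if (!oai_res || oai_res->status != 200) {
            res.status = 502;
            return;
        }

        // Reshape the OAI response into ollama's /api/chat shape.
        json oai = json::parse(oai_res->body);
        json out = {
            {"model",       in.value("model", "llama-server")},
            {"created_at",  oai.value("created", 0)},
            {"message",     oai["choices"][0]["message"]},
            {"done_reason", oai["choices"][0].value("finish_reason", "stop")},
            {"done",        true},
        };
        res.set_content(out.dump(), "application/json");
    });

    proxy.listen("0.0.0.0", 11434); // the port ollama listens on by default
}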

Also, since OAI introduced the new Response API, I think we should keep things simple by only supporting the OAI specs (which have good support for reasoning and multimodal models). The API for ollama can be added if more users ask for it.

@R-Dson (Contributor, Author) commented May 26, 2025

You are correct about that name; it should be changed.

If the endpoint is not part of the OAI API, should it instead be removed completely?

This was mostly added for the cases where someone wants to swap out ollama for llama-server.

@ggerganov (Member) commented May 26, 2025

Could you explain why this API is needed? AFAIK this API is specific to ollama and OAI doesn't have it (they moved to the new Response API instead)

Tbh, I think this is not very necessary, as most applications nowadays support the OAI-compat API.

I second that. The main question that should be asked is whether these ollama APIs enable any sort of new useful functionality compared to the existing standard APIs. If the answer is no, then these APIs should not exist in the first place and we should not support them.

As an example, we introduced the /infill API in llama-server because the existing /v1/completions spec was not enough to support the needs of advanced local fill-in-the-middle use cases (#9787).

Currently, there is rudimentary support for /api/show, /api/tags and /api/chat, mainly because VS Code made the mistake of requiring them. As soon as this is fixed (microsoft/vscode#249605), these endpoints should be removed from llama-server.

@R-Dson (Contributor, Author)

Thank you, that makes it clear why they currently exist.

@R-Dson (Contributor, Author) commented May 26, 2025

Could you explain why this API is needed? AFAIK this API is specific to ollama and OAI doesn't have it (they moved to the new Response API instead)

Since /api/chat is already there, it is most likely expected to work the way Ollama's does; in cases where a tool or piece of software has implemented Ollama support only (and not the OpenAI API), this endpoint may be used.

@R-Dson (Contributor, Author) commented May 26, 2025

I'll close this for now and re-open it if the demand comes up in the future.

@R-Dson closed this May 26, 2025