Add proper implementation of ollama's /api/chat by R-Dson · Pull Request #13777 · ggml-org/llama.cpp


Closed
R-Dson wants to merge 1 commit

Conversation

@R-Dson (Contributor) commented May 25, 2025

Apparently, the /api/chat endpoint that Ollama provides is its equivalent of the expected /chat/completions. These changes make the response of /api/chat behave the same way.

Here is an example of the response from Ollama on /api/chat

curl http://localhost:11434/api/chat -s -d '{
   "model": "gemma3:1b",
   "messages": [
     {
       "role": "user",
       "content": "Hello! only echo the same as response."
     }
   ]
 }' 
{"model":"gemma3:1b","created_at":"2025-05-25T13:59:34.988660245Z","message":{"role":"assistant","content":"Hello"},"done":false}
{"model":"gemma3:1b","created_at":"2025-05-25T13:59:35.06925362Z","message":{"role":"assistant","content":"!"},"done":false}
{"model":"gemma3:1b","created_at":"2025-05-25T13:59:35.145075198Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":2201092987,"load_duration":955128944,"prompt_eval_count":18,"prompt_eval_duration":1088166698,"eval_count":3,"eval_duration":157155494}

And with streaming off

curl http://localhost:11434/api/chat -s -d '{
   "model": "gemma3:1b",
   "messages": [
     {
       "role": "user",
       "content": "Hello! only echo the same as response."
     }
   ], 
   "stream": false
 }'
{"model":"gemma3:1b","created_at":"2025-05-25T14:01:11.931155269Z","message":{"role":"assistant","content":"Hello!"},"done_reason":"stop","done":true,"total_duration":257075482,"load_duration":32849085,"prompt_eval_count":18,"prompt_eval_duration":76913200,"eval_count":3,"eval_duration":146920846}

With the updated code, the responses look like this:

curl http://localhost:8099/api/chat -s -d '{
   "model": "llama-server",
   "messages": [
     {
       "role": "user",
       "content": "Hello! only echo the same as response."
     }
   ]
 }' 
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":"Hello"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":"!"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":" only"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":" echo"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":" the"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":" same"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":" as"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":" response"},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":"."},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":""},"done":false}
{"model":"gemma3:1b","created_at":1748181808,"message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":1351000000,"load_duration":766000000,"prompt_eval_count":24,"prompt_eval_duration":766000000,"eval_count":10,"eval_duration":585000000,"prompt_tokens":24,"completion_tokens":10,"total_tokens":34,"id_slot":0,"id":"chatcmpl-LvYe3EkCxX6Kd17WkvkQFX5Vxk9Wa16f","system_fingerprint":"b5485-de2ef53a","object":"chat.completion"}

Stream off:

curl http://localhost:8099/api/chat -s -d '{
   "model": "llama-server",
   "messages": [
     {
       "role": "user",
       "content": "Hello! only echo the same as response."
     }
   ], "stream": false
 }'
{"model":"gemma3:1b","created_at":1748181832,"message":{"role":"assistant","content":"Hello!"},"done_reason":"stop","done":true,"total_duration":149000000,"load_duration":51000000,"prompt_eval_count":24,"prompt_eval_duration":51000000,"eval_count":3,"eval_duration":98000000,"prompt_tokens":24,"completion_tokens":3,"total_tokens":27,"id_slot":0,"id":"chatcmpl-9PIo6ot5rkDHmef023S8jOAonpzV30jH","system_fingerprint":"b5485-de2ef53a","object":"chat.completion"}

Not exactly the same, as the created_at date format is different, but it should be close enough. Since it is a string, we could also convert it to Ollama's format or simply use a dummy value like "".
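
If matching ollama's string format turns out to matter, the conversion could look roughly like this (untested sketch: the helper name is made up, it assumes POSIX gmtime_r, and it drops the sub-second part):

#include <ctime>
#include <string>

// Format a Unix timestamp as the RFC 3339 UTC string that ollama uses for
// "created_at", e.g. "2025-05-25T13:59:34Z" (sub-second precision omitted).
static std::string rfc3339_utc(time_t t) {
    std::tm tm_utc{};
    gmtime_r(&t, &tm_utc);  // POSIX; Windows would use gmtime_s instead
    char buf[32];
    std::strftime(buf, sizeof(buf), "%Y-%m-%dT%H:%M:%SZ", &tm_utc);
    return std::string(buf);
}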

This change can help in cases where the OpenAI-compatible API is not supported by a client, but the Ollama API is.

@ngxson (Collaborator) commented May 26, 2025

Could you explain why this API is needed? AFAIK this API is specific to ollama and OAI doesn't have it (they moved to the new Response API instead)

@@ -77,6 +77,7 @@ enum oaicompat_type {
     OAICOMPAT_TYPE_CHAT,
     OAICOMPAT_TYPE_COMPLETION,
     OAICOMPAT_TYPE_EMBEDDING,
+    OAICOMPAT_TYPE_API_CHAT
@ngxson (Collaborator) commented May 26, 2025

If OAI does not support this API, then having the OAICOMPAT prefix here will be very confusing for other contributors who don't know much about the story of ollama.

Tbh, I think this is not very necessary, as most applications nowadays support the OAI-compat API. If they don't, you can use a proxy to convert the API; I bet someone has already made one.
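
Something minimal along these lines would probably be enough for the non-streaming case (a rough, untested sketch, not an existing project; it assumes cpp-httplib and nlohmann/json, hard-codes llama-server on localhost:8080, and all names here are made up):

#include <httplib.h>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

int main() {
    httplib::Server proxy;

    // Accept ollama-style /api/chat requests and forward them to
    // llama-server's OAI-compatible /v1/chat/completions endpoint.
    proxy.Post("/api/chat", [](const httplib::Request & req, httplib::Response & res) {
        json in = json::parse(req.body);
        json oai_req = {
            {"model",    in.value("model", "llama-server")},
            {"messages", in["messages"]},
            {"stream",   false},
        };

        httplib::Client upstream("localhost", 8080); // where llama-server runs
        auto oai_res = upstream.Post("/v1/chat/completions", oai_req.dump(), "application/json");
        if (!oai_res || oai_res->status != 200) {
            res.status = 502;
            return;
        }

        // Reshape the OAI response into ollama's /api/chat shape.
        json oai = json::parse(oai_res->body);
        json out = {
            {"model",       in.value("model", "llama-server")},
            {"created_at",  oai.value("created", 0)},
            {"message",     oai["choices"][0]["message"]},
            {"done_reason", oai["choices"][0].value("finish_reason", "stop")},
            {"done",        true},
        };
        res.set_content(out.dump(), "application/json");
    });

    proxy.listen("0.0.0.0", 11434); // the port ollama listens on by default
}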

Also, since OAI introduced the new Response API, I think we should keep things simple by only supporting the OAI specs (which have good support for reasoning and multimodal models). The API for ollama can be added if more users ask for it.

@R-Dson (Contributor, Author) commented May 26, 2025

You are correct about that name; it should be changed.

If the endpoint is not part of the OAI API, should it instead be removed completely?

This was mostly added for the cases where someone wants to swap out ollama for llama-server.

@ggerganov (Member) commented May 26, 2025

Could you explain why this API is needed? AFAIK this API is specific to ollama and OAI doesn't have it (they moved to the new Response API instead)

Tbh, I think this is not very necessary, as most applications nowadays support the OAI-compat API.

I second that. The main question that should be asked is whether these ollama APIs enable any sort of new useful functionality compared to the existing standard APIs. If the answer is no, then these APIs should not exist in the first place and we should not support them.

As an example, we introduced the /infill API in llama-server because the existing /v1/completions spec was not enough to support the needs of advanced local fill-in-the-middle use cases (#9787).

Currently, there is rudimentary support for /api/show, /api/tags and /api/chat, mainly because VS Code made the mistake of requiring them. As soon as this is fixed (microsoft/vscode#249605), these endpoints should be removed from llama-server.

@R-Dson (Contributor, Author)

Thank you, that makes it clear why they currently exist.

@R-Dson (Contributor, Author) commented May 26, 2025

Could you explain why this API is needed? AFAIK this API is specific to ollama and OAI doesn't have it (they moved to the new Response API instead)

Since /api/chat is already there, it is most likely expected to work the way Ollama's does; in cases where a tool or piece of software has implemented Ollama support only (and not the OpenAI API), this endpoint may be used.

@R-Dson (Contributor, Author) commented May 26, 2025

I'll close this for now and re-open it if the demand comes up in the future.

@R-Dson closed this May 26, 2025