API to get performance status/information about GPU/CPU of instance #9658

Open
trollkarlen opened this issue Mar 11, 2025 · 2 comments
Labels
feature request (New feature or request)

Comments

@trollkarlen

This is a feature request, but it may help mitigate instances that break due to current and future bugs.

I use a proxy in front of my ollama instances to support multiple users/requests.
But sometimes the ollama server loses the connection with the GPU, and then performance drops a lot.

This sometimes happens due to the cgroup issue that can be mitigated in docker's daemon.json:

"exec-opts": [
        "native.cgroupdriver=cgroupfs"
    ]
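
For completeness, a daemon.json containing only this option would look like the following (assuming no other daemon settings are in use; the file typically lives at /etc/docker/daemon.json, and the Docker daemon needs a restart, e.g. sudo systemctl restart docker, for the change to take effect):

{
    "exec-opts": [
        "native.cgroupdriver=cgroupfs"
    ]
}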

And sometimes due to other reasons (GPU hangs, etc.).

So it would be nice to be able to query the instance through the API to get the CPU/GPU performance of the node. This way the proxy can measure the performance of the instances and detect when performance declines. This data could be used by a proxy and/or a client.
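
As a rough sketch of what I have in mind (the endpoint and field names below are purely hypothetical, nothing like this exists in the current API), a response along these lines would already let a proxy spot a node that has fallen back to CPU:

GET /api/status   (hypothetical)

{
    "gpu_count": 2,
    "gpus_available": 2,
    "gpu_vram_total_mb": 49152,
    "gpu_vram_free_mb": 20480,
    "cpu_load": 0.35
}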

This feature request (#2004) is on the same theme and maybe could be combined with this one, to have one place to get information about the node.

trollkarlen added the feature request label Mar 11, 2025
@rick-github
Collaborator

#3144

WRT the GPU hanging, it might be that the model has lost coherence and is "rambling". This can be limited by setting num_predict.
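
For example, num_predict can be set per request via the options field of /api/generate (model name and value here are just illustrative), or persistently in a Modelfile with PARAMETER num_predict:

POST /api/generate

{
    "model": "llama3",
    "prompt": "Why is the sky blue?",
    "options": {
        "num_predict": 256
    }
}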

@trollkarlen
Author

#3144

Looks like the number of GPUs and GPUs available is missing from the metrics, but I guess it can be added.
Also it looks like that feature request is stalled :/

WRT the GPU hanging, it might be that the model has lost coherence and is "rambling". This can be limited by setting num_predict.

Thanks, will look into num_predict and see if it helps mitigate the issue with offline GPUs.
