[WIP] api: add a GPU resource type #37434
Conversation
api/types/gpu/gpu.go
Yes, it is pretty confusing ;)
There's already some work (in swarmkit) around "GenericResources": https://godoc.org/github.com/docker/docker/api/types/swarm#GenericResource. It's a bit more abstract because it's designed to be pretty loose: resources are defined by admins for a given node, and the scheduler matches the resources defined on the node with resources defined in the service... so maybe not necessarily the best fit.
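For readers following along, here is a rough sketch of how that existing swarm type is shaped, based on the godoc linked above (the `Kind`/`Value` contents are illustrative, not from this PR):

```go
package main

import "github.com/docker/docker/api/types/swarm"

func main() {
	// A named resource identifies a specific unit, e.g. a GPU by UUID.
	byID := swarm.GenericResource{
		NamedResourceSpec: &swarm.NamedGenericResource{Kind: "gpu", Value: "UUID-1"},
	}
	// A discrete resource just asks for a count of some kind.
	byCount := swarm.GenericResource{
		DiscreteResourceSpec: &swarm.DiscreteGenericResource{Kind: "gpu", Value: 2},
	}
	_, _ = byID, byCount
}
```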
Yes, the `GenericResource` approach has some rough edges. But those are not major problems; we could work with the existing `GenericResource` type.
So the fundamental question here is: do we reuse the existing `GenericResource` type, or do we add a dedicated GPU type?
I'm fine either way.
I think a GPU type works fine and is much simpler to deal with.
Force-pushed from f45a887 to ad193a6.
Updated my PR, we now have the following: …

I don't think we need more than that today. If we agree on the API, I can start to flesh out the next step: integration in `dockerd`.
I'm not really sure what makes sense from the GPU side of things, but I'd like to make sure the API does not limit how we can use this later on (or to what the proposed docker CLI solution can handle).
Would it make sense to have the API be `GPUs []gpu.Resource`, where a `gpu.Resource` is:
```go
package gpu

// Resource describes a single GPU request.
type Resource struct {
	Type    string            // e.g. "nvidia"
	ID      string            // GPU UUID... whatever the vendor defines
	Options map[string]string // vendor-specific options
}
```
Also, this may not require its own package since there's not really much to it.
Anyway, as I said, I'm not sure what makes sense from the vendor perspective here, but this seems to make the API a bit more extensible, and the complexity of it can be handled by the client.
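To make that suggestion concrete, here is a minimal usage sketch under those assumptions (the `gpu` package is the one proposed above; the "capabilities" option key and the UUID values are made up for illustration):

```go
// Hypothetical request: two NVIDIA GPUs selected by vendor-defined IDs,
// each carrying a made-up vendor option.
func exampleGPURequest() []gpu.Resource {
	return []gpu.Resource{
		{Type: "nvidia", ID: "GPU-uuid-1", Options: map[string]string{"capabilities": "compute"}},
		{Type: "nvidia", ID: "GPU-uuid-2", Options: map[string]string{"capabilities": "compute"}},
	}
}
```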
api/types/container/host_config.go
This limits the API quite a bit.
What do you think about using a slice here, and then we can validate as needed in code?
I have a concern with this approach: some options only make sense if they are applied to all GPUs from the same vendor. For instance, for us, the following would be nonsensical: …

My API above assumes that other vendors will be in the same situation, so I wanted to be able to enforce that at the API level. With this knowledge, if you agree to put this burden on the vendor implementation, then your approach is fine.
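To illustrate that concern under the slice-based sketch above (the `driver-mode` key is invented for the example), per-GPU options make it possible to express a request the vendor cannot honor:

```go
// Nonsensical request: the same vendor-wide option carries two different
// values on two GPUs selected from the same vendor.
var conflicting = []gpu.Resource{
	{Type: "nvidia", ID: "GPU-0", Options: map[string]string{"driver-mode": "exclusive"}},
	{Type: "nvidia", ID: "GPU-1", Options: map[string]string{"driver-mode": "shared"}},
}
```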
@flx42 Thanks for the feedback! So when you say "all" of the GPUs, does that mean literally all GPUs on the system, or all the selected GPUs? Do we know anyone from AMD that would be able to provide input here as well?
All the selected GPUs of the same vendor type. Maybe @kad can chime in on the Intel side for OpenCL; maybe they could have a similar option for selecting/modifying the OpenCL ICD.
@flx42 It seems like your structs work well (but I would still prefer a slice over a map in hostconfig, and leave it up to the implementation to validate). Perhaps we can change …
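If that burden moves to the implementation as suggested, a minimal validation sketch could look like this (assuming the slice-based `gpu.Resource` above; `validateGPUs` is a hypothetical helper, not part of the PR):

```go
import "fmt"

// validateGPUs rejects conflicting values for the same option key within
// one vendor type, enforcing "vendor-wide" options in code rather than in
// the shape of the API.
func validateGPUs(gpus []gpu.Resource) error {
	vendorOpts := map[string]map[string]string{} // vendor type -> merged options
	for _, g := range gpus {
		opts, ok := vendorOpts[g.Type]
		if !ok {
			opts = map[string]string{}
			vendorOpts[g.Type] = opts
		}
		for k, v := range g.Options {
			if prev, seen := opts[k]; seen && prev != v {
				return fmt.Errorf("conflicting option %q for vendor %q: %q vs %q", k, g.Type, prev, v)
			}
			opts[k] = v
		}
	}
	return nil
}
```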
Just to confirm, to avoid re-triggering the CI with a push, are you OK with the following? …
Is `Resources` the right place for this?
Well, see the existing moby/api/types/container/host_config.go, lines 398 to 399 in 25ec60d.
I think it makes sense to consider a GPU as a resource.
The …
Ah I see, yeah that's perfect.
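For context, a hedged sketch of where the field could sit in the container `Resources` struct (the real struct in api/types/container/host_config.go has many more fields; the `GPUs` field is this PR's proposed addition and its exact shape may change):

```go
// Abridged sketch of the container Resources struct; only the proposed
// addition is shown in full.
type Resources struct {
	// ... existing cgroup fields (CPUShares, Memory, Devices, ...) elided ...

	// GPUs describes the GPUs requested for the container, per this
	// PR's draft types.
	GPUs gpu.Resources
}
```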
Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>

Force-pushed from ad193a6 to b673450.
Made the changes, also added …. Does anyone disagree with the current approach?

Assuming we are OK with this design, I will go ahead with the corresponding implementation in the CLI (`--gpus`) and in dockerd (for calling into the containerd features).
Codecov Report
```
@@            Coverage Diff             @@
##           master   #37434      +/-   ##
==========================================
+ Coverage   35.31%   35.78%   +0.46%
==========================================
  Files         616      617       +1
  Lines       48851    51403    +2552
==========================================
+ Hits        17254    18396    +1142
- Misses      29225    30422    +1197
- Partials     2372     2585     +213
```
@yongtang looks like this one passes tests. PTAL

/cc @cpuguy83

Mainly concerned about other GPU vendors chiming in, but no one has in all this time... moving to code review.
CLI discussion: docker/cli#1200
First draft of what the API might look like. I don't believe we need something more complex than that.
Discussion points:
- `Resources` (plural) is all the GPUs from all vendors; `Resource` is all the GPUs from a single vendor. Thoughts?
- We can extend the `GPU` struct later without breaking existing code.
- The CLI would pass something like `--gpus nvidia`. However, at this point it's not possible to get the actual list of GPU IDs. We have two options: a) if `gpu.Resource.GPUs` is empty, then select all GPUs, OR b) if `gpu.Resource.GPUs[0].ID == "*"`, then select all GPUs.
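A small sketch of how those two "select all" conventions from the last point could be checked (assuming `gpu.Resource` has a `GPUs []GPU` field as described; `wantsAllGPUs` is a hypothetical helper):

```go
// wantsAllGPUs reports whether the request means "every GPU from this
// vendor", under either convention discussed above.
func wantsAllGPUs(r gpu.Resource) bool {
	if len(r.GPUs) == 0 {
		return true // option a): an empty list selects all GPUs
	}
	return r.GPUs[0].ID == "*" // option b): an explicit wildcard ID
}
```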