8000 dockerd gpu support by flx42 · Pull Request #37504 · moby/moby · GitHub
[go: up one dir, main page]

Skip to content

Conversation

flx42
Copy link
Contributor

API discussion: #37434
CLI discussion: docker/cli#1200

This is a simple prototype for GPU support in dockerd. It uses a simple in-tree plugin mechanism similar to the graphdriver plugins in daemon/graphdriver/..., but stateless.

Plugin API

Since plugins have access to the full OCI spec, I expect plugins to do one (or multiple) of the following:

  • Add mounts/devices/environments variables to the OCI spec.
  • Add hooks to the OCI spec.
  • Set OCI annotations that will be picked up later by a component closer to the container runtime implementation (containerd, runc, kata?). This is for complex cases that are largely outside of the OCI spec.

Is this acceptable? Or do you want to restrict which part of the spec the plugins can edit?

Discussion

The demo implementation for the nvidia plugin in this PR relies on the existing nvidia-container-runtime-hook binary since I'm not sure yet how you think we should leverage the recently added GPU support in containerd.

  1. Should it be part of this GPU plugin API as another handler?
import "github.com/containerd/containerd/oci"
import "github.com/opencontainers/runtime-spec/specs-go"
func ContainerdHandler(spec specs.Spec, opts oci.SpecOpts)

This feel a bit awkward since we're not sure what other implementations could look like.
If we do this, we technically could get rid of OCI annotations since everything is handled in dockerd, but I think it's good to keep a trace of what happened in config.json, even if it's plain annotations.

  1. Should we try to standardize on the format of the OCI annotations for GPUs and thus add the code for handling these annotations in containerd directly?

@GordonTheTurtle GordonTheTurtle added the dco/no Automatically set by a bot when one of the commits lacks proper signature label Jul 20, 2018
@GordonTheTurtle
Copy link

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "dockerd-gpu-support" git@github.com:flx42/moby.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842353798760
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@flx42 flx42 force-pushed the dockerd-gpu-support branch from 724547a to 70214b0 Compare July 20, 2018 01:21
@GordonTheTurtle GordonTheTurtle removed the dco/no Automatically set by a bot when one of the commits lacks proper signature label Jul 20, 2018
@flx42
Copy link
Contributor Author
flx42 commented Jul 20, 2018

@cpuguy83 @crosbymichael

@flx42 flx42 force-pushed the dockerd-gpu-support branch from 70214b0 to eade168 Compare July 25, 2018 01:31
@GordonTheTurtle GordonTheTurtle added the dco/no Automatically set by a bot when one of the commits lacks proper signature label Jul 25, 2018
@flx42 flx42 force-pushed the dockerd-gpu-support branch from eade168 to 1c66597 Compare July 25, 2018 01:31
@GordonTheTurtle GordonTheTurtle removed the dco/no Automatically set by a bot when one of the commits lacks proper signature label Jul 25, 2018
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just call this code in the handler. You can call the spec opts like any other function. You can just make a simple combinator for this.

in your gpu handler dont use annotations and just process the fields like:

func applyOpts(s *specs.Spec, opts ...SpecOpts) {
    for _, o := range opts {
         o(ctx, nil, nil, s)
    }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or you just build your gpu opts and call nvidia.WithGPUs(nvidiaOpts)(ctx, nil, nil, spec)

@flx42 flx42 force-pushed the dockerd-gpu-support branch 3 times, most recently from c4a42d3 to b6f5ffe Compare August 2, 2018 04:30
@flx42 flx42 requested a review from tianon as a code owner August 2, 2018 04:30
@codecov
Copy link
codecov bot commented Aug 2, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@a3e78ca). Click here to learn what that means.
The diff coverage is 0%.

@@            Coverage Diff            @@
##             master   #37504   +/-   ##
=========================================
  Coverage          ?    35.6%           
=========================================
  Files             ?      611           
  Lines             ?    44971           
  Branches          ?        0           
=========================================
  Hits              ?    16014           
  Misses            ?    26744           
  Partials          ?     2213

@flx42
Copy link
Contributor Author 8000
flx42 commented Aug 3, 2018

This new feature is now behind the experimental flag of dockerd, as requested.
It's in a functional state today and the approach is straightforward, please help review the design/code. I just need to add a few unit tests.

@flx42 flx42 changed the title [WIP] dockerd gpu support dockerd gpu support Aug 3, 2018
@flx42
Copy link
Contributor Author
flx42 commented Aug 3, 2018

Removed the WIP tag from the title

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think you need to do this in a func init()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to double-check the order of initialization. I did the same as:
https://github.com/moby/moby/blob/master/daemon/graphdriver/driver.go#L148-L150

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use an error from the errdefs package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the experimental check should be higher up in the stack, and greatly simplify this function.

< 8000 details class="details-overlay details-reset position-relative d-inline-block"> Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, and do you prefer passing a boolean to setResources instead of the daemon object?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I would just not call setGPUResources if the daemon is not experimental.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but I'm talking about the function one step up: setResources, this function needs to be called in all cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... a little gross but... maybe we can do setResources and setResourcesExperimental?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔

Copy link
Contributor Author
67E6

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it says on the tin, sorry for the revendoring noise :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default behavior from Michael's nvidia plugin is to use containerd as the binary. The hook is executed as:

containerd oci-hook -- nvidia-container-cli ... [args] ...

Maybe it should have been a separate binary, but that's the status right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in this case we need docker-containerd, not containerd.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errdefs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context should never be passed as a nil

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I was mentioning above, do you want to pass a boolean here instead of the daemon?

flx42 added 3 commits August 8, 2018 15:23
Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
@flx42 flx42 force-pushed the dockerd-gpu-support branch from b6f5ffe to 88eff77 Compare August 8, 2018 22:47
@flx42
Copy link
Contributor Author
flx42 commented Aug 8, 2018

Updated my PR to make the changes requested by @cpuguy83. I dropped the idea of setResourcesExperimental since it was too ugly, I pass a boolean flag to setResources.

The vendoring is still hacky since this is not the right place to do it I believe, and my understanding is that we would want to wait for a new minor/patch version of containerd anyway.

EDIT: messed up the commit split, fixing now.

flx42 added 2 commits August 8, 2018 15:54
Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
@flx42 flx42 force-pushed the dockerd-gpu-support branch from 88eff77 to 658675e Compare August 8, 2018 22:56
cpuguy83
< 67DE div data-view-component="true" class="TimelineItem-body d-flex flex-column flex-md-row flex-justify-start">
}

if len(r.GPUs) > 0 {
if !experimental {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to push this experimental check up the stack so we don't have to deal with it here?
Also, do you really feel this should be experimental? Are you worried the API is not sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to push this experimental check up the stack so we don't have to deal with it here?

I initially considered what you said previously, that is to say to create a function setResourcesExperimental and put it here after setResources:

moby/daemon/oci_linux.go

Lines 737 to 739 in 658675e

if err := setResources(&s, c.HostConfig.Resources, daemon.HasExperimental()); err != nil {
return nil, fmt.Errorf("linux runtime spec resources: %v", err)
}

Then I had two options:

if daemon.HasExperimental() {
        if err := setResourcesExperimental(...) {
                return nil, fmt.Errorf(...)
        }
}

That's not great since we lose the ability to warn/error if GPUs are requested but the experimental flag is not set.

The other option would have been:

if err := setResourcesExperimental(...,  daemon.HasExperimental()) {
        return nil, fmt.Errorf(...)
}

But it's bizarre to have a function setResourcesExperimental and pass a boolean flag for this function to really do something.

Let me know if you see another option.

@flx42
Copy link
Contributor Author
flx42 commented Aug 13, 2018

Also, do you really feel this should be experimental? Are you worried the API is not sufficient?

Moving this conversation to the main thread since it's a more fundamental question.
@crosbymichael asked me to put this PR behind this experimental flag, the rationale was that the multiple APIs might be unstable:

  1. The CLI API, i.e. --gpus. This is clearly something we don't want to break if it's not behind an experimental flag. I think we mostly agreed on the format, except maybe for how it would map to Compose.
  2. The docker API. Same, we don't want to break this API if it's not experimental. But I think what we have today is simple and extensible, so that should be fine.
  3. The internal plugin API. The input is the OCI runtime spec, the output is a modified OCI runtime spec. This is not very likely to change, and it's actually not a big deal if it does since it's not a public API.

But for all of the above, with my implementation being the sole client/consumer, I would understand gating that behind an experimental flag. I would be happy to change it back.

@crosbymichael
Copy link
Contributor

@thaJeztah how does vendoring work when there are client and server changes?

@thaJeztah
Copy link
Member

@crosbymichael you mean if changes are needed both in the CLI and on the daemon side?

@flx42
Copy link
Contributor Author
flx42 commented Aug 15, 2018

I guess that's what he meant. My guess is that the ordering needs to be:

  1. Release of a new tagged version of containerd (e.g. 1.1.3 or 1.2.0)
  2. Revendoring of containerd in moby to get the latest GPU patches from @crosbymichael (bundled in this PR in a very hacky way)
  3. Merge of the API change: [WIP] api: add a GPU resource type #37434 (also bundled in this PR)
  4. Merge of the dockerd implementation for this new API (this PR)
  5. Merge of the CLI changes: [RFC] GPU support in CLI docker/cli#1200

Is that correct?

@flx42
Copy link
Contributor Author
flx42 commented Aug 20, 2018

Ping @thaJeztah

@crosbymichael do you confirm we should gate this feature behind the experimental flag?

@flx42
Copy link
Contributor Author
flx42 commented Aug 28, 2018

@thaJeztah @crosbymichael since I imagine we're now close to the 18.09 release and the built containerd is still 1.1.2, are we targeting this feature for docker CE 19.03?

@cpuguy83
Copy link
Member

@flx42 Yes.

@olljanat
Copy link
Contributor

@flx42 this one needs to be rebased

@derek derek bot added the status/failing-ci Indicates that the PR in its current state fails the test suite label Dec 22, 2018
@tiborvass
Copy link
Contributor

Closed by #38828

@tiborvass tiborvass closed this Mar 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/failing-ci Indicates that the PR in its current state fails the test suite status/2-code-review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants
0