Match device driver on name and ignore capabilities by elezar · Pull Request #50717 · moby/moby · GitHub

Conversation

Contributor
@elezar elezar commented Aug 13, 2025

This change ignores requested capabilities when a driver is explicitly requested. This simplifies the logic for selecting a driver and means that users need not specify redundant capabilities. This aligns the behaviour for all drivers with the CDI driver.

With the exception of the catch-all "gpu" capability the remaining capabilities are only relevant for the "nvidia" driver.

See also the discussion in #50099

- What I did

Changed the logic for selecting a device driver to ignore capabilities when selecting a driver by name. If no driver name is specified, the current behaviour remains and an appropriate driver is selected based on the set of required capabilities.

- How I did it

Updated the handleDevice implementation
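
The selection rule described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the daemon's actual code: `deviceRequest`, `registry`, `hasAll`, and `selectDriver` are hypothetical names, and the capability sets registered for "nvidia" are assumed for the example. Only the error text is taken from the output shown in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// deviceRequest carries only the two fields discussed in this PR (hypothetical type).
type deviceRequest struct {
	Driver       string
	Capabilities [][]string // a list of alternative capability sets
}

// registry maps a driver name to the capabilities it provides (hypothetical type).
type registry map[string]map[string]bool

// hasAll reports whether every wanted capability is provided.
func hasAll(provided map[string]bool, wanted []string) bool {
	for _, c := range wanted {
		if !provided[c] {
			return false
		}
	}
	return true
}

// selectDriver picks a driver by name when one is requested, ignoring the
// requested capabilities. Without a name it falls back to capability matching.
func selectDriver(reg registry, req deviceRequest) (string, error) {
	if req.Driver != "" {
		if _, ok := reg[req.Driver]; ok {
			return req.Driver, nil
		}
	} else {
		// Sort names so the result does not depend on map iteration order.
		names := make([]string, 0, len(reg))
		for name := range reg {
			names = append(names, name)
		}
		sort.Strings(names)
		for _, name := range names {
			for _, wanted := range req.Capabilities {
				if hasAll(reg[name], wanted) {
					return name, nil
				}
			}
		}
	}
	return "", fmt.Errorf("could not select device driver %q with capabilities: %v", req.Driver, req.Capabilities)
}

func main() {
	// Assumed registration: a single "nvidia" driver with a few capabilities.
	reg := registry{"nvidia": {"gpu": true, "compute": true, "utility": true}}
	requests := []deviceRequest{
		{Capabilities: [][]string{{"gpu"}}},
		{Driver: "nvidia", Capabilities: [][]string{{"gpu"}}},
		{Driver: "amd", Capabilities: [][]string{{"gpu"}}},
		{Capabilities: [][]string{{"foo", "gpu"}}},
		{Driver: "nvidia", Capabilities: [][]string{{"foo", "gpu"}}},
	}
	for _, req := range requests {
		name, err := selectDriver(reg, req)
		fmt.Printf("driver=%q caps=%v -> selected=%q err=%v\n", req.Driver, req.Capabilities, name, err)
	}
}
```

The five requests in `main` correspond to the five verification scenarios in this description.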

- How to verify it

When running a modified daemon in debug mode on a system with nvidia devices (and therefore an nvidia driver registered):

  1. Confirm that the standard flag works as expected:
$ docker run --rm -ti --gpus all ubuntu nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-4cf8db2d-06c0-7d70-1a51-e59b25b2c16c)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-4404041a-04cf-1ccf-9e70-f139a9b1e23c)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-79a2ba02-a537-ccbf-2965-8e9d90c0bd54)
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-662077db-fa3f-0d8f-9502-21ab0ef058a2)
GPU 4: NVIDIA A100-SXM4-40GB (UUID: GPU-ec9d53cc-125d-d4a3-9687-304df8eb4749)
GPU 5: NVIDIA A100-SXM4-40GB (UUID: GPU-3eb87630-93d5-b2b6-b8ff-9b359caf4ee2)
GPU 6: NVIDIA A100-SXM4-40GB (UUID: GPU-8216274a-c05d-def0-af18-c74647300267)
GPU 7: NVIDIA A100-SXM4-40GB (UUID: GPU-b1028956-cfa2-0990-bf4a-5da9abb51763)

Check the logs:

DEBU[2025-08-13T14:18:46.795782525Z] Selecting device driver by capabilities       capabilities="map[requested:[[gpu]] selected:[gpu]]" driver=nvidia
  2. Confirm that selecting a driver by name bypasses capabilities:
$ docker run --rm -ti --gpus "all,driver=nvidia" ubuntu nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-4cf8db2d-06c0-7d70-1a51-e59b25b2c16c)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-4404041a-04cf-1ccf-9e70-f139a9b1e23c)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-79a2ba02-a537-ccbf-2965-8e9d90c0bd54)
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-662077db-fa3f-0d8f-9502-21ab0ef058a2)
GPU 4: NVIDIA A100-SXM4-40GB (UUID: GPU-ec9d53cc-125d-d4a3-9687-304df8eb4749)
GPU 5: NVIDIA A100-SXM4-40GB (UUID: GPU-3eb87630-93d5-b2b6-b8ff-9b359caf4ee2)
GPU 6: NVIDIA A100-SXM4-40GB (UUID: GPU-8216274a-c05d-def0-af18-c74647300267)
GPU 7: NVIDIA A100-SXM4-40GB (UUID: GPU-b1028956-cfa2-0990-bf4a-5da9abb51763)

Check the logs:

DEBU[2025-08-13T14:20:06.294560418Z] Selecting device driver by driver name; possibly ignoring capabilities  capabilities="[[gpu]]" driver=nvidia
  3. Confirm that requesting a driver that is not registered fails:
$ docker run --rm -ti --gpus "all,driver=amd" ubuntu nvidia-smi -L
docker: Error response from daemon: could not select device driver "amd" with capabilities: [[gpu]]

Run 'docker run --help' for more information
DEBU[2025-08-13T14:20:44.913657033Z] Selecting device driver by capabilities       capabilities="map[requested:[[gpu]] selected:[gpu]]" driver=nvidia
  4. Ensure that non-matching capabilities still return an error:
$ docker run --rm -ti --gpus "all,capabilities=foo" ubuntu nvidia-smi -L
docker: Error response from daemon: could not select device driver "" with capabilities: [[foo gpu]]

Run 'docker run --help' for more information
  5. Ensure that non-matching capabilities are ignored when matching by driver:
$ docker run --rm -ti --gpus "all,capabilities=foo,driver=nvidia" ubuntu nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-4cf8db2d-06c0-7d70-1a51-e59b25b2c16c)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-4404041a-04cf-1ccf-9e70-f139a9b1e23c)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-79a2ba02-a537-ccbf-2965-8e9d90c0bd54)
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-662077db-fa3f-0d8f-9502-21ab0ef058a2)
GPU 4: NVIDIA A100-SXM4-40GB (UUID: GPU-ec9d53cc-125d-d4a3-9687-304df8eb4749)
GPU 5: NVIDIA A100-SXM4-40GB (UUID: GPU-3eb87630-93d5-b2b6-b8ff-9b359caf4ee2)
GPU 6: NVIDIA A100-SXM4-40GB (UUID: GPU-8216274a-c05d-def0-af18-c74647300267)
GPU 7: NVIDIA A100-SXM4-40GB (UUID: GPU-b1028956-cfa2-0990-bf4a-5da9abb51763)

Check the logs:

DEBU[2025-08-13T14:27:12.029915404Z] Selecting device driver by driver name; possibly ignoring capabilities  capabilities="[[foo gpu]]" driver=nvidia
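
The `[[gpu]]` and `[[foo gpu]]` values in the logs above suggest that the `--gpus` value is split into fields and that the catch-all "gpu" capability is always appended. The sketch below shows one way such a value could be turned into a request. It is a simplified illustration under stated assumptions, not the CLI's parser: `gpuRequest` and `parseGPUs` are hypothetical names, only the `all`, numeric count, `driver=`, and `capabilities=` fields are handled, and each `capabilities=` field is treated as a single capability name.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// gpuRequest is a hypothetical, reduced form of a device request.
type gpuRequest struct {
	Count        int // -1 means "all"
	Driver       string
	Capabilities [][]string
}

// parseGPUs parses a comma-separated --gpus style value (assumed format).
func parseGPUs(value string) (gpuRequest, error) {
	fields, err := csv.NewReader(strings.NewReader(value)).Read()
	if err != nil {
		return gpuRequest{}, err
	}
	req := gpuRequest{Count: 1}
	var caps []string
	for _, field := range fields {
		key, val, hasVal := strings.Cut(field, "=")
		switch {
		case !hasVal && key == "all":
			req.Count = -1
		case !hasVal:
			n, err := strconv.Atoi(key)
			if err != nil {
				return gpuRequest{}, fmt.Errorf("invalid count %q", key)
			}
			req.Count = n
		case key == "driver":
			req.Driver = val
		case key == "capabilities":
			caps = append(caps, val)
		default:
			return gpuRequest{}, fmt.Errorf("unexpected key %q", key)
		}
	}
	// The catch-all "gpu" capability is always requested (assumption based on the logs).
	req.Capabilities = [][]string{append(caps, "gpu")}
	return req, nil
}

func main() {
	for _, v := range []string{"all", "all,driver=nvidia", "all,capabilities=foo,driver=nvidia"} {
		req, err := parseGPUs(v)
		fmt.Printf("%q -> count=%d driver=%q capabilities=%v err=%v\n", v, req.Count, req.Driver, req.Capabilities, err)
	}
}
```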

- Human readable description for the release notes

Prefer explicit device driver name over GPU capabilities when selecting the device driver with `docker run --gpus`

- A picture of a cute animal (not mandatory but encouraged)

@elezar elezar force-pushed the match-on-device-driver-name branch from 702bbb7 to 80f1b0b Compare August 13, 2025 14:44
"driver": req.Driver,
"capabilities": req.Capabilities,
}).Debugf("Selecting device driver by driver name; possibly ignoring capabilities")
return dd.updateSpec(spec, &deviceInstance{req: req})
Contributor Author

@thaJeztah I moved this over from #50099 since I think it may be easier to reason about as a standalone change following the discussion in #50099 (comment).

This change ignores requested capabilities when a driver is explicitly
requested. This simplifies the logic for selecting a driver and means
that users need not specify redundant capabilities.

With the exception of the catch-all "gpu" capability the remaining
capabilities are only relevant for the "nvidia" driver.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Contributor
@vvoland vvoland left a comment

LGTM

"requested": req.Capabilities,
"selected": selected,
},
}).Debugf("Selecting device driver by driver name; possibly ignoring capabilities")
Contributor

nit:

Suggested change
- }).Debugf("Selecting device driver by driver name; possibly ignoring capabilities")
+ }).Debug("Selecting device driver by driver name; possibly ignoring capabilities")

Contributor Author

Updated in #50228

Member
@thaJeztah thaJeztah left a comment

LGTM

@thaJeztah thaJeztah merged commit b87b7c5 into moby:master Aug 14, 2025
212 of 215 checks passed
@thompson-shaun thompson-shaun moved this from New to Complete in 🔦 Maintainer spotlight Aug 14, 2025
@thaJeztah thaJeztah mentioned this pull request Aug 14, 2025
@vvoland vvoland added the kind/enhancement label Oct 3, 2025
Labels
area/api, impact/changelog, kind/enhancement, status/2-code-review
3 participants