CUDA not found in NVIDIA runners · Issue #153760 · pytorch/pytorch · GitHub

CUDA not found in NVIDIA runners #153760
@wdvr

Description

Current Status

Mitigated. Some jobs failed and need to be restarted. Any job started after 5/16 2:20pm PT should have the correct runtime.

Error looks like

Job failures with: No CUDA runtime is found
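
To find which jobs need a restart, one option is to scan job logs for the error string quoted above. The following is only a minimal sketch: the function name, the job names, and the sample log lines are all made up for illustration, and it assumes the log text is already available as strings.

```python
import re

# Error text quoted in this issue. Case-insensitive matching is a choice
# made for this sketch, not something the issue specifies.
CUDA_ERROR = re.compile(r"No CUDA runtime is found", re.IGNORECASE)


def jobs_hit_by_cuda_error(logs):
    """Return the sorted names of jobs whose log contains the CUDA error.

    `logs` maps a job name to its log text (a hypothetical input shape).
    """
    return sorted(name for name, text in logs.items() if CUDA_ERROR.search(text))


# Made-up sample data, for illustration only.
sample_logs = {
    "job-a": "setup ok\nNo CUDA runtime is found\n",
    "job-b": "setup ok\nall tests passed\n",
    "job-c": "warning: no cuda runtime is found\n",
}

print(jobs_hit_by_cuda_error(sample_logs))
```

Jobs reported by such a scan are candidates for a restart; a job can also fail for unrelated reasons, so the match is a hint rather than proof.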

Incident timeline (all times Pacific)

Include when the incident began, when it was detected, mitigated, root caused, and finally closed.

started: 5/16 7:15am PT
detected: 5/16 11:48am PT
resolved: 5/16 2:20pm PT
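
The durations implied by the timeline above can be worked out with a short sketch. The year is not stated in the timeline, so `ASSUMED_YEAR` is an assumption (the durations do not depend on it). The helper names `pt` and `started_in_incident_window` are hypothetical, and the times are treated as naive Pacific wall-clock values.

```python
from datetime import datetime

ASSUMED_YEAR = 2025  # assumption: the timeline gives only month/day


def pt(hour, minute):
    """Build a naive datetime for 5/16 at the given Pacific wall-clock time."""
    return datetime(ASSUMED_YEAR, 5, 16, hour, minute)


started = pt(7, 15)    # started: 5/16 7:15am PT
detected = pt(11, 48)  # detected: 5/16 11:48am PT
resolved = pt(14, 20)  # resolved: 5/16 2:20pm PT

time_to_detect = detected - started
time_to_resolve = resolved - detected
total_duration = resolved - started


def started_in_incident_window(job_start):
    """True if a job started between incident start and resolution."""
    return started <= job_start < resolved


print(time_to_detect)   # 4:33:00
print(time_to_resolve)  # 2:32:00
print(total_duration)   # 7:05:00
```

A job that started inside this window and failed with the CUDA error is a likely candidate for a restart; jobs started after 2:20pm PT should already have the correct runtime.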

Root cause

An upgrade of nvidia-container-toolkit.

Mitigation

We pinned the version of nvidia-container-toolkit; see pytorch/test-infra#6637.

follow ups:

cc @seemethere @malfet @pytorch/pytorch-dev-infra

Metadata

Labels

    ci: sev (critical failure affecting PyTorch CI)
    module: ci (Related to continuous integration)
    module: regression (It used to work, and now it doesn't)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Status

    Done