Release 2.3 manual validations · Issue #123736 · pytorch/pytorch

Closed
8 of 12 tasks
atalman opened this issue Apr 10, 2024 · 2 comments
Labels: oncall: releng (In support of CI and Release Engineering), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

atalman (Contributor) commented Apr 10, 2024

🐛 Describe the bug

We need to make sure that:

Versions

2.3.0

atalman converted this from a draft issue Apr 10, 2024
PaliC (Contributor) commented Apr 10, 2024

Validated that the PyPI binaries with slimmed dependencies are usable in standard AWS containers (amazonlinux:2, which regressed in 1.13).

Ran on CUDA 12.1.

System

bash-4.2$ cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"
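
For anyone repeating this check from a script, the container identity shown above can be confirmed programmatically. The following is a minimal sketch, not part of the original validation: the helper names `parse_os_release` and `is_amazon_linux_2` are hypothetical, and the parser handles only plain `KEY=value` lines with optional quotes, not shell escapes.

```python
def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines (os-release format) into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values are usually wrapped in double quotes; drop them.
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def is_amazon_linux_2(fields: dict) -> bool:
    """Return True when the parsed fields identify Amazon Linux 2."""
    return fields.get("ID") == "amzn" and fields.get("VERSION_ID") == "2"


# A subset of the os-release output from the comment above, used as sample input.
SAMPLE = '''NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
'''

fields = parse_os_release(SAMPLE)
print(fields["PRETTY_NAME"], is_amazon_linux_2(fields))
```

Inside a real container, the same functions could be fed the contents of `/etc/os-release` instead of the sample string.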

Smoke test output

bash-4.2$ python smoke_test.py
torch: 2.3.0+cu121
Skip version check for channel None as stable version is None
Testing smoke_test_conv2d
Testing smoke_test_conv2d with cuda
Testing smoke_test_conv2d with cuda for torch.float16
/home/ec2-user/actions-runner/_work/PiPPy/PiPPy/3/envs/pytorch-test/lib/python3.10/site-packages/torch/nn/modules/conv.py:456: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
  return F.conv2d(input, weight, bias, self.stride,
Testing smoke_test_conv2d with cuda for torch.float32
Testing smoke_test_conv2d with cuda for torch.float64
Testing smoke_test_linalg on cpu
Testing smoke_test_linalg on cuda
Testing smoke_test_linalg with cuda for torch.float32
Testing smoke_test_linalg with cuda for torch.float64
Output:
torchvision: 0.18.0+cu121
torch.cuda.is_available: True
torch.ops.image._jpeg_version() = 62
Is torchvision usable? True
German shepherd (cpu): 37.6%
German shepherd (cuda): 37.6%
/home/ec2-user/actions-runner/_work/PiPPy/PiPPy/3/envs/pytorch-test/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:124: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
torch.compile model output: torch.Size([1, 1000])


Path does not exist: /home/ec2-user/builder/test/smoke_test/audio
Output:
Skipping ffmpeg test.
Smoke test passed.


torchvision CUDA: 12010
torchaudio CUDA: 12010
torch cuda: 12.1
torch cudnn: 8902
cuDNN enabled? True
torch nccl version: (2, 20, 5)
Testing smoke_test_compile for torch.float16
False
Testing smoke_test_compile for torch.float32
True
Testing smoke_test_compile for torch.float64
True
Testing smoke_test_compile with mode 'max-autotune' for torch.float32
AUTOTUNE convolution(64x1x28x28, 32x1x3x3)
  convolution 0.0225 ms 100.0%
  triton_convolution_0 0.0276 ms 81.5%
  triton_convolution_3 0.0328 ms 68.8%
  triton_convolution_1 0.0348 ms 64.7%
  triton_convolution_4 0.0358 ms 62.9%
  triton_convolution_5 0.0389 ms 57.9%
  triton_convolution_2 0.0502 ms 44.9%
SingleProcess AUTOTUNE takes 2.7410 seconds
AUTOTUNE convolution(64x32x26x26, 64x32x3x3)
  triton_convolution_7 0.0942 ms 100.0%
  convolution 0.0983 ms 95.8%
  triton_convolution_6 0.1188 ms 79.3%
  triton_convolution_12 0.1270 ms 74.2%
  triton_convolution_9 0.1300 ms 72.4%
  triton_convolution_10 0.1710 ms 55.1%
  triton_convolution_11 0.1853 ms 50.8%
  triton_convolution_8 0.2140 ms 44.0%
SingleProcess AUTOTUNE takes 4.1028 seconds
AUTOTUNE addmm(64x1, 64x9216, 9216x1)
  addmm 0.0164 ms 100.0%
  triton_mm_17 0.0604 ms 27.1%
  triton_mm_16 0.0666 ms 24.6%
  triton_mm_15 0.0707 ms 23.2%
  triton_mm_18 0.0778 ms 21.1%
  triton_mm_14 0.1004 ms 16.3%
  triton_mm_19 0.1024 ms 16.0%
  triton_mm_13 0.1935 ms 8.5%
  triton_mm_20 0.2560 ms 6.4%
  triton_mm_21 0.2980 ms 5.5%
SingleProcess AUTOTUNE takes 2.8861 seconds
Testing test_cuda_runtime_errors_captured
../aten/src/ATen/native/cuda/TensorCompare.cu:106: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `input[0] != 0` failed.
Caught CUDA exception with success: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
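
The log above reports CUDA and cuDNN versions in several encodings: `12010`, `12.1`, `8902`, and the `+cu121` wheel suffix. The sketch below shows one way to decode them and cross-check that they agree. It is an illustration, not part of `smoke_test.py`: the function names are hypothetical, the cuDNN decoding assumes the 8.x integer layout, and the wheel-tag parser assumes a single-digit CUDA minor version.

```python
import re
from typing import Optional, Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Decode CUDA's integer version (1000 * major + 10 * minor), e.g. 12010."""
    return encoded // 1000, (encoded % 1000) // 10


def decode_cudnn8_version(encoded: int) -> Tuple[int, int, int]:
    """Decode a cuDNN 8.x integer version (1000 * major + 100 * minor + patch)."""
    return encoded // 1000, (encoded % 1000) // 100, encoded % 100


def cuda_from_wheel_tag(version: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a local version tag such as '2.3.0+cu121'.

    Assumes the last digit of the tag is the minor version.
    """
    match = re.search(r"\+cu(\d+)(\d)$", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Values copied from the smoke test output above.
torch_cuda = cuda_from_wheel_tag("2.3.0+cu121")
vision_cuda = decode_cuda_version(12010)
cudnn = decode_cudnn8_version(8902)

print(torch_cuda, vision_cuda, cudnn)
assert torch_cuda == vision_cuda, "torch and torchvision disagree on the CUDA version"
```

With the values from this run, the wheel tag and the torchvision/torchaudio integers both decode to CUDA 12.1, and `8902` decodes to cuDNN 8.9.2.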

mikaylagawarecki added the oncall: releng and triaged labels Apr 11, 2024
juliagmt-google (Collaborator) commented

Validate docker release builds: https://github.com/pytorch/builder/actions/workflows/validate_docker_images.yml
