Release 2.3 manual validations #123736

atalman · 2024-04-10T16:55:46Z

PaliC · 2024-04-10T23:29:17Z

Validated that the PyPI binaries with slimmed dependencies are usable in standard AWS containers (checks for the amazonlinux:2 regression seen in 1.13).

Ran on CUDA 12.1.

System

bash-4.2$ cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"

Smoke test output

bash-4.2$ python smoke_test.py
torch: 2.3.0+cu121
Skip version check for channel None as stable version is None
Testing smoke_test_conv2d
Testing smoke_test_conv2d with cuda
Testing smoke_test_conv2d with cuda for torch.float16
/home/ec2-user/actions-runner/_work/PiPPy/PiPPy/3/envs/pytorch-test/lib/python3.10/site-packages/torch/nn/modules/conv.py:456: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
  return F.conv2d(input, weight, bias, self.stride,
Testing smoke_test_conv2d with cuda for torch.float32
Testing smoke_test_conv2d with cuda for torch.float64
Testing smoke_test_linalg on cpu
Testing smoke_test_linalg on cuda
Testing smoke_test_linalg with cuda for torch.float32
Testing smoke_test_linalg with cuda for torch.float64
Output:
torchvision: 0.18.0+cu121
torch.cuda.is_available: True
torch.ops.image._jpeg_version() = 62
Is torchvision usable? True
German shepherd (cpu): 37.6%
German shepherd (cuda): 37.6%
/home/ec2-user/actions-runner/_work/PiPPy/PiPPy/3/envs/pytorch-test/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:124: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
torch.compile model output: torch.Size([1, 1000])


Path does not exist: /home/ec2-user/builder/test/smoke_test/audio
Output:
Skipping ffmpeg test.
Smoke test passed.


torchvision CUDA: 12010
torchaudio CUDA: 12010
torch cuda: 12.1
torch cudnn: 8902
cuDNN enabled? True
torch nccl version: (2, 20, 5)
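
For readers comparing the integer versions above with the dotted ones: a minimal sketch, assuming the usual encodings (CUDA as major * 1000 + minor * 10, and cuDNN 8.x as major * 1000 + minor * 100 + patch). The helper names are hypothetical and are not part of smoke_test.py.

```python
def decode_cuda_version(value: int) -> str:
    """Decode a CUDA version integer, assumed to be major * 1000 + minor * 10."""
    major, rest = divmod(value, 1000)
    return f"{major}.{rest // 10}"


def decode_cudnn8_version(value: int) -> str:
    """Decode a cuDNN 8.x version integer, assumed to be major * 1000 + minor * 100 + patch."""
    major, rest = divmod(value, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


if __name__ == "__main__":
    # Values copied from the log above.
    print(decode_cuda_version(12010))
    print(decode_cudnn8_version(8902))
```

Under these assumptions, 12010 agrees with the "torch cuda: 12.1" line, and 8902 would correspond to cuDNN 8.9.2.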
Testing smoke_test_compile for torch.float16
False
Testing smoke_test_compile for torch.float32
True
Testing smoke_test_compile for torch.float64
True
Testing smoke_test_compile with mode 'max-autotune' for torch.float32
AUTOTUNE convolution(64x1x28x28, 32x1x3x3)
  convolution 0.0225 ms 100.0%
  triton_convolution_0 0.0276 ms 81.5%
  triton_convolution_3 0.0328 ms 68.8%
  triton_convolution_1 0.0348 ms 64.7%
  triton_convolution_4 0.0358 ms 62.9%
  triton_convolution_5 0.0389 ms 57.9%
  triton_convolution_2 0.0502 ms 44.9%
SingleProcess AUTOTUNE takes 2.7410 seconds
AUTOTUNE convolution(64x32x26x26, 64x32x3x3)
  triton_convolution_7 0.0942 ms 100.0%
  convolution 0.0983 ms 95.8%
  triton_convolution_6 0.1188 ms 79.3%
  triton_convolution_12 0.1270 ms 74.2%
  triton_convolution_9 0.1300 ms 72.4%
  triton_convolution_10 0.1710 ms 55.1%
  triton_convolution_11 0.1853 ms 50.8%
  triton_convolution_8 0.2140 ms 44.0%
SingleProcess AUTOTUNE takes 4.1028 seconds
AUTOTUNE addmm(64x1, 64x9216, 9216x1)
  addmm 0.0164 ms 100.0%
  triton_mm_17 0.0604 ms 27.1%
  triton_mm_16 0.0666 ms 24.6%
  triton_mm_15 0.0707 ms 23.2%
  triton_mm_18 0.0778 ms 21.1%
  triton_mm_14 0.1004 ms 16.3%
  triton_mm_19 0.1024 ms 16.0%
  triton_mm_13 0.1935 ms 8.5%
  triton_mm_20 0.2560 ms 6.4%
  triton_mm_21 0.2980 ms 5.5%
SingleProcess AUTOTUNE takes 2.8861 seconds
Testing test_cuda_runtime_errors_captured
../aten/src/ATen/native/cuda/TensorCompare.cu:106: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `input[0] != 0` failed.
Caught CUDA exception with success: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
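
A note on reading the AUTOTUNE tables in the log above: the percentage column appears to be the fastest candidate's time divided by each candidate's time. The sketch below recomputes it from one pasted table. It is a hypothetical log-parsing helper, not Inductor code, and because the logged times are rounded to four decimals the recomputed values can differ from the logged ones by a few tenths of a percent.

```python
import re

# Matches candidate rows such as "  triton_convolution_0 0.0276 ms 81.5%".
ROW = re.compile(r"^\s*(\S+)\s+([0-9.]+)\s+ms\s+([0-9.]+)%\s*$")


def relative_speeds(block: str) -> dict[str, float]:
    """Return each candidate's speed as a percentage of the fastest one."""
    times = {}
    for line in block.splitlines():
        match = ROW.match(line)
        if match:  # header and "takes ... seconds" lines do not match
            times[match.group(1)] = float(match.group(2))
    best = min(times.values())
    return {name: round(100.0 * best / t, 1) for name, t in times.items()}


# First AUTOTUNE table from the log, abbreviated to three candidates.
SAMPLE = """\
AUTOTUNE convolution(64x1x28x28, 32x1x3x3)
  convolution 0.0225 ms 100.0%
  triton_convolution_0 0.0276 ms 81.5%
  triton_convolution_3 0.0328 ms 68.8%
SingleProcess AUTOTUNE takes 2.7410 seconds
"""

if __name__ == "__main__":
    for name, pct in relative_speeds(SAMPLE).items():
        print(f"{name}: {pct}%")
```

Read this way, the first and third tables show the default `convolution` and `addmm` kernels winning, while in the second table `triton_convolution_7` is slightly faster than the default convolution.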

juliagmt-google · 2024-04-11T18:52:54Z

Validate docker release builds: https://github.com/pytorch/builder/actions/workflows/validate_docker_images.yml

atalman added this to Release Milestone Review Apr 10, 2024

atalman converted this from a draft issue Apr 10, 2024

atalman assigned atalman, juliagmt-google and PaliC Apr 10, 2024

mikaylagawarecki added the "oncall: releng" and "triaged" labels Apr 11, 2024

atalman closed this as completed Apr 24, 2024

atalman removed this from Release Milestone Review May 8, 2024
