Remove set_default_dtype call in jit_metaprogramming_utils.py #69624

kurtamohler · 2021-12-08T18:52:54Z

cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel

pytorch-probot · 2021-12-08T18:52:57Z

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/kurtam 8000 ohler/pytorch/blob/9b8ca2dce1205a8056960dc43cf7e19aed1f8a08/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows	Labels (bold enabled)	Status
Triggered Workflows
linux-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/noarch`, `ciflow/trunk`	✅ triggered
linux-docs	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/docs`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-vulkan-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`, `ciflow/vulkan`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test	`ciflow/all`, `ciflow/bazel`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-build	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-asan	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/sanitizers`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-onnx	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/onnx`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
win-vs2019-cpu-py3	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
win-vs2019-cuda11.3-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
docker-builds	`ciflow/all`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-custom-ops	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-full-jit	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-metal	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-full-jit	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`, `ciflow/trunk`	🚫 skipped
linux-docs-push	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
macos-10-15-py3-arm64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-11-py3-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`, `ciflow/slow`, `ciflow/slow-gradcheck`	🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-win-vs2019-cuda11.1-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
periodic-win-vs2019-cuda11.5-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:

# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

facebook-github-bot · 2021-12-08T18:53:00Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/69624
📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓Need help or want to give feedback on the CI? Visit our office hours

❌ 7 New Failures

As of commit 5ffb8d3b0e (more details on the Dr. CI page):

Expand to see more

7/7 failures introduced in this PR

🕵️ 7 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

pull / linux-focal-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge) (1/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-21T00:21:45.8947289Z RuntimeError: test_binary_ufuncs failed!

2022-06-21T00:21:44.3149795Z 
2022-06-21T00:21:44.3149982Z FAILED (errors=2, skipped=1421, expected failures=12)
2022-06-21T00:21:44.3150213Z 
2022-06-21T00:21:44.3150347Z Generating XML reports...
2022-06-21T00:21:45.1951681Z Generated XML report: test-reports/python-unittest/test_binary_ufuncs/TEST-TestBinaryUfuncsCPU-20220621001817.xml
2022-06-21T00:21:45.8939585Z Traceback (most recent call last):
2022-06-21T00:21:45.8939874Z   File "test/run_test.py", line 946, in <module>
2022-06-21T00:21:45.8943333Z     main()
2022-06-21T00:21:45.8943528Z   File "test/run_test.py", line 924, in main
2022-06-21T00:21:45.8947034Z     raise RuntimeError(err_message)
2022-06-21T00:21:45.8947289Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T00:21:46.4380040Z 
2022-06-21T00:21:46.4380590Z real	31m38.379s
2022-06-21T00:21:46.4380821Z user	32m53.566s
2022-06-21T00:21:46.4381000Z sys	0m51.911s
2022-06-21T00:21:46.4381235Z + cleanup
2022-06-21T00:21:46.4381385Z + retcode=1
2022-06-21T00:21:46.4381547Z + set +x
2022-06-21T00:21:46.4418619Z ##[error]Process completed with exit code 1.
2022-06-21T00:21:46.4453789Z Prepare all required actions
2022-06-21T00:21:46.4454091Z Getting action download info

pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu) (2/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-21T01:31:49.5587450Z RuntimeError: test_meta failed!

2022-06-21T01:31:47.0449297Z FAILED (errors=2, skipped=140, expected failures=6)
2022-06-21T01:31:47.0449514Z 
2022-06-21T01:31:47.0449644Z Generating XML reports...
2022-06-21T01:31:48.4818536Z Generated XML report: test-reports/python-unittest/test_meta/TEST-TestMetaCUDA-20220621002132.xml
2022-06-21T01:31:48.4830355Z Generated XML report: test-reports/python-unittest/test_meta/TEST-TestMetaConverter-20220621002132.xml
2022-06-21T01:31:49.5581725Z Traceback (most recent call last):
2022-06-21T01:31:49.5582228Z   File "test/run_test.py", line 946, in <module>
2022-06-21T01:31:49.5584539Z     main()
2022-06-21T01:31:49.5584838Z   File "test/run_test.py", line 924, in main
2022-06-21T01:31:49.5587125Z     raise RuntimeError(err_message)
2022-06-21T01:31:49.5587450Z RuntimeError: test_meta failed!
2022-06-21T01:31:50.2065601Z + cleanup
2022-06-21T01:31:50.2065943Z + retcode=1
2022-06-21T01:31:50.2066157Z + set +x
2022-06-21T01:31:50.2127735Z ##[error]Process completed with exit code 1.
2022-06-21T01:31:50.2171924Z Prepare all required actions
2022-06-21T01:31:50.2172394Z Getting action download info
2022-06-21T01:31:50.4021870Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-06-21T01:31:50.4022192Z with:
2022-06-21T01:31:50.4022669Z   github-token: ***
2022-06-21T01:31:50.4023339Z env:

pull / win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (3/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-21T01:02:04.3490020Z RuntimeError: test_binary_ufuncs failed!

2022-06-21T01:02:03.9287640Z 
2022-06-21T01:02:03.9287773Z FAILED (errors=2, skipped=898, expected failures=20)
2022-06-21T01:02:03.9287921Z 
2022-06-21T01:02:03.9288006Z Generating XML reports...
2022-06-21T01:02:03.9288371Z Generated XML report: test-reports\python-unittest\test_binary_ufuncs\TEST-TestBinaryUfuncsCPU-20220621010051.xml
2022-06-21T01:02:04.3488773Z Traceback (most recent call last):
2022-06-21T01:02:04.3489129Z   File "run_test.py", line 946, in <module>
2022-06-21T01:02:04.3489332Z     main()
2022-06-21T01:02:04.3489561Z   File "run_test.py", line 924, in main
2022-06-21T01:02:04.3489796Z     raise RuntimeError(err_message)
2022-06-21T01:02:04.3490020Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T01:02:04.6064311Z 
2022-06-21T01:02:04.6064741Z (base) C:\actions-runner\_work\pytorch\pytorch\test>if ERRORLEVEL 1 goto fail 
2022-06-21T01:02:04.6066486Z 
2022-06-21T01:02:04.6066708Z (base) C:\actions-runner\_work\pytorch\pytorch\test>exit /b 1 
2022-06-21T01:02:04.6093607Z + cleanup
2022-06-21T01:02:04.6093807Z + retcode=1
2022-06-21T01:02:04.6093965Z + set +x
2022-06-21T01:02:04.6129903Z ##[error]Process completed with exit code 1.
2022-06-21T01:02:04.6264533Z Prepare all required actions
2022-06-21T01:02:04.6265048Z Getting action download info

pull / linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge) (4/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-21T00:38:59.9281118Z RuntimeError: test_binary_ufuncs failed!

2022-06-21T00:38:58.7641785Z 
2022-06-21T00:38:58.7641902Z FAILED (errors=2, skipped=898, expected failures=20)
2022-06-21T00:38:58.7642048Z 
2022-06-21T00:38:58.7642119Z Generating XML reports...
2022-06-21T00:38:59.5238465Z Generated XML report: test-reports/python-unittest/test_binary_ufuncs/TEST-TestBinaryUfuncsCPU-20220621003755.xml
2022-06-21T00:38:59.9276375Z Traceback (most recent call last):
2022-06-21T00:38:59.9276827Z   File "test/run_test.py", line 946, in <module>
2022-06-21T00:38:59.9278728Z     main()
2022-06-21T00:38:59.9279081Z   File "test/run_test.py", line 924, in main
2022-06-21T00:38:59.9280833Z     raise RuntimeError(err_message)
2022-06-21T00:38:59.9281118Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T00:39:00.1894520Z 
2022-06-21T00:39:00.1894832Z real	42m53.571s
2022-06-21T00:39:00.1895091Z user	73m14.265s
2022-06-21T00:39:00.1895270Z sys	1m46.341s
2022-06-21T00:39:00.1895426Z + cleanup
2022-06-21T00:39:00.1895585Z + retcode=1
2022-06-21T00:39:00.1895949Z + set +x
2022-06-21T00:39:00.1934358Z ##[error]Process completed with exit code 1.
2022-06-21T00:39:00.1965673Z Prepare all required actions
2022-06-21T00:39:00.1966015Z Getting action download info

pull / linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge) (5/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-21T00:27:26.3797995Z RuntimeError: test_binary_ufuncs failed!

2022-06-21T00:27:25.2547532Z 
2022-06-21T00:27:25.2547648Z FAILED (errors=2, skipped=898, expected failures=20)
2022-06-21T00:27:25.2547792Z 
2022-06-21T00:27:25.2547882Z Generating XML reports...
2022-06-21T00:27:25.9999577Z Generated XML report: test-reports/python-unittest/test_binary_ufuncs/TEST-TestBinaryUfuncsCPU-20220621002621.xml
2022-06-21T00:27:26.3775760Z Traceback (most recent call last):
2022-06-21T00:27:26.3776165Z   File "test/run_test.py", line 946, in <module>
2022-06-21T00:27:26.3795719Z     main()
2022-06-21T00:27:26.3796111Z   File "test/run_test.py", line 924, in main
2022-06-21T00:27:26.3797562Z     raise RuntimeError(err_message)
2022-06-21T00:27:26.3797995Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T00:27:26.6146893Z 
2022-06-21T00:27:26.6147139Z real	34m31.004s
2022-06-21T00:27:26.6147441Z user	80m2.522s
2022-06-21T00:27:26.6147690Z sys	4m52.167s
2022-06-21T00:27:26.6147861Z + cleanup
2022-06-21T00:27:26.6148025Z + retcode=1
2022-06-21T00:27:26.6148349Z + set +x
2022-06-21T00:27:26.6188435Z ##[error]Process completed with exit code 1.
2022-06-21T00:27:26.6243226Z Prepare all required actions
2022-06-21T00:27:26.6243642Z Getting action download info

pull / linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge) (6/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-21T00:33:54.6543028Z RuntimeError: test_binary_ufuncs failed!

2022-06-21T00:33:53.4509666Z 
2022-06-21T00:33:53.4509781Z FAILED (errors=2, skipped=898, expected failures=20)
2022-06-21T00:33:53.4509928Z 
2022-06-21T00:33:53.4510011Z Generating XML reports...
2022-06-21T00:33:54.2199225Z Generated XML report: test-reports/python-unittest/test_binary_ufuncs/TEST-TestBinaryUfuncsCPU-20220621003206.xml
2022-06-21T00:33:54.6527169Z Traceback (most recent call last):
2022-06-21T00:33:54.6527464Z   File "test/run_test.py", line 946, in <module>
2022-06-21T00:33:54.6540225Z     main()
2022-06-21T00:33:54.6540527Z   File "test/run_test.py", line 924, in main
2022-06-21T00:33:54.6542727Z     raise RuntimeError(err_message)
2022-06-21T00:33:54.6543028Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T00:33:54.9271752Z 
2022-06-21T00:33:54.9272145Z real	40m53.699s
2022-06-21T00:33:54.9272488Z user	99m4.452s
2022-06-21T00:33:54.9282463Z sys	4m41.819s
2022-06-21T00:33:54.9282849Z + cleanup
2022-06-21T00:33:54.9283006Z + retcode=1
2022-06-21T00:33:54.9283174Z + set +x
2022-06-21T00:33:54.9312811Z ##[error]Process completed with exit code 1.
2022-06-21T00:33:54.9346664Z Prepare all required actions
2022-06-21T00:33:54.9346955Z Getting action download info

pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu) (7/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-21T01:13:39.6576831Z RuntimeError: test_binary_ufuncs failed!

2022-06-21T01:13:38.1251628Z 
2022-06-21T01:13:38.1251800Z FAILED (errors=2, skipped=647, expected failures=27)
2022-06-21T01:13:38.1252015Z 
2022-06-21T01:13:38.1252122Z Generating XML reports...
2022-06-21T01:13:39.1116069Z Generated XML report: test-reports/python-unittest/test_binary_ufuncs/TEST-TestBinaryUfuncsCUDA-20220621002415.xml
2022-06-21T01:13:39.6571309Z Traceback (most recent call last):
2022-06-21T01:13:39.6571750Z   File "test/run_test.py", line 946, in <module>
2022-06-21T01:13:39.6573906Z     main()
2022-06-21T01:13:39.6574493Z   File "test/run_test.py", line 924, in main
2022-06-21T01:13:39.6576475Z     raise RuntimeError(err_message)
2022-06-21T01:13:39.6576831Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T01:13:40.2548002Z + cleanup
2022-06-21T01:13:40.2548273Z + retcode=1
2022-06-21T01:13:40.2548799Z + set +x
2022-06-21T01:13:40.2598057Z ##[error]Process completed with exit code 1.
2022-06-21T01:13:40.2638072Z Prepare all required actions
2022-06-21T01:13:40.2638514Z Getting action download info
2022-06-21T01:13:40.4621307Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-06-21T01:13:40.4621635Z with:
2022-06-21T01:13:40.4622123Z   github-token: ***
2022-06-21T01:13:40.4622350Z env:

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

kurtamohler · 2021-12-09T19:46:46Z

Hey @mruberry, I've made progress so far getting a lot of the tests to work with float32 being the default dtype. However, there are some tests defined in torch/testing/_internal/common_nn.py where the reference functions have too much floating point error when they use float32, making them fail. In these cases, I think there are two possible solutions.

One option is to try to modify the reference functions to decrease the error, but that probably isn't always guaranteed to be possible for every case and it would require a good understanding of each function.

The other option is to just go back to using float64 for the tests that fail with due to precision. This is a much easier option. Would that be alright with you?

EDIT: I suppose a third option is to increase the error tolerance of some of these. But in cases where the error is large, this is probably not a good idea

Another detail is that test_nn.py tests use gradcheck, which tends to be unstable for float32. So we might just have to keep using float64 for many of the tests in torch/testing/_internal/common_nn.py because of this.

mruberry · 2021-12-10T16:31:49Z

Hey @mruberry, I've made progress so far getting a lot of the tests to work with float32 being the default dtype. However, there are some tests defined in torch/testing/_internal/common_nn.py where the reference functions have too much floating point error when they use float32, making them fail. In these cases, I think there are two possible solutions.

One option is to try to modify the reference functions to decrease the error, but that probably isn't always guaranteed to be possible for every case and it would require a good understanding of each function.

The other option is to just go back to using float64 for the tests that fail with due to precision. This is a much easier option. Would that be alright with you?

Yes! We should just add a comment when we do it.

EDIT: I suppose a third option is to increase the error tolerance of some of these. But in cases where the error is large, this is probably not a good idea

Another detail is that test_nn.py tests use gradcheck, which tends to be unstable for float32. So we might just have to keep using float64 for many of the tests in torch/testing/_internal/common_nn.py because of this.

Yes for gradcheck we should always be in double, you're right.

cc @jbschlosser who's updating nn tests. Keeping them working is the priority as long as we call out where they're using double more explicitly.

jbschlosser · 2021-12-10T16:52:23Z

Hey @kurtamohler, I'm in the process of migrating the old torch/testing/_internal/common_nn.py module tests to more understandable versions in test/test_modules.py. Out of curiosity, have you seen any problems with the tests defined there?

kurtamohler · 2021-12-10T19:06:21Z

Hey @kurtamohler, I'm in the process of migrating the old torch/testing/_internal/common_nn.py module tests to more understandable versions in test/test_modules.py. Out of curiosity, have you seen any problems with the tests defined there?

test/test_modules.py isn't giving me any problems. The only remaining problem I know about is that some tests in torch/testing/_internal/common_nn.py need to explicitly initialize with doubles to fix gradcheck errors.

kurtamohler · 2021-12-14T03:51:53Z

So, it turns out that if we remove the set_default_dtype call from test_nn.py, a couple hundred tests in torch/testing/_internal/common_nn.py will need to be changed to explicitly be torch.double. Is it worth the effort for test_nn.py?

Just to be clear, test_ops.py and test_jit.py require much fewer edits, so it's definitely a good idea to keep going with those test suites.

mruberry · 2021-12-14T04:47:12Z

So, it turns out that if we remove the set_default_dtype call from test_nn.py, a couple hundred tests in torch/testing/_internal/common_nn.py will need to be changed to explicitly be torch.double. Is it worth the effort for test_nn.py?

Just to be clear, test_ops.py and test_jit.py require much fewer edits, so it's definitely a good idea to keep going with those test suites.

This PR is already great -- what if we just explicitly set the default dtype to double in test_nn.py and file a follow-up issue (with a comment in the code linking the issue) for a "better engineering" TODO. That way we can get this PR merged, and we'll still have an issue to track further test_nn.py changes (which @jbschlosser) is already working on.

How does that sound @kurtamohler? At least that way test_ops.py and test_jit.py won't have the default double dtype, and with asserts to protect them they won't regress again. Plus test_nn.py will have a clear TODO.

jbschlosser · 2021-12-15T22:34:56Z

I'm good with that plan as well. My hope is that eventually the old stuff will just be destroyed and we won't have to make sweeping changes to it in test_nn.py.

kurtamohler · 2022-01-03T19:23:28Z

@mruberry, this is ready for another review

eellison

Wow, nice!! LGTM

mruberry

Thanks for the update, @kurtamohler! Sorry this slipped through the crackers in early January!

facebook-github-bot · 2022-10-04T01:10:45Z

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

linux-foundation-easycla · 2022-10-04T01:10:50Z

The committers listed above are authorized under a signed CLA.

✅ login: kurtamohler (5ffb8d3b0ee4791491a87b43ccc1ede35a6cf98e)

kit1980 · 2022-11-22T22:24:02Z

/easycla

kit1980 · 2022-11-22T22:26:16Z

@kurtamohler Do you still want to merge this? If yes, please resolve merge conflicts and I can take over Meta-internal verification.

628C

kurtamohler · 2022-12-06T18:34:22Z

@kit1980, thanks for the ping. Yes I would like to get this merged. I will resolve the conflicts

pytorch-bot · 2022-12-06T20:55:54Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/69624

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 Failures

As of commit 29eb0ef:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2022-12-07T21:13:41Z

@kit1980 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

kurtamohler · 2022-12-15T19:12:01Z

@kit1980 , I think this is ready for Meta-internal verification. The remaining CI failure is upstream

facebook-github-bot · 2022-12-15T19:17:47Z

@kit1980 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2023-01-14T00:38:30Z

@kit1980 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

pytorchmergebot · 2023-01-14T00:39:12Z

@pytorchbot successfully started a rebase job. Check the current status here

pytorchmergebot · 2023-01-14T00:39:16Z

Tried to rebase and push PR #69624, but it was already up to date

kit1980 · 2023-01-19T00:49:34Z

@kurtamohler I see one relevant Meta-internal test failure:

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "<string>", line 3, in the_method

def the_method(i0, i1, i2):
    return torch.nn.functional.bilinear(i0, i1, i2)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: expected scalar type Double but found Float

This is a generated test. Not sure why we don't have similar on GitHub or if it's even a good test.
Is this something you can address on your end or I can dig deeper?

kurtamohler · 2023-01-23T20:11:27Z

Thanks for the update @kit1980. I'm guessing that's something that will have to be fixed internally. It seems like the arguments to bilinear are not all the same type. For instance, I get the same error with this:

>>> torch.nn.functional.bilinear(torch.randn(3).double(), torch.randn(5), torch.randn(1, 3, 5))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: expected scalar type Double but found Float

kit1980 · 2023-01-23T20:52:27Z

@kurtamohler Actually lots of tests failing on GitHub with what looks like the same cause:
https://github.com/pytorch/pytorch/actions/runs/3914201447/jobs/6809694610

RuntimeError: mixed dtype (CPU): expect parameter to have scalar type of Float

RuntimeError: expected scalar type Float but found Double

kurtamohler · 2023-01-23T22:29:32Z

True, it's possible that fixing those errors will fix the internal ones. I've figured out how to reproduce some of the errors in my environment, now I'll work on fixing them

github-actions · 2023-03-24T22:33:50Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

pytorch-probot bot added the ciflow/default label Dec 8, 2021

facebook-github-bot added the cla signed label Dec 8, 2021

facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Dec 8, 2021

kurtamohler mentioned this pull request Dec 8, 2021

Remove set_default_dtype call in jit_metaprogramming_utils.py (old) #69171

Closed

pytorchbot added the open source label Dec 8, 2021

kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch 2 times, most recently from 80ec3be to 3c40e37 Compare December 9, 2021 00:11

kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch from 3c40e37 to 117e312 Compare December 9, 2021 23:46

kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch 3 times, most recently from 72517cc to aba1eef Compare December 15, 2021 18:06

kurtamohler marked this pull request as ready for review December 15, 2021 18:06

kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch 2 times, most recently from e046bed to 9b8ca2d Compare January 1, 2022 01:20

anjali411 requested a review from eellison January 25, 2022 20:44

anjali411 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jan 25, 2022

kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch from 9b8ca2d to ce5d813 Compare January 25, 2022 22:33

eellison approved these changes Jan 25, 2022

View reviewed changes

mruberry self-requested a review January 26, 2022 05:20

mruberry approved these changes Jan 26, 2022

View reviewed changes

kurtamohler removed the Stale label Aug 25, 2022

kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch from 5ffb8d3 to 839021c Compare December 6, 2022 20:55

kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch from 839021c to 678fbc5 Compare December 6, 2022 22:28

kit1980 self-assigned this Dec 8, 2022

kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch 2 times, most recently from b10ff10 to 5f982ef Compare December 15, 2022 00:45

Remove set_default_dtype call in jit_metaprogramming_utils.py

29eb0ef

kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch from 5f982ef to 29eb0ef Compare January 13, 2023 19:10

F438

kurtamohler added the with-ssh label Jan 13, 2023

kit1980 added ciflow/trunk Trigger trunk jobs on your pull request ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR labels Jan 19, 2023

github-actions bot added the Stale label Mar 24, 2023

github-actions bot closed this May 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove set_default_dtype call in jit_metaprogramming_utils.py #69624

Remove set_default_dtype call in jit_metaprogramming_utils.py #69624

⚛️ CI Flow

🕵️ 7 new failures recognized by patterns

pull / linux-focal-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge) (1/7)

pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu) (2/7)

pull / win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (3/7)

pull / linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge) (4/7)

pull / linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge) (5/7)

pull / linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge) (6/7)

pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu) (7/7)

Remove set_default_dtype call in jit_metaprogramming_utils.py #69624

Remove set_default_dtype call in jit_metaprogramming_utils.py #69624

Conversation

⚛️ CI Flow

🔗 Helpful links

❌ 7 New Failures

🕵️ 7 new failures recognized by patterns

pull / linux-focal-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge) (1/7)

pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu) (2/7)

pull / win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (3/7)

pull / linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge) (4/7)

pull / linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge) (5/7)

pull / linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge) (6/7)

pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu) (7/7)

Choose a reason for hiding this comment

Choose a reason for hiding this comment

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/69624

❌ 5 Failures