8000 Remove set_default_dtype call in jit_metaprogramming_utils.py by kurtamohler · Pull Request #69624 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

Remove set_default_dtype call in jit_metaprogramming_utils.py #69624

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

kurtamohler
Copy link
Collaborator
@kurtamohler kurtamohler commented Dec 8, 2021

@pytorch-probot
Copy link
pytorch-probot bot commented Dec 8, 2021
CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/kurtam 8000 ohler/pytorch/blob/9b8ca2dce1205a8056960dc43cf7e19aed1f8a08/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot
Copy link
Contributor
facebook-github-bot commented Dec 8, 2021

🔗 Helpful links

❌ 7 New Failures

As of commit 5ffb8d3b0e (more details on the Dr. CI page):

Expand to see more
  • 7/7 failures introduced in this PR

🕵️ 7 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-focal-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge) (1/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-21T00:21:45.8947289Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T00:21:44.3149795Z 
2022-06-21T00:21:44.3149982Z FAILED (errors=2, skipped=1421, expected failures=12)
2022-06-21T00:21:44.3150213Z 
2022-06-21T00:21:44.3150347Z Generating XML reports...
2022-06-21T00:21:45.1951681Z Generated XML report: test-reports/python-unittest/test_binary_ufuncs/TEST-TestBinaryUfuncsCPU-20220621001817.xml
2022-06-21T00:21:45.8939585Z Traceback (most recent call last):
2022-06-21T00:21:45.8939874Z   File "test/run_test.py", line 946, in <module>
2022-06-21T00:21:45.8943333Z     main()
2022-06-21T00:21:45.8943528Z   File "test/run_test.py", line 924, in main
2022-06-21T00:21:45.8947034Z     raise RuntimeError(err_message)
2022-06-21T00:21:45.8947289Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T00:21:46.4380040Z 
2022-06-21T00:21:46.4380590Z real	31m38.379s
2022-06-21T00:21:46.4380821Z user	32m53.566s
2022-06-21T00:21:46.4381000Z sys	0m51.911s
2022-06-21T00:21:46.4381235Z + cleanup
2022-06-21T00:21:46.4381385Z + retcode=1
2022-06-21T00:21:46.4381547Z + set +x
2022-06-21T00:21:46.4418619Z ##[error]Process completed with exit code 1.
2022-06-21T00:21:46.4453789Z Prepare all required actions
2022-06-21T00:21:46.4454091Z Getting action download info

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu) (2/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-21T01:31:49.5587450Z RuntimeError: test_meta failed!
2022-06-21T01:31:47.0449297Z FAILED (errors=2, skipped=140, expected failures=6)
2022-06-21T01:31:47.0449514Z 
2022-06-21T01:31:47.0449644Z Generating XML reports...
2022-06-21T01:31:48.4818536Z Generated XML report: test-reports/python-unittest/test_meta/TEST-TestMetaCUDA-20220621002132.xml
2022-06-21T01:31:48.4830355Z Generated XML report: test-reports/python-unittest/test_meta/TEST-TestMetaConverter-20220621002132.xml
2022-06-21T01:31:49.5581725Z Traceback (most recent call last):
2022-06-21T01:31:49.5582228Z   File "test/run_test.py", line 946, in <module>
2022-06-21T01:31:49.5584539Z     main()
2022-06-21T01:31:49.5584838Z   File "test/run_test.py", line 924, in main
2022-06-21T01:31:49.5587125Z     raise RuntimeError(err_message)
2022-06-21T01:31:49.5587450Z RuntimeError: test_meta failed!
2022-06-21T01:31:50.2065601Z + cleanup
2022-06-21T01:31:50.2065943Z + retcode=1
2022-06-21T01:31:50.2066157Z + set +x
2022-06-21T01:31:50.2127735Z ##[error]Process completed with exit code 1.
2022-06-21T01:31:50.2171924Z Prepare all required actions
2022-06-21T01:31:50.2172394Z Getting action download info
2022-06-21T01:31:50.4021870Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-06-21T01:31:50.4022192Z with:
2022-06-21T01:31:50.4022669Z   github-token: ***
2022-06-21T01:31:50.4023339Z env:

See GitHub Actions build pull / win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (3/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-21T01:02:04.3490020Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T01:02:03.9287640Z 
2022-06-21T01:02:03.9287773Z FAILED (errors=2, skipped=898, expected failures=20)
2022-06-21T01:02:03.9287921Z 
2022-06-21T01:02:03.9288006Z Generating XML reports...
2022-06-21T01:02:03.9288371Z Generated XML report: test-reports\python-unittest\test_binary_ufuncs\TEST-TestBinaryUfuncsCPU-20220621010051.xml
2022-06-21T01:02:04.3488773Z Traceback (most recent call last):
2022-06-21T01:02:04.3489129Z   File "run_test.py", line 946, in <module>
2022-06-21T01:02:04.3489332Z     main()
2022-06-21T01:02:04.3489561Z   File "run_test.py", line 924, in main
2022-06-21T01:02:04.3489796Z     raise RuntimeError(err_message)
2022-06-21T01:02:04.3490020Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T01:02:04.6064311Z 
2022-06-21T01:02:04.6064741Z (base) C:\actions-runner\_work\pytorch\pytorch\test>if ERRORLEVEL 1 goto fail 
2022-06-21T01:02:04.6066486Z 
2022-06-21T01:02:04.6066708Z (base) C:\actions-runner\_work\pytorch\pytorch\test>exit /b 1 
2022-06-21T01:02:04.6093607Z + cleanup
2022-06-21T01:02:04.6093807Z + retcode=1
2022-06-21T01:02:04.6093965Z + set +x
2022-06-21T01:02:04.6129903Z ##[error]Process completed with exit code 1.
2022-06-21T01:02:04.6264533Z Prepare all required actions
2022-06-21T01:02:04.6265048Z Getting action download info

See GitHub Actions build pull / linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge) (4/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-21T00:38:59.9281118Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T00:38:58.7641785Z 
2022-06-21T00:38:58.7641902Z FAILED (errors=2, skipped=898, expected failures=20)
2022-06-21T00:38:58.7642048Z 
2022-06-21T00:38:58.7642119Z Generating XML reports...
2022-06-21T00:38:59.5238465Z Generated XML report: test-reports/python-unittest/test_binary_ufuncs/TEST-TestBinaryUfuncsCPU-20220621003755.xml
2022-06-21T00:38:59.9276375Z Traceback (most recent call last):
2022-06-21T00:38:59.9276827Z   File "test/run_test.py", line 946, in <module>
2022-06-21T00:38:59.9278728Z     main()
2022-06-21T00:38:59.9279081Z   File "test/run_test.py", line 924, in main
2022-06-21T00:38:59.9280833Z     raise RuntimeError(err_message)
2022-06-21T00:38:59.9281118Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T00:39:00.1894520Z 
2022-06-21T00:39:00.1894832Z real	42m53.571s
2022-06-21T00:39:00.1895091Z user	73m14.265s
2022-06-21T00:39:00.1895270Z sys	1m46.341s
2022-06-21T00:39:00.1895426Z + cleanup
2022-06-21T00:39:00.1895585Z + retcode=1
2022-06-21T00:39:00.1895949Z + set +x
2022-06-21T00:39:00.1934358Z ##[error]Process completed with exit code 1.
2022-06-21T00:39:00.1965673Z Prepare all required actions
2022-06-21T00:39:00.1966015Z Getting action download info

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge) (5/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-21T00:27:26.3797995Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T00:27:25.2547532Z 
2022-06-21T00:27:25.2547648Z FAILED (errors=2, skipped=898, expected failures=20)
2022-06-21T00:27:25.2547792Z 
2022-06-21T00:27:25.2547882Z Generating XML reports...
2022-06-21T00:27:25.9999577Z Generated XML report: test-reports/python-unittest/test_binary_ufuncs/TEST-TestBinaryUfuncsCPU-20220621002621.xml
2022-06-21T00:27:26.3775760Z Traceback (most recent call last):
2022-06-21T00:27:26.3776165Z   File "test/run_test.py", line 946, in <module>
2022-06-21T00:27:26.3795719Z     main()
2022-06-21T00:27:26.3796111Z   File "test/run_test.py", line 924, in main
2022-06-21T00:27:26.3797562Z     raise RuntimeError(err_message)
2022-06-21T00:27:26.3797995Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T00:27:26.6146893Z 
2022-06-21T00:27:26.6147139Z real	34m31.004s
2022-06-21T00:27:26.6147441Z user	80m2.522s
2022-06-21T00:27:26.6147690Z sys	4m52.167s
2022-06-21T00:27:26.6147861Z + cleanup
2022-06-21T00:27:26.6148025Z + retcode=1
2022-06-21T00:27:26.6148349Z + set +x
2022-06-21T00:27:26.6188435Z ##[error]Process completed with exit code 1.
2022-06-21T00:27:26.6243226Z Prepare all required actions
2022-06-21T00:27:26.6243642Z Getting action download info

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge) (6/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-21T00:33:54.6543028Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T00:33:53.4509666Z 
2022-06-21T00:33:53.4509781Z FAILED (errors=2, skipped=898, expected failures=20)
2022-06-21T00:33:53.4509928Z 
2022-06-21T00:33:53.4510011Z Generating XML reports...
2022-06-21T00:33:54.2199225Z Generated XML report: test-reports/python-unittest/test_binary_ufuncs/TEST-TestBinaryUfuncsCPU-20220621003206.xml
2022-06-21T00:33:54.6527169Z Traceback (most recent call last):
2022-06-21T00:33:54.6527464Z   File "test/run_test.py", line 946, in <module>
2022-06-21T00:33:54.6540225Z     main()
2022-06-21T00:33:54.6540527Z   File "test/run_test.py", line 924, in main
2022-06-21T00:33:54.6542727Z     raise RuntimeError(err_message)
2022-06-21T00:33:54.6543028Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T00:33:54.9271752Z 
2022-06-21T00:33:54.9272145Z real	40m53.699s
2022-06-21T00:33:54.9272488Z user	99m4.452s
2022-06-21T00:33:54.9282463Z sys	4m41.819s
2022-06-21T00:33:54.9282849Z + cleanup
2022-06-21T00:33:54.9283006Z + retcode=1
2022-06-21T00:33:54.9283174Z + set +x
2022-06-21T00:33:54.9312811Z ##[error]Process completed with exit code 1.
2022-06-21T00:33:54.9346664Z Prepare all required actions
2022-06-21T00:33:54.9346955Z Getting action download info

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu) (7/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-21T01:13:39.6576831Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T01:13:38.1251628Z 
2022-06-21T01:13:38.1251800Z FAILED (errors=2, skipped=647, expected failures=27)
2022-06-21T01:13:38.1252015Z 
2022-06-21T01:13:38.1252122Z Generating XML reports...
2022-06-21T01:13:39.1116069Z Generated XML report: test-reports/python-unittest/test_binary_ufuncs/TEST-TestBinaryUfuncsCUDA-20220621002415.xml
2022-06-21T01:13:39.6571309Z Traceback (most recent call last):
2022-06-21T01:13:39.6571750Z   File "test/run_test.py", line 946, in <module>
2022-06-21T01:13:39.6573906Z     main()
2022-06-21T01:13:39.6574493Z   File "test/run_test.py", line 924, in main
2022-06-21T01:13:39.6576475Z     raise RuntimeError(err_message)
2022-06-21T01:13:39.6576831Z RuntimeError: test_binary_ufuncs failed!
2022-06-21T01:13:40.2548002Z + cleanup
2022-06-21T01:13:40.2548273Z + retcode=1
2022-06-21T01:13:40.2548799Z + set +x
2022-06-21T01:13:40.2598057Z ##[error]Process completed with exit code 1.
2022-06-21T01:13:40.2638072Z Prepare all required actions
2022-06-21T01:13:40.2638514Z Getting action download info
2022-06-21T01:13:40.4621307Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-06-21T01:13:40.4621635Z with:
2022-06-21T01:13:40.4622123Z   github-token: ***
2022-06-21T01:13:40.4622350Z env:

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Dec 8, 2021
@kurtamohler kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch 2 times, most recently from 80ec3be to 3c40e37 Compare December 9, 2021 00:11
@kurtamohler
Copy link
Collaborator Author
kurtamohler commented Dec 9, 2021

Hey @mruberry, I've made progress so far getting a lot of the tests to work with float32 being the default dtype. However, there are some tests defined in torch/testing/_internal/common_nn.py where the reference functions have too much floating point error when they use float32, making them fail. In these cases, I think there are two possible solutions.

One option is to try to modify the reference functions to decrease the error, but that probably isn't always guaranteed to be possible for every case and it would require a good understanding of each function.

The other option is to just go back to using float64 for the tests that fail with due to precision. This is a much easier option. Would that be alright with you?

EDIT: I suppose a third option is to increase the error tolerance of some of these. But in cases where the error is large, this is probably not a good idea

Another detail is that test_nn.py tests use gradcheck, which tends to be unstable for float32. So we might just have to keep using float64 for many of the tests in torch/testing/_internal/common_nn.py because of this.

@kurtamohler kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch from 3c40e37 to 117e312 Compare December 9, 2021 23:46
@mruberry
Copy link
Collaborator

Hey @mruberry, I've made progress so far getting a lot of the tests to work with float32 being the default dtype. However, there are some tests defined in torch/testing/_internal/common_nn.py where the reference functions have too much floating point error when they use float32, making them fail. In these cases, I think there are two possible solutions.

One option is to try to modify the reference functions to decrease the error, but that probably isn't always guaranteed to be possible for every case and it would require a good understanding of each function.

The other option is to just go back to using float64 for the tests that fail with due to precision. This is a much easier option. Would that be alright with you?

Yes! We should just add a comment when we do it.

EDIT: I suppose a third option is to increase the error tolerance of some of these. But in cases where the error is large, this is probably not a good idea

Another detail is that test_nn.py tests use gradcheck, which tends to be unstable for float32. So we might just have to keep using float64 for many of the tests in torch/testing/_internal/common_nn.py because of this.

Yes for gradcheck we should always be in double, you're right.

cc @jbschlosser who's updating nn tests. Keeping them working is the priority as long as we call out where they're using double more explicitly.

@jbschlosser
Copy link
Contributor

Hey @kurtamohler, I'm in the process of migrating the old torch/testing/_internal/common_nn.py module tests to more understandable versions in test/test_modules.py. Out of curiosity, have you seen any problems with the tests defined there?

@kurtamohler
Copy link
Collaborator Author

Hey @kurtamohler, I'm in the process of migrating the old torch/testing/_internal/common_nn.py module tests to more understandable versions in test/test_modules.py. Out of curiosity, have you seen any problems with the tests defined there?

test/test_modules.py isn't giving me any problems. The only remaining problem I know about is that some tests in torch/testing/_internal/common_nn.py need to explicitly initialize with doubles to fix gradcheck errors.

@kurtamohler
Copy link
Collaborator Author

So, it turns out that if we remove the set_default_dtype call from test_nn.py, a couple hundred tests in torch/testing/_internal/common_nn.py will need to be changed to explicitly be torch.double. Is it worth the effort for test_nn.py?

Just to be clear, test_ops.py and test_jit.py require much fewer edits, so it's definitely a good idea to keep going with those test suites.

@mruberry
Copy link
Collaborator

So, it turns out that if we remove the set_default_dtype call from test_nn.py, a couple hundred tests in torch/testing/_internal/common_nn.py will need to be changed to explicitly be torch.double. Is it worth the effort for test_nn.py?

Just to be clear, test_ops.py and test_jit.py require much fewer edits, so it's definitely a good idea to keep going with those test suites.

This PR is already great -- what if we just explicitly set the default dtype to double in test_nn.py and file a follow-up issue (with a comment in the code linking the issue) for a "better engineering" TODO. That way we can get this PR merged, and we'll still have an issue to track further test_nn.py changes (which @jbschlosser) is already working on.

How does that sound @kurtamohler? At least that way test_ops.py and test_jit.py won't have the default double dtype, and with asserts to protect them they won't regress again. Plus test_nn.py will have a clear TODO.

@kurtamohler kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch 3 times, most recently from 72517cc to aba1eef Compare December 15, 2021 18:06
@kurtamohler kurtamohler marked this pull request as ready for review December 15, 2021 18:06
@jbschlosser
Copy link
Contributor

I'm good with that plan as well. My hope is that eventually the old stuff will just be destroyed and we won't have to make sweeping changes to it in test_nn.py.

@kurtamohler kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch 2 times, most recently from e046bed to 9b8ca2d Compare January 1, 2022 01:20
@kurtamohler
Copy link
Collaborator Author

@mruberry, this is ready for another review

@anjali411 anjali411 requested a review from eellison January 25, 2022 20:44
@anjali411 anjali411 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jan 25, 2022
@kurtamohler kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch from 9b8ca2d to ce5d813 Compare January 25, 2022 22:33
Copy link
Contributor
@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, nice!! LGTM

@mruberry mruberry self-requested a review January 26, 2022 05:20
Copy link
Collaborator
@mruberry mruberry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, @kurtamohler! Sorry this slipped through the crackers in early January!

@kurtamohler kurtamohler removed the Stale label Aug 25, 2022
@facebook-github-bot
Copy link
Contributor

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

@linux-foundation-easycla
Copy link
linux-foundation-easycla bot commented Oct 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: kurtamohler (5ffb8d3b0ee4791491a87b43ccc1ede35a6cf98e)

@kit1980
Copy link
Contributor
kit1980 commented Nov 22, 2022

/easycla

@kit1980
Copy link
Contributor
kit1980 commented Nov 22, 2022

@kurtamohler Do you still want to merge this? If yes, please resolve merge conflicts and I can take over Meta-internal verification.

628C

@kurtamohler
Copy link
Collaborator Author

@kit1980, thanks for the ping. Yes I would like to get this merged. I will resolve the conflicts

@kurtamohler kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch from 5ffb8d3 to 839021c Compare December 6, 2022 20:55
@pytorch-bot
Copy link
pytorch-bot bot commented Dec 6, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/69624

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 Failures

As of commit 29eb0ef:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@kurtamohler kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch from 839021c to 678fbc5 Compare December 6, 2022 22:28
@facebook-github-bot
Copy link
Contributor

@kit1980 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kit1980 kit1980 self-assigned this Dec 8, 2022
@kurtamohler kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch 2 times, most recently from b10ff10 to 5f982ef Compare December 15, 2022 00:45
@kurtamohler
Copy link
Collaborator Author

@kit1980 , I think this is ready for Meta-internal verification. The remaining CI failure is upstream

@facebook-github-bot
Copy link
Contributor

@kit1980 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kurtamohler kurtamohler force-pushed the jit-utils-dtype-fix-alt-68972 branch from 5f982ef to 29eb0ef Compare January 13, 2023 19:10
F438
@facebook-github-bot
Copy link
Contributor

@kit1980 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Tried to rebase and push PR #69624, but it was already up to date

@kit1980
Copy link
Contributor
kit1980 commented Jan 19, 2023

@kurtamohler I see one relevant Meta-internal test failure:

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "<string>", line 3, in the_method

def the_method(i0, i1, i2):
    return torch.nn.functional.bilinear(i0, i1, i2)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: expected scalar type Double but found Float

This is a generated test. Not sure why we don't have similar on GitHub or if it's even a good test.
Is this something you can address on your end or I can dig deeper?

@kit1980 kit1980 added ciflow/trunk Trigger trunk jobs on your pull request ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR labels Jan 19, 2023
@kurtamohler
Copy link
Collaborator Author

Thanks for the update @kit1980. I'm guessing that's something that will have to be fixed internally. It seems like the arguments to bilinear are not all the same type. For instance, I get the same error with this:

>>> torch.nn.functional.bilinear(torch.randn(3).double(), torch.randn(5), torch.randn(1, 3, 5))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: expected scalar type Double but found Float

@kit1980
Copy link
Contributor
kit1980 commented Jan 23, 2023

@kurtamohler Actually lots of tests failing on GitHub with what looks like the same cause:
https://github.com/pytorch/pytorch/actions/runs/3914201447/jobs/6809694610

RuntimeError: mixed dtype (CPU): expect parameter to have scalar type of Float

RuntimeError: expected scalar type Float but found Double

@kurtamohler
Copy link
Collaborator Author

True, it's possible that fixing those errors will fix the internal ones. I've figured out how to reproduce some of the errors in my environment, now I'll work on fixing them

@github-actions
Copy link
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Mar 24, 2023
@github-actions github-actions bot closed this May 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/trunk Trigger trunk jobs on your pull request cla signed oncall: jit Add this issue/PR to JIT oncall triage queue open source Stale triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module with-ssh
Projects
None yet
Development

Successfully merging this pull request may close these issues.

default dtype is incorrectly set to torch.double when running test_ops.py
10 participants
0