Improved speed of Frobenius norm for non-complex dtype by dylanbespalko · Pull Request #30871 · pytorch/pytorch · GitHub

Improved speed of Frobenius norm for non-complex dtype #30871


Closed
dylanbespalko wants to merge 6 commits

Conversation

@dylanbespalko (Contributor) commented Dec 6, 2019

In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for CUDA complex numbers is here: [pytorch-cuda-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cuda-strided-complex)

Changes:
[x] Fixed the performance issue raised in #30704 so that non-complex tensors do not call conj() and real() (see the sketches after this list).
[x] Fixed tensor_to_numpy() conversion likely broken by a checkBackend() in #27064.
[x] Fixed some ReduceOps and TensorCompare Ops that recently added a checkBackend().
- checkBackend() is replaced with a device type check and a layout check.
- This ensures the ComplexCPU Type ID is supported.
[x] Added AVX support for complex exp(), as requested in #755
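
A minimal sketch of the first item's guard, using a hypothetical helper name (this is not the PR's actual diff): only complex inputs take the conj()/real() path, so real dtypes avoid those calls entirely.

```cpp
// Hedged sketch: gate conj()/real() on is_complex() so real dtypes
// take the plain squared-sum path.
#include <ATen/ATen.h>

at::Tensor frobenius_norm_sketch(const at::Tensor& self) {
  if (self.is_complex()) {
    // |z|^2 = z * conj(z); the product's imaginary part is zero,
    // so real() just drops it before the reduction.
    return at::sqrt(at::sum(at::real(self * self.conj())));
  }
  // Real dtypes: elementwise square, no conj()/real() round trip.
  return at::sqrt(at::sum(self * self));
}
```

For the AVX complex exp() item, a vectorized kernel evaluates the standard identity exp(a + ib) = e^a(cos b + i sin b) lane by lane; a scalar reference of that identity, for orientation only:

```cpp
// Scalar reference for the identity an AVX complex exp() kernel
// evaluates per lane: exp(a + ib) = exp(a) * (cos(b) + i*sin(b)).
#include <cmath>
#include <complex>

std::complex<float> complex_exp_ref(std::complex<float> z) {
  const float ea = std::exp(z.real());
  return {ea * std::cos(z.imag()), ea * std::sin(z.imag())};
}
```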

@dylanbespalko mentioned this pull request Dec 7, 2019
@rgommers (Collaborator) commented Dec 7, 2019

> Fixed tensor_to_numpy() conversion likely broken by a checkBackend() in #27064.

I think you meant gh-30281 here? Cc @ngoldbaum for this change

@ngoldbaum (Contributor) commented:

FWIW, here's the deep link to the relevant change from #30281: https://github.com/pytorch/pytorch/pull/30281/files#diff-d9318428a98d4f4d86210a8f4a736ebd.

Is the output of tensor.type().backend() not always the same as tensor.options().backend()? That would indicate deeper issues with respect to the changes I made in #30281.

@ezyang (Contributor) commented Dec 8, 2019

@dylanbespalko Heads up that I will be out of the country for three weeks, so you'll need to find someone else to merge this PR.

if (tensor.device().type() != DeviceType::CPU) {
  throw TypeError(
      "can't convert %s device type tensor to numpy. Use Tensor.cpu() to "
      "copy the tensor to host memory first.", tensor.device().type());
}
Wow, does this really do printf-style format strings? News to me.
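
It does: the call above passes a printf-style format string plus arguments to TypeError. As an illustration of the mechanism only (a hypothetical class, not torch's actual implementation), such an exception can be built like this:

```cpp
#include <cstdarg>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical illustration, NOT torch's TypeError: an exception that
// formats a printf-style message at construction time.
class FormattedError : public std::runtime_error {
 public:
  explicit FormattedError(const char* fmt, ...)
      : std::runtime_error(std::string()) {
    va_list args;
    va_start(args, fmt);
    char buf[1024];
    vsnprintf(buf, sizeof(buf), fmt, args);  // expand %s, %d, ... into buf
    va_end(args);
    msg_ = buf;
  }
  const char* what() const noexcept override { return msg_.c_str(); }

 private:
  std::string msg_;
};

// Usage, mirroring the call in the snippet above:
//   throw FormattedError(
//       "can't convert %s device type tensor to numpy.", "cuda");
```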

@dylanbespalko (Contributor, Author) commented:

> Is the output of tensor.type().backend() not always the same as tensor.options().backend()? That would indicate deeper issues with respect to the changes I made in #30281.

My limited understanding is that Backend dispatch was replaced with a multi-layered kernel dispatch over DeviceType × Layout × Dtype. I think checkBackend(Backend::CPU) corresponds to DeviceType::CPU, Layout::Strided, and any internally supported dtype, but I could be wrong.

Anyway, the backend checks are supposed to migrate into DeviceType, Layout, and Dtype checks, as described in PyTorch Internals; a sketch of that pattern follows.
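
A minimal sketch of that migration, under the assumptions above (hypothetical helper name, not the PR's actual diff): checking device type and layout separately lets ComplexCPU tensors, which differ only in dtype, pass where checkBackend(Backend::CPU) rejected them.

```cpp
// Hedged sketch of replacing checkBackend(Backend::CPU) with separate
// device-type and layout checks, so ComplexCPU tensors (same device,
// same layout, new dtype) are accepted.
#include <ATen/ATen.h>
#include <c10/util/Exception.h>

static void check_cpu_strided(const at::Tensor& t, const char* op_name) {
  TORCH_CHECK(t.device().type() == at::DeviceType::CPU,
              op_name, ": expected a CPU tensor, got device ", t.device());
  TORCH_CHECK(t.layout() == at::Layout::Strided,
              op_name, ": expected a strided (dense) tensor, got layout ",
              t.layout());
}
```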

@dylanbespalko (Contributor, Author) commented:

@ezyang,

Hopefully you are going on a much needed vacation. Happy holidays.

@dylanbespalko (Contributor, Author) commented:

@VitalyFedyunin, @cpuhrsch, @gchanan,

Hey guys, @ezyang is away on vacation. Can you forward me to someone who can merge this PR?

Pulling changes from master and re-running CI.

cc @MuawizChaudhary

@kostmo (Member) commented Dec 20, 2019

💊 CircleCI build failures summary and remediations

As of commit 221bc7d:

  • 1/1 failures introduced in this PR

Detailed failure analysis

One may explore the probable reasons each build failed interactively on the Dr. CI website.

🕵️ 1 new failure recognized by patterns

The following build failures do not appear to be due to upstream breakage:

See CircleCI build pytorch_linux_xenial_cuda10_1_cudnn7_py3_nogpu_test (1/1)

Step: "Test" (full log | pattern match details)

Jan 27 17:32:20 RuntimeError: test_quantization failed!
Jan 27 17:32:20 Ran 36 tests in 56.980s 
Jan 27 17:32:20  
Jan 27 17:32:20 FAILED (errors=1, skipped=1) 
Jan 27 17:32:20  
Jan 27 17:32:20 Generating XML reports... 
Jan 27 17:32:20 Traceback (most recent call last): 
Jan 27 17:32:20   File "test/run_test.py", line 456, in <module> 
Jan 27 17:32:20     main() 
Jan 27 17:32:20   File "test/run_test.py", line 449, in main 
Jan 27 17:32:20     raise RuntimeError(message) 
Jan 27 17:32:20 RuntimeError: test_quantization failed! 
Jan 27 17:32:20 + cleanup 
Jan 27 17:32:20 + retcode=1 
Jan 27 17:32:20 + set +x 
Jan 27 17:32:20 =================== sccache compilation log =================== 
Jan 27 17:32:20 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Jan 27 17:32:20 Compile requests                  7 
Jan 27 17:32:20 Compile requests executed         6 
Jan 27 17:32:20 Cache hits                        0 
Jan 27 17:32:20 Cache misses                      6 
Jan 27 17:32:20 Cache timeouts                    0 

This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

This comment has been revised 2 times.

@facebook-github-bot (Contributor) left a comment:

@gchanan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) left a comment:

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@dylanbespalko (Contributor, Author) commented:

@ezyang,

I'm back from my FPGA adventure. I am looking to split my time between CPU and FPGA work for a while, and I plan to improve my CUDA coverage when I get a new PC.

Let me know if this PR is too old.

@ezyang (Contributor) commented Jan 28, 2020

Nope, this PR is good, just gotta land it. Are you gonna blog about your FPGA adventures? :)

@dylanbespalko (Contributor, Author) commented:

@ezyang,

Yes, I think I can write a good blog. I'll let you know when I post it.

wuhuikx pushed a commit to wuhuikx/pytorch that referenced this pull request Jan 30, 2020
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for CUDA complex numbers is here: [pytorch-cuda-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cuda-strided-complex)

Changes:
[x] Fixed performance issue raised in pytorch#30704 so that non-complex numbers do not call `conj()` and `real()`.
[x] Fixed tensor_to_numpy() conversion likely broken by a `checkBackend()` in pytorch#27064.
[x] Fixed some ReduceOps and TensorCompare Ops that recently added a `checkBackend()`.
    - `checkBackend()` is replaced with a device type check and a layout check.
    - This ensures the ComplexCPU Type ID is supported.
[x] Added AVX support for complex `exp()`, as requested in pytorch#755
Pull Request resolved: pytorch#30871

Differential Revision: D19200726

Pulled By: ezyang

fbshipit-source-id: d7e1be0b0a89c5d6e5f4a68ce5fcd2adc5b88277
@facebook-github-bot (Contributor) commented:

@ezyang merged this pull request in 2471ddc.

ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020