Improved speed of Frobenius norm for non-complex dtype by dylanbespalko · Pull Request #30871 · pytorch/pytorch · GitHub

Improved speed of Frobenius norm for non-complex dtype #30871


Closed
dylanbespalko wants to merge 6 commits

Conversation

@dylanbespalko (Contributor) commented Dec 6, 2019

In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for CUDA complex numbers is here: [pytorch-cuda-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cuda-strided-complex)

Changes:
[x] Fixed the performance issue raised in #30704 so that non-complex tensors do not call conj() and real() (see the sketches after this list).
[x] Fixed tensor_to_numpy() conversion likely broken by a checkBackend() in #27064.
[x] Fixed some ReduceOps and TensorCompare Ops that recently added a checkBackend().
- checkBackend() is replaced with a device type check and a layout check.
- This ensures the ComplexCPU Type ID is supported.
[x] Added AVX support for complex exp(), as requested in #755
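
A minimal sketch of the first item's guard, using a hypothetical helper name (this is not the PR's actual diff): only complex inputs take the conj()/real() path, so real dtypes avoid those calls entirely.

```cpp
// Hedged sketch: gate conj()/real() on is_complex() so real dtypes
// take the plain squared-sum path.
#include <ATen/ATen.h>

at::Tensor frobenius_norm_sketch(const at::Tensor& self) {
  if (self.is_complex()) {
    // |z|^2 = z * conj(z); the product's imaginary part is zero,
    // so real() just drops it before the reduction.
    return at::sqrt(at::sum(at::real(self * self.conj())));
  }
  // Real dtypes: elementwise square, no conj()/real() round trip.
  return at::sqrt(at::sum(self * self));
}
```

For the AVX complex exp() item, a vectorized kernel evaluates the standard identity exp(a + ib) = e^a(cos b + i sin b) lane by lane; a scalar reference of that identity, for orientation only:

```cpp
// Scalar reference for the identity an AVX complex exp() kernel
// evaluates per lane: exp(a + ib) = exp(a) * (cos(b) + i*sin(b)).
#include <cmath>
#include <complex>

std::complex<float> complex_exp_ref(std::complex<float> z) {
  const float ea = std::exp(z.real());
  return {ea * std::cos(z.imag()), ea * std::sin(z.imag())};
}
```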

@dylanbespalko mentioned this pull request Dec 7, 2019
@rgommers (Collaborator) commented Dec 7, 2019

> Fixed tensor_to_numpy() conversion likely broken by a checkBackend() in #27064.

I think you meant gh-30281 here? Cc @ngoldbaum for this change

@ngoldbaum (Contributor) commented:

FWIW, here's the deep link to the relevant change from #30281: https://github.com/pytorch/pytorch/pull/30281/files#diff-d9318428a98d4f4d86210a8f4a736ebd.

Is the output of tensor.type().backend() not always the same as tensor.options().backend()? That would indicate deeper issues with respect to the changes I made in #30281.

@ezyang (Contributor) commented Dec 8, 2019

@dylanbespalko Heads up that I will be out of the country for three weeks, so you'll need to find someone else to merge this PR.

if (tensor.device().type() != DeviceType::CPU) {
  throw TypeError(
      "can't convert %s device type tensor to numpy. Use Tensor.cpu() to "
      "copy the tensor to host memory first.", tensor.device().type());
}
Wow, does this really do printf-style format strings? News to me.
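
It does: the call above passes a printf-style format string plus arguments to TypeError. As an illustration of the mechanism only (a hypothetical class, not torch's actual implementation), such an exception can be built like this:

```cpp
#include <cstdarg>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical illustration, NOT torch's TypeError: an exception that
// formats a printf-style message at construction time.
class FormattedError : public std::runtime_error {
 public:
  explicit FormattedError(const char* fmt, ...)
      : std::runtime_error(std::string()) {
    va_list args;
    va_start(args, fmt);
    char buf[1024];
    vsnprintf(buf, sizeof(buf), fmt, args);  // expand %s, %d, ... into buf
    va_end(args);
    msg_ = buf;
  }
  const char* what() const noexcept override { return msg_.c_str(); }

 private:
  std::string msg_;
};

// Usage, mirroring the call in the snippet above:
//   throw FormattedError(
//       "can't convert %s device type tensor to numpy.", "cuda");
```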

@dylanbespalko (Contributor, Author) commented:

> Is the output of tensor.type().backend() not always the same as tensor.options().backend()? That would indicate deeper issues with respect to the changes I made in #30281.

My limited understanding is that Backend dispatch was replaced with a multi-layered kernel dispatch over DeviceType × Layout × Dtype. I think checkBackend(Backend::CPU) corresponds to DeviceType::CPU, Layout::Strided, and any internally supported dtype, but I could be wrong.

Anyway, the backend checks are supposed to migrate into DeviceType, Layout, and Dtype checks, as described in PyTorch Internals; a sketch of that pattern follows.
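
A minimal sketch of that migration, under the assumptions above (hypothetical helper name, not the PR's actual diff): checking device type and layout separately lets ComplexCPU tensors, which differ only in dtype, pass where checkBackend(Backend::CPU) rejected them.

```cpp
// Hedged sketch of replacing checkBackend(Backend::CPU) with separate
// device-type and layout checks, so ComplexCPU tensors (same device,
// same layout, new dtype) are accepted.
#include <ATen/ATen.h>
#include <c10/util/Exception.h>

static void check_cpu_strided(const at::Tensor& t, const char* op_name) {
  TORCH_CHECK(t.device().type() == at::DeviceType::CPU,
              op_name, ": expected a CPU tensor, got device ", t.device());
  TORCH_CHECK(t.layout() == at::Layout::Strided,
              op_name, ": expected a strided (dense) tensor, got layout ",
              t.layout());
}
```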

@dylanbespalko (Contributor, Author) commented:

@ezyang,

Hopefully you are going on a much needed vacation. Happy holidays.

@dylanbespalko (Contributor, Author) commented:

@VitalyFedyunin, @cpuhrsch, @gchanan,

Hey guys, @ezyang is away on vacation. Can you forward me to someone who can merge this PR?

Pulling changes from master and re-running CI.

cc @MuawizChaudhary

@kostmo (Member) commented Dec 20, 2019

💊 CircleCI build failures summary and remediations

As of commit 221bc7d:

  • 1/1 failures introduced in this PR

Detailed failure analysis

One may explore the probable reasons each build failed interactively on the Dr. CI website.

🕵️ 1 new failure recognized by patterns

The following build failures do not appear to be due to upstream breakage:

See CircleCI build pytorch_linux_xenial_cuda10_1_cudnn7_py3_nogpu_test (1/1)

Step: "Test" (full log | pattern match details)

Jan 27 17:32:20 RuntimeError: test_quantization failed!
Jan 27 17:32:20 Ran 36 tests in 56.980s 
Jan 27 17:32:20  
Jan 27 17:32:20 FAILED (errors=1, skipped=1) 
Jan 27 17:32:20  
Jan 27 17:32:20 Generating XML reports... 
Jan 27 17:32:20 Traceback (most recent call last): 
Jan 27 17:32:20   File "test/run_test.py", line 456, in <module> 
Jan 27 17:32:20     main() 
Jan 27 17:32:20   File "test/run_test.py", line 449, in main 
Jan 27 17:32:20     raise RuntimeError(message) 
Jan 27 17:32:20 RuntimeError: test_quantization failed! 
Jan 27 17:32:20 + cleanup 
Jan 27 17:32:20 + retcode=1 
Jan 27 17:32:20 + set +x 
Jan 27 17:32:20 =================== sccache compilation log =================== 
Jan 27 17:32:20 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Jan 27 17:32:20 Compile requests                  7 
Jan 27 17:32:20 Compile requests executed         6 
Jan 27 17:32:20 Cache hits                        0 
Jan 27 17:32:20 Cache misses                      6 
Jan 27 17:32:20 Cache timeouts                    0 

This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

This comment has been revised 2 times.

@facebook-github-bot (Contributor) left a comment:

@gchanan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) left a comment:

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@dylanbespalko (Contributor, Author) commented:

@ezyang,

I'm back from my FPGA adventure. I am looking to split my time between CPU and FPGA work for a while, and I plan to improve my CUDA coverage when I get a new PC.

Let me know if this PR is too old.

@ezyang (Contributor) commented Jan 28, 2020

Nope, this PR is good, just gotta land it. Are you gonna blog about your FPGA adventures? :)

@dylanbespalko (Contributor, Author) commented:

@ezyang,

Yes, I think I can write a good blog. I'll let you know when I post it.

wuhuikx pushed a commit to wuhuikx/pytorch that referenced this pull request Jan 30, 2020
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for CUDA complex numbers is here: [pytorch-cuda-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cuda-strided-complex)

Changes:
[x] Fixed performance issue raised in pytorch#30704 so that non-complex numbers do not call `conj()` and `real()`.
[x] Fixed tensor_to_numpy() conversion likely broken by a `checkBackend()` in pytorch#27064.
[x] Fixed some ReduceOps and TensorCompare Ops that recently added a `checkBackend()`.
    - `checkBackend()` is replaced with a device type check and a layout check.
    - This ensures the ComplexCPU Type ID is supported.
[x] Added AVX support for complex `exp()`, as requested in pytorch#755
Pull Request resolved: pytorch#30871

Differential Revision: D19200726

Pulled By: ezyang

fbshipit-source-id: d7e1be0b0a89c5d6e5f4a68ce5fcd2adc5b88277
@facebook-github-bot (Contributor) commented:

@ezyang merged this pull request in 2471ddc.

ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020