* Implement C++ API version of torch.nn.functional.one_hot (#27081) (#27177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27177
Add support for F::one_hot C++ function.
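For reference, a minimal Python sketch of the `torch.nn.functional.one_hot` semantics that the new C++ `F::one_hot` mirrors (illustrative only, not taken from this PR):
```
import torch
import torch.nn.functional as F

# one_hot maps integer class indices to one-hot vectors.
labels = torch.tensor([0, 2, 1])
encoded = F.one_hot(labels, num_classes=3)
print(encoded)
# tensor([[1, 0, 0],
#         [0, 0, 1],
#         [0, 1, 0]])
```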
Test Plan:
Added 3 new tests to verify API is working
Imported from OSS
Differential Revision: D17697934
fbshipit-source-id: a8127fb87c00daa119bb92a5702bc4bbba48290d
* Refactor torch::jit::script::Module::register_* API. (#27189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27189
Conceptually, Module is just a view over ClassType and ivalue::object.
The register_* methods are the only exception to this:
they provide an API not available on ClassType or object directly. This
PR ports this API to ClassType and makes Module truly just a view over
those two.
Test Plan: Imported from OSS
Differential Revision: D17703533
Pulled By: ZolotukhinM
fbshipit-source-id: 2cdb9fb486b3fb8527986483c7f34be7bd59fabf
* Add c10_experimental ops to BC check white list (#27235)
Summary:
Experimental ops don't provide a BC guarantee.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27235
Reviewed By: hl475
Differential Revision: D17723292
Pulled By: houseroad
fbshipit-source-id: 644ae34d130418a810e0f9d802fa25f6e34c5ccf
* Rename _intrinsic to intrinsic
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27194
Test Plan: Imported from OSS
Differential Revision: D17704957
Pulled By: zafartahirov
fbshipit-source-id: 46f02d129aa77c3047b2a6c606bfadd831a6b0fc
* Allow set for qconfig for dynamic_quantize
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27181
Test Plan: Imported from OSS
Differential Revision: D17717482
Pulled By: jamesr66a
fbshipit-source-id: f3930fc87831cbdcf4390cd769c594bb13f5cd81
* Fix reprs for _intrinsic modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27184
Test Plan: Imported from OSS
Differential Revision: D17717481
Pulled By: jamesr66a
fbshipit-source-id: 4bd72bcd42191d9b21d03f5bb6698198dbffffda
* skip all rpc and dist autograd spawn tests for <PY36 (#27191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27191
Skip rpc and dist autograd spawn tests for Python < 3.6.
ghstack-source-id: 91231565
close #27157
Test Plan: unit tests
Differential Revision: D17697368
fbshipit-source-id: bb8cf1f47de41f9d350fd60afe37fece293d8680
* Add send and recv backward functions for builtin operators RPC. (#25527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25527
Master GH issue: https://github.com/pytorch/pytorch/issues/23110.
This change builds upon https://github.com/pytorch/pytorch/pull/24876 and
provides all the autograd hooks needed for a forward pass with distributed rpc
for builtin operators. This change does not address distributed rpc for Python
UDFs; that will be addressed in follow-up PRs.
Summary of changes:
1. Attach send autograd functions when a request is sent from the client and
response is sent from the server.
2. Attach receive autograd functions when a request is received on the server
and a response is received on the client.
3. Generate a globally unique autograd_message_id for each send/recv autograd
function pair to uniquely identify them.
ghstack-source-id: 91240466
Test Plan: unit tests.
Differential Revision: D17148077
fbshipit-source-id: 192d8a3f552ed7cc939f55dcca332965c9bd3233
* Rename jit Function to ScriptFunction
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27219
Test Plan: Imported from OSS
Differential Revision: D17715306
Pulled By: albanD
fbshipit-source-id: d11a7634dbee6a885c7177b240958e5aed2544f3
* Make cpp-backed jit classes appear as being in torch.jit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27220
Test Plan: Imported from OSS
Differential Revision: D17715305
Pulled By: albanD
fbshipit-source-id: 574704ad23ece6da7aa2780b78867307bef523cc
* Avoid configuring ROCm if USE_CUDA is on. (#26910)
Summary:
Move the resolution of the conflict between `USE_CUDA` and `USE_ROCM` to CMake so as to effect the following:
- `USE_CUDA=ON` and CUDA is found, and `USE_ROCM=ON` and ROCm is found --> fatal error
- Either `USE_CUDA=ON` and CUDA is found, or `USE_ROCM=ON` and ROCm is found --> the respective GPU feature is ON
- Otherwise, no GPU support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26910
Differential Revision: D17738652
Pulled By: ezyang
fbshipit-source-id: 8e07cc7e922e0abda24a6518119c28952276064e
* Revert "Add std::variant backport as c10::variant (#26836)" (#27277)
Summary:
This reverts commit 0cd188035a27fc38ce1e8eee205f6d47cd7650e6.
As reported by jerryzh168 and pritamdamania87, mpark::variant doesn't compile with gcc 7.3.1 on the fb devserver and throws an error similar to https://github.com/mpark/variant/issues/43. (However, it doesn't fail with gcc 7.3.1 in OSS CI, based on https://circleci.com/api/v1.1/project/github/pytorch/pytorch/2995606/output/107/0?file=true)
A plausible workaround is to upgrade the devserver to devtoolset-8, but that would in turn cause the CUDA build to complain:
```
/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/host_config.h:119:2: error: #error -- unsupported GNU version! gcc versions later than 7 are not supported!
#error -- unsupported GNU version! gcc versions later than 7 are not supported!
```
(Thanks pritamdamania87 for the report!)
The solution for now is to revert the mpark::variant addition, and I will find alternatives that will work with gcc 7.3.1 on fb devserver.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27277
Differential Revision: D17739804
fbshipit-source-id: ad945b3d86ab7ddbff58f4ecab95e0e1ac725ae9
* Implement LpNorm regularizer to be used on the inputs for feature importance (#26376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26376
* Create the new dense_feature_reg (FCInputLpNorm) regularizer to be applied to the fully-connected layer for feature importance (a rough sketch follows).
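Conceptually, an Lp-norm penalty on the dense inputs of a fully-connected layer looks like the sketch below. This is a hypothetical PyTorch-style illustration; the actual FCInputLpNorm is a caffe2/dper layer and its API may differ.
```
import torch

def lp_norm_penalty(fc_input: torch.Tensor, p: float = 2.0, weight: float = 1e-4) -> torch.Tensor:
    # Penalize the Lp norm of the dense features feeding the FC layer; the
    # gradient w.r.t. each feature column acts as a feature-importance signal.
    return weight * fc_input.norm(p=p, dim=0).sum()

x = torch.randn(8, 16, requires_grad=True)  # batch of dense features
loss = lp_norm_penalty(x)
loss.backward()
```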
Test Plan: * Unit test located in: `caffe2/caffe2/fb/dper/layer_models/tests/split_1/sparse_nn_test.py`
Reviewed By: un-disclosed
Differential Revision: D17360361
fbshipit-source-id: 1a0e119eeb17199a13dfffe58b3036ea4255e301
* Provide (but skip) 3.5 job by default on all PRs. (#27293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27293
This doesn't turn on 3.5 signal, but it makes it so that [test all]
will include it if you do request it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17738741
Pulled By: ezyang
fbshipit-source-id: 2b1af4d7bf26fd84a593fde292d6bfa2aabc1148
* more profiler changes in C++ before enabling checkScript changes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26909
Differential Revision: D17683632
Pulled By: Krovatkin
fbshipit-source-id: 5d36c3c4cf7411c56485ef19fe59262b9f8b45b2
* Fix segfault while printing value type for an error msg in emitListComprehension
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27261
Differential Revision: D17740159
Pulled By: Krovatkin
fbshipit-source-id: 90439282aea14d8634eb41ffece5b6320d615fa7
* Factored out the default mappings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27164
Test Plan: Imported from OSS
Differential Revision: D17694475
Pulled By: zafartahirov
fbshipit-source-id: df8df5f7d66062ed35da957064a31344e1d3c961
* Add memory format argument to the `clone` operator (#27106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27106
Adds memory_format option to the `clone` operator.
Introduces new `clone` behavior when used with `input_t.clone(memory_format=torch.preserve_format)`:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) Otherwise, if the tensor is stored in the channels-last format, the output tensor will have the channels-last format.
3) The output tensor will be contiguous in all other cases.
---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor whose elements each occupy their own distinct memory location.
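A brief illustration of the preserve-format behavior (hedged example based on the rules above):
```
import torch

x = torch.randn(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)
y = x.clone(memory_format=torch.preserve_format)
# The clone keeps the channels-last striding of the source tensor.
print(x.stride() == y.stride())  # True

z = x.clone(memory_format=torch.contiguous_format)
print(z.is_contiguous())  # True
```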
Test Plan: Imported from OSS
Differential Revision: D17699357
Pulled By: VitalyFedyunin
fbshipit-source-id: 5ae1537c2aca1abf0bf1eec4416846129c156f66
* Extract version to version.txt (#27149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27149
Extract version to version.txt and add reading version logic to setup.py and fb/torch_version.py
ghstack-source-id: 91271883
Test Plan: N/A
Reviewed By: gchanan, ezyang
Differential Revision: D17689307
fbshipit-source-id: 21899502027cec71b63d9dc151e09ff5ff3f279d
* add AutoNonVariableTypeMode for USE_STATIC_DISPATCH on JIT->ATen path (#27274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27274
This is yet another fix to address #26764.
PR #26908 toggles NonVariableTypeMode in the ATen dispatcher, which is where
USE_STATIC_DISPATCH takes place and thus the most logically sound place to make
such tweaks.
However, we observed a nontrivial perf regression due to that fix. It turns out
the numel() tensor method gets called in several for-loops, incurring ~7M
thread_local updates in a single forward call:
```
7173330 numel
558 size
416 q_scale
302 _empty_affine_quantized
288 contiguous
257 q_zero_point
216 qscheme
173 empty
110 set_
105 as_strided
104 permute
...
```
Since numel() is not called from a single place, a natural workaround is to
update function_wrapper.py so that it only adds the guard in the gen_namespace_function()
case and ignores the gen_tensor_method() case. But some tensor methods are actually
called from the JIT side directly (e.g. "aten::eq_" -> "(self).eq_"), so the
only "band aid" left on the table is to insert the guard on the JIT->ATen path as originally
done in #26868 - this is a simplified version of it, as it doesn't hurt to extend the
NonVariableTypeMode scope a little bit to also cover stack drop/pack calls.
On Android we only expose the JIT API, so we don't need to worry about tensor methods being
called directly. On iOS we don't provide a wrapper yet, but we can mention this caveat
in the doc. Hopefully by the time it's widely used we can finish the Variable/Tensor
unification and remove all these hacks.
Test Plan:
- Verified it runs quantized/fp32 MobileNetV2 models;
- Verified it fixes the perf regression (revert #26908 separately);
Differential Revision: D17732489
Pulled By: ljk53
fbshipit-source-id: c14ca66aebc6b6f17ad6efac7ca47f9487c98de5
* Updating submodules
Summary:
GitHub commits:
https://github.com/pytorch/fbgemm/commit/8786c0819029c076b0e28320e880ba3ac192ea8b
Test Plan: n/a
Reviewed By: zpao
fbshipit-source-id: 9c04a2ba7cc2166db0203f186ece261ca8b186dd
* Avoid calling tensor.numel() in for loops (#27298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27298
PR #26908 toggles NonVariableTypeMode in ATen dispatcher, which is where
USE_STATIC_DISPATCH takes place.
This causes an issue with numel(), as it gets called through the dispatcher and probably does not get inlined.
Also, the thread-local state is expensive to read/write so many times, and this kills perf.
PR #27274 is another approach to fix this and has more details.
Test Plan:
Quantized mobilenetV2 perf before this change
Main run finished. Milliseconds per iter: 28.6782. Iters per second: 34.8696
Perf after this change
Main run finished. Milliseconds per iter: 22.2585. Iters per second: 44.9267
Imported from OSS
Differential Revision: D17742565
fbshipit-source-id: 43c6045cc001c46916ba339555c9d809a2537eff
* Fix circle CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27307
Test Plan: Imported from OSS
Differential Revision: D17746444
Pulled By: xta0
fbshipit-source-id: ed37f91921f1ea7db6c63ba69f04883856341c39
* Update the link for iOS demo app in README.md (#27145)
Summary:
Update the link for iOS demo app in README.md
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27145
Differential Revision: D17746591
Pulled By: xta0
fbshipit-source-id: 6f49a0daddc8b79804e1b8487ba1db3807a3f481
* Allow use cpu_serial_kernel with void-lambda (#27271)
Summary:
Currently we use CPU_tensor_apply1 to loop through the tensor in a single thread and aggregate data:
```
// compute variance per input
accscalar_t var_sum = 0;
CPU_tensor_apply1<scalar_t>(in, [&] (const scalar_t& i) {
var_sum += (i - mean) * (i - mean);
});
```
and we don't have the ability to use TensorIterator for this.
```
accscalar_t var_sum = 0;
auto iter = TensorIterator::unary_op(self, self);
cpu_serial_kernel(iter, [&](scalar_t i) -> scalar_t {
var_sum += (i - mean) * (i - mean);
return a; //Unable to set value back, because self should be const
});
```
This PR resolves this problem and allows using a void lambda:
```
auto iter = at::TensorIterator();
iter.add_input(in);
iter.build();
accscalar_t var_sum = 0;
at::native::cpu_serial_kernel(iter, [&](scalar_t i) -> void {
var_sum += (i - mean) * (i - mean);
});
```
In the future it makes sense to change the reduction part to allow reducing to a scalar, not just to a tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27271
Differential Revision: D17743310
Pulled By: ifedan
fbshipit-source-id: a149751f2d671aefd3ed84bd50b2c0543a63b701
* Move the CUDA implementation of log10 to ATen. (#26733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26733
Close #24587
Test Plan: Imported from OSS
Differential Revision: D17606981
Pulled By: VitalyFedyunin
fbshipit-source-id: 732f07b981287da3ca235b272b7b6f78144f8ebe
* Mention magma-cuda101 package in install instructions (#27325)
Summary:
There is a magma package for the newest CUDA version (10.1); mention it here lest someone mistakenly try to use the version for CUDA 10.0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27325
Differential Revision: D17749535
Pulled By: soumith
fbshipit-source-id: 2d34a7af1218e6157935bfd5e03f4d2c0f00f200
* C++ API parity: TensorTest.BackwardNonScalarOutputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27314
Test Plan: Imported from OSS
Differential Revision: D17746371
Pulled By: pbelevich
fbshipit-source-id: 246fae22a60ed9a6d7b9843239b4b3391cc9dc3e
* Fix build (#27318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27318
Fix TBB build
USE_TBB=1 ATEN_THREADING=TBB python setup.py develop install --cmake
Test Plan: Imported from OSS
Differential Revision: D17747449
Pulled By: ilia-cher
fbshipit-source-id: 421f362bd10f3be34bffe86ae4f26e8f1c15f1a4
* Relax restrictions on set_num_threads (#27190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27190
Allow set_num_threads to be called multiple times in case of TBB
parallel backend
Test Plan:
BUILD_BINARY=1 USE_TBB=1 ATEN_THREADING=TBB python setup.py develop
install --cmake
./build/bin/test_parallel
./build/bin/thread_init_test
Reviewed By: kostmo
Differential Revision: D17704236
Pulled By: ilia-cher
fbshipit-source-id: 274380795e78ba417301c5faa18c9e9d3198bd5e
* Migrate the cpu and gpu implementations of resize nearest 3D from vision to caffe2
Summary: As title. Fix the build failures in unicorn-build-restrictions as discussed in D17330625
Test Plan:
buck test mode/opt caffe2/caffe2/quantization/server:resize_nearest_3d_dnnlowp_op_test
In vision libs, no need to explicitly add dep to resize 3d op as the caffe2_cpu dep is added by default.
Reviewed By: stephenyan1231
Differential Revision: D17676082
fbshipit-source-id: c034ab67a9078f72077b396991ffb9e54e6ab40b
* Add method add_hparams to API doc (#27344)
Summary:
Adds the method `add_hparams` to the `torch.utils.tensorboard` API docs. We will want to have this in the PyTorch 1.3 release.
cc sanekmelnikov lanpa natalialunova
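Typical usage looks like the sketch below (hedged example; the hyperparameter and metric values are illustrative):
```
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
# Log one hyperparameter configuration together with its result metrics.
writer.add_hparams({"lr": 0.1, "batch_size": 32},
                   {"hparam/accuracy": 0.91, "hparam/loss": 0.23})
writer.close()
```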
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27344
Differential Revision: D17753689
Pulled By: orionr
fbshipit-source-id: cc8636e0bdcf3f434444cd29471c62105491039d
* Support interface python assignment as an attribute (#26734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26734
This PR adds Python assignment of an interface as an attribute in the
module. It enables any object implicitly implementing the specific
interface to be assigned to the interface type in Python.
Serialization support for interface/class assignment will be done in a
follow-up PR.
Test Plan: Imported from OSS
Differential Revision: D17742708
Pulled By: wanchaol
fbshipit-source-id: a0a2d8c74b60ed3fa6c05e1b0d49b7ad1abc670b
* Skip tests that use numpy if it's not present
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27165
Pulled By: driazati
Differential Revision: D17695078
fbshipit-source-id: d25c920f4c43285028537f88761d47a2c9db7b8f
* Add Python RRef as args and return value (#25499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25499
See #23110 for model parallel design details, and #26759 for the RRef
protocol. This commit adds support for using RRefs as Python UDF arguments
and return values. RRefs can now be shared from owner to user, from user to
owner, or from user to user (see the usage sketch after the change list below).
Limitations:
1. No implicit type conversion yet. (#27099)
2. No failure handling and retry. (#26116)
3. UDF is not yet blocked until all RRefs are confirmed. (#27098)
4. Internal RRef control messages are not idempotent yet. (#26116)
5. Cannot delete RRefs correctly when there are circular dependencies. (#27096)
Main changes:
1. Added `SCRIPT_REMOTE_CALL` and `PYTHON_REMOTE_CALL` to `Message.h` to represent `dist.remote` invocations.
2. Added `SCRIPT_RREF_FETCH_CALL`, `PYTHON_RREF_FETCH_CALL`, `RREF_USER_ACCEPT`, `RREF_USER_DELETE`, `RREF_CHILD_ACCEPT`, and `RREF_FORK_REQUEST` to `Message.h` as internal RRef control messages.
3. New message request handling code is added to `functions.cpp`, and message format is added in `script_remote_call.h`, `python_remote_call.h`, and `rref_proto.h`.
4. Added a `PyRRef` type in `py_rref.h` and `py_rref.cpp` which holds a shared pointer to C++ `RRef` type. `PyRRef` wraps the C++ API and also implements RRef pickling and unpickling. RRef fork related control messages will be sent during RRef pickling/unpickling procedure.
5. Update `RRef.h` and `RRef.cpp` accordingly to support `py::object` RRefs.
6. RRef context (reference count, etc.) are tracked in `rref_context.h` and `rref_context.cpp`.
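As a rough sketch of the user-facing pattern this enables (hedged; it assumes an already initialized `torch.distributed.rpc` worker, and the worker names are illustrative):
```
import torch
import torch.distributed.rpc as rpc

def add(a, b):
    return a + b

def run_on_worker0():
    # Assumes rpc.init_rpc("worker0", rank=0, world_size=2) has been called
    # and a peer named "worker1" exists.
    # `remote` returns an RRef owned by the callee; the value is fetched lazily.
    rref = rpc.remote("worker1", add, args=(torch.ones(2), torch.ones(2)))
    print(rref.to_here())  # fetch the value from the owner
    # RRefs can also be forwarded as arguments to further rpc_sync/rpc_async/remote calls.
```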
Test Plan:
Imported from OSS
buck test mode/dev-nosan //caffe2/test:rpc_fork
Differential Revision: D17184146
Pulled By: mrshenli
fbshipit-source-id: a3a268efc087ac1ef489136ab957080382629265
* Set MINIZ_NO_TIME to avoid computing localtime on each pickle/unpickle (#27268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27268
For small pickle/unpickle, we spend a disproportionate amount of time in
time functions - roughly 23% in __tzset() for the unpickle case.
We're not currently using .m_time, though we can add this feature
back if it's ever needed.
An alternative would be to pass -DMINIZ_NO_TIME in compiler_flags, but we would
also need to consistently #define MINIZ_NO_TIME in any .cpp including this .h,
since this #define modifies the struct length in an unfortunate manner.
Test Plan:
buck test mode/dev-nosan caffe2/test/...
Run benchmark:
buck-out/opt/gen/caffe2/torch/fb/distributed/thriftRpcBackend/test/ThriftRpcAgentBench
Differential Revision: D17724198
fbshipit-source-id: b44a0217b1d9f8ce6c0f24297f59045c7cadf4b1
* Add a test case to RpcTest, check src/dst (#27322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27322
# Problem
Existing test cases are too symmetric, so they didn't detect this error: a request sent to the wrong worker.
Because of a wrong `worker_names` setup, worker0 sent a request to itself when it should have sent it to worker1.
# Solution
Add a test case letting the dst side check whether the request comes from the expected src.
ghstack-source-id: 91299312
Reviewed By: satgera
Differential Revision: D17069062
fbshipit-source-id: ef7a532dd497bfc0f0ee8446fcd5d29656aaf175
* Update to ROCm 2.8 (#27337)
Summary:
New docker images built with tag 324.
Related jenkins changes:
https://github.com/pytorch/ossci-job-dsl/commit/83ec81335742e66b02af90b7c74021b8792fc63f
https://github.com/pytorch/ossci-job-dsl/commit/aa235a14c82db69d0544cd8fc1da03ef9a50096e
Triggered CI runs:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-devtoolset7-rocmrpm-centos7.5-trigger-test/48682/
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-trigger/55638/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27337
Differential Revision: D17753827
Pulled By: bddppq
fbshipit-source-id: 2c3f77b0b7c680013c7cc6d7953fe0da4922fe48
* add sdk support for xcodebuild script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27358
Test Plan: Imported from OSS
Differential Revision: D17757389
Pulled By: xta0
fbshipit-source-id: ed8e470b9c6329b96297ee7c65ba08759251baad
* export remainder (#24410)
Summary:
Added ONNX export support for torch.remainder and torch.fmod
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24410
Reviewed By: hl475
Differential Revision: D17466791
Pulled By: houseroad
fbshipit-source-id: afe6519e5f370824e3b4a45b69036a7260fb72cf
* Replacing the skip_list with white_list in the qconfig propagation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27183
Test Plan: Imported from OSS
Differential Revision: D17700548
Pulled By: zafartahirov
fbshipit-source-id: 18e6ffbda496b14ac1da1783f928ad539cdb1d16
* Show a warning that not all dir members of quantized work. (#27339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27339
This PR just shows a warning message.
Eventually we will show a correct __dir__
Test Plan: Imported from OSS
Differential Revision: D17751333
Pulled By: zafartahirov
fbshipit-source-id: e9bc62fd8dd0147979291d0aac3f1afe5b8c7a9f
* improve error messages when a method or attribute is missing (#27110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27110
Previously, missing methods on some types like tensors would talk about
'builtins', which are only a thing inside of the compiler. Furthermore,
the error would only occur when the builtin was applied and it was discovered
that no builtin existed. This changes the error message so that a missing
method on our builtin types is discovered at attribute lookup.
Test Plan: Imported from OSS
Differential Revision: D17677616
Pulled By: zdevito
fbshipit-source-id: 2f7cf6c6093a9c832569c44f4b1044a2e56fe205
* refactor extra sugared values (#26270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26270
We've accumulated a lot of sugared values whose only purpose is
to be instance-checked against in emitApplyExpr. I need to add
another one to insert an unchecked_cast, and do not want to continue
the pattern. This creates an abstraction for this concept (SpecialFormValue)
and removes all the unneeded sugared values. There is no functionality
change here, just a bunch of code movement in compiler.cpp.
Test Plan: Imported from OSS
Differential Revision: D17412854
Pulled By: zdevito
fbshipit-source-id: 15877c91decaea5a00d1fe737ed2d0f0f8a79a28
* Minor readability fixes to C++ documentation (#27338)
Summary:
Changed `yieldings` to `yielding`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27338
Differential Revision: D17758406
Pulled By: yf225
fbshipit-source-id: 1633834a6ad80449c061ebc330ac24f3e42f5506
* Choose num_threads in parallel_for based on GRAIN_SIZE (#26963)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24080, Continuation of https://github.com/pytorch/pytorch/issues/26886
What soumith said in https://github.com/pytorch/pytorch/pull/26886#issuecomment-535760635 seems plausible
> I wonder if it has to do with `#pragma omp parallel num_threads(num_threads)` which has unintended consequences, where even if `num_threads=1`, entering an omp block inside an omp block results in bad behavior.
I know for a fact that gcc's openmp doesn't start the thread pool when given `num_threads(1)` but it seems clang behaves differently.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26963
Differential Revision: D17626981
Pulled By: soumith
fbshipit-source-id: 484ffe6cc172382bb5ff49ce1fceda7eba20a512
* Enable Python3.6 PyTorch ROCm CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27353
Differential Revision: D17758495
Pulled By: bddppq
fbshipit-source-id: 95e329bc30f092e4093a33c408f1647b803d9983
* Fixes PackedSequence.to (and unifies PackedSequence conversions) (#27245)
Summary:
PackedSequence.to(device) incorrectly places one of three tensors on the device and leaves the other two tensors where they are. If these devices are distinct, then further operations on the PackedSequence will fail. This behavior is inconsistent with Tensor.to and with PackedSequence's behavior when .cuda() is called.
Additionally, PackedSequence defines multiple other conversion functions that were independently and inconsistently implemented.
This PR unifies all implementations and makes the PackedSequence.to behavior more consistent with Tensor.to. It is not completely consistent, per comments in the code. test_device_mask in test_nn.py is updated to validate the new functionality.
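A short sketch of the unified conversion (hedged; the device string is illustrative):
```
import torch
from torch.nn.utils.rnn import pack_padded_sequence

seqs = torch.randn(3, 5, 4)              # (batch, time, features)
lengths = torch.tensor([5, 3, 2])
packed = pack_padded_sequence(seqs, lengths, batch_first=True)

if torch.cuda.is_available():
    # After this fix, .to() moves the packed data and index tensors consistently,
    # matching what .cuda() already did.
    packed_cuda = packed.to("cuda")
    print(packed_cuda.data.device)       # cuda:0
```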
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27245
Differential Revision: D17757850
Pulled By: mruberry
fbshipit-source-id: 58f0bd40f1aa300fb0a91ee743483d645f977dc5
* Makes test_cuda.py's generated tensor op tests generic (#27210)
Summary:
- The tensor op tests generated in test_cuda.py are now generic and appear in test_torch.py
- Data previously held in auxiliary data structures and files, like test_cuda_ignores.txt, is inlined
Previously the tensor op tests used several auxiliary data structures, a file, and exception handling to filter the test suite. If a function wasn't implemented, for example, that exception would be caught. This let functions like trigamma, which isn't callable, appear to be tested. See https://github.com/pytorch/pytorch/issues/27230. Filtering from additional data stores is error prone, too. It requires developers understand what data stores are used and how they're used. The existing sources are also sometimes incorrect. The txt file claims that dist_ doesn't work on half tensors, for example, but the updated tests verify it does.
In addition to making these tests generic, this PR removes those auxiliary data structures and does not catch any exceptions. Exceptions are errors. (This also means that if something implemented breaks it will now report as an error. Previously the test suite would have reported a pass.) The test infrastructure was also simplified to not perform computations with CPU half tensors since they do not support many operations. This introduces a float<->half conversion quirk but eliminates awkward functions that would first convert cpu tensors to float, perform an operation, and convert them back.
With this change test_cuda.py is almost entirely CUDA-specific.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27210
Differential Revision: D17757907
Pulled By: mruberry
fbshipit-source-id: b3c191c379667b1a7d5361087bdf82f397f77f65
* Remove six dependency (#27282)
Summary:
https://github.com/pytorch/pytorch/pull/27136 added a dependency on `six`, which is not available by default and is not marked as a dependency of the PyTorch binaries, causing torchvision CI to break; see https://circleci.com/gh/pytorch/vision/20778?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link for example.
This PR uses `torch._six` instead of `six` as a replacement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27282
Reviewed By: lerks
Differential Revision: D17737561
Pulled By: fmassa
fbshipit-source-id: 7dcd0cc2c8bab27b8f4535f664f60388818d3497
* Make `align_to` method-only. (#27304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27304
The ellipsis version of `align_to` only works if it is called as a
method. To prevent any confusion, this PR disables `torch.align_to` (but
keeps `Tensor.align_to`).
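Hedged example of the method form that remains supported (named tensor API):
```
import torch

x = torch.randn(2, 3, 5, names=("N", "C", "L"))
y = x.align_to("C", "N", "L")   # method form with all names spelled out
z = x.align_to("C", ...)        # the ellipsis version only works as a method
print(y.names, z.names)
```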
Test Plan: - [namedtensor ci]
Differential Revision: D17743809
Pulled By: zou3519
fbshipit-source-id: cf5c53dcf45ba244f61bb1e00e4853de5db6c241
* Remove CUDA_VERSION from Python script (which has already been detected in CMake) (#27316)
Summary:
(Intentionally left blank)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27316
Differential Revision: D17762715
Pulled By: ezyang
fbshipit-source-id: 044c0ea6e8c2d12912c946a9a50b934b5253d8c8
* Revert D17743310: [pytorch][PR] Allow use cpu_serial_kernel with void-lambda
Test Plan: revert-hammer
Differential Revision:
D17743310
Original commit changeset: a149751f2d67
fbshipit-source-id: 043240201d67966dd08b7b1bc2f9bf4897923e00
* Implement pickle support for sparse tensors and torch.layout instances (#27062)
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/16667 and https://github.com/OpenMined/PySyft/issues/2326
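A small round-trip sketch of what this enables (illustrative only):
```
import pickle
import torch

i = torch.tensor([[0, 1, 1], [2, 0, 2]])
v = torch.tensor([3.0, 4.0, 5.0])
sparse = torch.sparse_coo_tensor(i, v, (2, 3))

# Both sparse tensors and torch.layout instances can now be pickled.
restored = pickle.loads(pickle.dumps(sparse))
layout = pickle.loads(pickle.dumps(torch.sparse_coo))
print(restored.to_dense(), layout)
```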
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27062
Differential Revision: D17762932
Pulled By: ezyang
fbshipit-source-id: dd99c1f4ac8eb2286eb55aa20ce973f60ce7b7e1
* move new_zeros to core from THP (#26511)
Summary:
Fix for issue https://github.com/pytorch/pytorch/issues/25831
ezyang can you please have a look?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26511
Differential Revision: D17763037
Pulled By: ezyang
fbshipit-source-id: 3596c01c4ab421e7785d6055cc813806f840a5c7
* autograd: double backwards function for binary_cross_entropy loss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26983
Reviewed By: albanD
Differential Revision: D17714357
Pulled By: anjali411
fbshipit-source-id: cebfe09a9048c4be457b7f2718bc396c06ecabee
* Change schedulers to chainable form (#26423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26423
Enable chainable schedulers as requested in #13022 by implementing the changes mentioned below from [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513370208).
* Changing the behavior of schedulers to the chainable formula when available
* Using the closed form whenever epoch is different from None until the next release with a deprecation warning
* Making `get_computed_values` the supported way of obtaining the last computed learning rate by the scheduler (see [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513940729) for new syntax)
* Returning a deprecation warning when invoking the undocumented get_lr function (see [comment](https://github.com/pytorch/pytorch/pull/21800#discussion_r294305485)) referring to `get_computed_values`, and deprecating it in the next release.
* `CosineAnnealingWarmRestart` still takes an epoch parameter, as it is the only one with a mechanic relying on fractional epochs
* `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.
# #20527
### Before
The user calls scheduler with a constant epoch either across loops or in the same loop.
```
import torch.optim as optim
from torch import nn
conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)
# Scheduler with sometimes-constant epoch number
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
    lr_scheduler.step(epoch)
    print(optimizer.param_groups[0]['lr'])
```
### After
If the user wants to step the scheduler only when the epoch number changes:
```
import torch.optim as optim
from torch import nn
conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)
last_epoch = -1
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
    # Check manually whether the epoch number has changed
    if epoch - last_epoch > 0:
        lr_scheduler.step()
        last_epoch = epoch
    print(epoch, lr_scheduler.get_computed_values())
```
# #22107
### Before
```
import torch
from torchvision.models import resnet18
net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)
for i in range(10):
    # Scheduler computes and returns the new learning rate, leading to unexpected behavior
    print(i, scheduler.get_lr())
    scheduler.step()
```
### After
```
import torch
from torchvision.models import resnet18
net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)
for i in range(10):
    # Returns the last learning rate computed by the scheduler
    print(i, lr_scheduler.get_computed_values())
    lr_scheduler.step()
```
# ghstack
This contains the changes from #24352. Opening again since they were reverted.
This reverts commit 1c477b7e1f378e9c1f8efed296241f68a8a4372b.
Test Plan: Imported from OSS
Differential Revision: D17460427
Pulled By: vincentqb
fbshipit-source-id: 8c10f4e7246d6756ac91df734e8bed65bdef63c9
* Make RpcTest re-usable by other RPC backends by using init_method to initialize a RPC backend (#27320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27320
https://github.com/pytorch/pytorch/pull/27208/
# Problem
Other RPC backends take init_method.
# Solution
Set up init_method in rpc tests.
ghstack-source-id: 91335127
Differential Revision: D17709219
fbshipit-source-id: 3184c6e9b922a6ff9f4d1cb9abfa118b23f43eeb
* Add OPN instruction and vararg operator table (#27104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27104
* The use case here is to replace prim::ListConstruct, which requires Node, but Node is not available in the mobile lite interpreter.
* (OPN, X, N): X is the index into the vararg operator-name and operator tables; N is the number of inputs. For the ListConstruct example, the operator name can be "aten::listconstruct" and the overloaded name is the output type ("int", "float", "bool", "tensor" and "generic").
* A vararg operator table is built with void(int input_size, Stack& stack) functions.
## Unit test
LiteInterpreterConv covers OPN instruction and conv operator.
Test Plan: Imported from OSS
Differential Revision: D17762853
fbshipit-source-id: 475aa0c6678e3760cec805862a78510913a89c83
* Allow use cpu_serial_kernel with void-lambda (#27370)
Summary:
https://github.com/pytorch/pytorch/pull/27271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27370
Differential Revision: D17763265
Pulled By: ifedan
fbshipit-source-id: d670560dfc555db529b18c01aa42f0ccb2127889
* From docs of scatter_add_() removed erroneous comment on uniqueness of indices. (#27132)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27080
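For reference, duplicate indices are legal for `scatter_add_`; contributions to the same position simply accumulate (minimal example, not part of the PR):
```
import torch

dst = torch.zeros(3)
index = torch.tensor([0, 0, 2])          # index 0 appears twice
src = torch.tensor([1.0, 2.0, 3.0])
dst.scatter_add_(0, index, src)
print(dst)  # tensor([3., 0., 3.])
```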
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27132
Differential Revision: D17765307
Pulled By: soumith
fbshipit-source-id: b0892ff442f3b49f8e3cdf029e2a08b51fa88f28
* Reduce error context from 10 -> 3 (#26765)
Summary:
10 lines of error context (on both sides) is overkill, especially now
that we have line numbers. With a compilation stack of a couple
functions, it becomes a pain to scroll to the top of the stack to see
the real error every time.
This also fixes class names in the compilation stack to use the format
`ClassName.method_name` instead of the fully qualified name
Old output
```
clip_boxes_to_image(Tensor boxes, (int, int) size) -> (Tensor):
Expected a value of type 'Tuple[int, int]' for argument 'size' but instead found type 'Tuple[int, int, int]'.
:
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:365:20
top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level)
batch_idx = torch.arange(num_images, device=device)[:, None]
objectness = objectness[batch_idx, top_n_idx]
levels = levels[batch_idx, top_n_idx]
proposals = proposals[batch_idx, top_n_idx]
final_boxes = []
final_scores = []
for boxes, scores, lvl, img_shape in zip(proposals, objectness, levels, image_shapes):
boxes = box_ops.clip_boxes_to_image(boxes, img_shape)
~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
keep = box_ops.remove_small_boxes(boxes, self.min_size)
boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep]
# non-maximum suppression, independently done per level
keep = box_ops.batched_nms(boxes, scores, lvl, self.nms_thresh)
# keep only topk scoring predictions
keep = keep[:self.post_nms_top_n]
boxes, scores = boxes[keep], scores[keep]
final_boxes.append(boxes)
final_scores.append(scores)
'RegionProposalNetwork.filter_proposals' is being compiled since it was called from 'RegionProposalNetwork.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:446:8
num_images = len(anchors)
num_anchors_per_level = [o[0].numel() for o in objectness]
objectness, pred_bbox_deltas = \
concat_box_prediction_layers(objectness, pred_bbox_deltas)
# apply pred_bbox_deltas to anchors to obtain the decoded proposals
# note that we detach the deltas because Faster R-CNN do not backprop through
# the proposals
proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
proposals = proposals.view(num_images, -1, 4)
boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
losses = {}
if self.training:
assert targets is not None
labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
loss_objectness, loss_rpn_box_reg = self.compute_loss(
objectness, pred_bbox_deltas, labels, regression_targets)
losses = {
'RegionProposalNetwork.forward' is being compiled since it was called from 'MaskRCNN.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/generalized_rcnn.py:53:8
"""
if self.training and targets is None:
raise ValueError("In training mode, targets should be passed")
original_image_sizes = [(img.shape[-2], img.shape[-3]) for img in images]
images, targets = self.transform(images, targets)
features = self.backbone(images.tensors)
if isinstance(features, torch.Tensor):
features = OrderedDict([(0, features)])
proposals, proposal_losses = self.rpn(images, features, targets)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)
losses = {}
losses.update(detector_losses)
losses.update(proposal_losses)
# TODO: multiple return types??
# if self.training:
```
New output
```
RuntimeError:
clip_boxes_to_image(Tensor boxes, (int, int) size) -> (Tensor):
Expected a value of type 'Tuple[int, int]' for argument 'size' but instead found type 'Tuple[int, int, int]'.
:
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:365:20
final_scores = []
for boxes, scores, lvl, img_shape in zip(proposals, objectness, levels, image_shapes):
boxes = box_ops.clip_boxes_to_image(boxes, img_shape)
~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
keep = box_ops.remove_small_boxes(boxes, self.min_size)
boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep]
'RegionProposalNetwork.filter_proposals' is being compiled since it was called from 'RegionProposalNetwork.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:446:8
proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
proposals = proposals.view(num_images, -1, 4)
boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
losses = {}
'RegionProposalNetwork.forward' is being compiled since it was called from 'MaskRCNN.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/generalized_rcnn.py:53:8
if isinstance(features, torch.Tensor):
features = OrderedDict([(0, features)])
proposals, proposal_losses = self.rpn(images, features, targets)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
detections = self.transform.postprocess
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26765
Pulled By: driazati
Differential Revision: D17560963
fbshipit-source-id: e463548744b505ca17f0158079b80e08fda47d49
* Fix some return std::move warnings (#27384)
Summary:
clang-tidy was complaining about these
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27384
Pulled By: driazati
Differential Revision: D17767412
fbshipit-source-id: 03e2630790edf3f6bbf9064e754156613032b464
* add function to get nccl version for error messages (#27068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27068
Adds a function that uses ncclGetVersion from the NCCL API to retrieve the NCCL version. Converts it into a readable string, and is called in NCCL-related error messages to log the NCCL version. Hopefully this will help with debugging NCCL errors.
Test Plan:
Modify C10D_NCCL_CHECK in NCCLUtils.hpp to always error by setting ncclResult_t error = ncclSystemError
force an NCCL error with script test/simulate_nccl_errors.py:
Start master node: python test/simulate_nccl_errors.py localhost 9124 0 2
Start other node: python test/simulate_nccl_errors.py localhost 9124 1 2
On the master node, should see the following error message w/NCCL version:
```
Traceback (most recent call last):
  File "simulate_nccl_errors.py", line 29, in <module>
    process_group.allreduce(torch.rand(10).cuda(rank)).wait()
RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:375, unhandled system error, NCCL version 2.4.8
```
Differential Revision: D17639476
fbshipit-source-id: a2f558ad9e883b6be173cfe758ec56cf140bc1ee
* C++ API parity: Hardtanh
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27038
Test Plan: Imported from OSS
Differential Revision: D17682405
Pulled By: pbelevich
fbshipit-source-id: f65e76696e0041c3518f56da94f2e3b800305234
* fix OSX CI build (#27373)
Summary:
fix OSX caffe2 CI build, attempt 1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27373
Differential Revision: D17768461
Pulled By: soumith
fbshipit-source-id: b0a076c07382327730b5d86b8a00f5388c368b5e
* ProcessGroupNCCL should respect timeout passed in to init_process_group. (#27224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27224
As part of adding error handling to NCCL, we are now able to specify a
timeout for operations using ProcessGroupNCCL. However, this timeout had a
default of 10 seconds and didn't respect the timeout specified in
init_process_group.
In this change, I've ensured we pass the appropriate timeout to
ProcessGroupNCCL.
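Hedged sketch of passing the timeout (the init method, address, and values are illustrative; this blocks until all ranks join):
```
import datetime
import torch.distributed as dist

# The timeout passed here is now forwarded to ProcessGroupNCCL instead of
# the previous fixed 10-second default.
dist.init_process_group(
    backend="nccl",
    init_method="tcp://127.0.0.1:23456",  # illustrative address
    rank=0,
    world_size=2,
    timeout=datetime.timedelta(seconds=60),
)
```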
ghstack-source-id: 91283548
Test Plan:
Added unit test to verify timeout passed in to init_process_group is
respected.
Differential Revision: D17717992
fbshipit-source-id: c73320187f1f3b2693ba1e177d80646e282d01a2
* Add clip_grad_norm_ to c++ api (#26140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26140
Per https://github.com/pytorch/pytorch/issues/25883, we want to work
towards C++/Python API parity. This diff adds clip_grad_norm_ to the c++ API to
improve parity.
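The Python counterpart that the C++ API now mirrors (minimal example):
```
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
loss = model(torch.randn(4, 10)).sum()
loss.backward()
# Rescale gradients in place so their total norm does not exceed max_norm.
total_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```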
ghstack-source-id: 91334333
Test Plan: Added a unit test
Differential Revision: D17312367
fbshipit-source-id: 753ba3a4d084d01f3cc8919da3108e67c809ad65
* C++ API parity: LeakyReLU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27059
Test Plan: Imported from OSS
Differential Revision: D17682407
Pulled By: pbelevich
fbshipit-source-id: 2a4f42e9438799ba8de7282ac7a6fd3ff97ee048
* Some hipify script cleanups (#27375)
Summary:
continue https://github.com/pytorch/pytorch/issues/26363
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27375
Differential Revision: D17764992
Pulled By: bddppq
fbshipit-source-id: ecc06521179677efcedb1d58ceda63df7d63627e
* add some support for the occupancy API on ROCm (#27390)
Summary:
Unfortunately, the HIP function takes uint32_t* instead of int*, so we still need to ifdef for the time being.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27390
Differential Revision: D17768832
Pulled By: bddppq
fbshipit-source-id: c65176660cb0783a04f0a4a064f686818d759589
* Add gfx908 to the list of per-default compiled architectures. (#27388)
Summary:
ROCm 2.8 added preliminary support for gfx908.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27388
Differential Revision: D17767772
Pulled By: bddppq
fbshipit-source-id: 172daf5bb66d3db86a13e287059af4b9b90a7f57
* Change nightly builds version to 1.4.0-SNAPSHOT (#27381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27381
Changing android nightly builds from master to version 1.4.0-SNAPSHOT, as we also have 1.3.0-SNAPSHOT from the branch v1.3.0
Test Plan: Imported from OSS
Differential Revision: D17773620
Pulled By: IvanKobzarev
fbshipit-source-id: c39a1dbf5e06f79c25367c3bc602cc8ce42cd939
* Pickup proxy parameters for publishing (#27389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27389
Pick up gradle proxy parameters (handy for publishing from a devserver) in the maven publishing gradle plugin
Test Plan: Imported from OSS
Differential Revision: D17773548
Pulled By: IvanKobzarev
fbshipit-source-id: 662c0b2835e6cf1e4009da79e27268d4a19c2ceb
* MovingAverage Observer (#27396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27396
Observer that estimates moving averages of the min and max values per batch; better suited for quantization-aware training than min/max observers that track extremal values across batches.
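Conceptually, the update rule is an exponential moving average of the per-batch min/max (hedged sketch; the averaging constant and function names are illustrative, not the observer's actual API):
```
import torch

def update_moving_min_max(x, running_min, running_max, averaging_constant=0.01):
    # Blend the current batch's extrema into the running estimates instead of
    # keeping the all-time min/max, which suits quantization-aware training.
    batch_min, batch_max = x.min(), x.max()
    if running_min is None:
        return batch_min, batch_max
    new_min = running_min + averaging_constant * (batch_min - running_min)
    new_max = running_max + averaging_constant * (batch_max - running_max)
    return new_min, new_max

running_min = running_max = None
for _ in range(5):
    batch = torch.randn(32, 8)
    running_min, running_max = update_moving_min_max(batch, running_min, running_max)
```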
ghstack-source-id: 91369018
Test Plan:
buck test caffe2/test:quantization -- 'test_per_tensor_observers \(test_quantization\.ObserverTest\)' --print-passing-details
buck test caffe2/test:quantization -- 'test_per_channel_observers \(test_quantization\.ObserverTest\)' --print-passing-details
Differential Revision: D17727213
fbshipit-source-id: 024a890bf3dd0bf269d8bfe61f19871d027326f0
* Add methods to write image tensor content to buffer (#27359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27359
Adding methods to TensorImageUtils:
```
bitmapToFloatBuffer(..., FloatBuffer outBuffer, int outBufferOffset)
imageYUV420CenterCropToFloat32Tensor(..., FloatBuffer outBuffer, int outBufferOffset)
```
To be able to
- reuse a FloatBuffer across inference runs
- create a batch tensor (containing several images/bitmaps)
Reusing the FloatBuffer in the example demo app (image classification), the profiler shows fewer memory allocations (before this, every run created a new input tensor with a newly allocated FloatBuffer) and roughly 20 ms less per run on my Pixel XL.
Known open question:
At the moment every tensor element is written separately by calling `outBuffer.put()`, which is a native call crossing language boundaries.
An alternative would be to allocate a `float[]` on the Java side, fill it, and put it into `outBuffer` with one call, reducing native calls but increasing memory allocation on the Java side.
Tested locally just eyeballing durations - did not notice a big difference - decided to go with fewer memory allocations.
It would be good to merge this into 1.3.0, but if not, the demo app can use snapshot dependencies with this change.
PR with integration to demo app:
https://github.com/pytorch/android-demo-app/pull/6
Test Plan: Imported from OSS
Differential Revision: D17758621
Pulled By: IvanKobzarev
fbshipit-source-id: b4f1a068789279002d7ecc0bc680111f781bf980
* add warning to dnnlowp fc if quantization kind is not min_max
Summary:
Print a warning when using DNNLOWP dynamic int8 quant for FC and activation_quantization_kind != min_max.
The warning will display in the console but not in Bento; we would have to use CAFFE_ENFORCE to alert in Bento.
Test Plan: Ran the unit test via buck, forcing DNNLOWP FC with activation_quantization_kind = "l2", and saw the warning printed in the console.
Reviewed By: csummersea
Differential Revision: D17770921
fbshipit-source-id: b6532e4c9a86d74e3db4cb432735505d378a366e
* Add interface/object serialization as module attribute (#26770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26770
This PR adds interface/object serialization as a module attribute, to
allow initializing an object as an interface type during Python
initialization. Because an interface type can be backed by any class object
that implements that interface, if we declare it in
python/module.__init__, we need to collect the runtime types of the
value and serialize them to ensure complete code information.
Test Plan: Imported from OSS
Differential Revision: D17742707
fbshipit-source-id: 7f614ad4f982996d320a0e2dd3515bf47370e730
* Adding docstrings for nnq.functional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27363
Test Plan: Imported from OSS
Differential Revision: D17758907
Pulled By: zafartahirov
fbshipit-source-id: f560f2726cf51ceebdbf22ebef2d067422340cf2
* Enable RCCL in ROCm build (#27383)
Summary:
continues https://github.com/pytorch/pytorch/pull/23884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27383
Differential Revision: D17767248
Pulled By: bddppq
fbshipit-source-id: 3a506844ca6f01d7bbe8be5bde0976999e3a2b90
* Add randomFill to test_utils.h
Summary: Add the helper function randomFill to test_utils.h so we can use it in benchmark scripts as well as tests.
Test Plan:
```
buck run mode/opt //tvm/sparse:cblas_bench
```
Reviewed By: yinghai
Differential Revision: D17759193
fbshipit-source-id: e4909b04e83ca9382ab4718855fb63743d028de1
* Use deepcopy inputs for ONNX ort test cases (#27186)
Summary:
Running models with in-place operators will change the values of the input tensors.
Deepcopy the input tensors each time to keep the originals intact.
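Illustration of the pattern (hedged sketch; the module is hypothetical):
```
import copy
import torch

class AddOneInplace(torch.nn.Module):
    def forward(self, x):
        return x.add_(1.0)   # in-place op mutates the input

model = AddOneInplace()
original = torch.zeros(3)
# Run the model on a deep copy so `original` stays intact for later comparisons.
output = model(copy.deepcopy(original))
print(original)  # tensor([0., 0., 0.])
```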
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27186
Differential Revision: D17776598
Pulled By: jerryzh168
fbshipit-source-id: d4808a11185a9ab0d782a62d7d708dfe7e94559c
* Remove dependency on six from dist_autograd_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27369
Test Plan: Imported from OSS
Differential Revision: D17763104
Pulled By: mrshenli
fbshipit-source-id: dd146809686e7720f2b77012eebb6aed72851556
* Docstring fix (#27225)
Summary:
Correcting docstring for `add_image_with_boxes` method. Fixed spelling mistake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27225
Differential Revision: D17776604
Pulled By: jerryzh168
fbshipit-source-id: 45f69643ec3b58c46b9fb67411c42a6d09b7290e
* Tweak docs on building docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27364
Differential Revision: D17777402
Pulled By: dzhulgakov
fbshipit-source-id: 304c678e5c80d7f8c779d65c11f9bf1b0facdb52
* Upgrade to ROCm 2.9 (#27417)
Summary:
New docker images built with tag 325: https://ci.pytorch.org/jenkins/job/caffe2-docker-trigger/325
Related ossci-job-dsl commits:
https://github.com/pytorch/ossci-job-dsl/commit/a00a76f927944aed961a3bbbc4f17aff0fc30d71
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27417
Differential Revision: D17777517
Pulled By: bddppq
fbshipit-source-id: a6b8cb86b37f537d402f6d2c7d28ad28a6a5a317
* enable rocTX API (#27416)
Summary:
ROCm 2.9 brings support for the rocTX API through rocTracer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27416
Differential Revision: D17777480
Pulled By: bddppq
fbshipit-source-id: 6bce9b54c94e5b4c5787570d2b85736882bd23a7
* C++ API parity: LogSigmoid
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27060
Test Plan: Imported from OSS
Differential Revision: D17682404
Pulled By: pbelevich
fbshipit-source-id: d60d64cd4caf1f56a2e05c516f91321d46ec9624
* Remove Tensor.h, TensorMethods.h from src/core. (#27086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27086
This is a major source of merge conflicts, and AFAICT isn't necessary anymore (it may have been necessary for some mobile build stuff in the past).
This is a commandeer of #25031
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D17687345
Pulled By: ezyang
fbshipit-source-id: bf6131af835ed1f9e3c10699c81d4454a240445f
* Remove outdated note in cholesky_solve and triangular_solve doc strings (#26989)
Summary:
We do support inputs with dim > 2 in _out variants
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26989
Differential Revision: D17785632
Pulled By: soumith
fbshipit-source-id: d42ba7ca9c225ad1a26ff3b410d0c5c08eaed001
* Disable tsan for test_multiprocessing. (#27410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27410
Similar to https://github.com/pytorch/pytorch/pull/25005, TSAN is not
safe to use in a multi-threaded program with fork and can cause deadlocks. As a
result, disabling this test for TSAN.
ghstack-source-id: 91393545
Test Plan: buildbot
Differential Revision: D17775141
fbshipit-source-id: 109b8095240ad43ee4a6380f70b9efca863c0a4a
* Unfold export (#24970)
Summary:
ONNX export for Unfold in symbolic opset9 + op and ORT tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24970
Reviewed By: hl475
Differential Revision: D17495106
Pulled By: houseroad
fbshipit-source-id: fcd179a1213c0f219628f25c09e66fcfe4c5df50
* Reduce special casing around 'training' (#27109)
Summary:
Most of this was old cruft left over from special handling of `training` before we had a `bool` type. This makes all modules have a `training` attribute that is true by default and removes all other special handling.
Fixes #26884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27109
Pulled By: driazati
Differential Revision: D17728129
fbshipit-source-id: 8ddc9fbb07a953dd05529538bfdd01ed88b5cb57
* Put metrics back to torch.utils.tensorboard similar we have in TensorboardX
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27252
Test Plan: Check metrics in the Scuba table: https://fburl.com/scuba/k5x8yosj
Reviewed By: sanekmelnikov
Differential Revision: D17723414
fbshipit-source-id: 64d42e0b4582f635d38f38feb2b2a6c4826f2065
* Automatic update of fbcode/onnx to 2891e1459745933f4bba9a8cb3371cf3c9eb1d16 (#27474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27474
Previous import was 034921bd574cc84906b7996c07873454b7dd4135
Included changes:
- **[2891e145](https://github.com/onnx/onnx/commit/2891e145)**: Fix Unique unit test (#2381) <Scott McKay>
- **[25cf73e5](https://github.com/onnx/onnx/commit/25cf73e5)**: update shapeInference h file link (#2369) <prcvih>
- **[e3074bc0](https://github.com/onnx/onnx/commit/e3074bc0)**: modify file path (#2378) <prcvih>
- **[9058d3a4](https://github.com/onnx/onnx/commit/9058d3a4)**: Incrementing version number to 1.6.0 (#2353) (#2385) <Kevin Chen>
- **[c963586d](https://github.com/onnx/onnx/commit/c963586d)**: Remove typing packages from test requirements (#2375) <Aiken Cairncross>
Test Plan: ci
Reviewed By: bddppq
Differential Revision: D17791527
fbshipit-source-id: 23ad5abe313cd4e4eedcbe7794b98450b3b7d3bc
* Fixed Select symbolic to export slice when index = negative one (#25273)
Summary:
Exporting torch.select when the index is negative one (x[:,-1]) was broken. This PR has the fix in the symbolic function for select.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25273
Reviewed By: hl475
Differential Revision: D17159707
Pulled By: houseroad
fbshipit-source-id: 2c3b275421082758f1b63c1c9b6e578f03ca9f76
* Avoid variable shadowing in ``::at::philox_engine::single_round()`` (#27486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27486
Rename `key` argument of `single_round` method to `in_key`
Test Plan: CI
Reviewed By: stepancheg, soumith
Differential Revision: D17782904
fbshipit-source-id: 6feae55c407f39d41db099b013dcbd3990768603
* Refactor python_android test to separate Android-specific components (#27453)
Summary:
All of the test cases move into a base class that is extended by the
instrumentation test and a new "HostTests" class that can be run in
normal Java. (Some changes to the build script and dependencies are
required before the host test can actually run.)
ghstack-source-id: fe1165b513241b92c5f4a81447f5e184b3bfc75e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27453
Test Plan: Imported from OSS
Reviewed By: IvanKobzarev
Differential Revision: D17800410
fbshipit-source-id: 1184f0caebdfa219f4ccd1464c67826ac0220181
* Various cleanups to pytorch_android API (#27454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27454
See detailed discussion at
https://github.com/pytorch/pytorch/issues/27350
Test Plan: Imported from OSS
Reviewed By: IvanKobzarev
Differential Revision: D17800480
Pulled By: dreiss
fbshipit-source-id: bf174e8b16231b89be771de0fa54c41e864a3eb0
* Clean up JavaDoc comments in pytorch_android
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27455
Test Plan: Imported from OSS
Differential Revision: D17800658
Pulled By: dreiss
fbshipit-source-id: dbd01d9fa5ac82c50daf54c2869dc18be233d8dd
* FunctionEventAvg implements __iadd__ interface (#27498)
Summary:
Resolving issue https://github.com/pytorch/pytorch/issues/26433 by making FunctionEventAvg implement the `__iadd__` interface again, like it used to.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27498
Differential Revision: D17801918
Pulled By: ezyang
fbshipit-source-id: 0597059c903ac168ed64a05ac1decff3ffd14f06
* Move hipify to torch/utils to bundle them into torch package (#27425)
Summary:
Similar to https://github.com/pytorch/pytorch/pull/27418 but try to put it under "torch" namespace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27425
Differential Revision: D17779490
Pulled By: bddppq
fbshipit-source-id: 688338d143509b37dfc110df17af3331db48a42b
* Ensure NCCL error handling code is disabled for NCCL versions < 2.4 (#27124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27124
ncclCommAbort() and ncclGetAsyncError() were two APIs added in NCCL
2.4 to detect errors in NCCL communicators. These were used as part of
ProcessGroupNCCL, and we also enforced that only NCCL versions 2.4+ were
supported. However, there is still legitimate use for older NCCL versions and
hence we should still support those.
For that purpose, in this change I've ensured we disable NCCL error checking
for versions < 2.4.
ghstack-source-id: 91452959
Test Plan:
1) Test with 2.4.8
2) Test with 2.2.13
3) unit tests.
Differential Revision: D17178988
fbshipit-source-id: 5dc44b5f7b4b00466c67fd452315f1d4f5c47698
* #include <stdexcept> into flat_hash_map.h (#27478)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/27266
In general we should not rely on transitively included headers; we should explicitly include all headers whose members are used in the source file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27478
Differential Revision: D17799522
Pulled By: pbelevich
fbshipit-source-id: 5818394a212c947cfac3a6cf042af9ebb8b9d9a0
* Fix broken name mangling
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27511
Test Plan: Imported from OSS
Differential Revision: D17801185
Pulled By: jamesr66a
fbshipit-source-id: 3eaa9542a445c9401f3f96e11138ec09b0d8350a
* Updating submodules
Summary:
GitHub commits:
https://github.com/facebook/fbthrift/commit/e80ecd1d63c956ed34b257fbd1aaef73ef8eb781
https://github.com/facebook/proxygen/commit/6c7a36b1b3f2825fd30ba00c708ec5ceaa5db760
https://github.com/facebookincubator/mvfst/commit/875046204325f9bd8cc5343b98a8fa4b99187a3c
https://github.com/facebook/proxygen/commit/442d7def679c297427f5d0b679685db92fe3d28c
https://github.com/facebook/wangle/commit/c138dc3d2c0c4f4f68ab4931e44b87a6becb194c
https://github.com/facebookincubator/fizz/commit/3833f10989711256704260a01e0c9f7d1c33e468
https://github.com/facebookincubator/katran/commit/6fc473d5304985aa31d351c6305904e80af4b614
https://github.com/pytorch/fbgemm/commit/82d259dade58e53775a534f88b7b48e760f09a64
Test Plan: n/a
Reviewed By: 2d2d2d2d2d
fbshipit-source-id: 7834a4a8620d0ab9b60060e0abadfba457fb2890
* Revert D17159707: [pytorch][PR] [ONNX] Fixed Select symbolic to export slice when index = negative one
Test Plan: revert-hammer
Differential Revision:
D17159707
Original commit changeset: 2c3b27542108
fbshipit-source-id: accce910abdbe13270d0f592810a48b1dabe4b01
* Roll master to 1.4.0 (#27374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27374
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17809770
Pulled By: ezyang
fbshipit-source-id: 75bd97426494a7bbbf08f9bce7563d35871443d8
* Exponential decay of the weight of task loss (#27508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27508
Implemented a simple exponential decay of the weight of the task loss function, with a lower bound.
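Hedged sketch of the decay schedule (names, constants, and the lower bound are illustrative, not taken from the diff):
```
def task_loss_weight(step, initial_weight=1.0, decay_rate=0.99, lower_bound=0.1):
    # Exponentially decay the loss weight per step, but never below the lower bound.
    return max(lower_bound, initial_weight * (decay_rate ** step))

for step in (0, 100, 500, 1000):
    print(step, round(task_loss_weight(step), 4))
```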
Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests:mtml_test -- test_task_weight_decay
https://our.intern.facebook.com/intern/testinfra/testrun/3377699729136308
canary: f140103452
Reviewed By: chenshouyuan
Differential Revision: D17524101
fbshipit-source-id: 9a653e21a4ecb74dfc4ac949c9e3388f36ef3a20
* docstring only formatting changes: quantize.py, fake_quantize.py, observer.…