Insights: pytorch/pytorch
Overview
8 Pull requests merged by 5 people
-
Remove 3.13 hack when installing TIMM
#153648 merged
May 15, 2025 -
[FlexAttention] Remove Old Constraint on lastdim strides
#153104 merged
May 15, 2025 -
Fix license check for setuptools>=77
#153581 merged
May 15, 2025 -
Only print dde partial fx graph for export
#153218 merged
May 14, 2025 -
Revert "Cleanup VS 2019 refs in pytorch (#145863)" (#152613)
#153390 merged
May 14, 2025 -
Make numpy check optional
#153421 merged
May 14, 2025 -
Add device guard for xpu conv on multi device
#153345 merged
May 14, 2025 -
[release only] Bump triton version to 3.3.1
#153554 merged
May 14, 2025
172 Pull requests opened by 101 people
-
Fix lcm_ crash with int16 scalar and large int32 tensor
#153314 opened
May 10, 2025 -
try relanding cublaslt autotuning support for TunableOp #
#153316 opened
May 10, 2025 -
[associative_scan] Autograd for additional inputs
#153317 opened
May 10, 2025 -
[DEBUG] dump combined_traceback
#153318 opened
May 10, 2025 -
[DEBUG] only combined_traceback
#153319 opened
May 10, 2025 -
[DEBUG] only comment
#153320 opened
May 10, 2025 -
[don't merge] upgrade vs2022 to v17.13.6
#153322 opened
May 10, 2025 -
[CUDA][MPS] Fix torch.arange bound validation for large float inputs
#153328 opened
May 10, 2025 -
Fixing gpu cpu inconsistency
#153331 opened
May 10, 2025 -
[inductor][cutlass backend] Add 2 stage autotuning aka prescreening
#153335 opened
May 10, 2025 -
[HOP] Rework Autograd DispatchKey for scan and map
#153336 opened
May 10, 2025 -
[Dynamo][TVM] Check TVM existence and version
#153338 opened
May 11, 2025 -
CMake: update FindCUDAToolkit.cmake, use torch::nvtx3 if present, mod…
#153339 opened
May 11, 2025 -
Adding view and reduction tags
#153342 opened
May 11, 2025 -
[HOP, map] Rework of map autograd to the new interface
#153343 opened
May 11, 2025 -
[Ez][BE]: Make implicit subpackage explicit
#153347 opened
May 11, 2025 -
Fix loading sparse tensors with pinning check in fork context.
#153348 opened
May 11, 2025 -
[DEBUG] Remove has CUDA
#153349 opened
May 11, 2025 -
devmate attempt multi kernel
#153353 opened
May 12, 2025 -
Add Vectorized FP8 E5M2
#153364 opened
May 12, 2025 -
[Inductor][CPP] Enable vectorized fp8 E5M2 quant dequant
#153365 opened
May 12, 2025 -
[ATen][CUDA][CUB] Implement changes to CCCL (CUB/Thrust/LibCUDACXX) usage in ATen
#153373 opened
May 12, 2025 -
Remove mut marker for fused_adagrad in native_functions.yaml
#153376 opened
May 12, 2025 -
nn: add DenseGeneral generalized linear layer
#153381 opened
May 12, 2025 -
basic compile support for grouped_mm
#153384 opened
May 12, 2025 -
[AOTI Debugging] Add Environment Variable to control output path
#153391 opened
May 12, 2025 -
[ONNX] Cast before calling Softmax when dtype is specified
#153393 opened
May 12, 2025 -
Print correct variable names in cuda.cmake
#153402 opened
May 12, 2025 -
Make precompilation timeout configurable via TORCHINDUCTOR_PRECOMPILATION_TIMEOUT_SECONDS environment variable.
#153403 opened
May 12, 2025 -
[DDP] rebuilt bucket order when find_unused_parameters=true
#153404 opened
May 12, 2025 -
Make test_create_graph_and_full_backward_hook_cycle more robust to unrelated warnings
#153407 opened
May 12, 2025 -
defer to aot eager instead of skip frame
#153409 opened
May 12, 2025 -
Enable accelerator to perform streaming backward
#153412 opened
May 12, 2025 -
[BE] Move `BUILD_AOT_INDUCTOR_TEST` to build stage
#153419 opened
May 12, 2025 -
[PT2][Optimus][fp8 computation quantization] Add fallback logic
#153430 opened
May 12, 2025 -
[WIP] test if short circuit is material
#153431 opened
May 12, 2025 -
introduce is_known _contiguous and use it for reshape and tensor meta data computation.
#153432 opened
May 12, 2025 -
[multigraph] add specialize_on kwarg to mark_{dynamic,unbacked}
#153433 opened
May 13, 2025 -
[dynamo][compile-time] Cache frame summaries
#153434 opened
May 13, 2025 -
[executorch hash update] update the pinned executorch hash
#153436 opened
May 13, 2025 -
use known_contiguous for _prim_elementwise_meta short circuit
#153441 opened
May 13, 2025 -
[WIP] suggest whitelist for dynamic shape recompilations
#153442 opened
May 13, 2025 -
[Typing] Refactor `torch.types.Device` in `torch/cuda/__init__.py`
#153447 opened
May 13, 2025 -
[not for review] benchmark script
#153448 opened
May 13, 2025 -
[multigraph] use specializations in compile_and_call_fx_graph
#153449 opened
May 13, 2025 -
[Monitoring] enable local logs and add mac test monitoring
#153454 opened
May 13, 2025 -
[Monitoring] enable rocm monitoring for trunk and general tests
#153455 opened
May 13, 2025 -
[Monitoring] Add util for linux build
#153456 opened
May 13, 2025 -
[cutlass backend] Reduce log level for cutlass runtime error
#153457 opened
May 13, 2025 -
Fix submodule recording in torch script prepare function
#153465 opened
May 13, 2025 -
[BE]: Enable RUFF TRY400 rule - log.exception
#153473 opened
May 13, 2025 -
Fix vs2022 caused AVX512 illegal instruction issue.
#153480 opened
May 13, 2025 -
test pr time
#153481 opened
May 13, 2025 -
Allow HOP-ifying out-of-tree functions in compile
#153487 opened
May 13, 2025 -
Clean PR: Replace _device_t with torch.types.Device and fix lint issues (#152952)
#153493 opened
May 13, 2025 -
[PP] Allow unused kwargs in ZB path
#153498 opened
May 13, 2025 -
[aoti] return a specific error code for sticky cuda errors
#153499 opened
May 13, 2025 -
[wip][ca][ddp] traceable C++ reducer
#153501 opened
May 13, 2025 -
[nativert] port semaphore to c10 util
#153504 opened
May 13, 2025 -
Add `flag _metrics_log_runtime` to disable runtime metric logging by default
#153506 opened
May 13, 2025 -
[dynamo, nested graph breaks] refactor codegen to minimize NULL codegen'ing
#153510 opened
May 14, 2025 -
[CD] Fix the libgomp twice load issue (#150084)
#153518 opened
May 14, 2025 -
Avoid calling fallback directly for symmetric memory tests
#153520 opened
May 14, 2025 -
[multigraph] fix composability with aotautograd cache
#153526 opened
May 14, 2025 -
[dynamo] replace `unimplemented` with `unimplemented_v2` in `variables/functions.py`
#153533 opened
May 14, 2025 -
[BE] Move static package info from `setup.py` to `pyproject.toml`
#153538 opened
May 14, 2025 -
[AUTOCAST] FEAT: Allow passing a `torch.device` object to autocast
#153539 opened
May 14, 2025 -
[Ez][BE]: Remove accidental classvar
#153540 opened
May 14, 2025 -
[BE]: Update CUTLASS submodule to 4.0.0rc
#153541 opened
May 14, 2025 -
[ROCm] update state check for test_trace_while_active*
#153545 opened
May 14, 2025 -
[TESTING] [DO NOT MERGE] Updated triton commit pin
#153548 opened
May 14, 2025 -
[SetSubclass] [wip] Add support for user defined sets
#153553 opened
May 14, 2025 -
[cuBLAS][cuBLASLt] Use cuBLAS default workspace size in Lt
#153556 opened
May 14, 2025 -
[PP] wip, allow grad to be None
#153557 opened
May 14, 2025 -
Improve torch.ops typing
#153558 opened
May 14, 2025 -
Recheck autotune cache on static cuda launcher load
#153565 opened
May 14, 2025 -
[caffe2] Eliminate implicit calls to strlen when using the RECORD_FUNCTION macros
#153567 opened
May 14, 2025 -
[PT2][memory] add missing dependencies due to mutations
#153569 opened
May 14, 2025 -
Treat dim=[] same as dim=None
#153570 opened
May 14, 2025 -
Add getDeviceProperties api to torch mtia device
#153577 opened
May 14, 2025 -
Add torch.profile benchmarking function to feedback_fns
#153579 opened
May 14, 2025 -
Update CMake to latest in MacOS CI jobs
#153583 opened
May 15, 2025 -
[PT2][Optimus][Observability] Refactor the logging to avoid excessive tlparse log
#153584 opened
May 15, 2025 -
[Cutlass] Enable fusion with FusedSchedulerNodes
#153588 opened
May 15, 2025 -
Remove Caffe2_DEPENDENCY_INCLUDE
#153589 opened
May 15, 2025 -
[internal] Expose additional metadata to compilation callbacks
#153596 opened
May 15, 2025 -
Support fp8 output of _scaled_mm for CPU
#153600 opened
May 15, 2025 -
determine whether to round according to dtype
#153601 opened
May 15, 2025 -
support scaled mm on inductor
#153602 opened
May 15, 2025 -
fix a compilation issue when TORCH_XPU_ARCH_LIST is an empty string
#153604 opened
May 15, 2025 -
Fix missing module import graph_break_hints
#153609 opened
May 15, 2025 -
[ROCm] Prefer hipblaslt for gfx1200, gfx1201
#153610 opened
May 15, 2025 -
[Ez][BE] Make implicit subpackage explicit
#153613 opened
May 15, 2025 -
S390x update docker image
#153619 opened
May 15, 2025 -
Update rnn.py, fix `torch.nn.RNN` document error
#153620 opened
May 15, 2025 -
[Dynamo] Introduce hook receiving set of traced files
#153622 opened
May 15, 2025 -
Updates contextlib with ParamSpec
#153623 opened
May 15, 2025 -
[DEBUG] fsdp cpu_offload via sym_mem
#153628 opened
May 15, 2025 -
[BE]: Remove redundant copy
#153629 opened
May 15, 2025 -
[nativert] move recordfunction (#153088)
#153630 opened
May 15, 2025 -
Update serialization docs
#153631 opened
May 15, 2025 -
[ROCm] Improve vectorized elementwise kernel performance in MI300X
#153634 opened
May 15, 2025 -
Add torch/header_only_apis.txt and enforce they're tested
#153635 opened
May 15, 2025 -
[PyTorch][NCCL PG][Resubmit D67193887] Change getNCCLCommDumpMap to use new ncclCommDumpAll API
#153636 opened
May 15, 2025 -
[Release-Only] Make pull linux-jammy-py3.9-gcc11 green
#153639 opened
May 15, 2025 -
[FlexAttention] explicitly create grad_q w/ strides
#153641 opened
May 15, 2025 -
[JIT] Optimize DCE by storing a MemoryLocations for an entire set<Value*>
#153645 opened
May 15, 2025 -
[Torch] Fix error message formatting in fp8 comparison logic
#153647 opened
May 15, 2025 -
[executorch][codegen] support function + method variants.
#153651 opened
May 15, 2025 -
[triton][fb] Move build_paths into triton_utils
#153652 opened
May 15, 2025 -
[Distributed][CI] Rework continuous TestCase
#153653 opened
May 15, 2025 -
init
#153654 opened
May 15, 2025 -
[dynamo] raise observed exception for module attribute errors
#153659 opened
May 15, 2025 -
Fix: specializing symbols after runtime assertions added cause codegen issue
#153661 opened
May 15, 2025 -
[pytorch][triton] Enabling TMA for flex-attention for supported device types
#153662 opened
May 15, 2025 -
[refactor] extract create_resume_fn from create_call_resume_at
#153663 opened
May 15, 2025 -
Redirect mobile_optimizer.rst to executorch
#153664 opened
May 15, 2025 -
ci: add install_llvm_triton.sh to download prebuilt LLVM for Triton
#153665 opened
May 15, 2025 -
Fused RMSNorm implementation
#153666 opened
May 15, 2025 -
[c10d] Consolidate monitoring thread in PGNCCL
#153668 opened
May 15, 2025 -
[MTIA Aten Backend] Migrate "_unsafe_view" and "view" ops from out-of-tree to pytorch in-tree
#153670 opened
May 15, 2025 -
inductor codecache: include private inductor configs in cache key
#153672 opened
May 15, 2025 -
change guard_or impl for better perf and simplicity
#153674 opened
May 16, 2025 -
[cuBLASLt] relax `addmm` cuBLASLt constraint
#153675 opened
May 16, 2025 -
[SymmMem] Speed up tests
#153677 opened
May 16, 2025 -
[Draft][Just for CI VAL]Xpu flex attn ci test
#153680 opened
May 16, 2025 -
[XPU] [Windows] Auto turn on kineto XPU build when compiler version support.
#153681 opened
May 16, 2025 -
init
#153682 opened
May 16, 2025 -
Use latest CMake on Windows
#153683 opened
May 16, 2025 -
Use latest mkl-include and mkl-devel on Windows CI
#153684 opened
May 16, 2025 -
Fix some CMake issues
#153686 opened
May 16, 2025 -
Updated ONNX Opset Version to Support Attention Operator #153611
#153687 opened
May 16, 2025 -
Support boolean tensor for torch.fused_moving_avg_obs_fake_quant on CUDA
#153699 opened
May 16, 2025 -
Updated onnx->symbolic_opset23.py
#153702 opened
May 16, 2025 -
Use magma 2.9.0
#153703 opened
May 16, 2025 -
Integrated AMD AWS runners into Pytorch CI
#153704 opened
May 16, 2025 -
[Torch][NJT] relax schema checks for ops that support more general NJTs
#153706 opened
May 16, 2025 -
[Tiling rewrite pt1] Normalize reads and writes to common iter space
#153723 opened
May 16, 2025 -
[don't merge] dummy pr
#153724 opened
May 16, 2025 -
Add option to statically launch user defined triton kernels
#153725 opened
May 16, 2025 -
[BE]: Try to improve decorator typing in torch._jit_internal
#153726 opened
May 16, 2025 -
[BE]: Improve typing in torch/modules/container.py
#153728 opened
May 16, 2025 -
Analyze coalesced mem
#153730 opened
May 16, 2025 -
[partitioner] Fix _broadcast_on_rank0 to use deterministic hash function
#153734 opened
May 16, 2025 -
Add missing arg descriptions for class RMSNorm(Module)
#153738 opened
May 16, 2025 -
[AOTI] Add a SlimTensor representation
#153739 opened
May 16, 2025 -
[ATen-CPU] Use `math.h` for GeLU as well as `cmath`
#153742 opened
May 16, 2025 -
[not for land] small compile-on-one-rank example
#153743 opened
May 16, 2025 -
[ONNX] Update onnx to 1.18
#153746 opened
May 16, 2025 -
Solve for tilings
#153748 opened
May 16, 2025 -
Patch the _is_conv_node function
#153749 opened
May 16, 2025 -
Update ExecuTorch pin to latest viable/strict 3/28/2025 (#150308)
#153750 opened
May 16, 2025 -
Incorporate coalesce analysis in codegen
#153751 opened
May 16, 2025 -
[Inductor] Construct subgraph with benchmarking args not example_inputs (#153667)
#153752 opened
May 16, 2025 -
[Inductor] Construct subgraph with benchmarking args not example_inputs
#153753 opened
May 16, 2025 -
[Inductor] Subgraph support dynamic input expressions
#153754 opened
May 16, 2025 -
[Inductor] Subgraph check output strides
#153755 opened
May 16, 2025 -
[to run ci] try globally import CUDATemplateCaller
#153758 opened
May 16, 2025 -
[amd] fix tunableop gemm
#153764 opened
May 16, 2025 -
[BE] Import CUDATemplateCaller non-lazily in select_algorithm.py
#153765 opened
May 16, 2025 -
convert inductor codecache to use getArtifactLogger
#153766 opened
May 16, 2025 -
[pytorch] Delete TorchScript based Android demo app and point user to ExecuTorch
#153767 opened
May 16, 2025 -
[aoti] fix corner case in unbacked replacements for atomically_apply_size_hint
#153768 opened
May 16, 2025 -
Don't upload compiler benchmark debug info to the benchmark database
#153769 opened
May 16, 2025 -
[Inductor][XPU] Fallback bmm to mm when batch == 1, align with cuda.
#153770 opened
May 17, 2025 -
RFC: Unbreak torch.is_vulkan_available() on Mac
#153771 opened
May 17, 2025 -
[dynamo, nested graph breaks] remove block stack graph break in output_graph
#153772 opened
May 17, 2025 -
[dynamo, nested graph breaks] add skip_frame debugging function
#153773 opened
May 17, 2025 -
Auto rewrite python if into IF + MERGE
#153774 opened
May 17, 2025
160 Issues closed by 43 people
-
[ROCm] sdpa group query attention bf16 numeric error
#139352 closed
May 16, 2025 -
Ignore this
#153759 closed
May 16, 2025 -
nn.Conv1d gives incorrect result for the last element on CUDA with PyTorch 2.7.0+cu128
#153698 closed
May 16, 2025 -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int64 (__main__.TestForeachCUDA)
#150392 closed
May 16, 2025 -
DISABLED test_comprehensive_sort_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152054 closed
May 16, 2025 -
DISABLED test_comprehensive_nn_functional_max_pool3d_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152053 closed
May 16, 2025 -
DISABLED test_comprehensive_native_layer_norm_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152056 closed
May 16, 2025 -
DISABLED test_comprehensive_pca_lowrank_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152318 closed
May 16, 2025 -
DISABLED test_comprehensive_bitwise_right_shift_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152057 closed
May 16, 2025 -
DISABLED test_comprehensive_floor_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152058 closed
May 16, 2025 -
DISABLED test_comprehensive_linalg_pinv_singular_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152793 closed
May 16, 2025 -
DISABLED test_comprehensive_cummin_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152672 closed
May 16, 2025 -
DISABLED test_comprehensive_index_select_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152416 closed
May 16, 2025 -
DISABLED test_comprehensive_rot90_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152796 closed
May 16, 2025 -
DISABLED test_comprehensive_polygamma_polygamma_n_0_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152469 closed
May 16, 2025 -
DISABLED test_comprehensive_polygamma_polygamma_n_1_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152470 closed
May 16, 2025 -
DISABLED test_comprehensive_repeat_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152500 closed
May 16, 2025 -
DISABLED test_comprehensive_lu_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152520 closed
May 16, 2025 -
DISABLED test_comprehensive_signal_windows_hamming_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152559 closed
May 16, 2025 -
DISABLED test_comprehensive_nansum_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152666 closed
May 16, 2025 -
DISABLED test_comprehensive_polygamma_polygamma_n_0_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152671 closed
May 16, 2025 -
DISABLED test_comprehensive_select_scatter_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152684 closed
May 16, 2025 -
DISABLED test_comprehensive_fliplr_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152797 closed
May 16, 2025 -
DISABLED test_comprehensive_sort_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152892 closed
May 16, 2025 -
DISABLED test_comprehensive___rmul___cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152851 closed
May 16, 2025 -
DISABLED test_comprehensive_nn_functional_conv3d_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152893 closed
May 16, 2025 -
DISABLED test_comprehensive_ormqr_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152943 closed
May 16, 2025 -
DISABLED test_comprehensive_rsub_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152996 closed
May 16, 2025 -
DISABLED test_comprehensive_triu_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152928 closed
May 16, 2025 -
DISABLED test_comprehensive_diagonal_copy_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152890 closed
May 16, 2025 -
DISABLED test_comprehensive_trunc_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#153046 closed
May 16, 2025 -
DISABLED test_comprehensive_asinh_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#153029 closed
May 16, 2025 -
DISABLED test_comprehensive_svd_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#153031 closed
May 16, 2025 -
DISABLED test_comprehensive_special_ndtri_cuda_int64 (__main__.TestInductorOpInfoCUDA)
#153047 closed
May 16, 2025 -
DISABLED test_comprehensive_unbind_copy_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152795 closed
May 16, 2025 -
DISABLED test_comprehensive_rot90_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152927 closed
May 16, 2025 -
DISABLED test_comprehensive_slice_scatter_cuda_bool (__main__.TestInductorOpInfoCUDA)
#152794 closed
May 16, 2025 -
DISABLED test_comprehensive_scatter_reduce_prod_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#153148 closed
May 16, 2025 -
Using int(shape) in export would result in silent specialization
#138853 closed
May 16, 2025 -
DISABLED test_resize_as_mps (__main__.GPUTests)
#153714 closed
May 16, 2025 -
DISABLED test_dynamic_shape_m_20480_k_5_n_2_should_decompose_True_has_bias_True (__main__.TestDecomposeMemMM)
#153732 closed
May 16, 2025 -
OptimizedModule __getattr__ may causes dead recursive call loop
#138157 closed
May 16, 2025 -
DISABLED test_dynamic_shape_m_20480_k_5_n_2_should_decompose_True_has_bias_False (__main__.TestDecomposeMemMM)
#153731 closed
May 16, 2025 -
DISABLED test_checkpointing_without_reentrant_dataparallel (__main__.TestAutogradWithCompiledAutograd)
#153608 closed
May 16, 2025 -
DISABLED test_parity__foreach_abs_fastpath_inplace_cuda_int8 (__main__.TestForeachCUDA)
#150630 closed
May 16, 2025 -
DISABLED test_partitioning_with_view (__main__.MinCutPartitioningTests)
#145345 closed
May 16, 2025 -
DISABLED test_partitioning_unremat_bw (__main__.MinCutPartitioningTests)
#145343 closed
May 16, 2025 -
DISABLED test_torchvision_models_efficientnet_v2_l (__main__.TestVisionTracing)
#152632 closed
May 16, 2025 -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_float64 (__main__.TestForeachCUDA)
#150752 closed
May 16, 2025 -
DISABLED test_dtensor_seq_par_shard_dim_0 (__main__.MicroPipelineTPTest)
#145924 closed
May 16, 2025 -
DISABLED test_aoti_debug_printer_codegen_cuda (__main__.AOTInductorTestABICompatibleGpu)
#149080 closed
May 16, 2025 -
DISABLED test_sdpa_rewriter_14_cuda (__main__.SDPAPatternRewriterCudaTests)
#148391 closed
May 16, 2025 -
Long pause with distributed.tensor.distribute_tensor using B200 GPU
#153401 closed
May 16, 2025 -
[Intel GPU][XPU] torchinfo.summary implicitly transfer the model residing on XPU back to CPU
#153435 closed
May 15, 2025 -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_complex128 (__main__.TestForeachCUDA)
#150933 closed
May 15, 2025 -
RuntimeError: "cuda_scatter_gather_base_kernel_func" not implemented for 'Float8_e4m3fn'
#153621 closed
May 15, 2025 -
`test_aoti_inference` is broken in CI at the moment
#153422 closed
May 15, 2025 -
[ROCm] [Upstream Triton] Use HIPAttrsDescriptor on ROCm to support emitting buffer operations
#139393 closed
May 15, 2025 -
Illegal Instruction Caused by `grid_sample` Under Windows
#152385 closed
May 15, 2025 -
[cudagraphs][HF][torch 2.7] Excessive cudagraph re-recording for HF LLM models
#152275 closed
May 15, 2025 -
`torch._dynamo.config.cache_size_limit` behaviour with DDP
#137081 closed
May 15, 2025 -
`torch.floor_divide` causes inconsistent precision
#153597 closed
May 15, 2025 -
DISABLED test_pending_fusion_pro_and_epi (__main__.TestPrologueFusion)
#152560 closed
May 15, 2025 -
DISABLED test_abs_cuda (__main__.TestInductorDynamicCUDA)
#137224 closed
May 15, 2025 -
DISABLED test_comprehensive_rsub_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#140179 closed
May 15, 2025 -
DISABLED test_graph_partition_reorder_cpu_and_gpu_interleave (__main__.CudaGraphTreeTests)
#152561 closed
May 15, 2025 -
DISABLED test_comprehensive_nanquantile_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#139593 closed
May 15, 2025 -
DISABLED test_comprehensive_linalg_pinv_singular_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152059 closed
May 15, 2025 -
DISABLED test_comprehensive_scatter_reduce_prod_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#140003 closed
May 15, 2025 -
Large queue time for `macos-m2-15` instances
#153563 closed
May 15, 2025 -
[dynamo] Activation checkpointing tests erroring at runtime
#127115 closed
May 15, 2025 -
[inductor][cpu]performance regression in 2025-03-10 nightly release
#149116 closed
May 15, 2025 -
DISABLED test_comprehensive_nanmean_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#140339 closed
May 15, 2025 -
Can't kill compile_worker process
#153593 closed
May 15, 2025 -
`torch.nn.functional.interpolate` doesn't have `[source]`
#153591 closed
May 15, 2025 -
Investigate FlexAttention performance degradation on low precision inputs
#147336 closed
May 15, 2025 -
torchrun in environments without DNS support
#150532 closed
May 15, 2025 -
Restoring SequentialLR has undocumented side-effects on Optimizer
#119168 closed
May 15, 2025 -
DISABLED test_comprehensive_amin_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152558 closed
May 15, 2025 -
DISABLED AotInductorTest.UpdateUserManagedConstantsCuda (build.bin.test_aoti_inference)
#153496 closed
May 14, 2025 -
Better namings for triton fusion ops when a custom triton kernel is present?
#134706 closed
May 14, 2025 -
Softmax Decomp Causes Incorrect Gradients when Using `torch.compile` with `F.multi_head_attention_forward`
#152309 closed
May 14, 2025 -
Newly added lint-urls jobs are very flaky
#152439 closed
May 14, 2025 -
[NJT] NestedTensor repr has contiguous=True while the NJT isn't contiguous
#153237 closed
May 14, 2025 -
DISABLED AotInductorTest.BasicPackageLoaderTestCuda (build.bin.test_aoti_inference)
#152674 closed
May 14, 2025 -
[Intel GPU][PT2.8]scaled_dot_product_attention returns wrong output
#152290 closed
May 14, 2025 -
[PREEMPTIVE] Removal of `ephemeral` variants on `scale-config.yml`
#153468 closed
May 14, 2025 -
Loss parallel's override of log_softmax doesn't support negative dims
#152016 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_float32 (__main__.TestForeachCUDA)
#150747 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_inplace_cuda_uint8 (__main__.TestForeachCUDA)
#150662 closed
May 14, 2025 -
DISABLED test_foreach_l2_large_value_input__foreach_norm_cuda_bfloat16 (__main__.TestForeachCUDA)
#150467 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_inplace_cuda_int64 (__main__.TestForeachCUDA)
#150617 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_inplace_cuda_float64 (__main__.TestForeachCUDA)
#150562 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_inplace_cuda_float16 (__main__.TestForeachCUDA)
#150510 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_bfloat16 (__main__.TestForeachCUDA)
#150668 closed
May 14, 2025 -
DISABLED test_foreach_l2_large_value_input__foreach_norm_cuda_float16 (__main__.TestForeachCUDA)
#150509 closed
May 14, 2025 -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_bfloat16 (__main__.TestForeachCUDA)
#150902 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_int8 (__main__.TestForeachCUDA)
#150837 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_bool (__main__.TestForeachCUDA)
#150680 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_inplace_cuda_int16 (__main__.TestForeachCUDA)
#150590 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_inplace_cuda_int32 (__main__.TestForeachCUDA)
#150602 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_int64 (__main__.TestForeachCUDA)
#150822 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_int32 (__main__.TestForeachCUDA)
#150800 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_inplace_cuda_bool (__main__.TestForeachCUDA)
#150468 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_uint8 (__main__.TestForeachCUDA)
#150878 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_int16 (__main__.TestForeachCUDA)
#150772 closed
May 14, 2025 -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_float16 (__main__.TestForeachCUDA)
#150712 closed
May 14, 2025 -
Torch index is missing hash for aarch64 wheels
#153469 closed
May 14, 2025 -
Pip-installed pytorch limits threads to 1 when setting GOMP_CPU_AFFINITY (likely due to bundled GOMP)
#149422 closed
May 14, 2025 -
Elastic training crashes on killed agent
#150916 closed
May 14, 2025 -
[torch/elastic] unexpected behavior of torch elastic
#147064 closed
May 14, 2025 -
Export doesn't work with patched forward
#153086 closed
May 13, 2025 -
multiple values for argument `softmax_scale`
#101603 closed
May 13, 2025 -
UNSTABLE inductor / unit-test / cuda12.6-py3.10-gcc9-sm86 / test (inductor_cpp_wrapper)
#152916 closed
May 13, 2025 -
DISABLED test_comprehensive_nansum_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#139710 closed
May 13, 2025 -
DISABLED test_input_moved_to_cuda_device_script (__main__.TensorPipeCudaRemoteModuleTest)
#152415 closed
May 13, 2025 -
[Pytorch 2.0] torch::nn::Dropout output is incorrect on Windows
#103056 closed
May 13, 2025 -
[Performance] `tensordot` has substantial overhead
#145731 closed
May 13, 2025 -
Remove PyTorch conda installation instructions from the documentation and tutorials
#149551 closed
May 13, 2025 -
Segmentation Fault in torch.lu_unpack() with bfloat16 Tensor
#153232 closed
May 13, 2025 -
OSS CI Infra Storm (Scenario 1 + 2) - May 7, 2025
#153068 closed
May 13, 2025 -
AOTAutograd support for torch.export targeting model *inference*
#153251 closed
May 13, 2025 -
`torch.quantize_per_channel` performs differently on cpu and cuda
#153341 closed
May 13, 2025 -
`torch.quantize_per_tensor` performs differently on cpu and cuda
#153340 closed
May 13, 2025 -
torch.compile on MPS fails: generated Metal kernel uses loop-local variable out of scope
#152155 closed
May 13, 2025 -
UNSTABLE pull / linux-jammy-py3-clang12-executorch / test (executorch)
#144480 closed
May 13, 2025 -
DISABLED test_comprehensive_std_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152673 closed
May 12, 2025 -
[RFC] Universal Device Context and Safe GPU/CPU Execution Decorators
#152679 closed
May 12, 2025 -
avoid falling back to as_strided for non-contiguous in-place reshape.
#152972 closed
May 12, 2025 -
`torch.ldexp` incorrectly returns infinity if `exp` is larger than log2 of the max representable number
#133265 closed
May 12, 2025 -
`torch.cumsum()` on bfloat16 tensor with dtype=torch.int8 produce inconsistent results between CPU and GPU
#153359 closed
May 12, 2025 -
[c10d] ProcessGroupNCCL cuda streams got merged in nightly
#153296 closed
May 12, 2025 -
DISABLED test_input_codegen_with_sympy_expr_xpu (__main__.AOTInductorTestABICompatibleGpu)
#153123 closed
May 12, 2025 -
Backport CVEs in Pytorch v2.1.2 from Pytorch v2.6.0
#153370 closed
May 12, 2025 -
Semi-Structured Sparsity unsupported for Windows
#125302 closed
May 12, 2025 -
DISABLED test_tensor_with_grad_to_scalar_warning (__main__.TestTorch)
#150273 closed
May 12, 2025 -
DISABLED test_comprehensive_scatter_xpu_bool (__main__.TestInductorOpInfoXPU)
#153009 closed
May 12, 2025 -
[ROCm] MI300X FP8 scaled_mm is extremely slow on nightly
#143465 closed
May 12, 2025 -
DISABLED test_comprehensive_scatter_xpu_int64 (__main__.TestInductorOpInfoXPU)
#153008 closed
May 12, 2025 -
DISABLED test_byte_tensor_assignment (__main__.TestAdvancedIndexing)
#137028 closed
May 12, 2025 -
DISABLED test_cdist_large_batch (__main__.TestMPS)
#92078 closed
May 12, 2025 -
DISABLED test_numpy_ref_mps_nn_functional_conv_transpose1d_mps_float32 (__main__.TestCommonMPS)
#87542 closed
May 12, 2025 -
DISABLED test_numpy_ref_mps_nn_functional_group_norm_mps_float32 (__main__.TestCommonMPS)
#90894 closed
May 12, 2025 -
`AutoModel.from_pretrained(...)` fails under `with torch.device("meta")` with PyTorch 2.7.0
#153332 closed
May 10, 2025 -
We should include where specialization happens when we throw a constraint violation error
#152918 closed
May 10, 2025
135 Issues opened by 74 people
-
DISABLED test_impl_device_cpu (__main__.TestCustomOp)
#153763 opened
May 16, 2025 -
DISABLED test_var_mean_tile_reduction_False_mps (__main__.GPUTests)
#153762 opened
May 16, 2025 -
DISABLED test_tensor_index_put_slice_mps (__main__.GPUTests)
#153761 opened
May 16, 2025 -
CUDA not found in NVIDIA runners
#153760 opened
May 16, 2025 -
Runtime assertions ignored in many cases
#153756 opened
May 16, 2025 -
DISABLED test_mean_mps (__main__.GPUTests)
#153747 opened
May 16, 2025 -
[RFC] [Feature] Intra-Device Heterogeneous Memory Allocation Support
#153745 opened
May 16, 2025 -
SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats - PyTorch compile fails with Python 3.12
#153737 opened
May 16, 2025 -
DISABLED test_dynamic_shape_m_20480_k_5_n_2_should_decompose_True_has_bias_True (__main__.TestDecomposeMemMM)
#153736 opened
May 16, 2025 -
DISABLED test_dynamic_shape_m_20480_k_5_n_2_should_decompose_True_has_bias_False (__main__.TestDecomposeMemMM)
#153735 opened
May 16, 2025 -
[SOURCE] links for patch releases lead to 2.7.1
#153733 opened
May 16, 2025 -
JIT Script doesn't handle LLONG_MIN correctly
#153722 opened
May 16, 2025 -
DISABLED test_max_pool2d_with_indices_backward5_mps (__main__.GPUTests)
#153721 opened
May 16, 2025 -
DISABLED test_view_uint8_through_differing_bitwidths_mps (__main__.GPUTests)
#153720 opened
May 16, 2025 -
DISABLED test_shape_padding_mps (__main__.GPUTests)
#153719 opened
May 16, 2025 -
DISABLED test_topk_mps (__main__.GPUTests)
#153718 opened
May 16, 2025 -
DISABLED test_view_on_aliased_mps (__main__.GPUTests)
#153717 opened
May 16, 2025 -
DISABLED test_resize_as_mps (__main__.GPUTests)
#153716 opened
May 16, 2025 -
DISABLED test_xblock_divides_xnumel_mps (__main__.GPUTests)
#153715 opened
May 16, 2025 -
DISABLED test_lerp_mps (__main__.GPUTests)
#153713 opened
May 16, 2025 -
DISABLED test_where_with_logical_op_mps (__main__.GPUTests)
#153712 opened
May 16, 2025 -
DISABLED test_searchsorted_mps (__main__.GPUTests)
#153711 opened
May 16, 2025 -
DISABLED test_dtypeview_bfloat16_bfloat16_mps (__main__.GPUTests)
#153710 opened
May 16, 2025 -
DISABLED test_tensor1_mps (__main__.GPUTests)
#153709 opened
May 16, 2025 -
DISABLED test_zero_element_mutation_mps (__main__.GPUTests)
#153708 opened
May 16, 2025 -
DISABLED test_jacobian_vectorize_raises_no_warnings_logging_tensor (__main__.TestAutogradFunctional)
#153707 opened
May 16, 2025 -
Unable to export a model using scan with inplace modification
#153705 opened
May 16, 2025 -
Divergence of handling python del in dynamo vs eager
#153701 opened
May 16, 2025 -
`torch.nn.functional.conv_transpose2d` has inconsistent handling of `float16` overflow on CPU
#153700 opened
May 16, 2025 -
AOTInductor: Artifact compiled on A10 (SM_86) fails on H20 (SM_90) despite torch._inductor.config.cuda.arch="90"
#153697 opened
May 16, 2025 -
DISABLED test_dropout_trivial_1_mps (__main__.GPUTests)
#153695 opened
May 16, 2025 -
DISABLED test_div_zero_dim_mps (__main__.GPUTests)
#153694 opened
May 16, 2025 -
DISABLED test_tmp_not_defined_issue2_mps (__main__.GPUTests)
#153693 opened
May 16, 2025 -
DISABLED test_views1_mps (__main__.GPUTests)
#153692 opened
May 16, 2025 -
DISABLED test_var_correction_mps (__main__.GPUTests)
#153691 opened
May 16, 2025 -
Torch tensor can not register for RDMA,call ibv_reg_mr() failed: Bad address
#153688 opened
May 16, 2025 -
torch._higher_order_ops.scan incorrect/mismatched gradients for non-trailing layers with torch.compile
#153679 opened
May 16, 2025 -
Update FlexAttention TMA usage to TensorDescriptor when we bump Triton and remove BlockPtr Usage
#153678 opened
May 16, 2025 -
[FSDP2] revisit NCCL group coalescing vs copy-in/copy-out
#153673 opened
May 16, 2025 -
Make tlparse able to show a summary of distinct graph breaks
#153669 opened
May 15, 2025 -
Use opmath_t and not double compute in fused optimizers
#153649 opened
May 15, 2025 -
[dynamo, logging] Move extra graph_code logging to a verbose artifact
#153646 opened
May 15, 2025 -
DISABLED test_hessian_vectorize_raises_no_warnings_logging_tensor (__main__.TestAutogradFunctional)
#153644 opened
May 15, 2025 -
supporting dynamo compilation of end-to-end send/recv in distributed
#153642 opened
May 15, 2025 -
Can't run lintrunner locally on Windows machine
#153638 opened
May 15, 2025 -
[XPU] Kineto profiler fails on XPU with `PTI_ERROR_NOT_IMPLEMENTED`
#153632 opened
May 15, 2025 -
Non-negligible overhead of `OpOverloadPacket` dispatch w/o overload
#153626 opened
May 15, 2025 -
DISABLED test_dict_keys_match (__main__.TestGuardSerialization)
#153617 opened
May 15, 2025 -
DISABLED test_decompose_mm_cpu_m_1_k_64_n_32_should_decompose_False (__main__.TestDecomposeMemMM)
#153616 opened
May 15, 2025 -
[DTensor] Backward Pass Failure with 2D Sharded DTensor After Mean Reduction
#153615 opened
May 15, 2025 -
Add opentelemetry traces
#153614 opened
May 15, 2025 -
Update ONNX Opset Version to Support Attention Operator
#153611 opened
May 15, 2025 -
native/indexingUtils.h `AdvancedIndex` has internal linkage
#153606 opened
May 15, 2025 -
[dynamo] `aot_eager` can't process `try...except` when meeting `AttributeError`
#153605 opened
May 15, 2025 -
[DTensor] Scalar multiplication after reduction doesn't update result without calling .full_tensor() before
#153603 opened
May 15, 2025 -
`torch.export` (with `strict=True`) in torch 2.7 and 2.7.1 RC fails some test cases that work with torch 2.6
#153599 opened
May 15, 2025 -
[cuBLAS] relax the restrictions on the use of cublasLt
#153590 opened
May 15, 2025 -
do_bench_using_profiling fails with: Failed to divide all profiling events into #repeat groups.
#153587 opened
May 15, 2025 -
UT failure in test_decompose_mem_bound_mm.py for Inductor
#153585 opened
May 15, 2025 -
Documentation Preview and Build Not Reflecting Latest Changes
#153574 opened
May 14, 2025 -
torch.cuda.memory._record_memory_history(enabled=None) does not clean up previously added hooks
#153571 opened
May 14, 2025 -
torch.linalg.vector_norm regression in torch.compile in PT2.8
#153568 opened
May 14, 2025 -
Improve PyTorch Logging to Distinguish ROCm from CUDA
#153566 opened
May 14, 2025 -
Sum difference for equal channels of tensor
#153564 opened
May 14, 2025 -
Slicing of large tensors is wrong on MPS
#153560 opened
May 14, 2025 -
Updated Scaled_mm to support more scaling formats via CuBlas
#153555 opened
May 14, 2025 -
Set inplace operations are not updating the set inplace
#153552 opened
May 14, 2025 -
Make implicit packages (PEP420) explicit PyTorch
#153546 opened
May 14, 2025 -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float64 (__main__.TestForeachCUDA)
#153544 opened
May 14, 2025 -
torch.set_ on a view does not sever view relation
#153542 opened
May 14, 2025 -
DISABLED test_parity__foreach_add_fastpath_outplace_cuda_bfloat16 (__main__.TestForeachCUDA)
#153537 opened
May 14, 2025 -
`torch.utils.data.default_collate` raises misleading warning for read-only NumPy arrays
#153536 opened
May 14, 2025 -
libtorch c++: torch::split does not work on 1d tensors
#153535 opened
May 14, 2025 -
`pca_lowrank` on CUDA is significantly slower than on CPU
#153534 opened
May 14, 2025 -
[RFC] [Inductor] Custom pass registration interface
#153532 opened
May 14, 2025 -
Documentation: A possible mistake in the STFT formula
#153531 opened
May 14, 2025 -
DISABLED test_dict_contains (__main__.TestGuardSerialization)
#153530 opened
May 14, 2025 -
`flex attention + torch.compile`: works with torch 2.6 but fails with torch 2.7 / 2.7.1-RC
#153527 opened
May 14, 2025 -
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_uint8 (__main__.TestForeachCUDA)
#153525 opened
May 14, 2025 -
[CI][CUDA][Distributed] test_non_blocking_with_eager_init timeout
#153517 opened
May 14, 2025 -
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_int8 (__main__.TestForeachCUDA)
#153512 opened
May 14, 2025 -
Add MXFP8 Support to scaled_grouped_gemm
#153502 opened
May 13, 2025 -
NotImplementedError: Could not run 'aten::log' with arguments from the 'SparseCUDA' backend.
#153497 opened
May 13, 2025 -
DISABLED test_flop_counter_op_options1_cuda_float32 (__main__.TestSchedulerCUDA)
#153495 opened
May 13, 2025 -
DISABLED test_flop_counter_op_options1_cuda_float16 (__main__.TestSchedulerCUDA)
#153494 opened
May 13, 2025 -
at::Tag::needs_fixed_stride_order doesn't work with lists?
#153489 opened
May 13, 2025 -
Double backward error in Pytorch PP
#153485 opened
May 13, 2025 -
Pytorch PP requires all parameters to have grad in backward
#153484 opened
May 13, 2025 -
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_int64 (__main__.TestForeachCUDA)
#153482 opened
May 13, 2025 -
[CI][CUDA][Distributed] test_assert_nan_float16 unit test hangs with certain Host OS + CUDA KMD 570.133.07
#153479 opened
May 13, 2025 -
cpp wrapper calls back to python for custom op even when a C++ registration is made
#153478 opened
May 13, 2025 -
[BUG] `einops` is unsupported and break dynamo graph with torch 2.7
#153476 opened
May 13, 2025 -
User Triton Kernels Are not Serialized in Fx Graph Runnable
#153475 opened
May 13, 2025 -
Multihead Attention does not work with jagged tensors due to __torch_function__
#153472 opened
May 13, 2025 -
RFC for vector length agnostic SVE Vectorized class
#153471 opened
May 13, 2025 -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float32 (__main__.TestForeachCUDA)
#153470 opened
May 13, 2025 -
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_int32 (__main__.TestForeachCUDA)
#153464 opened
May 13, 2025 -
Feedback about Class Tensor
#153463 opened
May 13, 2025 -
[ROCm][TunableOp] Contents of untuned csv files are ignored during offline tuning
#153462 opened
May 13, 2025 -
DISABLED test_mempool_ctx_multithread (__main__.TestMemPool)
#153460 opened
May 13, 2025 -
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_int16 (__main__.TestForeachCUDA)
#153440 opened
May 13, 2025 -
[Intel GPU][XPU] Slow DDP training using oneCCL backend
#153438 opened
May 13, 2025 -
torch._higher_order_ops.scan graph breaks and clamp error with compile/autograd
#153437 opened
May 13, 2025 -
[Export] Non-strict mode can't handle conditionals on tensor subclass types
#153429 opened
May 12, 2025 -
[export] torch.tensor constructor specializes on float value
#153411 opened
May 12, 2025 -
[Feature request] `torch.export` .save/.load could support `safetensors` and/or `weights_only=True`
#153410 opened
May 12, 2025 -
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_float64 (__main__.TestForeachCUDA)
#153395 opened
May 12, 2025 -
[inductor] Make precompilation_timeout_seconds into a config instead of hardcoded it as 3600
#153392 opened
May 12, 2025 -
Compile produces different result than eager for mutable custom op use case
#153389 opened
May 12, 2025 -
Pre-dispatch export doesn't work with non-param/buffer tensor subclasses
#153387 opened
May 12, 2025 -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float16 (__main__.TestForeachCUDA)
#153379 opened
May 12, 2025 -
MPS backend silently ignores dimension mismatch
#153378 opened
May 12, 2025 -
Triton Compilation Error in Generated Code due to possible float division in index
#153375 opened
May 12, 2025 -
Hide getitems in Dynamo bytecode profiling
#153372 opened
May 12, 2025 -
Flex attention with NJT shape error
#153371 opened
May 12, 2025 -
Rank {rank} has different values for {name}: {scalar_tensor_value}.
#153369 opened
May 12, 2025 -
SAM2 image encoder incorrectness with compile(dynamic=True)
#153366 opened
May 12, 2025 -
Query Regarding Memory Release API in AOTInductor for PyTorch
#153363 opened
May 12, 2025 -
MANIFEST.in is not being used
#153361 opened
May 12, 2025 -
torch.aminmax reducing over all dims should not accept keepdim
#153360 opened
May 12, 2025 -
torch.dequantize result inconsistent on CPU and GPU
#153358 opened
May 12, 2025 -
FSDP2 "got mixed torch.Tensor and DTensor"
#153354 opened
May 12, 2025 -
[Inductor][Schedule][Fusion] Ops are not fused due to the incorrect score_fusion_memory.
#153346 opened
May 11, 2025 -
BlockMask.from_kv_blocks crashes with IndexError when kv_indices is not padded
#153344 opened
May 11, 2025 -
`torch.combinations` exhibits excessive memory usage and hangs for moderate `n` and `r` due to `n^r`
#153337 opened
May 11, 2025 -
Selective Activation Checkpointing on custom autograd.Function
#153334 opened
May 10, 2025 -
importing torch._dynamo under meta device fails
#153330 opened
May 10, 2025 -
Segmentation fault when converting sparse COO tensor with complex values to dense
#153329 opened
May 10, 2025 -
FPE when calling `torch.pixel_shuffle()` with empty tensors and large upscale_factor
#153327 opened
May 10, 2025 -
Segmentation fault when calling `torch.choose_qparams_optimized()` with empty tensors and extreme num_bins value
#153326 opened
May 10, 2025 -
[JIT] Compilation-induced discrepancy in F.instance_norm when passing input as running stats
#153315 opened
May 10, 2025
481 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[nativert] Move GraphSignature to pytorch core
#152969 commented on
May 16, 2025 • 21 new comments -
Update auto-tuning support for _scaled_grouped_mm
#150944 commented on
May 16, 2025 • 15 new comments -
[DLPack] Add support for missing keyword-arguments.
#150218 commented on
May 14, 2025 • 14 new comments -
Inductor logging + analysis of torch.profile
#149697 commented on
May 15, 2025 • 13 new comments -
auto functionalize base_hop
#151067 commented on
May 12, 2025 • 12 new comments -
Upgrade to DLPack 1.0.
#145000 commented on
May 14, 2025 • 12 new comments -
[ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices
#152341 commented on
May 16, 2025 • 11 new comments -
Extract DeviceType to a standalone header file
#152787 commented on
May 16, 2025 • 10 new comments -
[Intel GPU] Enable mkdnn._linear_pointwise at XPU backend
#140365 commented on
May 15, 2025 • 9 new comments -
[CI][CUDA] Move cu118 distributed pull jobs to cu126, move cu124-sm75 to cu126-sm75
#151594 commented on
May 17, 2025 • 8 new comments -
[HOP] Mutation and alias rework
#146658 commented on
May 17, 2025 • 7 new comments -
Data dependent free reshape.
#153198 commented on
May 15, 2025 • 7 new comments -
Pattern matcher support for mutable ops with view inputs
#152776 commented on
May 13, 2025 • 6 new comments -
Adding XPU support to DTensor examples
#153213 commented on
May 15, 2025 • 6 new comments -
[dynamic shapes] rewrite slice_forward decomp with guard_or_false
#150474 commented on
May 15, 2025 • 6 new comments -
[Intel GPU][Inductor] Fallback embedding_dense_backward on XPU
#151637 commented on
May 17, 2025 • 6 new comments -
[fsdp] add an experimental allocator hook for buffers that participate in collective communication
#149150 commented on
May 16, 2025 • 5 new comments -
[Dynamo] Replace `unimplemented` with `unimplemented_v2` in `torch/_dynamo/variables/tensor.py`
#153146 commented on
May 15, 2025 • 5 new comments -
Fix `lr_scheduler` unexpectedly calls `step()` when init argument last_epoch is larger than -1
#149312 commented on
May 16, 2025 • 4 new comments -
add device generalisation support for distributed tests
#152471 commented on
May 13, 2025 • 4 new comments -
[nativert] Move file_util to pytorch core
#153162 commented on
May 15, 2025 • 4 new comments -
[MegaCache] Make MegaCache generic to allow external plugins registration
#152977 commented on
May 16, 2025 • 4 new comments -
[CUDA][cuBLASLt] Respect `allow[FP16/BF16]ReductionCuBLAS` in `cuBLASLt`
#153095 commented on
May 17, 2025 • 3 new comments -
Fix DLPack stream logic.
#150217 commented on
May 14, 2025 • 3 new comments -
[BE] Ensure generated stub files by `gen_pyi` are properly formatted
#150730 commented on
May 17, 2025 • 3 new comments -
[associative_scan] Autograd separated
#139939 commented on
May 17, 2025 • 3 new comments -
[hop_schema] support gen_schema for invoke_subgraph
#152984 commented on
May 13, 2025 • 3 new comments -
[Cutlass] E2E Tests for EVT
#152815 commented on
May 17, 2025 • 3 new comments -
[ROCm][CI] Update dockerfile to use centos9
#151929 commented on
May 16, 2025 • 3 new comments -
softmax: add device check for xpu with half_to_float
#150278 commented on
May 15, 2025 • 3 new comments -
[c10d] Add support for testing SIGABRT return
#153167 commented on
May 13, 2025 • 3 new comments -
Support independent builds for cpp extension tests + apply to libtorch_agnostic tests
#153264 commented on
May 15, 2025 • 3 new comments -
[Intel GPU] scalar tensor case handling in addmm, baddmm
#153051 commented on
May 17, 2025 • 2 new comments -
[BE] Resolve lint errors in `.pyi` stub files
#150731 commented on
May 17, 2025 • 2 new comments -
Remove redundant type aliases of _device_t for torch.Device (#152952)
#153007 commented on
May 13, 2025 • 2 new comments -
[DLPack] add NumPy exchange tests.
#150216 commented on
May 14, 2025 • 2 new comments -
[CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions
#153101 commented on
May 15, 2025 • 2 new comments -
[jit] DeadCodeEliminator Mark(block) improvement
#152348 commented on
May 13, 2025 • 2 new comments -
Make require_contiguous require exact strides instead of stride order
#148424 commented on
May 13, 2025 • 2 new comments -
Add path used by pip's build isolation procedure to DLL search
#150013 commented on
May 12, 2025 • 2 new comments -
Basic utilities to support remote autotuning
#153201 commented on
May 13, 2025 • 2 new comments -
[export] Move PT2 constants to torch::_export
#153206 commented on
May 17, 2025 • 2 new comments -
Remove Conda Instructions
#152546 commented on
May 12, 2025 • 2 new comments -
Work around MPSGraph issue in backward pass of nn.ReplicationPad1d/2d
#152094 commented on
May 16, 2025 • 2 new comments -
Make `Adam`, `AdamW` work with nonzero-dim Tensor betas
#149939 commented on
May 15, 2025 • 2 new comments -
Cache code generation during triton template expansion and enable it for mm_template.
#151773 commented on
May 16, 2025 • 2 new comments -
Re-enable link linter
#153280 commented on
May 12, 2025 • 2 new comments -
[Inductor] Restrict block analysis to only match integer dims and strides
#149615 commented on
May 12, 2025 • 1 new comment -
[Inductor] Pattern matcher support for mutable ops with non-view inputs
#152775 commented on
May 13, 2025 • 1 new comment -
[BE][CI][Easy] Run `lintrunner` on generated `.pyi` stub files
#150732 commented on
May 17, 2025 • 1 new comment -
[AOTI] Embed cubin files into .so
#150739 commented on
May 16, 2025 • 1 new comment -
[export] add runtime assert messages to python torch checks (#150719)
#152455 commented on
May 15, 2025 • 1 new comment -
autograd: Add VJP and JVP rules for aten::aminmax
#151186 commented on
May 14, 2025 • 1 new comment -
Fix `MaskedTensor` to device ignored mask
#151205 commented on
May 12, 2025 • 1 new comment -
[ROCm] improve sparse addmm, enable complex
#153262 commented on
May 16, 2025 • 1 new comment -
Add assertion to align with cuda
#153233 commented on
May 15, 2025 • 1 new comment -
[ARM] Add test_memory_profiler to aarch64 tests
#145260 commented on
May 14, 2025 • 1 new comment -
Add option to define OpenBLAS version for manylinux Dockerfile_2_28_aarch64
#150106 commented on
May 12, 2025 • 1 new comment -
[ROCm][Windows] Fix building torch 2.8 wheel with ROCm (added hipblasLt and rocblas directories)
#153144 commented on
May 16, 2025 • 1 new comment -
Parallelize sort using libstdc++ parallel mode
#150195 commented on
May 13, 2025 • 1 new comment -
fix slice w/ dynamic shapes
#153131 commented on
May 15, 2025 • 1 new comment -
[WIP][dynamic shapes] mark backed size symbols as size-like
#146335 commented on
May 16, 2025 • 1 new comment -
Deprecate DataLoader pin_memory_device param
#146821 commented on
May 15, 2025 • 1 new comment -
[inductor] Fix block ptr store if input is constant
#148679 commented on
May 10, 2025 • 1 new comment -
removed zero dim cpu logic from fake_tensor.py
#147501 commented on
May 15, 2025 • 1 new comment -
Raise `BufferError` for DLPack buffer-related errors.
#150691 commented on
May 14, 2025 • 1 new comment -
[Intel GPU] OneDNN primitive cache support for Int4 WOQ gemm on XPU
#147693 commented on
May 15, 2025 • 1 new comment -
Random Batch Sampler Speedup
#147706 commented on
May 15, 2025 • 1 new comment -
[DO NOT MERGE] [TRITON] Test enablement of buffer ops in AMD triton
#149041 commented on
May 12, 2025 • 0 new comments -
Support int step for nonfused optimizer
#148956 commented on
May 10, 2025 • 0 new comments -
WIP heuristic choices part 2
#148947 commented on
May 10, 2025 • 0 new comments -
Move token linter code into tools/linter/adaptors/_linter/
#148959 commented on
May 12, 2025 • 0 new comments -
fix untyped decorator lints
#149055 commented on
May 12, 2025 • 0 new comments -
[do-not-land] test eval_frame changes
#149066 commented on
May 12, 2025 • 0 new comments -
[do-not-land] test decorator changes
#149067 commented on
May 12, 2025 • 0 new comments -
[Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging
#148981 commented on
May 16, 2025 • 0 new comments -
bootcamp task for DTensor
#148932 commented on
May 12, 2025 • 0 new comments -
[RFC][BE] assume error checking is on by default (#141914)
#148900 commented on
May 11, 2025 • 0 new comments -
Mark auto_functionalized HOPs as cacheable (#151194)
#153304 commented on
May 15, 2025 • 0 new comments -
Register flop formulas for flex attention
#149366 commented on
May 17, 2025 • 0 new comments -
Add x86-simd-sort accelerated sorting
#149362 commented on
May 13, 2025 • 0 new comments -
[dtensor][tp] debug test_layer_norm_bwd_req_grad timeout when #GPU=3
#149355 commented on
May 17, 2025 • 0 new comments -
[Partition] Fix flaky
#149348 commented on
May 17, 2025 • 0 new comments -
[DCP][Draft] Checkpoint daemon process fixes
#149341 commented on
May 16, 2025 • 0 new comments -
Expose GIL and GC events in profiler traces
#149329 commented on
May 16, 2025 • 0 new comments -
[BE]: Update mypy to 1.15
#149326 commented on
May 16, 2025 • 0 new comments -
add support for numpy
#149288 commented on
May 16, 2025 • 0 new comments -
added documentation for masked_fill and masked_fill_
#149285 commented on
May 16, 2025 • 0 new comments -
[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for `sm90`, `sm100`
#149282 commented on
May 14, 2025 • 0 new comments -
Make Subset dataset a true wrapper
#149272 commented on
May 15, 2025 • 0 new comments -
Fix multiprocessing with CUDA_VISIBLE_DEVICES seems to give the wrong device
#149248 commented on
May 16, 2025 • 0 new comments -
Ensure conj_physical always does a physical conjugation
#149226 commented on
May 14, 2025 • 0 new comments -
[CI INFRA TEST] Test experiment for ephemeral runners
#149192 commented on
May 13, 2025 • 0 new comments -
[Easy] update pip sources for CUDA in nightly pull tool
#149143 commented on
May 16, 2025 • 0 new comments -
[CI] Move ASAN jobs to clang-18
#149099 commented on
May 13, 2025 • 0 new comments -
[do-not-land] add tests
#149068 commented on
May 12, 2025 • 0 new comments -
Custom ops support arbitrary input types by migrating to python dispatcher
#147927 commented on
May 16, 2025 • 0 new comments -
[CI] add missing matrix cases for `pytorch-linux-focal-py{3.12,3.13}-clang10`
#147882 commented on
May 12, 2025 • 0 new comments -
Bf16 fused adam(W)
#147653 commented on
May 14, 2025 • 0 new comments -
torch.sort: Optimize memory usage with (dtype_indices: ScalarType, dynamic_indices_dtype: bool) options
#147629 commented on
May 11, 2025 • 0 new comments -
[test] sccache log
#147470 commented on
May 17, 2025 • 0 new comments -
[ONNX] Migrate onnx ops decomp functions
#147469 commented on
May 13, 2025 • 0 new comments -
Ensure conj/neg flags are set in destination for CUDA->CPU copies
#147231 commented on
May 14, 2025 • 0 new comments -
Record the XPU and XCCL build settings in the compiled binary
#147161 commented on
May 17, 2025 • 0 new comments -
Optimize LRScheduler docs
#146684 commented on
May 12, 2025 • 0 new comments -
Fix one_hot inconsistent errors after compile
#146466 commented on
May 14, 2025 • 0 new comments -
Add optional generator to distribution sampler/rsample methods.
#146333 commented on
May 15, 2025 • 0 new comments -
[Trace PyDispatcher] Capture Vmapped autograd function as graph
#146288 commented on
May 13, 2025 • 0 new comments -
Format tests by PYFMT
#146267 commented on
May 14, 2025 • 0 new comments -
Fix support for nccl < 2.17
#145719 commented on
May 10, 2025 • 0 new comments -
[Easy] update pip sources for ROCm in nightly pull tool
#145685 commented on
May 16, 2025 • 0 new comments -
removed check for ConvTranspose3D on MPS
#145366 commented on
May 14, 2025 • 0 new comments -
[BE]: Apply ruff PERF401 to torch
#145153 commented on
May 16, 2025 • 0 new comments -
Update fbgemm_gpu pin
#144905 commented on
May 16, 2025 • 0 new comments -
Enable CPP Extension Open Registration tests on Arm
#144774 commented on
May 13, 2025 • 0 new comments -
[dynamo, nested graph breaks] add nested graph break tests
#144516 commented on
May 17, 2025 • 0 new comments -
[Draft][WIP] Enable XPU path for FlexAttention
#143553 commented on
May 16, 2025 • 0 new comments -
Remove __ubsan_ignore_undefined__
#143252 commented on
May 13, 2025 • 0 new comments -
Enable AArch64 CI scripts to be used for local dev
#143190 commented on
May 15, 2025 • 0 new comments -
Use reclaim for temporary python tensor creation
#143174 commented on
May 15, 2025 • 0 new comments -
[1/N] Enable clang-tidy on caffe2/serialize/
#141849 commented on
May 14, 2025 • 0 new comments -
`has_triton`: Use the device interface for detecting Triton availability
#139171 commented on
May 10, 2025 • 0 new comments -
[DRAFT] make reshape work for reshapeing 1dim unbacked non-contig to anything
#148899 commented on
May 16, 2025 • 0 new comments -
FSDP: use Work.wait instead of event for all reduce
#148780 commented on
May 11, 2025 • 0 new comments -
cpp_wrapper: build non-performance-sensitive code at O1
#148773 commented on
May 17, 2025 • 0 new comments -
Re-introduce -Wmaybe-uninitialized
#148760 commented on
May 11, 2025 • 0 new comments -
[DRAFT][Reshape] Guard-free reshape for contiguous tensors to avoid data dependent errors.
#148742 commented on
May 15, 2025 • 0 new comments -
[WIP] backed_size_oblivious=True for export
#148731 commented on
May 13, 2025 • 0 new comments -
[AOTInductor] Codegen fix
#148664 commented on
May 16, 2025 • 0 new comments -
[inductor] lowering for fractional_max_pool3d
#148630 commented on
May 15, 2025 • 0 new comments -
Adjust CMake code for Eigen
#148628 commented on
May 14, 2025 • 0 new comments -
fix 142457 , fixes double free corruption by adding TORCH_CHECK to ensure weights have the proper size
#148620 commented on
May 15, 2025 • 0 new comments -
[dynamo] Don't affect stack traces under TORCHDYNAMO_DISABLE
#148618 commented on
May 16, 2025 • 0 new comments -
[BE][pytree] cleanup parameterized pytree tests
#148569 commented on
May 16, 2025 • 0 new comments -
Fix clang-tidy bugprone* warnings
#148529 commented on
May 11, 2025 • 0 new comments -
Implement fast access to individual elements of jagged nested tensors
#148497 commented on
May 12, 2025 • 0 new comments -
[triton hash update] update the pinned triton hash
#148492 commented on
May 17, 2025 • 0 new comments -
Suppress more warnings
#148488 commented on
May 16, 2025 • 0 new comments -
Demote logger of runtime_asserts_frozen to be fired only on debug mode
#148485 commented on
May 16, 2025 • 0 new comments -
[BE][pytree] rename argument name in register function to match the type annotations: `*_fn -> *_func`
#148484 commented on
May 16, 2025 • 0 new comments -
[BE][pytree] rename `NodeDef` member to match the type annotations: `*_fn -> *_func`
#148474 commented on
May 16, 2025 • 0 new comments -
Enable `_lazy_clone` between CPU and MPS
#148408 commented on
May 16, 2025 • 0 new comments -
[pytree] simplify public API exposition with `__module__`
#148328 commented on
May 16, 2025 • 0 new comments -
Make require_contiguous require exact strides instead of stride order
#148235 commented on
May 13, 2025 • 0 new comments -
[pytree] add another simplified pytree module `torch.pytree`
#148180 commented on
May 16, 2025 • 0 new comments -
use identity op for alpha=inf in torch.celu and quantized_celu
#148066 commented on
May 12, 2025 • 0 new comments -
torch.utils.checkpoint preserves torch function mode stack during recompute
#148023 commented on
May 13, 2025 • 0 new comments -
Bump Protobuf to 6.31.0
#147963 commented on
May 17, 2025 • 0 new comments -
Allow zero sized dimensions in padding operations
#153037 commented on
May 13, 2025 • 0 new comments -
Adding a generic attribute for easier checkpoint discrepancy debugging.
#153021 commented on
May 16, 2025 • 0 new comments -
[WIP][Inductor-CPU] int8 WoQ concat linear
#153004 commented on
May 16, 2025 • 0 new comments -
[FrozenSet] Fixes for FrozenSet
#152991 commented on
May 17, 2025 • 0 new comments -
[hop_schema] add HopSchemaGenerator to make it easier to create hop schema
#152974 commented on
May 13, 2025 • 0 new comments -
Follow up to #152209, remove compat patch after docker image rename
#152958 commented on
May 16, 2025 • 0 new comments -
[dtensor] add privateuse1 SDPA op support to DTensor
#152949 commented on
May 12, 2025 • 0 new comments -
Upgrade to CUDA 12.8.1 for nightly binaries
#152923 commented on
May 13, 2025 • 0 new comments -
[SDPA] Add testing to ensure stride order exactly matches
#152894 commented on
May 14, 2025 • 0 new comments -
Add memory reporting for XPU to Memory Profiler
#152842 commented on
May 15, 2025 • 0 new comments -
[precompile] Add BundledAOTAutogradCacheEntry
#152840 commented on
May 12, 2025 • 0 new comments -
[BE]: Update cudnn to 9.9 for cu128
#152782 commented on
May 15, 2025 • 0 new comments -
[export][cond] support merging constant ints as unbacked symint
#152742 commented on
May 13, 2025 • 0 new comments -
Fix signature of torch.sparse_coo_tensor()
#152681 commented on
May 16, 2025 • 0 new comments -
Added documentation for nonzero_static function (#152347)
#152669 commented on
May 12, 2025 • 0 new comments -
Re-enable FakeTensor caching for SymInts
#152662 commented on
May 15, 2025 • 0 new comments -
cleanup, refactor and add missing self._dde_suppressed checks
#152657 commented on
May 17, 2025 • 0 new comments -
[pytree] make `tree_*` functions accept both Python and C++ `PyTreeSpec`
#152624 commented on
May 16, 2025 • 0 new comments -
[c10d][gloo] Integrate vendor generic FR into gloo
#152614 commented on
May 12, 2025 • 0 new comments -
Makefile: refactor build, setup and lint rules
#152611 commented on
May 15, 2025 • 0 new comments -
[Environment Variable] Use thread-safe getenv functions
#152609 commented on
May 13, 2025 • 0 new comments -
[2/N] Use std::filesystem
#152586 commented on
May 11, 2025 • 0 new comments -
Use swap_tensors path in nn.Module.to for all subclasses that override __torch_dispatch__
#152539 commented on
May 15, 2025 • 0 new comments -
test timm_efficientnet pass
#153290 commented on
May 13, 2025 • 0 new comments -
Enable manywheel build and smoke test on main branch for ROCm
#153287 commented on
May 12, 2025 • 0 new comments -
[WIP] add envvar to bisect number of graphs compiled
#153275 commented on
May 12, 2025 • 0 new comments -
Make python_agnostic cpp extension tests standalone
#153274 commented on
May 15, 2025 • 0 new comments -
[CUDA] Allow cuDNN or flash attn in `test_activation_checkpointing` pattern match check
#153272 commented on
May 13, 2025 • 0 new comments -
[nocommit] bundled autograd cache test
#153269 commented on
May 12, 2025 • 0 new comments -
fix dtensor and tensor inconsistent compute mesh
#153268 commented on
May 15, 2025 • 0 new comments -
More descriptive error message for torch.nanmean() with complex dtypes
#153252 commented on
May 13, 2025 • 0 new comments -
Fixed an issue with XPU skip so the test_decompose_mem_bound_mm.py suite can be ran correctly
#153245 commented on
May 16, 2025 • 0 new comments -
Fix integer overflow bug in triu/tril for large diagonal values
#153240 commented on
May 15, 2025 • 0 new comments -
[WIP][Intel GPU][CI] Acceptance test for OneDNN v3.8.0 upgrading [DONT MERGE]
#153228 commented on
May 15, 2025 • 0 new comments -
[FSDP] Enable async collectives in FSDP with MPI backend for compute/comm and comm/comm overlap
#153215 commented on
May 13, 2025 • 0 new comments -
[ONNX] Update decomposition logic to loop over onnx registry
#153168 commented on
May 17, 2025 • 0 new comments -
Add a (t * 0) pattern
#153161 commented on
May 12, 2025 • 0 new comments -
Use 3.27 as the minimum CMake version
#153153 commented on
May 16, 2025 • 0 new comments -
[Set] [wip] Support sets in VariableBuilder
#153150 commented on
May 17, 2025 • 0 new comments -
[ROCm][CI] Update build-environment for mi300 workflows
#153134 commented on
May 15, 2025 • 0 new comments -
Use std::fma for CUDA Adam kernel's lerps.
#153097 commented on
May 16, 2025 • 0 new comments -
[WIP][XPU] Update Triton commit
#153096 commented on
May 16, 2025 • 0 new comments -
Delete .github/workflows/docker-cache-mi300.yml
#153075 commented on
May 11, 2025 • 0 new comments -
[inductor] Fix #153071
#153073 commented on
May 12, 2025 • 0 new comments -
Update fbgemm pinned version
#153072 commented on
May 17, 2025 • 0 new comments -
[Dynamo] Replace `unimplemented` with `unimplemented_v2` in `torch/_dynamo/variables/misc.py` [2/2]
#153039 commented on
May 14, 2025 • 0 new comments -
Use std::apply for CPU code
#152526 commented on
May 11, 2025 • 0 new comments -
Enable skipIfXpu to support class-level skipping
#151420 commented on
May 13, 2025 • 0 new comments -
ROCm mx-fp4 Support
#151360 commented on
May 13, 2025 • 0 new comments -
Fix skipIfXpu and skipIfHpu disables tests when used on class
#151315 commented on
May 15, 2025 • 0 new comments -
fix sympy FloorToInt when compile
#151185 commented on
May 16, 2025 • 0 new comments -
[MPS] Get Vmap to work with mps backend
#151177 commented on
May 13, 2025 • 0 new comments -
add sbgemv dispatch in torch cpu flash attention
#151108 commented on
May 14, 2025 • 0 new comments -
[ONNX] Support float4
#151069 commented on
May 15, 2025 • 0 new comments -
[dynamo, nested graph breaks] small fixes to resume function generation
#151056 commented on
May 17, 2025 • 0 new comments -
Pin all root requirements to major versions
#150833 commented on
May 14, 2025 • 0 new comments -
[inductor] Clean typing in codegen/common.py and codecache.py
#150767 commented on
May 16, 2025 • 0 new comments -
Avoid overwriting COW data in MPS code
#150721 commented on
May 16, 2025 • 0 new comments -
Support XPU in memory tracker
#150703 commented on
May 14, 2025 • 0 new comments -
Enable lazy cloning in `Tensor.to` between CPU and MPS
#150569 commented on
May 16, 2025 • 0 new comments -
[dynamic shapes] stop writing Max(*, 1) for strides
#150376 commented on
May 16, 2025 • 0 new comments -
Add reverse engineered code to iOS build
#150326 commented on
May 16, 2025 • 0 new comments -
Relax njt x njt to dense matmul reduction checks
#150172 commented on
May 12, 2025 • 0 new comments -
AOTI freezing: fix test issues and enable by default
#149961 commented on
May 17, 2025 • 0 new comments -
[inductor] Add typing to _inductor/ir.py
#149958 commented on
May 13, 2025 • 0 new comments -
Enable XPU distributed test for PT2.8
#149916 commented on
May 16, 2025 • 0 new comments -
Refactoring FSDP2 (_composable/fsdp) test cases to be device agnostic
#149848 commented on
May 16, 2025 • 0 new comments -
[Inductor] Adjust boundary checking of dimensions using YBLOCK
#149504 commented on
May 15, 2025 • 0 new comments -
Fix `SequentialLR` deprecate warning about invoke `step(epoch)`
#149392 commented on
May 14, 2025 • 0 new comments -
test if free chunk
#149371 commented on
May 17, 2025 • 0 new comments -
[export] Refactor pt2 save/load
#152495 commented on
May 13, 2025 • 0 new comments -
[Inductor][CPP] Enable vectorized fp8 quant dequant
#152418 commented on
May 13, 2025 • 0 new comments -
Add Vectorized FP8 E4M3
#152417 commented on
May 13, 2025 • 0 new comments -
[Accelerator] Fix Python typing in accelerator
#152394 commented on
May 13, 2025 • 0 new comments -
[FP8][CUTLASS] xFail `honor_sm_carveout` on `sm100`
#152378 commented on
May 13, 2025 • 0 new comments -
[inductor][dynamo] Include operator name in size/stride/alignment assertion
#152353 commented on
May 16, 2025 • 0 new comments -
Enable the AMP precision with freezing for CPU nightly test
#152298 commented on
May 15, 2025 • 0 new comments -
[DTensor] enable SimpleFSDP's composability with Tensor Parallel
#152286 commented on
May 14, 2025 • 0 new comments -
[CI] Add xpu inductor test into periodic workflow
#152281 commented on
May 14, 2025 • 0 new comments -
Add dynamo config to HOP-ify context managers
#152159 commented on
May 15, 2025 • 0 new comments -
Add AC_TRACER Infra TorchDispatchMode key
#152158 commented on
May 15, 2025 • 0 new comments -
Update _torch_docs.py to Fix torch.bernoulli()
#152104 commented on
May 16, 2025 • 0 new comments -
Switch to standard pep517 sdist generation
#152098 commented on
May 16, 2025 • 0 new comments -
Inductor Tiling Rewrite
#151958 commented on
May 16, 2025 • 0 new comments -
Enable type promotions in slice_scatter (pytorch#147842)
#151911 commented on
May 12, 2025 • 0 new comments -
update get start xpu
#151886 commented on
May 13, 2025 • 0 new comments -
[reland][ROCm] remove caffe2 from hipify
#151845 commented on
May 15, 2025 • 0 new comments -
Add a custom profiler configuration option
#151656 commented on
May 16, 2025 • 0 new comments -
Add device agnostic support for distributed tests
#151560 commented on
May 15, 2025 • 0 new comments -
Update OpenBLAS commit
#151547 commented on
May 15, 2025 • 0 new comments -
[cp] dispatch flex_attention to CP impl in TorchDispatchMode
#151497 commented on
May 16, 2025 • 0 new comments -
Allow to byteswap data when reading saved torch jit data
#151447 commented on
May 13, 2025 • 0 new comments -
Implement fast exp for AVX2 and AVX512 for the flash attention
#151441 commented on
May 14, 2025 • 0 new comments -
DISABLED AotInductorTest.BasicTestCpu (build.bin.test_aoti_inference)
#152889 commented on
May 13, 2025 • 0 new comments -
[c10d][nccl][cuda] Regression (unspecific cuda launch error) with test_c10d_nncl
#136390 commented on
May 13, 2025 • 0 new comments -
[FSDP2] avoid GPU OOM for reshard_after_forward=int with shared post_forward_mesh
#153302 commented on
May 13, 2025 • 0 new comments -
`cuda.Event` handling in dynamo is broken
#153058 commented on
May 13, 2025 • 0 new comments -
DTensor placement propagation for `slice` fails during recompile due to SymInts
#152954 commented on
May 13, 2025 • 0 new comments -
Unusually slow draft_export time
#152337 commented on
May 13, 2025 • 0 new comments -
[CI] MacOS15-M2 runners are unstable
#149999 commented on
May 13, 2025 • 0 new comments -
[inductor] cudagraph error for individually compiled transformer blocks
#152887 commented on
May 13, 2025 • 0 new comments -
export: `tensor.view()` fails with dynamic shapes.
#153174 commented on
May 13, 2025 • 0 new comments -
Export + autocast is eating the exception
#153202 commented on
May 13, 2025 • 0 new comments -
Export shouldn't run TS under the hood.
#153260 commented on
May 13, 2025 • 0 new comments -
non-strict export should detect fake tensor leakage
#153062 commented on
May 13, 2025 • 0 new comments -
[RFC] Add a SlimTensor representation to help AOTInductor generate standalone binaries
#153242 commented on
May 13, 2025 • 0 new comments -
triggered internal assert in matmul
#153172 commented on
May 13, 2025 • 0 new comments -
Reversing along a dimension, similarly to numpy
#95160 commented on
May 13, 2025 • 0 new comments -
[feature request] Global GPU Flag
#7535 commented on
May 13, 2025 • 0 new comments -
Assertion Failure: TestBinaryUfuncsCPU.test_lerp_cpu_complex64 on Graviton 3
#146155 commented on
May 13, 2025 • 0 new comments -
Fused Linear and Cross-Entropy Loss `torch.nn.functional.linear_cross_entropy`
#124480 commented on
May 13, 2025 • 0 new comments -
`SymInt` input doesn't get optimized out from `torch.compiled()` graph even if unused
#108446 commented on
May 13, 2025 • 0 new comments -
Memory leak in torch.save
#149846 commented on
May 13, 2025 • 0 new comments -
Pipeline Parallelism Fails when stage input does not produce gradients in all stages.
#152827 commented on
May 13, 2025 • 0 new comments -
[inductor] [silent incorrectness] [dtype processing] `torch.clamp` can't implicitly covert `int64`
#151744 commented on
May 13, 2025 • 0 new comments -
Floating Point exception in Convolution with disabled SMT
#153139 commented on
May 13, 2025 • 0 new comments -
[RFC] Proposed Changes to Feature Tracking & Classification for PyTorch Releases starting Release 2.8
#152134 commented on
May 13, 2025 • 0 new comments -
`torch.quantile` perform differently on cpu and cuda
#153234 commented on
May 13, 2025 • 0 new comments -
[RFC][API-Unstable] Support 3rd party SYCL kernels with CPP Extension API
#153265 commented on
May 13, 2025 • 0 new comments -
DISABLED test_ddp_apply_optim_in_backward (__main__.TestDistBackendWithSpawn)
#153266 commented on
May 13, 2025 • 0 new comments -
Auto format lint not making suggestions
#153273 commented on
May 13, 2025 • 0 new comments -
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_float32 (__main__.TestForeachCUDA)
#153284 commented on
May 13, 2025 • 0 new comments -
Inconsistent overflow handling in `torch.clamp_min` between CPU and CUDA for float16 tensors
#153187 commented on
May 13, 2025 • 0 new comments -
torch.min document not up to date
#90633 commented on
May 14, 2025 • 0 new comments -
[Nested Tensor / subclasses] view(-1) and splitting dimensions with view() failures
#128649 commented on
May 14, 2025 • 0 new comments -
[ued] HF diffusers pipeline `enable_cpu_offload` errors or graph breaks with a `torch.compile`-ed transformer
#150711 commented on
May 14, 2025 • 0 new comments -
XPU skip breaks the test_decompose_mem_bound_mm.py test suite
#153239 commented on
May 14, 2025 • 0 new comments -
bfloat16 numerical errors for SDPA math backend
#151912 commented on
May 14, 2025 • 0 new comments -
`torch.cuda.manual_seed` ignored
#149621 commented on
May 14, 2025 • 0 new comments -
Unexpected overflow behavior when using `torch.addcmul`
#152294 commented on
May 14, 2025 • 0 new comments -
DISABLED test_inductor_reuse_buffer_after_inplace_collective (__main__.CompileTest)
#147950 commented on
May 14, 2025 • 0 new comments -
DISABLED test_vdd_clamp_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#134445 commented on
May 14, 2025 • 0 new comments -
Flex Attention is incompatible with selective AC
#147879 commented on
May 14, 2025 • 0 new comments -
[whisper][Arc770][Win]XPU performance is worse than CPU
#151985 commented on
May 14, 2025 • 0 new comments -
DISABLED test_duplicate_registration_impl (__main__.TestOpProfiles)
#151281 commented on
May 14, 2025 • 0 new comments -
DISABLED test_repeated_calling_cuda (__main__.AOTInductorTestABICompatibleGpu)
#146185 commented on
May 14, 2025 • 0 new comments -
[RFC][API-Unstable] Intel GPU distributed Backend integration in `torch-xpu-ops`and registeration in PyTorch
#141741 commented on
May 14, 2025 • 0 new comments -
[RFC][API-Unstable]Enable A16W4 on XPU Device
#153019 commented on
May 14, 2025 • 0 new comments -
DISABLED test_find_or_create_pg (__main__.TestPgTag)
#107278 commented on
May 14, 2025 • 0 new comments -
[MPS] MultiheadAttention with masks and dropout produces NaNs
#151667 commented on
May 13, 2025 • 0 new comments -
Incorporate CUDA Memory Trimming Into DeviceCachingAllocator
#152875 commented on
May 13, 2025 • 0 new comments -
Don't hardcoded support for DTensor to_local/from_local/redistribute into dynamo
#152829 commented on
May 13, 2025 • 0 new comments -
DISABLED test_inductor_reduce_scatter_tensor_single (__main__.CompileTest)
#147911 commented on
May 13, 2025 • 0 new comments -
DISABLED test_inductor_inplace_op_on_view (__main__.CompileTest)
#147852 commented on
May 13, 2025 • 0 new comments -
[PTD BE DAY]Burn Down Distributed Disabled Tests!!
#132845 commented on
May 13, 2025 • 0 new comments -
Stop special-casing einops in Dynamo
#142486 commented on
May 13, 2025 • 0 new comments -
[ONNX] export() with dynamic shapes fails where dynamo_export(dynamic_shapes=True) succeeds
#126607 commented on
May 13, 2025 • 0 new comments -
[ued] Slow start up time for `torch.compile` on GGUF Auraflow
#150706 commented on
May 13, 2025 • 0 new comments -
[dynamo] torch._dynamo crashes on `self.value.__module__` inside SkipFunctionVariable.call_function() (PyTorch 2.7, works 2.6)
#152316 commented on
May 13, 2025 • 0 new comments -
ROCm: no HIP device available if device is already initialized
#152941 commented on
May 13, 2025 • 0 new comments -
Unexpected float32 overflow for amp training with torch.compile
#153044 commented on
May 13, 2025 • 0 new comments -
DISABLED AotInductorTest.BasicTestCuda (build.bin.test_aoti_inference)
#152888 commented on
May 13, 2025 • 0 new comments -
DISABLED AotInductorTest.BasicPackageLoaderTestCpu (build.bin.test_aoti_inference)
#152891 commented on
May 13, 2025 • 0 new comments -
Support for banded matrix operations
#118225 commented on
May 12, 2025 • 0 new comments -
Support Delay Loading of c10.dll in when using libtorch as a thirdparty library.
#105058 commented on
May 12, 2025 • 0 new comments -
DISABLED test_item_to_inputs_kernel_nobreak_cuda (__main__.TestInductorDynamicCUDA)
#119538 commented on
May 12, 2025 • 0 new comments -
upstream `apex.normalization.FusedRMSNorm`
#72643 commented on
May 12, 2025 • 0 new comments -
welfordreduce slows down forward layernorm in a bunch of cases
#120184 commented on
May 12, 2025 • 0 new comments -
Mismatch in dynamic quantization performance for torchao and torch.quantization
#152813 commented on
May 12, 2025 • 0 new comments -
DISABLED test_comprehensive_gather_xpu_int64 (__main__.TestInductorOpInfoXPU)
#152970 commented on
May 12, 2025 • 0 new comments -
DISABLED test_comprehensive_scatter_xpu_int32 (__main__.TestInductorOpInfoXPU)
#152971 commented on
May 12, 2025 • 0 new comments -
DISABLED test_comprehensive_scatter_xpu_int64 (__main__.TestInductorOpInfoXPU)
#153017 commented on
May 12, 2025 • 0 new comments -
DISABLED test_comprehensive_scatter_xpu_bool (__main__.TestInductorOpInfoXPU)
#153018 commented on
May 12, 2025 • 0 new comments -
DISABLED test_comprehensive_gather_xpu_bool (__main__.TestInductorOpInfoXPU)
#152931 commented on
May 12, 2025 • 0 new comments -
DISABLED test_comprehensive_gather_xpu_int32 (__main__.TestInductorOpInfoXPU)
#152930 commented on
May 12, 2025 • 0 new comments -
DISABLED test_comprehensive_gather_xpu_float16 (__main__.TestInductorOpInfoXPU)
#152929 commented on
May 12, 2025 • 0 new comments -
DISABLED test_comprehensive_gather_xpu_float32 (__main__.TestInductorOpInfoXPU)
#152911 commented on
May 12, 2025 • 0 new comments -
DISABLED test_comprehensive_gather_xpu_float64 (__main__.TestInductorOpInfoXPU)
#152910 commented on
May 12, 2025 • 0 new comments -
DISABLED test_comprehensive_scatter_xpu_float64 (__main__.TestInductorOpInfoXPU)
#152898 commented on
May 12, 2025 • 0 new comments -
DISABLED test_comprehensive_scatter_xpu_float16 (__main__.TestInductorOpInfoXPU)
#152925 commented on
May 12, 2025 • 0 new comments -
DISABLED test_comprehensive_scatter_xpu_float32 (__main__.TestInductorOpInfoXPU)
#152912 commented on
May 12, 2025 • 0 new comments -
torch.compile causes stride mismatch in SDPA with non-contiguous query in torch 2.7
#152747 commented on
May 12, 2025 • 0 new comments -
Would it be possible to have a CMAKE option USE_SYSTEM_FMT
#134576 commented on
May 11, 2025 • 0 new comments -
Python 3.10 + intel-openmp failed to use numactl after import torch._C
#136307 commented on
May 11, 2025 • 0 new comments -
Expand Tag Set: views & reductions
#129020 commented on
May 11, 2025 • 0 new comments -
General MPS op coverage tracking issue
#77764 commented on
May 11, 2025 • 0 new comments -
[feature request] torch.hub.load_state_dict_from_url to be replaced by a new good general download-a-file function and to also support local paths and google drive links / private github release links
#73466 commented on
May 11, 2025 • 0 new comments -
Addind RoPE to pytorch core
#149534 commented on
May 11, 2025 • 0 new comments -
NCCL out of memory error after updating to PyTorch 2.7
#152302 commented on
May 10, 2025 • 0 new comments -
`torch.device.__enter__` does not affect `get_default_device` despite taking precedence over `set_default_device`
#148874 commented on
May 10, 2025 • 0 new comments -
[inductor] Improve codegen for argmax+max
#146643 commented on
May 10, 2025 • 0 new comments -
Cannot use torch.arange in torch.ops.higher_order.scan
#153247 commented on
May 10, 2025 • 0 new comments -
[JIT] Inconsistent results between eager and script modes for instance_norm with custom running stats
#153224 commented on
May 10, 2025 • 0 new comments -
reshape_view_helper is only used for fake tensor tracing but not proxy tracing.
#153303 commented on
May 13, 2025 • 0 new comments -
Symbol problem about static variable in inline function
#146969 commented on
May 13, 2025 • 0 new comments -
[Inductor] Investigate computing global amaxes via atomics (instead of a reduction based approach) in triton codgen
#153103 commented on
May 13, 2025 • 0 new comments -
[inductor] `proxy_tensor.py` throws `SyntaxError` when using `.random_`
#151432 commented on
May 13, 2025 • 0 new comments -
DISABLED test_inductor_all_to_all_single (__main__.CompileTest)
#147795 commented on
May 13, 2025 • 0 new comments -
TorchInductor CPU Performance Dashboard
#93531 commented on
May 12, 2025 • 0 new comments -
DISABLED test_inductor_all_reduce_non_contig_input (__main__.CompileTest)
#147733 commented on
May 12, 2025 • 0 new comments -
DISABLED test_tensor_subclasses (__main__.TestScript)
#119949 commented on
May 12, 2025 • 0 new comments -
[DTensor] `Partial(sum)` reductions are wrongly cached (?)
#147180 commented on
May 12, 2025 • 0 new comments -
DTensor does not support `nn.init.eye_`
#136946 commented on
May 12, 2025 • 0 new comments -
Segmentation fault (core dumped) in torch.nn.functional.max_unpool2d
#152913 commented on
May 12, 2025 • 0 new comments -
AOTI regression on SAM and tts-angular
#152606 commented on
May 12, 2025 • 0 new comments -
torch.multiprocessing.Queue Zeroes Out Tensors on Retrieval
#149155 commented on
May 12, 2025 • 0 new comments -
`torch.load` can't deserialize `datetime` objects, even with the appropriate `safe_globals`
#152985 commented on
May 12, 2025 • 0 new comments -
torch.compile on MPS progress tracker
#150121 commented on
May 12, 2025 • 0 new comments -
Code fails with "Expected curr_block->next == nullptr to be true, but got false"
#140419 commented on
May 12, 2025 • 0 new comments -
DISABLED test_rng (__main__.TestCompilerBisector)
#139590 commented on
May 12, 2025 • 0 new comments -
`torch.ldexp` goes out of range when `2**other` is out of range
#153069 commented on
May 12, 2025 • 0 new comments -
Online softmax is disabled on the fly
#153241 commented on
May 12, 2025 • 0 new comments -
mark_unbacked for strides.
#153204 commented on
May 12, 2025 • 0 new comments -
Pytorch 2.7 crashes when using flex attention with torch.amp
#153042 commented on
May 12, 2025 • 0 new comments -
Loading sparse tensors in a DataLoader raises CUDA initialization error since 2.5.0 if you have already initialized CUDA
#153143 commented on
May 12, 2025 • 0 new comments -
inductor-periodic rocm tests failing since at least 4/10
#152866 commented on
May 12, 2025 • 0 new comments -
register_constant doesn't work on simple types
#153061 commented on
May 12, 2025 • 0 new comments -
[feature request] Support native ONNX export of FFT-related ops in opset17 (with `inverse=True`, it also includes inverse DFT)
#107588 commented on
May 12, 2025 • 0 new comments -
RFCS review request: fast Viterbi Decoding
#121160 commented on
May 12, 2025 • 0 new comments -
[Async TP] all-gather-matuls not fusing properly when rowwise scales are used
#149990 commented on
May 12, 2025 • 0 new comments -
Open file leak when dataloader is using persistent_workers and pin_memory AND you create multiple dataloaders.
#91252 commented on
May 12, 2025 • 0 new comments -
DISABLED test_while_loop_schema_gen (__main__.TestHopSchema)
#141202 commented on
May 12, 2025 • 0 new comments -
Request to cherrypick a fix into v1.13.1 (v1.8 has a CVE)
#98115 commented on
May 12, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int16 (__main__.TestForeachCUDA)
#150309 commented on
May 16, 2025 • 0 new comments -
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex128 (__main__.TestForeachCUDA)
#149323 commented on
May 16, 2025 • 0 new comments -
DISABLED test_remove_noop_view_dtype_cuda (__main__.GPUTests)
#151541 commented on
May 16, 2025 • 0 new comments -
ROCm, 7900 XTX: Pytorch FLASH_ATTENTION SDPA is 2.5x slower than MATH (fp16, head_dim 256, seqlen 4360, 12 heads)
#152595 commented on
May 16, 2025 • 0 new comments -
[NJT] can only chunk if the 2nd dimension is ragged
#153238 commented on
May 16, 2025 • 0 new comments -
Compile breaks flex-attention with jagged tensors
#148201 commented on
May 16, 2025 • 0 new comments -
[CXX11ABI] torch 2.6.0-cu126 and cu124 have different exported symbols
#152790 commented on
May 16, 2025 • 0 new comments -
DISABLED test_full_dtype (__main__.TestFull)
#138574 commented on
May 16, 2025 • 0 new comments -
DISABLED test_comprehensive_pca_lowrank_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#139828 commented on
May 16, 2025 • 0 new comments -
DISABLED test_remove_noop_view_dtype_cpu (__main__.CpuTests)
#151540 commented on
May 16, 2025 • 0 new comments -
RFC: The State of Custom CUDA extensions in PyTorch
#152032 commented on
May 16, 2025 • 0 new comments -
[feature request] [discussion] Include basic `ctypes` bindings for `cudart`/`cublasLt`/`cublas`/`nvrtc`/`cudnn` with stock PyTorch
#107800 commented on
May 16, 2025 • 0 new comments -
Download speed issues with the pytorch conda channel
#17023 commented on
May 16, 2025 • 0 new comments -
Unable to build and use libtorch function via pybind11: undefined symbol error upon import
#73016 commented on
May 16, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float64 (__main__.TestForeachCUDA)
#150298 commented on
May 16, 2025 • 0 new comments -
[inductor] [cpu] [[silence] `nn.ConvTranspose2d-F.dropout` and `nn.Conv2d-F.dropout` outputs inconsistent results with eager
#148061 commented on
May 16, 2025 • 0 new comments -
torch.onnx.export causes floating point exception with core dump for empty slice assignment
#110056 commented on
May 16, 2025 • 0 new comments -
Inconsistent Error Handling in `torch.fused_moving_avg_obs_fake_quant` Between CPU and GPU Implementations
#153310 commented on
May 16, 2025 • 0 new comments -
modded-nanogpt flaky NCCL hang starting 3/30 nightly
#152623 commented on
May 16, 2025 • 0 new comments -
Better test coverage on _inductor/scheduler.py
#150476 commented on
May 16, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float32 (__main__.TestForeachCUDA)
#150208 commented on
May 16, 2025 • 0 new comments -
DISABLED test_remove_noop_view_default_cuda (__main__.GPUTests)
#151511 commented on
May 16, 2025 • 0 new comments -
non-negative least squares solver feature request
#48972 commented on
May 16, 2025 • 0 new comments -
[XPU] Upgrade the XPU support packages version to 2025.1 in CI/CD
#151097 commented on
May 16, 2025 • 0 new comments -
DISABLED test_remove_noop_view_default_cpu (__main__.CpuTests)
#151512 commented on
May 16, 2025 • 0 new comments -
DTensor support for dynamic shapes is soft
#152963 commented on
May 15, 2025 • 0 new comments -
redundant recompilation caused by duplicated Sym()
#144068 commented on
May 15, 2025 • 0 new comments -
[c10d] Consolidate watchdog threads
#146956 commented on
May 15, 2025 • 0 new comments -
DISABLED test_int64_upsample3d_cuda_bfloat16 (__main__.TestTorchDeviceTypeCUDA)
#146007 commented on
May 15, 2025 • 0 new comments -
DISABLED test_parity__foreach_abs_fastpath_inplace_cuda_bfloat16 (__main__.TestForeachCUDA)
#148966 commented on
May 15, 2025 • 0 new comments -
Fix `USE_STATIC_MKL` lost functionality
#138996 commented on
May 17, 2025 • 0 new comments -
Prioritize building with libgomp over libomp
#138834 commented on
May 17, 2025 • 0 new comments -
Add overflow check for negtive integer div_floor and div_trunc on CPU
#138684 commented on
May 13, 2025 • 0 new comments -
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification
#138214 commented on
May 16, 2025 • 0 new comments -
Add TORCH_CHECK_INDEX in convert_indices_from_coo_to_csr_cpu
#138068 commented on
May 16, 2025 • 0 new comments -
[pytree] Add public pytree module `torch.utils.pytree`
#137400 commented on
May 16, 2025 • 0 new comments -
autograd codegen: bump VC properly for mutable ops with no returns
#133044 commented on
May 13, 2025 • 0 new comments -
[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor
#127294 commented on
May 14, 2025 • 0 new comments -
[inductor] enable bf32 test for mkldnn conv
#127293 commented on
May 14, 2025 • 0 new comments -
[AOTAutograd] tweak min-cut partitioner to avoid saving softmax output
#126348 commented on
May 16, 2025 • 0 new comments -
allow to use bf16 as fp32 internal precision for mkldnn conv backward
#126054 commented on
May 14, 2025 • 0 new comments -
allow to use bf16 as fp32 internal precision for mkldnn conv
#126050 commented on
May 14, 2025 • 0 new comments -
refine fp32 precision api
#125888 commented on
May 15, 2025 • 0 new comments -
[vision hash update] update the pinned vision hash
#125806 commented on
May 17, 2025 • 0 new comments -
Automated submodule update: FBGEMM
#115316 commented on
May 17, 2025 • 0 new comments -
[pytree] support PyStructSequence types for Python pytree
#113258 commented on
May 16, 2025 • 0 new comments -
Automated submodule update: kineto
#106149 commented on
May 15, 2025 • 0 new comments -
UNSTABLE pull / cuda12.4-py3.10-gcc9-sm75 / test (pr_time_benchmarks)
#149370 commented on
May 17, 2025 • 0 new comments -
Performance Regression nightly 03/11→03/12, on nanogpt speedrun
#152823 commented on
May 17, 2025 • 0 new comments -
`randint(max)` causes a graph break, but not `rand().mul(max).floor().to(torch.long)` (on CPU)
#135664 commented on
May 17, 2025 • 0 new comments -
compiling for rocm gfx1010, getting cuda errors
#145670 commented on
May 16, 2025 • 0 new comments -
Cleanup autotune_fallback_to_aten post-deprecation
#153298 commented on
May 16, 2025 • 0 new comments -
Padded tensor subclass
#105325 commented on
May 16, 2025 • 0 new comments -
torch.export does not support torchaudio.transforms.Spectrogram
#112844 commented on
May 16, 2025 • 0 new comments -
[v2.7.1] Release Tracker
#152627 commented on
May 16, 2025 • 0 new comments -
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on
May 16, 2025 • 0 new comments -
MPS incompatibility: Calls into the C++ engine to run the backward pass
#143123 commented on
May 16, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int32 (__main__.TestForeachCUDA)
#150350 commented on
May 16, 2025 • 0 new comments -
cudnn.determinstic=True causes dilated convolution to be >10x slower
#28777 commented on
May 16, 2025 • 0 new comments -
Accuracy issue in torch inductor
#153299 commented on
May 16, 2025 • 0 new comments -
DISABLED test_remove_noop_slice1_cpu (__main__.CpuTests)
#151379 commented on
May 15, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_scatter_cpu (__main__.CpuTests)
#151382 commented on
May 15, 2025 • 0 new comments -
DISABLED test_inductor_all_gather_into_tensor_coalesced (__main__.CompileTest)
#146806 commented on
May 15, 2025 • 0 new comments -
CUDA error: CUBLAS_STATUS_INVALID_VALUE
#64097 commented on
May 15, 2025 • 0 new comments -
Training/Fine-tuning fails with PyTorch 2.8 + 4x 5090 GPUs using DDP/FSDP/DeepSpeed
#150734 commented on
May 15, 2025 • 0 new comments -
inconsistent grads between two types of `allgather`s
#153016 commented on
May 15, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_uint8 (__main__.TestForeachCUDA)
#150417 commented on
May 15, 2025 • 0 new comments -
DISABLED test_fake_registration (__main__.TestOpProfiles)
#151301 commented on
May 15, 2025 • 0 new comments -
DISABLED test_sdpa_mask_fp16_L6_S17_NH23_HS121 (__main__.TestSDPA)
#138905 commented on
May 15, 2025 • 0 new comments -
eval should handle (unhinted: (s77 > 3) | (u0 > 200)) when s77 has hint =5
#153227 commented on
May 14, 2025 • 0 new comments -
Have cherry-pick bot always add the current release to the PR
#152212 commented on
May 14, 2025 • 0 new comments -
Triton Kernel Rejects NamedTupleVariable Arguments
#148289 commented on
May 14, 2025 • 0 new comments -
DISABLED test_reduce_stress_cuda (__main__.ProcessGroupGlooTest)
#152367 commented on
May 14, 2025 • 0 new comments -
DISABLED test_reduce_stress_cuda (__main__.ProcessGroupGlooLazyInitTest)
#152201 commented on
May 14, 2025 • 0 new comments -
DISABLED test_remove_noop_slice1_cuda (__main__.GPUTests)
#151381 commented on
May 14, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_cuda (__main__.GPUTests)
#151383 commented on
May 14, 2025 • 0 new comments -
DISABLED AotInductorTest.FreeInactiveConstantBufferRuntimeConstantFoldingCuda (build.bin.test_aoti_inference)
#150299 commented on
May 14, 2025 • 0 new comments -
DISABLED test_matrix_rank_basic_cuda_float32 (__main__.TestLinalgCUDA)
#150406 commented on
May 14, 2025 • 0 new comments -
CUDA 12.6 Inductor perf test failures
#148699 commented on
May 14, 2025 • 0 new comments -
Make compiled models serializable
#101107 commented on
May 14, 2025 • 0 new comments -
Shared memory out of resource when using flex attention
#133254 commented on
May 14, 2025 • 0 new comments -
FSDP2 tutorial outline
#151505 commented on
May 14, 2025 • 0 new comments -
`lp_pool1d` behavior inconsistency between CPU and GPU
#153312 commented on
May 14, 2025 • 0 new comments -
`torch.nn.functional.rrelu` crashes on CPU with `training=True` when `lower` or `upper` is set to `inf`
#153281 commented on
May 14, 2025 • 0 new comments -
torch.nn.functional.conv_transpose2d produces inconsistent output on CPU and CUDA
#153276 commented on
May 14, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bfloat16 (__main__.TestForeachCUDA)
#150119 commented on
May 14, 2025 • 0 new comments -
DISABLED AotInductorTest.FreeInactiveConstantBufferCuda (build.bin.test_aoti_inference)
#149495 commented on
May 14, 2025 • 0 new comments -
Compiled Autograd + Activation Checkpointing/Offloading
#143176 commented on
May 14, 2025 • 0 new comments -
`torch.native_channel_shuffle` crashes with Floating Point Exception when given large integer parameter
#153231 commented on
May 14, 2025 • 0 new comments -
Behavior of kernel_size parameter of torch.nn.functional.avg_pool2d does not match with documentation
#153149 commented on
May 14, 2025 • 0 new comments -
[RFC] PyTorch DistributedTensor
#88838 commented on
May 15, 2025 • 0 new comments -
SIGSEGV due to insufficient return value checking for PyFrame_GetLocals
#148273 commented on
May 15, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float16 (__main__.TestForeachCUDA)
#150173 commented on
May 15, 2025 • 0 new comments -
[dynamo] Replace `unimplemented` with `unimplemented_v2`
#147913 commented on
May 15, 2025 • 0 new comments -
undefined symbol: __nvJitLinkCreate_12_8, version libnvJitLink.so.12
#152783 commented on
May 15, 2025 • 0 new comments -
Support SDPA flash attention/ memory efficient attn on ROCm gfx908
#141958 commented on
May 15, 2025 • 0 new comments -
DistributedDataParallel with compile(..., mode="max-autotune") hangs in 2.5+
#140395 commented on
May 15, 2025 • 0 new comments -
PyTorch VS2022 official build Windows binary illegal instruction on AVX2(max ISA level) CPU
#145702 commented on
May 15, 2025 • 0 new comments -
[ONNX] Implement scan
#151327 commented on
May 15, 2025 • 0 new comments -
NCCL init hits CUDA failure 'invalid argument' on 12.2 driver
#150852 commented on
May 15, 2025 • 0 new comments -
[Dynamo] Exception raised inside torch.autocast causes crash AttributeError: 'NoneType' object has no attribute 'is_python_constant
#152012 commented on
May 15, 2025 • 0 new comments -
[source.wheel] Pin setuptools runtime dependency
#152355 commented on
May 15, 2025 • 0 new comments -
[release] CPU perf benchmark latency increase for 2.6->2.7 on c5.24xlarge and A100 instances
#151037 commented on
May 15, 2025 • 0 new comments -
`version.txt` mismatch with tags in release branch
#151425 commented on
May 15, 2025 • 0 new comments -
torch wheels are unusable if CUDA RPMs are installed on the system (was Import error in nvidia/cuda:12.6.3-cudnn-devel-rockylinux9)
#150399 commented on
May 15, 2025 • 0 new comments -
Looking for valid compiling option for extension based on torch-2.1.0+cpu.cxx11.abi
#143780 commented on
May 15, 2025 • 0 new comments -
Preload CUDA fails if CUDA libs in different PYTHONPATH
#147001 commented on
May 15, 2025 • 0 new comments -
PyTorch 2.6 License Issues
#150118 commented on
May 15, 2025 • 0 new comments -
Profiler doesn't seem to work on AMD CPUs
#150052 commented on
May 15, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex64 (__main__.TestForeachCUDA)
#150161 commented on
May 15, 2025 • 0 new comments -
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_bfloat16 (__main__.TestForeachCUDA)
#148965 commented on
May 15, 2025 • 0 new comments -
DISABLED test_seqential_batch_workers (__main__.TestDataLoader)
#81891 commented on
May 15, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_cpu (__main__.CpuTests)
#151384 commented on
May 15, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex128 (__main__.TestForeachCUDA)
#150141 commented on
May 15, 2025 • 0 new comments -
DISABLED test_is_isnot (__main__.TestScript)
#120694 commented on
May 15, 2025 • 0 new comments -
[inductor][cpu] pytorch_CycleGAN_and_pix2pix AMP multiple thread performance regression in 2025-04-27 nightly release
#152921 commented on
May 15, 2025 • 0 new comments -
`torch.compile()` produces incorrect results for `asinh_()` operation on large/small values
#152299 commented on
May 15, 2025 • 0 new comments -
Pytorch Typing, for Tensor type annotations
#73359 commented on
May 15, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_scatter_cuda (__main__.GPUTests)
#151378 commented on
May 15, 2025 • 0 new comments -
DISABLED test_slice_scatter_reinplace_cuda (__main__.GPUTests)
#145189 commented on
May 15, 2025 • 0 new comments