Add TORCH_CHECK_INDEX in convert_indices_from_coo_to_csr_cpu by Kh4L · Pull Request #138068 · pytorch/pytorch · GitHub

Add TORCH_CHECK_INDEX in convert_indices_from_coo_to_csr_cpu #138068


Open · Kh4L wants to merge 1 commit into pytorch:main

Conversation

@Kh4L commented Oct 16, 2024

The to_sparse_csr CPU implementation, convert_indices_from_coo_to_csr_cpu, doesn't validate the input COO indices, which can lead to illegal memory access.

This PR fixes that by adding TORCH_CHECK_INDEX checks on each index before the data is accessed.
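For illustration, here is a minimal pure-Python sketch of the row-pointer conversion that convert_indices_from_coo_to_csr_cpu performs, with the proposed bounds check added. The function name and structure are simplified assumptions for exposition, not the actual ATen kernel:

```python
def coo_rows_to_csr(row_indices, num_rows):
    # Sketch: convert sorted COO row indices into a CSR crow_indices array.
    # Without the bounds check below, an out-of-range row index would write
    # past the end of `crow` -- the illegal memory access this PR guards against.
    crow = [0] * (num_rows + 1)
    for i in row_indices:
        # The TORCH_CHECK_INDEX-style validation proposed by this PR:
        if not (0 <= i < num_rows):
            raise IndexError(f"row index {i} is out of bounds for {num_rows} rows")
        crow[i + 1] += 1
    # Prefix-sum the per-row counts to get the CSR row pointers.
    for r in range(num_rows):
        crow[r + 1] += crow[r]
    return crow
```

With valid input, coo_rows_to_csr([0, 0, 1, 3], 4) returns the row pointers [0, 2, 3, 3, 4]; an out-of-range index raises instead of corrupting memory.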

# repro
import torch

num_nonzeros = 2048
dense_size = (1024, 256)  # (rows, cols)


row_indices = torch.randint(0, dense_size[0], (num_nonzeros,))
col_indices = torch.randint(0, dense_size[1], (num_nonzeros,))
coo_indices = torch.stack((row_indices, col_indices))

# this should work
sparse_coo = torch.sparse_coo_tensor(
    coo_indices,
    torch.ones(coo_indices.size(1), dtype=torch.bool),
    dense_size,
)
dst_node_indices = sparse_coo.to_sparse_csr()
print(f"Works well {dst_node_indices}")


# altering row_indices, now it's expected to fail

row_indices[-1] = dense_size[0] + 42

coo_indices = torch.stack((row_indices, col_indices))

# this should not work
sparse_coo = torch.sparse_coo_tensor(
    coo_indices,
    torch.ones(coo_indices.size(1), dtype=torch.bool),
    dense_size,
)
dst_node_indices = sparse_coo.to_sparse_csr()
print("Should never reach here")

pytorch-bot bot commented Oct 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138068

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 41 New Failures, 25 Cancelled Jobs, 6 Pending

As of commit 137ce9d with merge base 7f88bf9:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

CLA Not Signed


This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@zou3519 zou3519 requested a review from cpuhrsch October 16, 2024 13:28
@zou3519 zou3519 added the triaged label ("This issue has been looked at by a team member, and triaged and prioritized into an appropriate module") Oct 16, 2024
@cpuhrsch (Contributor)

@Kh4L - have you tried using the check_invariants flag of torch.sparse_coo_tensor?
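For reference, the flag mentioned here can be exercised like this (a sketch assuming the documented torch.sparse_coo_tensor signature; the exact error message is not asserted):

```python
import torch

# Out-of-bounds row index (5 >= 4 rows). With check_invariants=True,
# construction should raise up front instead of deferring the problem
# to a later kernel such as to_sparse_csr.
indices = torch.tensor([[5], [0]])
values = torch.tensor([1.0])
try:
    torch.sparse_coo_tensor(indices, values, (4, 4), check_invariants=True)
    print("no error raised")
except RuntimeError as e:
    print("caught:", type(e).__name__)
```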

@Kh4L (Author) commented Oct 17, 2024

@Kh4L - have you tried using the check_invariants flag of torch.sparse_coo_tensor?

Yes, I gave it a try.
However, I still believe it's worth checking the index here: we were encountering various kinds of memory corruption that crashed Python outright, which made the root cause difficult to debug.

I think the cost of adding this index check is small compared to the time it could save developers in the long run.

@cpuhrsch (Contributor)

@Kh4L - I agree, I'd have hoped that check_invariants and the torch.sparse.check_sparse_tensor_invariants context manager already did all the relevant checks needed here. Did you try running under with torch.sparse.check_sparse_tensor_invariants(): as in

with torch.sparse.check_sparse_tensor_invariants():
    run_my_model()

@cpuhrsch (Contributor)

If this doesn't work, I suggest we modify this PR to use https://pytorch.org/docs/main/generated/torch.sparse.check_sparse_tensor_invariants.html#torch.sparse.check_sparse_tensor_invariants.is_enabled as a guard. That way this can be turned on for debugging, but doesn't affect performance by default.
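The guard semantics on the Python side look like this (a sketch using the documented torch.sparse.check_sparse_tensor_invariants API; the C++ side would consult the same flag):

```python
import torch

check = torch.sparse.check_sparse_tensor_invariants

# Invariant checking is disabled by default, so a C++ check guarded by
# is_enabled() would be skipped and cost nothing on the hot path.
print(check.is_enabled())

# Inside the context manager the flag flips on, so the guarded index
# validation would run, turning silent corruption into a clean error.
with check():
    print(check.is_enabled())
```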

@Kh4L (Author) commented Oct 17, 2024

@cpuhrsch I understand, but is it really OK for Python to crash with errors such as malloc(): invalid next size (unsorted) whenever check_sparse_tensor_invariants is not enabled? to_sparse_csr does not seem to be a typical bottleneck.

@Kh4L (Author) commented Oct 17, 2024

@cpuhrsch
the index check is optimized and its cost is negligible; see the benchmark below on a relatively large sparse tensor:

import torch
import time
import numpy as np

num_nonzeros = 2 ** 14
dense_size = (2**16, 2**14)

row_indices = torch.randint(0, dense_size[0], (num_nonzeros,))
col_indices = torch.randint(0, dense_size[1], (num_nonzeros,))
coo_indices = torch.stack((row_indices, col_indices))
sparse_coo = torch.sparse_coo_tensor(
    coo_indices,
    torch.ones(coo_indices.size(1), dtype=bool),
    dense_size,
)

def benchmark_to_sparse_csr():
    return sparse_coo.to_sparse_csr()

num_runs = 100
times = []

for _ in range(num_runs):
    start_time = time.time()
    benchmark_to_sparse_csr()
    end_time = time.time()
    times.append(end_time - start_time)

print(f"Mean execution time: {np.mean(times):.6f} seconds")
print(f"Standard deviation: {np.std(times):.6f} seconds")

before the change (without index check):

Mean execution time: 0.178715 seconds
Standard deviation: 0.004270 seconds

after the change (with index check):

Mean execution time: 0.178128 seconds
Standard deviation: 0.003673 seconds

@cpuhrsch (Contributor)

@Kh4L - when the Tensor is big it's fine, but when it's small the overhead may no longer be acceptable for some applications. By supporting the guard, the user can choose what works best for them.

Labels
open source · triaged ("This issue has been looked at by a team member, and triaged and prioritized into an appropriate module")
4 participants