8000 Adding PyTorch functionality to SIRF by Imraj-Singh · Pull Request #1305 · SyneRBI/SIRF · GitHub
[go: up one dir, main page]

Skip to content

Adding PyTorch functionality to SIRF#1305

Open
Imraj-Singh wants to merge 78 commits intoSyneRBI:masterfrom
Imraj-Singh:torch
Open

Adding PyTorch functionality to SIRF#1305
Imraj-Singh wants to merge 78 commits intoSyneRBI:masterfrom
Imraj-Singh:torch

Conversation

@Imraj-Singh
Copy link
Contributor
@Imraj-Singh Imraj-Singh commented Feb 15, 2025

Changes in this pull request

The functionality:

  • Functions for transferring between torch and sirf, and vice versa.
  • torch.autograd.Function for objective function, acquisition mode forward, and acquisition model backward. These are not really meant to be used by a user.
  • torch.nn.Module for a module that interacts with the torch.autograd.Function. In the forward method we could change the number of the dimensions in the array, swap in/out components of the forward operator etc.

This functionality of torch.nn.Module can vary dependent on the user's requirements, so perhaps we should just give a use case. Using it for lpd for example, or PET reconstruction with torch optimisers?

Testing performed

Gradchecks for all the new functions. Note that objective function evaluations are quite unstable.

Related issues

Checklist before requesting a review

  • I have performed a self-review of my code
  • I have added docstrings/doxygen in line with the guidance in the developer guide
  • I have implemented unit tests that cover any new or modified functionality
  • The code builds and runs on my machine
  • CHANGES.md has been updated with any functionality change

Contribution Notes

Please read and adhere to the contribution guidelines.

Please tick the following:

  • The content of this Pull Request (the Contribution) is intentionally submitted for inclusion in SIRF (the Work) under the terms and conditions of the Apache-2.0 License.

@Imraj-Singh Imraj-Singh marked this pull request as draft February 15, 2025 13:03
@KrisThielemans KrisThielemans self-assigned this Feb 17, 2025
@KrisThielemans KrisThielemans linked an issue Feb 17, 2025 that may be closed by this pull request
Copy link
Member
@KrisThielemans KrisThielemans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some detailed comments, but overall, I think we still need to decide on how to wrap AcquisitionModel, AcquisitionSensitivityModel, Resampler etc, i.e. anything that has forward and backward members. This should be written only once. You had https://github.com/SyneRBI/SIRF-Exercises/blob/71df538fe892ce621440544f14fa992c230fd120/notebooks/Deep_Learning_PET/sirf_torch.py#L38-L39, which seems completely generic, aside from naming of variables. So, something like this

class OperatorModule(torch.nn.Module):
    def __init__(self, sirf_operator, sirf_src, sirf_dest):
    """constructs wrapper. WARNING operations will overwrite content of `sirf_src` and `sirf_dest`"""
    ...

with usage

acq_model = pet.AcquisitionModelParallelProj()
# etc etc
torch_acq_model = sirf.torch.OperatorModule(acq_model, image, acq_data)

Ideally, we also provide for being able to call save_for_backward

@Imraj-Singh
Copy link
Contributor Author

Added a README with the some notes for a design discussion here.

Copy link
Member
@KrisThielemans KrisThielemans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to have too much code in the design discussion (duplication as well as distracting from actual design). You could include pointers to implementation in the README, but it's still small enough that this probably isn't worth the effort.

Just keep a skeleton in the README, the shorter the better...

How do you want people to react to your questions in the README? Via a review? (not conventional, but could work)

@KrisThielemans
Copy link
Member

By the way, we should avoid running CI on this until it makes sense. Easiest is to put [ci skip] in the message of your last commit that you will push.

Copy link
Contributor Author
@Imraj-Singh Imraj-Singh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test_cases need reviewed again. Perhaps grad_check is unneccessary since SIRF usually does tests for adjointness etc. We could suffice with simpler test checking the things are copied to and from torch/sirf correctly?

Copy link
Member
@KrisThielemans KrisThielemans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Various comments in the review. A general comment: I'm not convinced about hiding the "times -1" for the objective function. For one thing, it will fail with CIL functions such as KL. There are a few possible strategies here, which we should discuss.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paskino so this is an attempt at pytest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now works with MR, I am getting some very sinister intermittant segfault with setting up the sirf.STIR objective function @KrisThielemans @paskino. I do think I saw this in the PETRIC challenge. Really not sure how to diagnose and it may be SIRF/PyTorch compatibility issues, but I really don't know.

Copy link
Member
@KrisThielemans KrisThielemans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to have a disagreement on how this should be documented. Ideally @mehrhardt @gschramm provide an alternative opinion to mine...

By the way, I think using funky characters in the doc is a bad idea and that it is better to use Latex-doc (although it is less readable of course). This is because you don't know what language settings the user has (unless there's a way to enforce unicode in Markdown).

@gschramm
7D4D Copy link
gschramm commented Mar 4, 2025

gschramm

For me, this description of the autograd in pytorch makes sense:
https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#optional-reading-vector-calculus-using-autograd

This explain why, in the backward pass of a vector valued function y = f(x), we have to implement the
(Jacobian transpose) applied to a vector (supplied to the layer during back propagation).

If we write that out in components, we get the "chain rule".

Screenshot 2025-03-04 at 08 47 20

@KrisThielemans
Copy link
Member

thanks for the link @gschramm. We could maybe include it in the README. I think the main question for me is how much we should elaborate in our own doc, especially if it gets into detail on what is common usage vs what it actually does. I've put some detailed comments, and @Imraj-Singh will address those he agrees with :-). I'd personally prefer not too much (potentially confusing) detail in the README, but point to torch doc and our use cases (which could have a bit more doc then) or exercises.



run_gradcheck(torch_obj_fun, torch_image_data, (modality, data_type), "Objective",
nondet_tol=1e-1, fast_mode=True, eps=1e-3, atol=1e-1, rtol=1e-1)
Copy link
Contributor Author
@Imraj-Singh Imraj-Singh Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gschramm I had to set up the tolerance for non-determinism to be extremely high for objective function gradchecks... I think this may be troubles with computing the numerical Jacobian from a PNLL type function causing issues with numerical precision? I was wondering if you get this with your gradchecks in parallelproj. As a comparison I only wrapped the acquisition model and used torch's PNLL for the objective, and I still need rather high tolerances for non-determinism.

## Forward and backward clarification

The use of the terms forward and backward have different meaning given the context:
* `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) computes the (scaled directional) derivative, i.e. the Vector-Jacobian-adjoint-Product (VJP), from output to input.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you refer to the derivatives as "scaled"?

Copy link
Contributor Author
E875 @Imraj-Singh Imraj-Singh Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vector isn't neccessarily a unit vector. That said, I think I will just say "gradient" as that is correct and less confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree that "gradient is correct". The backward method multiplies its argument with the gradient of the function, which is the VJP of course. I'd personally remove "from output to input", as this terminology only makes sense if you think about this is inserted as a layer in a feed-forward network.

Suggested change
* `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) computes the (scaled directional) derivative, i.e. the Vector-Jacobian-adjoint-Product (VJP), from output to input.
* `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) multiplies its argument with the gradient of the function, i.e. computes the Vector-Jacobian-adjoint-Product (VJP).


The wrapper is currently only for **reverse-mode** autodiff - there are JVP methods (for forward-mode autodiff) of `torch.autograd.Function` that are not used at present.

For `SIRFOperator` we can wrap the adjoint operator with `AdjointOperator`, there quite confusingly the forward is the application of the adjoint, and the backward *is* the linear (Jacobian) component/approximation of the operator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backward is the linear (Jacobian) component/approximation of the operator

I don't know what this means

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have some operator A(x) and it's adjoint B(x), the forward pass would be B(x), and the backward pass would be J_A(x) * grad_out. So for operator Ax+b, adjoint is A^T x and and backward pass is A grad_out? grad_out is pytorch for product of upstream gradients, i.e. upstream chain rule....

Perhaps I should change it to:

For SIRFOperator, wrapping its adjoint with AdjointOperator results in: forward executing the adjoint operation, and backward computing the Jacobian-based gradient of the original SIRFOperator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not sure if this is helpful to mention at this point. It could be done in a use-case. But maybe we could say

Suggested change
For `SIRFOperator` we can wrap the adjoint operator with `AdjointOperator`, there quite confusingly the forward is the application of the adjoint, and the backward *is* the linear (Jacobian) component/approximation of the operator.
Note that in certain cases it can be necessary to use the adjoint of an operator as a forward step, e.g. when adding a gradient-descent step as a layer. This can be achieved by wrapping `sirf.AdjointOperator(sirf_operator)`.

remove the "from sirf.SIRF import *" from sirf/__init__.py again
We were returning the wrong variable (found by Codacy)

[ci skip]
Copy link
Member
@KrisThielemans KrisThielemans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the import statement in sirf/SIRF/__init__.py made the CI pass, essentially as it doesn't know anything about the pytorch module.

I haven't run this though.

It would be nice to have CI (calling gradchecks), but it will need installation of pytorch on the runner. I'm also not sure if the tests will run on CPU.

@casperdcl casperdcl force-pushed the torch branch 2 times, most recently from 2e2dc62 to 5bf241d Compare March 11, 2026 16:54
@casperdcl casperdcl force-pushed the torch branch 9 times, most recently from 5b6ea52 to 2d06e11 Compare March 12, 2026 12:54
@casperdcl casperdcl force-pushed the torch branch 5 times, most recently from 8b37e4e to ba99ec9 Compare March 12, 2026 15:24
Copy link
Member
@KrisThielemans KrisThielemans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have 2 PET tests causing a segfault on GHA: test_objective_function_gradcheck and test_objective_function_gradient_gradcheck (CPU version of course). However, I cannot reproduce this on a separate machine with (probably) the same software. I built STIR and SIRF in Debug mode, and run the tests via valgrind. There were no segfaults, or any memory problems flagged up with valgrind. There were various memory leaks: valgrind had items "reachable" at exit, but going through them they all seemed to be in Python, Cuda, libgomp. Certainly nothing obvious in STIR/SIRF.

All MR tests pass.

We have therefore decided to merge this PR with the 2 tests disabled and create an issue for it to remind us. It'd therefore be great if people could try the new functionality ASAP (after the merge).

@casperdcl go ahead once you're happy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

SIRF torch should be part of SIRF

8 participants

0