Reset mask for torch.cumsum? #53095
Comments
Marking this as "triage review" because I'm not sure what to label this as. This reminds me a lot of the "reduction with splits" that Caffe2 has (where one provides some "splits" for a vector and the reduction reduces each section of the vector as defined by the splits). From some googling, I haven't been able to find a precedent for this kind of functionality; numpy's cumsum doesn't have a reset mask. However, this sounds like a useful operation to have.
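For illustration only, the "reduction with splits" idea could be sketched in PyTorch roughly as follows (this is not Caffe2's actual API, and the split sizes are made up):

import torch

x = torch.tensor([1., 2., 3., 4., 5.])
splits = [3, 2]  # lengths of the sections to reduce separately
# sum each section independently
section_sums = torch.stack([s.sum() for s in torch.split(x, splits)])
# section_sums == tensor([6., 9.])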
Gotcha, thanks for confirming! In terms of a temporary workaround while this is implemented, is manually iterating through each element and tracking the cumulative sum slower, in terms of autodifferentiation, than slicing, computing the cumulative sum on each slice, and then concatenating? I.e., something like this:
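A rough sketch of that slice-and-concatenate workaround (the split sizes below are illustrative, as if derived from a reset mask of [1, 0, 0, 1, 0]):

import torch

x = torch.tensor([1., 2., 3., 4., 5.], requires_grad=True)
split_sizes = [3, 2]  # segment lengths implied by the reset mask
# cumulative sum within each segment, then stitch the segments back together
out = torch.cat([seg.cumsum(dim=0) for seg in torch.split(x, split_sizes)])
# out == tensor([1., 3., 6., 4., 9.]); autograd flows through split, cumsum, and cat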
In a similar vein, would you happen to know where exactly in the repo cumsum is implemented?
It's a little complicated, but cumsum is implemented here: pytorch/aten/src/ATen/native/ReduceOps.cpp, lines 84 to 91 (at d54be1a). This calls into pytorch/aten/src/ATen/native/native_functions.yaml, lines 6648 to 6649 (at d54be1a).
It's hard to say without benchmarking and knowing the input sizes, but generally, manually iterating through each element is not very efficient. The more operations that can be batched together, the better, so I'd expect the latter (i.e. slicing, computing the cumulative sum on slices, then concatenating) to be faster.
+1 for this; it would be helpful for reinforcement learning applications. I suspect the original poster has the same use case in mind (they mentioned a vector of log probs).
This can be implemented without Python loops in current versions of PyTorch (tested on 2.2). Below is a slightly modified implementation based on heinsen_sequence:

import torch

def complex_log(float_input, eps=1e-6):
    # log of a real tensor, using the complex plane to handle zero and negative values
    eps = float_input.new_tensor(eps)
    real = float_input.abs().maximum(eps).log()
    imag = (float_input < 0).to(float_input.dtype) * torch.pi
    return torch.complex(real, imag)

def associative_scan(values: torch.Tensor, coeffs: torch.Tensor, dim: int):
    # computes x_t = coeffs_t * x_{t-1} + values_t (with x_{-1} = 0) in log space,
    # using only cumsum and logcumsumexp instead of a sequential loop
    log_values = complex_log(values.float())
    log_coeffs = complex_log(coeffs.float())
    a_star = torch.cumsum(log_coeffs, dim=dim)
    log_x0_plus_b_star = torch.logcumsumexp(log_values - a_star, dim=dim)
    log_x = a_star + log_x0_plus_b_star
    return torch.exp(log_x).real

input = torch.tensor([1, 2, 3, 4, 5])
inverted_reset_mask = torch.tensor([0, 1, 1, 0, 1])
output = associative_scan(input, inverted_reset_mask, dim=0)
print(output)
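For the masked-cumsum use case, coeffs acts as an inverted reset mask. A brief note on the correspondence (assuming a reset mask in which 1 marks a restart, as proposed in this issue):

reset_mask = torch.tensor([1, 0, 0, 1, 0])
coeffs = 1 - reset_mask  # the inverted_reset_mask used above
# associative_scan(input, coeffs, dim=0) is approximately tensor([1., 3., 6., 4., 9.])
# (approximate only up to the eps used in complex_log to stand in for log(0))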
Related: #95408
🚀 Feature
torch.cumsum currently performs a cumulative sum over a given axis without ever resetting the running total (i.e. the final element is the sum of all elements of the input vector). I propose adding an argument mask, a vector of 0s and 1s where a 1 indicates that the cumulative sum should reset at that position, allowing multiple cumulative sums to be computed within a single vector.
Motivation
I'm currently looking to compute a cumulative sum of log probabilities per trajectory for a reinforcement learning task, and I find it generally easier to keep all trajectories in a single vector (for ease of moving data between CPU and GPU) together with a second vector that distinguishes between trajectories.
Pitch
I'd like an argument to torch.cumsum, called mask, which has the same length as the axis being summed over and indicates when to reset the cumulative sum. For example, if my vector were x = [a, b, c, d, e] and my mask were m = [1, 0, 0, 1, 0], then torch.cumsum(x, mask=m) should return something like [a, a+b, a+b+c, d, d+e].
Alternatives
My current implementation is here; it manually iterates through a vector of log probabilities and uses an inverted mask to reset the cumulative sum:
https://github.com/18jeffreyma/multi_cmd/blob/3106417eb4acb9230576e78e7267a24b2efa6b3d/multi_cmd/rl_utils/multi_copg.py#L179
However, this implementation does not seem to play well with CUDA acceleration under autograd: I see zero performance benefit on an Nvidia DGX versus the CPU, which is the primary reason I'm bringing up this feature request. Alternatively, is there some autograd issue with implementing this masked-reset cumulative sum using a for-loop?
cc @mruberry @rgommers @heitorschueroff