Reset mask for torch.cumsum? · Issue #53095 · pytorch/pytorch · GitHub

Reset mask for torch.cumsum? #53095


Open
18jeffreyma opened this issue Mar 2, 2021 · 7 comments
Labels
enhancement (Not as big of a feature, but technically not a bug. Should be easy to fix)
module: numpy (Related to numpy support, and also numpy compatibility of our operators)
module: reductions
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

18jeffreyma commented Mar 2, 2021

🚀 Feature

torch.cumsum currently performs a cumulative sum over a given axis without ever resetting the running sum (i.e. the final term is the sum of all terms in the input vector). I'm proposing adding an argument mask, a vector of 0s and 1s, where a 1 indicates that the cumulative sum should reset at that position, allowing multiple cumulative sums to be computed within a single vector.

Motivation

I'm currently looking to compute a per-trajectory cumulative sum of log probabilities for a reinforcement learning task, and I find it generally easier to keep all trajectories in a single vector (for ease of transfer between CPU and GPU) along with a separate vector that marks the boundaries between trajectories.

Pitch

I'd like something like an argument to torch.cumsum called mask, which has the same length as the axis being summed over and indicates where to reset the cumulative sum. For example, if my vector were x = [a, b, c, d, e] and my mask were m = [1, 0, 0, 1, 0], then torch.cumsum(x, mask=m) should return something like [a, a+b, a+b+c, d, d+e].
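
For what it's worth, this behavior can already be emulated with two plain cumsums plus an index lookup. A minimal sketch, assuming a 1-D input, an integer mask, and mask[0] == 1 (the helper name masked_cumsum is hypothetical, not an existing PyTorch op):

import torch

def masked_cumsum(x, mask):
    # mask[i] == 1 means "start a new cumulative sum at position i"; assumes mask[0] == 1
    full = torch.cumsum(x, dim=0)
    starts = mask.nonzero(as_tuple=True)[0]            # indices where each segment begins
    offsets = torch.zeros(len(starts), dtype=x.dtype, device=x.device)
    offsets[1:] = full[starts[1:] - 1]                 # running total accumulated before each later segment
    seg_id = torch.cumsum(mask, dim=0) - 1             # segment index of every position
    return full - offsets[seg_id]

x = torch.tensor([1., 2., 3., 4., 5.])
m = torch.tensor([1, 0, 0, 1, 0])
print(masked_cumsum(x, m))                             # tensor([1., 3., 6., 4., 9.])

Autograd differentiates through both cumsums, and the gradient contributions of earlier segments cancel in the subtraction, though the forward pass can lose precision for very long sequences with large running totals.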

Alternatives

My current implementation is here (it manually iterates through a vector of log probabilities and uses an inverted mask to reset the cumulative sum):

https://github.com/18jeffreyma/multi_cmd/blob/3106417eb4acb9230576e78e7267a24b2efa6b3d/multi_cmd/rl_utils/multi_copg.py#L179

However, this implementation does not seem to play very well with CUDA acceleration under autograd: I see zero performance benefit on an NVIDIA DGX versus the CPU, which is the primary reason I'm bringing up this feature request. Alternatively, is there some autograd issue with implementing this masked-reset cumulative sum using a for-loop?

cc @mruberry @rgommers @heitorschueroff

@zou3519 zou3519 added feature A request for a proper, new feature. module: reductions triage review labels Mar 2, 2021
zou3519 (Contributor) commented Mar 2, 2021

Marking this as "triage review" because I'm not sure what to label this as.

This reminds me a lot of the "reduction with splits" that Caffe2 has (where one provides some "splits" for a vector and the reduction reduces each section of a vector as defined by the splits).

From some googling, I haven't been able to find a precedent for this kind of functionality; numpy's cumsum doesn't have a reset mask. However, this sounds like a useful operation to have.

18jeffreyma (Author)

Gotcha, thanks for confirming! In terms of a temporary workaround while this is implemented, is manually iterating through each element and tracking the cumulative sum slower, in terms of autodifferentiation, than slicing, computing the cumulative sum on each slice, then concatenating? I.e., something like this:

for i in range(slp.size(0)):
     if i == 0:
         slp[0] = 0.
     else:
         slp[i] = torch.add(slp[i-1], lp[i-1]) * mask[i-1]

18jeffreyma (Author)

In a similar vein, would you happen to know where exactly in the repo torch.cumsum is implemented? I'd really appreciate it, to understand a bit further why slicing and doing cumsum is more efficient than the implementation above.

zou3519 (Contributor) commented Mar 8, 2021

In a similar vein, would you happen to know where exactly in the repo torch.cumsum is implemented? I'd really appreciate it, to understand a bit further why slicing and doing cumsum is more efficient than the implementation above.

It's a little complicated, but cumsum is implemented here:

Tensor cumsum(const Tensor& self, int64_t dim, c10::optional<ScalarType> dtype) {
  auto result = [&]() {
    NoNamesGuard guard;
    return at::_cumsum(integer_upcast(self, dtype), dim);
  }();
  namedinference::propagate_names(result, self);
  return result;
}

This calls into _cumsum, which calls into either _cumsum_cpu or _cumsum_cuda depending on the backend:

CPU: _cumsum_cpu
CUDA: _cumsum_cuda

zou3519 (Contributor) commented Mar 8, 2021

Gotcha, thanks for confirming! In terms of a temporary workaround while this is implemented, is manually iterating through each element and tracking the cumulative sum slower, in terms of autodifferentiation, than slicing, computing the cumulative sum on each slice, then concatenating? I.e., something like this:

for i in range(slp.size(0)):
     if i == 0:
         slp[0] = 0.
     else:
         slp[i] = torch.add(slp[i-1], lp[i-1]) * mask[i-1]

It's hard to say without benchmarking and knowing the input sizes. But generally, manually iterating through each element is not very efficient. The more operations that can be batched together, the better. So I'd expect the latter (i.e. slicing, computing the cumulative sum on slices, then concatenating) to be faster.
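
A rough sketch of that slice-and-concatenate approach (variable names lp and mask are placeholders; it assumes a 1-D tensor and a reset mask whose first entry is 1):

import torch

lp = torch.tensor([1., 2., 3., 4., 5.])
mask = torch.tensor([1, 0, 0, 1, 0])                   # 1 = reset the running sum here

starts = mask.nonzero(as_tuple=True)[0]                # positions where a new segment begins
lengths = torch.diff(starts, append=torch.tensor([lp.numel()])).tolist()
chunks = torch.split(lp, lengths)                      # one chunk per segment
out = torch.cat([torch.cumsum(c, dim=0) for c in chunks])
print(out)                                             # tensor([1., 3., 6., 4., 9.])

The Python loop here runs only over the number of segments, not over individual elements, so the bulk of the work stays in batched cumsum and cat calls.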

@ngimel ngimel added enhancement Not as big of a feature, but technically not a bug. Should be easy to fix and removed feature A request for a proper, new feature. triage review labels Mar 15, 2021
@jbschlosser jbschlosser added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Mar 15, 2021
@heitorschueroff heitorschueroff added module: numpy Related to numpy support, and also numpy compatibility of our operators and removed module: reductions labels Jul 9, 2021
sergiynesterenko90

+1 for this; it would be helpful for reinforcement learning applications. I suspect the original post has the same use case in mind (it mentions a vector of log probs).

biuq commented May 9, 2024

This can be implemented without Python loops in the current version of PyTorch (tested on 2.2). Below is a slightly modified implementation based on heinsen_sequence; it evaluates the linear recurrence x[i] = coeffs[i] * x[i-1] + values[i] in log space, so a 0 in coeffs resets the running sum (hence the inverted mask below):

import torch

def complex_log(float_input, eps=1e-6):
    # log of a real tensor, keeping the sign in the imaginary part (log(-x) = log|x| + i*pi)
    eps = float_input.new_tensor(eps)
    real = float_input.abs().maximum(eps).log()
    imag = (float_input < 0).to(float_input.dtype) * torch.pi
    return torch.complex(real, imag)

def associative_scan(values: torch.Tensor, coeffs: torch.Tensor, dim: int):
    # parallel evaluation of x[i] = coeffs[i] * x[i-1] + values[i] via cumulative ops in log space
    log_values = complex_log(values.float())
    log_coeffs = complex_log(coeffs.float())
    a_star = torch.cumsum(log_coeffs, dim=dim)
    log_x0_plus_b_star = torch.logcumsumexp(log_values - a_star, dim=dim)
    log_x = a_star + log_x0_plus_b_star
    return torch.exp(log_x).real

input = torch.tensor([1, 2, 3, 4, 5])
inverted_reset_mask = torch.tensor([0, 1, 1, 0, 1])
output = associative_scan(input, inverted_reset_mask, dim=0)

print(output)
tensor([1.0000, 3.0000, 6.0000, 4.0000, 9.0000])
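
For reference, this matches the mask convention from the original pitch (1 = reset at that position) if the coefficients are taken as the inverted mask; a small usage sketch continuing from the code above:

reset_mask = torch.tensor([1, 0, 0, 1, 0])             # 1 = start a new cumulative sum here
coeffs = 1 - reset_mask                                # 0 zeroes out the carried-over sum
out = associative_scan(torch.tensor([1, 2, 3, 4, 5]), coeffs, dim=0)
print(out)                                             # tensor([1., 3., 6., 4., 9.])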

Related #95408
