Implement `np.diff` for single order differences by soulitzer · Pull Request #50569 · pytorch/pytorch

Closed
soulitzer wants to merge 15 commits into master from implement-np-diff

Conversation

Contributor
@soulitzer soulitzer commented Jan 15, 2021

Implements np.diff for single order differences only:

  • method and function variants for diff and function variant for diff_out
  • supports out variant, but not in-place since shape changes
  • adds OpInfo entry, and test in test_torch
  • automatic autograd because we are using the Math dispatch

Update: we only support Tensors for prepend and append in this PR. See discussion below and comments for more details.
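
To make the surface concrete, here is a small usage sketch (illustrative only, not taken from the PR's tests; the results shown in comments follow np.diff semantics):

import torch

x = torch.tensor([1, 3, 2])
torch.diff(x)   # tensor([ 2, -1]): adjacent differences along the last dim
x.diff()        # method variant, same result

# prepend/append take Tensors (not Scalars) in this PR; they are concatenated
# along dim before differencing
torch.diff(x, prepend=torch.tensor([0]), append=torch.tensor([5]))
# tensor([ 1,  2, -1,  3])

# out variant; there is no in-place variant since the output shape differs
out = torch.empty(2, dtype=x.dtype)
torch.diff(x, out=out)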

Currently there is a quirk in the C++ API due to how this is implemented: it is not possible to specify a scalar prepend or append without also specifying all 4 arguments.

That is because the goal is to match NumPy's diff signature of diff(int n=1, int dim=-1, Union[Scalar, Tensor] prepend=None, Union[Scalar, Tensor] append=None), where all arguments are optional, positional, and in the correct order.
There are a couple of blockers. One is C++ overload ambiguity, which prevents us from simply writing diff(int n=1, int dim=-1, Scalar? prepend=None, Tensor? append=None), etc., for all combinations of {Tensor, Scalar} x {Tensor, Scalar}.

You might ask: why not give prepend and append no default args and then write out the whole set of {Tensor, Scalar, omitted} x {Tensor, Scalar, omitted} combinations? Aside from having to write 18 overloads (9 combinations, doubled for the out variants), this is actually illegal because arguments with defaults must come after arguments without defaults. This would mean having to write diff(prepend, append, n, dim), which is not desired. Finally, writing out the entire set of combinations over all arguments n, dim, prepend, append is out of the question because that would involve 2 * 2 * 3 * 3 = 36 combinations. And if we include the out variant, that would be 72 overloads!

With this in mind, the current implementation still declares diff(int n=1, int dim=-1, Scalar? prepend=None, Tensor? append=None) and the other {Tensor, Scalar} x {Tensor, Scalar} combinations, but also makes use of cpp_no_default_args. The idea is to have only one of the 4 {Tensor, Scalar} x {Tensor, Scalar} overloads provide default arguments for the C++ API, and to add cpp_no_default_args for the remaining 3 overloads. With this, the Python API works as expected, but some calls such as diff(prepend=1) won't work in the C++ API.

We can optionally add 18 more overloads that cover the {dim, n, no-args} x {scalar-tensor, tensor-scalar, scalar-scalar} x {out, non-out} cases for the C++ API. [edit: counting is hard - just realized this number is still wrong. We should instead count the cases we do cover and subtract that from the total: (2 * 2 * 3 * 3) - (3 + 2^4) = 17. The 3 comes from the 3 of the 4 combinations of {tensor, scalar}^2 that we declare to be cpp_no_default_args, and the one remaining combination that does have default arguments covers 2^4 cases. Doubling for the out variants, the actual count is 34 additional overloads to support all possible calls.]
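
To make that arithmetic easier to follow, a small counting sketch using the numbers above (the variable names are just for illustration):

total_call_shapes = 2 * 2 * 3 * 3      # n, dim each given/omitted; prepend, append each Tensor/Scalar/omitted
covered = 3 + 2 ** 4                   # 3 cpp_no_default_args overloads plus 2^4 call shapes served by the one defaulted overload
missing = total_call_shapes - covered  # 36 - 19 = 17
missing_with_out = 2 * missing         # 34 once the out variants are included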

[edit: thanks to #50767 hacky_wrapper is no longer necessary; it is removed in the latest commit]
hacky_wrapper was also necessary here because Tensor? causes dispatch to look for the const optional<Tensor>& schema while also generating a const Tensor& declaration in Functions.h. hacky_wrapper allows us to define our function as const Tensor& but wraps it in optional for us, which avoids errors at both link time and load time.

[edit: rewrote the above to improve clarity and to correct the count: we actually need 18 more overloads (26 total), not 18 in total, to complete the C++ API]

@soulitzer soulitzer changed the title Implement np.diff Implement np.diff for single order differences Jan 15, 2021
@soulitzer soulitzer requested a review from mruberry January 15, 2021 04:46
@mruberry mruberry self-requested a review January 19, 2021 09:11
Collaborator
@mruberry mruberry left a comment

This looks like a good start. As discussed offline, supporting only n=1 is OK for now.

This PR does need to do two things, however:

  • more testing, comparing directly against np.diff, including noncontiguous inputs and inputs that use different dimensions of the input (see the sketch after this list)
  • documentation
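
For illustration, a minimal sketch of the kind of comparison being asked for (the helper name and inputs are assumptions, not the PR's actual test code):

import numpy as np
import torch

def _compare_with_numpy_diff(t, dim=-1):
    # check torch.diff against np.diff on the same data
    expected = np.diff(t.cpu().numpy(), axis=dim)
    actual = torch.diff(t, dim=dim)
    np.testing.assert_allclose(actual.cpu().numpy(), expected)

# contiguous input, default dim
_compare_with_numpy_diff(torch.randn(3, 4))

# noncontiguous input (a strided view), exercised along a non-default dim
base = torch.randn(3, 4, 2)
noncontig = base[..., 0]
assert not noncontig.is_contiguous()
_compare_with_numpy_diff(noncontig, dim=0)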

@soulitzer
Contributor Author

cc @bhosmer If I'm planning to support scalars in addition to Tensors for the append and prepend arguments, it seems like one way to do this is to have 4 variants in native_functions.yaml (or 8, if you include the out variants), but that seems a little clunky. Is there a better approach? Any guidance is appreciated, thanks!

@bhosmer
bhosmer commented Jan 21, 2021

[edit: ah you've updated since my last refresh, comment below may be out of date]

cc @bhosmer If I'm planning to support scalars in addition to Tensors for the append and prepend arguments, it seems like one way to do this is to have 4 variants in native_functions.yaml (or 8, if you include the out variants), but that seems a little clunky. Is there a better approach? Any guidance is appreciated, thanks!

Hi @soulitzer - unfortunately I think enumerating the scalar overloads is your only option presently - you may have already looked at these, but just in case, you can search for ".Scalar(" in native_functions.yaml to see examples of existing ops doing it this way.

As far as why there's no more compact alternative, I don't know the full history but I'd guess it's because the only alternative that would spare you the additional C++ entry points (i.e., that wouldn't just be sugar in native_functions.yaml) would be some kind of implicit promotion from Scalar to Tensor, which is not just fraught perf-wise but would also involve things like carefully picking a device for the tensor so that it agrees with other Tensor args, etc.

Hey though, here's a pitch for something else: not using hacky_wrapper :) [edit: the problem you mention in the PR description was fixed in #50767, so if that was the reason you used it, you're unblocked!]

Basically, without it, the kernel signatures can match the declared schema more faithfully. In the case of diff and diff_out that means

  • using optional<Tensor> to represent optional Tensors
  • putting the out argument at the end in diff_out

Here's a patch that does it (as of your last commit yesterday, anyway - changes are pretty straightforward though) https://gist.github.com/bhosmer/7b3ddfdaa19ccdd474b4a95a7cb44d3b

(But feel free not to use it if you'd rather stick with what you've got - we're planning to do a big port to get rid of hacky_wrapper later in the half, and this was a good opportunity to work through a concrete use case regardless.)

cc @smessmer re optional<Tensor> discussion

@soulitzer
Contributor Author
soulitzer commented Jan 21, 2021

@bhosmer it's good to see the Tensor? thing fixed! I've updated it to no longer use hacky_wrapper. I was thinking we could instead potentially add support for unions in codegen, which would then map to a new c10::union or something like std::variant. That would be super useful for use cases like this one, where we have multiple arguments in sequence that can each support multiple types.

@mruberry I think it's ready for another look when you have the chance. Let me know your thoughts on the quirk this introduces in the C++ API: is avoiding it worth adding 18 different overloads (edit: 18 in addition, not in total)?

@soulitzer soulitzer requested a review from mruberry January 21, 2021 03:13

Tensor diff_scalar_scalar(const Tensor& self, int64_t n, int64_t dim, c10::optional<Scalar> prepend, c10::optional<Scalar> append) {
  return diff_tensor_tensor(self, n, dim,
      prepend.has_value() ? c10::optional<Tensor>(diff_broadcast_scalar(prepend.value(), self, dim)) : c10::nullopt,
      append.has_value() ? c10::optional<Tensor>(diff_broadcast_scalar(append.value(), self, dim)) : c10::nullopt);
}

@smessmer it'd be nice to have a .map() polyfill for these, any plans? :)

@bhosmer
bhosmer commented Jan 21, 2021

@bhosmer it's good to see the Tensor? thing fixed! I've updated it to no longer use hacky_wrapper. I was thinking we could instead potentially add support for unions in codegen, which would then map to a new c10::union or something like std::variant. That would be super useful for use cases like this one, where we have multiple arguments in sequence that can each support multiple types.

Yeah, you're definitely hitting the most aggravatingly combinatorial weakness of the current setup with a signature like diff's. I think supporting unions is a considerably heavier lift than it might seem, though. You'd need to

  • add support for unions to the codegen pipeline - I'm not sure you've surveyed this territory yet, but codegen doesn't currently handle even the compound types it does support in anything like full generality (e.g. take a look at the current type parser and assume the rest of the downstream pipeline makes similar assumptions)
  • teach the codegen logic that toposorts overloads about unions - on the one hand, the main case it's dealing with now - arranging Tensor/Scalar overloads properly - would no longer need to be sorted, but OTOH unions in full generality will definitely produce signatures that don't have a canonical specific-to-general ordering at all - we'd need to recognize these and handle them somehow
  • add support for union types in schema parsing and overload resolution to both eager and JIT binding infrastructure (they're separate stacks - PythonArgParser on the eager side, SchemaTypeParser and matchSchema IIRC on the JIT side)
  • add union types to the JIT's type system - this has actually been talked about in the past for other reasons, but AFAIK the complexity cost to typechecking has been a dealbreaker [edit: @eellison mentions that unions are now WIP on the JIT side, though not that close to landing yet, sounds like. That would take care of this item, though I don't think it would include the previous one.]
  • teach the dispatcher about union types. There are a few tricky pieces to this, but the toughest will be that the multi-dispatch system would need to recognize and apply special behavior to union types where one of the members of the type is Tensor, to scan them for dispatch keys at call time. Here's the endpoint of that process, where the current set of key-bearing types are handled - it's a very hot path, and the complexity of supporting general unions here, with the required perf, honestly I think takes it off the table.

You could imagine avoiding some of the complexity overhead with some sort of custom ScalarOrTensor type; this would help particularly with the parsing/overload ordering. But downstream of that it wouldn't be much simpler or less invasive - and the perf overhead of the added dynamism would remain in the dispatcher, making it at least a nonstarter for hot ops - you'd have to do the current combinatorial thing.

Which leaves us with some sort of native_functions sugar that would spare us the multiple declarations, but just mechanically unfold them into C++. This seems like a slippery slope to me - I'd rather have the relationship between the quantity of declarations in native_functions and generated c++ artifacts be straightforward, otherwise it's easy to lose track of the tonnage of code being generated.

From my POV this argues for the status quo - or at least, against unions as a way out of it. But cc @smessmer @ezyang @ljk53 for differing POVs, other clever ideas, etc. 😁

@soulitzer
Contributor Author

Thanks @bhosmer for some very interesting perspective!

I didn't think of ScalarOrTensor, but that does seem like it would probably cover the vast majority of the use cases for unions. If writing the op implementation is what you mean by downstream, I actually do think something like ScalarOrTensor would simplify things here. In the case of diff, for example, we'd only have to write a single overload rather than 4, plus a helper function that does the conversion. And on the dispatcher side, wouldn't the level of dynamism introduced be similar to something like an optional, which we already have? Both of these are "binary" in a sense, and even if you have something like ScalarOrTensor?, that would be three cases instead of two. I'm probably glossing over many details here :P, but that seems to be the case from the surface.

@ezyang ezyang requested a review from peterbell10 January 22, 2021 18:52
@ezyang
Contributor
ezyang commented Jan 22, 2021

Adding @peterbell10 because we did some tricks with C++ argument defaulting to avoid ambiguity before, and we can do it again here, probably.

@codecov
codecov bot commented Jan 26, 2021

Codecov Report

Merging #50569 (ad276dc) into master (7a8c64d) will increase coverage by 0.00%.
The diff coverage is 92.50%.

@@           Coverage Diff           @@
##           master   #50569   +/-   ##
=======================================
  Coverage   80.84%   80.84%           
=======================================
  Files        1931     1931           
  Lines      210879   210919   +40     
=======================================
+ Hits       170492   170525   +33     
- Misses      40387    40394    +7     

@@ -1367,6 +1367,16 @@
- func: fill_diagonal_(Tensor(a!) self, Scalar fill_value, bool wrap=False) -> Tensor(a!)
variants: method

- func: diff(Tensor self, int n=1, int dim=-1, Tensor? prepend=None, Tensor? append=None) -> Tensor
Collaborator
@mruberry mruberry Jan 27, 2021

cc @rgommers for a possible issue with NumPy's signature

np.diff has both prepend and append default to "no value", which is represented by:

https://github.com/numpy/numpy/blob/1de46fe2feee9d3c500a83331ac9b75af5aef947/numpy/_globals.py#L57

The comment indicates it's intended to be used with "deprecated keywords." However, it appears these defaults come from the PR adding the kwargs: numpy/numpy#8206. @rgommers, seems like the NumPy signature would ideally have prepend=None and append=None?

Collaborator

That comment is a little misleading, or at least incomplete. _NoValue is mainly used when keywords are added to a function, and the function forwards its keywords somewhere. See for example np.sum which forwards to the .sum method of its input (which can be a torch.Tensor). Adding keywords to the numpy function that are not yet present in all other/downstream libraries implementing the same functionality would then break previously working code. I'll go fix the code comment.

Wherever you see _NoValue, just read None.

The PR gives this example that would not work with prepend=None for object arrays:

>>> data = np.array([2, 3], dtype=object)
>>> np.diff(data, prepend=None)
array([None, 1], dtype=object)  # expected
array([1], dtype=object)  # actual

A little far-fetched, but unfortunately this works:

>>> np.array(None)                                                          
array(None, dtype=object)

Object arrays are very annoying.

contig = _make_tensor(shape, dtype, device)
_test_diff_numpy(contig)

non_contig = torch.empty(shape + (2, 2), dtype=dtype)[..., 0]
Collaborator

Use make_tensor instead of _make_tensor:

def make_tensor(size, device: torch.device, dtype: torch.dtype, *,

Both here on 4152 and on 4149.

Contributor Author

I've replaced _make_tensor with make_tensor on line 4149, but I don't think we can use make_tensor to produce torch.empty for line 4152.

non_contig = torch.empty(shape + (2, 2), dtype=dtype)[..., 0]
non_contig = non_contig.select(-1, -1)
non_contig.copy_(contig)
self.assertTrue(not non_contig.is_contiguous() or shape == (1,))
Collaborator

Cool testing of both contiguous and discontiguous use cases.

op=torch.diff,
dtypes=all_types_and_complex_and(torch.bool, torch.float16, torch.bfloat16),
sample_inputs_func=sample_inputs_diff,
test_inplace_grad=False),
Collaborator

Not for this PR, but the test for inplace grads should be skipped automatically if the op doesn't have an inplace variant.

@@ -1917,6 +1949,9 @@ def __len__(self):
def ident(x):
return x

# Do NOT add to this list. Method tests are being DEPRECATED and replaced by OpInfos.
Collaborator

Thanks!

@mruberry
Collaborator

Hey @soulitzer!

This looks great. I made a few doc suggestions and a couple requests on test coverage.

@mruberry mruberry mentioned this pull request Jan 27, 2021
@soulitzer
Contributor Author

Hey @mruberry, I've addressed your comments and I think it's ready for another look!

@soulitzer soulitzer requested a review from mruberry January 28, 2021 23:57
Collaborator
@mruberry mruberry left a comment

Nice work, @soulitzer!

Contributor
@facebook-github-bot facebook-github-bot left a comment

@soulitzer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@soulitzer merged this pull request in b18eeaa.

@soulitzer soulitzer deleted the implement-np-diff branch April 14, 2021 20:40