Implement `np.diff` for single order differences by soulitzer · Pull Request #50569 · pytorch/pytorch

Closed
soulitzer wants to merge 15 commits into master from implement-np-diff

Conversation

Contributor
@soulitzer soulitzer commented Jan 15, 2021

Implements np.diff for single order differences only:

  • method and function variants for diff and function variant for diff_out
  • supports out variant, but not in-place since shape changes
  • adds OpInfo entry, and test in test_torch
  • automatic autograd because we are using the Math dispatch

Update: we only support Tensors for prepend and append in this PR. See discussion below and comments for more details.
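
To make the surface concrete, here is a small usage sketch (illustrative only, not taken from the PR's tests; the results shown in comments follow np.diff semantics):

import torch

x = torch.tensor([1, 3, 2])
torch.diff(x)   # tensor([ 2, -1]): adjacent differences along the last dim
x.diff()        # method variant, same result

# prepend/append take Tensors (not Scalars) in this PR; they are concatenated
# along dim before differencing
torch.diff(x, prepend=torch.tensor([0]), append=torch.tensor([5]))
# tensor([ 1,  2, -1,  3])

# out variant; there is no in-place variant since the output shape differs
out = torch.empty(2, dtype=x.dtype)
torch.diff(x, out=out)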

Currently there is a quirk in the C++ API due to how this is implemented: it is not possible to specify a scalar prepend or append without also specifying all 4 arguments.

That is because the goal is to match NumPy's diff signature of diff(int n=1, int dim=-1, Union[Scalar, Tensor] prepend=None, Union[Scalar, Tensor] append=None), where all arguments are optional, positional, and in the correct order.
There are a couple of blockers. One is C++ overload ambiguity, which prevents us from simply writing diff(int n=1, int dim=-1, Scalar? prepend=None, Tensor? append=None), etc., for all combinations of {Tensor, Scalar} x {Tensor, Scalar}.

You might ask: why not give prepend and append no default args and then write out the whole set of {Tensor, Scalar, omitted} x {Tensor, Scalar, omitted} combinations? Aside from having to write 18 overloads (9 combinations, doubled for the out variants), this is actually illegal because arguments with defaults must come after arguments without defaults. This would mean having to write diff(prepend, append, n, dim), which is not desired. Finally, writing out the entire set of combinations over all arguments n, dim, prepend, append is out of the question because that would involve 2 * 2 * 3 * 3 = 36 combinations. And if we include the out variant, that would be 72 overloads!

With this in mind, the current implementation still declares diff(int n=1, int dim=-1, Scalar? prepend=None, Tensor? append=None) and the other {Tensor, Scalar} x {Tensor, Scalar} combinations, but also makes use of cpp_no_default_args. The idea is to have only one of the 4 {Tensor, Scalar} x {Tensor, Scalar} overloads provide default arguments for the C++ API, and to add cpp_no_default_args for the remaining 3 overloads. With this, the Python API works as expected, but some calls such as diff(prepend=1) won't work in the C++ API.

We can optionally add 18 more overloads that cover the {dim, n, no-args} x {scalar-tensor, tensor-scalar, scalar-scalar} x {out, non-out} cases for the C++ API. [edit: counting is hard - just realized this number is still wrong. We should instead count the cases we do cover and subtract that from the total: (2 * 2 * 3 * 3) - (3 + 2^4) = 17. The 3 comes from the 3 of the 4 combinations of {tensor, scalar}^2 that we declare to be cpp_no_default_args, and the one remaining combination that does have default arguments covers 2^4 cases. Doubling for the out variants, the actual count is 34 additional overloads to support all possible calls.]
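
To make that arithmetic easier to follow, a small counting sketch using the numbers above (the variable names are just for illustration):

total_call_shapes = 2 * 2 * 3 * 3      # n, dim each given/omitted; prepend, append each Tensor/Scalar/omitted
covered = 3 + 2 ** 4                   # 3 cpp_no_default_args overloads plus 2^4 call shapes served by the one defaulted overload
missing = total_call_shapes - covered  # 36 - 19 = 17
missing_with_out = 2 * missing         # 34 once the out variants are included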

[edit: thanks to #50767 hacky_wrapper is no longer necessary; it is removed in the latest commit]
hacky_wrapper was also necessary here because Tensor? causes dispatch to look for the const optional<Tensor>& schema while also generating a const Tensor& declaration in Functions.h. hacky_wrapper allows us to define our function as const Tensor& but wraps it in optional for us, which avoids errors at both link time and load time.

[edit: rewrote the above to improve clarity and to correct the count: we actually need 18 more overloads (26 total), not 18 in total, to complete the C++ API]

@soulitzer soulitzer changed the title Implement np.diff Implement np.diff for single order differences Jan 15, 2021
@soulitzer soulitzer requested a review from mruberry January 15, 2021 04:46
@mruberry mruberry self-requested a review January 19, 2021 09:11
Collaborator
@mruberry mruberry left a comment

This looks like a good start. As discussed offline, supporting only n=1 is OK for now.

This PR does need to do two things, however:

  • more testing, comparing directly against np.diff, including noncontiguous inputs and inputs that use different dimensions of the input (see the sketch after this list)
  • documentation
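
For illustration, a minimal sketch of the kind of comparison being asked for (the helper name and inputs are assumptions, not the PR's actual test code):

import numpy as np
import torch

def _compare_with_numpy_diff(t, dim=-1):
    # check torch.diff against np.diff on the same data
    expected = np.diff(t.cpu().numpy(), axis=dim)
    actual = torch.diff(t, dim=dim)
    np.testing.assert_allclose(actual.cpu().numpy(), expected)

# contiguous input, default dim
_compare_with_numpy_diff(torch.randn(3, 4))

# noncontiguous input (a strided view), exercised along a non-default dim
base = torch.randn(3, 4, 2)
noncontig = base[..., 0]
assert not noncontig.is_contiguous()
_compare_with_numpy_diff(noncontig, dim=0)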

@soulitzer
Contributor Author

cc @bhosmer If I'm planning to support scalars in addition to Tensors for the append and prepend arguments, it seems like one way to do this is to have 4 variants in native_functions.yaml (or 8, if you include the out variants), but that seems a little clunky. Is there a better approach? Any guidance is appreciated, thanks!

@bhosmer
bhosmer commented Jan 21, 2021

[edit: ah you've updated since my last refresh, comment below may be out of date]

cc @bhosmer If I'm planning to support scalars in addition to Tensors for the append and prepend arguments, it seems like one way to do this is to have 4 variants in native_functions.yaml (or 8, if you include the out variants), but that seems a little clunky. Is there a better approach? Any guidance is appreciated, thanks!

Hi @soulitzer - unfortunately I think enumerating the scalar overloads is your only option presently - you may have already looked at these, but just in case, you can search for ".Scalar(" in native_functions.yaml to see examples of existing ops doing it this way.

As far as why there's no more compact alternative, I don't know the full history but I'd guess it's because the only alternative that would spare you the additional C++ entry points (i.e., that wouldn't just be sugar in native_functions.yaml) would be some kind of implicit promotion from Scalar to Tensor, which is not just fraught perf-wise but would also involve things like carefully picking a device for the tensor so that it agrees with other Tensor args, etc.

Hey though, here's a pitch for something else: not using hacky_wrapper :) [edit: the problem you mention in the PR description was fixed in #50767, so if that was the reason you used it, you're unblocked!]

Basically, without it, the kernel signatures can match the declared schema more faithfully. In the case of diff and diff_out that means

  • using optional<Tensor> to represent optional Tensors
  • putting the out argument at the end in diff_out

Here's a patch that does it (as of your last commit yesterday, anyway - changes are pretty straightforward though) https://gist.github.com/bhosmer/7b3ddfdaa19ccdd474b4a95a7cb44d3b

(But feel free not to use it if you'd rather stick with what you've got - we're planning to do a big port to get rid of hacky_wrapper later in the half, and this was a good opportunity to work through a concrete use case regardless.)

cc @smessmer re optional<Tensor> discussion

@soulitzer
Contributor Author
soulitzer commented Jan 21, 2021

@bhosmer it's good to see the Tensor? thing fixed! I've updated it to no longer use hacky_wrapper. I was thinking we could instead potentially add support for unions in codegen, which would then map to a new c10::union or something like std::variant. That would be super useful for use cases like this one, where we have multiple arguments in sequence that can each support multiple types.

@mruberry I think it's ready for another look when you have the chance. Let me know your thoughts on the quirk this introduces in the C++ API: is avoiding it worth adding 18 different overloads (edit: 18 in addition, not in total)?

@soulitzer soulitzer requested a review from mruberry January 21, 2021 03:13

Tensor diff_scalar_scalar(const Tensor& self, int64_t n, int64_t dim, c10::optional<Scalar> prepend, c10::optional<Scalar> append) {
  return diff_tensor_tensor(self, n, dim,
      prepend.has_value() ? c10::optional<Tensor>(diff_broadcast_scalar(prepend.value(), self, dim)) : c10::nullopt,
      append.has_value() ? c10::optional<Tensor>(diff_broadcast_scalar(append.value(), self, dim)) : c10::nullopt);
}

@smessmer it'd be nice to have a .map() polyfill for these, any plans? :)

@bhosmer
bhosmer commented Jan 21, 2021

@bhosmer it's good to see the Tensor? thing fixed! I've updated it to no longer use hacky_wrapper. I was thinking we could instead potentially add support for unions in codegen, which would then map to a new c10::union or something like std::variant. That would be super useful for use cases like this one, where we have multiple arguments in sequence that can each support multiple types.

Yeah, you're definitely hitting the most aggravatingly combinatorial weakness of the current setup with a signature like diff's. I think supporting unions is a considerably heavier lift than it might seem, though. You'd need to

  • add support for unions to the codegen pipeline - I'm not sure you've surveyed this territory yet, but codegen doesn't currently handle even the compound types it does support in anything like full generality (e.g. take a look at the current type parser and assume the rest of the downstream pipeline makes similar assumptions)
  • teach the codegen logic that toposorts overloads about unions - on the one hand, the main case it's dealing with now - arranging Tensor/Scalar overloads properly - would no longer need to be sorted, but OTOH unions in full generality will definitely produce signatures that don't have a canonical specific-to-general ordering at all - we'd need to recognize these and handle them somehow
  • add support for union types in schema parsing and overload resolution to both eager and JIT binding infrastructure (they're separate stacks - PythonArgParser on the eager side, SchemaTypeParser and matchSchema IIRC on the JIT side)
  • add union types to the JIT's type system - this has actually been talked about in the past for other reasons, but AFAIK the complexity cost to typechecking has been a dealbreaker [edit: @eellison mentions that unions are now WIP on the JIT side, though not that close to landing yet, sounds like. That would take care of this item, though I don't think it would include the previous one.]
  • teach the dispatcher about union types. There are a few tricky pieces to this, but the toughest will be that the multi-dispatch system would need to recognize and apply special behavior to union types where one of the members of the type is Tensor, to scan them for dispatch keys at call time. Here's the endpoint of that process, where the current set of key-bearing types are handled - it's a very hot path, and the complexity of supporting general unions here, with the required perf, honestly I think takes it off the table.

You could imagine avoiding some of the complexity overhead with some sort of custom ScalarOrTensor type; this would help particularly with the parsing/overload ordering. But downstream of that it wouldn't be much simpler or less invasive - and the perf overhead of the added dynamism would remain in the dispatcher, making it at least a nonstarter for hot ops - you'd have to do the current combinatorial thing.

Which leaves us with some sort of native_functions sugar that would spare us the multiple declarations, but just mechanically unfold them into C++. This seems like a slippery slope to me - I'd rather have the relationship between the quantity of declarations in native_functions and generated c++ artifacts be straightforward, otherwise it's easy to lose track of the tonnage of code being generated.

From my POV this argues for the status quo - or at least, against unions as a way out of it. But cc @smessmer @ezyang @ljk53 for differing POVs, other clever ideas, etc. 😁

@soulitzer
Contributor Author

Thanks @bhosmer for some very interesting perspective!

I didn't think of ScalarOrTensor, but that does seem like it would probably cover the vast majority of the use cases for unions. If writing the op implementation is what you mean by downstream, I actually do think something like ScalarOrTensor would simplify things here. In the case of diff, for example, we'd only have to write a single overload rather than 4, plus a helper function that does the conversion. And on the dispatcher side, wouldn't the level of dynamism introduced be similar to something like an optional, which we already have? Both of these are "binary" in a sense, and even if you have something like ScalarOrTensor?, that would be three cases instead of two. I'm probably glossing over many details here :P, but that seems to be the case from the surface.

@ezyang ezyang requested a review from peterbell10 January 22, 2021 18:52
@ezyang
Contributor
ezyang commented Jan 22, 2021

Adding @peterbell10 because we did some tricks with C++ argument defaulting to avoid ambiguity before, and we can do it again here, probably.

@codecov
codecov bot commented Jan 26, 2021

Codecov Report

Merging #50569 (ad276dc) into master (7a8c64d) will increase coverage by 0.00%.
The diff coverage is 92.50%.

@@           Coverage Diff           @@
##           master   #50569   +/-   ##
=======================================
  Coverage   80.84%   80.84%           
=======================================
  Files        1931     1931           
  Lines      210879   210919   +40     
=======================================
+ Hits       170492   170525   +33     
- Misses      40387    40394    +7     

@@ -1367,6 +1367,16 @@
- func: fill_diagonal_(Tensor(a!) self, Scalar fill_value, bool wrap=False) -> Tensor(a!)
variants: method

- func: diff(Tensor self, int n=1, int dim=-1, Tensor? prepend=None, Tensor? append=None) -> Tensor
Collaborator
@mruberry mruberry Jan 27, 2021

cc @rgommers for a possible issue with NumPy's signature

np.diff has both prepend and append default to "no value", which is represented by:

https://github.com/numpy/numpy/blob/1de46fe2feee9d3c500a83331ac9b75af5aef947/numpy/_globals.py#L57

The comment indicates it's intended to be used with "deprecated keywords." However, it appears these defaults come from the PR adding the kwargs: numpy/numpy#8206. @rgommers, seems like the NumPy signature would ideally have prepend=None and append=None?

Collaborator

That comment is a little misleading, or at least incomplete. _NoValue is mainly used when keywords are added to a function, and the function forwards its keywords somewhere. See for example np.sum which forwards to the .sum method of its input (which can be a torch.Tensor). Adding keywords to the numpy function that are not yet present in all other/downstream libraries implementing the same functionality would then break previously working code. I'll go fix the code comment.

Wherever you see _NoValue, just read None.

The PR gives this example that would not work with prepend=None for object arrays:

>>> data = np.array([2, 3], dtype=object)
>>> np.diff(data, prepend=None)
array([None, 1], dtype=object)  # expected
array([1], dtype=object)  # actual

A little far-fetched, but unfortunately this works:

>>> np.array(None)                                                          
array(None, dtype=object)

Object arrays are very annoying.

contig = _make_tensor(shape, dtype, device)
_test_diff_numpy(contig)

non_contig = torch.empty(shape + (2, 2), dtype=dtype)[..., 0]
Collaborator

Use make_tensor instead of _make_tensor:

def make_tensor(size, device: torch.device, dtype: torch.dtype, *,

Both here on 4152 and on 4149.

Contributor Author

I've replaced _make_tensor with make_tensor on line 4149, but I don't think we can use make_tensor to produce torch.empty for line 4152.

non_contig = torch.empty(shape + (2, 2), dtype=dtype)[..., 0]
non_contig = non_contig.select(-1, -1)
non_contig.copy_(contig)
self.assertTrue(not non_contig.is_contiguous() or shape == (1,))
Collaborator

Cool testing of both contiguous and discontiguous use cases.

op=torch.diff,
dtypes=all_types_and_complex_and(torch.bool, torch.float16, torch.bfloat16),
sample_inputs_func=sample_inputs_diff,
test_inplace_grad=False),
Collaborator

Not for this PR, but the test for inplace grads should be skipped automatically if the op doesn't have an inplace variant.

@@ -1917,6 +1949,9 @@ def __len__(self):
def ident(x):
return x

# Do NOT add to this list. Method tests are being DEPRECATED and replaced by OpInfos.
Collaborator

Thanks!

@mruberry
Collaborator

Hey @soulitzer!

This looks great. I made a few doc suggestions and a couple requests on test coverage.

@mruberry mruberry mentioned this pull request Jan 27, 2021
@soulitzer
Contributor Author

Hey @mruberry, I've addressed your comments and I think it's ready for another look!

@soulitzer soulitzer requested a review from mruberry January 28, 2021 23:57
Collaborator
@mruberry mruberry left a comment

Nice work, @soulitzer!

Contributor
@facebook-github-bot facebook-github-bot left a comment

@soulitzer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@soulitzer merged this pull request in b18eeaa.

@soulitzer soulitzer deleted the implement-np-diff branch April 14, 2021 20:40