DOC: Update subclassing doc regarding downstream compatibility #7491

gfyoung · 2016-03-31T10:19:04Z

Yet another fun journey for me down compatibility lane for fromnumeric.py. Closes #7460. As was the case in #7325, this PR does not destroy any old functionality or compatibility, only improves it.

Updates documentation regarding numpy's responsibility to ensure downstream compatibility.

Closes #7460.

gfyoung · 2016-04-01T02:00:46Z

cc @jreback - this should address the breakage seen in #12687 in pandas from the numpy side.

gfyoung · 2016-04-01T05:06:36Z

@jreback: I tested the current state of this PR against my budding PR for pandas using the **kwargs change as you suggested, and the tests pass fine.

gfyoung · 2016-04-17T02:01:38Z

@charris : Can this be tagged as well for the upcoming 1.12.0 release as well (or even 1.11.1)? This PR patches up a bug I found in original PR #7325 and is important for compatibility reasons in the pandas library.

seberg · 2016-04-17T10:27:19Z

Isn't the original only in the master branch? Backporting of this seems a no-go to me in any case....
If this fixes a regression currently in master we should probably put at least that part into 1.12. What is the regression?

gfyoung · 2016-04-17T11:16:06Z

@seberg : you are correct: #7325 is not in 1.11.0 (or maintenance/1.11.x for that matter), so yes, this is fair to tag for 1.12.0.

gfyoung · 2016-05-07T13:08:51Z

Can somebody take a look at this? This PR is important for the pandas library and patches a bug in my original PR (#7325).

charris · 2016-05-08T01:05:06Z

numpy/core/fromnumeric.py

@@ -52,6 +52,25 @@ def _wrapit(obj, method, *args, **kwds):
    return result


+def _is_compat_error(e, method):


Could you add a docstring here explaining what the function does? Usual docstring format would be good. I know that some of the other functions are lacking in complete documentation, but they are old...

charris · 2016-05-28T01:34:42Z

@gfyoung I'm going to be traveling for a few days, but now that 1.11.1 is on its way I'll be spending more time on getting 1.12 branched.

gfyoung · 2016-05-28T07:47:56Z

@charris : 👍 thanks for letting me know!

gfyoung · 2016-07-29T03:04:15Z

@charris : Any updates on this, in particular this comment here?

charris · 2016-07-29T18:22:12Z

Where do functions in fromnumeric get called from the ndarray implementation?

# We perform this check because the ndarray implementation
# points to the implementation in this module. Thus, we would
# be going in a circle if we then called the ndarray method.

It appears to me that we have code organization problems if that is the case.

gfyoung · 2016-07-29T18:25:41Z

The ndarray implementation points to the C implementation, which is what is called in fromnumeric. Ctrl+F any function that uses this wrapper, remove the check, and then call the function. You should get a recursion depth exceeded error.

charris · 2016-07-29T18:55:42Z

I believe there is a problem, but trace it for me, say for sum. AFAICT, the ndarray sum method calls _sum in _methods.py, which does not call into fromnumeric unless the ufunc is doing something funny, which might be the case. But if so we want to fix that mess first.

gfyoung · 2016-07-30T05:31:00Z

Hmmm...can't seem to reproduce now. I took out the check.

gfyoung · 2016-08-05T02:23:45Z

@charris : I removed the checks, and Travis + Appveyor are still passing. I also checked that this doesn't break anything on pandas. So ready to merge if there are no other concerns.

charris · 2016-08-09T20:20:35Z

Why not something lik 8000 e

def _forwardcall(obj, func, method, *args, **kwds):
    # Call method if available.
    if hasattr(obj, method):
        return getattr(obj, method)(*args, **kwds)

    # Call function to handle it, including wrapping
    if func:
        return func(obj, *args, **kwds)

    return _wrapit(obj, method, *args, **kwds)

def _wrapfunc(obj, method, *args, **kwds):
    return _forwardcall(obj, None, method, *args, **kwds)


def _wrapfunc2(obj, func, method, *args, **kwds):
    return _forwardcall(obj, func, method, *args, **kwds)

Note that the function needs to be passed as _methods._sum, etc. I'm not clear that there should be any compatible errors, what am I missing?

EDIT: The function never gets called for ndarrays here, nor in your original. In the 1.11 version going through the method for ndarrays was avoided. I don't know what the timing difference is.

EDIT: I don't like the big string searches, they add a lot of call overhead.

gfyoung · 2016-08-10T03:14:19Z

@charris : The reason why I have all of the try-except is because of signature incompatibilities. For example (and this is one of many), numpy.ndarray and pandas.Series both have sum as a method, but their signatures are quite different. Your suggestion does not account for that and would cause a TypeError due to unexpected arguments.

Also, I don't quite follow your point about big string searches and call overhead. IINM in is quite fast and should dent performance. Also, the reason I do the check is because we want to filter out only certain types of AttributeError or TypeError.

charris · 2016-08-10T04:36:56Z

Let me start by saying that I won't put this in as is and trying to be compatible with everything downstream is a losing proposition. That said, it might be worth making some limited exceptions for bits of pandas. Let's look at Series.sum

Series.sum(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

We can't support all of that, only the axis, which may not support tuples, and some kwargs that the pandas' documentation does not specify. AFAIK, keepdims and maybe out are the only numpy keywords that were causing problems and I think Allan's PR already fixed keepdims. I think if a simple call fails, let it die. The user should be using the Series method if they want the functionality. It may even be a bad idea to give them the idea that np.sum should work for Series. For subclasses of ndarray a little more flexibility on our side may be appropriate, especially if we add arguments, but Series.sum has diverged a long ways from ndarray.sum.

The simplest construct for the _wrapfunction2 functionality that I can think of looks something like

...
if type(a) is not mu.ndarray and hasattr(a, 'sum'):
    return a.sum(axis=axis, dtype=dtype, out=out, **kwargs)
return _methods._sum(a, axis=axis, dtype=dtype, out=out, **kwargs)

Which is only three lines and might should be in the function definition. Note that it calls direct into _methods._sum instead of going through the ndarray method and doesn't call another function to do stuff. It would be nice to know if making the direct call for ndarray is actually worth it timewise as the special case dirties up the code.

gfyoung · 2016-08-10T04:42:42Z

@charris : I don't think you are fully understanding the point of this PR IMHO. This is not about accommodating to pandas. While pandas is what spurred the PR, this is about more gracefully handling issues of downstream incompatibility. Notice that none of the changes I made are pandas specific. I tried to make the changes as generic as possible for a reason.

gfyoung · 2016-08-10T04:46:54Z

All of these functions continue to check if the object being passed in has a corresponding method (e.g. sum), and if so, tries to make the call while passing in all of the arguments. However, unlike before, if the call fails, instead of blowing up, we call the _methods method to handle the object as array-like.

gfyoung · 2016-08-10T04:47:31Z

@charris : What exactly is the issue that you have with my changes at the moment? You have been proposing all of these alternatives, and I fail to see what you are disagreeing with here.

shoyer · 2016-08-10T07:16:27Z

numpy/core/fromnumeric.py

-    except AttributeError:
+    # Because the putting is done in-place, we need to enforce
+    # that the object being passed in is an instance of ndarray
+    if not isinstance(a, mu.ndarray):


Is there really no way to currently call np.put on a non-ndarray and get a correct results? If not, this is a breaking API change that we need to be extreme cautious about.

shoyer · 2016-08-10T07:41:46Z

My understanding is that the reason why pandas accepts **kwargs in the signature for Series.sum is actually so that it can handle (by ignoring) the extra keyword arguments from numpy.sum. Xarray does the same thing. So there's already a pretty sane way to handle duck typing compatibility with numpy.sum (if such a thing is really desirable).

@gfyoung I appreciate your efforts but, like @charris, don't agree with the direction in this PR. Matching strings in error messages is slow and fragile, and the logic in _wrapfunction2 is too convoluted. The idiomatic approach is to refuse to guess in such circumstances and simply fail. Several of NumPy's most glaring issues stem from similarly heroic attempts to succeed in all possible circumstances (see, e.g., the way we handle arbitrary types in np.array or binary arithmetic).

I think the current approach used by these functions (calling the method if it exists, with keyword arguments as fixed by your PR #7325) is about as far as we can reasonably take duck typing. If the author of a duck array class wishes to be compatible with future signature changes to NumPy functions, they can include **kwargs (or, better yet, **unused_kwargs) in the signature of any such functions. If it isn't already obvious, then we should certainly document this option clearly.

gfyoung · 2016-08-10T07:55:05Z

@shoyer : Given the push-back, fair enough. Where should this be documented?

shoyer · 2016-08-10T08:04:34Z

I don't see a really obvious best fit, but maybe either of these pages: http://docs.scipy.org/doc/numpy/reference/arrays.classes.html
http://docs.scipy.org/doc/numpy/user/basics.subclassing.html

The other obvious place would simply be in the docstring of methods such as numpy.sum.

rgommers · 2016-08-10T09:26:07Z

+1 for documenting this in the subclassing doc, not in docstrings (not directly relevant for most np.sum users, and will lead to more duplication).

gfyoung · 2016-08-10T15:39:13Z

@charris , @shoyer , @rgommers : Changed PR to only update documentation.

shoyer · 2016-08-10T16:27:06Z

numpy/doc/subclassing.py

@@ -555,6 +555,21 @@ def __array_wrap__(self, arr, context=None):
 how this can work, have a look at the ``memmap`` class in
 ``numpy.core``.

+Subclassing and Downstream Compatibility


This section should focus on concrete examples of how NumPy accommodates subclasses and duck-type arrays in its API. For example, the way that numpy functions like np.sum attempt to call object methods, and what signature your sum method needs to have for this to work. Without context, it's not clear what it means to be "compatible with numpy's method".

Sure thing, done.

shoyer · 2016-08-11T06:45:27Z

@gfyoung thanks for bearing with me, the doc page looks great to me now.

Anyone else have feedback?

[ci skip]

shoyer · 2016-08-15T16:08:57Z

Thanks @gfyoung

gfyoung · 2016-08-15T17:06:56Z

@shoyer, @charris : ty for all your comments!

charris added 00 - Bug component: numpy._core 03 - Maintenance labels Mar 31, 2016

gfyoung mentioned this pull request Apr 1, 2016

COMPAT: breakage with just merged: https://github.com/numpy/numpy/pull/7325 pandas-dev/pandas#12687

Closed

gfyoung mentioned this pull request Apr 7, 2016

COMPAT: Expand compatibility with fromnumeric.py pandas-dev/pandas#12810

Closed

seberg added this to the 1.12.0 release milestone Apr 17, 2016

charris reviewed May 8, 2016
View reviewed changes

shoyer reviewed Aug 10, 2016
View reviewed changes

gfyoung changed the title ~~BUG, MAINT: Further improvements to fromnumeric.py downstream library compatibility~~ DOC: Update subclassing doc regarding downstream compatibility Aug 10, 2016

shoyer reviewed Aug 10, 2016
View reviewed changes

shoyer added 04 - Documentation and removed 00 - Bug labels Aug 11, 2016

DOC: Update subclassing doc regarding downstream compat

127ecd1

[ci skip]

shoyer merged commit 2fe4162 into numpy:master Aug 15, 2016

gfyoung deleted the fromnumeric-compat-expand branch August 15, 2016 17:07

		@@ -52,6 +52,25 @@ def _wrapit(obj, method, args, *kwds):
		return result


		def _is_compat_error(e, method):

Uh oh!

DOC: Update subclassing doc regarding downstream compatibility #7491

DOC: Update subclassing doc regarding downstream compatibility #7491

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!