8000 DOC: Update subclassing doc regarding downstream compatibility by gfyoung · Pull Request #7491 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

DOC: Update subclassing doc regarding downstream compatibility #7491

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 15, 2016
Merged

DOC: Update subclassing doc regarding downstream compatibility #7491

merged 1 commit into from
Aug 15, 2016

Conversation

gfyoung
Copy link
Contributor
@gfyoung gfyoung commented Mar 31, 2016

Yet another fun journey for me down compatibility lane for fromnumeric.py. Closes #7460. As was the case in #7325, this PR does not destroy any old functionality or compatibility, only improves it.

Updates documentation regarding numpy's responsibility to ensure downstream compatibility.

Closes #7460.

@gfyoung
Copy link
Contributor Author
gfyoung commented Apr 1, 2016

cc @jreback - this should address the breakage seen in #12687 in pandas from the numpy side.

@gfyoung
Copy link
Contributor Author
gfyoung commented Apr 1, 2016

@jreback: I tested the current state of this PR against my budding PR for pandas using the **kwargs change as you suggested, and the tests pass fine.

@gfyoung
Copy link
Contributor Author
gfyoung commented Apr 17, 2016

@charris : Can this be tagged as well for the upcoming 1.12.0 release as well (or even 1.11.1)? This PR patches up a bug I found in original PR #7325 and is important for compatibility reasons in the pandas library.

@seberg
Copy link
Member
seberg commented Apr 17, 2016

Isn't the original only in the master branch? Backporting of this seems a no-go to me in any case....
If this fixes a regression currently in master we should probably put at least that part into 1.12. What is the regression?

@seberg seberg added this to the 1.12.0 release milestone Apr 17, 2016
@gfyoung
Copy link
Contributor Author
gfyoung commented Apr 17, 2016

@seberg : you are correct: #7325 is not in 1.11.0 (or maintenance/1.11.x for that matter), so yes, this is fair to tag for 1.12.0.

@gfyoung
Copy link
Contributor Author
gfyoung commented May 7, 2016

Can somebody take a look at this? This PR is important for the pandas library and patches a bug in my original PR (#7325).

@@ -52,6 +52,25 @@ def _wrapit(obj, method, *args, **kwds):
return result


def _is_compat_error(e, method):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a docstring here explaining what the function does? Usual docstring format would be good. I know that some of the other functions are lacking in complete documentation, but they are old...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@charris
Copy link
Member
charris commented May 28, 2016

@gfyoung I'm going to be traveling for a few days, but now that 1.11.1 is on its way I'll be spending more time on getting 1.12 branched.

@gfyoung
Copy link
Contributor Author
gfyoung commented May 28, 2016

@charris : 👍 thanks for letting me know!

@gfyoung
Copy link
Contributor Author
gfyoung commented Jul 29, 2016

@charris : Any updates on this, in particular this comment here?

@charris
Copy link
Member
charris commented Jul 29, 2016

Where do functions in fromnumeric get called from the ndarray implementation?

# We perform this check because the ndarray implementation
# points to the implementation in this module. Thus, we would
# be going in a circle if we then called the ndarray method.

It appears to me that we have code organization problems if that is the case.

@gfyoung
Copy link
Contributor Author
gfyoung commented Jul 29, 2016

The ndarray implementation points to the C implementation, which is what is called in fromnumeric. Ctrl+F any function that uses this wrapper, remove the check, and then call the function. You should get a recursion depth exceeded error.

@charris
Copy link
Member
charris commented Jul 29, 2016

I believe there is a problem, but trace it for me, say for sum. AFAICT, the ndarray sum method calls _sum in _methods.py, which does not call into fromnumeric unless the ufunc is doing something funny, which might be the case. But if so we want to fix that mess first.

@gfyoung
Copy link
Contributor Author
gfyoung commented Jul 30, 2016

Hmmm...can't seem to reproduce now. I took out the check.

@gfyoung
Copy link
Contributor Author
gfyoung commented Aug 5, 2016

@charris : I removed the checks, and Travis + Appveyor are still passing. I also checked that this doesn't break anything on pandas. So ready to merge if there are no other concerns.

@charris
Copy link
Member
charris commented Aug 9, 2016

Why not something lik 8000 e

def _forwardcall(obj, func, method, *args, **kwds):
    # Call method if available.
    if hasattr(obj, method):
        return getattr(obj, method)(*args, **kwds)

    # Call function to handle it, including wrapping
    if func:
        return func(obj, *args, **kwds)

    return _wrapit(obj, method, *args, **kwds)

def _wrapfunc(obj, method, *args, **kwds):
    return _forwardcall(obj, None, method, *args, **kwds)


def _wrapfunc2(obj, func, method, *args, **kwds):
    return _forwardcall(obj, func, method, *args, **kwds)

Note that the function needs to be passed as _methods._sum, etc. I'm not clear that there should be any compatible errors, what am I missing?

EDIT: The function never gets called for ndarrays here, nor in your original. In the 1.11 version going through the method for ndarrays was avoided. I don't know what the timing difference is.

EDIT: I don't like the big string searches, they add a lot of call overhead.

@gfyoung
Copy link
Contributor Author
gfyoung commented Aug 10, 2016

@charris : The reason why I have all of the try-except is because of signature incompatibilities. For example (and this is one of many), numpy.ndarray and pandas.Series both have sum as a method, but their signatures are quite different. Your suggestion does not account for that and would cause a TypeError due to unexpected arguments.

Also, I don't quite follow your point about big string searches and call overhead. IINM in is quite fast and should dent performance. Also, the reason I do the check is because we want to filter out only certain types of AttributeError or TypeError.

@charris
Copy link
Member
charris commented Aug 10, 2016

Let me start by saying that I won't put this in as is and trying to be compatible with everything downstream is a losing proposition. That said, it might be worth making some limited exceptions for bits of pandas. Let's look at Series.sum

Series.sum(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

We can't support all of that, only the axis, which may not support tuples, and some kwargs that the pandas' documentation does not specify. AFAIK, keepdims and maybe out are the only numpy keywords that were causing problems and I think Allan's PR already fixed keepdims. I think if a simple call fails, let it die. The user should be using the Series method if they want the functionality. It may even be a bad idea to give them the idea that np.sum should work for Series. For subclasses of ndarray a little more flexibility on our side may be appropriate, especially if we add arguments, but Series.sum has diverged a long ways from ndarray.sum.

The simplest construct for the _wrapfunction2 functionality that I can think of looks something like

...
if type(a) is not mu.ndarray and hasattr(a, 'sum'):
    return a.sum(axis=axis, dtype=dtype, out=out, **kwargs)
return _methods._sum(a, axis=axis, dtype=dtype, out=out, **kwargs)

Which is only three lines and might should be in the function definition. Note that it calls direct into _methods._sum instead of going through the ndarray method and doesn't call another function to do stuff. It would be nice to know if making the direct call for ndarray is actually worth it timewise as the special case dirties up the code.

@gfyoung
Copy link
Contributor Author
gfyoung commented Aug 10, 2016

@charris : I don't think you are fully understanding the point of this PR IMHO. This is not about accommodating to pandas. While pandas is what spurred the PR, this is about more gracefully handling issues of downstream incompatibility. Notice that none of the changes I made are pandas specific. I tried to make the changes as generic as possible for a reason.

@gfyoung
Copy link
Contributor Author
gfyoung commented Aug 10, 2016

All of these functions continue to check if the object being passed in has a corresponding method (e.g. sum), and if so, tries to make the call while passing in all of the arguments. However, unlike before, if the call fails, instead of blowing up, we call the _methods method to handle the object as array-like.

@gfyoung
Copy link
Contributor Author
gfyoung commented Aug 10, 2016

@charris : What exactly is the issue that you have with my changes at the moment? You have been proposing all of these alternatives, and I fail to see what you are disagreeing with here.

except AttributeError:
# Because the putting is done in-place, we need to enforce
# that the object being passed in is an instance of ndarray
if not isinstance(a, mu.ndarray):
Copy link
Member
6D40

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there really no way to currently call np.put on a non-ndarray and get a correct results? If not, this is a breaking API change that we need to be extreme cautious about.

@shoyer
Copy link
Member
shoyer commented Aug 10, 2016

My understanding is that the reason why pandas accepts **kwargs in the signature for Series.sum is actually so that it can handle (by ignoring) the extra keyword arguments from numpy.sum. Xarray does the same thing. So there's already a pretty sane way to handle duck typing compatibility with numpy.sum (if such a thing is really desirable).

@gfyoung I appreciate your efforts but, like @charris, don't agree with the direction in this PR. Matching strings in error messages is slow and fragile, and the logic in _wrapfunction2 is too convoluted. The idiomatic approach is to refuse to guess in such circumstances and simply fail. Several of NumPy's most glaring issues stem from similarly heroic attempts to succeed in all possible circumstances (see, e.g., the way we handle arbitrary types in np.array or binary arithmetic).

I think the current approach used by these functions (calling the method if it exists, with keyword arguments as fixed by your PR #7325) is about as far as we can reasonably take duck typing. If the author of a duck array class wishes to be compatible with future signature changes to NumPy functions, they can include **kwargs (or, better yet, **unused_kwargs) in the signature of any such functions. If it isn't already obvious, then we should certainly document this option clearly.

@gfyoung
Copy link
Contributor Author
gfyoung commented Aug 10, 2016

@shoyer : Given the push-back, fair enough. Where should this be documented?

@shoyer
Copy link
Member
shoyer commented Aug 10, 2016

I don't see a really obvious best fit, but maybe either of these pages: http://docs.scipy.org/doc/numpy/reference/arrays.classes.html
http://docs.scipy.org/doc/numpy/user/basics.subclassing.html

The other obvious place would simply be in the docstring of methods such as numpy.sum.

@rgommers
Copy link
Member

+1 for documenting this in the subclassing doc, not in docstrings (not directly relevant for most np.sum users, and will lead to more duplication).

@gfyoung gfyoung changed the title BUG, MAINT: Further improvements to fromnumeric.py downstream library compatibility DOC: Update subclassing doc regarding downstream compatibility Aug 10, 2016
@gfyoung
Copy link
Contributor Author
gfyoung commented Aug 10, 2016

@charris , @shoyer , @rgommers : Changed PR to only update documentation.

@@ -555,6 +555,21 @@ def __array_wrap__(self, arr, context=None):
how this can work, have a look at the ``memmap`` class in
``numpy.core``.

Subclassing and Downstream Compatibility
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section should focus on concrete examples of how NumPy accommodates subclasses and duck-type arrays in its API. For example, the way that numpy functions like np.sum attempt to call object methods, and what signature your sum method needs to have for this to work. Without context, it's not clear what it means to be "compatible with numpy's method".

Copy link
Contributor Author
@gfyoung gfyoung Aug 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, done.

@shoyer
Copy link
Member
shoyer commented Aug 11, 2016

@gfyoung thanks for bearing with me, the doc page looks great to me now.

Anyone else have feedback?

@shoyer shoyer merged commit 2fe4162 into numpy:master Aug 15, 2016
@shoyer
Copy link
Member
shoyer commented Aug 15, 2016

Thanks @gfyoung

@gfyoung
Copy link
Contributor Author
gfyoung commented Aug 15, 2016

@shoyer, @charris : ty for all your comments!

@gfyoung gfyoung deleted the fromnumeric-compat-expand branch August 15, 2016 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0