8000 BUG: Fix unicode(unicode_array_0d) on python 2.7 by eric-wieser · Pull Request #9201 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: Fix unicode(unicode_array_0d) on python 2.7 #9201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 13, 2017

Conversation

eric-wieser
Copy link
Member
@eric-wieser eric-wieser commented Jun 1, 2017

Related to #9139, first two commits stand alone in #9202

Now a commit from #9332 a single commit on top of #9784, standalone.

This is like #9332, but:

  • Can't be affected by np.set_string_function (is this a good thing?)
  • Also fixes unicode such that they convert to python unicode scalars

Results on python 2.7:

>>> str(np.array('test'))
'test'
>>> unicode(np.array(u'café'))
u'café'

Copy link
Contributor
@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, though I think there are two mistakes, and have a suggestion on how better to deal with MaskedArray.

10000
@@ -198,3 +206,35 @@ array_str(PyArrayObject *self)
}
return s;
}

#ifdef NPY_PY3K
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be ifndef, no?

@@ -10,4 +10,9 @@ array_repr(PyArrayObject *self);
NPY_NO_EXPORT PyObject *
array_str(PyArrayObject *self);

#ifdef NPY_PY3K
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed that in the rebase, I guess

numpy/ma/core.py Outdated
@@ -3885,15 +3885,15 @@ def compress(self, condition, axis=None, out=None):
_new._mask = _mask.compress(condition, axis=axis)
return _new

def __str__(self):
def __str__(self, str_type=str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe help our future selves by adding an explicit note that str_type can be removed once we drop support of python 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I'm also slightly wary of using different signatures for dunder methods. Before you know it, subclasses rely on it... Might it be an idea to make a new function _prepare_str(self) (better name surely possible...) which returns res without str applied and then use that in both __str__ and __unicode__? Or possibly just the masked print option enabled part as _fill_with_mask_for_str?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@eric-wieser
Copy link
Member Author

This recurses horribly, because some of the scalar __str__s call np.array(scalar).__str__()...

@mhvk
Copy link
Contributor
mhvk commented Jun 1, 2017

How about splitting off the implementation of __unicode__ from the 0-d scalar case so that this PR remains simple?

@ahaldane
Copy link
Member
ahaldane commented Jun 1, 2017

I haven't read the code yet, but won't the recursion problems be fixed after #9139?

@eric-wieser
Copy link
Member Author

Nope, because #9139 still recurses in void_str

@eric-wieser
Copy link
Member Author
eric-wieser commented Jun 1, 2017

How about splitting off the implementation of unicode from the 0-d scalar case so that this PR remains simple?

The sole purpose of implementing __unicode__ at all is to deal with the scalar case. The other cases fall back to calling repr on the individual items anyway, and just work

The problem is invoking style(arr) when style==str

@eric-wieser eric-wieser changed the base branch from master to maintenance/1.13.x June 1, 2017 19:22
@eric-wieser eric-wieser changed the base branch from maintenance/1.13.x to master June 1, 2017 19:22
@eric-wieser
Copy link
Member Author

(that was to trick the first two commits into disappearing)

@eric-wieser
Copy link
Member Author

I've stolen a commit from #9332, and rebased this on top of it.

numpy/ma/core.py Outdated
String representation.

"""
def __insert_masked_print(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mhvk: #9768 means this doesn't even need an extra argument any more

@ahaldane
Copy link
Member

I'm a little on the fence whether str(0d) shouldn't respect np.set_string_function. 0d arrays are ndarrays, after all, so according to the set_string_function doc they should be affected.

Also, while I did remove the style argument in #8983, that was necessary to fix some problems there at the expense of back-compatibility. In #9332 I add it back in as those problems are avoided, so now we can maintain the style arg to be back-compatible.

On the other hand, removing the style argument and special-casing the str of 0d arrays is nice in two ways:

  • the style argument was ugly
  • makes 0d behavior more like numpy scalars, which could be desirable. But this is a bit questionable too, since the repr of a 0d array is different from a scalar.

But are these two benefits enough to justify the compatibility break of disabling the style argument?

@mhvk
Copy link
Contributor
mhvk commented Sep 27, 2017

In my future perfect world, scalars would not exist at all, one would just have array scalars, and those behave like other arrays, so I'd prefer not to go in a direction where array scalars are less like arrays, even if it has the benefit of making them more similar to scalars. I'd could see going the other way, though, removing style and typesetting scalar[...]

@eric-wieser
Copy link
Member Author
eric-wieser commented Sep 27, 2017

My argument for why str and unicode should behave as if they were raw scalars is that I'm categorizing str and unicode as type-coercion functions (like int, float, bool etc,) rather than with repr. In all those other functions, we let the 0d array decay to a scalar first.

I'll fix the warning this evening.

@mhvk
Copy link
Contributor
mhvk commented Sep 27, 2017

@eric-wieser - I don't think that you should compare str with float, etc., at least not in the context where it executes __str__ Indeed, the python documentation states [1] that the point is "to compute the “informal” or nicely printable string representation of an object."

[1] https://docs.python.org/3/reference/datamodel.html#object.__str__

@eric-wieser eric-wieser force-pushed the ndarray-unicode branch 2 times, most recently from 4450637 to a8c563a Compare September 28, 2017 15:59
@eric-wieser
Copy link
Member Author

Another argument for not letting np.set_string_function interfere is that if we did, then we'd want to allow it to do so for __unicode__ too for consistency, which would be a python2-only API - something that seems a little silly to add with dropping 2.7 support on the not-so-distance horizon

@eric-wieser
Copy link
Member Author

Rebased on top of #9913, which makes the diff a little smaller

@charris charris added this to the 1.14.0 release milestone Nov 12, 2017
…ing_function

It's more important that scalars and 0d arrays are consistent here.

Previously, unicode(arr0d) would crash on 2.7
@ahaldane
Copy link
Member

LGTM.

I understand it is a little strange that after this PR, in python2 0d unicode is hardcoded but not the 0d str. But I think it's OK to special-case the 0d unicode for various pragmatic reasons - we are already talking about python2 end of life, and unicode of 0ds hasn't been functional before now, so we should feel free to choose fixed behavior.

I'd like to merge. Good to go? (checking since you pushed without commenting)

@eric-wieser
Copy link
Member Author
eric-wieser commented Nov 13, 2017

Yep, good to go.

it's OK to special-case the 0d unicode for various pragmatic reasons

I agree - this is better than the old behaviour, and sending __unicode__ through array_str probably opens a can of worms we don't care about solving before EOL for python 2.

it's also consistent with __format__, which is currently hard-coded until we decide whether it should be customizable.

@ahaldane
Copy link
Member

All right, merging. Thanks Eric!

@ahaldane ahaldane merged commit 565e8ca into numpy:master Nov 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0