8000 ENH: Add support for dict view objects. by hypercubestart · Pull Request #14539 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: Add support for dict view objects. #14539

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

hypercubestart
Copy link
Contributor

fixes #573

accepting dict views come from: seberg@9809763

>>> import numpy
>>> d = {0: 1, 2: 3}
>>> numpy.sum(d.values())
4

@seberg seberg self-requested a review September 18, 2019 01:31
PyObject *mod = PyImport_ImportModule("collections.abc");

if (mod != NULL) {
mappingview = PyObject_GetAttrString(mod, "MappingView");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to do this, we should use Collection

@eric-wieser
Copy link
Member
eric-wieser commented Sep 18, 2019

Some remarks:

  • MappingView seems far too specific to special-case. IMO we should be trying to remove special cases from ndarray conversion, not adding yet more.
  • Collection would be a reasonable abc candidate, but would include set which we currently exclude. Given the similarities between dict.keys() and a set, I think we need to treat these objects in the same way
  • Collections are not ordered, numpy arrays are. Perhaps we should force the user to think about this by raising an exception.

@seberg
Copy link
Member
seberg commented Sep 18, 2019

I think that was the reason why MappingView, because there is a PEP fairly recently, that I believe guarantees stable ordering. I agree that the better thing would be python registering with sequence (I think there was some reason why not, but cannot think of it right now). To be clear, I am not sure I like it much, but it is apparently pretty annoying for downstream such as matplotlib.

EDIT: Hmmm, I guess the closest PEP is likely the one for stable order of kwargs. I guess it is true that this is stable, but not sure it is spelled out quite clearly anywhere.

@eric-wieser
Copy link
Member

I think there was some reason why not, but cannot think of it right now

Because they don't implement __reversed__ or __getitem__.

I think I'd be in favor of deprecating np.array(collections.abc.Collection), which is the path sets and MappingViews currently take, which today returns a 0d object array. That would at least make the behavior less surprising.

@seberg
Copy link
Member
seberg commented Sep 18, 2019

I deprecating any "fallthrough" to object is probably a good plan (and then rudder back on specifics probably ;)), the reason I had looked it at all was the argument that it is at least ordered and that matplotlib says it is pretty painful for them.

@eric-wieser
Copy link
Member

Is there a matplotlib issue somewhere cataloging that pain? I ask because most py2->py3 conversion would replace d.values() with list(d.values()) anyway, which numpy would consume happily.

@seberg
Copy link
Member
seberg commented Sep 18, 2019

ping @story645. Might be pinging the wrong person, but do you want to chime in Hannah on why matpltolib would like the dictview inside arrays?

@story645
Copy link
Contributor

It's more that plotting dictionary keys and views is fairly common use case because that's how a lot of frequency data is stored & that list(dict.values()) is painful in tutorials & example docs.

matplotlib/matplotlib#6787 was merged to patch this (since modified a drop) but the discussion in those comments was that the fix should really go upstream 'cause it's working around Numpy's limitations.

@eric-wieser
Copy link
Member
eric-wieser commented Sep 19, 2019

It seems like the argument is roughly "Why should I have to call np.array(list(x)) instead of just np.array(x)"?

The problem is, MappingView is only one of many cases where applying list changes behavior:

s = lambda: {1, 2, 3}
d = lambda: dict(a=1, b=2)
it = lambda: (i for i in range(2))
to_test = (s, d, lambda: d().keys(), lambda: d().values(), lambda: d().items(), it)
for val in to_test:
    print("array(     ... ): {!r}".format(np.array(val())))
    print("array(list(...)): {!r}".format(np.array(list(val()))))
    print()
array(     ... ): array({1, 2, 3}, dtype=object)
array(list(...)): array([1, 2, 3])

array(     ... ): array({'b': 2, 'a': 1}, dtype=object)
array(list(...)): array(['b', 'a'], dtype='<U1')

array(     ... ): array(dict_keys(['b', 'a']), dtype=object)
array(list(...)): array(['b', 'a'], dtype='<U1')

array(     ... ): array(dict_values([2, 1]), dtype=object)
array(list(...)): array([2, 1])

array(     ... ): array(dict_items([('b', 2), ('a', 1)]), dtype=object)
array(list(...)): array([['b', '2'],
       ['a', '1']], dtype='<U1')

array(     ... ): array(<generator object <lambda>.<locals>.<genexpr> at 0x000001BC5E509728>,
      dtype=object)
array(list(...)): array([0, 1])

It seems strange to me to change the behavior of some of these without changing all of them. I'm not leaning particularly in one way or the other on whether making the user call list is the right design here, but what I do think is important is that:

  • We do things consistently for all of these cases
  • We do a deprecation cycle if we make any change at all here, since users could be relying on the behavior above

@hypercubestart
Copy link
Contributor Author

I am in favor of making changes because dict_keys and dict_values are fairly common use cases IMO. As such, I feel like np.asarray should be able to handle these data types directly without user conversion. I am fine with changing the behaviors of the other cases for consistency as noted by @eric-wieser , but as a sidenote I have rarely seen any other data type passed to numpy. Though, I still agree that a deprecation cycle would be necessary for the users that do use numpy as such.

@charris charris changed the title Add support for dict view objects and added tests ENH: Add support for dict view objects. Sep 22, 2019
@hypercubestart
Copy link
Contributor Author

@eric-wieser Would it be a good idea to just add a list() wrapper to the input? That would guarantee consistency and avoid any weird cases

@mattip
Copy link
Member
mattip commented May 20, 2020

In a recent community call we talked about this and decided not to support this use case. If Python ends up accepting a change to make dict.values() a sequence, we would automatically support it, but we do not want to do that change on our own.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Python 3 dictionary view objects not supported (Trac #2113)
6 participants
0