8000 dtype richcompare doesn't respect protocol · Issue #5345 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

dtype richcompare doesn't respect protocol #5345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pitrou opened this issue Dec 3, 2014 · 6 comments
Closed

dtype richcompare doesn't respect protocol #5345

pitrou opened this issue Dec 3, 2014 · 6 comments

Comments

@pitrou
Copy link
Member
pitrou commented Dec 3, 2014

When a type's __eq__ (or tp_richcompare in C) doesn't understand the other type, it should return NotImplemented so that the other type's __eq__ has a chance to be called. Unfortunately, NumPy's ArrayDesrc doesn't respect that convention and raises TypeError instead, making comparisons fragile:

>>> import numpy as np
>>> class A(object):
...     def __eq__(self, other):
...         return True
... 
>>> dt = np.dtype('i8')
>>> A() == dt
True
>>> dt == A()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: data type not understood

It is generally considered bad practice to raise TypeError in a comparison method, and can prevent advanced uses: such as defining a custom type comparable with ArrayDescr, or simply mixing ArrayDescr with other types in e.g. a dictionary.

Indeed let's say I use the standard lru_cache, then hash collisions can produce unwanted errors because of the dictionary used internally comparing the keys:

>>> @functools.lru_cache()
... def f(x): return str(x)
... 
>>> 
>>> class A(object):
...   def __hash__(self):   
...      return hash(dt)
... 
>>> f(dt)
'int64'
>>> f(A())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antoine/34/lib/python3.4/functools.py", line 446, in wrapper
    link = cache_get(key)
TypeError: data type not understood
@njsmith
Copy link
Member
njsmith commented Dec 3, 2014

Ugh. dtype hashing is... kinda a mess.

Dtypes are hashable objects that are nonetheless mutable (though
fortunately rarely mutated), and whose == believes that it knows
perfectly well how to handle all object types. It handles them like:

if isinstance(other, np.dtype):
# do a real comparison
else:
return self == np.dtype(other)

np.dtype coercion itself is happy to interpret a startling variety of
objects, including strings, types, tuples, lists, dicts, and None.
(For fun, try: np.dtype("f8") == None.) It can also fail in a variety
of ways, and "Errors should never pass silently". Unfortunately this
coercion is very widely used in real code (things like 'if arr.dtype
== "float32": ...'; if someone writes "flaot32" by accident then
letting this pass silently would be pretty bad indeed).

And, of course, dtype.hash makes no attempt to be consistent with
the hash functions for these other types. Which means that if you put
objects of any of the above types into the same dict as a dtype,
whether they end up counting as the same object will depend on your
hash seed and bucket count.

In theory we could try to clean up bits of this (like, maybe suppress
the error in some cases and return NotImplemented in others?), but
that would still leave a lot of mess behind. The best idea I'm
thinking of right now is to just say, if you want to put dtype objects
in a dict, then only put dtype objects in that dict. Or find some
other way to ensure that they will not be compared to non-dtype
objects (e.g., replace each object with a tuple (True, dt) or (False,
non_dt)).

Any better ideas?

On Wed, Dec 3, 2014 at 2:58 PM, Antoine Pitrou notifications@github.com wrote:

When a type's eq (or tp_richcompare in C) doesn't understand the other type, it should return NotImplemented so that the other type's eq has a chance to be called. Unfortunately, NumPy's ArrayDesrc doesn't respect that convention and raises TypeError instead, making comparisons fragile:

import numpy as np
class A(object):
... def eq(self, other):
... return True
...
dt = np.dtype('i8')
A() == dt
True
dt == A()
Traceback (most recent call last):
File "", line 1, in
TypeError: data type not understood

It is generally considered bad practice to raise TypeError in a comparison method, and can prevent advanced uses: such as defining a custom type comparable with ArrayDescr, or simply mixing ArrayDescr with other types in e.g. a dictionary.

Indeed let's say I use the standard lru_cache, then hash collisions can produce unwanted errors because of the dictionary used internally comparing the keys:

@functools.lru_cache()
... def f(x): return str(x)
...

class A(object):
... def hash(self):
... return hash(dt)
...
f(dt)
'int64'
f(A())
Traceback (most recent call last):
File "", line 1, in
File "/home/antoine/34/lib/python3.4/functools.py", line 446, in wrapper
link = cache_get(key)
TypeError: data type not understood


Reply to this email directly or view it on GitHub.

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

@pitrou
Copy link
8000 Member Author
pitrou commented Dec 4, 2014

if someone writes "flaot32" by accident then letting this pass silently would be pretty bad indeed

Would it? It's not any worse then doing it when the object being compared is a plain string.
If people worry aboyt typos they probably should write np.float32 instead of "float32" :-)

@pitrou
Copy link
Member Author
pitrou commented Dec 4, 2014

As for heterogenous dicts, that's why I brought the lru_cache() example. There are situations where it can make sense to have different key types in a dict.

@njsmith
Copy link
Member
njsmith commented Dec 4, 2014

Yeah, in an ideal world dtype.eq probably wouldn't try to coerce
strings (or anything else) in the first place, but that's not how numpy's
original designers did things. So == "float32" is a common idiom, and
you've got to support the users you have, not the users you want...

I definitely see the use case for heterogenous dicts. I just don't see a
way to make it work without massively breaking the dtype API -- AFAICT it
won't work reliably unless we get rid of not just the TypeErrors on failed
coercions, but the coercions altogether. That's why I was asking if you had
a better idea :-).
On 4 Dec 2014 00:56, "Antoine Pitrou" notifications@github.com wrote:

As for heterogenous dicts, that's why I brought the lru_cache() example.
There are situations where it can make sense to have different key types in
a dict.


Reply to this email directly or view it on GitHub
#5345 (comment).

@stefansjs
Copy link

Ouch, I just found this out by trying to put a dtype into a type that wants to be hashable. Thanks for the clear writeup! I understand why equality is violates the python API, but couldn't hash be fixed?

I'm imagining

  • if you compare equivalent to a builtin type, return the hash for that type instead of yourself.
  • if you are some kind of object/array type hash your string description, plus some alignment info

I can definitely see an argument for hashing dtypes differently from builtin types even though the equality function would return true in that case (which would help me immensely for my use case). In this case, maybe you could return the hash of immutable parameters, like byte order, alignment, size, and builtin type used to construct you. Is this reasonable?

Further, as a workaround, for my custom type, is it sufficient to do something like this?

def hash_dtype(np_dtype):
    """ Provides a stable hash for dtypes by hashing some (but not all) properties """
    hash_properties = (np_dtype.type, np_dtype.byteorder, np_dtype.alignment, np_dtype.str)
    return hash(hash_properties)

@seberg
Copy link
Member
seberg commented Feb 24, 2021

Closing this as a duplicate of gh-7242 (both issues have quite a lot of discussion).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
0