PyPy3 TypeError: 'numpy.float64' objects are unhashable · Issue #8887 · numpy/numpy · GitHub

Closed
peterjc opened this issue Apr 3, 2017 · 14 comments

Comments

@peterjc
Contributor
peterjc commented Apr 3, 2017

Cross reference PyPy issue: https://bitbucket.org/pypy/pypy/issues/2479/numpyfloat64-objects-are-unhashable

I'm logging this here for reference, and on the off chance that someone on the NumPy team could say why numpy.float64 might behave so differently from the other float sizes under PyPy3.

Expected behaviour, showing Python 3.5.0 in 64-bit Linux:

$ python3
Python 3.5.0 (default, Sep 28 2015, 11:25:31) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> print(np.__version__)
1.12.1
>>> f = 123.456
>>> for t in [np.float, np.float16, np.float32, np.float64, np.float128]:
...     new = t(f)
...     print(hash(new), hash(new)==hash(f), t.__name__)
... 
1051464412201451643 True float
1008806316530991227 False float16
1051467367688700027 False float32
1051464412201451643 True float64
1051464412201451643 True float128
>>> quit()
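The False results for float16 and float32 in both transcripts are expected and not part of the bug: Python only promises equal hashes for numerically equal values, and rounding 123.456 to half or single precision produces a different number. A minimal stdlib-only sketch makes the point without NumPy, assuming that `struct`'s 'e' and 'f' codes round the same way as NumPy's float16/float32 conversions; `round_trip` is a helper made up for this illustration:

```python
import struct
from fractions import Fraction

def round_trip(value: float, fmt: str) -> float:
    """Pack value into an IEEE 754 format and unpack it again (hypothetical helper).

    struct codes: 'e' = half precision (float16), 'f' = single (float32),
    'd' = double (float64, the same format as Python's float).
    """
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

f = 123.456
for name, fmt in [("float16", "e"), ("float32", "f"), ("float64", "d")]:
    narrowed = round_trip(f, fmt)
    # Equal values must hash equally; rounding to 'e' or 'f' changes the value.
    print(name, narrowed, narrowed == f, hash(narrowed) == hash(f))

# Python's numeric hash depends only on the exact value, so other numeric
# types holding the same value hash identically.
assert hash(f) == hash(Fraction(f))
```

Only the float64 row compares equal and hashes equal, which mirrors the True results for float, float64 and float128 above.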

Observed behaviour using PyPy3.5 v5.7 beta from https://bitbucket.org/squeaky/portable-pypy/downloads/pypy3.5-5.7-beta-linux_x86_64-portable.tar.bz2

$ ~/Downloads/pypy3.5-5.7-beta-linux_x86_64-portable/bin/pypy
Python 3.5.3 (b16a4363e930f6401bceb499b9520955504c6cb0, Mar 21 2017, 12:36:24)
[PyPy 5.7.0-beta0 with GCC 6.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``the world doesn't want us to
know''
>>>> import numpy as np
>>>> print(np.__version__)
1.12.1
>>>> f = 123.456
>>>> for t in [np.float, np.float16, np.float32, np.float128, np.float64]:
....     new = t(f)
....     print(hash(new), hash(new)==hash(f), t.__name__)
....     
1051464412201451643 True float
1008806316530991227 False float16
1051467367688700027 False float32
1051464412201451643 True float128
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
TypeError: 'numpy.float64' objects are unhashable
>>>> quit()

The other float types seem to hash fine, but numpy.float64 does not under PyPy3.5 v5.7 beta.

@eric-wieser
Member

I think the difference is:

>>> print(np.float16.__hash__)
<slot wrapper '__hash__' of 'numpy.float16' objects>
>>> print(np.float32.__hash__)
<slot wrapper '__hash__' of 'numpy.float32' objects>
>>> print(np.float64.__hash__)
None

Because np.float64.__bases__ contains float, this defers to float.__hash__.

Could this be a bug to do with hashing and multiple inheritance in PyPy?
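One way multiple inheritance can matter is MRO order: attribute lookup for `__hash__` stops at the first class in the MRO whose own namespace defines it. The stdlib-only sketch below uses hypothetical classes (`Generic` stands in for a non-float scalar base that disables hashing) and a hypothetical helper `hash_owner`. It shows CPython's rules for classes defined in Python, which is not the same code path NumPy's C-level types go through:

```python
def hash_owner(cls: type) -> type:
    """Return the first class in cls's MRO whose own namespace defines __hash__."""
    for klass in cls.__mro__:
        if "__hash__" in vars(klass):
            return klass
    # Not reached for ordinary classes: object itself defines __hash__.
    raise LookupError(f"no __hash__ found for {cls!r}")

class Generic:
    """Hypothetical non-float base that marks its instances unhashable."""
    __hash__ = None

class FloatFirst(float, Generic):
    """float comes first in the MRO, so float's __hash__ wins."""

class GenericFirst(Generic, float):
    """Generic comes first in the MRO, so its __hash__ = None wins."""

print(hash_owner(FloatFirst).__name__, hash(FloatFirst(1.5)) == hash(1.5))
print(hash_owner(GenericFirst).__name__)
try:
    hash(GenericFirst(1.5))
except TypeError as exc:
    print("unhashable:", exc)
```

Swapping the order of the bases flips the result, even though both classes inherit from float.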

@njsmith
Member
njsmith commented Apr 3, 2017

This is almost certainly related somehow to the fact that float64 is a subclass of float (the built-in type), and the other floatXX classes aren't.
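For reference, this is what subclassing the built-in float normally gives you, shown with a hypothetical pure-Python subclass `PlainFloat` that never mentions `__hash__`: nothing is stored in the subclass's own namespace, lookup falls through to float's slot wrapper, and the hash matches the equal built-in float. It is a minimal sketch of ordinary inheritance, not of how NumPy builds float64:

```python
class PlainFloat(float):
    """Hypothetical float subclass that does not define or override __hash__."""

# No __hash__ entry in the subclass's own namespace...
print("__hash__" in vars(PlainFloat))
# ...so lookup falls through the MRO to the built-in float slot wrapper...
print(PlainFloat.__hash__ is float.__hash__)
# ...and hashing agrees with the equal built-in float.
print(hash(PlainFloat(123.456)) == hash(123.456))
```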

@eric-wieser
Member
eric-wieser commented Apr 3, 2017

I'm super confused here though - why is it legal to call hash(float64)? Test case:

>>> class MyFloat(float):
	__hash__ = None

	
>>> mf = MyFloat(1)
>>> nf = np.float64(1)
>>> mf.__hash__
>>> hash(mf)
Traceback (most recent call last):
  File "<pyshell#31>", line 1, in <module>
    hash(mf)
TypeError: unhashable type: 'MyFloat'
>>> nf.__hash__
>>> hash(nf)
1

I thought that __hash__ == None always made the type unhashable? How does numpy pull off having __hash__ == None and still being hashable?

@rlamy
Contributor
rlamy commented Apr 3, 2017

I tried debugging this in PyPy, but got bogged down trying to understand how exactly numpy initialises the types.

The situation on CPython is quite dodgy, since float64 has both a tp_hash and a descriptor for __hash__ that returns None - which causes the above behaviour: hash(nf) looks up the slot tp_hash, not the descriptor.
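One practical consequence: code that decides hashability by inspecting the class attribute (`type(x).__hash__ is None`, or `isinstance(x, collections.abc.Hashable)`, which also looks at `__hash__` on the class) can disagree with what `hash(x)` really does. By that logic, on CPython with the affected NumPy such a check would call float64 unhashable even though `hash(nf)` works. A sketch of a more robust check, assuming that simply attempting the hash is acceptable; `is_hashable` is a hypothetical helper, not necessarily what downstream projects adopted. The stdlib demo uses a different mismatch (a tuple containing a list) because the float64 case needs the old NumPy build:

```python
from collections.abc import Hashable

def is_hashable(obj) -> bool:
    """Return True if hash(obj) succeeds (hypothetical helper).

    Actually hashing is what dict and set do, so this cannot disagree with
    them; checks on the class attribute can.
    """
    try:
        hash(obj)
    except TypeError:  # what Python raises for unhashable objects
        return False
    return True

mixed = (1, [2, 3])  # tuple defines __hash__, but this instance holds a list

# Attribute-based checks claim "hashable"...
print(type(mixed).__hash__ is not None, isinstance(mixed, Hashable))
# ...but the real hash attempt fails.
print(is_hashable(mixed), is_hashable(123.456), is_hashable([1, 2]))
```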

@njsmith
Member
njsmith commented Apr 3, 2017

@eric-wieser: I'm not sure what's going on with float64.__hash__ returning None, but special methods are internally way more complicated than that. Most methods are natively entries in a Python dict; special methods are natively slots in a C struct, and there's a whole elaborate layer of code to map back and forth between the Python and C representations and try to make them look the same. Part of this is that when defining a class from Python, the type object constructor has a special case where it checks for __hash__ == None and then initializes the tp_hash slot in a special way. But float64 is defined in C, so it doesn't go through that code path at all. And then of course PyPy adds a whole additional layer of complexity...

@peterjc: you should also check your intXX objects -- I bet either int32 or int64 has the same bug (depending on how large your platform's long is).
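The class-definition special case described above is visible from pure Python on CPython. In this sketch (hypothetical classes `NoHashFloat` and `EqOnly`), `type()` turns an explicit `__hash__ = None` entry into an unhashable type, and a class that defines `__eq__` without `__hash__` gets `__hash__ = None` written into its namespace automatically, which is the conventional "not hashable" marker. Neither path applies to a C-defined type like float64:

```python
# Build a float subclass with type() and an explicit __hash__ = None entry.
NoHashFloat = type("NoHashFloat", (float,), {"__hash__": None})

# Defining __eq__ without __hash__ makes the class machinery insert __hash__ = None.
class EqOnly:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.value == other.value

# Both namespaces now hold None, and both types refuse to hash.
print(vars(NoHashFloat)["__hash__"], vars(EqOnly)["__hash__"])
for obj in (NoHashFloat(1.0), EqOnly(1)):
    try:
        hash(obj)
    except TypeError as exc:
        print(type(obj).__name__, "->", exc)
```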

@rlamy
Contributor
rlamy commented Apr 3, 2017

@njsmith Nope, the issue exists only with np.float64 and np.complex128 (on a 64-bit Linux).

@njsmith
Member
njsmith commented Apr 3, 2017

descriptor for __hash__ that returns None

wut

@rlamy
Contributor
rlamy commented Apr 3, 2017

@njsmith Sorry, it's not a descriptor, it's just a None value in the class dict (which is the conventional way of indicating "not hashable").

@eric-wieser
Member

Found the bug, patch incoming...

@rlamy
Contributor
rlamy commented Apr 3, 2017

The culprit is the DUAL_INHERIT macro in multiarraymodule.c::setup_scalartypes(). ATM, it calls PyType_Ready, and then sets tp_hash. But since tp_hash is NULL when PyType_Ready is called, it sets __dict__['__hash__'] = None and tp_hash to PyObject_HashNotImplemented, which is later overwritten.

It seems that the fix is just to set tp_hash first.
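The ordering problem can be modelled in a few lines of Python. This is a toy simulation, not NumPy's or CPython's C code: `ToyType`, `type_ready`, `float_hash` and `hash_not_implemented` are hypothetical stand-ins. `type_ready` mimics only the step of `PyType_Ready` described above: if `tp_hash` is still unset and the dict has no `__hash__`, it records `__hash__ = None` and installs a "not implemented" hash. The real function does much more (for example, publishing slot wrappers into the dict); the toy leaves that out:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

def hash_not_implemented(obj):
    """Stand-in for CPython's PyObject_HashNotImplemented."""
    raise TypeError(f"unhashable type: {type(obj).__name__!r}")

def float_hash(x):
    """Stand-in for the built-in float's tp_hash."""
    return hash(float(x))

@dataclass
class ToyType:
    """Hypothetical model of a C type object: one slot plus a class dict."""
    name: str
    tp_hash: Optional[Callable] = None
    tp_dict: dict = field(default_factory=dict)

def type_ready(t: ToyType) -> None:
    """Mimic the part of PyType_Ready that handles a still-unset tp_hash."""
    if t.tp_hash is None and "__hash__" not in t.tp_dict:
        t.tp_dict["__hash__"] = None
        t.tp_hash = hash_not_implemented

# Old order: ready the type first, then overwrite the slot with float's hash.
old = ToyType("float64_old")
type_ready(old)
old.tp_hash = float_hash

# Fixed order: set the slot first, then ready the type.
new = ToyType("float64_new")
new.tp_hash = float_hash
type_ready(new)

# Old order: the slot hashes fine, but the dict still advertises None.
print(old.tp_hash(1.5) == hash(1.5), old.tp_dict.get("__hash__", "missing"))
# Fixed order: no stale None is left behind.
print(new.tp_hash(1.5) == hash(1.5), new.tp_dict.get("__hash__", "missing"))
```

The old order reproduces the CPython symptom from the thread (hashing works, `__hash__` reads as None); under PyPy, which trusts the None in the dict, the same state shows up as "unhashable".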

@eric-wieser
Member

@rlamy: Indeed, that's the patch I have

eric-wieser added a commit to eric-wieser/numpy that referenced this issue Apr 3, 2017
This would previously cause hashing to work even though `__hash__` is None.

Fixes numpy#8887
@eric-wieser
Member

@peterjc: Does this fix PyPy?

@rlamy
Contributor
rlamy commented Apr 3, 2017

@eric-wieser Yes, it fixes it.

@peterjc
Contributor Author
peterjc commented Apr 3, 2017

Excellent work everyone - thank you. I'll look forward to this working in the next NumPy release :)

peterjc added a commit to biopython/biopython that referenced this issue Apr 4, 2017
This closes GitHub issue #1112 where under PyPy3.5 v5.7 beta
a bug in NumPy 1.12.1 makes numpy.float64 unhashable. See
numpy/numpy#8887
MarkusPiotrowski pushed a commit to MarkusPiotrowski/biopython that referenced this issue Oct 31, 2017
This closes GitHub issue biopython#1112 where under PyPy3.5 v5.7 beta
a bug in NumPy 1.12.1 makes numpy.float64 unhashable. See
numpy/numpy#8887
dlsun added a commit to dlsun/symbulate that referenced this issue Sep 5, 2018
There is a bug in Numpy 1.12.1 that causes the previous approach to break: numpy/numpy#8887
dlsun added a commit to dlsun/symbulate that referenced this issue Oct 8, 2018
* beginning overhaul of random processes

* fixing random process behavior

* fixing bug with infinite samples from BoxModel

* removing unused imports

* fixing bug with cumsum

* TODO for result.py

* adding plotting for Vectors and other result objects

* fixing problems with RandomProcess

* fixing issues with joining and hashing of Vectors

* move RandomProcess functionality to RV, fix plotting

* replacing references to RandomProcess with RV, fixing bugs with Vector

* stripping RandomProcess of most methods

* fixing bugs with hashing and comparison of Vectors

* fixing a few bugs with plotting

* fixed speed issue with RandomProcess

* Fix error with scalar types not being recognized

* fixing problems with composition of RV and Result objects

* switching order of drift and scale parameters

* allow random processes to be combined with Vectors

* changing default index set of RandomProcess to Naturals()

* Change is_hashable()

There is a bug in Numpy 1.12.1 that causes the previous approach to break: numpy/numpy#8887

* change order of arguments for Brownian motion

* fix bug with standard math functions

* change how Vectors are stored and joined

* distinguish between tuple and Tuple

* replace isinstance(x, int) by the more robust isinstance(x, numbers.Integral)

* check for numpy float types

* allowing joint distributions to be defined with scalars

* adding concat function

* allow Vectors to be added to tuples and lists

* fixing how arrays are set in Vector

* vectorize Vector operations

* Vectors can now store arbitrary objects

* fixing bug with MultivariateNormal

* fixing typo in commit 20728b

* fixing bug with RVResults that was preventing random processes from being plotted

* creating Tuple and Vector classes

* fixing bug with Tuple

* changing behavior of Tuple and Vector append

* fixing bug with corr, random processes, and sorting for .tabulate()

* fixing DeckOfCards

* making all things Vectors, Tuples only returned by *

* making Tuples compatible with generators

* using generators throughout

* removing vectorized operations for now
dlsun added a commit to dlsun/symbulate that referenced this issue Oct 8, 2018
* beginning overhaul of random processes

* fixing random process behavior

* fixing bug with infinite samples from BoxModel

* removing unused imports

* fixing bug with cumsum

* TODO for result.py

* adding plotting for Vectors and other result objects

* fixing problems with RandomProcess

* fixing issues with joining and hashing of Vectors

* move RandomProcess functionality to RV, fix plotting

* replacing references to RandomProcess with RV, fixing bugs with Vector

* stripping RandomProcess of most methods

* fixing bugs with hashing and comparison of Vectors

* fixing a few bugs with plotting

* fixed speed issue with RandomProcess

* Fix error with scalar types not being recognized

* fixing problems with composition of RV and Result objects

* switching order of drift and scale parameters

* allow random processes to be combined with Vectors

* changing default index set of RandomProcess to Naturals()

* Change is_hashable()

There is a bug in Numpy 1.12.1 that causes the previous approach to break: numpy/numpy#8887

* change order of arguments for Brownian motion

* fix bug with standard math functions

* change how Vectors are stored and joined

* distinguish between tuple and Tuple

* replace isinstance(x, int) by the more robust isinstance(x, numbers.Integral)

* check for numpy float types

* allowing joint distributions to be defined with scalars

* adding concat function

* allow Vectors to be added to tuples and lists

* fixing how arrays are set in Vector

* vectorize Vector operations

* Vectors can now store arbitrary objects

* fixing bug with MultivariateNormal

* fixing typo in commit 20728b

* fixing bug with RVResults that was preventing random processes from being plotted

* creating Tuple and Vector classes

* fixing bug with Tuple

* changing behavior of Tuple and Vector append

* fixing bug with corr, random processes, and sorting for .tabulate()

* fixing DeckOfCards

* making all things Vectors, Tuples only returned by *

* making Tuples compatible with generators

* using generators throughout

* removing vectorized operations for now

* Adding Multinomial, Fixing Pareto (#75)

* Fixing Pareto tests

* Adding Multinomial, Fixing Pareto