Fix dtype int vs char inconsistencies + dtype hashing by cournape · Pull Request #231 · numpy/numpy · GitHub
Fix dtype int vs char inconsistencies + dtype hashing #231

Status: Closed · wants to merge 5 commits
BUG: fix #2017 by ignoring type_num in the hash input.

Since type_num is not considered in PyArray_EquivTypes (for dtype
equality comparison), it seems reasonable to ignore it for hashing as
well.

cournape committed Mar 7, 2012
commit 3d8da1540b51060097dea6eb48fe40f5e4a3811f
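The bug being fixed (gh-2017) is that two equivalent builtin dtypes can carry different type_num values, so including type_num in the hash input broke Python's requirement that objects comparing equal must hash equal. A minimal sketch of the invariant this commit restores, using the same dtype pair as the regression test added below:

```python
import numpy as np

# np.uintp aliases a platform-dependent unsigned integer type; pick a
# fixed-width/long-long spelling of the same storage, which may have a
# different type_num but compares equal via PyArray_EquivTypes.
a = np.dtype(np.uintp)
b = np.dtype(np.uint32) if a.itemsize == 4 else np.dtype(np.ulonglong)

assert a == b              # equivalent dtypes compare equal...
assert hash(a) == hash(b)  # ...so their hashes must match as well
```

Before this commit, the first assertion passed while the second could fail, which silently broke dict and set lookups keyed on dtypes.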
5 changes: 2 additions & 3 deletions in numpy/core/src/multiarray/hashdescr.c

@@ -76,9 +76,8 @@ static int _array_descr_builtin(PyArray_Descr* descr, PyObject *l)
      * For builtin type, hash relies on : kind + byteorder + flags +
      * type_num + elsize + alignment
      */
-    t = Py_BuildValue("(cciiii)", descr->kind, nbyteorder,
-                      descr->flags, descr->type_num, descr->elsize,
-                      descr->alignment);
+    t = Py_BuildValue("(cccii)", descr->kind, nbyteorder,
+                      descr->flags, descr->elsize, descr->alignment);

     for(i = 0; i < PyTuple_Size(t); ++i) {
         item = PyTuple_GetItem(t, i);
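In Python terms, the fixed hash input can be sketched as the tuple below. The helper name is hypothetical (the real computation is the C code above, which then folds the tuple items into the hash in a loop); it only illustrates which dtype attributes survive in the key after type_num is dropped:

```python
import numpy as np

def _builtin_hash_key(descr):
    """Rough Python analogue of the fixed hash input for builtin dtypes:
    (kind, byteorder, flags, elsize, alignment). type_num is deliberately
    excluded so that equivalent dtypes produce the same key."""
    return (descr.kind, descr.byteorder, descr.flags,
            descr.itemsize, descr.alignment)

# Equivalent dtypes with potentially different type_num values now
# produce identical keys, hence identical hashes.
uintp = np.dtype(np.uintp)
other = np.dtype(np.uint32) if uintp.itemsize == 4 else np.dtype(np.ulonglong)
assert _builtin_hash_key(uintp) == _builtin_hash_key(other)
```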
12 changes: 12 additions & 0 deletions in numpy/core/tests/test_dtype.py

@@ -33,6 +33,18 @@ def test_dtype(self):
         self.assertTrue(dt.byteorder != dt3.byteorder, "bogus test")
         assert_dtype_equal(dt, dt3)
+
+    def test_equivalent_dtype_hashing(self):
+        # Make sure equivalent dtypes with different type num hash equal
+        uintp = np.dtype(np.uintp)
+        if uintp.itemsize == 4:
+            left = uintp
+            right = np.dtype(np.uint32)
+        else:
+            left = uintp
+            right = np.dtype(np.ulonglong)
+        self.assertTrue(left == right)
+        self.assertTrue(hash(left) == hash(right))

Review discussion on this test:

Member:
Could test more cases here, although that gets complicated by the various conventions. What happens with float96/float128 and true quad precision?

Member (Author):
This is actually more of a regression test than a unit test (for #2017). Do you feel that more unit tests for dtype hashing are needed? Or just to check the flags member value?

Member:
Mostly it's motivated by my experience that every time I write a complete test it turns up bugs, so I'm trying to push things that way. The test here will do, but it would be nice to make something more complete. If you are pressed for time, and who isn't, this can go in as is. Writing a solid test for this looks to be non-trivial, but another PR with such a test would be welcome.

Member (Author):
I am fine with adding new tests (I am sprinting at PyCon), but I am not sure what kind of tests you had in mind.

Regarding true quad precision, I don't think we have any platform supporting it yet, or I misunderstand the test you have in mind.

Member:
Sun supports quad precision. I think it will show up in the next couple of years.

I'm not sure what tests would be appropriate; maybe a table of known hashes to check against. What about void types and structured types? In some ways this might be like the tests for the random number generators, where just the first several values starting from a known seed are tabulated. That serves to make sure that changes don't go unnoticed. If there is a chance that pickles are affected, that could be checked also. Can you think of other things that might be useful? I'm no expert here.

How was PyCon?

Member (Author):
Regarding quad precision, I cannot test this, but I cannot see a reason why it would not work: the hash method uses the same attributes for all builtin dtypes.

Right now, I think we have decent coverage of the hashing logic to test that a == b implies hash(a) == hash(b). As for structured types, we already have some tests (look for tests using assert_dtype_equal/assert_dtype_not_equal), as well as for subarray dtypes.

I am not sure we want to hardcode hashes: we do not want to guarantee more than Python itself does, and given the recent discussion on randomized hashes, they may become even weaker than they are now (http://bugs.python.org/issue13703).

PyCon was nice; there are quite a few things we need to look at (https://us.pycon.org/2012/schedule/presentation/78/).

Member:
Yes, I've been keeping an eye on that, even posted a heads up on the list a while back. Linux Weekly News is a good place to keep an eye on new things like that.
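Along the lines the reviewer suggests, a broader check could exercise structured and subarray dtypes as well. The sketch below is not part of the PR; it tests only the invariant the author describes (a == b implies hash(a) == hash(b)) rather than a table of hardcoded hash values, since hash values themselves are not guaranteed across Python versions:

```python
import numpy as np

# Pairs of dtypes spelled two equivalent ways; each pair must compare
# equal and therefore must hash equal.
pairs = [
    # builtin integer with a platform-dependent alias
    (np.dtype(np.uintp),
     np.dtype(np.uint32) if np.dtype(np.uintp).itemsize == 4
     else np.dtype(np.ulonglong)),
    # structured dtype: comma-string vs explicit field list
    (np.dtype('i4, f8'), np.dtype([('f0', 'i4'), ('f1', 'f8')])),
    # subarray dtype: shape-prefix string vs (base, shape) tuple
    (np.dtype('(2,3)f8'), np.dtype(('f8', (2, 3)))),
]
for a, b in pairs:
    assert a == b, (a, b)
    assert hash(a) == hash(b), (a, b)
```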

     def test_invalid_types(self):
         # Make sure invalid type strings raise exceptions
         for typestr in ['O3', 'O5', 'O7', 'b3', 'h4', 'I5', 'l4', 'l8',