10000 Fix #5339: cache dtype.__hash__. by pitrou · Pull Request #5343 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

Fix #5339: cache dtype.__hash__. #5343

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 4, 2015
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions doc/release/1.10.0-notes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,9 @@ C API
The changes to *swapaxes* also apply to the *PyArray_SwapAxes* C function,
which now returns a view in all cases.

The dtype structure (PyArray_Descr) has a new member at the end to cache
its hash value. This shouldn't affect any well-written applications.

recarray field return types
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Previously the returned types for recarray fields accessed by attribute and by
Expand Down
4 changes: 4 additions & 0 deletions numpy/core/include/numpy/ndarraytypes.h
Original file line number Diff line number Diff line change
Expand Up @@ -619,6 +619,10 @@ typedef struct _PyArray_Descr {
* for NumPy 1.7.0.
*/
NpyAuxData *c_metadata;
/* Cached hash value (-1 if not yet computed).
* This was added for NumPy 2.0.0.
*/
npy_hash_t hash;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this unfortunately breaks the ABI and causes forward compatibility issues with cython.
though I am playing with breaking ABI by hiding the ufunc structure (at least partially) maybe we could break this too.
But if we do we should do it properly and add a pointer to a private object or something similar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't adding a field at the end be fine, ABI-wise? We don't allow
subclassing. What cython issue are you thinking of?

I doubt there's any legitimate reason for anyone outside numpy to access
anything besides a few fields at the beginning anyway though (in
particularly itemsize), and I've also been pondering plans to rewrite the
dtype struct. So I wouldn't rule that out either. Rather than using a
pimpl, though, it might be sufficient to just hide the private fields
behind a private struct definition that only we can cast to.
On 4 Dec 2014 06:47, "Julian Taylor" notifications@github.com wrote:

In numpy/core/include/numpy/ndarraytypes.h:

@@ -619,6 +620,10 @@ typedef struct _PyArray_Descr {
* for NumPy 1.7.0.
*/
NpyAuxData *c_metadata;

  •    /\* Cached hash value (-1 if not yet computed).
    
  •     \* This was added for NumPy 2.0.0.
    
  •     */
    
  •    npy_hash_t hash;
    

this unfortunately breaks the ABI and causes forward compatibility issues
with cython.
though I am playing with breaking ABI by hiding the ufunc structure (at
least partially) maybe we could break this too.
But if we do we should do it properly and add a pointer to a private
object or something similar.


Reply to this email directly or view it on GitHub
https://github.com/numpy/numpy/pull/5343/files#r21288459.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we do we should do it properly and add a pointer to a private object or something similar.

That sounds reasonable. Do you have a patch for that somewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extending structs breaks programs embedding this struct into other structures (unlikely) and cython checks the size of these types and complains if they are different than what the program was built with, thats just an annoyance but one that comes up very often.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't finished my ufunc patch yet, though it goes along the lines that njsmith suggested, keep the current structure and internally use a larger one. Additionally it might be interesting to mangle the public part in the development version to trigger failures for people that use the internals. For release the mangling can be reverted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm less knowledgeable than you guys :) What should be the way forward?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one way would be to implement the hiding of the object internals, another wait until someone else does it, but that might take a while.
Unfortunately the dtype internals are probably far more commonly used than the ufunc internals I want to hide. This probably gives us much less wiggle room for breaking the ABI.
Unless someone finds a way to do so without breaking it, maybe some kind of magic number in one of the existing fields could work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I am also just overly cautious, in light of no better idea if you want to work on it I think the approach of adding a second private struct that embeds the public struct is worthwhile. Its only problem is that we technically can't trust the hash to be correct if others do unusual stuff with the public part.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the general scheme work like PyArray_Fields currently does? Which dtype fields should be private vs. public exactly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping on my questions above :)

} PyArray_Descr;

typedef struct _arr_descr {
Expand Down
13 changes: 0 additions & 13 deletions numpy/core/include/numpy/npy_3kcompat.h
Original file line number Diff line number Diff line change
Expand Up @@ -486,19 +486,6 @@ NpyCapsule_Check(PyObject *ptr)

#endif

/*
* Hash value compatibility.
* As of Python 3.2 hash values are of type Py_hash_t.
* Previous versions use C long.
*/
#if PY_VERSION_HEX < 0x03020000
typedef long npy_hash_t;
#define NPY_SIZEOF_HASH_T NPY_SIZEOF_LONG
#else
typedef Py_hash_t npy_hash_t;
#define NPY_SIZEOF_HASH_T NPY_SIZEOF_INTP
#endif

#ifdef __cplusplus
}
#endif
Expand Down
13 changes: 13 additions & 0 deletions numpy/core/include/numpy/npy_common.h
Original file line number Diff line number Diff line change
Expand Up @@ -316,6 +316,19 @@ typedef long npy_long;
typedef float npy_float;
typedef double npy_double;

/*
* Hash value compatibility.
* As of Python 3.2 hash values are of type Py_hash_t.
* Previous versions use C long.
*/
#if PY_VERSION_HEX < 0x03020000
typedef long npy_hash_t;
#define NPY_SIZEOF_HASH_T NPY_SIZEOF_LONG
#else
typedef Py_hash_t npy_hash_t;
#define NPY_SIZEOF_HASH_T NPY_SIZEOF_INTP
#endif

/*
* Disabling C99 complex usage: a lot of C code in numpy/scipy rely on being
* able to do .real/.imag. Will have to convert code first.
Expand Down
4 changes: 4 additions & 0 deletions numpy/core/src/multiarray/arraytypes.c.src
Original file line number Diff line number Diff line change
Expand Up @@ -4031,6 +4031,8 @@ static PyArray_Descr @from@_Descr = {
NULL,
/* c_metadata */
NULL,
/* hash */
-1,
};

/**end repeat**/
Expand Down Expand Up @@ -4172,6 +4174,8 @@ NPY_NO_EXPORT PyArray_Descr @from@_Descr = {
NULL,
/* c_metadata */
NULL,
/* hash */
-1,
};

/**end repeat**/
Expand Down
5 changes: 5 additions & 0 deletions numpy/core/src/multiarray/descriptor.c
Original file line number Diff line number Diff line change
Expand Up @@ -1591,6 +1591,7 @@ PyArray_DescrNew(PyArray_Descr *base)
}
Py_XINCREF(newdescr->typeobj);
Py_XINCREF(newdescr->metadata);
newdescr->hash = -1;

return newdescr;
}
Expand Down Expand Up @@ -1994,6 +1995,8 @@ arraydescr_names_set(PyArray_Descr *self, PyObject *val)
return -1;
}
}
/* Invalidate cached hash value */
self->hash = -1;
/* Update dictionary keys in fields */
new_names = PySequence_Tuple(val);
new_fields = PyDict_New();
Expand Down Expand Up @@ -2443,6 +2446,8 @@ arraydescr_setstate(PyArray_Descr *self, PyObject *args)
version);
return NULL;
}
/* Invalidate cached hash value */
self->hash = -1;

if (version == 1 || version == 0) {
if (fields != Py_None) {
Expand Down
11 changes: 6 additions & 5 deletions numpy/core/src/multiarray/hashdescr.c
C849
Original file line number Diff line number Diff line change
Expand Up @@ -301,7 +301,6 @@ PyArray_DescrHash(PyObject* odescr)
{
PyArray_Descr *descr;
int st;
npy_hash_t hash;

if (!PyArray_DescrCheck(odescr)) {
PyErr_SetString(PyExc_ValueError,
Expand All @@ -310,10 +309,12 @@ PyArray_DescrHash(PyObject* odescr)
}
descr = (PyArray_Descr*)odescr;

st = _PyArray_DescrHashImp(descr, &hash);
if (st) {
return -1;
if (descr->hash == -1) {
st = _PyArray_DescrHashImp(descr, &descr->hash);
if (st) {
return -1;
}
}

return hash;
return descr->hash;
}
15 changes: 15 additions & 0 deletions numpy/core/tests/test_dtype.py
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,21 @@ def test_different_titles(self):
'titles': ['RRed pixel', 'Blue pixel']})
assert_dtype_not_equal(a, b)

def test_mutate(self):
# Mutating a dtype should reset the cached hash value
a = np.dtype([('yo', np.int)])
b = np.dtype([('yo', np.int)])
c = np.dtype([('ye', np.int)])
assert_dtype_equal(a, b)
assert_dtype_not_equal(a, c)
a.names = ['ye']
assert_dtype_equal(a, c)
assert_dtype_not_equal(a, b)
state = b.__reduce__()[2]
a.__setstate__(state)
assert_dtype_equal(a, b)
assert_dtype_not_equal(a, c)

def test_not_lists(self):
"""Test if an appropriate exception is raised when passing bad values to
the dtype constructor.
Expand Down
0