BUG: NEP 42 user dtype has type number set to -1 and this causes various failures. #22900
Comments
Sorry about not answering earlier, I was travelling a bit and barely checked on updates. Right now, I think you are seeing this as an error during printing. You can avoid this by implementing |
Hmm, I see you have repr; what exactly do you expect to work? The |
There are two failures I have seen related to these dtype attributes.

In [1]: from microohm import *
In [2]: import numpy as np
In [3]: import pandas as pd
In [4]: arr = np.array([1.0, 2.0], dtype=QuantityDType("m"))
In [5]: s1 = pd.Series(arr)
In [6]: s1
Out[6]: ---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)

In [7]: np.ctypeslib.ndpointer(dtype=QuantityDType('m'))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 np.ctypeslib.ndpointer(dtype=QuantityDType('m'))

File ~/github/microohm/.venv/lib/python3.11/site-packages/numpy/ctypeslib.py:343, in ndpointer(dtype, ndim, shape, flags)
    340 else:
    341     base = _ndptr
--> 343 klass = type("ndpointer_%s"%name, (base,),
    344              {"_dtype_": dtype,
    345               "_shape_" : shape,
    346               "_ndim_" : ndim,
    347               "_flags_" : num})
    348 _pointer_type_cache[cache_key] = klass
    349 return klass

ValueError: type name must not contain null characters |
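The ValueError is raised by type() itself: CPython refuses class names that contain a NUL character. A minimal sketch, assuming that ndpointer builds name from dtype.str and that a new-style dtype's kind is '\x00' (as the _name_get patch later in this thread suggests), so the generated class name ends up containing a NUL. The helper make_pointer_class is hypothetical and only illustrates one way to sanitize such a name; it is not NumPy's fix.

```python
def make_pointer_class(name: str) -> type:
    # Hypothetical helper: replace NUL characters (for example a kind of
    # '\x00' embedded in dtype.str) so that type() accepts the class name.
    safe_name = name.replace("\x00", "_")
    return type("ndpointer_%s" % safe_name, (object,), {})


# Assumed shape of dtype.str for a new-style dtype: byte order, kind, itemsize.
bad_name = "<" + "\x00" + "8"

try:
    type("ndpointer_%s" % bad_name, (object,), {})
    rejected = False
except ValueError as exc:
    # CPython reports: "type name must not contain null characters"
    rejected = "null characters" in str(exc)

klass = make_pointer_class(bad_name)
print(rejected, klass.__name__)  # True ndpointer_<_8
```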
OK, I can't say I am surprised about either. But I am not sure that having a type number available will fix them as well. |
Is it possible for DTypeMeta to be a user-defined type such that existing code works as-is? |
The question is whether that same "existing code" actually works with a user dtype that is defined the old way. Pandas does a lot of weird stuff; if pandas works with existing user dtypes like … Feel free to just override … Until now, I have never dug into what pandas dislikes exactly. I doubt much needs to change in pandas to make things work, but I don't know what it is. |
I tried overriding …

/*NUMPY_API
  Register Data type
  Does not change the reference count of descr
*/
NPY_NO_EXPORT int
PyArray_RegisterDataType(PyArray_Descr *descr)
{
    PyArray_Descr *descr2;
    int typenum;
    int i;
    PyArray_ArrFuncs *f;

    /* See if this type is already registered */
    for (i = 0; i < NPY_NUMUSERTYPES; i++) {
        descr2 = userdescrs[i];
        if (descr2 == descr) {
            return descr->type_num;
        }
    }
    ...
    userdescrs[NPY_NUMUSERTYPES++] = descr;

This value is then used in various places:

def _name_get(dtype):
    # provides dtype.name.__get__, documented as returning a "bit name"
    if dtype.isbuiltin == 2:
        # user dtypes don't promise to do anything special
        return dtype.type.__name__

descriptor.c

/*
 * returns 1 for a builtin type
 * and 2 for a user-defined data-type descriptor
 * return 0 if neither (i.e. it's a copy of one)
 */
static PyObject *
arraydescr_isbuiltin_get(PyArray_Descr *self, void *NPY_UNUSED(ignored))
{
    long val;
    val = 0;
    if (self->fields == Py_None) {
        val = 1;
    }
    if (PyTypeNum_ISUSERDEF(self->type_num)) {
        val = 2;
    }
    return PyLong_FromLong(val);
}

ndarraytypes.h

#define PyTypeNum_ISUSERDEF(type) (((type) >= NPY_USERDEF) && \
                                   ((type) < NPY_USERDEF+     \
                                    NPY_NUMUSERTYPES)) |
>>> import numpy as np
>>> from numpy.core._rational_tests import rational
>>> import pandas as pd
>>> arr = np.array([rational(1), rational(2)])
>>> s1 = pd.Series(arr)
>>> s1
0 1
1 2
dtype: rational |
@ngoldbaum dunno if you are interested in having a look at why pandas seems to be fine with old-style, but not new-style user dtypes. The question, though, is whether it is a simple thing or not. Since new-style dtypes are parametric, much of the old infrastructure probably does not work in either case. |
So the full error from pandas is here:
I patched numpy to avoid the error in … I think we probably just want to define …

diff --git a/numpy/core/_dtype.py b/numpy/core/_dtype.py
index 3db80c17e..c4d3594d2 100644
--- a/numpy/core/_dtype.py
+++ b/numpy/core/_dtype.py
@@ -344,7 +344,7 @@ def _name_includes_bit_suffix(dtype):
 def _name_get(dtype):
     # provides dtype.name.__get__, documented as returning a "bit name"
-    if dtype.isbuiltin == 2:
+    if dtype.isbuiltin == 2 or dtype.kind == '\x00':
         # user dtypes don't promise to do anything special
         return dtype.type.__name__

The built-in dtypes have names that show the width of the dtype in bits (for example float64), so maybe there also needs to be some code to do that, but that's not really a big deal and should be doable in Python based on … |
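The patched condition and the "bit name" suffix can be modeled in a few lines of Python. This is a hedged sketch: SimpleNamespace objects stand in for dtype instances, and the KIND_NAMES table and the bit-suffix fallback are hypothetical simplifications, not NumPy's actual _name_get logic. It only shows that checking kind == '\x00' routes a new-style dtype to its scalar type's name, while a built-in-like dtype gets its width in bits (8 * itemsize) appended.

```python
from types import SimpleNamespace

# Hypothetical, simplified kind-to-base-name table (not NumPy's _kind_name).
KIND_NAMES = {"i": "int", "u": "uint", "f": "float", "c": "complex"}


def name_get(dtype) -> str:
    # Patched condition from the diff: legacy user dtypes report
    # isbuiltin == 2, new-style (NEP 42) dtypes have kind '\x00'.
    if dtype.isbuiltin == 2 or dtype.kind == "\x00":
        return dtype.type.__name__
    # Built-in-like dtypes: base name plus width in bits, e.g. "float64".
    return f"{KIND_NAMES[dtype.kind]}{8 * dtype.itemsize}"


class Quantity:
    """Hypothetical scalar type standing in for a unit-aware scalar."""


new_style = SimpleNamespace(isbuiltin=0, kind="\x00", type=Quantity, itemsize=8)
builtin_like = SimpleNamespace(isbuiltin=1, kind="f", type=float, itemsize=8)

print(name_get(new_style))     # Quantity
print(name_get(builtin_like))  # float64
```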
Perhaps go through all of the dtype attributes and define the desired behavior for the new DTypeMeta. For example, the current values for the following don't seem quite right to me: Is DTypeMeta a user-defined data type? |
Not necessarily, no. |
In my case, QuantityDType is a DTypeMeta that is instantiated as in QuantityDType("m"), so to me it is a user-defined type. Are we supposed to subclass it into a new DType at runtime based on the unit expression? |
Describe the issue:
Custom user dtypes in the new NEP 42 DTypeMeta have a type number set to -1 and this leads to various failures.
In experimental_public_dtype_api.c, the code is:
Well, it seems we can't get away with it. For the code example, I can provide an example from my package microohm.
Reproduce the code example:
Error message:
Runtime information:
1.25.0.dev0+272.gbf20c55a2
3.11.1 (main, Dec 9 2022, 10:58:57) [GCC 11.3.0]
[{'numpy_version': '1.25.0.dev0+272.gbf20c55a2',
'python': '3.11.1 (main, Dec 9 2022, 10:58:57) [GCC 11.3.0]',
'uname': uname_result(system='Linux', node='curro2', release='5.15.0-56-generic', version='#62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX'],
'not_found': ['AVX512_KNL',
'AVX512_KNM',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'SkylakeX',
'filepath': '/usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.20'}]
None
Context for the issue:
This causes failures in downstream libraries like Pandas, etc.