-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
API: Create Preliminary DTypeMeta class and np.dtype subclasses #15508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The only thing from python that this is supposed to change, is that:
should not be |
f"extern NPY_NO_EXPORT {self.internal_type} {mangled_name};\n" | ||
# And define the name as: (*(type *)(&mangled_name)) | ||
f"#define {self.name} (*({self.ptr_cast} *)(&{mangled_name}))\n" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API generation changes are the ugliest part here... Unfortunately, I really want to hide everything away (unless it is an internal build), and have the PyArrayDescr_Type
defined as PyTypeObject
even internally. But I also need to initialize/allocate it statically as a PyArray_DTypeMeta
.
143bc06
to
a729ca8
Compare
abf365a
to
0cc50c3
Compare
We've had a lot of confusion in past discussions about |
Thanks for having a look Eric. I had put this mainly in the hope to get quick judgment as to this "static" type being a bit weird. But by now I think it is a good first step (even if not pretty). Of course we could at some point decide to make all dtypes heaptypes, but that is fully changable. There was a bug with |
legacy_dtype_default_new(PyArray_DTypeMeta *self, | ||
PyObject *args, PyObject *kwargs) | ||
{ | ||
/* TODO: This should allow endianess and possibly metadata */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy to just disable this completely for now as well. Or, put a PreliminaryAPI warning here. Which I think would be nice in general, but I am not sure I care too much about right away (it is not like this is a tricky API choice), that type(instance)()
gives the default instance.
The main annoyance about this PR is, how to create the TypeStruct in a way that is hidden from the public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warnings are probably annoying, given that you'll then have to filter them out again if you want to use this internally (applies to any preliminary API). How about just putting a big warning in the docs for any such object? And on the Python side, keep it out of the main namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, as this is the python side, for a long time it probably only matters in tests though. np.dtype
does not really have a test right now. I could change the repr away from np.dtype[float64]
if you prefer. E.g. just add an underscore to make it "private".
This makes all dtypes being subclasses of a: class DType(np.dtype): is_abstract = False is_flexible = False % True for void, strings, and datetimes singleton = prototype_dtype_instance # other information defined on the singleton currently subclass of np.dtype. Right now this happens by replacing the Type field of of the singleton instance. This is valid for types/objects not created in python (not HeapTypes). It is necessary to have this functionality for continuing support of user defined legacy dtypes.
(currently no public API changes, but the hash changed). The type safety may not be important as such, but there used to be issues, at least with MSVC about this, and clang warns.
Note that these attributes are *not* fixed yet, just a start to have something.
@eric-wieser if you have a larger chunk of concentration cycles (I know those are hard to get by, and its not getting easier right now likely)... I am not sure if you looked at this before much. I do see most of this a bit in flux (e.g. as soon as we tag on more to the DType class, it may be necessary to create them explicitly). But I would be happy if we can get something in this direction going shortly after merging the NEP, which I hope we can manage in less then two weeks and I think you would be the best reviewer, or to suggest larger changes. |
Now that 1.19 is Edit: out -> branched |
Yes, that would be great. I am not sure there is much for me to do here right now though. It requires some review. If anyone wants to talk about it or so, I am happy to do that. |
LGTM. It may need some iteration and adjustment as the work progresses, but I think the approach is solid. Since this unlocks more work I would like to put it in soonish. @eric-wieser thoughts? |
Thanks @seberg. This may need some fine tuning but let's see how it goes. |
This has broken compilation with the cython + cpython master. You get failures like:
I am testing with cpython master which is a bit annoying to bisect given the change to |
You are running into #16419, have not figured out yet what I have to move around. I guess those declaration just must not be in that file due to the nature of how it is used, but not exactly sure... |
Ah, great (well, from the point of view that I don't have to bisect this further...not sure how long it would have taken me to get to "maybe it is gcc"....)! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just reading through the PR to understand some of the dtype changes. Overall this is a solid PR :) ! Added a few comments, feel free to ignore.
* should be defined on the class and inherited to the scalar. | ||
* (NPY_HALF is the largest builtin one.) | ||
*/ | ||
for (i = 0; i <= NPY_HALF; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looks like NPY_NTYPES seems to be the upper limit here. For example, if we were to add a new builtin type that would be added after NPY_HALF? doubt, if this even happens to even worry about ?
return -1; | ||
} | ||
|
||
snprintf(tp_name, name_length, "numpy.dtype[%s]", scalar_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not really care right now how we want to name this in the future. Although I actually like the square bracket idea.
Only downside i see is that from a python user perspective this is not a valid class name, and trying to create an object (in the obvious way) will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True... I suppose the metaclass could probably have a __repr__
slot to fix some of it. Although, it may be that to actually make it copy-pastable, we need a .
in there, which may mean it can only be done by making everything HeapTypes (which is not totally unreasonable).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do something like numpy.dtype_%s ? Although this may be a bit ugly and little less self-explanatory of it being a subclass of np.dtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do various things I guess, but it will be a while before we should seriously consider exposing these in the API, and I think until then, it will be fine to modify the __repr__
as we like.
if (dot) { | ||
scalar_name = dot + 1; | ||
} | ||
ssize_t name_length = strlen(scalar_name) + 14; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: a comment for this magic number would be great
static void | ||
dtypemeta_dealloc(PyArray_DTypeMeta *self) { | ||
/* Do not accidentally delete a statically defined DType: */ | ||
assert(((PyTypeObject *)self)->tp_flags & Py_TPFLAGS_HEAPTYPE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will always raise with the current code right? also this functions like a static type today so this dealloc never actually happens ?
* in the future when we implement HeapTypes (python/dynamically | ||
* defined types). It should be revised at that time. | ||
*/ | ||
assert(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looks like this was added for debugging ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that and the other comments. This one was intentional, because I thought if we actually start using it, we should check that it actually works correctly :).
static PyMemberDef dtypemeta_members[] = { | ||
{"_abstract", | ||
T_BYTE, offsetof(PyArray_DTypeMeta, abstract), READONLY, NULL}, | ||
{"type", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this also be hidden ?
# which are more than just storage information, these would need to be | ||
# given when creating a dtype: | ||
parametric = (np.void, np.str_, np.bytes_, np.datetime64, np.timedelta64) | ||
if dtype.type not in parametric: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i find the following behavior a bit confusing :
>>> dtype = np.dtype("float32")
>>> dtype.type == type(dtype).type
True
maybe attribute name being different from type/_type will help ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to figure this out before 1.20. I currently see the dtype type as non-existing, i.e. its a class attribute, which the instance simply looks up on the class. But: I think there may be use-cases which DType.type
is NULL
, and in that case, there may be some use in dtype.type
being non-null. Although I suspect that is useless in practice (at least unless I expand some of the classmethods to also work as instance-methods in the NEP).
For now, I would view it as being a bad implementation of:
class DType:
type = something
@property
def dtype(self):
return self.type # returns the class attribute always.
but we store it also on the dtype
for backward compatibility reasons.
This makes all dtypes being subclasses of a:
subclass of np.dtype.
Right now this happens by replacing the Type field of of the
singleton instance. This is valid for types/objects not created
in python (not HeapTypes). It is necessary to have this functionality
for continuing support of user defined legacy dtypes.
This is basically the draft start for the proposed NEP 42. I would not want to expose much more, if we have some agreement that
np.dtype[np.float64]
is a nice API, I would be happy to add such small things (with aPreliminaryDTypeAPIWarning
).My hope is this:
np.array(...)
coercion.Most of which has to be defined in more technical detail and with specific decisions and would be part of NEP 42 though.
This PR aims to add no new public API with the exception of barely useful DType classes!. In aprticularly no C-API is added.
As of now this is probably still a bit draft. The whole thing is a bit ugly, but I think it is actually somewhat fine, and we have to do this "heap allocated static type" for things such as
quaternions
.