WIP: refactor dtype to be a type subclass #12585
Conversation
Force-pushed from 21dcbcf to 2efd78b
The first commit 59ad4d3 passes all but the codecov tests and is a refactor of existing code. The next batch of commits are all attempts to get a job running on Azure Pipelines with
The code errors out; somehow I am not binding the class methods. The new Azure job fails since
This commit produces the following output (note unbound methods need a class object):
Now if I could only figure out how to bind those class methods ...
Perhaps
Can you show the output of
and
I think
I want to use a value that means "use old behaviour" so that the default is off (i.e. "use new behaviour"), but we can leave it in for a few deprecation cycles for backward compatibility.
New (broken) code:
Old (working) code:
The only change to
Also
Old code
I think your copy of the
It is more convoluted than that. Attribute lookup usually goes to the object
What happens if you remove the
Yes, exactly. I think the problem is this:

    class dtype(type):
        def __new__(metacls, name, bases, dict):
            # this copy breaks methods - don't do it!
            for k in dir(metacls):
                try:
                    dict[k] = getattr(metacls, k)
                except AttributeError:
                    pass
            return super().__new__(metacls, name, bases, dict)

    class bad_class(metaclass=dtype):
        pass

    class good_class:
        pass

    good_class.mro
    # <built-in method mro of type object at 0x0000021A4A6F6F78>
    bad_class.mro
    # <method 'mro' of 'type' objects>
No more crashes, tests run to completion. There are failures around pickling and creating new instances.
Next step: get
Is it going wrong in
@mhvk: it is because the function resolution goes:
The old code falls through to 3; I registered a
Down to one failing test, which it seems needs an implementation of
 * #name2 = HALF, DATETIME, TIMEDELTA#
 */
#if USE_DTYPE_AS_PYOBJECT
        dtype = PyObject_New(PyArray_Descr, &PyArrayDescr_Type);
What is the advantage of heap allocating here vs the previous static allocation?
Personal preference. At some point subinterpreter support may become mature enough that we will want to allow reloading the NumPy module, and thinking about what that means for static allocation gets me confused.
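For illustration, here is a minimal sketch (not the actual NumPy source) of the two allocation styles being compared; the helper name and the zeroed trailing fields are placeholders.

    #include <Python.h>
    #include <numpy/arrayobject.h>   /* PyArray_Descr, PyArrayDescr_Type */

    /* Old style: the descriptor lives in static storage, is created at
     * compile time, and stays alive for the whole process.  Re-initializing
     * the module cannot recreate it. */
    static PyArray_Descr example_static_descr = {
        PyObject_HEAD_INIT(&PyArrayDescr_Type)
        /* remaining fields would be filled in statically; left zeroed here */
    };

    /* New style: the descriptor is heap-allocated at module init via
     * PyObject_New, so in principle it could be torn down and rebuilt
     * (e.g. if subinterpreter support ever requires it). */
    static PyArray_Descr *
    example_make_descr(void)
    {
        PyArray_Descr *dtype = PyObject_New(PyArray_Descr, &PyArrayDescr_Type);
        if (dtype == NULL) {
            return NULL;
        }
        /* fields beyond the object header still need explicit initialization */
        return dtype;
    }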
The problem is that this patch changes the value of
The test is just bad here. It should be using
        dtype = PyObject_New(PyArray_Descr, &PyArrayDescr_Type);
#ifndef USE_DTYPE_AS_PYOBJECT
        /* Don't copy PyObject_HEAD part */
        memset((char *)dtype + sizeof(PyObject),
For me (and thus for future people looking at this...), it would be helpful to expand on why this is done. Does it get filled out later by actually instantiating? I realize links to documentation can get out of sync, but they might still be better than nothing.
Looking at the documentation for PyObject_New, I see "Fields not defined by the Python object header are not initialized" - so, is the point here to undo the initialization that is done? But why?
I think the goal here is to ensure all the tp_ slots are initialized to null, without having to list them all.
My question was whether it should be zeroed: presumably PyObject_New initializes this section (since it states that other parts are not initialized, suggesting that these parts are), and, again presumably, this is with stuff from PyArrayDescr_Type. If those presumptions hold, are those items wrong? Or might they have, e.g., init/new functions that can just be used?
No, PyObject_New does not initialize this section, because it's not part of the PyObject head - in the same way that PyObject_New does not initialize a new instance of PyArrayObject.
The comment here is wrong though - the word "copy" is no longer appropriate.
Arggh, I just read it too quickly - of course the code zeros everything beyond the PyObject header, which definitely makes sense.
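To spell out the allocate-then-zero pattern discussed in this thread, here is a sketch; it assumes PyArray_Descr keeps its current layout, and the helper name is made up.

    #include <Python.h>
    #include <string.h>
    #include <numpy/arrayobject.h>   /* PyArray_Descr, PyArrayDescr_Type */

    static PyArray_Descr *
    example_new_descr(void)
    {
        /* PyObject_New fills in only the PyObject header (refcount and
         * ob_type, here PyArrayDescr_Type); the rest of the struct is
         * uninitialized memory. */
        PyArray_Descr *dtype = PyObject_New(PyArray_Descr, &PyArrayDescr_Type);
        if (dtype == NULL) {
            return NULL;
        }
        /* Zero everything past the header in one call, so none of the
         * many descr fields has to be listed one by one. */
        memset((char *)dtype + sizeof(PyObject), 0,
               sizeof(PyArray_Descr) - sizeof(PyObject));
        return dtype;
    }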
Had a quick look at the basic ideas for this again, and wondered whether perhaps something is missing: shouldn't the subclassing be more hierarchical, so that, e.g., all integer dtypes are subclasses of a base integer type? That would allow one to replace, e.g., the current
More generally, it seems that there really should be a NEP, arguably about this change but certainly about the larger plans. Just as one example, some of the ideas about having
@mattip, I think this is really close - just needs the test change I mention above, and a better comment by the
@mhvk: I think the hierarchy can come later. Lifting
I can't tell if merging them makes sense at all. It'd be nice to have a description of this plan written down somewhere, e.g. as a NEP :-).
I don't see the benefit of dtypes providing
It sounds like we're definitely not all on the same page here, which usually means we need to stop and talk things out just so everyone knows what's going on.
I think it's worth acknowledging that there are a bunch of orthogonal and sequential changes being proposed, and I think it would be wise to try to segregate them into multiple NEPs.
I am working on drafting a NEP to describe the motivation for changing instances of
@mattip - great to have something concrete to look at!
@njsmith - the comment I made came from the suggestion for an
In light of the discussion on NEP 29 #12630, I will refactor this to define a
        PyObject_HEAD
#else
        /* may need to be PyHeapTypeObject for new tp_as_* functions? */
        PyTypeObject descrtype;
Yes, this needs to be PyHeapTypeObject. The Cython team explored this and ran into the same problem, until they made meta-types inherit from PyHeapTypeObject. See, for example, https://mail.python.org/pipermail/cython-devel/2013-September/003813.html
Suggested change:
-    PyTypeObject descrtype;
+    PyHeapTypeObject descrtype;
I'm not sure this applies to us. From that thread:
However, when MyClass is created, an instance of struct __pyx_obj_4meta_MetaClass (cc. 200 + 4 bytes) is dynamically allocated by the Python memory management machinery. The machinery then tries to initialize the allocated memory.
In our case, the instance of the dtype is malloc'd and initialized by arraydescr_new, so we're in complete control.
As far as I can tell, the main purpose of PyHeapTypeObject is to convert builtin magic methods into slots, something we probably don't care about too much for dtypes.
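For context, here is a trimmed sketch of what PyHeapTypeObject adds on top of a plain PyTypeObject (the field list is paraphrased from CPython's object.h and varies somewhat across Python versions; the struct name is illustrative).

    #include <Python.h>

    /* A heap type embeds the per-type method tables directly after the
     * type object, so the tp_as_* pointers of a runtime-created type have
     * somewhere to point; a statically defined type instead points its
     * tp_as_* slots at separate static structs. */
    typedef struct {
        PyTypeObject ht_type;          /* the type object itself */
        PyAsyncMethods as_async;       /* backing storage for tp_as_async */
        PyNumberMethods as_number;     /* backing storage for tp_as_number */
        PyMappingMethods as_mapping;   /* backing storage for tp_as_mapping */
        PySequenceMethods as_sequence; /* backing storage for tp_as_sequence */
        PyBufferProcs as_buffer;       /* backing storage for tp_as_buffer */
        PyObject *ht_name;             /* plus a few bookkeeping fields ... */
        PyObject *ht_slots;
    } example_heap_type_layout;

Whether the descr struct needs that extra backing storage is exactly the question being debated here.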
Force-pushed from 6a54239 to aaff618
Closing. This seems not to be the right approach.
Replaces #12430. This first step is to refactor the global descr creation to use PyObject_New. Note the new define USE_DTYPE_AS_PYOBJECT, which preserves the old behavior. Once the new dtype works, we can flip the default.
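To make that define concrete, here is a simplified stand-in for the descr struct showing the toggle (the real PyArray_Descr has many more fields, and the struct name here is illustrative).

    #include <Python.h>

    typedef struct {
    #if USE_DTYPE_AS_PYOBJECT
        /* old behaviour: a dtype instance is an ordinary Python object */
        PyObject_HEAD
    #else
        /* new behaviour: a dtype instance is itself a type object */
        PyTypeObject descrtype;
    #endif
        PyTypeObject *typeobj;   /* scalar type corresponding to this descr */
        char kind;               /* e.g. 'i', 'f', 'c', ... */
        char type;
        /* ... many more fields in the real struct ... */
    } example_descr;

With the define set, the head of the struct stays what it used to be, so existing code is unchanged; with it unset, every descr carries a full PyTypeObject and can participate in the type machinery.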