-
-
Notifications
You must be signed in to change notification settings - Fork 11k
WIP, NEP: add dtype design NEP - DtypeMeta, Dtype, and hierarchy #12660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
doc/neps/nep-0029-dtype-as-type.rst
Outdated
is a class object. Instantiating that class object ``a.type(3)`` produces a | ||
Num`py `scalar <http://www.numpy.org/devdocs/reference/arrays.scalars.html>`_. | ||
|
||
This NEP proposes a class heirarchy for dtypes. The ``np.dtype`` class will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hierarchy spelled wrong (here and below), also abstrace->abstract
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
# subclass of IntDescr | ||
return f"dtype('{_kind_to_stem[self.kind]}{self.itemsize:d}')" | ||
|
||
def get_format_function(self, data, **options): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mhvk noted: Still feel this would logically be on dtype.type.format; maybe that can be the default in the upper class?
Or, perhaps better, just omit it altogether and focus on a method on a dtype that currently exists, overriding that in Color.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formatting exists on dtypes, it is based on a switch
statement to get the proper formatting function in arrayprint. I am proposing to make that a method-based lookup on the dtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the format function including the part where it determines, e.g., for integer the maximum size the string will be? I guess that is indeed logical to have on the dtype
. Still, for the purpose of the NEP, it seems too complicated; if the point is to be able to override a dtype, I feel it would be better to have an example that is closer to what a dtype normally does, like interpreting a bit of memory.
itemsize = 8 | ||
col 10000 ors = ['red', 'green', 'blue'] | ||
def get_format_function(self, data, **options): | ||
class Format(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mhvk noted: Instead of formatting, I'd pick something that actually exists, perhaps the interpretation of the memory (either setting or getting).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of a categorical dtype, as far as I can tell, is to give textual meaning to the values in the ndarray. The only place that has meaning is when printing them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably, should be able to set that type with a string as well? Indeed, even accessing it should give, I would think, an ColorType
scalar that is just an int
with a different __str__
(heck, scalars come back to haunt us immediately!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overriding any of the PyArray_Descr
functions or ufuncs requires C code. I was looking for an easy way to demonstrate a subclass, but it seems I only confused reviewers.
Simplified by removing the metaclass and extending |
# Dtype('int8') or Dtype('S10') or record descr | ||
return create_new_descr(cls, *args, **kwargs) | ||
|
||
class GenericDtype(Dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be no benefit to having GenericDtype
any more. I thought the argument before that you don't necessarily want to override what int
means, but want to be sure that itemsize
exists, was logical, so I would move some of the __new__
above here (or stick with the metaclass, it really wasn't bad and could be used to auto-fill methods based on itemsize
, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattip - a few more comments, though I must admit I'm confused. Mostly, I feel the examples are too abstract; is the goal actually to be able to subclass np.dtype
with the NEP work in place? How would one actually put in the cast functions?
Maybe most relevant: I think it should be closer to an abstract base class, where the presence of itemsize
only gets checked for the instance, not for the class.
pass | ||
|
||
class IntDtype(GenericDtype): | ||
def __repr__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This __repr__
could move to GenericDtype
, as it would be the same for float
, string
(and that gives it some purpose!)
from numpy.core.arrayprint import IntegerFormat | ||
return IntegerFormat(data) | ||
|
||
class UInt8Dtype(IntDtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again, it seems strange that if arr.dtype
is meant to be an instance, we are still creating a class here, with actual instantiation of the singleton doing nothing. It does seem one might then just as well just the class as the dtype
and keep instantiation in reserve for something more interesting...
Or, more in the spirit of the PR (though with the metaclass, as the check for itemsize
needs to come after initialization), could this be something like:
class UIntDtype(IntDtype):
kind = 'u'
def __init__(cls, itemsize):
self.itemsize = itemsize
self.minimum = 0
self.maximum = (1 << itemsize) - 1
uint8dtype = UIntDtype(itemsize=8)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
though with the metaclass, as the check for itemsize needs to come after initialization F438
Dtypes will be immutable, so I'd expect this to go through __new__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note with the issubclass(np.dtype, type)
approach from the outdate NEP, this last line could look like:
class uint8dtype(metaclass=UIntDtype, itemsize=8):
pass
import numpy as np | ||
|
||
class Dtype(): | ||
def __new__(cls, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more, the metaclass does in fact seem the best idea, but it should check for the presence of itemsize
on any instances not on the class, i.e., it should let type
do its work, and then check that itemsize
exists. This, of course, is just what ABCmeta
does...
In #12630 (comment), it was noted:
I think this goes to the crux of what I've been wondering about: how would this be done if I subclass Note that I think it is absolutely fine for the actual work to proceed in steps, i.e., keep using |
In the long run we certainly want to support defining new dtypes in either C or Python, so it makes sense for a spec or prototype to discuss both. But if the plan is to implement this in stages, then it makes sense for the first stage to only support subclassing in C, since that's where all the current dtypes are defined. Then stage 2 might be adding slot wrappers so that dtype methods are callable from python, and stage 3 would be adding the glue to actually define the methods in Python. |
If we are going with the approach that Dtype should be a metaclass, this PR should be closed in favor of #12630 |
Closing, it seems starting over will be easier than resuscitating this. Hopefully the discussion will have been useful as a starting point for a replacement. |
Replaces PR #12630 after comments there.