WIP,NEP: Create draft of DTypes NEP #14422
Very happy to see a NEP that tackles this topic!
It is too much to take in fully in a short time, but the main questions I have:

- Does the proposal consider fixing the casting behaviour for NumPy's default integer types around the edges (i.e. `np.iinfo('uint64').max` and `np.iinfo('int64').min`)? Cf. a whole set of issues.
- For that issue specifically, it would be amazing if the NEP allowed a way for users to choose their desired behaviour on crossing the respective int-boundary: either cast to an object array of Python ints (for those that really need integers), or cast to float (for those who need speed more than accuracy).
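For context on the boundary behaviour in question: current NumPy has no common integer dtype for mixed `uint64`/`int64` operations and promotes to `float64`, losing precision exactly at these edges. A quick sketch (checked against current NumPy behaviour, not code from the NEP):

```python
import numpy as np

# The integer boundaries in question:
assert np.iinfo(np.uint64).max == 2**64 - 1
assert np.iinfo(np.int64).min == -2**63

# uint64 and int64 share no common integer dtype, so NumPy promotes
# to float64, silently losing precision near the 64-bit boundary.
result = np.uint64(2**63) + np.int64(1)
print(result.dtype)              # float64
print(int(result) == 2**63 + 1)  # False: the sum was rounded
```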
PS. I'm sure you have the following issue on your radar, but I thought I'd xref for completeness: #2899
For non-flexible DTypes, the second step is trivial, since they have a canonical implementation (if there is only a single instance, that one should typically be used for backward compatibility). For flexible DTypes a second pass is needed; this is either an ``adjust_dtypes`` step within UFuncs, or ``__discover_descr_from_pyobject__`` when coercing within ``np.array``. For the latter, this generally means a second pass is necessary for flexible dtypes (although it may be possible to optimize that for common cases). In this case the ``__common_instance__`` method has to be used as well.

There is currently an open question whether ``adjust_dtypes`` may require the values in some cases. This is currently *not* strictly necessary (with the exception that ``objarr.astype("S")`` will use coercion rather than casting logic, a special case that needs to remain). It could be allowed by giving ``adjust_dtypes`` the input array in certain cases. For the moment it seems preferable to avoid this; if such a discovery step is required, it will require a helper function::
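The two-pass discovery described in the quoted passage is visible in today's string coercion: each element contributes a descriptor, and the result must be a common instance large enough for all of them. A minimal illustration (not code from the NEP):

```python
import numpy as np

# Each string contributes a "<U{n}" descriptor during coercion; the
# final dtype is the common instance able to hold the longest one.
arr = np.array(["a", "abcde"])
print(arr.dtype)  # <U5
```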
There's a good chance I may not have understood everything, but I believe the issues around `np.uint64` will need to use the values (i.e. greater than `np.iinfo('uint64').max` or not) to correctly set the dtype.
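The value dependence pointed out here already exists in array coercion: the same code path picks different dtypes depending on whether the value fits the signed range. A short check against current NumPy:

```python
import numpy as np

# The chosen dtype depends on the value, not just on its Python type:
print(np.array([2**63 - 1]).dtype)  # int64  (fits the signed range)
print(np.array([2**63]).dtype)      # uint64 (only fits unsigned)
```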
Thanks for the look; I have not updated this in a bit, will do now (and want to try to make the motivation a bit clearer, especially with respect to why certain choices make sense). About tackling integer coercion issues, I do not think we particularly need this NEP to do so, since it is something that happens while creating the array. Deprecating/changing the behaviour that …
Force-pushed from 079a328 to 2a4f413.
This is a first push from HackMD directly. For now as a markdown file, simply because HackMD does not seem to support rst really... [ci-skip]
Force-pushed from a111b55 to 9243ccd.
I am working on changing the current NEP to contain fewer technical things (with respect to actual technical decisions), trying to focus on a few fairly clear design decisions and a commitment from us to push this forward, so that we can hopefully accept it while avoiding most technical discussions. The plan is to follow up with more technical NEPs later on. This does not mean that this is easy or short, unfortunately... There is currently a start here; it may change drastically in the next week or two.
Closing in favor of gh-15505, and following.
Link to the current rendered version (rendering currently broken?) (Updated with 4 commits)
This is the beginning of a draft; some points may fluctuate as I am prototyping. However, the NEP includes the core concepts/decisions:

- Do we want DType classes, such that for `dtype = np.dtype("float64")`, `type(dtype).mro() == [numpy.Float64, numpy.dtype, object]`? (We would also have a DTypeMeta, but it seems better to mostly hide that as an implementation detail, since metaclasses are just confusing. I am not actually sure `numpy.dtype` has to show up in the `mro()`, but it would be the base class.)
- Do we want an AbstractDType (hierarchy), which should allow defining some special coercions, such as to "blasable"/"inexact"? It also provides `isinstance`/`issubclass` checking (through ABC-like overriding) and a hierarchy for UFunc dispatching.
- Use `CastingImpl` objects (to begin with very limited) for defining/storing casting functions:
  - `UFuncImpl`, so that it is python executable
  - (how e.g. `float64` is cast to a string of specific length)
  - `DTypeTransferFunction` (so that we can extend in the future and allow users to implement specialized/faster versions when we are ready).
- Use dynamically created AbstractDType classes to support value based casting. Users will need some access to this, but maybe do not need to be able to create value based casting themselves. (I do not particularly like this part, but it works; I expect some of the indirection here and elsewhere may cause slight slowdowns, but I doubt they are serious and we should have some new opportunities for optimization.)
A starting branch can be found at seberg#6 and includes a big chunk of the work for DTypes (albeit in an early prototype state which will need a lot of cleanup). UFunc dispatching is up next (it may need a bit to get to a testable state).
[ci-skip]