8000 WIP,NEP: Create draft of DTypes NEP by seberg · Pull Request #14422 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

WIP,NEP: Create draft of DTypes NEP #14422

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Conversation

seberg
Copy link
Member
@seberg seberg commented Sep 4, 2019

Link to the current rendered version (rendering currently broken?) (Updated with 4 commits)


This is the beginning for a draft, some points may fluctuate as I am prototyping. However, the NEP includes the core concepts/decisions:

  1. Do we want DType classes, such that for dtype = np.dtype("float64"), type(dtype).mro() == [numpy.Float64, numpy.dtype, object]. (We would also have a DTypeMeta, but it seems better to hide that as an implementation detail mostly, since metaclass are just confusing. I am not actually sure numpy.dtype has to show up in the mro(), but it would be the base class).

  2. Do we want an AbstractDType (hierarchy), which should allow to define some special coercions, such as to "blasable"/"inexact". It also provides isinstance/issubclass checking (through ABC like overriding) and a hierarchy for UFunc dispatching.

  3. Use CastingImpl objects (to begin with very limited) for defining/storing casting functions.

    • In the future, similar to/subclass of UFuncImpl, so that it is python executable
    • Provides some information (such as float64 is cast to a string of specific length)
    • Has private slot to ask for the DTypeTransferFunction (so that we can extend in the future and allow users to implement specialized/faster versions when we are ready).
  4. Use dynamically created AbstractDType classes to support value based casting. Users will need some access to this, but maybe do not need to be able to create value based casting themselves. (I do not particularly like this part, but it works; I expect some of the indirection here and elsewhere may cause slight slowdowns, but I doubt they are serious and we should have some new opportunities for optimization).


A starting branch can be found at in seberg#6 and includes a chunk big chunk of the work for DTypes (albeit in an early prototype state which will need a lot of cleanup). UFunc dispatching are up next (may need a bit to get to a testable state).

[ci-skip]

Copy link
Contributor
@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very happy to see a NEP that tackles this topic!

It was too much to take in a short time right now, but the main questions I have:

  • does the proposal consider to fix the casting behaviour for numpy default integer types around the edges (i.e. np.iinfo('uint64').max and np.iinfo('int64').min? Cf. a whole set of issues
  • For that issue specifically, it would be amazing if the NEP allowed a way for users to choose their desired behaviour on crossing that respective int-boundary - either cast to an object array of python ints (for those that really need integers), or cast to float (for those who need speed more than accuracy).

PS. I'm sure you have the following issue on your radar, but I thought I'd xref for completeness: #2899


For non-flexible DTypes, the second step is trivial, since they have a canonical implementation (if there is only a single instance, that one should be typically used for backward compatibility though). For flexible DTypes a second pass is needed, this is either an ``adjust_dtypes`` step within UFuncs, or ``__discover_descr_from_pyobject__`` when coercing within ``np.array``. For the latter, this generally means a second pass is necessary for flexible dtypes (although it may be possible to optimize that for common cases). In this case the ``__common_instance__`` method has to be used as well.

There is currently an open question whether ``adjust_dtypes`` may require the values in some cases. This is currently *not* strictly necessary (with the exception that ``objarr.astype("S")`` will use coercion rather than casting logic, a special case that needs to remain). It could be allowed by giving ``adjust_dtypes`` the input array in certain cases. For the moment it seems preferable to avoid this, if such a disovery step is required, it will require a helper function::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a good chance I may not have understood everything, but I believe the issues around np.uint64 will need to use the values (i.e. greater than np.iinfo('uint64').max or not) to correctly set the dtype.

@h-vetinari
Copy link
Contributor

PPS. Xref-ing some previous NEP PRs and related issues so that people happening upon those threads can find this PR more easily: #2899 #12585 #12630 #12660

@seberg
Copy link
Member Author
seberg commented Sep 20, 2019

Thanks for the look, I did not update this in a bit, will do now (and want to try to make the motivation a bit clearer (especially with respect to why certain choices make sense). About tackling integer coercion issues, I do not think we particularly need this NEP to do so, since it is something that happens while creating the array. Deprecating/changing the behaviour that 1 + np.array([1, 2], np.int16) inspects the value of 1 (and worse corners), should get easier to tackle though.

@seberg seberg changed the title WIP,NEP: Create early draft of DTypes NEP WIP,NEP: Createdraft of DTypes NEP Sep 27, 2019
@seberg seberg mentioned this pull request Jan 7, 2020
2 tasks
hackmd-deploy and others added 4 commits January 8, 2020 16:10
This is a first push from HackMD directly. For now as a markdown file, simply because HackMD does not seem to support rst really...

[ci-skip]
@seberg
Copy link
Member Author
seberg commented Jan 9, 2020

I am working on changing the current NEP to contain less technical things (with respect to actual techincal decisions). Trying to focus on a few fairly clear design decisions and a commitment of us to push this forward. So that we can hopefully accept this while avoiding most technical discussions.

The plan will be to followup with more more technical NEPs later on. This does not mean that this is easy or short unfortunately...

There is currently a start here, it may change drastically in the next week or two.

@seberg seberg changed the title WIP,NEP: Createdraft of DTypes NEP WIP,NEP: Create draft of DTypes NEP Jan 15, 2020
@seberg
Copy link
Member Author
seberg commented Feb 4, 2020

Clsoing in favor of gh-15505, and following.

@seberg seberg closed this Feb 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0