8000 WIP, NEP: add dtype design NEP - DtypeMeta, Dtype, and hierarchy by mattip · Pull Request #12660 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

WIP, NEP: add dtype design NEP - DtypeMeta, Dtype, and hierarchy #12660

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

mattip
Copy link
Member
@mattip mattip commented Jan 4, 2019

Replaces PR #12630 after comments there.

is a class object. Instantiating that class object ``a.type(3)`` produces a
Num`py `scalar <http://www.numpy.org/devdocs/reference/arrays.scalars.html>`_.

This NEP proposes a class heirarchy for dtypes. The ``np.dtype`` class will
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hierarchy spelled wrong (here and below), also abstrace->abstract

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

# subclass of IntDescr
return f"dtype('{_kind_to_stem[self.kind]}{self.itemsize:d}')"

def get_format_function(self, data, **options):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mhvk noted: Still feel this would logically be on dtype.type.format; maybe that can be the default in the upper class?

Or, perhaps better, just omit it altogether and focus on a method on a dtype that currently exists, overriding that in Color.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting exists on dtypes, it is based on a switch statement to get the proper formatting function in arrayprint. I am proposing to make that a method-based lookup on the dtype.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the format function including the part where it determines, e.g., for integer the maximum size the string will be? I guess that is indeed logical to have on the dtype. Still, for the purpose of the NEP, it seems too complicated; if the point is to be able to override a dtype, I feel it would be better to have an example that is closer to what a dtype normally does, like interpreting a bit of memory.

itemsize = 8
col 10000 ors = ['red', 'green', 'blue']
def get_format_function(self, data, **options):
class Format():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mhvk noted: Instead of formatting, I'd pick something that actually exists, perhaps the interpretation of the memory (either setting or getting).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of a categorical dtype, as far as I can tell, is to give textual meaning to the values in the ndarray. The only place that has meaning is when printing them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably, should be able to set that type with a string as well? Indeed, even accessing it should give, I would think, an ColorType scalar that is just an int with a different __str__ (heck, scalars come back to haunt us immediately!).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overriding any of the PyArray_Descr functions or ufuncs requires C code. I was looking for an easy way to demonstrate a subclass, but it seems I only confused reviewers.

@mattip mattip changed the title NEP: add dtype design NEP - DtypeMeta, Dtype, and hierarchy WIP, NEP: add dtype design NEP - DtypeMeta, Dtype, and hierarchy Jan 4, 2019
@mattip
Copy link
Member Author
mattip commented Jan 4, 2019

Simplified by removing the metaclass and extending Dtype.__new__. Fwiw, arrayformatnow works by method lookup on the dtype.

# Dtype('int8') or Dtype('S10') or record descr
return create_new_descr(cls, *args, **kwargs)

class GenericDtype(Dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be no benefit to having GenericDtype any more. I thought the argument before that you don't necessarily want to override what int means, but want to be sure that itemsize exists, was logical, so I would move some of the __new__ above here (or stick with the metaclass, it really wasn't bad and could be used to auto-fill methods based on itemsize, etc.)

Copy link
Contributor
@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattip - a few more comments, though I must admit I'm confused. Mostly, I feel the examples are too abstract; is the goal actually to be able to subclass np.dtype with the NEP work in place? How would one actually put in the cast functions?

Maybe most relevant: I think it should be closer to an abstract base class, where the presence of itemsize only gets checked for the instance, not for the class.

pass

class IntDtype(GenericDtype):
def __repr__(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This __repr__ could move to GenericDtype, as it would be the same for float, string (and that gives it some purpose!)

from numpy.core.arrayprint import IntegerFormat
return IntegerFormat(data)

class UInt8Dtype(IntDtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again, it seems strange that if arr.dtype is meant to be an instance, we are still creating a class here, with actual instantiation of the singleton doing nothing. It does seem one might then just as well just the class as the dtype and keep instantiation in reserve for something more interesting...

Or, more in the spirit of the PR (though with the metaclass, as the check for itemsize needs to come after initialization), could this be something like:

class UIntDtype(IntDtype):
    kind = 'u'
    def __init__(cls, itemsize):
        self.itemsize = itemsize
        self.minimum = 0
        self.maximum = (1 << itemsize) - 1

uint8dtype = UIntDtype(itemsize=8)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though with the metaclass, as the check for itemsize needs to come after initialization F438

Dtypes will be immutable, so I'd expect this to go through __new__

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note with the issubclass(np.dtype, type) approach from the outdate NEP, this last line could look like:

class uint8dtype(metaclass=UIntDtype, itemsize=8):
    pass

import numpy as np

class Dtype():
def __new__(cls, *args, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more, the metaclass does in fact seem the best idea, but it should check for the presence of itemsize on any instances not on the class, i.e., it should let type do its work, and then check that itemsize exists. This, of course, is just what ABCmeta does...

@mhvk
Copy link
Contributor
mhvk commented Jan 4, 2019

In #12630 (comment), it was noted:

Right now dtype objects already have this: PyArray_ArrFuncs->setitem converts a Python object to raw memory, and PyArray_ArrFuncs->getitem goes the other way. These are absolutely something we should have as (slot) methods on new-style dtypes

I think this goes to the crux of what I've been wondering about: how would this be done if I subclass dtype?

Note that I think it is absolutely fine for the actual work to proceed in steps, i.e., keep using ArrFuncs for now, but I think if subclassing is a goal, the NEP should address how those can make that work. Here, it would be particularly useful to have a more concrete example may be good: How would one really do your Plant, i.e., do actual setting/getting (based on enum?). But perhaps it is better to do something a bit simpler and closer to binary, e.g., implementing something that uses struct.pack/unpack (better if it is for something that is not a dtype already, which leaves the pascal string format 'p'...). Or a 1-bit format that uses np.unpackbits?

@njsmith
Copy link
Member
njsmith commented Jan 4, 2019

In the long run we certainly want to support defining new dtypes in either C or Python, so it makes sense for a spec or prototype to discuss both.

But if the plan is to implement this in stages, then it makes sense for the first stage to only support subclassing in C, since that's where all the current dtypes are defined. Then stage 2 might be adding slot wrappers so that dtype methods are callable from python, and stage 3 would be adding the glue to actually define the methods in Python.

@mattip
Copy link
Member Author
mattip commented Jan 31, 2019

If we are going with the approach that Dtype should be a metaclass, this PR should be closed in favor of #12630

@mattip
Copy link
Member Author
mattip commented Aug 19, 2019

Closing, it seems starting over will be easier than resuscitating this. Hopefully the discussion will have been useful as a starting point for a replacement.

@mattip mattip closed this Aug 19, 2019
@mattip mattip deleted the dtype-nep3 branch June 8, 2020 06:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0