fixes from review and rework without metaclass · numpy/numpy@a2d839f
Commit a2d839f

fixes from review and rework without metaclass
1 parent e1badcb commit a2d839f

1 file changed

+83
-66
lines changed

doc/neps/nep-0029-dtype-as-type.rst

Lines changed: 83 additions & 66 deletions
@@ -23,11 +23,11 @@ in a descriptor, which is a python object of type ``dtype``.
2323

2424
The ``dtype`` object instance ``a`` has attributes, among them ``a.type``, which
2525
is a class object. Instantiating that class object ``a.type(3)`` produces a
26-
Num`py `scalar <http://www.numpy.org/devdocs/reference/arrays.scalars.html>`_.
26+
Numpy `scalar <http://www.numpy.org/devdocs/reference/arrays.scalars.html>`_.
2727

28-
This NEP proposes a class heirarchy for dtypes. The ``np.dtype`` class will
29-
become an abstrace base class, and a number of new classes will be created with
30-
a heirarchy like scalars. They will support subclassing. A future NEP may
28+
This NEP proposes a class hierarchy for dtypes. The ``np.dtype`` class will
29+
become an abstract base class, and a number of new classes will be created with
30+
a hierarchy like scalars. They will support subclassing. A future NEP may
3131
propose unifying the scalar and dtype type systems, but that is not a goal of
3232
this NEP. The changed dtype will:
3333

@@ -46,63 +46,59 @@ this NEP. The changed dtype will:
4646
Overall Design
4747
--------------
4848

49-
The ``Dtype`` class (and any subclass without ``itemsize``) is by definition an
50-
abstract base class. A metaclass ``DtypeMeta`` is used to add slots for
51-
converting memory chunks into objects and back, to provide datatype-specific
52-
functionality, and to provide the casting functions to convert data.
49+
The ``Dtype`` class (and any subclass without ``itemsize``) is effectively an
50+
abstract base class, as it cannot be used to create instances. A class
51+
hierarchy is used to add datatype-specific functionality such as ``names`` and
52+
``fields`` for structured dtypes. The current behaviours are preserved:
53+
54+
- ``np.dtype(obj, align=False, copy=False)`` calls ``arraydescr_new`` with
55+
various types of ``obj``:
56+
- ``int``, ``np.generic`` scalar classes, list or dict are parsed into
57+
appropriate dtypes
58+
- singletons are returned where appropriate
59+
60+
Additionally, dtype subclasses are passed through to the subclass ``__new__``
5361

5462
A prototype without error checking, without options handling, and describing
5563
only ``np.dtype(np.uint8)`` together with an overridable ``get_format_function``
5664
for ``arrayprint`` looks like::
5765

5866
import numpy as np
5967

60-
class DtypeMeta(type):
61-
# Add slot methods to the base Dtype to handle low-level memory
62-
# conversion to/from char[itemsize] to int/float/utf8/whatever
63-
# In cython this would look something like
64-
#cdef int (*unbox_method)(PyObject* self, PyObject* source, char* dest)
65-
#cdef PyObject* (*box_method)(PyObject* self, char* source)
66-
67-
def __call__(cls, *args, **kwargs):
68+
class Dtype():
69+
def __new__(cls, *args, **kwargs):
70+
if len(args) == 0:
71+
# Do not allow creating instances of abstract base classes
72+
if not hasattr(cls, 'itemsize'):
73+
raise ValueError("cannot create instances of "
74+
f"abstract class {cls!r}")
75+
return super().__new__(cls, *args, **kwargs)
6876
# This is reached for Dtype(np.uint8, ...).
6977
# Does not yet handle align, copy positional arguments
70-
if len(args) > 0:
71-
obj = args[0]
72-
if isinstance(obj, int):
73-
return dtype_int_dict[obj]
74-
elif isinstance(obj, type) and issubclass(obj, np.generic):
75-
return dtype_scalar_dict[obj]
76-
else:
77-
# Dtype('int8') or Dtype('S10') or record descr
78-
return create_new_descr(cls, *args, **kwargs)
78+
obj = args[0]
79+
if isinstance(obj, int):
80+
return dtype_int_dict[obj]
81+
elif isinstance(obj, type) and issubclass(obj, np.generic):
82+
return dtype_scalar_dict[obj]
7983
else:
80-
# At import, when creating Dtype and subclasses
81-
return type.__call__(cls, *args, **kwargs)
84+
# Dtype('int8') or Dtype('S10') or record descr
85+
return create_new_descr(cls, *args, **kwargs)
8286

83-
class Dtype():
84-
def __new__(cls, *args, **kwargs):
85-
# Do not allow creating instances of abstract base classes
86-
if not hasattr(cls, 'itemsize'):
87-
raise ValueError("cannot create instances of "
88-
f"abstract class {cls!r}")
89-
return super().__new__(cls, *args, **kwargs)
90-
91-
class GenericDescr(Dtype, metaclass=DtypeMeta):
87+
class GenericDtype(Dtype):
9288
pass
9389

94-
class IntDescr(GenericDescr):
90+
class IntDtype(GenericDtype):
9591
def __repr__(self):
9692
# subclass of IntDtype
9793
return f"dtype('{_kind_to_stem[self.kind]}{self.itemsize:d}')"
9894

9995
def get_format_function(self, data, **options):
10096
# replaces switch on dtype found in _get_format_function
10197
# (in arrayprint), **options details missing
102-
from np.core.arrayprint import IntegerFormat
98+
from numpy.core.arrayprint import IntegerFormat
10399
return IntegerFormat(data)
104100

105-
class UInt8Descr(IntDescr):
101+
class UInt8Dtype(IntDtype):
106102
kind = 'u'
107103
itemsize = 8
108104
type = np.uint8
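A minimal runnable sketch of the metaclass-free dispatch in the hunk above, under stated assumptions: ``FakeUInt8`` stands in for ``np.uint8`` so the block runs without NumPy, ``_by_scalar`` is a hypothetical registry playing the role of ``dtype_scalar_dict``, and ``itemsize`` is given in bytes. It is not the NEP's implementation, only an illustration of how ``__new__`` alone could reject abstract classes and hand back singletons.

```python
class FakeUInt8(int):
    """Stand-in for np.uint8 (assumption, keeps the sketch NumPy-free)."""


class Dtype:
    # Hypothetical registry; plays the role of dtype_scalar_dict in the NEP.
    _by_scalar = {}

    def __new__(cls, *args, **kwargs):
        if len(args) == 0:
            # No arguments: direct instantiation of a concrete subclass.
            # Classes without ``itemsize`` are treated as abstract.
            if not hasattr(cls, 'itemsize'):
                raise ValueError(
                    f"cannot create instances of abstract class {cls!r}")
            return super().__new__(cls)
        # With an argument: look up the pre-built singleton.
        obj = args[0]
        if isinstance(obj, type) and obj in Dtype._by_scalar:
            return Dtype._by_scalar[obj]
        raise TypeError(f"cannot interpret {obj!r} as a dtype")


class IntDtype(Dtype):
    pass


class UInt8Dtype(IntDtype):
    kind = 'u'
    itemsize = 1  # bytes
    type = FakeUInt8


# Filled once, as the NEP describes for NumPy startup.
Dtype._by_scalar[FakeUInt8] = UInt8Dtype()

a = Dtype(FakeUInt8)
b = Dtype(FakeUInt8)
print(a is b)               # the same singleton is returned each time
print(type(a).__name__)
```

Because neither class defines ``__init__``, returning an existing instance from ``__new__`` does not trigger an argument error when Python subsequently calls ``__init__`` on it.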
@@ -111,34 +107,24 @@ for ``arrayprint`` looks like::
111107
#ArrFuncs = int8_arrayfuncs
112108
113109

114-
dtype_int_dict = {1: UInt8Descr()}
115-
dtype_scalar_dict = {np.uint8: UInt8Descr()}
110+
dtype_int_dict = {1: UInt8Dtype()}
111+
dtype_scalar_dict = {np.uint8: dtype_int_dict[1]}
116112
_kind_to_stem = {
117113
'u': 'uint',
118114
'i': 'int',
119-
'c': 'complex',
120-
'f': 'float',
121-
'b': 'bool',
122-
'V': 'void',
123-
'O': 'object',
124-
'M': 'datetime',
125-
'm': 'timedelta',
126-
'S': 'bytes',
127-
'U': 'str',
128115
}
129116

117+
130118
At NumPy startup, as we do today, we would generate the builtin set of
131119
descriptor instances, and fill in ``dtype_int_dict`` and ``dtype_scalar_dict``
132-
so that the built-in descriptors would continue to be singletons. ``Void``,
133-
``Byte`` and ``Unicode`` descriptors would be constructed on demand, as is done
134-
today. The magic that returns a singleton or a new descriptor happens in
135-
``DtypeMeta.__call__``.
120+
so that the built-in descriptors would continue to be singletons. Some
121+
descriptors would be constructed on demand, as is done today.
136122

137123
All descriptors would inherit from ``Dtype``::
138124

139125
>>> a = np.dtype(np.uint8)
140126
>>> type(a).mro()
141-
[<class 'UInt8Descr'>, <class 'IntDescr'>, <class 'GenericDescr'>,
127+
[<class 'UInt8Dtype'>, <class 'IntDtype'>, <class 'GenericDtype'>,
142128
<class 'Dtype'>, <class 'object'>]
143129

144130
>>> isinstance(a, np.dtype)
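The change to the two lookup tables in the hunk above (the scalar table now reuses ``dtype_int_dict[1]`` instead of constructing a second instance) is what keeps the built-in descriptor a singleton. A small sketch of the difference, with ``FakeUInt8`` as a hypothetical stand-in for ``np.uint8`` and a bare ``UInt8Dtype`` class that carries no NEP logic:

```python
class FakeUInt8(int):
    """Stand-in for np.uint8 (assumption, keeps the sketch NumPy-free)."""


class UInt8Dtype:
    kind = 'u'
    itemsize = 1  # bytes


# Before: each table builds its own instance, so identity checks fail.
old_int_dict = {1: UInt8Dtype()}
old_scalar_dict = {FakeUInt8: UInt8Dtype()}

# After: the scalar table points at the instance already in the int table.
new_int_dict = {1: UInt8Dtype()}
new_scalar_dict = {FakeUInt8: new_int_dict[1]}

print(old_int_dict[1] is old_scalar_dict[FakeUInt8])   # False
print(new_int_dict[1] is new_scalar_dict[FakeUInt8])   # True
```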
@@ -150,38 +136,65 @@ Note that the ``repr`` of ``a`` is compatibility with NumPy::
150136
"dtype('uint8')"
151137

152138
Each class will have its own set of ArrFuncs (``clip``, ``fill``,
153-
``cast``).
139+
``cast``) and attributes appropriate to that class.
154140

155141
Downstream users of NumPy can subclass these type classes. Creating a categorical
156142
dtype would look like this (without error checking for out-of-bounds values)::
157143

158-
class Colors(Dtype):
144+
class Plant(Dtype):
159145
itemsize = 8
160-
colors = ['red', 'green', 'blue']
146+
names = ['tree', 'flower', 'grass']
161147
def get_format_function(self, data, **options):
162148
class Format():
163149
def __init__(self, data):
164150
pass
165151
def __call__(self, x):
166-
return self.colors[x]
152+
return Plant.names[x]
167153
return Format(data)
168154

169-
c = np.array([0, 1, 1, 0, 2], dtype=Colors)
155+
c = np.array([0, 1, 1, 0, 2], dtype=Plant)
170156

171157
Additional code would be needed to neutralize the slot functions.
172158

173-
There is a level of indirection between ``Dtype`` and ``IntDescr`` so that
174-
downstream users could create their own duck-descriptors that do not use
175-
``DtypeMeta.__call__`` at all, but could still answer ``True`` to
176-
``isintance(mydtype, Dtype)``.
159+
The overall hierarchy is meant to map to the scalar hierarchy.
160+
161+
Now ``arrayprint`` would look something like this (very much simplified, the
162+
actual format details are not the point)::
163+
164+
def arrayformat(data, dtype):
165+
formatter = dtype.get_format_function(data)
166+
result = []
167+
for v in data:
168+
result.append(formatter(v))
169+
return 'array[' + ', '.join(result) + ']'
170+
171+
def arrayprint(data):
172+
print(arrayformat(data, data.dtype))
173+
174+
a = np.array([0, 1, 2, 0, 1, 2], dtype='uint8')
175+
176+
# Create a dtype instance, returns a singleton from dtype_scalar_dict
177+
uint8 = Dtype(np.uint8)
178+
179+
# Create a user-defined dtype
180+
garden = Plant()
181+
182+
# We cannot use ``arrayprint`` just yet, but ``arrayformat`` works
183+
print(arrayformat(a, uint8))
184+
185+
array[0, 1, 2, 0, 1, 2]
186+
187+
print(arrayformat(a, garden))
188+
189+
array[tree, flower, grass, tree, flower, grass]
177190

178191
Advantages
179192
==========
180193

181194
It is very difficult today to override dtype behaviour. Internally
182195
descriptor objects are all instances of a generic dtype class and internally
183196
behave as containers more than classes with method overrides. Giving them a
184-
class heirarchy with overrideable methods will reduce explicit branching in
197+
class hierarchy with overrideable methods will reduce explicit branching in
185198
code (at the expense of a dictionary lookup) and allow downstream users to
186199
more easily define new dtypes. We could re-examine interoperability with
187200
pandas_ typesystem.
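The ``arrayformat`` example in the hunk above can be tried without NumPy. The sketch below is an assumption-laden reduction: plain lists replace arrays, ``IntDtype`` formats with ``str`` instead of ``IntegerFormat``, and ``Plant`` returns a closure rather than a nested ``Format`` class. It only illustrates the point of the Advantages paragraph, that formatting is selected by method override rather than by branching on the dtype.

```python
class Dtype:
    def get_format_function(self, data, **options):
        raise NotImplementedError


class IntDtype(Dtype):
    def get_format_function(self, data, **options):
        # Simplification: the NEP prototype returns IntegerFormat(data).
        return str


class Plant(Dtype):
    itemsize = 8
    names = ['tree', 'flower', 'grass']

    def get_format_function(self, data, **options):
        names = self.names
        # Map each stored integer code to its category label.
        return lambda x: names[x]


def arrayformat(data, dtype):
    # No switch on dtype here: the dtype supplies its own formatter.
    formatter = dtype.get_format_function(data)
    return 'array[' + ', '.join(formatter(v) for v in data) + ']'


data = [0, 1, 2, 0, 1, 2]
print(arrayformat(data, IntDtype()))   # array[0, 1, 2, 0, 1, 2]
print(arrayformat(data, Plant()))      # array[tree, flower, grass, tree, flower, grass]
```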
@@ -195,6 +208,10 @@ should continue with `PR 12284`_ to vendor our own numpy.pxd in order to make th
195208
transition less painful. We should not break working dtype-subclasses like
196209
`quaterions`_.
197210

211+
Code that depends on all dtypes having similar attributes might break. For
212+
instance there is no reason ``int`` dtypes need the ``names`` and ``fields``
213+
empty attributes.
214+
198215
Future Extensions
199216
=================
200217

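The compatibility concern added in the hunk above (code that assumes every dtype has ``names`` and ``fields``) could be handled downstream with a defaulted attribute lookup. A hedged sketch follows; ``IntDtype``, ``StructuredDtype`` and ``field_names`` are hypothetical names used only for illustration, not part of the NEP or of NumPy.

```python
class IntDtype:
    """Hypothetical integer dtype that no longer carries names/fields."""
    kind = 'i'
    itemsize = 8


class StructuredDtype:
    """Hypothetical structured dtype; only this class defines names/fields."""

    def __init__(self, fields):
        self.fields = dict(fields)
        self.names = tuple(self.fields)


def field_names(dtype):
    # getattr with a default covers both cases: the attribute is missing
    # (proposed hierarchy) or present but None (plain dtypes in NumPy today).
    names = getattr(dtype, 'names', None)
    return () if names is None else tuple(names)


print(field_names(IntDtype()))                               # ()
print(field_names(StructuredDtype({'x': 'f8', 'y': 'i4'})))  # ('x', 'y')
```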
@@ -204,7 +221,7 @@ This would make the descriptor more like the ``int`` or ``float`` type. However
204221
allowing instantiating scalars from descriptors is not a goal of this NEP.
205222

206223
A further extension would be to refactor ``numpy.datetime64`` to use the new
207-
heirarchy.
224+
hierarchy.
208225

209226
Appendix
210227
========
