@@ -23,11 +23,11 @@ in a descriptor, which is a python object of type ``dtype``.

 The ``dtype`` object instance ``a`` has attributes, among them ``a.type``, which
 is a class object. Instantiating that class object ``a.type(3)`` produces a
-Num`py scalar <http://www.numpy.org/devdocs/reference/arrays.scalars.html>`_.
+Numpy `scalar <http://www.numpy.org/devdocs/reference/arrays.scalars.html>`_.

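As a quick illustration of that relationship in today's NumPy (a minimal sketch,
not part of the proposed change)::

    import numpy as np

    a = np.dtype(np.uint8)   # ``a`` is a descriptor instance
    scalar_cls = a.type      # the attached scalar class, ``numpy.uint8``
    x = scalar_cls(3)        # instantiating the scalar class gives a NumPy scalar
    print(x, type(x))        # 3 <class 'numpy.uint8'>
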
-This NEP proposes a class heirarchy for dtypes. The ``np.dtype`` class will
-become an abstrace base class, and a number of new classes will be created with
-a heirarchy like scalars. They will support subclassing. A future NEP may
+This NEP proposes a class hierarchy for dtypes. The ``np.dtype`` class will
+become an abstract base class, and a number of new classes will be created with
+a hierarchy like scalars. They will support subclassing. A future NEP may
 propose unifying the scalar and dtype type systems, but that is not a goal of
 this NEP. The changed dtype will:
@@ -46,63 +46,59 @@ this NEP. The changed dtype will:

 Overall Design
 --------------

-The ``Dtype`` class (and any subclass without ``itemsize``) is by definition an
-abstract base class. A metaclass ``DtypeMeta`` is used to add slots for
-converting memory chunks into objects and back, to provide datatype-specific
-functionality, and to provide the casting functions to convert data.
+The ``Dtype`` class (and any subclass without ``itemsize``) is effectively an
+abstract base class, as it cannot be used to create instances. A class
+hierarchy is used to add datatype-specific functionality such as ``names`` and
+``fields`` for structured dtypes. The current behaviours are preserved (a short
+illustration follows the list):
+
+- ``np.dtype(obj, align=False, copy=False)`` calls ``arraydescr_new`` with
+  various types of ``obj``:
+
+  - ``int``, ``np.generic`` scalar classes, lists or dicts are parsed into
+    appropriate dtypes
+  - singletons are returned where appropriate
+
+Additionally, dtype subclasses are passed through to the subclass ``__new__``.

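A short sketch of those existing behaviours, using only today's public API::

    import numpy as np

    print(np.dtype(np.uint8))             # from a scalar class -> dtype('uint8')
    print(np.dtype('int8'))               # from a string       -> dtype('int8')
    print(np.dtype([('a', np.float64)]))  # from a list         -> a structured dtype
    print(np.dtype({'names': ['a'], 'formats': [np.float64]}))   # from a dict

    # Builtin descriptors are singletons today
    print(np.dtype('int8') is np.dtype('int8'))   # True
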
 A prototype without error checking, without options handling, and describing
 only ``np.dtype(np.uint8)`` together with an overridable ``get_format_function``
 for ``arrayprint`` looks like::

     import numpy as np

-    class DtypeMeta(type):
-        # Add slot methods to the base Dtype to handle low-level memory
-        # conversion to/from char[itemsize] to int/float/utf8/whatever
-        # In cython this would look something like
-        #cdef int (*unbox_method)(PyObject* self, PyObject* source, char* dest)
-        #cdef PyObject* (*box_method)(PyObject* self, char* source)
-
-        def __call__(cls, *args, **kwargs):
+    class Dtype():
+        def __new__(cls, *args, **kwargs):
+            if len(args) == 0:
+                # Do not allow creating instances of abstract base classes
+                if not hasattr(cls, 'itemsize'):
+                    raise ValueError("cannot create instances of "
+                                     f"abstract class {cls!r}")
+                return super().__new__(cls, *args, **kwargs)

             # This is reached for Dtype(np.uint8, ...).
             # Does not yet handle align, copy positional arguments
-            if len(args) > 0:
-                obj = args[0]
-                if isinstance(obj, int):
-                    return dtype_int_dict[obj]
-                elif isinstance(obj, type) and issubclass(obj, np.generic):
-                    return dtype_scalar_dict[obj]
-                else:
-                    # Dtype('int8') or Dtype('S10') or record descr
-                    return create_new_descr(cls, *args, **kwargs)
+            obj = args[0]
+            if isinstance(obj, int):
+                return dtype_int_dict[obj]
+            elif isinstance(obj, type) and issubclass(obj, np.generic):
+                return dtype_scalar_dict[obj]
             else:
-                # At import, when creating Dtype and subclasses
-                return type.__call__(cls, *args, **kwargs)
+                # Dtype('int8') or Dtype('S10') or record descr
+                return create_new_descr(cls, *args, **kwargs)

-    class Dtype():
-        def __new__(cls, *args, **kwargs):
-            # Do not allow creating instances of abstract base classes
-            if not hasattr(cls, 'itemsize'):
-                raise ValueError("cannot create instances of "
-                                 f"abstract class {cls!r}")
-            return super().__new__(cls, *args, **kwargs)
-
-    class GenericDescr(Dtype, metaclass=DtypeMeta):
+    class GenericDtype(Dtype):
         pass

-    class IntDescr(GenericDescr):
+    class IntDtype(GenericDtype):
         def __repr__(self):
             # subclass of IntDescr
             return f"dtype('{_kind_to_stem[self.kind]}{self.itemsize:d}')"

         def get_format_function(self, data, **options):
             # replaces switch on dtype found in _get_format_function
             # (in arrayprint), **options details missing
-            from np.core.arrayprint import IntegerFormat
+            from numpy.core.arrayprint import IntegerFormat
             return IntegerFormat(data)

-    class UInt8Descr(IntDescr):
+    class UInt8Dtype(IntDtype):
         kind = 'u'
         itemsize = 8
         type = np.uint8
@@ -111,34 +107,24 @@ for ``arrayprint`` looks like::
         #ArrFuncs = int8_arrayfuncs


-    dtype_int_dict = {1: UInt8Descr()}
-    dtype_scalar_dict = {np.uint8: UInt8Descr()}
+    dtype_int_dict = {1: UInt8Dtype()}
+    dtype_scalar_dict = {np.uint8: dtype_int_dict[1]}
     _kind_to_stem = {
         'u': 'uint',
         'i': 'int',
-        'c': 'complex',
-        'f': 'float',
-        'b': 'bool',
-        'V': 'void',
-        'O': 'object',
-        'M': 'datetime',
-        'm': 'timedelta',
-        'S': 'bytes',
-        'U': 'str',
     }

+
 At NumPy startup, as we do today, we would generate the builtin set of
 descriptor instances, and fill in ``dtype_int_dict`` and ``dtype_scalar_dict``
-so that the built-in descriptors would continue to be singletons. ``Void``,
-``Byte`` and ``Unicode`` descriptors would be constructed on demand, as is done
-today. The magic that returns a singleton or a new descriptor happens in
-``DtypeMeta.__call__``.
+so that the built-in descriptors would continue to be singletons. Some
+descriptors would be constructed on demand, as is done today.

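Under the prototype above, that singleton behaviour would look roughly like this
(a sketch assuming the classes defined earlier)::

    # Both lookups resolve to the same pre-built UInt8Dtype instance
    assert Dtype(np.uint8) is Dtype(1)

    # Abstract classes (anything without ``itemsize``) refuse to instantiate
    try:
        Dtype()
    except ValueError as exc:
        print(exc)   # "cannot create instances of abstract class ..."
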
 All descriptors would inherit from ``Dtype``::

     >>> a = np.dtype(np.uint8)
     >>> type(a).mro()
-    [<class 'UInt8Descr'>, <class 'IntDescr'>, <class 'GenericDescr'>,
+    [<class 'UInt8Dtype'>, <class 'IntDtype'>, <class 'GenericDtype'>,
      <class 'Dtype'>, <class 'object'>]

     >>> isinstance(a, np.dtype)
@@ -150,38 +136,65 @@ Note that the ``repr`` of ``a`` is compatible with NumPy::

     "dtype('uint8')"

 Each class will have its own set of ArrFuncs (``clip``, ``fill``,
-``cast``).
+``cast``) and attributes appropriate to that class.

 Downstream users of NumPy can subclass these type classes. Creating a categorical
 dtype would look like this (without error checking for out-of-bounds values)::

-    class Colors(Dtype):
+    class Plant(Dtype):
         itemsize = 8
-        colors = ['red', 'green', 'blue']
+        names = ['tree', 'flower', 'grass']
         def get_format_function(self, data, **options):
             class Format():
                 def __init__(self, data):
                     pass
                 def __call__(self, x):
-                    return self.colors[x]
+                    return Plant.names[x]
             return Format(data)

-    c = np.array([0, 1, 1, 0, 2], dtype=Colors)
+    c = np.array([0, 1, 1, 0, 2], dtype=Plant)

 Additional code would be needed to neutralize the slot functions.

-There is a level of indirection between ``Dtype`` and ``IntDescr`` so that
-downstream users could create their own duck-descriptors that do not use
-``DtypeMeta.__call__`` at all, but could still answer ``True`` to
-``isinstance(mydtype, Dtype)``.
+The overall hierarchy is meant to map to the scalar hierarchy.
+
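For comparison, the scalar hierarchy this is meant to mirror already exists in
today's NumPy::

    >>> np.uint8.mro()
    [<class 'numpy.uint8'>, <class 'numpy.unsignedinteger'>, <class 'numpy.integer'>,
     <class 'numpy.number'>, <class 'numpy.generic'>, <class 'object'>]
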
+Now ``arrayprint`` would look something like this (very much simplified, the
+actual format details are not the point)::
+
+    def arrayformat(data, dtype):
+        formatter = dtype.get_format_function(data)
+        result = []
+        for v in data:
+            result.append(formatter(v))
+        return 'array[' + ', '.join(result) + ']'
+
+    def arrayprint(data):
+        print(arrayformat(data, data.dtype))
+
+    a = np.array([0, 1, 2, 0, 1, 2], dtype='uint8')
+
+    # Create a dtype instance, returns a singleton from dtype_scalar_dict
+    uint8 = Dtype(np.uint8)
+
+    # Create a user-defined dtype
+    garden = Plant()
+
+    # We cannot use ``arrayprint`` just yet, but ``arrayformat`` works
+    print(arrayformat(a, uint8))
+
+    array[0, 1, 2, 0, 1, 2]
+
+    print(arrayformat(a, garden))
+
+    array[tree, flower, grass, tree, flower, grass]

 Advantages
 ==========

 It is very difficult today to override dtype behaviour. Internally,
 descriptor objects are all instances of a generic dtype class and
 behave as containers more than classes with method overrides. Giving them a
-class heirarchy with overrideable methods will reduce explicit branching in
+class hierarchy with overrideable methods will reduce explicit branching in
 code (at the expense of a dictionary lookup) and allow downstream users to
 more easily define new dtypes. We could re-examine interoperability with
 the pandas_ typesystem.
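
To make the reduced-branching point concrete, here is a hedged sketch (the names
are illustrative, not actual NumPy internals) of the kind of switch that method
dispatch would replace::

    # today: explicit branching on the descriptor's kind
    def get_format_function(data, dtype, **options):
        if dtype.kind == 'b':
            return BoolFormat(data)
        elif dtype.kind in 'ui':
            return IntegerFormat(data)
        elif dtype.kind == 'f':
            return FloatingFormat(data, **options)
        ...

    # proposed: each dtype class supplies its own implementation
    def get_format_function(data, dtype, **options):
        return dtype.get_format_function(data, **options)
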
@@ -195,6 +208,10 @@ should continue with `PR 12284`_ to vendor our own numpy.pxd in order to make th
 transition less painful. We should not break working dtype-subclasses like
 `quaternions`_.

+Code that depends on all dtypes having similar attributes might break. For
+instance, there is no reason ``int`` dtypes need the empty ``names`` and
+``fields`` attributes.
+
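For example, a common idiom today relies on every descriptor exposing ``names``,
even when it is ``None``; such checks might need ``getattr`` guards if plain
dtypes stop carrying the attribute (a sketch)::

    a = np.arange(3, dtype='int8')

    # works today because non-structured dtypes still have ``names`` (set to None)
    if a.dtype.names is not None:
        print("structured array")

    # a defensive spelling that would survive the proposed change
    if getattr(a.dtype, 'names', None) is not None:
        print("structured array")
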
 Future Extensions
 =================

@@ -204,7 +221,7 @@ This would make the descriptor more like the ``int`` or ``float`` type. However
 allowing instantiating scalars from descriptors is not a goal of this NEP.

 A further extension would be to refactor ``numpy.datetime64`` to use the new
-heirarchy.
+hierarchy.

 Appendix
 ========