8000 API: Create Preliminary DTypeMeta class and np.dtype subclasses by seberg · Pull Request #15508 · numpy/numpy · GitHub



Merged
merged 10 commits on May 27, 2020

Conversation

seberg
Member
@seberg seberg commented Feb 4, 2020

This makes all dtypes instances of a subclass of np.dtype:

class DType(np.dtype):
    is_abstract = False
    is_flexible = False  # True for void, strings, and datetimes
    singleton = prototype_dtype_instance
    # other information defined on the singleton currently

Right now this happens by replacing the Type field of the
singleton instance. This is valid for types/objects not created
in Python (not HeapTypes). It is necessary to have this functionality
for continuing support of user-defined legacy dtypes.


This is basically the draft starting point for the proposed NEP 42. I would not want to expose much more; if we have some agreement that np.dtype[np.float64] is a nice API, I would be happy to add such small things (with a PreliminaryDTypeAPIWarning).
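As a sketch of what the change means from Python (assuming a NumPy build that includes this PR; on older versions `type(dt)` is simply `np.dtype`):

```python
import numpy as np

dt = np.dtype(np.float64)

# Instances stay instances of np.dtype...
assert isinstance(dt, np.dtype)
# ...but their class is now a specialized subclass of np.dtype:
assert type(dt) is not np.dtype
assert issubclass(type(dt), np.dtype)
```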

My hope is this:

  1. We can agree to merge something like this, even if it may still be a bit in flux.
  2. This enables other smaller changes, which are somewhat unrelated. E.g. use new methods stored on these DTypes to:
    • reimplement the current np.array(...) coercion.
    • reimplement how sorting is handled (add new sort functions), if someone cares for it; it is low priority for me.
    • reimplement how we organize casting.

Most of this has to be defined in more technical detail and with specific decisions, and would be part of NEP 42, though.

This PR aims to add no new public API, with the exception of the barely useful DType classes! In particular, no C-API is added.


As of now this is probably still a bit of a draft. The whole thing is a bit ugly, but I think it is actually somewhat fine, and we have to do this "heap allocated static type" approach for things such as quaternions.

@seberg
Member Author
seberg commented Feb 4, 2020

The only thing that this is supposed to change from Python is that:

type(np.dtype(np.float64))

should no longer be np.dtype, but instead give/print numpy.dtype[numpy.float64]. I do not really care right now how we want to name this in the future, although I actually like the square bracket idea. Possibly even to get the class, since that removes the need for putting a second set of names somewhere if we need np.dtype[np.inexact].
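The square-bracket idea could be spelled roughly like this; a pure-Python sketch where `DTypeMeta`, `dtype`, and `Float64DType` are illustrative stand-ins, not NumPy's actual implementation:

```python
class DTypeMeta(type):
    """Metaclass for dtype classes; holds a scalar-type -> class registry."""
    _by_scalar = {}

class dtype(metaclass=DTypeMeta):
    def __class_getitem__(cls, scalar_type):
        # dtype[scalar] looks up the concrete DType *class*
        return DTypeMeta._by_scalar[scalar_type]

class Float64DType(dtype):
    pass

DTypeMeta._by_scalar[float] = Float64DType

# The bracketed spelling returns the class itself:
assert dtype[float] is Float64DType
```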

f"extern NPY_NO_EXPORT {self.internal_type} {mangled_name};\n"
# And define the name as: (*(type *)(&mangled_name))
f"#define {self.name} (*({self.ptr_cast} *)(&{mangled_name}))\n"
)
Member Author

The API generation changes are the ugliest part here... Unfortunately, I really want to hide everything away (unless it is an internal build) and have PyArrayDescr_Type defined as a PyTypeObject even internally. But I also need to initialize/allocate it statically as a PyArray_DTypeMeta.

@seberg seberg force-pushed the dtypemeta-new branch 3 times, most recently from 143bc06 to a729ca8 on February 9, 2020
@seberg seberg force-pushed the dtypemeta-new branch 9 times, most recently from abf365a to 0cc50c3 on February 9, 2020
@eric-wieser
Member

We've had a lot of confusion in past discussions about issubclass vs isinstance relationships. Can you add some tests to this PR to make it easy to tell at a glance what is actually implemented here?
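For reference, the isinstance/issubclass relationships the PR sets up can be mimicked in pure Python (the names here are illustrative stand-ins, not NumPy's):

```python
class DTypeMeta(type):
    """Every DType class is an *instance* of this metaclass."""

class np_dtype(metaclass=DTypeMeta):
    """Stand-in for np.dtype."""

class Float64DType(np_dtype):
    """Stand-in for the class of np.dtype(np.float64) instances."""

f64 = Float64DType()

assert isinstance(f64, np_dtype)            # instance of the base
assert issubclass(Float64DType, np_dtype)   # class subclasses the base
assert isinstance(Float64DType, DTypeMeta)  # class is an instance of the meta
assert not isinstance(f64, DTypeMeta)       # instances are not classes
```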

@seberg
Member Author
seberg commented Feb 18, 2020

Thanks for having a look, Eric. I had put this up mainly in the hope of getting quick judgment on this "static" type being a bit weird. But by now I think it is a good first step (even if not pretty). Of course we could at some point decide to make all dtypes heaptypes, but that is fully changeable.

There was a bug with __name__, because the name part of the type (for a static type) cannot include a `.`; that is fixed now. The test is minimal, but it does test pretty much everything that is actually changed.

legacy_dtype_default_new(PyArray_DTypeMeta *self,
                         PyObject *args, PyObject *kwargs)
{
    /* TODO: This should allow endianness and possibly metadata */
Member Author

I am happy to just disable this completely for now as well, or put a PreliminaryAPI warning here, which I think would be nice in general. But I am not sure I care too much right away (it is not like it is a tricky API choice) that type(instance)() gives the default instance.

The main annoyance about this PR is how to create the TypeStruct in a way that is hidden from the public.
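The behavior under discussion, that calling the class gives the default instance, looks like this from Python (assuming a NumPy build that includes this PR):

```python
import numpy as np

dt = np.dtype(np.float64)
DType = type(dt)  # the new np.dtype subclass for float64

# Calling the subclass with no arguments returns the default instance:
assert DType() == dt
```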

Member

Warnings are probably annoying, given that you'll then have to filter them out again if you want to use this internally (applies to any preliminary API). How about just putting a big warning in the docs for any such object? And on the Python side, keep it out of the main namespace.
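The filtering concern in concrete terms: any internal caller of a warning-emitting preliminary API has to suppress the warning again. A sketch with a hypothetical PreliminaryDTypeAPIWarning (the name is taken from this discussion, not an existing NumPy class):

```python
import warnings

class PreliminaryDTypeAPIWarning(UserWarning):
    """Hypothetical warning class from this discussion."""

def preliminary_api():
    warnings.warn("this API is preliminary", PreliminaryDTypeAPIWarning)
    return 42

# Every internal caller would need boilerplate like this:
with warnings.catch_warnings():
    warnings.simplefilter("ignore", PreliminaryDTypeAPIWarning)
    result = preliminary_api()

assert result == 42
```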

Member Author

True; as this is the Python side, for a long time it probably only matters in tests. np.dtype does not really have a test right now. I could change the repr away from np.dtype[float64] if you prefer, e.g. just add an underscore to make it "private".

@seberg seberg marked this pull request as ready for review February 23, 2020 18:13
seberg added 7 commits March 18, 2020 17:29
This makes all dtypes instances of a subclass of np.dtype:

class DType(np.dtype):
    is_abstract = False
    is_flexible = False  # True for void, strings, and datetimes
    singleton = prototype_dtype_instance
    # other information defined on the singleton currently

Right now this happens by replacing the Type field of the
singleton instance. This is valid for types/objects not created
in Python (not HeapTypes). It is necessary to have this functionality
for continuing support of user-defined legacy dtypes.
(currently no public API changes, but the hash changed).
The type safety may not be important as such, but there used
to be issues with this, at least with MSVC, and clang warns.
Note that these attributes are *not* fixed yet, just a start to
have something.
@seberg
Member Author
seberg commented Mar 20, 2020

@eric-wieser if you have a larger chunk of concentration cycles (I know those are hard to come by, and it is likely not getting easier right now)... I am not sure how much you looked at this before. I do see most of this as a bit in flux (e.g. as soon as we tack more onto the DType class, it may be necessary to create them explicitly).

But I would be happy if we can get something in this direction going shortly after merging the NEP, which I hope we can manage in less than two weeks, and I think you would be the best reviewer, or the best person to suggest larger changes.

@seberg seberg added the triage review Issue/PR to be discussed at the next triage meeting label Apr 22, 2020
@seberg seberg added this to the 1.20.0 release milestone May 5, 2020
@seberg seberg removed the triage review Issue/PR to be discussed at the next triage meeting label May 5, 2020
@mattip
Member
mattip commented May 19, 2020

Now that 1.19 is branched, we should move this forward.

@seberg
Member Author
seberg commented May 19, 2020

Yes, that would be great. I am not sure there is much for me to do here right now, though; it requires some review. If anyone wants to talk it through, I am happy to do that.

@mattip
Member
mattip commented May 19, 2020

LGTM. It may need some iteration and adjustment as the work progresses, but I think the approach is solid. Since this unlocks more work I would like to put it in soonish. @eric-wieser thoughts?

@mattip mattip merged commit 36e0171 into numpy:master May 27, 2020
@mattip
Member
mattip commented May 27, 2020

Thanks @seberg. This may need some fine tuning but let's see how it goes.

@tacaswell
Contributor

This has broken compilation with Cython + CPython master.

You get failures like:

    C compiler: gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC

    compile options: '-DNPY_INTERNAL_BUILD=1 -DHAVE_NPY_CONFIG_H=1 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -Inumpy/core/include -Ibuild/src.linux-x86_64-3.10/numpy/core/include/numpy -Inumpy/core/src/common -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -I/home/tcaswell/.virtualenvs/bleeding/include -I/home/tcaswell/.pybuild/bleeding/include/python3.10 -Ibuild/src.linux-x86_64-3.10/numpy/core/src/common -Ibuild/src.linux-x86_64-3.10/numpy/core/src/npymath -c'
    extra options: '-std=c99'
    gcc -pthread -shared build/temp.linux-x86_64-3.10/build/src.linux-x86_64-3.10/numpy/core/src/multiarray/_multiarray_tests.o build/temp.linux-x86_64-3.10/numpy/core/src/common/mem_overlap.o -Lbuild/temp.linux-x86_64-3.10 -lnpymath -o build/lib.linux-x86_64-3.10/numpy/core/_multiarray_tests.cpython-310-x86_64-linux-gnu.so
    /usr/bin/ld: build/temp.linux-x86_64-3.10/numpy/core/src/common/mem_overlap.o:/tmp/pip-req-build-cxqd0fcv/numpy/core/include/numpy/ndarraytypes.h:1824: multiple definition of `PyArrayDTypeMeta_Type'; build/temp.linux-x86_64-3.10/build/src.linux-x86_64-3.10/numpy/core/src/multiarray/_multiarray_tests.o:/tmp/pip-req-build-cxqd0fcv/numpy/core/include/numpy/ndarraytypes.h:1824: first defined here
    collect2: error: ld returned 1 exit status
    error: Command "gcc -pthread -shared build/temp.linux-x86_64-3.10/build/src.linux-x86_64-3.10/numpy/core/src/multiarray/_multiarray_tests.o build/temp.linux-x86_64-3.10/numpy/core/src/common/mem_overlap.o -Lbuild/temp.linux-x86_64-3.10 -lnpymath -o build/lib.linux-x86_64-3.10/numpy/core/_multiarray_tests.cpython-310-x86_64-linux-gnu.so" failed with exit status 1
    Running setup.py install for numpy ... error
ERROR: Command errored out with exit status 1: /home/tcaswell/.virtualenvs/bleeding/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-cxqd0fcv/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-cxqd0fcv/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-8ridma62/install-record.txt --single-version-externally-managed --compile --install-headers /home/tcaswell/.virtualenvs/bleeding/include/site/python3.10/numpy Check the logs for full command output.
Exception information:
Traceback (most recent call last):
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/req/req_install.py", line 827, in install
    success = install_legacy(
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/operations/install/legacy.py", line 86, in install
    raise LegacyInstallFailure
pip._internal.operations.install.legacy.LegacyInstallFailure

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/cli/base_command.py", line 188, in _main
    status = self.run(options, args)
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/cli/req_command.py", line 185, in wrapper
    return func(self, options, args)
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/commands/install.py", line 398, in run
    installed = install_given_reqs(
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/req/__init__.py", line 67, in install_given_reqs
    requirement.install(
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/req/req_install.py", line 845, in install
    six.reraise(*exc.parent)
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_vendor/six.py", line 703, in reraise
    raise value
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/operations/install/legacy.py", line 74, in install
    runner(
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/utils/subprocess.py", line 270, in runner
    call_subprocess(
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/utils/subprocess.py", line 241, in call_subprocess
    raise InstallationError(exc_msg)
pip._internal.exceptions.InstallationError: Command errored out with exit status 1: /home/tcaswell/.virtualenvs/bleeding/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-cxqd0fcv/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-cxqd0fcv/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-8ridma62/install-record.txt --single-version-externally-managed --compile --install-headers /home/tcaswell/.virtualenvs/bleeding/include/site/python3.10/numpy Check the logs for full command output.
Removed build tracker: '/tmp/pip-req-tracker-76reoduh'

I am testing with CPython master, which is a bit annoying to bisect given the change to Py_SIZE and Py_TYPE, so I have not chased it down further yet.

@seberg
Member Author
seberg commented May 29, 2020

You are running into #16419; I have not figured out yet what I have to move around. I guess those declarations just must not be in that file due to the nature of how it is used, but I am not exactly sure...

@tacaswell
Contributor

Ah, great (well, from the point of view that I don't have to bisect this further... not sure how long it would have taken me to get to "maybe it is gcc")!

Member
@anirudh2290 anirudh2290 left a comment

I was just reading through the PR to understand some of the dtype changes. Overall this is a solid PR :) ! Added a few comments, feel free to ignore.

* should be defined on the class and inherited to the scalar.
* (NPY_HALF is the largest builtin one.)
*/
for (i = 0; i <= NPY_HALF; i++) {
Member

nit: it looks like NPY_NTYPES is the upper limit here; for example, a new builtin type would presumably be added after NPY_HALF. Though I doubt this happens often enough to worry about.

return -1;
}

snprintf(tp_name, name_length, "numpy.dtype[%s]", scalar_name);
Member

I do not really care right now how we want to name this in the future. Although I actually like the square bracket idea.

The only downside I see is that, from a Python user's perspective, this is not a valid class name, and trying to create an object (in the obvious way) will fail.
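Concretely, the bracketed name cannot be referenced the way an ordinary class name can, since it is not a valid Python identifier:

```python
# The repr "numpy.dtype[float64]" is not copy-pastable as a name,
# because brackets are not allowed in identifiers:
assert not "dtype[float64]".isidentifier()
assert "Float64DType".isidentifier()  # an ordinary class name would be
```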

Member Author

True... I suppose the metaclass could probably have a __repr__ slot to fix some of it. Although, to actually make it copy-pastable, we may need a `.` in there, which may mean it can only be done by making everything HeapTypes (which is not totally unreasonable).

Member

Can we do something like numpy.dtype_%s? Although this may be a bit ugly and a little less self-explanatory about it being a subclass of np.dtype.

Member Author

We could do various things, I guess, but it will be a while before we should seriously consider exposing these in the API, and I think until then it will be fine to modify the __repr__ as we like.

if (dot) {
    scalar_name = dot + 1;
}
ssize_t name_length = strlen(scalar_name) + 14;
Member

nit: a comment for this magic number would be great
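For what it's worth, the 14 appears to be the fixed part of the format string plus the terminating NUL (my reading of the snippet, not a comment from the code):

```python
# "numpy.dtype[%s]" contributes len("numpy.dtype[]") == 13 characters
# around the scalar name, plus 1 for the trailing NUL byte:
assert len("numpy.dtype[]") + 1 == 14
```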

static void
dtypemeta_dealloc(PyArray_DTypeMeta *self) {
    /* Do not accidentally delete a statically defined DType: */
    assert(((PyTypeObject *)self)->tp_flags & Py_TPFLAGS_HEAPTYPE);
Member

this assert will always fail with the current code, right? Also, since this functions like a static type today, this dealloc never actually happens?

* in the future when we implement HeapTypes (python/dynamically
* defined types). It should be revised at that time.
*/
assert(0);
Member

nit: it looks like this was added for debugging?

Member Author

Thanks for that and the other comments. This one was intentional: I thought that if we actually start using it, we should check that it actually works correctly :).

static PyMemberDef dtypemeta_members[] = {
    {"_abstract",
        T_BYTE, offsetof(PyArray_DTypeMeta, abstract), READONLY, NULL},
    {"type",
Member

should this also be hidden ?

# which are more than just storage information, these would need to be
# given when creating a dtype:
parametric = (np.void, np.str_, np.bytes_, np.datetime64, np.timedelta64)
if dtype.type not in parametric:
Member

I find the following behavior a bit confusing:

>>> dtype = np.dtype("float32")
>>> dtype.type == type(dtype).type
True

Maybe giving the attribute a different name (type vs. _type) would help?

Member Author

I have to figure this out before 1.20. I currently see the dtype's type as non-existing, i.e. it's a class attribute, which the instance simply looks up on the class. But I think there may be use-cases in which DType.type is NULL, and in that case there may be some use in dtype.type being non-null. Although I suspect that is useless in practice (at least unless I expand some of the classmethods to also work as instance methods in the NEP).

For now, I would view it as being a bad implementation of:

class DType:
    type = something
    @property 
    def dtype(self):
        return self.type  # returns the class attribute always.

but we also store it on the dtype instance for backward compatibility reasons.
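The fallback behavior described above, an instance attribute access finding the class attribute, in plain Python (float here is just a stand-in value):

```python
class DType:
    type = float  # stored once, on the class

d = DType()

# Instance lookup finds the class attribute, so both spellings agree:
assert d.type is DType.type
assert d.type is type(d).type
```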
