8000 API: Create Preliminary DTypeMeta class and np.dtype subclasses by seberg · Pull Request #15508 · numpy/numpy · GitHub



Merged
merged 10 commits on May 27, 2020

Conversation

seberg
Member
@seberg seberg commented Feb 4, 2020

This makes all dtypes instances of a subclass of np.dtype:

class DType(np.dtype):
    is_abstract = False
    is_flexible = False  # True for void, strings, and datetimes
    singleton = prototype_dtype_instance
    # other information defined on the singleton currently

Right now this happens by replacing the Type field of the
singleton instance. This is valid for types/objects not created
in Python (not HeapTypes). It is necessary to have this functionality
for continuing support of user-defined legacy dtypes.


This is basically the draft starting point for the proposed NEP 42. I would not want to expose much more; if we have some agreement that np.dtype[np.float64] is a nice API, I would be happy to add such small things (with a PreliminaryDTypeAPIWarning).
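As a sketch of what the change means from Python (assuming a NumPy build that includes this PR; on older versions `type(dt)` is simply `np.dtype`):

```python
import numpy as np

dt = np.dtype(np.float64)

# Instances stay instances of np.dtype...
assert isinstance(dt, np.dtype)
# ...but their class is now a specialized subclass of np.dtype:
assert type(dt) is not np.dtype
assert issubclass(type(dt), np.dtype)
```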

My hope is this:

  1. We can agree to merge something like this, even if it may still be a bit in flux.
  2. This enables other smaller changes, which are somewhat unrelated. E.g. use new methods stored on these DTypes to:
    • reimplement the current np.array(...) coercion.
    • reimplement how sorting is handled (add new sort functions), if someone cares for it; it is low priority for me.
    • reimplement how we organize casting.

Most of this has to be defined in more technical detail and with specific decisions, and would be part of NEP 42, though.

This PR aims to add no new public API, with the exception of the barely useful DType classes! In particular, no C-API is added.


As of now this is probably still a bit of a draft. The whole thing is a bit ugly, but I think it is actually somewhat fine, and we have to do this "heap allocated static type" approach for things such as quaternions.

@seberg
Member Author
seberg commented Feb 4, 2020

The only thing that this is supposed to change from Python is that:

type(np.dtype(np.float64))

should no longer be np.dtype, but instead give/print numpy.dtype[numpy.float64]. I do not really care right now how we want to name this in the future, although I actually like the square bracket idea. Possibly even to get the class, since that removes the need for putting a second set of names somewhere if we need np.dtype[np.inexact].
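The square-bracket idea could be spelled roughly like this; a pure-Python sketch where `DTypeMeta`, `dtype`, and `Float64DType` are illustrative stand-ins, not NumPy's actual implementation:

```python
class DTypeMeta(type):
    """Metaclass for dtype classes; holds a scalar-type -> class registry."""
    _by_scalar = {}

class dtype(metaclass=DTypeMeta):
    def __class_getitem__(cls, scalar_type):
        # dtype[scalar] looks up the concrete DType *class*
        return DTypeMeta._by_scalar[scalar_type]

class Float64DType(dtype):
    pass

DTypeMeta._by_scalar[float] = Float64DType

# The bracketed spelling returns the class itself:
assert dtype[float] is Float64DType
```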

f"extern NPY_NO_EXPORT {self.internal_type} {mangled_name};\n"
# And define the name as: (*(type *)(&mangled_name))
f"#define {self.name} (*({self.ptr_cast} *)(&{mangled_name}))\n"
)
Member Author

The API generation changes are the ugliest part here... Unfortunately, I really want to hide everything away (unless it is an internal build) and have PyArrayDescr_Type defined as a PyTypeObject even internally. But I also need to initialize/allocate it statically as a PyArray_DTypeMeta.

@seberg seberg force-pushed the dtypemeta-new branch 3 times, most recently from 143bc06 to a729ca8 on February 9, 2020
@seberg seberg force-pushed the dtypemeta-new branch 9 times, most recently from abf365a to 0cc50c3 on February 9, 2020
@eric-wieser
Member

We've had a lot of confusion in past discussions about issubclass vs isinstance relationships. Can you add some tests to this PR to make it easy to tell at a glance what is actually implemented here?
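For reference, the isinstance/issubclass relationships the PR sets up can be mimicked in pure Python (the names here are illustrative stand-ins, not NumPy's):

```python
class DTypeMeta(type):
    """Every DType class is an *instance* of this metaclass."""

class np_dtype(metaclass=DTypeMeta):
    """Stand-in for np.dtype."""

class Float64DType(np_dtype):
    """Stand-in for the class of np.dtype(np.float64) instances."""

f64 = Float64DType()

assert isinstance(f64, np_dtype)            # instance of the base
assert issubclass(Float64DType, np_dtype)   # class subclasses the base
assert isinstance(Float64DType, DTypeMeta)  # class is an instance of the meta
assert not isinstance(f64, DTypeMeta)       # instances are not classes
```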

@seberg
Member Author
seberg commented Feb 18, 2020

Thanks for having a look, Eric. I had put this up mainly in the hope of getting quick judgment on this "static" type being a bit weird. But by now I think it is a good first step (even if not pretty). Of course we could at some point decide to make all dtypes heaptypes, but that is fully changeable.

There was a bug with __name__, because the name part of the type (for a static type) cannot include a `.`; that is fixed now. The test is minimal, but it does test pretty much everything that is actually changed.

legacy_dtype_default_new(PyArray_DTypeMeta *self,
                         PyObject *args, PyObject *kwargs)
{
    /* TODO: This should allow endianness and possibly metadata */
Member Author

I am happy to just disable this completely for now as well, or put a PreliminaryAPI warning here, which I think would be nice in general. But I am not sure I care too much right away (it is not like it is a tricky API choice) that type(instance)() gives the default instance.

The main annoyance about this PR is how to create the TypeStruct in a way that is hidden from the public.
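The behavior under discussion, that calling the class gives the default instance, looks like this from Python (assuming a NumPy build that includes this PR):

```python
import numpy as np

dt = np.dtype(np.float64)
DType = type(dt)  # the new np.dtype subclass for float64

# Calling the subclass with no arguments returns the default instance:
assert DType() == dt
```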

Member

Warnings are probably annoying, given that you'll then have to filter them out again if you want to use this internally (applies to any preliminary API). How about just putting a big warning in the docs for any such object? And on the Python side, keep it out of the main namespace.
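The filtering concern in concrete terms: any internal caller of a warning-emitting preliminary API has to suppress the warning again. A sketch with a hypothetical PreliminaryDTypeAPIWarning (the name is taken from this discussion, not an existing NumPy class):

```python
import warnings

class PreliminaryDTypeAPIWarning(UserWarning):
    """Hypothetical warning class from this discussion."""

def preliminary_api():
    warnings.warn("this API is preliminary", PreliminaryDTypeAPIWarning)
    return 42

# Every internal caller would need boilerplate like this:
with warnings.catch_warnings():
    warnings.simplefilter("ignore", PreliminaryDTypeAPIWarning)
    result = preliminary_api()

assert result == 42
```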

Member Author

True; as this is the Python side, for a long time it probably only matters in tests. np.dtype does not really have a test right now. I could change the repr away from np.dtype[float64] if you prefer, e.g. just add an underscore to make it "private".

@seberg seberg marked this pull request as ready for review February 23, 2020 18:13
seberg added 7 commits March 18, 2020 17:29
This makes all dtypes instances of a subclass of np.dtype:

class DType(np.dtype):
    is_abstract = False
    is_flexible = False  # True for void, strings, and datetimes
    singleton = prototype_dtype_instance
    # other information defined on the singleton currently

Right now this happens by replacing the Type field of the
singleton instance. This is valid for types/objects not created
in Python (not HeapTypes). It is necessary to have this functionality
for continuing support of user-defined legacy dtypes.
(currently no public API changes, but the hash changed).
The type safety may not be important as such, but there used
to be issues with this, at least with MSVC, and clang warns.
Note that these attributes are *not* fixed yet, just a start to
have something.
@seberg
Member Author
seberg commented Mar 20, 2020

@eric-wieser if you have a larger chunk of concentration cycles (I know those are hard to come by, and it is likely not getting easier right now)... I am not sure how much you looked at this before. I do see most of this as a bit in flux (e.g. as soon as we tack more onto the DType class, it may be necessary to create them explicitly).

But I would be happy if we can get something in this direction going shortly after merging the NEP, which I hope we can manage in less than two weeks, and I think you would be the best reviewer, or the best person to suggest larger changes.

@seberg seberg added the triage review Issue/PR to be discussed at the next triage meeting label Apr 22, 2020
@seberg seberg added this to the 1.20.0 release milestone May 5, 2020
@seberg seberg removed the triage review Issue/PR to be discussed at the next triage meeting label May 5, 2020
@mattip
Member
mattip commented May 19, 2020

Now that 1.19 is branched, we should move this forward.

@seberg
Member Author
seberg commented May 19, 2020

Yes, that would be great. I am not sure there is much for me to do here right now, though; it requires some review. If anyone wants to talk it through, I am happy to do that.

@mattip
Member
mattip commented May 19, 2020

LGTM. It may need some iteration and adjustment as the work progresses, but I think the approach is solid. Since this unlocks more work I would like to put it in soonish. @eric-wieser thoughts?

@mattip mattip merged commit 36e0171 into numpy:master May 27, 2020
@mattip
Member
mattip commented May 27, 2020

Thanks @seberg. This may need some fine tuning but let's see how it goes.

@tacaswell
Contributor

This has broken compilation with Cython + CPython master.

You get failures like:

    C compiler: gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC

    compile options: '-DNPY_INTERNAL_BUILD=1 -DHAVE_NPY_CONFIG_H=1 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -Inumpy/core/include -Ibuild/src.linux-x86_64-3.10/numpy/core/include/numpy -Inumpy/core/src/common -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -I/home/tcaswell/.virtualenvs/bleeding/include -I/home/tcaswell/.pybuild/bleeding/include/python3.10 -Ibuild/src.linux-x86_64-3.10/numpy/core/src/common -Ibuild/src.linux-x86_64-3.10/numpy/core/src/npymath -c'
    extra options: '-std=c99'
    gcc -pthread -shared build/temp.linux-x86_64-3.10/build/src.linux-x86_64-3.10/numpy/core/src/multiarray/_multiarray_tests.o build/temp.linux-x86_64-3.10/numpy/core/src/common/mem_overlap.o -Lbuild/temp.linux-x86_64-3.10 -lnpymath -o build/lib.linux-x86_64-3.10/numpy/core/_multiarray_tests.cpython-310-x86_64-linux-gnu.so
    /usr/bin/ld: build/temp.linux-x86_64-3.10/numpy/core/src/common/mem_overlap.o:/tmp/pip-req-build-cxqd0fcv/numpy/core/include/numpy/ndarraytypes.h:1824: multiple definition of `PyArrayDTypeMeta_Type'; build/temp.linux-x86_64-3.10/build/src.linux-x86_64-3.10/numpy/core/src/multiarray/_multiarray_tests.o:/tmp/pip-req-build-cxqd0fcv/numpy/core/include/numpy/ndarraytypes.h:1824: first defined here
    collect2: error: ld returned 1 exit status
    error: Command "gcc -pthread -shared build/temp.linux-x86_64-3.10/build/src.linux-x86_64-3.10/numpy/core/src/multiarray/_multiarray_tests.o build/temp.linux-x86_64-3.10/numpy/core/src/common/mem_overlap.o -Lbuild/temp.linux-x86_64-3.10 -lnpymath -o build/lib.linux-x86_64-3.10/numpy/core/_multiarray_tests.cpython-310-x86_64-linux-gnu.so" failed with exit status 1
    Running setup.py install for numpy ... error
ERROR: Command errored out with exit status 1: /home/tcaswell/.virtualenvs/bleeding/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-cxqd0fcv/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-cxqd0fcv/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-8ridma62/install-record.txt --single-version-externally-managed --compile --install-headers /home/tcaswell/.virtualenvs/bleeding/include/site/python3.10/numpy Check the logs for full command output.
Exception information:
Traceback (most recent call last):
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/req/req_install.py", line 827, in install
    success = install_legacy(
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/operations/install/legacy.py", line 86, in install
    raise LegacyInstallFailure
pip._internal.operations.install.legacy.LegacyInstallFailure

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/cli/base_command.py", line 188, in _main
    status = self.run(options, args)
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/cli/req_command.py", line 185, in wrapper
    return func(self, options, args)
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/commands/install.py", line 398, in run
    installed = install_given_reqs(
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/req/__init__.py", line 67, in install_given_reqs
    requirement.install(
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/req/req_install.py", line 845, in install
    six.reraise(*exc.parent)
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_vendor/six.py", line 703, in reraise
    raise value
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/operations/install/legacy.py", line 74, in install
    runner(
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/utils/subprocess.py", line 270, in runner
    call_subprocess(
  File "/home/tcaswell/.virtualenvs/bleeding/lib/python3.10/site-packages/pip/_internal/utils/subprocess.py", line 241, in call_subprocess
    raise InstallationError(exc_msg)
pip._internal.exceptions.InstallationError: Command errored out with exit status 1: /home/tcaswell/.virtualenvs/bleeding/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-cxqd0fcv/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-cxqd0fcv/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-8ridma62/install-record.txt --single-version-externally-managed --compile --install-headers /home/tcaswell/.virtualenvs/bleeding/include/site/python3.10/numpy Check the logs for full command output.
Removed build tracker: '/tmp/pip-req-tracker-76reoduh'

I am testing with CPython master, which is a bit annoying to bisect given the change to Py_SIZE and Py_TYPE, so I have not chased it down further yet.

@seberg
Member Author
seberg commented May 29, 2020

You are running into #16419; I have not figured out yet what I have to move around. I guess those declarations just must not be in that file due to the nature of how it is used, but I am not exactly sure...

@tacaswell
Contributor

Ah, great (well, from the point of view that I don't have to bisect this further... not sure how long it would have taken me to get to "maybe it is gcc")!

Member
@anirudh2290 anirudh2290 left a comment

I was just reading through the PR to understand some of the dtype changes. Overall this is a solid PR :) ! Added a few comments, feel free to ignore.

* should be defined on the class and inherited to the scalar.
* (NPY_HALF is the largest builtin one.)
*/
for (i = 0; i <= NPY_HALF; i++) {
Member

nit: it looks like NPY_NTYPES is the upper limit here; for example, a new builtin type would presumably be added after NPY_HALF. Though I doubt this happens often enough to worry about.

return -1;
}

snprintf(tp_name, name_length, "numpy.dtype[%s]", scalar_name);
Member

I do not really care right now how we want to name this in the future. Although I actually like the square bracket idea.

The only downside I see is that, from a Python user's perspective, this is not a valid class name, and trying to create an object (in the obvious way) will fail.
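Concretely, the bracketed name cannot be referenced the way an ordinary class name can, since it is not a valid Python identifier:

```python
# The repr "numpy.dtype[float64]" is not copy-pastable as a name,
# because brackets are not allowed in identifiers:
assert not "dtype[float64]".isidentifier()
assert "Float64DType".isidentifier()  # an ordinary class name would be
```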

Member Author

True... I suppose the metaclass could probably have a __repr__ slot to fix some of it. Although, to actually make it copy-pastable, we may need a `.` in there, which may mean it can only be done by making everything HeapTypes (which is not totally unreasonable).

Member

Can we do something like numpy.dtype_%s? Although this may be a bit ugly and a little less self-explanatory about it being a subclass of np.dtype.

Member Author

We could do various things, I guess, but it will be a while before we should seriously consider exposing these in the API, and I think until then it will be fine to modify the __repr__ as we like.

if (dot) {
    scalar_name = dot + 1;
}
ssize_t name_length = strlen(scalar_name) + 14;
Member

nit: a comment for this magic number would be great
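For what it's worth, the 14 appears to be the fixed part of the format string plus the terminating NUL (my reading of the snippet, not a comment from the code):

```python
# "numpy.dtype[%s]" contributes len("numpy.dtype[]") == 13 characters
# around the scalar name, plus 1 for the trailing NUL byte:
assert len("numpy.dtype[]") + 1 == 14
```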

static void
dtypemeta_dealloc(PyArray_DTypeMeta *self) {
    /* Do not accidentally delete a statically defined DType: */
    assert(((PyTypeObject *)self)->tp_flags & Py_TPFLAGS_HEAPTYPE);
Member

this assert will always fail with the current code, right? Also, since this functions like a static type today, this dealloc never actually happens?

* in the future when we implement HeapTypes (python/dynamically
* defined types). It should be revised at that time.
*/
assert(0);
Member

nit: it looks like this was added for debugging?

Member Author

Thanks for that and the other comments. This one was intentional: I thought that if we actually start using it, we should check that it actually works correctly :).

static PyMemberDef dtypemeta_members[] = {
    {"_abstract",
        T_BYTE, offsetof(PyArray_DTypeMeta, abstract), READONLY, NULL},
    {"type",
Member

should this also be hidden ?

# which are more than just storage information, these would need to be
# given when creating a dtype:
parametric = (np.void, np.str_, np.bytes_, np.datetime64, np.timedelta64)
if dtype.type not in parametric:
Member

I find the following behavior a bit confusing:

>>> dtype = np.dtype("float32")
>>> dtype.type == type(dtype).type
True

Maybe giving the attribute a different name (type vs. _type) would help?

Member Author

I have to figure this out before 1.20. I currently see the dtype's type as non-existing, i.e. it's a class attribute, which the instance simply looks up on the class. But I think there may be use-cases in which DType.type is NULL, and in that case there may be some use in dtype.type being non-null. Although I suspect that is useless in practice (at least unless I expand some of the classmethods to also work as instance methods in the NEP).

For now, I would view it as being a bad implementation of:

class DType:
    type = something
    @property 
    def dtype(self):
        return self.type  # returns the class attribute always.

but we also store it on the dtype instance for backward compatibility reasons.
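The fallback behavior described above, an instance attribute access finding the class attribute, in plain Python (float here is just a stand-in value):

```python
class DType:
    type = float  # stored once, on the class

d = DType()

# Instance lookup finds the class attribute, so both spellings agree:
assert d.type is DType.type
assert d.type is type(d).type
```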
