MAINT: Use pythoncapi_compat.h in npy_3kcompat.h #18713
Conversation
pythoncapi_compat.h provides new Python C API functions to old Python versions, like Py_NewRef(). It supports Python 2.7 but requires ISO C99 support (with partial ISO C90 support for Python 2.7). Remove Py_SET_TYPE(), Py_SET_SIZE(), Py_SET_REFCNT(), Py_SETREF(): they are now provided by pythoncapi_compat.h.
pythoncapi_compat.h is copied from the https://github.com/pythoncapi/pythoncapi_compat project. This change replaces the numpy custom implementation of the Py_SET_TYPE(), Py_SET_SIZE(), Py_SET_REFCNT() and Py_SETREF() functions with the pythoncapi_compat.h implementation, which comes from CPython (sometimes with minor changes to adapt it to older Python versions). The main drawback is that pythoncapi_compat.h requires ISO C99 support, whereas numpy has compatibility with GCC -ansi and ISO C90. I'm not sure if requiring ISO C99 support is acceptable right now in numpy? numpy still has a compatibility layer with Python 2.7, for example (pythoncapi_compat.h supports Python 2.7). cc @mattip
cc @charris (who added Py_SET_TYPE and Py_SET_SIZE), @tacaswell and @eric-wieser (fixed return value of these macros).
@vstinner NumPy has required c99 since 1.17.0, but I don't know about downstream projects that may use the
```c
#endif

#include <Python.h>
#include "frameobject.h"  // PyFrameObject, PyFrame_GetBack()
```
Why `"..."` for `frameobject.h`? It seems to be included in the python include directory in my distro.
frameobject.h is not included by Python.h :-( I'm not sure why.
I partially fixed this issue in Python 3.10: Python.h now includes a new pyframe.h header file that I added, it defines PyFrameObject, PyFrame_GetLineNumber() and PyFrame_GetCode(), but not PyFrame_GetBack() (excluded from the limited C API).
```c
    ob->ob_refcnt = refcnt;
}
#define Py_SET_REFCNT(ob, refcnt) _Py_SET_REFCNT(_PyObject_CAST(ob), refcnt)
#endif
```
Why is this a thing? It will horribly break non-CPython implementations.
@mattip I see that it is already used in numpy/core/src/multiarray/array_coercion.c, is that in error?
Either:
- all the fields are guaranteed to be zeroed out in that allocation, in which case we could use `Py_INCREF` instead, or
- we could be returning random data in other fields of the struct.

So I think use of that "convenience function" could enable masking possible bugs.
> Why is this a thing? It will horribly break non-CPython implementations.
The `_PyObject_CAST()` is required to prevent introducing new compiler warnings when `Py_REFCNT(obj) = refcnt;` is replaced with `Py_SET_REFCNT(obj, refcnt);`. Many functions of the Python C API were implemented as macros which cast their argument to `PyObject*`, but static inline functions define a type for their parameters (usually `PyObject*`).

In Python 3.10, `Py_REFCNT(obj) = refcnt;` now fails with a compiler error: `Py_SET_REFCNT()` must be used. See:
I didn't realize it was added to `npy_3kcompat.h` in #16899.
... which was needed for python3.10.
The root cause was 1035c3f in #16200 which changed

```c
PyArrayObject_fields arr_fields = {
        .ob_base.ob_refcnt = 1,
        .ob_base.ob_type = &PyArray_Type,
        .flags = NPY_ARRAY_WRITEABLE,  /* assume array is not behaved. */
    };
```

to

```c
PyArrayObject_fields arr_fields = {
        .flags = NPY_ARRAY_WRITEABLE,  /* assume array is not behaved. */
    };
Py_SET_TYPE(&arr_fields, &PyArray_Type);
Py_REFCNT(&arr_fields) = 1;
```

and `Py_REFCNT` as an lvalue went away in python3.10
Well, that spot should really be phased out soon, hopefully (there might be a similar one in other "casting" code that should also see less use soon, hopefully).
At that point this would only be necessary as a backward-compatibility hack, and then the performance gain of using a stack allocation stops mattering (I am not sure how big it is, but I assume it is pretty significant here).
+1 for getting rid of this code.
A release note would be helpful because of the c99 requirement, probably under
Including this new header in NumPy, which is widely used in the scientific community, will enable more people to use these functions. Are we sure we want them as part of our public C-API? Some of them seem to enable dangerous behaviour, like
Py_REFCNT(), Py_TYPE() and Py_SIZE() are defined as macros in CPython which can be abused to set the value:
Not only is it possible, but numpy actually used that until very recently ;-) I added the Py_SET_xxx() functions so PyPy (and other Python implementations) can decide to implement them differently than modifying the PyObject structure member directly. For example, Py_SET_REFCNT() could do nothing in PyPy.
I didn't add it to the numpy public C API on purpose. Is there a way to "hide" it or even not add it to the public C API? The problem is that I would like to use it from another public C API header file ... In the long term, my plan is to ship pythoncapi_compat.h directly in Linux distributions, rather than copying it to each project. But first I need early adopters ;-) https://github.com/pythoncapi/pythoncapi_compat#examples-of-projects-using-pythoncapi_compath I would like to follow the same migration path as the Python six module for the Python 2 to Python 3 migration.
Well, I created this PR to ask you (numpy devs) the question ;-) I don't know how the numpy C API is used.
pythoncapi_compat.h mostly requires ISO C99 for the inline keyword. In practice, pythoncapi_compat.h works on Python 2.7 as well, even on the old Visual Studio 2008 used by Python 2.7 on Windows (the header contains a specific hack for the inline keyword just for this compiler).
I'm the maintainer of the https://github.com/pythoncapi/pythoncapi_compat project. I just added the PyPy support. For example, to get the PyPy support, you have to update your copy of the header file (it's an example, I updated my PR to fix the PyPy CI job ;-) ). But unless you want to fix the support for a specific platform, the only reason to update the file is if you need a new function which is added to the header file. In short, you don't have to update it. I plan to propose this header file as a long-term solution to evolve the C API in CPython (to introduce incompatible C API changes). It's part of my larger plan https://www.python.org/dev/peps/pep-0620/ to hide implementation details from the Python C API. While I don't think that pythoncapi_compat.h should be shipped by Python, I hope that it can be endorsed somehow by the Python (CPython) project. For example, my plan is to move it into the GitHub PSF organization ( https://github.com/psf/ ) where I already moved the pyperf project (I would like to propose the idea at my Language Summit talk). I also proposed a talk about pythoncapi_compat.h at the next Python Language Summit: https://us.pycon.org/2021/summits/language/ I don't know yet if it will be accepted. Sadly, my other talk about the C API was not selected at Pycon US.
I proposed a similar PR for the pybind11 project: pybind/pybind11#2932 I added PyPy support to pythoncapi_compat.h in pybind11, to fix the PyPy job of pybind11 ;-)
@vstinner I think it is good you changed up the
If you don't modify Py_REFCNT, Py_TYPE, or Py_SIZE, you usually don't need pythoncapi_compat.h ;-) When I started to work on that, I didn't expect that anyone would modify ob_refcnt, ob_type and ob_size directly. But I am always surprised how people abuse APIs. As soon as something is possible, someone somehow will abuse it :-) numpy PyArray_Pack() allocates a Python object on the stack:
I didn't know that it was even technically possible :-) It sounds like a micro-optimization to avoid allocating it on the heap memory (slower). Py_SET_TYPE() is usually needed when defining a type statically, to work around Visual Studio linker issues. The linker fails to find the parent type (symbol address) when the static type refers to it. So Py_SET_TYPE() must sometimes be called at module initialization. It's no longer needed if you define types as heap types, since the parent type is passed at runtime, when the heap type is defined with a function call. Py_SET_SIZE() is the correct way to set the PyVarObject.ob_size member when allocating a new "variable" object (PyVarObject). See numpy PyArray_Scalar().
Seems like a universal law of downstream users... I did feel dirty writing that code ;), but luckily this is largely a "temporary" workaround and should not be hard to fix (rather, it will be easier after adding some new API). This one won't rob my sleep; the creation of C-extension metaclasses may well do that though :). We probably need these hacks at least for a short while. But we should include it in It's maybe not the nicest thing... But I would be OK with just deleting these macros from
I'll bring in @tacaswell as he seems to keep up with these things.
The test failure is unrelated.
IMO the most mature option for that is Cython. There is also the promising HPy project: https://hpyproject.org/ Both support multiple versions of CPython and PyPy. pythoncapi_compat.h is only a very light "abstraction", designed for C extensions written directly with the Python C API, which is the case for numpy.
Done, I added a release note, thanks for the guidance.
I don't want to argue the merits of various C-API changes here and am sorry I opened that can of worms. The question in my mind is: do we want to replace 17 lines of code with over 300 lines of other code?
Regarding C99 support, depending on the exact features used, this might require vs2019 on the windows side. MSVC has never claimed full C99 compatibility AFAIK, but it supports C11 (which inter alia removed some things that were mandatory in C99) since the very recent Visual Studio 16.8. However, it looks like numpy has done fine so far on windows since the bump to C99 (at least for the subset being used)? Regarding downstream (particularly scipy), there was some discussion on this in scipy/scipy#13713. Even though the universal C runtime promises that it should be possible to build (e.g.) scipy with vs2019 against a cpython compiled with vs2019, @rgommers had some concerns about this. To further this discussion, I opened conda-forge/scipy-feedstock#165 (& scipy/scipy#13758), where Isuru from conda-forge/core warned that this works for conda, but likely has issues for wheels that require C++ or libs that might be statically linked, reinforcing the link to the VS-version used to compile cpython (which needs upgrading). I'm currently waiting for an answer from Steve Dower in a thread about these things on discuss. FWIW, I support moving forward on this regardless of the above.
I tried to elaborate in the release note what C99 features are needed by pythoncapi_compat.h:
Visual Studio 2017 supports this very well. I don't recall if VS 2013 supports this or not. I know that VS 2008 doesn't support the inline keyword, but this version is no longer supported by Microsoft. It's still required to build Python 2.7 on Windows, but numpy no longer supports Python 2.7. FYI Python has required a subset of C99 since Python 3.6 (released in December 2016); so far we didn't get any issues with C compilers. Python supports many platforms (Windows, macOS, Linux, FreeBSD, AIX, OpenBSD, etc.) and many C compilers (old GCC, old clang, AIX XLC, Intel ICC, etc.). Sadly, C99 is still not fully supported by all C compilers :-( C11 support is worse. I'm disappointed by the fact that C11 means 2011: this standard is already 10 years old! C99 means 1999: 22 years old!!! If you avoid "exotic" C99 features, you will be fine. Python started to use
@h-vetinari when you say
Do you mean C99 in general or this PR in particular? If the latter, could you relate to my comment about adding unneeded code? |
numpy implements the Py_SET_xxx() functions as macros. pythoncapi_compat.h provides the implementation from CPython, which uses static inline functions. It avoids subtle differences depending on the Python version (using the numpy macros on old Python, static inline functions on recent Python). It's also a matter of maintenance: I offer to maintain pythoncapi_compat.h, so you don't have to write such macros nor maintain them. I proposed a similar PR to Cython, but it was rejected (cython/cython#3934) since Cython prefers to use its own code for that. Tomorrow, pythoncapi_compat.h will likely grow to get more getter functions for structures (PyObject, PyTypeObject, PyFrameObject, etc.), since the https://www.python.org/dev/peps/pep-0620/ long-term plan is to avoid accessing any structure member directly. But this PEP remains a draft, so it's up to you whether to wait until the PEP is accepted or rejected. You can try to completely avoid the Py_SET_TYPE(), Py_SET_SIZE(), Py_SET_REFCNT() and Py_SETREF() functions. But I suggest you keep at least Py_SETREF(): this one helps to write correct code.
After a long conversation with @vstinner we agreed that in this particular case, the additional header is not worth the code addition.
Ok, I'm fine with that. I may propose using pythoncapi_compat.h again later, when the numpy compatibility layer becomes too large ;-)