8000 MAINT: Use pythoncapi_compat.h in npy_3kcompat.h by vstinner · Pull Request #18713 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

MAINT: Use pythoncapi_compat.h in npy_3kcompat.h #18713

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

MAINT: Use pythoncapi_compat.h in npy_3kcompat.h #18713

wants to merge 4 commits into from

Conversation

vstinner
Copy link
Contributor
@vstinner vstinner commented Apr 1, 2021

pythoncapi_compat.h provides new Python functions on old Python
functions, like Py_NewRef(). It supports Python 2.7 but requires ISO
C99 support (with a partial ISO C90 support for Python 2.7).

Remove Py_SET_TYPE(), Py_SET_SIZE(), Py_SET_REFCNT(), Py_SETREF():
they are now provided by pythoncapi_compat.h.

pythoncapi_compat.h provides new Python functions on old Python
functions, like Py_NewRef(). It supports Python 2.7 but requires ISO
C99 support (with a partial ISO C90 support for Python 2.7).

Remove Py_SET_TYPE(), Py_SET_SIZE(), Py_SET_REFCNT(), Py_SETREF():
they are now provided by pythoncapi_compat.h.
@vstinner
Copy link
Contributor Author
vstinner commented Apr 1, 2021

pythoncapi_compat.h is copied from the https://github.com/pythoncapi/pythoncapi_compat project.

This change replaces the numpy custom implementation of Py_SET_TYPE(), Py_SET_SIZE(), Py_SET_REFCNT() and Py_SETREF() functions with the pythoncapi_compat.h implementation which comes from CPython (sometimes with minor changes to adapt it to older Python versions).

The main drawback is that pythoncapi_compat.h requires ISO C99 support, whereas numpy has compatibility with GCC -ansi and ISO C90. I'm not sure if requiring ISO C99 support is acceptable right now in numpy? numpy still has compatibility layer with Python 2.7 for example (pythoncapi_compat.h supports Python 2.7).

cc @mattip

@vstinner
Copy link
Contributor Author
vstinner commented Apr 1, 2021

cc @charris (who added Py_SET_TYPE and Py_SET_SIZE), @tacaswell and @eric-wieser (fixed return value of these macros).

@charris
Copy link
Member
charris commented Apr 1, 2021

@vstinner NumPy has required c99 since 1.17.0, but I don't know about downstream projects that may use the npy_3compat file. Looks like SciPy only dropped it in 1.5.0. Is there anything except the comment style that needs it? I'm also curious if the pythoncapi_compat.h file is going to be maintained.

#endif

#include <Python.h>
#include "frameobject.h" // PyFrameObject, PyFrame_GetBack()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "..." for frameobject.h? It seems to be included in the python include directory in my distro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frameobject.h is not included by Python.h :-( I'm not sure why.

I partially fixed this issue in Python 3.10: Python.h now includes a new pyframe.h header file that I added, it defines PyFrameObject, PyFrame_GetLineNumber() and PyFrame_GetCode(), but not PyFrame_GetBack() (excluded from the limited C API).

ob->ob_refcnt = refcnt;
}
#define Py_SET_REFCNT(ob, refcnt) _Py_SET_REFCNT(_PyObject_CAST(ob), refcnt)
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a thing? It will horribly break non-CPython implementations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattip I see that it is already used in numpy/core/src/multiarray/array_coercion.c, is that in error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either:

  • all the fields are guaranteed to be zeroed out in that allocation, in which case we could use Py_INCREF instead, or
  • we could be returning random data in other fields of the struct.

So I think use of that "convenience function" could enable masking possible bugs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a thing? It will horribly break non-CPython implementations.

The _PyObject_CAST() is required to prevent introducing new compiler warnings when Py_REFCNT(obj) = refcnt; is replaced with Py_SET_REFCNT(obj, refcnt);. Many functions of the Python C API were implemented as macro which casted their argument to PyObject*, but static inline functions define a type for their parameters (usually PyObject*).

In Python 3.10, Py_REFCNT(obj) = refcnt; now fails with a compiler error: Py_SET_REFCNT() must be used. See:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize it was added to npy_3kcompat.h in #16899.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... which was needed for python3.10.

The root cause was 1035c3f in #16200 which changed

PyArrayObject_fields arr_fields = {
            .ob_base.ob_refcnt = 1,
            .ob_base.ob_type = &PyArray_Type,
             .flags = NPY_ARRAY_WRITEABLE,  /* assume array is not behaved. */
         };

to

PyArrayObject_fields arr_fields = {
             .flags = NPY_ARRAY_WRITEABLE,  /* assume array is not behaved. */
         };
    Py_SET_TYPE(&arr_fields, &PyArray_Type);
    Py_REFCNT(&arr_fields) = 1;

and Py_REFCNT went away in python3.10

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that spot should really be phased out soon, hopefully (there might be a similar one in other "casting" code that also should be used less soon hopefully).
At that point this would only be necessary as a backward compatibility hack. And that means that the performance gain of using a stack allocation stops mattering (I am not sure how big it is, but I assume it is pretty significant here).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for getting rid of this code.

@charris
Copy link
Member
charris commented Apr 1, 2021

A release note would be helpful because of the c99 requirement, probably under compatibility, see doc/release/upcoming_changes for examples.

@mattip
Copy link
Member
mattip commented Apr 1, 2021

Including this new header in NumPy, which is widely used in the scientific community, will enable more people to use these functions. Are we sure we want them as part of our public C-API? Some of them seem to enable dangerous behaviour, like Py_SET_REFCNT and Py_SET_SIZE. If anything, I would prefer to shrink npy_3kcompat.h (since we have dropped versions of python before 3.7) instead.

@vstinner
Copy link
Contributor Author
vstinner commented Apr 1, 2021

Some of them seem to enable dangerous behaviour, like Py_SET_REFCNT and Py_SET_SIZE.

Py_REFCNT(), Py_TYPE() and Py_SIZE() are defined as macro in CPython which can be abused to set the value:

Py_REFCNT(obj) = refcnt;
PY_TYPE(obj) = type;
PY_SIZE(obj) = size;

Not only it's possible, but numpy actually used that until very recently ;-)

  • a96b18e uses Py_SET_SIZE() and Py_SET_TYPE()
  • be14f37 uses Py_SET_REFCNT()

I added Py_SET_xxx() functions so PyPy (and other Python implementations) can decide to implement it differently than modifying directy the PyObject structure member. For example, Py_SET_REFCNT() could do nothing in PyPy.

Py_REFCNT(obj) = refcnt; is a compiler error since Python 3.10. I also made a similar change for Py_TYPE and Py_SIZE, but I had to revert it since it broke too many C extensions (including numpy, but numpy is now fixed!).

@vstinner
Copy link
Contributor Author
vstinner commented Apr 1, 2021

Are we sure we want them as part of our public C-API?

I didn't add it to the numpy public C API on purpose. Is there a way to "hide" it or even not add it to the public C API? The problem is that I would like to use it from another public C API header file ...

In the long term, my plan is to ship pythoncapi_compat.h directly in Linux distributions, rather than copying it to each project. But first I need earlier adopters ;-) https://github.com/pythoncapi/pythoncapi_compat#examples-of-projects-using-pythoncapi_compath

I would like to follow the same migration path than the Python six module for the Python 2 to Python 3 migration.

@vstinner
Copy link
Contributor Author
vstinner commented Apr 1, 2021

NumPy has required c99 since 1.17.0, but I don't know about downstream projects that may use the npy_3compat file. Looks like SciPy only dropped it in 1.5.0.

Well, I created this PR to ask you (numpy devs) the question ;-) I don't know how the numpy C API is used.

Is there anything except the comment style that needs it?

pythoncapi_compat.h mostly requires ISO C99 to use static inline to define static inline functions. CPython is now doing the same. More and more old macros are converted to clean static inline functions. For example, Py_INCREF() is now defined as a static inline function. It's convenient since profilers and debuggers are able to retrieve static inline function names!

In practice, pythoncapi_compat.h works on Python 2.7 as well, even on the old Visual Studio 2008 used by Python 2.7 on Windows (the header contains a specific hack for the inline keyword just for this compiler).

I'm also curious if the pythoncapi_compat.h file is going to be maintained.

I'm the maintainer of the https://github.com/pythoncapi/pythoncapi_compat project. I just added the PyPy support. For example, to get the PyPy support, you have to update you copy of the header file (it's an example, I udpated my PR to fix the PyPy CI job ;-) ). But unless you want to fix the support for a specific platform, the only reason to update the file is if you need a new function which is added to the header file. In short, you don't have to update it.

I plan to propose this header file as a long term solution to evolve the C API in CPython (to introduce incompatible C API changes). It's part of my larger plan https://www.python.org/dev/peps/pep-0620/ to hide implementation details from the Python C API. While I don't think that pythoncapi_compat.h should be shipped by Python, I hope that it can be endorsed somehow by the Python (CPython) project. For example, my plan is to move it in the GitHub PSF organization ( https://github.com/psf/ ) where I already moved the pyperf project (I would like to propose the idea at my Language Summit talk).

I also proposed a talk about pythoncapi_compat.h at the next Python Language Summit: https://us.pycon.org/2021/summits/language/ I don't know yet if it will be accepted. Sadly, my other talk about the C API was not selected at Pycon US.

@vstinner
Copy link
Contributor Author
vstinner commented Apr 1, 2021

I proposed a similar PR for the pybind11 project: pybind/pybind11#2932 I added PyPy support to pythoncapi_compat.h to pybind11, to fix the PyPy job of pybind11 ;-)

@mattip
Copy link
Member
mattip commented Apr 1, 2021

@vstinner I think it is good you changed up the Py_REFCNT, PY_TYPE, and PY_SIZE macros. I cannot speak for other projects. I would think that in general including this header file is a last-resort admission from a project that it will forever be using problematic C-API constructs, and it would be better if we could avoid it in NumPy.

@vstinner
Copy link
Contributor Author
vstinner commented Apr 1, 2021

@vstinner I think it is good you changed up the Py_REFCNT, PY_TYPE, and PY_SIZE macros. I cannot speak for other projects. I would think that in general including this header file is a last-resort admission from a project that it will forever be using problematic C-API constructs, and it would be better if we could avoid it in NumPy.

If you don't modify Py_REFCNT, PY_TYPE, or PY_SIZE, you usually don't need pythoncapi_compat.h ;-)

When I started to work on that, I didn't expect that anyone would modify directly ob_refcnt, ob_type and ob_size. But I am always surprised how people abuse APIs. As soon as something is possible, someone somehow will abuse it somehow :-)

numpy PyArray_Pack() allocates a Python object on the stack:

    PyArrayObject_fields arr_fields = {
            .flags = NPY_ARRAY_WRITEABLE,  /* assume array is not behaved. */
        };
    Py_SET_TYPE(&arr_fields, &PyArray_Type);
    Py_SET_REFCNT(&arr_fields, 1);

I didn't know that it was even technically possible :-) It sounds like a micro-optimization to avoid allocating it on the heap memory (slower).

Py_SET_TYPE() is usally needed when defining a type statically to workaround Visual Studio linker issues. The linker fails to find the parent type (symbol address) when the static type refers to it. So Py_SET_TYPE() must be called sometimes at the module initialization. It's no longer needed if you define types as heap types, since the parent type is passed at runtime, when the heap type is defined with a function call.

Py_SET_SIZE() is the correct way to set PyVarObject.ob_size member when allocating a new "variable" object (PyVarObject). See numpy PyArray_Scalar().

@seberg
Copy link
Member
seberg commented Apr 1, 2021

But I am always surprised how people abuse APIs. As soon as something is possible, someone somehow will abuse it somehow :-)

Seems like a universal law of downstream users... I did feel dirty writing that code ;), luckily this is largely a "temporary" workaround and should not be hard to fix (but requires its easier after adding some new API). This/these one won't rob my sleep. The creation of C-extension metaclasses may well do that though :).


We probably need these hacks at least for a short while. But we should include it in numpy/core/src/npy_pycompat.h and not here. (This ends up as public API and I don't think anyone wants that.)

Its maybe not be the nicest thing... But I would be OK with just delete these macros from npy_py3kcompat.h entirely with a release note if that is the practical thing to do. All serious projects have to compile against old NumPy versions anyway, so nobody can reasonably be using them at this time. And even if someone managed to use it, the fix is should be both very quick and easy to find.

@charris
Copy link
Member
charris commented Apr 2, 2021

I'll bring in @tacaswell as he seems to keep up with these things.

@charris
Copy link
Member
charris commented Apr 2, 2021

The test failure is unrelated.

@vstinner
Copy link
Contributor Author
vstinner commented Apr 2, 2021

The creation of C-extension metaclasses may well do that though :)

IMO the most mature option for that is Cython. There is also the promising HPy project: https://hpyproject.org/ Both support multiple versions of CPython and PyPy. pythoncapi_compat.h is only a very light "abstraction", designed for C extensions written directly with the Python C API, which is the case for numpy.

@vstinner
Copy link
Contributor Author
vstinner commented Apr 2, 2021

@charris:

A release note would be helpful because of the c99 requirement, probably under compatibility, see doc/release/upcoming_changes for examples.

Done, I added a release note, thanks for the guidance.

@mattip
Copy link
Member
mattip commented Apr 2, 2021

I don't want to argue the merits of various C-API changes here and am sorry I opened that can of worms. The question in my mind is do we want to replace 17 lines of code with over 300 lines of other code.

1 similar comment
@mattip

This comment has been minimized.

@h-vetinari
Copy link
Contributor
h-vetinari commented Apr 2, 2021

@charris: NumPy has required c99 since 1.17.0, but I don't know about downstream projects that may use the npy_3compat file. Looks like SciPy only dropped it in 1.5.0. Is there anything except the comment style that needs it? I'm also curious if the pythoncapi_compat.h file is going to be maintained.

Regarding C99 support, depending on the exact features used, this might require vs2019 on the windows side - MSVC has never claimed full compatibility AFAIK, but it supports C11 (which inter alia removed some things that were mandatory in C99) since the very recent Visual Studio 16.8, However it looks like numpy has done fine so far on windows since the bump to C99 (at least for the subset being used)?

Regarding downstream (particularly scipy), there was some discussion on this in scipy/scipy#13713. Even though the universal C runtime promises that it should be possible to build (e.g.) scipy with vs2019 against a cpython compiled with vs2019, @rgommers had some concerns about this.

To further this discussion, I opened conda-forge/scipy-feedstock#165 (& scipy/scipy#13758), where Isuru from conda-forge/core warned that this works for conda, but likely has issues for wheels that require C++ or libs that might be statically linked, reinforcing the link to the VS-version used to compile cpython (which needs upgrading). I'm currently waiting for an answer from Steve Dower in a thread about these things on discuss.

FWIW, I support moving forward on this regardless of the above.

@vstinner
Copy link
Contributor Author
vstinner commented Apr 2, 2021

Regarding C99 support, depending on the exact features used, this might require vs2019 on the windows side - MSVC has never claimed full compatibility AFAIK, but it supports C11 (...)

I tried to elaborate in the release note what C99 features are needed by pythoncapi_compat.h:

The numpy C API now requires a subset of the ISO C99, especially static inline functions and C++-style line comments.

Visual Studio 2017 supports this very well. I don't recall if VS 2013 supports this or not.

I know that VS 2008 doesn't support the inline keyword, but this version is no longer supported by Microsoft. It's still required to build Python 2.7 on Windows, but numpy no longer support Python 2.7.

FYI Python requires a subset of C99 since Python 3.6 (released in December 2016), so far we didn't get any issues with C compilers. Python supports many platforms (Windows, macOS, Linux, FreeBSD, AIX, OpenBSD, etc.) and many C compilers (old GCC, old clang, AIX XLC, Intel ICC, etc.).

Sadly, C99 is still not fully supported by all C compilers :-( C11 support is worse. I'm disappointed by the fact that C11 means 2011: this standard is already 10 years ago! C99 means 1999: 22 years old!!! If you avoid "exotic" C99 features, you will be fine. Python started to use <stdint.h> for example ;-) I like the ability to finally just write int64_t.

@mattip
Copy link
Member
mattip commented Apr 2, 2021

@h-vetinari when you say

FWIW, I support moving forward on this regardless of the above.

Do you mean C99 in general or this PR in particular? If the latter, could you relate to my comment about adding unneeded code?

@vstinner
Copy link
Contributor Author
vstinner commented Apr 2, 2021

The question in my mind is do we want to replace 17 lines of code with over 300 lines of other code.

numpy implements Py_SET_xxx() functions as macros. pythoncapi_compat.h provides the implementation from CPython which uses static inline functions. It avoids subtle differences depending on the Python version (use the numpy macros on old Python, use static inline functions on recent Python).

It's also a matter of maintenance. I offer to maintain pythoncapi_compat.h, so you don't have to write such macro nor maintain them.

I propose a similar PR to Cython, but it was rejected (cython/cython#3934) since Cython prefers to use its own code for that.

Tomorrow, pythoncapi_compat.h will likely grow to get more getter functions to structures (PyObject, PyTypeObject, PyFrameObject, etc.), since https://www.python.org/dev/peps/pep-0620/ long term plan is to avoid accessing directly any structure member. But this PEP remains a draft, so it's up to you to wait until the PEP is accepted or rejected.


You can try to avoid completly Py_SET_TYPE(), Py_SET_SIZE(), Py_SET_REFCNT(), Py_SETREF() functions. But I suggest you to keep at least Py_SETREF(): this one helps to write correct code.

@mattip
Copy link
Member
mattip commented Apr 2, 2021

After a long conversation with @vstinner we agreed that in this particular case, the additional header is not worth the code addition.

@mattip mattip closed this Apr 2, 2021
@vstinner vstinner deleted the pythoncapi_compat branch April 2, 2021 15:58
@vstinner
Copy link
Contributor Author
vstinner commented Apr 2, 2021

After a long conversation with @vstinner we agreed that in this particular case, the additional header is not worth the code addition.

Ok, I'm fine with that. I may propose using pythoncapi_compat.h again later, when the numpy compatibility layer will become too large ;-)

@seberg
Copy link
Member
9BA8 seberg commented Apr 2, 2021

Thanks @vstinner and @mattip. I hope we can manage to keep up with the times and remove the warts that make the compatibility layer necessary rather than grow it :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0