8000 ENH: supply our version of numpy.pxd, requires cython>=0.29 by mattip · Pull Request #12284 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: supply our version of numpy.pxd, requires cython>=0.29 #12284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 4, 2019

Conversation

mattip
Copy link
Member
@mattip mattip commented Oct 29, 2018

Fixes #11803

Adds our own numpy.pxd, copied from the cython __init__.pxd. It will only be used if the numpy include path from numpy.get_includes() is added to the cython compile flags, otherwise the cython one will still be used.

Modifications were needed to mtrand.pyx since ndarray.shape is a C npy_intp in the newer numpy.pxd, not a python-level sequence attribute.

@mattip
Copy link
Member Author
mattip commented Oct 29, 2018

Also the numpy.pxd uses the Cython check_size ignore syntax to avoid the "size changed" warnings, which requires cython 0.29 and above. Once downstream users update, we can remove the warning filters from numpy.__init__.py. xref #11788

NPY_SEARCHRIGHT

enum:
# DEPRECATED since NumPy 1.7 ! Do not use in new code!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we just added this, is there any reason to keep the deprecated headers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as we have npy_1_7_deprecated_api.h for people who are still using the old API (scipy, pandas, ...), we should reflect old values in Cython.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to truly deprecate the old API, we need either cython/cython#2640 or to revert the attribute hiding PyArray_FUNCTIONS since cython does not convert python attribute lookup array.ndim into a getter function PyArray_NDIMS(array)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need array.ndim to be converted into PyArray_NDIMS(array)? It would be good to have written down somewhere what problem is trying to be solved here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cython class overview gives the motivation: write python-like code, run it through the cython compiler and get efficient C code out. Extension type spec dives into the details of how to map python object attributes to C struct fields.

We try to hide the C struct fields in PyArrayObject_fields behind C getter functions to the opaque PyArrayObject struct, so the techniques described in the links above are not suffi 8000 cient.

@mattip
Copy link
Member Author
mattip commented Oct 30, 2018

Builds are failing since LGTM and the CHROOT travis build are not using Cython 0.29

@mattip
Copy link
Member Author
mattip commented Oct 30, 2018

xref cython/cython#2498

@mattip mattip force-pushed the numpy.pxd branch 2 times, most recently from 88b8d15 to 4068465 Compare October 31, 2018 08:26
@mattip
Copy link
Member Author
mattip commented Oct 31, 2018

after cherry picking the changes from #12299, travis and lgtm are now passing

@@ -35,6 +33,17 @@ cdef extern from "math.h":
double sin(double x)
double cos(double x)

# Put initarray first since it includes the NPY_NO_DEPRECATED_API definition
cdef extern from "Python.h":
ctypedef int Py_intptr_t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the fact that this is missing from the Python.pxi an upstream bug?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which Python.pxi? The cdef extern from "Python.h" tells Cython to add #include Python.h at this point in the output C file, and promises that that header file (or one of the headers included by it) will include a typedef int Py_intptr_t statement.

Copy link
Member
@eric-wieser eric-wieser Nov 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one - my assumption was that this header already includes all of the cdef externs that match to Python.h. Is that not true?

Edit: Seems that's our own file

else:
return ()

cdef inline char* _util_dtypestring(dtype descr, char* f, char* end, int* offset) except NULL:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function worries me - can we not reuse our PEP3118 format string builder, rather than having a second implementation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. _buffer_format_string is buried pretty deep inside the code. I wonder what the performance hit would be if we went via PyObject_GetBuffer()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched this all for PyObject_GetBuffer() which calls array_getbuffer, and a NOOP __release_buffer__, with a note explaining why it is a NOOP currently.

At some point if we drop our _buffer_info_cache we should revisit this code

@mattip mattip changed the title ENH: supply our version of numpy.pxd, requires cython>=0.29 WIP: ENH: supply our version of numpy.pxd, requires cython>=0.29 Nov 18, 2018
@mattip mattip force-pushed the numpy.pxd branch 3 times, most recently from ad1629f to 29ae6b0 Compare January 1, 2019 08:26
@mattip mattip mentioned this pull request Mar 21, 2019
@mattip mattip removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Mar 21, 2019
@jdemeyer
Copy link
Contributor

I don't understand why you hide numpy.pxd in the include subdirectory.

Typically, the layout of .pxd files follows that of the Python project. Think of a cimport as analogous to an import. So, if you expect users to do cimport numpy, then it makes more sense to move numpy.pxd to a top-level __init__.pxd file (more or less the way it is in Cython).

@mattip
Copy link
Member Author
mattip commented Mar 24, 2019

@jdemeyer whatever works correctly. I am trying to get this to work with pandas, so that python setup.py build_ext --inplace --force will find the numpy pxd before the one in Cython/Include/numpy/__init__.pxd. It seems no matter what I do the cython one is found first (I added a line like x = xxxxxx in the file to make compilation fail if the wrong one is chosen). What I tried:

  • using the include_path = pkg_resources.resource_filename('numpy', 'core/include')
  • putting it in site-packages/numpy/__init__.pxd (your suggestion)

Neither of those helped. But just now I tried using include_path = pkg_resources.resource_filename('numpy', 'core/include/numpy') which seems to work

@mattip mattip removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Aug 19, 2019
@mattip mattip requested a review from seberg August 19, 2019 11:35
@rgommers rgommers added this to the 1.18.0 release milestone Aug 28, 2019
Copy link
Member
@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems right to me to move it into numpy (and I think there is general consensus here). Since it moved around, just to be sure: numpy.get_includes() does not need to be modified for dependencies to pick up our numpy.pxd file?

Unfortunately, there are some merge conflicts (I can fix them up maybe though).

@mattip
Copy link
Member Author
mattip commented Sep 3, 2019


Add our own ``*.pxd`` cython import file
--------------------------------------------
Added a ``numpy/__init__.pxd`` file. It will be used for `cpython import numpy`
Copy link
Member
@seberg seberg Sep 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Added a ``numpy/__init__.pxd`` file. It will be used for `cpython import numpy`
Added a ``numpy/__init__.pxd`` file. It will be used for ``cimport numpy as cnp``

is what you mean I think? And I am not sure we still need the search path link (at least not as a link).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks

@seberg
Copy link
Member
seberg commented Sep 3, 2019

Is there a difference between this pyd and the one in cython that can explain the windows crash? Or is that a random crash due to OpenBLAS or so?

@mattip
Copy link
Member Author
mattip commented Sep 4, 2019

seems like a random crash

@seberg
Copy link
Member
seberg commented Sep 4, 2019

Thanks, lets put this in then.

@seberg seberg merged commit 3883be3 into numpy:master Sep 4, 2019
@jakirkham
Copy link
Contributor

Thanks all for the work on this! 😀

@scoder
Copy link
Contributor
scoder commented May 5, 2020

@mattip I checked in my local numpy 1.18.3 installation and it lacks the numpy/__init__.pxd file that this PR added. My guess is that it still needs to be added to the distutils package_data in order to be included not just in the sdist but also in binary distributions.

@scoder
Copy link
Contributor
scoder commented May 5, 2020

Although … the .pxd files in the numpy/random/ directory are included. So, why it numpy/__init__.pxd not in my wheel?

@jakirkham
Copy link
Contributor

Not sure. Looks like they should all be there.

@scoder
Copy link
Contributor
scoder commented May 5, 2020

they should all be there

Yes, that's the MANIFEST file that's used for the sdist. It doesn't have an effect on the binary distributions. However, I couldn't find any package data configuration or so in NumPy's setup.py. So how do the files in numpy.random get included?

@scoder
Copy link
Contributor
scoder commented May 5, 2020

Ah, found it, I think.

@mattip
Copy link
Member Author
mattip commented May 5, 2020

@scoder gh-16162, thanks for the heads up.

Comment on lines +259 to +270
# Note: This syntax (function definition in pxd files) is an
# experimental exception made for __getbuffer__ and __releasebuffer__
# -- the details of this may change.
def __getbuffer__(ndarray self, Py_buffer* info, int flags):
PyObject_GetBuffer(<object>self, info, flags);

def __releasebuffer__(ndarray self, Py_buffer* info):
# We should call a possible tp_bufferrelease(self, info) but no
# interface to that is exposed by cython or python. And currently
# the function is NULL in numpy, we rely on refcounting to release
# info when self is collected
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NumPy can just leave these out entirely. They were added in the time when Python 2.6/3.0 were still new and the average NumPy installation on user side didn't support the PEP 3118 buffer protocol yet. The times have changed a bit since then. Even Cython can probably drop these by now. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cython first. I think there are tests to verify that the buffer protocol works for ndarray in cython.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buffer protocol does work for ndarrays, because NumPy supports it natively. The implementation through PyObject_GetBuffer() shows this clearly (Cython still uses a real one). In fact, since __releasebuffer__ is empty, releasing buffers will not even work here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed both methods in cython/cython@b794997

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing here too

scoder added a commit to cython/cython that referenced this pull request May 6, 2020
…umPy 'ndarray' since there are probably no NumPy installations out there anymore that do not support the buffer protocol themselves and are still worth supporting.

See numpy/numpy#12284 (comment)
See GH-3573
object PyArray_Choose (ndarray, object, ndarray, NPY_CLIPMODE)
int PyArray_Sort (ndarray, int, NPY_SORTKIND)
object PyArray_ArgSort (ndarray, int, NPY_SORTKIND)
object PyArray_SearchSorted (ndarray, object, NPY_SEARCHSIDE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, looks like you are right. Since 2012 it has been

PyArray_SearchSorted(ndarray, object,  NPY_SEARCHSIDE, object)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it safe to add an argument when its NULL?! Or was this just always wrong?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Providing Public Cython API within NumPy
10 participants
0