-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MAINT Use Cython memoryviews #10624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Probably the main limitation is that cython memoryviews cannot be used in read-only. |
Right, good to know @glemaitre ! That is unfortunate. Also related #7981 .Looks like not much can be done about it at the moment #7981 (comment) . Keeping this issue open in case this gets resolved upstream in Cython. |
maybe we need to persuade the numpy people that they should spend their
funding on fixing this...
|
I also note that we have used pointers instead of memoryviews for similar
reasons, making it harder to debug out of range errors
|
Well, "[spend a] small part of it" might work better :) Also, as far as I understand from cython/cython#1869 (comment), PR has an initial implementation, and providing some more usage tests for it would help to move it forward. I'm not familiar enough with the way memoryviews are treated in joblib to do that, unfortunately.. |
From scipy roadmap,
so I don't understand why the fact that scipy uses memoryviews is not an issue but migrating internal cython code is? I imagine scipy won't use const memoryviews by default . cc @lesteve @glemaitre Also Cython 0.28.b1 is out, what should be the procedure to test it? Currently travis uses mostly 0.26.1. |
It is an issue as soon as you are using a cython function taking memoryview as arguments in a
Great to hear that a 0.28 beta is out! For the record, the scipy-dev-wheels build is run daily and uses the cython-dev wheel. If you are talking more about how to test the const memoryview cython fix, a first good step would be to pick one of the existing issues (#7981 seems like a good candidate for example), adding I have a plan to add something to test_common.py which monkeypatches the joblib.Parallel max_nbytes default and set it to 0 as an attempt to detect more failures. I haven't got around to it yet. |
I know, I meant that generally speaking, I guess it might become an issue more frequently if scipy proceeds with the migration.
Good to know. 8000
Yes, but my point is that as long as CI uses cython < 0.28, we won't be able to add a PR testing that on different OS and Python versions because tests would fail. So I'm wondering what should/can be done about it.
That would be very useful. |
What about creating a scikit-learn branch and experimenting with using const memoryview (both with cython <= 0.27 and cython > 0.27) ? I have started an experiment on my fork here. From this little experiment this seems feasible (although not completely trivial, mutiple trial and errors ...) to have a common cython codebase for cython <= 0.27 and cython > 0.27. |
I don't think there's any good reason for us to not pin the minimum Cython
version wherever we like..?
|
Sure we can require cython>=0.28 when it is out. Given that cython 0.28b1 has been announced, this was more meant as an experiment to provide feed-back to cython in case we notice any problems. |
FYI, I tried to revert #4775 and use memoryviews in sklearn/linear_model/cd_fast.pyx. This does not seem straightforward (a Travis log of the error with cython master is here) at all at the moment with cython. The main blockers seems to be:
|
Thanks for looking into it @lesteve ! I guess hoping this would work straight away was a bit too optimistic ... |
FYI I have emailed cython-users with the main issues I found: https://groups.google.com/d/msg/cython-users/7B_RNQXefC8/ryDyFUV7AAAJ. Let's see what they say ... |
Thanks for sending that email @lesteve ! After discussing this topic with @massich @jorisvandenbossche and @glemaitre , if we can't make const (ro) memoryviews work with fused types, one way around it could be to use the |
I tried to ask nicely but it looks like 0.28 will ship without the fix, but there is some hope that the bug will be looked at and fixed in the next minor release. Oh well at least I tried ... I guess tempita could work as another work-around. My feeling is that it is a bit painful of adding layers of work-arounds that may have to be undone if the bug gets fixed eventually though. |
Regarding the |
@jakirkham I saw that, is it a pretty impressive (but as you say, somewhat hacky) workaround. To put something like this in scikit-learn it would be better if cython/cython#1772 was just fixed altogether and I would really like to for it to happen. With @lesteve we spend half a day trying to investigate this in Cython, with limited success, I'll post status updates on that issue. |
Thanks for sharing the update. That's fair. Though I think we will need a Cython expert to direct us and/or possibly help us do some of the work. The advantage of the workaround is we are able to get something running today with a minimal amount of "hackiness" and effort. Plus we can mirror the behavior Cython will have for end users. |
Hi all, Does this all mean that even new contributions shouldn't use memory views? I am working on #12807 and I am extensively using memory views. Just want to make sure I won't have to change back to pointers later on. Thanks! |
8000
You can't use memviews if both following conditions are met:
otherwise it's fine. For the first it means you can't use for data which can be provided by the user (typically X and y in |
In @scoder blog post about Cython 0.29:
|
A PR has been recently been merged in cython to allow |
|
Until Cython 3 arrives, a possible temporary fix for |
FYI, the next bug-fix release of Cython (ETA January 2023) will provide the fix for cython/cython@78bb640#diff-ff3c479edefad986d2fe6fe7ead575a46b086e3bbcf0ccc86d85efc4a4c63c79R20-R21 |
Looks like that landed in Cython 0.29.33, which was recently tagged 🚀 |
That's great news. I opened #25322 to discuss increasing the minimum Cython version. |
It is now possible to use I have a draft of an issue to give indications, yet I think opening it might create a clash with this one? Edit: removed the draft and have opened #25484 |
FYI, scipy/scipy#14262 might be a blocker for const memoryviews when we call scipy C API for BLAS functions. |
We use wrappers around SciPy interfaces to BLAS'; those are defined in |
Are the wrappers already ignoring |
They might. Some compilers are stricter than others and raise a compilation errors (e.g. when using C++ on some compilation units) whilst others are just raising warning when discarding There's one place where we explicitly discard scikit-learn/sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx.tp Lines 389 to 393 in 3f82f84
I think that we can work around this limitation even if scipy/scipy#14262 remains unsolved. |
FYI, I have opened #25484 to list actionable improvements in this regards. |
I am going to close this one since I guess #25484 has been merged a while ago. Feel free to reopen if I missed something. |
Currently a fair amount of Cython code uses the numpy buffer interface.
When possible, it could be worth migrating some of it to the typed memoryviews Cython interface that has lower overhead and could result in some performance improvements [1], [2]. This would also allow to release GIL more frequently which would be positive for parallel computations with the threading backend.
I'm not saying that it makes sense to use memoryviews everywhere, and there are subtle differences between the two (possibly related #7059 (comment)). Just opening this issue to track progress.
For instance,
sklearn/utils/sparsefuncs_fast.pyx
could be a place to start.The text was updated successfully, but these errors were encountered: