[MRG] MNT: Use memoryviews instead of ndarrays #11722
Conversation
Wherever possible make sure to use pointers instead of NumPy `ndarray`s to access the underlying C data in the NumPy `ndarray`s in `enet_coordinate_descent`. This tends to be a bit faster in practice because some overhead of using the NumPy `ndarray` objects can be avoided (e.g. pointer arithmetic, type conversion, etc.). ref: https://jakevdp.github.io/blog/2012/08/08/memoryview-benchmarks/
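The Cython-level change itself can't be run in pure Python, but Python's built-in `memoryview` gives a rough feel for the zero-copy idea (an illustrative sketch, not the PR's actual code; Cython's typed memoryviews compile this kind of access down to raw pointer arithmetic):

```python
import numpy as np

# NumPy ndarrays expose the buffer protocol, so a memoryview shares the
# same underlying C buffer instead of going through ndarray indexing.
arr = np.arange(5, dtype=np.float64)
view = memoryview(arr)

# Reading through the view touches the same data ...
assert view[2] == arr[2] == 2.0

# ... and writing through it mutates the ndarray in place (no copy).
view[2] = 42.0
assert arr[2] == 42.0
```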
why pointers over typed memoryviews?
@jakirkham did you bench and observe an actual gain? Can you share numbers?
+1 for typed memoryviews; this won't be affected by that issue of read-only fused memoryviews #10624 (comment), I think?
So this function was mainly allocating pointers at the top, but they were not being used throughout, which is what this change did. Using memoryviews should also be equivalent performance-wise (as long as we are not creating views), since these are just pointers with some pointer arithmetic. That would be a different change, but I'd be ok with that. As long as we are not using …
@lesteve, any thoughts on this vs. switching to memoryviews?
The difference between pointers and memoryviews that matters is that Cython can help us debug out-of-bounds errors and similar with memoryviews, and cannot with pointers. We had avoided memoryviews mostly because they did not until recently support read-only memmaps. To the extent that that is fixed, I think anyone using pointers for arrays needs to argue their case.
As it is preferred in scikit-learn to use memoryviews due to their nicer checks and simpler semantics, switch to memoryviews instead of pointers. If memoryviews are sliced only to get an exact value, these remain as performant as pointers and are a bit easier to work with. They are also considerably faster to work with than NumPy ndarray objects. So this remains an improvement.
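The scalar-indexing vs. slicing distinction above mirrors NumPy semantics, so it can be demonstrated without Cython (an illustrative sketch):

```python
import numpy as np

a = np.arange(10, dtype=np.float64)

# Scalar indexing just reads one element; no new object wraps the buffer.
x = a[3]
assert x == 3.0

# Slicing copies no data, but it *does* allocate a new view object each
# time, which is the per-call overhead to avoid inside a hot loop.
s = a[2:5]
assert s.base is a        # the slice shares a's buffer
s[0] = -1.0
assert a[2] == -1.0       # writing via the slice mutates the original
```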
Force-pushed from 3b5acf9 to 99d94d9
To reiterate, I'm more than happy to use …
The only point that I'm interested in making is that … As to benchmarks, Jake's blog post referenced in the OP has some. Admittedly they are from ~6 years ago, but the findings are pretty similar to what I see today. Here are my numbers.
Basically everything has gotten a bit faster in the intervening time, but the spread and order of magnitude is about the same. Namely, memoryviews that generate no intermediate views, or pointers, will perform best.
Generally this LGTM, if only for readability and getting rid of pointer arithmetic, but I would also have expected this to speed things up at least a bit. With quick benchmarks, it doesn't seem to be the case:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(42)
n_samples, n_features = 100000, 100
X = rng.rand(n_samples, n_features)
y = rng.rand(n_samples)

estimator = ElasticNet()
```
On master:

```
In [4]: %timeit estimator.fit(X, y)
255 ms ± 4.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

With this PR:

```
In [6]: %timeit estimator.fit(X, y)
263 ms ± 3.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
Is this also the case in your use case, @jakirkham, or am I missing something?
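For reference, the `%timeit` runs above can be reproduced as a plain script with the stdlib `timeit` module (a sketch; absolute numbers will vary by machine, and only the master-vs-PR comparison matters):

```python
import timeit

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(42)
n_samples, n_features = 100000, 100
X = rng.rand(n_samples, n_features)
y = rng.rand(n_samples)

estimator = ElasticNet()
estimator.fit(X, y)  # warm-up run, so one-time costs are excluded

n_runs = 3
elapsed = timeit.timeit(lambda: estimator.fit(X, y), number=n_runs)
print("%.0f ms per fit" % (elapsed / n_runs * 1e3))
```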
Why expect much change in speed between pointers and memoryviews?
Pointers and memoryviews would be equivalent, but the title is "Use memoryviews instead of ndarrays" and that's what the issue description outlines. There are indeed a few replacements of ndarrays with memoryviews, but it doesn't seem to affect performance here, unless my benchmarks are off.
Using … TL;DR: a sizeable amount of time in this benchmark is spent in data checks and copying, so it doesn't really do the change justice. FWIW there has been some work to cut down on unneeded checks and avoid unnecessary copying generally, like PR ( #11487 ), PR ( #11693 ), and PR ( #11690 ). While these have improved the situation, it would still be desirable to cut down on these further. Suggestions on that front would be very welcome. 😄
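The check/copy cost described here can be seen with NumPy alone: the coordinate-descent path wants Fortran-ordered float64 input, so a C-ordered `X` gets copied during validation on each fit, while converting once up front makes later coercions free (a sketch of the idea; the exact validation steps are scikit-learn internals):

```python
import numpy as np

rng = np.random.RandomState(42)
X = rng.rand(1000, 100)          # C-ordered by default

# Coercing to Fortran order, as validation does, must copy a C-ordered input ...
X_checked = np.asfortranarray(X, dtype=np.float64)
assert X_checked.flags['F_CONTIGUOUS']
assert X_checked is not X        # a full copy was made

# ... but an input converted once up front passes through untouched.
X_f = np.asfortranarray(X)
assert np.asfortranarray(X_f, dtype=np.float64) is X_f   # no copy
```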
Please let me know if there is anything else.
What confuses me here is that we've not changed anything from being an ndarray, have we?
Any data coming in or space allocated is done with ndarrays. We are taking memoryviews onto them, which we use for access and assignment. That said, I agree we can probably just type everything as memoryviews outright, cutting down on some boilerplate. Have pushed this change.
Force-pushed from 18a159e to 5c1ec82
LGTM, thanks @jakirkham!
Performance-wise, with the above benchmark this doesn't seem to have much impact (including the latest version), but it goes in the general direction of #10624.
We were taking `ndarray`s as input and constructing `ndarray`s within `enet_coordinate_descent` only to convert them into `memoryview`s. Instead we can cut down on the boilerplate and overhead a bit by simply taking `memoryview`s as input and storing all NumPy arrays as `memoryview`s.
Since memoryviews don't have methods like `__pow__`, switch to using NumPy functions like `square` to perform these operations on the data, thus handling the type coercion needed.
For `memoryview`s defined in `enet_coordinate_descent`, go ahead and specify the striding. As these are all freshly created 1-D arrays, we know they are contiguous, so we can just specify this as part of their `memoryview` definitions.
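The contiguity claim can be checked from Python (a sketch; `double[::1]` is the Cython declaration syntax being referred to, not runnable here):

```python
import numpy as np

# Freshly allocated 1-D NumPy arrays are always contiguous, which is what
# justifies declaring them as contiguous memoryviews (`double[::1]` in
# Cython) rather than generic strided ones (`double[:]`).
w = np.zeros(25, dtype=np.float64)
assert w.flags['C_CONTIGUOUS']

# A strided view of the same buffer is not contiguous, so it would only
# be accepted by a generic strided memoryview declaration.
every_other = w[::2]
assert not every_other.flags['C_CONTIGUOUS']
```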
Force-pushed from 5c1ec82 to 2300ec0
Thanks all for the great feedback.