[MRG] MNT: Use memoryviews instead of ndarrays by jakirkham · Pull Request #11722 · scikit-learn/scikit-learn · GitHub

[MRG] MNT: Use memoryviews instead of ndarrays #11722


Merged

Conversation

jakirkham
Contributor
@jakirkham jakirkham commented Jul 31, 2018

Wherever possible, use memoryviews rather than NumPy ndarrays to access the underlying C data of the arrays in enet_coordinate_descent. This tends to be a bit faster in practice because some of the overhead of going through NumPy ndarray objects (e.g. pointer arithmetic, type conversion) can be avoided.

ref: https://jakevdp.github.io/blog/2012/08/08/memoryview-benchmarks/

Wherever possible, use pointers rather than NumPy `ndarray`s to access
the underlying C data of the arrays in `enet_coordinate_descent`. This
tends to be a bit faster in practice because some of the overhead of
going through NumPy `ndarray` objects (e.g. pointer arithmetic, type
conversion) can be avoided.

ref: https://jakevdp.github.io/blog/2012/08/08/memoryview-benchmarks/
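To illustrate the idea, here is a minimal Cython sketch (the function names are hypothetical, not the actual enet_coordinate_descent code): passing ndarray slices between functions allocates a Python object per call, while memoryview slices stay at the C level.

# cython: boundscheck=False, wraparound=False
import numpy as np
cimport numpy as cnp

cdef double row_sum_nd(cnp.ndarray[cnp.float64_t, ndim=1] row):
    # ndarray argument: every call from a loop receives a freshly
    # allocated Python slice object, which dominates the cost.
    cdef double s = 0.0
    cdef int j
    for j in range(row.shape[0]):
        s += row[j]
    return s

cdef double row_sum_mv(double[::1] row):
    # memoryview argument: the call passes only a pointer plus shape
    # and stride information, with no Python object allocation.
    cdef double s = 0.0
    cdef int j
    for j in range(row.shape[0]):
        s += row[j]
    return s

def row_totals(double[:, ::1] X):
    cdef double total = 0.0
    cdef int i
    for i in range(X.shape[0]):
        total += row_sum_mv(X[i])  # slicing a memoryview stays in C
    return total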
@jakirkham jakirkham changed the title MNT: Use pointers instead of ndarrays [MRG] MNT: Use pointers instead of ndarrays Jul 31, 2018
@jnothman
Member
jnothman commented Jul 31, 2018 via email

why pointers over typed memoryviews

@agramfort
Member

@jakirkham did you bench and observe an actual gain? Can you share numbers?

@rth
Member
rth commented Aug 1, 2018

why pointers over typed memoryviews

+1 for typed memoryviews; these won't be affected by the read-only fused memoryviews issue #10624 (comment), I think?

@jakirkham
Contributor Author

So this function was mainly allocating pointers at the top, but they were not being used throughout, which is what this change fixed. Using memoryviews should also be equivalent performance-wise (as long as we are not creating views), since these are just pointers with some pointer arithmetic. That would be a different change, but I'd be ok with it. As long as we are not using ndarrays internally (beyond an initial conversion to memoryviews), we should be ok.
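A minimal sketch of that distinction (hypothetical function; assumes a contiguous input):

cdef void scale(double[::1] w, double alpha):
    cdef int i
    # Scalar indexing compiles down to plain pointer arithmetic...
    for i in range(w.shape[0]):
        w[i] = alpha * w[i]
    # ...whereas slicing, e.g. taking `w[1:]` inside a hot loop, builds
    # a new memoryview slice object each time and loses that advantage.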

@jakirkham
Contributor Author

@lesteve, any thoughts on this vs. switching to memoryviews?

@jnothman
Member
jnothman commented Aug 5, 2018 via email

@jakirkham jakirkham changed the title [MRG] MNT: Use pointers instead of ndarrays [MRG] MNT: Use menoryviews instead of ndarrays Aug 7, 2018
@jakirkham jakirkham changed the title [MRG] MNT: Use menoryviews instead of ndarrays [MRG] MNT: Use memoryviews instead of ndarrays Aug 7, 2018
As memoryviews are preferred in scikit-learn due to their nicer checks
and simpler semantics, switch from pointers to memoryviews. If
memoryviews are only indexed to read individual values (no intermediate
slices), they remain as performant as pointers and are a bit easier to
work with. They are also considerably faster to work with than NumPy
ndarray objects, so this remains an improvement.
@jakirkham jakirkham force-pushed the use_ptrs__enet_coordinate_descent branch from 3b5acf9 to 99d94d9 on August 7, 2018 00:47
@jakirkham
Contributor Author

To reiterate, I am more than happy to use memoryviews (in fact I also prefer memoryviews). I was only making use of the pointers the code already had and wasn't about to make a bigger change than needed. Since we all seem to prefer memoryviews and there doesn't seem to be any dissent, I have made the change in my recent commits. 😄

@jakirkham
Contributor Author

The only point that I'm interested in making is that ndarray objects (even in Cython) are really slow. Switching to pointers or memoryviews (particularly when used properly) will result in a significant performance improvement.

As to benchmarks, Jake's blog post referenced in the OP has some. Admittedly they are from ~6 years ago, but the findings are pretty similar to what I see today. Here are my numbers.

  • Python + numpy: 2350 ms
  • Cython + numpy: 459 ms
  • Cython + memviews (slicing): 10.9 ms
  • Cython + raw pointers: 1.26 ms
  • Cython + memviews (no slicing): 1.20 ms

Basically everything has gotten a bit faster in the intervening time, but the spread and order of magnitude are about the same. Namely, raw pointers and memoryviews that generate no intermediate views perform best.
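For context, the blog post's benchmark kernel is a pairwise Euclidean distance; a rough sketch of its "memviews (no slicing)" variant looks like this (names are illustrative):

# cython: boundscheck=False, wraparound=False
from libc.math cimport sqrt
import numpy as np

def pairwise_dist(double[:, ::1] X):
    cdef int n = X.shape[0], m = X.shape[1]
    cdef int i, j, k
    cdef double d, tmp
    D = np.empty((n, n), dtype=np.float64)
    cdef double[:, ::1] D_mv = D
    for i in range(n):
        for j in range(n):
            d = 0.0
            for k in range(m):
                tmp = X[i, k] - X[j, k]  # direct indexing, no slices
                d += tmp * tmp
            D_mv[i, j] = sqrt(d)
    return D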

Member
@rth rth left a comment


Generally this LGTM, if only for readability and getting rid of pointer arithmetic, but I would also have expected this to speed things up at least a bit. With a quick benchmark, that doesn't seem to be the case:

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(42)

n_samples, n_features = 100000, 100
X = rng.rand(n_samples, n_features)
y = rng.rand(n_samples)
estimator = ElasticNet()

On master,

In [4]: %timeit estimator.fit(X, y)
255 ms ± 4.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

with this PR,

In [6]: %timeit estimator.fit(X, y)
263 ms ± 3.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Is this also the case in your use case, @jakirkham, or am I missing something?

@jnothman
Member
jnothman commented Aug 7, 2018

Why expect much change in speed between pointers and memoryviews?

@rth
Member
rth commented Aug 7, 2018

Why expect much change in speed between pointers and memoryviews?

Pointers and memoryviews would be equivalent, but the title is "Use memoryviews instead of ndarrays" and that's what the issue description outlines. There are indeed a few replacements of ndarrays with memoryviews, but it doesn't seem to affect performance here, unless my benchmarks are off.

@jakirkham
Contributor Author

Using lprun on estimator.fit, one can see that more than 50% of the time is spent in the input data check, a little less than 10% in _pre_fit, and 40% in enet_path. We can tweak this to turn off most of the checks. However, it still spends an inordinate amount of time in _pre_fit (~45%), which is basically all spent in _preprocess_data. Of that time, a little under 80% is spent doing a copy.

TL;DR a sizeable amount of time in this benchmark is spent in data checks and copying, so it doesn't really do the change justice. FWIW there has been some work to cut down on unneeded checks and avoid unnecessary copying generally, like PR ( #11487 ), PR ( #11693 ), and PR ( #11690 ). While these have improved the situation, it would still be desirable to cut these down further. Suggestions on that front would be very welcome. 😄
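For anyone reproducing this, a sketch of the profiling workflow above, using the line_profiler IPython extension; the check_input flag on ElasticNet.fit skips most of the validation (session numbers are illustrative):

In [7]: %load_ext line_profiler

In [8]: %lprun -f ElasticNet.fit estimator.fit(X, y)   # line-by-line timing of fit

In [9]: %timeit estimator.fit(X, y, check_input=False)  # skip most input checks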

@jakirkham
Contributor Author

Please let me know if there is anything else.

Member
@jnothman jnothman left a comment


What confuses me here is that we've not changed anything from being an ndarray, have we?

@jakirkham
Contributor Author

Any data coming in or space allocated is done with ndarrays. We are taking memoryviews onto them, which we use for access and assignment.

That said, I agree we can probably just type everything as memoryviews outright, cutting down on some boilerplate. Have pushed this change, sketched below.
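A minimal sketch of that change (hypothetical function; not the actual signature):

import numpy as np

# Before: an ndarray parameter converted to a memoryview inside.
# def update(cnp.ndarray[cnp.float64_t, ndim=1] w):
#     cdef double[::1] w_view = w
#     ...

# After: a memoryview parameter; an ndarray argument is coerced to a
# view automatically at the call boundary, so the conversion
# boilerplate disappears.
def update(double[::1] w):
    cdef int i
    for i in range(w.shape[0]):
        w[i] += 1.0

Callers keep passing NumPy arrays unchanged; Cython constructs the view at the boundary.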

@jakirkham jakirkham force-pushed the use_ptrs__enet_coordinate_descent branch from 18a159e to 5c1ec82 on August 10, 2018 07:42
Member
@rth rth left a comment


LGTM, thanks @jakirkham !

Performance-wise, with the above benchmark this doesn't seem to have much impact (including the latest version), but it goes in the general direction of #10624.

We were taking `ndarray`s as input and constructing `ndarray`s within
`enet_coordinate_descent` only to convert them into `memoryview`s.
Instead we can cut down on the boilerplate and overhead a bit by simply
taking `memoryview`s as input and storing all NumPy arrays as
`memoryview`s.
Since memoryviews don't have methods like `__pow__`, switch to using
NumPy functions like `square` to perform these operations on the data,
which also handles the needed type coercion.
For `memoryview`s defined in `enet_coordinate_descent`, specify the
striding explicitly. As these are all freshly created 1-D arrays, we
know they are contiguous, so this can be declared as part of their
`memoryview` definitions.
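A short sketch of both points (hypothetical names):

import numpy as np

def demo(int n):
    # Freshly allocated 1-D array: known C-contiguous, so declare it
    # [::1], letting Cython drop the stride multiply on each access.
    cdef double[::1] w = np.zeros(n)

    # Memoryviews have no __pow__, so instead of `w ** 2` apply a
    # NumPy ufunc to the underlying array.
    return np.square(np.asarray(w))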
@jakirkham jakirkham force-pushed the use_ptrs__enet_coordinate_descent branch from 5c1ec82 to 2300ec0 on August 10, 2018 15:22
@jnothman jnothman merged commit 7849376 into scikit-learn:master Aug 11, 2018
@jakirkham jakirkham deleted the use_ptrs__enet_coordinate_descent branch August 13, 2018 14:52
@jakirkham
Contributor Author

Thanks all for the great feedback.
