ENH add sparse output to SplineTransformer #24145
Conversation
Thank you for the PR
```python
# Note that scipy BSpline returns float64 arrays and converts input
# x=X[:, i] to c-contiguous float64.
n_out = self.n_features_out_ + n_features * (1 - self.include_bias)
if X.dtype in FLOAT_DTYPES:
    dtype = X.dtype
else:
    dtype = np.float64
XBS = np.zeros((n_samples, n_out), dtype=dtype, order=self.order)
if use_sparse:
    output_list = []
```
There are quite a few sparse matrices being constructed in this implementation. For each feature:
- A CSR design matrix is constructed
- This matrix can be converted into a lil matrix
- hstack converts it all back to CSR.
What is the runtime of `transform` with sparse output compared to dense output?
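For reference, here is a minimal sketch of the per-feature pattern described above (illustrative data and knot vector, not the PR's actual code):

```python
import numpy as np
from scipy import sparse
from scipy.interpolate import BSpline

# Illustrative only: one CSR design matrix per feature, converted to LIL,
# then hstacked back to CSR.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 4.0, size=(10_000, 3))
degree = 3
# Uniform knot vector covering [0, 4] with `degree` padding knots on each side.
t = np.arange(-degree, 5 + degree, dtype=float)

blocks = []
for i in range(X.shape[1]):
    XBS_i = BSpline.design_matrix(X[:, i], t, degree)  # CSR (scipy >= 1.8)
    blocks.append(XBS_i.tolil())  # the LIL round-trip discussed above
XBS = sparse.hstack(blocks, format="csr")  # and back to CSR
```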
```
import numpy as np
from sklearn.preprocessing import SplineTransformer

X = np.linspace([-1, -10, 100], [1, 10, 100], 10000)
st_sparse = SplineTransformer(sparse_output=True, extrapolation="error").fit(X)
st_dense = SplineTransformer(extrapolation="error").fit(X)

%timeit st_dense.transform(X)
2.13 ms ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit st_sparse.transform(X)
43.7 ms ± 336 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
That's unfortunate. Using n_features=1 doesn't change the relative timings. But memory consumption should be better.
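An illustrative way to check the memory side, on the same toy data (the benefit grows with `n_knots`, since each row only has `degree + 1` nonzeros per feature regardless of the number of knots):

```python
import numpy as np
from sklearn.preprocessing import SplineTransformer

X = np.linspace([-1, -10, 100], [1, 10, 100], 10000)

dense = SplineTransformer(n_knots=20, extrapolation="error").fit_transform(X)
sp = SplineTransformer(
    n_knots=20, sparse_output=True, extrapolation="error"
).fit_transform(X)

print(dense.nbytes)  # full (n_samples, n_features_out) float64 array
print(sp.data.nbytes + sp.indices.nbytes + sp.indptr.nbytes)  # nonzeros only
```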
The bottleneck seems to be the function `_make_design_matrix` in scipy's `_bspl.pyx`. In contrast, `evaluate_spline` does all loops explicitly. We might report this upstream.
Not sure what's going on here TBH, but csr->lil->csr does sound expensive. Where is the bottleneck, what does the profiler say?
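For example (illustrative only), one way to get a first answer with the standard library profiler:

```python
import cProfile

import numpy as np
from sklearn.preprocessing import SplineTransformer

X = np.linspace([-1, -10, 100], [1, 10, 100], 10000)
st_sparse = SplineTransformer(sparse_output=True, extrapolation="error").fit(X)

# Sort by cumulative time to see whether scipy's design-matrix construction
# or the csr/lil conversions dominate.
cProfile.run("st_sparse.transform(X)", sort="cumulative")
```

In IPython, `%prun st_sparse.transform(X)` gives the same information.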
What's the impact of scipy/scipy#16840 on the scikit-learn benchmark in https://github.com/scikit-learn/scikit-learn/pull/24145/files#r940357974?
Are there scipy nightly builds to install? Otherwise, this is above my current time budget to benchmark.
Scipy 1.10 will not only speed up `BSpline.design_matrix` but also make it easier to implement spline extrapolation in scikit-learn, thereby further reducing runtime and memory consumption.
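For context, a sketch of the scipy >= 1.10 API being referred to (the `extrapolate` keyword of `BSpline.design_matrix` is not available in scipy 1.8/1.9; check the scipy docs for the exact behavior):

```python
import numpy as np
from scipy.interpolate import BSpline

degree = 3
t = np.arange(-degree, 5 + degree, dtype=float)  # base interval [0, 4]
x = np.array([-0.5, 2.0, 4.5])  # includes points outside the base interval

# scipy >= 1.10: out-of-range points are extrapolated instead of raising,
# so extrapolation no longer has to be patched in afterwards.
XBS = BSpline.design_matrix(x, t, degree, extrapolate=True)
print(XBS.toarray())
```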
SciPy has nightly builds:
pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple scipy
but their CI is failing and not updating the nightly builds. I opened MacPython/scipy-wheels#175 to track the SciPy issue.
With `scipy==1.10.0.dev0`, one gets improved runtime:

```
import numpy as np
from sklearn.preprocessing import SplineTransformer

X = np.linspace([-1, -10, 100], [1, 10, 100], 10000)
st_sparse = SplineTransformer(sparse_output=True, extrapolation="error").fit(X)
st_dense = SplineTransformer(extrapolation="error").fit(X)

%timeit st_dense.transform(X)
1.89 ms ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit st_sparse.transform(X)
4.89 ms ± 53.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
On my laptop, I get (scipy 1.10.1):

```
%timeit st_dense.transform(X)
2.12 ms ± 46.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit st_sparse.transform(X)
4.61 ms ± 162 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Thank you for adding this support, @lorentzenchr.
Here are a few comments and questions.
We won't have time to finish the review on this one before the 1.2 release. Moving it to 1.3.
Finally, CI is 🟢. Edit: The Codecov failure does not count, as it is clear that not everything can be tested with a single scipy version.
LGTM!
I just have a few last comments. Errors on the CI with scipy-dev are unrelated (see #26154).
Here is another pass of review. The performance is still poor with extrapolation (at least "periodic") even with scipy 1.10.1. From the inline comments, it's not clear if this is expected or not.
If it's expected and there is no easy way around it, I think the inline comments should be updated to make this more explicit (see below for details).
Other than that, LGTM!
```python
    XBS = sparse.hstack(output_list)
elif self.sparse_output:
    # TODO: Remove once scipy 1.10 is the minimum version. See comments above.
    XBS = sparse.csr_matrix(XBS)
```
This comment seems to imply that with scipy 1.10, the `.tolil` conversion would no longer be necessary. However, as far as I understand, we still have to go through this condition when `extrapolate="periodic"`, even with scipy 1.10 or later. Would there be a way to avoid the `.tolil` conversion completely with recent scipy versions?
At the moment, sparse periodic extrapolation is more than 20x slower than its dense counterpart:
```
In [1]: import numpy as np
   ...: from sklearn.preprocessing import SplineTransformer
   ...:
   ...: X = np.linspace([-1, -10, 100], [1, 10, 101], 10000)
   ...: extrapolation="periodic"
   ...: st_sparse = SplineTransformer(sparse_output=True, extrapolation=extrapolation).fit(X)
   ...: st_dense = SplineTransformer(extrapolation=extrapolation).fit(X)

In [2]: %timeit st_dense.transform(X)
1.29 ms ± 1.25 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [3]: %timeit st_sparse.transform(X)
64.1 ms ± 851 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

scipy version: 1.10.1
It is slower, yes, but this is not a performance regression, as sparse output is a new feature and the default is dense output. And yes, with scipy >= 1.10, we can get rid of (at least most of) the lil conversions, as we can use `design_matrix(..., extrapolate=True)`.
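If `design_matrix` indeed accepts `extrapolate="periodic"` (scipy >= 1.10; worth double-checking in the scipy docs), a rough sketch of the LIL-free path for the periodic case could look like this:

```python
import numpy as np
from scipy.interpolate import BSpline

degree = 3
t = np.arange(-degree, 5 + degree, dtype=float)  # base interval [0, 4]
x = np.array([-0.5, 2.0, 4.5])  # mapped back into [0, 4] periodically

# Let scipy handle the periodic wrap-around and return CSR directly,
# with no .tolil() round-trip on the scikit-learn side.
XBS = BSpline.design_matrix(x, t, degree, extrapolate="periodic")
```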
Oops, I broke the linter when resolving the conflicts via the GitHub UI. Let me push a fix.
Merged. Thanks @lorentzenchr!
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Reference Issues/PRs
Fixes #20998.
What does this implement/fix? Explain your changes.
This PR adds the argument ~~`sparse`~~ `sparse_output` to `SplineTransformer`. Set to `True`, it returns a sparse CSR matrix.
Any other comments?
This is available only for scipy >= 1.8. Further improvements will be possible with scipy 1.10 (the `extrapolate` argument for `BSpline.design_matrix`).
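A minimal usage sketch (requires scipy >= 1.8 and a scikit-learn version that includes this PR):

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import SplineTransformer

X = np.linspace([-1, -10, 100], [1, 10, 100], 100)
XBS = SplineTransformer(sparse_output=True).fit_transform(X)
print(sparse.issparse(XBS), XBS.format)  # True csr
```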