ENH new svd_solver='covariance_eigh' for PCA #27491
Conversation
I get a pytest-xdist bug on the CI. I don't remember what the workaround for this is... EDIT: that was caused by the use of …
The Array API compliance tests pass for the new solver, both for PyTorch with MPS on an Apple M1 and for PyTorch cuda and CuPy with an NVIDIA GPU! Will run some quick benchmarks on cuda and post them in the description. EDIT: done. The performance is very good for cuda. For MPS it's not terrible, but not better than the CPU device.
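For reference, the code path being exercised looks roughly like this; a minimal sketch (device, data shape and `n_components` are placeholders, and `array-api-compat` needs to be installed), not the actual compliance tests:

```python
import numpy as np
import torch
import sklearn
from sklearn.decomposition import PCA

# float32 because the MPS backend does not support float64.
X_np = np.random.randn(100_000, 100).astype(np.float32)
X_torch = torch.asarray(X_np, device="mps")  # or device="cuda"

with sklearn.config_context(array_api_dispatch=True):
    pca = PCA(n_components=10, svd_solver="covariance_eigh").fit(X_torch)

# With Array API dispatch enabled, the fitted attributes stay in the
# input's array namespace (and on the input's device).
print(type(pca.components_), pca.components_.device)
```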
…e as a pandas dataframe
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@lorentzenchr I updated the PR to take the latest batch of review comments into account.
Any second review approval will do, when people have time.
I re-ran the benchmark script and the auto strategy is still near-optimally picking the fastest solver as originally reported in #27491 (comment).
BTW, it would be interesting to try this new solver with dask in an out-of-core setting.
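One reason this sounds appealing: the sufficient statistics of the new solver can be accumulated chunk by chunk, so only an (n_features, n_features) matrix ever needs to fit in memory. A minimal numpy sketch of that idea (not dask code, and not part of this PR):

```python
import numpy as np

def covariance_from_chunks(chunks):
    """Accumulate the covariance matrix from an iterator of
    (chunk_size, n_features) blocks, e.g. read lazily from disk."""
    n_samples, sum_x, gram = 0, None, None
    for chunk in chunks:
        if gram is None:
            n_features = chunk.shape[1]
            sum_x = np.zeros(n_features)
            gram = np.zeros((n_features, n_features))
        n_samples += chunk.shape[0]
        sum_x += chunk.sum(axis=0)
        gram += chunk.T @ chunk
    mean = sum_x / n_samples
    # Covariance recovered from the accumulated Gram matrix and the mean.
    return (gram - n_samples * np.outer(mean, mean)) / (n_samples - 1)

# The principal components then follow from one small eigendecomposition:
# eigenvalues, eigenvectors = np.linalg.eigh(covariance_from_chunks(chunks))
```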
Review of non-code.
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
LGTM (despite the fact that I'm not that much into PCA 😏)
@ogrisel Ping (me/us) when ready to merge.
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
@lorentzenchr this is ready from my point of view.
I forgot about #27491 (comment). Let me give it a shot. EDIT: this should be good now.
Last review comment 🤞
@lorentzenchr this should be good to go now. For information, I also did the following experiment to check numerical stability:

```
In [1]: import numpy as np

In [2]: X_low = np.random.randn(int(1e7), 5)

In [3]: W = np.random.randn(5, 30)

In [4]: X = X_low @ W

In [5]: X += np.random.randn(*X.shape) * 1e-12  # rank 5 up to eps-level noise

In [6]: from sklearn.decomposition import PCA

In [7]: PCA(svd_solver="covariance_eigh").fit(X).explained_variance_
Out[7]:
array([5.13565104e+01, 4.16859476e+01, 2.90827456e+01, 2.34093563e+01,
       1.54211597e+01, 5.95719954e-14, 5.63161175e-14, 5.02011137e-14,
       4.21787206e-14, 2.90082256e-14, 2.88704010e-14, 1.78452632e-14,
       1.53384409e-14, 1.36259394e-14, 1.14775441e-14, 9.65281151e-15,
       4.52684746e-15, 2.73330102e-15, 2.23056558e-17, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00])

In [8]: PCA(svd_solver="full").fit(X).explained_variance_
Out[8]:
array([5.13565104e+01, 4.16859476e+01, 2.90827456e+01, 2.34093563e+01,
       1.54211597e+01, 1.00568294e-24, 1.00282867e-24, 1.00269022e-24,
       1.00233412e-24, 1.00208860e-24, 1.00173990e-24, 1.00155317e-24,
       1.00116836e-24, 1.00078966e-24, 1.00051726e-24, 1.00049199e-24,
       1.00025070e-24, 1.00023886e-24, 1.00011534e-24, 9.99640288e-25,
       9.99398032e-25, 9.99344471e-25, 9.99024388e-25, 9.98812971e-25,
       9.98585628e-25, 9.98398337e-25, 9.98292080e-25, 9.97761657e-25,
       9.97621196e-25, 9.97159309e-25])
```

So I think the new solver is fine: the non-eps-level components of the spectrum are recovered correctly, and the others are below eps level for both solvers.
```rst
.. versionadded:: 0.18.0

.. versionchanged:: 1.4
```
Shouldn't this be 1.5?
Yes. Well spotted. Do you want to open a PR to fix it?
Yes, I'll do it.
Implements the new solver proposed in #27483.
I also took the opportunity of this PR to strengthen the solver equivalence tests.
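For context, the core of the covariance/eigendecomposition approach can be sketched in a few lines of numpy. This is only an illustration of the technique, not the actual `PCA` implementation (which also handles dtypes, Array API dispatch, sign flipping and the various fitted attributes):

```python
import numpy as np

def pca_covariance_eigh(X, n_components):
    """Illustrative PCA via eigendecomposition of the covariance matrix.

    Cheap when n_samples >> n_features: the expensive step is a single
    (n_features, n_features) matrix product followed by a small eigh.
    """
    n_samples = X.shape[0]
    X_centered = X - X.mean(axis=0)
    cov = (X_centered.T @ X_centered) / (n_samples - 1)
    # eigh returns the eigenvalues of the symmetric matrix in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    explained_variance = eigenvalues[::-1][:n_components]
    components = eigenvectors[:, ::-1].T[:n_components]
    return explained_variance, components
```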
Benchmark
Laptop CPU with numpy on macOS (with Apple Accelerate BLAS)
So around a 10x speed-up.
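The kind of comparison behind these numbers can be reproduced with a small script along the following lines (the data shape, `n_components` and solver list are assumptions, not the exact benchmark used for this PR):

```python
from time import perf_counter

import numpy as np
from sklearn.decomposition import PCA

# Tall-and-skinny data: many samples, comparatively few features.
rng = np.random.default_rng(0)
X = rng.standard_normal((2_000_000, 100)).astype(np.float32)

for solver in ["covariance_eigh", "full", "randomized"]:
    tic = perf_counter()
    PCA(n_components=10, svd_solver=solver).fit(X)
    print(f"{solver}: {perf_counter() - tic:.2f} s")
```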
Switching to OpenBLAS makes it 2x or so slower than Apple Accelerate but still very good.
Note: to switch the BLAS implementation with conda-forge, there is no need to rebuild scikit-learn or scipy: installing the alternative `libblas` build with conda is enough.
PyTorch via Array API with MPS device on Apple M1
As the warning said, the MPS backend for PyTorch still lacks `torch.linalg.*` kernels and falls back to the CPU, which makes the overall performance a bit slower than just using numpy on the CPU. I can also try using PyTorch directly on the CPU (without the MPS backend): we get similar results as with numpy on the same host.
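For reference, exercising the MPS device with the missing linalg ops typically requires opting in to PyTorch's CPU fallback, along these lines (a sketch; the environment variable must be set before importing torch):

```python
import os

# Opt in to PyTorch's CPU fallback for ops not yet implemented on MPS
# (such as torch.linalg.eigh); must be set before torch is imported.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import numpy as np
import torch
import sklearn
from sklearn.decomposition import PCA

X = torch.asarray(np.random.randn(500_000, 100).astype(np.float32), device="mps")
with sklearn.config_context(array_api_dispatch=True):
    PCA(n_components=10, svd_solver="covariance_eigh").fit(X)
```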
Intel Xeon CPU (44 physical cores from 2018 or so) with numpy on Linux (with MKL BLAS)
I get similar (slightly slower) numbers to those on my macOS laptop for the new solver. I inspected the use of threads with htop (on a 10x larger dataset) and could see that MKL (and OpenBLAS) decide to use only 1 thread for the matrix-matrix multiplication (to compute the covariance). This is probably because the matrix is very skinny and concurrent updates by several threads would be detrimental. As a result, using a multi-core machine is not very beneficial for this solver on this kind of data (but it's still more than 5x faster than the randomized solver, which actually uses many cores according to htop, though not 100% of the time).
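The same inspection can also be done programmatically with threadpoolctl (a sketch for reporting and capping the BLAS thread pool; not part of this PR):

```python
import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

from sklearn.decomposition import PCA

# Report which BLAS implementation is loaded and how many threads it may use.
for module in threadpool_info():
    print(module["internal_api"], module.get("version"), module["num_threads"])

X = np.random.randn(1_000_000, 100)

# Cap the BLAS thread pool to measure how much the solver actually benefits
# from multiple threads on tall-and-skinny data.
with threadpool_limits(limits=1, user_api="blas"):
    PCA(n_components=10, svd_solver="covariance_eigh").fit(X)
```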
Switching to OpenBLAS makes it 2x or so slower than MKL but still very good.
Anyway, the performance (with MKL) seems to be competitive with the scikit-learn-intelex benchmark results we collected a while ago. Note that I cannot reproduce those values with the latest version of scikit-learn-intelex because it now seems to use the scikit-learn "randomized" solver instead of its own.
PyTorch via Array API with cuda (NVIDIA V100 GPU)
So here again we observe a 10x speed-up with the new solver.
We can compare to NVIDIA's cuML from RAPIDS 23.08 on the same data, provided either as a CPU-allocated numpy array or a GPU-allocated CuPy array:
So when the data is pre-allocated on the cuda device, our PyTorch + cuda implementation is 5x faster than cuML's!
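The shape of that comparison is roughly the following (a sketch assuming cuML and CuPy are installed; sizes and `n_components` are arbitrary, not the exact benchmark settings):

```python
from time import perf_counter

import cupy as cp
import sklearn
from cuml.decomposition import PCA as cuPCA
from sklearn.decomposition import PCA

X_gpu = cp.random.standard_normal((2_000_000, 100), dtype=cp.float32)

# scikit-learn's new solver dispatched to CuPy via the Array API.
with sklearn.config_context(array_api_dispatch=True):
    tic = perf_counter()
    PCA(n_components=10, svd_solver="covariance_eigh").fit(X_gpu)
    cp.cuda.Stream.null.synchronize()
    print("sklearn covariance_eigh on CuPy:", perf_counter() - tic)

# cuML's PCA on the same GPU-allocated data.
tic = perf_counter()
cuPCA(n_components=10).fit(X_gpu)
cp.cuda.Stream.null.synchronize()
print("cuML PCA:", perf_counter() - tic)
```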
PyTorch via Array API on Intel Max GPU
The performance is not good at all, but I suspect that the Intel extension for PyTorch is not ready yet. I opened an issue to track progress upstream:
`torch.linalg.eigh` is significantly slower than expected on Max Series GPU (intel/intel-extension-for-pytorch#439)

TODO: `transform` and `fit_transform`

Note that I had to change the way we use `svd_flip` in this estimator to get the same signs for the components of the new solver. I think it's better for scikit-learn to be self-consistent (checked in the solver equivalence test) than consistent across versions.
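For illustration, the deterministic sign convention that `svd_flip` enforces boils down to something like the following (a simplified sketch of the idea, not the exact scikit-learn helper):

```python
import numpy as np

def flip_component_signs(components):
    """For each component (row), flip the sign so that the entry with the
    largest absolute value is positive. This removes the arbitrary sign
    ambiguity of eigenvectors/singular vectors, so that different solvers
    can agree on the same output (as checked in the equivalence tests)."""
    max_abs_cols = np.argmax(np.abs(components), axis=1)
    signs = np.sign(components[np.arange(components.shape[0]), max_abs_cols])
    return components * signs[:, np.newaxis]
```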