8000 Faster Eigen Decomposition for Isomap & KernelPCA · Issue #31246 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content
8000

Faster Eigen Decomposition for Isomap & KernelPCA #31246

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
oerrabie opened this issue Apr 24, 2025 · 2 comments · May be fixed by #31247
Open

Faster Eigen Decomposition for Isomap & KernelPCA #31246

oerrabie opened this issue Apr 24, 2025 · 2 comments · May be fixed by #31247

Comments

@oerrabie
Copy link
oerrabie commented Apr 24, 2025

(disclaimer: this issue and associated PR are part of a student project supervised by @smarie )

Summary

Eigendecomposition is slow when number of samples is large. This impacts decomposition models such as KernelPCA and Isomap. A "randomized" eigendecomposition method (from Halko et al) has been introduced for KernelPCA leveraging Halko's algorithm 4.3 for randomized SVD decomposition (also used in PCA).

Unfortunately, the current approach is only valid for decomposition of PSD matrices - which suits well for KernelPCA but can not be true in the context of Isomap. Therefore Isomap has not accelerated implementation as of today.

We propose to introduce an additional approximate eigendecomposition method based on algorithm 5.3 from the same paper.
This method should offer a faster alternative to existing solvers (arpack, dense, etc.) while maintaining accuracy, and as opposed to randomized svd, is suitable to find eigenvalues for non-PSD matrices.

Describe your proposed solution

  • Implement _randomized_eigsh(selection='value'), that is left as NotImplemented today.
  • Integrate it as an alternate solver in Isomap and in KernelPCA.
  • Add tests comparing performance with existing solvers.
  • Provide benchmarks to evaluate speedup and accuracy.

Motivation

  • Improves scalability for large datasets.
  • Reduces computation time for eigen decomposition-based methods.

Note: this solution could be used to accelerate all models relying on eigenvalue decomposition, including possibly #22330

@oerrabie oerrabie added Needs Triage Issue requires triage New Feature labels Apr 24, 2025
@ogrisel
Copy link
Member
ogrisel commented Apr 25, 2025

I am not familiar (anymore) with the details of KernelPCA, Isomap, _randomized_eighs and algorithm 5.3 in general, but what you propose sounds reasonable.

Looking forward to reviewing the PR (I assume #31247 is the PR by your student) and once the CI is green and once it includes some benchmark results.

@ogrisel ogrisel removed the Needs Triage Issue requires triage label Apr 25, 2025
@smarie
Copy link
Contributor
smarie commented Apr 28, 2025

Thanks @ogrisel for the quick feedback !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants
0