ENH Add eigh as a solver in MDS #22330

Micky774 · 2022-01-29T20:28:52Z

Reference Issues/PRs

Fixes #15272
Resolves #16067 (Stalled)

What does this implement/fix? Explain your changes.

PR #16067:
Adds an implementation of multidimensional scaling (MDS) that uses the singular value decomposition (SVD). For Euclidean dissimilarity matrices, the SVD method is much faster and far more accurate than the SMACOF algorithm (the current implementation of MDS in the sklearn/manifold module).

This PR:
Addresses review comments and provides a further forum for discussion of the API.

Any other comments?

Micky774 · 2022-01-29T20:57:39Z

One minor note is that since we default metric=True, I went ahead and changed the default solver to svd since I wager that in most cases where a user is operating MDS in a metric context, the dissimilarity matrix will be Euclidean for the solver.

On that note, re: @smarie 's comment

The term Euclidean, in this context, refers to a property that dissimilarity matrices may have. Specifically:

which is from

"Metric and Euclidean properties of dissimilarity coefficients"
J. C. Gower and P. Legendre, Journal of Classification 3, 5–48 (1986)

I do, however, think that the term is confusing at best since it's an overloaded name. Perhaps we ought to consider rephrasing the condition, or clarifying what Euclidean means in this specific context.
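
To make the condition concrete, here is a minimal sketch, assuming the standard classical-scaling characterization: a dissimilarity matrix D is Euclidean when the double-centred matrix B = -1/2 J D² J (with J = I - 11ᵀ/n) is positive semidefinite. The helper name `is_euclidean` and its tolerance are illustrative and not part of this PR.

```python
import numpy as np


def is_euclidean(D, tol=1e-8):
    """Hypothetical helper: True if B = -1/2 * J @ D**2 @ J is PSD (up to tol)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centred squared dissimilarities
    eigvals = np.linalg.eigvalsh((B + B.T) / 2)
    scale = max(np.abs(eigvals).max(), 1.0)   # relative tolerance
    return bool(eigvals.min() >= -tol * scale)


# Distances between actual points are Euclidean by construction.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
D_points = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Shortest-path distances on a 4-cycle graph: a valid metric, but not Euclidean.
D_cycle = np.array(
    [[0, 1, 2, 1],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [1, 2, 1, 0]],
    dtype=float,
)

print(is_euclidean(D_points), is_euclidean(D_cycle))
```

The second matrix satisfies the triangle inequality yet fails the check, which is the distinction between "metric" and "Euclidean" that the overloaded name tends to hide.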

Also, regarding why this method uses the eigen-solver, the classical metric MDS strategy directly uses the eigenvalue decomposition.
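
As a rough sketch of that strategy (not the code in this PR): double-centre the squared dissimilarities, take the leading eigenpairs, and scale the eigenvectors by the square roots of the eigenvalues. The function name `classical_mds` is illustrative, and clipping negative eigenvalues to zero is one possible convention for non-Euclidean input.

```python
import numpy as np


def classical_mds(D, n_components=2):
    """Illustrative classical MDS embedding from a precomputed dissimilarity matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # np.linalg.eigh returns eigenvalues in ascending order; keep the largest ones.
    w, V = np.linalg.eigh((B + B.T) / 2)
    top = np.argsort(w)[::-1][:n_components]
    w_top = np.clip(w[top], 0.0, None)        # negative values appear for non-Euclidean D
    return V[:, top] * np.sqrt(w_top)


# Usage: 2-D points are recovered up to rotation/reflection, so distances match.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, n_components=2)
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```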

Micky774 · 2022-01-29T21:25:14Z

> One minor note is that since we default metric=True, I went ahead and changed the default solver to svd since I wager that in most cases where a user is operating MDS in a metric context, the dissimilarity matrix will be Euclidean for the solver.

~~Actually I changed it to begin with solver='auto' and resolve it to svd or smacof based on whether metric=True or not, resp.~~

Ignore all of that. I have removed auto and gone back to defaulting to smacof to maintain generality of a default MDS estimator.

cmarmo · 2022-02-03T19:12:13Z

Hi @Micky774, thanks for your follow-up! For reviewers, please note that the failures are related to #22061: this PR is ready for review. Thanks!

Micky774 · 2022-02-12T22:08:46Z

@NicolasHug @adrinjalali I wonder if either of you would be interested in reviewing this

Micky774 · 2022-02-15T23:52:21Z

Okay I'm not sure why it's failing the CI linting test:

When I run black locally, I get no issue (indeed it didn't stop me during pre-commit either). When I make the change that the CI black is suggesting, then re-run it, I get this result.

Is the CI using a different version of black? Or is there something else I'm missing?

thomasjpfan · 2022-02-16T01:54:41Z

We recently updated our version of black to the stable version. Installing black==22.1.0 should resolve the issue.

After a merge with main, pre-commit should detect the new config and install the new version of black. Be sure the config looks like this:

scikit-learn/.pre-commit-config.yaml

Lines 8 to 9 in abbee57

    
    -   repo: https://github.com/psf/black
        rev: 22.1.0

thomasjpfan

Is this called the "Classical MDS algorithm"?

Can we run a benchmark on a decently sized dataset to compare the new solver and SMACOF?
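
A minimal timing harness for such a comparison could look like the sketch below. It assumes scikit-learn's existing `smacof` function for the SMACOF side and uses an illustrative NumPy helper (`eigh_embedding`, not the solver from this PR) for the eigendecomposition side; the dataset size is deliberately small and should be scaled up for a real benchmark.

```python
import time

import numpy as np
from sklearn.manifold import smacof


def eigh_embedding(D, n_components=2):
    """Illustrative eigendecomposition-based embedding (stand-in for the new solver)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    w, V = np.linalg.eigh(-0.5 * J @ (D ** 2) @ J)
    top = np.argsort(w)[::-1][:n_components]
    return V[:, top] * np.sqrt(np.clip(w[top], 0.0, None))


def time_call(fn, *args, **kwargs):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    out = fn(*args, **kwargs)
    return out, time.perf_counter() - start


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

Y_eigh, t_eigh = time_call(eigh_embedding, D)
(Y_smacof, stress), t_smacof = time_call(
    smacof, D, n_components=2, n_init=1, random_state=0
)
print(f"eigh: {t_eigh:.4f}s  smacof: {t_smacof:.4f}s")
```

A single call per method is noisy; repeating each call several times and reporting the median would give more trustworthy numbers.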


dgolteanu · 2022-11-14T23:31:32Z

Sorry for an ignorant question, is this PR missing something or any chance it will make it in 1.2? Excitedly following from the sidelines :)

Micky774 · 2022-11-29T22:18:59Z

> Sorry for an ignorant question, is this PR missing something or any chance it will make it in 1.2? Excitedly following from the sidelines :)

Sorry for the late reply -- it is currently waiting on reviewers, so it is not likely to make it into 1.2 unfortunately.

github-actions · 2023-07-27T14:02:48Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 6cd2ad8.

jolespin · 2024-04-23T18:09:21Z

Has this been integrated into any beta versions to use?

dkobak · 2024-10-23T08:06:10Z

While I support this PR in principle, I don't like the suggested API and some of the suggested wording in the docs. This PR implements "classical MDS" (aka "principal coordinates analysis", PCoA). Classical MDS optimizes a different loss function compared to metric MDS. The former optimizes what is called "strain" while the latter optimizes what is called "stress". For the "strain" there is an exact solution in terms of eigendecomposition. For the stress there is not, which is why SMACOF algorithm is used instead. These two things are not approximations of each other, and are not supposed to give the same result.
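
To illustrate the difference between the two objectives, here is a small sketch. It assumes the usual textbook definitions, whose normalization varies between references: raw stress compares embedded distances with the dissimilarities, while strain compares the Gram matrix of the embedding with the double-centred squared dissimilarities. The function names are illustrative.

```python
import numpy as np


def pairwise_distances(Y):
    return np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)


def raw_stress(Y, D):
    """Sum over pairs i < j of (||y_i - y_j|| - D_ij)^2."""
    diff = pairwise_distances(Y) - D
    return float(np.sum(np.triu(diff, k=1) ** 2))


def strain(Y, D):
    """Squared Frobenius distance between the Gram matrix of Y and B = -1/2 J D^2 J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    Yc = Y - Y.mean(axis=0)
    return float(np.sum((Yc @ Yc.T - B) ** 2))


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
D = pairwise_distances(X)

# Both losses vanish (up to rounding) at the true configuration ...
exact = (raw_stress(X, D), strain(X, D))
# ... and both are positive for a rescaled one, but they measure different things.
scaled = (raw_stress(1.5 * X, D), strain(1.5 * X, D))
```

Because stress is defined on distances and strain on inner products, an embedding that minimizes one does not in general minimize the other.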

For that reason, I don't like solver="smacof" and solver="eigh", because these are not two different solvers optimizing the same loss, but two different optimization problems. I think it should be called algorithm or maybe flavor. And the two options I would call algorithm="classical" vs. algorithm="metric". As there is also non-metric MDS, currently available via metric=False, it could be refactored as algorithm="non-metric". Unfortunately this API would need to go through a deprecation cycle...

Alternatively, we could make classical MDS available via classical=True, by default set to classical=False.

adrinjalali · 2024-11-04T12:41:56Z

@antoinebaker would you be able to have a look here maybe?

Micky774 · 2024-11-04T16:28:13Z

I favor the algorithm : {'classical', 'metric', 'non-metric'}, default = 'classical' approach w/ deprecation of metric. I'll begin updating this PR, if folks are okay with this API moving forwards.

dkobak · 2024-11-04T16:47:31Z

@Micky774 IMHO the default should be metric, for backwards compatibility (and also because I do first think of non-classical metric MDS when somebody says "MDS").

dkobak · 2024-11-22T08:54:53Z

@Micky774 Are you still planning on updating this PR according to what was discussed above, or should somebody take over?

antoinebaker · 2024-11-29T11:50:44Z

Thanks @Micky774 for the PR!

For the API, I think we should consider implementing a new estimator ClassicalMDS or PCoA.

As correctly pointed out by @dkobak

> While I support this PR in principle, I don't like the suggested API and some of the suggested wording in the docs. This PR implements "classical MDS" (aka "principal coordinates analysis", PCoA). Classical MDS optimizes a different loss function compared to metric MDS. The former optimizes what is called "strain" while the latter optimizes what is called "stress". For the "strain" there is an exact solution in terms of eigendecomposition. For the stress there is not, which is why SMACOF algorithm is used instead. These two things are not approximations of each other, and are not supposed to give the same result.

I feel that adding a solver or algorithm parameter has several drawbacks, from the developer and user perspective:

- SMACOF and PCoA do not share any logic in the code
- it will add confusion to the already confusing User Guide section, with too many nested sections and mathematical details (e.g. the different optimization problems, the different losses)
- it will add confusion to the MDS parameters and attributes
  - many parameters are SMACOF-specific (n_init, max_iter, verbose, eps, n_jobs, random_state, normalized_stress) and irrelevant for PCoA
  - for PCoA we would like to have a strain_ attribute (that's what is optimized) instead of the (normalized) stress_ attribute for SMACOF
  - the attribute n_iter_ is SMACOF-specific
  - the resulting docstring will be convoluted

Keeping MDS and adding PCoA instead will make it clear that we have two different algorithms, tackling different optimization problems, with different behaviors (SMACOF is stochastic and iterative while PCoA is deterministic). In the User Guide we should of course cross-reference PCoA and MDS, for instance recommending PCoA as a (faster) alternative to metric MDS on dissimilarities computed from Euclidean distances. It will also be easier to implement future developments for SMACOF and PCoA separately, e.g. Array API support or sample weight.

What do you think @Micky774, @dkobak and @adrinjalali ?

dkobak · 2024-11-29T12:03:55Z

Good points @antoinebaker. I don't have a strong opinion here. I find that many people first think of classical MDS when I say "MDS", so that would be an argument in favor of having classical MDS inside the MDS class. But I do see your point.

Regarding the name of a new class, I think in the ML community more people are familiar with the term "MDS" than with the term "PCoA", but on the other hand PCoA looks better as a class name than ClassicalMDS. Then again, PCoA may cause unnecessary confusion with PCA. So perhaps ClassicalMDS?

antoinebaker · 2024-11-29T12:59:03Z

ClassicalMDS sounds good, that way users searching for "MDS" (unspecified) will find both classes.

Micky774 · 2024-11-29T14:32:33Z

cc @scikit-learn/core-devs I think this is a nice opportunity to clean up our MDS support a bit so I want to get some opinions on the way forwards here

adrinjalali · 2024-12-02T08:56:24Z

I'm happy with the suggestions here.

dkobak · 2025-04-02T11:33:59Z

Hi @Micky774. Are you still planning to get back to this PR and rework it to implement a new class ClassicalMDS as we discussed? If not, I would maybe go ahead and create a separate PR for that. But of course it would be great if you would adjust this PR.

adrinjalali · 2025-04-23T12:56:59Z

@dkobak I think you can go ahead and have a PR superseding this one.

dkobak · 2025-04-23T13:26:00Z

@adrinjalali Thanks, I will try to find time for this after my pending PR #31117 gets merged. If you could take a look there, it'd be much appreciated by the way!

dkobak · 2025-05-06T13:56:52Z

@adrinjalali @antoinebaker I took a stab at it at #31322.

Pan Jan and others added 6 commits March 3, 2020 14:39

Implement SVD-based method in MDS class

27868c6

Make requested changes

cba2700

Change name 'method' to 'solver'

03954cf

Make another set of requested changes

3047960

Merge branch 'main' into svd_in_mds

ec66a71

Added temporary benchmark notebook

c8ec454

github-actions bot added the module:manifold label Jan 29, 2022

Micky774 mentioned this pull request Jan 29, 2022

Implement SVD algorithm in MDS #16067

Closed

Micky774 added 8 commits January 29, 2022 17:03

Updated solver selection logic

bdc70da

Updated documentation and improved solver selection

8573ed3

Merge branch 'main' into svd_in_mds

534bd23

Updated docs that included old default: ... format

4c1b098

Updated whats_new

94a7bfa

Undo solver='auto' change

48a003b

Updated solver error message

ca60660

Corrected solver parameter documentation

48b9d21

Merge branch 'main' into svd_in_mds

f4fa1fe

Micky774 mentioned this pull request Feb 8, 2022

sklearn MDS vs skbio PCoA #15272

Open

Merge branch 'main' into svd_in_mds

0b63d80

Micky774 and others added 2 commits February 14, 2022 10:45

Merge branch 'main' into svd_in_mds

c8f0eca

Removed depreciated tests

6042fa2

Micky774 added 2 commits February 16, 2022 00:31

Merge branch 'main' into svd_in_mds

8f3b8d4

Linting w/ new version of Black

874739a

thomasjpfan reviewed Mar 23, 2022

Merge branch 'main' into svd_in_mds

19cac39

thomasjpfan mentioned this pull request Jul 21, 2022

[MRG] Issue 1453: MDS fall back to SVD when possible #4485

Closed

Micky774 and others added 3 commits July 25, 2022 11:50

Merge branch 'main' into svd_in_mds

696448a

Merge branch 'main' into svd_in_mds

88c0f1f

Merge branch 'main' into svd_in_mds

9ae844f

Merge branch 'main' into svd_in_mds

1b2ae3c

Merge branch 'main' into svd_in_mds

6cd2ad8

adrinjalali mentioned this pull request Nov 8, 2024

Clarification on Kruskal Stress as an Optimization Target in Metric and Non-metric MDS #30240

Closed

smarie mentioned this pull request Apr 4, 2025

Faster Eigen Decomposition for Isomap & KernelPCA Projet-open-source/scikit-learn#5

Open

oerrabie mentioned this pull request Apr 24, 2025

Faster Eigen Decomposition for Isomap & KernelPCA #31246

Open

dkobak mentioned this pull request May 6, 2025

FEA Implement classical MDS #31322

Open

smarie mentioned this pull request Jun 10, 2025

ENH: linalg: randomized svd and eigh decomposition (faster but approximate) scipy/scipy#23145

Open
