8000 ENH Calculate normed stress (Stress-1) in `manifold.MDS` by Micky774 · Pull Request #22562 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

ENH Calculate normed stress (Stress-1) in manifold.MDS #22562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 35 commits into from
Jul 4, 2022

Conversation

Micky774
Copy link
Contributor
@Micky774 Micky774 commented Feb 20, 2022

Reference Issues/PRs

Resolves #10168 (stalled)
Resolves #12285 (stalled)
Resolves #13042 (stalled)
Resolves #18094 (stalled)

These PRs are (I believe) in chronological order.

What does this implement/fix? Explain your changes.

Implements the a normalized Stress function (Stress-1) for the smacof algorithm in manifold.MDS. Users can select to use the normalized stress value in place of the unnormalized raw stress (current default) through the normalize keyword.

Any other comments?

From my current understanding, normalized stress only guarantees true scale-invariance in the non-metric case -- as was the case when Kruscal first proposed it. I believe this is the correct implementation and faithful to the original context since non-metric MDS cares only about relative ordering of dissimilarities.

Indeed it seems that metric MDS generally uses stress majorization which does not use normalized stress. Consequently, I currently raise a warning when the user attempts to use MDS with metric=True, normalize=True warning that normalized stress is not supported for metric MDS and that unnormalized stress will be used instead.

It's also worth noting that using normalized=True will change the scale of eps. I'm not sure how to approach this since I don't think warning the user every time they use normalize=True that they should double-check the scale of their eps is a good idea.

Sources:

  1. Original non-metric mds
  2. Companion paper detailing numerical method

Follow-up PRs:

  1. Change default value of normalized to auto
  2. Deprecate normalized

rotheconrad and others added 2 commits August 4, 2020 18:34
Implemented the normalized stress value from Borchmann's stalled PR: scikit-learn#10168
With these changes, passing normalize=True returns a meaningful stress value between 0-1. The current returned stress value is basically useless. normalize is set to False by default.
@Micky774
Copy link
Contributor Author
Micky774 commented Feb 20, 2022

Re: concerns regarding when to normalize stress (e.g. during each loop, or only at output):

Edit: current impression is that it does indeed require the stress to be normalized at every step of the loop, since the normalization constant is determined by the distances generated by the embedding at that step.

@Micky774
Copy link
Contributor Author

@glemaitre @jnothman in case either of you two would be interested in reviewing

@thomasjpfan
Copy link
Member

For the opening message, it's good to write "Resolves #18094 (stalled)" so that when this is merged all the linked PRs are also closed.

@Micky774
Copy link
Contributor Author

For the opening message, it's good to write "Resolves #18094 (stalled)" so that when this is merged all the linked PRs are also closed.

Updated -- thanks!

@thomasjpfan
Copy link
Member

The format is very specific. It has to be:

Resolves #18094

The word "PR" in the middle will not link properly

@Micky774
Copy link
Contributor Author

The format is very specific

Should be fixed now, sorry about that! I appreciate the clarification

@Micky774 Micky774 changed the title WIP ENH Feature: calculate normed stress (Stress-1) in manifold.MDS ENH Feature: calculate normed stress (Stress-1) in manifold.MDS Feb 27, 2022
Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When stress is normalized, do we need to do anything with dis?

dis = np.sqrt((X**2).sum(axis=1)).sum()
if verbose >= 2:
print("it: %d, stress %s" % (it, stress))
if old_stress is not None:
if (old_stress - stress / dis) < eps:

(Because we are dividing stress with dis to stop the iteration.)

@Micky774
Copy link
Contributor Author
Micky774 commented Mar 25, 2022

When stress is normalized, do we need to do anything with dis?

dis = np.sqrt((X**2).sum(axis=1)).sum()
if verbose >= 2:
print("it: %d, stress %s" % (it, stress))
if old_stress is not None:
if (old_stress - stress / dis) < eps:

(Because we are dividing stress with dis to stop the iteration.)

As we determined in a conversation off Github, the current set up is fine in that old_stress - stress / dis is an approximation of the "normalized gradient" that Kruscal describes in his companion paper. Users will need to tune epsilon differently when moving between normalized_stress=True/False but this is unnavoidable.

Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
@Micky774
Copy link
Contributor Author
Micky774 commented May 31, 2022

I am wondering if we don't want to add some information in the User Guide as well regarding the parameter in the "Nonmetric MDS" section.

Went ahead and added some content to the User Guide entry, and actually I think there was a mistake in the old entry since it was talking about preserving ordinality w.r.t similarity (technically it does, but the entry was misleading), so I changed that to dissimilarity.

Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@glemaitre
Copy link
Member

I am merging main into the branch. It should solve the remaining failure.

thomasjpfan
thomasjpfan previously approved these changes Jun 28, 2022
Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment about documentation, otherwise LGTM

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a remaining concern on normalized_stress and how it automatically switches.

@thomasjpfan thomasjpfan dismissed their stale review June 29, 2022 21:39

Remaining concern over normalized_stress API

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan changed the title ENH Feature: calculate normed stress (Stress-1) in manifold.MDS ENH Calculate normed stress (Stress-1) in manifold.MDS Jul 4, 2022
@thomasjpfan thomasjpfan merged commit ae51c13 into scikit-learn:main Jul 4, 2022
@Micky774 Micky774 deleted the mds_normalized branch July 4, 2022 20:43
Harsh14901 pushed a commit to Harsh14901/scikit-learn that referenced this pull request Jul 6, 2022
…n#22562)

Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
Co-authored-by: Roth E Conrad <rotheconrad@gatech.edu>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
ogrisel pushed a commit to ogrisel/scikit-learn that referenced this pull request Jul 11, 2022
…n#22562)

Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
Co-authored-by: Roth E Conrad <rotheconrad@gatech.edu>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@dkobak
Copy link
Contributor
dkobak commented Dec 20, 2024

Hi @Micky774. Currently the normalized stress is not allowed with metric MDS. I don't understand why. It works just fine, and is actually very convenient to have if one wants to compare metric MDS with non-metric MDS solutions. I suggest to remove this restriction in #30514 (that fixes some bugs in non-metric MDS implementation). What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0