-
-
Notifications
You must be signed in to change notification settings - Fork 26k
ENH Calculate normed stress (Stress-1) in manifold.MDS
#22562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Implemented the normalized stress value from Borchmann's stalled PR: scikit-learn#10168 With these changes, passing normalize=True returns a meaningful stress value between 0-1. The current returned stress value is basically useless. normalize is set to False by default.
Re: concerns regarding when to normalize stress (e.g. during each loop, or only at output): Edit: current impression is that it does indeed require the stress to be normalized at every step of the loop, since the normalization constant is determined by the distances generated by the embedding at that step. |
@glemaitre @jnothman in case either of you two would be interested in reviewing |
For the opening message, it's good to write "Resolves #18094 (stalled)" so that when this is merged all the linked PRs are also closed. |
Updated -- thanks! |
The format is very specific. It has to be: Resolves #18094 The word "PR" in the middle will not link properly |
Should be fixed now, sorry about that! I appreciate the clarification |
manifold.MDS
manifold.MDS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When stress is normalized, do we need to do anything with dis
?
scikit-learn/sklearn/manifold/_mds.py
Lines 129 to 133 in bb5ab53
dis = np.sqrt((X**2).sum(axis=1)).sum() | |
if verbose >= 2: | |
print("it: %d, stress %s" % (it, stress)) | |
if old_stress is not None: | |
if (old_stress - stress / dis) < eps: |
(Because we are dividing stress
with dis
to stop the iteration.)
As we determined in a conversation off Github, the current set up is fine in that |
Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
Went ahead and added some content to the User Guide entry, and actually I think there was a mistake in the old entry since it was talking about preserving ordinality w.r.t similarity (technically it does, but the entry was misleading), so I changed that to dissimilarity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I am merging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment about documentation, otherwise LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a remaining concern on normalized_stress
and how it automatically switches.
Remaining concern over normalized_stress API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
manifold.MDS
manifold.MDS
…n#22562) Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com> Co-authored-by: Roth E Conrad <rotheconrad@gatech.edu> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
…n#22562) Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com> Co-authored-by: Roth E Conrad <rotheconrad@gatech.edu> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Hi @Micky774. Currently the normalized stress is not allowed with metric MDS. I don't understand why. It works just fine, and is actually very convenient to have if one wants to compare metric MDS with non-metric MDS solutions. I suggest to remove this restriction in #30514 (that fixes some bugs in non-metric MDS implementation). What do you think? |
Reference Issues/PRs
Resolves #10168 (stalled)
Resolves #12285 (stalled)
Resolves #13042 (stalled)
Resolves #18094 (stalled)
These PRs are (I believe) in chronological order.
What does this implement/fix? Explain your changes.
Implements the a normalized Stress function (Stress-1) for the
smacof
algorithm inmanifold.MDS
. Users can select to use the normalized stress value in place of the unnormalized raw stress (current default) through thenormalize
keyword.Any other comments?
From my current understanding, normalized stress only guarantees true scale-invariance in the non-metric case -- as was the case when Kruscal first proposed it. I believe this is the correct implementation and faithful to the original context since non-metric MDS cares only about relative ordering of dissimilarities.
Indeed it seems that metric MDS generally uses stress majorization which does not use normalized stress. Consequently, I currently raise a warning when the user attempts to use
MDS
withmetric=True, normalize=True
warning that normalized stress is not supported for metric MDS and that unnormalized stress will be used instead.It's also worth noting that using
normalized=True
will change the scale ofeps
. I'm not sure how to approach this since I don't think warning the user every time they usenormalize=True
that they should double-check the scale of theireps
is a good idea.Sources:
Follow-up PRs:
normalized
toauto
normalized