[MRG] Feature: calculate normed stress (Stress-1) in sklearn.manifold.MDS #13042
Conversation
Is there anything still missing for the merge? |
I need to rebase on the latest version of sklearn. If anything else is
needed, please let me know
…On Sun, Apr 19, 2020, 23:57 Antonio Escobar ***@***.***> wrote:
Is there anything still missing for the merge?
The Stress-1 feature is actually quite fundamental to understand if the
fit is meaningless or not.
|
I merged the latest master version into this branch and solved the merge conflict. |
Is it required to compute the Stress-1 in every iteration, or can it be done just for the final (returned) stress value? Maybe it can be a new returned value instead of a new option, keeping stress and adding stress_one or stress_normalized. |
This is a very good question. Does norming in every iteration affect the result too? |
The stop condition (eps) is checked using the normalized stress, so it might stop prematurely and perform fewer iterations, since the epsilon in the normalized stress is comparatively smaller. Not a big deal, one could just decrease the eps when using the normalized option, but I think it would be more efficient anyway to do the normalization just at the end. |
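The interaction between eps and the scale of the stress can be illustrated with a small, purely synthetic sketch. It does not call scikit-learn: the stress sequence is made up, `iterations_until_converged` is a hypothetical helper, and the normalizer is assumed constant across iterations, which a real SMACOF run would not guarantee.

```python
import math


def iterations_until_converged(stress_values, eps):
    """Return the first iteration whose decrease in stress is below eps."""
    for it in range(1, len(stress_values)):
        if stress_values[it - 1] - stress_values[it] < eps:
            return it
    # No iteration met the criterion within the available values.
    return len(stress_values) - 1


# Synthetic raw stress that halves on every iteration (illustrative only).
raw = [1e4 * 0.5 ** t for t in range(60)]

# Hypothetical fixed normalizer (sum of squared distances), so that the
# normalized value is sqrt(raw / normalizer), as in a Stress-1 style measure.
normalizer = 4e4
normalized = [math.sqrt(r / normalizer) for r in raw]

eps = 1e-3
raw_iters = iterations_until_converged(raw, eps)
normalized_iters = iterations_until_converged(normalized, eps)
print(raw_iters, normalized_iters)
```

With the same eps, the normalized sequence meets the criterion earlier here because its values live on a much smaller scale; rescaling `raw` and `normalizer` together changes `raw_iters` but not `normalized_iters`.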
Thank you for raising a very good point. I also agree that checking the normalized stress at every iteration is unlikely to cause MDS to stop early, whereas it is considerably more compute-intensive. Next week, to be thorough, I could benchmark two versions over a few randomly generated distance matrices: one calculating the normalized stress at every iteration and one calculating it only at the end. I would mainly compare execution time and the number of iterations required to converge. |
Closing as superseded by #22562. |
Reference Issues/PRs
Fixes #10168 #12285
What does this implement/fix? Explain your changes.
This is a follow-up on the stale PRs referenced above. The main diff is the fix for the previously failing unit test:
https://travis-ci.org/scikit-learn/scikit-learn/jobs/437566342#L2818
To my understanding, even when using normalized stress, smacof() needs to be initialized at the same configuration for the property "Normed stress should be the same for values multiplied by some factor k" to hold, so I set the random_state of smacof() to a fixed value. The dissimilarity matrix also needs to be large enough.
Any other comments?
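A minimal sketch of the scale-invariance property mentioned above, written in plain Python rather than against `smacof()`. It assumes one common form of Kruskal's Stress-1 (squared residuals divided by the sum of squared embedded distances); the helper names and the toy data are hypothetical and are not taken from this PR. The configuration is held fixed and scaled together with the dissimilarities, which mirrors the reason for fixing the initialization in the test.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair (i < j) of points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def raw_stress(dissimilarities, points):
    """Sum of squared differences between dissimilarities and distances."""
    distances = pairwise_distances(points)
    return sum((delta - d) ** 2 for delta, d in zip(dissimilarities, distances))


def stress_1(dissimilarities, points):
    """Assumed Stress-1: sqrt(raw stress / sum of squared distances)."""
    distances = pairwise_distances(points)
    return math.sqrt(
        raw_stress(dissimilarities, points) / sum(d ** 2 for d in distances)
    )


# Toy configuration and dissimilarities, listed in the same pair order
# as itertools.combinations: (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dissimilarities = [1.0, 1.2, 1.5, 1.3, 1.0, 0.9]

# Multiply both the dissimilarities and the configuration by a factor k.
k = 10.0
scaled_points = [(k * x, k * y) for x, y in points]
scaled_dissimilarities = [k * delta for delta in dissimilarities]

print(raw_stress(dissimilarities, points),
      raw_stress(scaled_dissimilarities, scaled_points))
print(stress_1(dissimilarities, points),
      stress_1(scaled_dissimilarities, scaled_points))
```

Under these assumptions the raw stress grows by a factor of k squared while Stress-1 stays the same. If the two runs started from different configurations, the embeddings would not be related by a simple scaling and the equality would not be expected to hold exactly.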
The previous reviewer was @glemaitre. To my understanding, the review comments have been addressed, but if something is missing, I'll do my best to fix it.