8000 [MRG] Feature: calculate normed stress (Stress-1) in sklearn.manifold.MDS by matthieu-pa · Pull Request #13042 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG] Feature: calculate normed stress (Stress-1) in sklearn.manifold.MDS #13042

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from

Conversation

matthieu-pa
Copy link

Reference Issues/PRs

Fixes #10168 #12285

What does this implement/fix? Explain your changes.

This is a follow-up on the stale PRs referenced above, the main diff is the fix for the previously failing unit test:

https://travis-ci.org/scikit-learn/scikit-learn/jobs/437566342#L2818

        stress1 = mds.smacof(sim, normalize=True)[1]
        stress2 = mds.smacof(k * sim, normalize=True)[1]
    
        # Normed stress should be the same for
        # values multiplied by some factor "k"
>       assert_allclose(stress1, stress2)
k          = 2
sim        = array([[0, 5, 3, 4],
       [5, 0, 2, 2],
       [3, 2, 0, 1],
       [4, 2, 1, 0]])
stress1    = 0.025998852705994606
stress2    = 0.033375197002665315
/home/travis/build/scikit-learn/scikit-learn/sklearn/manifold/tests/test_mds.py:78: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib/python2.7/dist-packages/numpy/testing/utils.py:1183: in assert_allclose
    verbose=verbose, header=header)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

To my understanding, even using normalized stress, smacof() needs to be initialized at same configuration for the property Normed stress should be the same for values multiplied by some factor "k" to be true so I set random_state of smacof() to a fixed value. Dissimilarity matrix also needs to be large enough.

Any other comments?

The previous reviewer was @glemaitre . To my understanding review comments have been addressed but if something is missing, I'll do my best to fix it.

@amueller amueller added the Needs Benchmarks A tag for the issues and PRs which require some benchmarks label Aug 5, 2019
@ADEscobar
Copy link

Is there anything still missing for the merge?
The Stress-1 feature is actually quite fundamental to understand if the fit is meaningless or not.

@matthieu-pa
Copy link
Author
matthieu-pa commented Apr 20, 2020 via email

@matthieu-pa
Copy link
Author

I merged the latest master version into this branch and solved the merge conflict.

@ADEscobar
Copy link
ADEscobar commented Apr 21, 2020

I merged the latest master version into this branch and solved the merge conflict.

Is it required to compute the Stress-1 in every iteration, or can it be just done for the final (returned) Stress value?

Maybe it can be just a new returned value, instead of a new option. Keeping stress and adding stress_one or stress_normalized

@jnothman
Copy link
Member

Is it required to compute the Stress-1 in every iteration, or can it be just done for the final (returned) Stress value?

This is a very good question. Does norming in every iteration affect the result too?

@ADEscobar
Copy link

Is it required to compute the Stress-1 in every iteration, or can it be just done for the final (returned) Stress value?

This is a very good question. Does norming in every iteration affect the result too?

The stop condition (eps) is checked using the normalized stress, so it might stop prematurely and perform less iterations, since the epsilon in the normalized stress is comparatively smaller.

Not a big deal, one could just decrease the eps if using the normalized option, but I think it can anyway be more efficient doing the normalization just at the end.

@matthieu-pa
Copy link
Author

Not a big deal, one could just decrease the eps if using the normalized option, but I think it can anyway be more efficient doing the normalization just at the end.

Thank you for raising a very good point. I also agree that checking the normalized stress at every iteration is not very likely to cause MDS to stop early whereas it is quite more computing intensive.

Next week, to be thorough, I could benchmark a version calculating the normalized stress at every iteration and one just at the end over a few randomly generated distance matrices. I would mainly focus on comparing execution time and the number of iterations required to converge.

@joshuacwnewton joshuacwnewton mentioned this pull request Aug 7, 2020
Base automatically changed from master to main January 22, 2021 10:50
@cmarmo cmarmo added Superseded PR has been replace by a newer PR and removed Waiting for Reviewer labels Mar 25, 2021
@cmarmo
Copy link
Contributor
cmarmo commented May 2, 2022

Closing as superseded by #22562.

@cmarmo cmarmo closed this May 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:manifold Needs Benchmarks A tag for the issues and PRs which require some benchmarks Superseded PR has been replace by a newer PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0