In manifold.MDS stress value doesn’t correspond to returned coordinates · Issue #16846 · scikit-learn/scikit-learn · GitHub


Closed
akrupnik opened this issue Apr 5, 2020 · 2 comments · Fixed by #30514
akrupnik commented Apr 5, 2020

Description

In manifold.MDS, the reported stress value doesn’t correspond to the returned coordinates.

How to reproduce:

Apply MDS to four points in the 2D plane:

    X   Y
    1   5
    1   4
    1   1
    3   3

With Euclidean distances, the disparities matrix is:

    disparities =
     [ 0.          1.          4.          2.82842712 ]
     [ 1.          0.          3.          2.23606798 ]
     [ 4.          3.          0.          2.82842712 ]
     [ 2.82842712  2.23606798  2.82842712  0.         ]

Perform multidimensional scaling:

    mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=42, metric=True)
    results = mds.fit(disparities)
    coords = results.embedding_

    coords =
     [ -0.32503572   1.78399044 ]
     [  0.09843686   0.87890749 ]
     [  1.47842546  -1.76761757 ]
     [ -1.2518266   -0.89528037 ]

    stress = results.stress_
    0.0045113518633979315

Now calculate the stress by hand.

First, calculate the Euclidean distances between the returned coordinates:

    dis = euclidean_distances(coords)
     [ 0.          0.9992518   3.98326395  2.83503675 ]
     [ 0.9992518   0.          2.98470492  2.22956362 ]
     [ 3.98326395  2.98470492  0.          2.86622548 ]
     [ 2.83503675  2.22956362  2.86622548  0.         ]

Now we can calculate the stress value:

    stress = ((dis.ravel() - disparities.ravel()) ** 2).sum() / 2
    0.0020293041020245147
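
The hand calculation above can be re-checked without scikit-learn. The following is a minimal sketch using only the standard library; the helper name `raw_stress` is introduced here for illustration, and the coordinates are the rounded values printed in this report, so the result agrees with the hand-computed figure only up to rounding.

```python
import math

# The four original points and the embedding printed in this report
# (coordinates are rounded to 8 decimals, as shown above).
points = [(1, 5), (1, 4), (1, 1), (3, 3)]
coords = [
    (-0.32503572, 1.78399044),
    (0.09843686, 0.87890749),
    (1.47842546, -1.76761757),
    (-1.2518266, -0.89528037),
]


def raw_stress(embedding, original):
    """Sum of squared distance errors over all pairs i < j.

    This equals the full-matrix sum divided by 2, because the
    distance matrices are symmetric with a zero diagonal.
    """
    n = len(original)
    return sum(
        (math.dist(embedding[i], embedding[j])
         - math.dist(original[i], original[j])) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


stress_by_hand = raw_stress(coords, points)
print("stress recomputed from coords:", stress_by_hand)
```

The printed value should be close to the hand-computed 0.00203 and clearly different from the reported `stress_` of 0.00451.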

 

So the stress recomputed from the returned coordinates (0.0020293041020245147) doesn’t match the reported value, results.stress_ = 0.0045113518633979315.

After some debugging, it is clear that MDS returns the coordinates from the current iteration but the stress from the previous iteration.
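
The off-by-one described here can be illustrated with a small pure-Python sketch of a SMACOF-style loop for metric MDS. This is not scikit-learn's code: the helpers `pairwise`, `raw_stress`, `guttman_step` and `smacof_sketch`, the starting configuration `X0`, and the fixed iteration count are all assumptions made for illustration. The point is only that a stress measured before the update belongs to the previous configuration.

```python
import math


def pairwise(X):
    """Euclidean distance matrix of a list of points."""
    return [[math.dist(p, q) for q in X] for p in X]


def raw_stress(X, target):
    """Squared distance errors summed over all pairs i < j."""
    dis = pairwise(X)
    n = len(X)
    return sum((dis[i][j] - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def guttman_step(X, target):
    """One SMACOF update (Guttman transform) with unit weights."""
    n, dim = len(X), len(X[0])
    dis = pairwise(X)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and dis[i][j] > 0:
                B[i][j] = -target[i][j] / dis[i][j]
        B[i][i] = -sum(B[i])  # diagonal makes each row sum to zero
    return [[sum(B[i][k] * X[k][d] for k in range(n)) / n
             for d in range(dim)]
            for i in range(n)]


def smacof_sketch(target, X, n_iter):
    """Return the final X, the stale stress, and the fresh stress."""
    for _ in range(n_iter):
        stale = raw_stress(X, target)  # measured BEFORE the update
        X = guttman_step(X, target)    # X is now one step ahead
    fresh = raw_stress(X, target)      # measured on the returned X
    return X, stale, fresh


points = [(1, 5), (1, 4), (1, 1), (3, 3)]
target = pairwise(points)
# Hypothetical starting configuration, chosen only for the demo.
X0 = [(0.0, 1.0), (1.0, 0.5), (0.5, -1.0), (-1.0, 0.0)]

X_final, stale, fresh = smacof_sketch(target, X0, n_iter=5)
print("stress from previous iteration:", stale)
print("stress of returned coordinates:", fresh)
```

Because a Guttman update cannot increase the raw stress, `fresh` is never larger than `stale`. Returning `fresh` together with `X_final` keeps the reported stress consistent with the reported coordinates.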

 

 

Here is the script to reproduce:

    from sklearn import manifold
    from sklearn.metrics.pairwise import euclidean_distances
    import pandas as pd

    data = [['A',1,5],['B',1,4],['C',1,1],['D',3,3]]
    df = pd.DataFrame(data, columns = ['Name', 'X','Y'])

    # Distance matrix
    disparities = euclidean_distances(df[['X','Y']])
    mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=42, metric=True)
    results = mds.fit(disparities)
    coords = results.embedding_
    stress = results.stress_
    print('returned stress=',stress)

    # Calculate Stress by hand
    dis = euclidean_distances(coords)
    real_stress = ((dis.ravel() - disparities.ravel()) ** 2).sum() / 2
    print('real stress =',real_stress)

Versions:

    System:
        python: 3.7.6 (default, Dec 30 2019, 19:38:36)  [Clang 10.0.0 (clang-1000.11.45.5)]
    executable: /usr/local/opt/python/bin/python3.7
       machine: Darwin-17.7.0-x86_64-i386-64bit

    Python deps:
           pip: 19.3.1
    setuptools: 42.0.2
       sklearn: 0.21.3
         numpy: 1.17.4
         scipy: 1.3.2
        Cython: 0.29.15
        pandas: 1.0.1

EDITED by @ogrisel to strip the HTML of the reproducer and use markdown formatting instead:

from sklearn import manifold
from sklearn.metrics.pairwise import euclidean_distances
import pandas as pd

data = [['A',1,5],['B',1,4],['C',1,1],['D',3,3]]
df = pd.DataFrame(data, columns = ['Name', 'X','Y'])

# Distance matrix
disparities = euclidean_distances(df[['X','Y']])

# Fit MDS on precomputed distances
mds = manifold.MDS(
    n_components=2,
    dissimilarity="precomputed",
    random_state=42,
    metric=True,
)

# Print Stress computed by scikit-learn
results = mds.fit(disparities)
coords = results.embedding_
stress = results.stress_
print ('returned stress=',stress)

# Calculate Stress by hand
dis = euclidean_distances(coords)
real_stress = ((dis.ravel() - disparities.ravel()) ** 2).sum() / 2
print('real stress =',real_stress)
@NicolasHug
Member

In the future, please make sure to format your post to distinguish code from text (I'd do it for you, but the source is all HTML...)

Would you like to submit a PR with the proposed changes, @akrupnik? It would be easier for us to consider.

@dkobak
Contributor
dkobak commented Dec 20, 2024

Thanks for submitting this bug. I just made a PR that fixes it (together with some other things): #30514.
