In manifold.MDS stress value doesn’t correspond to returned coordinates · Issue #16846 · scikit-learn/scikit-learn
Closed
@akrupnik

Description

In manifold.MDS, the reported stress value (stress_) doesn’t correspond to the returned coordinates (embedding_).

How to reproduce:

Apply MDS to four points in the 2D plane:

X   Y
1   5
1   4
1   1
3   3

If the distances are Euclidean, then the disparities matrix will be:

disparities =
[ 0.          1.          4.          2.82842712 ]
[ 1.          0.          3.          2.23606798 ]
[ 4.          3.          0.          2.82842712 ]
[ 2.82842712  2.23606798  2.82842712  0.         ]
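As a quick sanity check, the matrix above can be reproduced with plain NumPy broadcasting instead of `euclidean_distances`. This is only a sketch; the names `points` and `D` are illustrative and do not appear in the original report.

```python
import numpy as np

# The four points from the table above (rows A, B, C, D; columns X, Y).
points = np.array([[1, 5], [1, 4], [1, 1], [3, 3]], dtype=float)

# Pairwise differences via broadcasting, then the Euclidean norm of each pair.
diff = points[:, None, :] - points[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

print(np.round(D, 8))
```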

Perform multidimensional scaling:

mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=42, metric=True)
results = mds.fit(disparities)
coords = results.embedding_

coords =
[ -0.32503572   1.78399044 ]
[  0.09843686   0.87890749 ]
[  1.47842546  -1.76761757 ]
[ -1.2518266   -0.89528037 ]

stress = results.stress_
0.0045113518633979315

Now calculate the stress by hand.

First calculate the Euclidean distances that correspond to the returned coordinates:

dis = euclidean_distances(coords)
[ 0.          0.9992518   3.98326395  2.83503675 ]
[ 0.9992518   0.          2.98470492  2.22956362 ]
[ 3.98326395  2.98470492  0.          2.86622548 ]
[ 2.83503675  2.22956362  2.86622548  0.         ]

And now we can calculate the stress value:

stress = ((dis.ravel() - disparities.ravel()) ** 2).sum() / 2
0.0020293041020245147
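The hand calculation can also be checked from the rounded numbers printed in this report, without rerunning MDS. The sketch below copies the two matrices as they appear above, so it is accurate only to roughly the printed precision, and evaluates the raw stress as half the sum of squared differences over the full symmetric matrix. The name `hand_stress` is illustrative.

```python
import numpy as np

# Target distances between the four original points (as printed above).
disparities = np.array([
    [0.0,        1.0,        4.0,        2.82842712],
    [1.0,        0.0,        3.0,        2.23606798],
    [4.0,        3.0,        0.0,        2.82842712],
    [2.82842712, 2.23606798, 2.82842712, 0.0],
])

# Distances between the coordinates returned by MDS (as printed above).
dis = np.array([
    [0.0,        0.9992518,  3.98326395, 2.83503675],
    [0.9992518,  0.0,        2.98470492, 2.22956362],
    [3.98326395, 2.98470492, 0.0,        2.86622548],
    [2.83503675, 2.22956362, 2.86622548, 0.0],
])

# Each pair appears twice in the symmetric matrix, hence the division by 2.
hand_stress = ((dis - disparities) ** 2).sum() / 2
print(hand_stress)
```

The result agrees with the hand-computed value to within rounding of the printed digits, and is clearly smaller than the value stored in `stress_`.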

So the real stress, calculated from the returned coordinates, doesn’t correspond to the reported value results.stress_ = 0.0045113518633979315.

After some debugging it is clear that MDS returns the coordinates from the current iteration but the stress from the previous iteration.
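One way this kind of mismatch can arise is sketched below. This is a simplified SMACOF-style loop written for illustration, not scikit-learn's actual source. The names `pairwise_dist`, `raw_stress` and `smacof_sketch`, the fixed iteration count, and the random initialization are all assumptions. The stress is evaluated at the top of each iteration and the coordinates are updated afterwards, so the returned stress describes the configuration before the final update.

```python
import numpy as np


def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def raw_stress(X, delta):
    """Half the sum of squared differences between embedded and target distances."""
    return ((pairwise_dist(X) - delta) ** 2).sum() / 2


def smacof_sketch(delta, n_components=2, n_iter=5, seed=0):
    """Illustrative SMACOF-style loop (hypothetical, not scikit-learn's code)."""
    rng = np.random.RandomState(seed)
    n = delta.shape[0]
    X = rng.rand(n, n_components)  # random starting configuration
    stress = None
    for _ in range(n_iter):
        dis = pairwise_dist(X)
        # Stress of the configuration as it is BEFORE this iteration's update.
        stress = ((dis - delta) ** 2).sum() / 2
        # Guttman transform: move X to a configuration with lower stress.
        dis[dis == 0] = 1e-5  # avoid division by zero on the diagonal
        ratio = delta / dis
        B = -ratio
        B[np.arange(n), np.arange(n)] += ratio.sum(axis=1)
        X = B @ X / n
    # The returned stress belongs to the configuration before the final update.
    return X, stress


points = np.array([[1, 5], [1, 4], [1, 1], [3, 3]], dtype=float)
delta = pairwise_dist(points)
X, reported = smacof_sketch(delta)
actual = raw_stress(X, delta)
print("reported:", reported, "recomputed from X:", actual)
```

Because each Guttman update does not increase the stress, the value recomputed from the returned coordinates comes out lower than the reported one in this sketch. Evaluating the stress once more after the loop, as `raw_stress(X, delta)` does here, would make the two agree.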

Here is the script to reproduce:

from sklearn import manifold
from sklearn.metrics.pairwise import euclidean_distances
import pandas as pd

data = [['A',1,5],['B',1,4],['C',1,1],['D',3,3]]
df = pd.DataFrame(data, columns = ['Name', 'X','Y'])

# Distance matrix
disparities = euclidean_distances(df[['X','Y']])
mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=42, metric=True)
results = mds.fit(disparities)
coords = results.embedding_
stress = results.stress_
print('returned stress=',stress)

# Calculate Stress by hand
dis = euclidean_distances(coords)
real_stress = ((dis.ravel() - disparities.ravel()) ** 2).sum() / 2
print('real stress =',real_stress)

Versions:

System:
    python: 3.7.6 (default, Dec 30 2019, 19:38:36)  [Clang 10.0.0 (clang-1000.11.45.5)]
executable: /usr/local/opt/python/bin/python3.7
   machine: Darwin-17.7.0-x86_64-i386-64bit

Python deps:
       pip: 19.3.1
setuptools: 42.0.2
   sklearn: 0.21.3
     numpy: 1.17.4
     scipy: 1.3.2
    Cython: 0.29.15
    pandas: 1.0.1

EDITED by @ogrisel to strip the HTML of the reproducer and use markdown formatting instead:

from sklearn import manifold
from sklearn.metrics.pairwise import euclidean_distances
import pandas as pd

data = [['A',1,5],['B',1,4],['C',1,1],['D',3,3]]
df = pd.DataFrame(data, columns = ['Name', 'X','Y'])

# Distance matrix
disparities = euclidean_distances(df[['X','Y']])

# Fit MDS on precomputed distances
mds = manifold.MDS(
    n_components=2,
    dissimilarity="precomputed",
    random_state=42,
    metric=True,
)

# Print Stress computed by scikit-learn
results = mds.fit(disparities)
coords = results.embedding_
stress = results.stress_
print ('returned stress=',stress)

# Calculate Stress by hand
dis = euclidean_distances(coords)
real_stress = ((dis.ravel() - disparities.ravel()) ** 2).sum() / 2
print('real stress =',real_stress)
