Description

In manifold.MDS, the reported stress value doesn't correspond to the returned coordinates.

How to reproduce:

Apply MDS to four points in the 2D plane:

X   Y
1   5
1   4
1   1
3   3

If the distances are Euclidean, the disparities matrix will be:

disparities =
 [ 0.           1.           4.                  2.82842712 ]
 [ 1.           0.           3.                  2.23606798 ]
 [ 4.           3.           0.                  2.82842712 ]
 [ 2.82842712   2.23606798   2.82842712           0.         ]
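
As a quick check, the matrix above can be rebuilt from the four points without scikit-learn. This is a minimal standard-library sketch; the names `points` and `pairwise` are illustrative and do not come from the original report.

```python
import math

# The four points from the table above, as (X, Y) pairs.
points = [(1, 5), (1, 4), (1, 1), (3, 3)]


def pairwise(pts):
    """Euclidean distance matrix for a list of 2-D points."""
    return [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in pts] for p in pts]


disparities = pairwise(points)

# Print each row rounded to 8 decimals, like the matrix shown above.
for row in disparities:
    print(["%.8f" % d for d in row])
```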

Perform multidimensional scaling:

mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=42, metric = True)
results = mds.fit(disparities)
coords = results.embedding_

coords =
 [ -0.32503572   1.78399044 ]
 [ 0.09843686   0.87890749 ]
 [ 1.47842546   -1.76761757 ]
 [ -1.2518266   -0.89528037 ]

stress = results.stress_
0.0045113518633979315

Now calculate the stress by hand.

First, calculate the Euclidean distances between the returned coordinates:

dis = euclidean_distances(coords)
 [ 0.          0.9992518    3.98326395   2.83503675 ]
 [ 0.9992518    0.           2.98470492   2.22956362 ]
 [ 3.98326395  2.98470492   0.           2.86622548 ]
 [ 2.83503675  2.22956362   2.86622548   0.         ]

Now we can calculate the stress value:

stress =
((dis.ravel() - disparities.ravel()) ** 2).sum() / 2
0.0020293041020245147

So the stress calculated from the returned coordinates doesn't correspond to the reported stress value:
results.stress_ = 0.0045113518633979315
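
The hand calculation can also be repeated from the printed numbers alone, without scikit-learn. The sketch below copies the coordinates from the output above at 8 decimal places, so it should agree with the recomputed value (about 0.00203) only up to rounding. `pairwise` and `raw_stress` are hypothetical helper names introduced for illustration.

```python
import math

# Original points and the embedding printed above (rounded to 8 decimals).
points = [(1, 5), (1, 4), (1, 1), (3, 3)]
coords = [
    (-0.32503572, 1.78399044),
    (0.09843686, 0.87890749),
    (1.47842546, -1.76761757),
    (-1.2518266, -0.89528037),
]


def pairwise(pts):
    """Euclidean distance matrix for a list of 2-D points."""
    return [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in pts] for p in pts]


def raw_stress(dis, disparities):
    """Sum of squared residuals over each pair i < j.

    This equals summing over the full symmetric matrix and dividing by 2.
    """
    n = len(dis)
    return sum(
        (dis[i][j] - disparities[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


stress = raw_stress(pairwise(coords), pairwise(points))
print("stress from printed coordinates =", stress)
```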
After some debugging it is clear that MDS returns coordinates for the current iteration and stress for the previous iteration.
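
The off-by-one described here can be illustrated with a simplified SMACOF loop. This is a sketch of the pattern only, not scikit-learn's actual code: it assumes metric MDS with unit weights, a random start, and a fixed number of iterations, and the names `smacof_sketch`, `guttman_update`, `raw_stress` and `pairwise` are hypothetical. The stress is computed before the coordinates are updated, so the value returned belongs to the previous coordinates.

```python
import math
import random


def pairwise(pts):
    """Euclidean distance matrix for a list of 2-D points."""
    return [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in pts] for p in pts]


def raw_stress(pts, disparities):
    """Sum of squared residuals over each pair i < j."""
    dis = pairwise(pts)
    n = len(pts)
    return sum(
        (dis[i][j] - disparities[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def guttman_update(pts, disparities):
    """One SMACOF step with unit weights: X_new = B(X) X / n."""
    n = len(pts)
    dis = pairwise(pts)
    new_pts = []
    for i in range(n):
        # ratio[j] = disparity / distance; 0 on the diagonal or for coincident points.
        ratio = [
            disparities[i][j] / dis[i][j] if dis[i][j] > 0 else 0.0
            for j in range(n)
        ]
        total = sum(ratio)
        x = (total * pts[i][0] - sum(r * p[0] for r, p in zip(ratio, pts))) / n
        y = (total * pts[i][1] - sum(r * p[1] for r, p in zip(ratio, pts))) / n
        new_pts.append((x, y))
    return new_pts


def smacof_sketch(disparities, n_iter=5, seed=0):
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in disparities]
    stress = None
    for _ in range(n_iter):
        stress = raw_stress(pts, disparities)   # stress of the coordinates BEFORE the update
        pts = guttman_update(pts, disparities)  # coordinates advance by one step
    return pts, stress                          # stress lags the coordinates by one iteration


points = [(1, 5), (1, 4), (1, 1), (3, 3)]
disparities = pairwise(points)
pts, reported = smacof_sketch(disparities, n_iter=5)
actual = raw_stress(pts, disparities)
print("reported stress =", reported)
print("stress of returned coordinates =", actual)
```

Because a Guttman update cannot increase the stress, the lagging value in this sketch is an upper bound on the stress of the returned coordinates. That is consistent with the direction of the discrepancy in this report (0.0045 reported versus 0.0020 recomputed). Recomputing the stress once after the final update would remove the mismatch in the sketch.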

Here is the script to reproduce:

from sklearn import manifold
from sklearn.metrics.pairwise import euclidean_distances
import pandas as pd

data = [['A', 1, 5], ['B', 1, 4], ['C', 1, 1], ['D', 3, 3]]
df = pd.DataFrame(data, columns=['Name', 'X', 'Y'])

# Distance matrix
disparities = euclidean_distances(df[['X', 'Y']])
mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=42, metric=True)
results = mds.fit(disparities)
coords = results.embedding_
stress = results.stress_
print('returned stress=', stress)

# Calculate stress by hand
dis = euclidean_distances(coords)
real_stress = ((dis.ravel() - disparities.ravel()) ** 2).sum() / 2
print('real stress =', real_stress)

Versions:
System:
    python: 3.7.6 (default, Dec 30 2019, 19:38:36) [Clang 10.0.0 (clang-1000.11.45.5)]
executable: /usr/local/opt/python/bin/python3.7
   machine: Darwin-17.7.0-x86_64-i386-64bit

Python deps:
       pip: 19.3.1
setuptools: 42.0.2
   sklearn: 0.21.3
     numpy: 1.17.4
     scipy: 1.3.2
    Cython: 0.29.15
    pandas: 1.0.1

EDITED by @ogrisel to strip the HTML of the reproducer and use markdown formatting instead:
from sklearn import manifold
from sklearn.metrics.pairwise import euclidean_distances
import pandas as pd
data = [['A',1,5],['B',1,4],['C',1,1],['D',3,3]]
df = pd.DataFrame(data, columns = ['Name', 'X','Y'])
# Distance matrix
disparities = euclidean_distances(df[['X','Y']])
# Fit MDS on precomputed distances
mds = manifold.MDS(
    n_components=2,
    dissimilarity="precomputed",
    random_state=42,
    metric=True,
)
# Print Stress computed by scikit-learn
results = mds.fit(disparities)
coords = results.embedding_
stress = results.stress_
print('returned stress=', stress)
# Calculate Stress by hand
dis = euclidean_distances(coords)
real_stress = ((dis.ravel() - disparities.ravel()) ** 2).sum() / 2
print('real stress =',real_stress)