In manifold.MDS, the stress value doesn’t correspond to the returned coordinates #16846
In manifold.MDS, the stress value doesn’t correspond to the returned coordinates.
How to reproduce:
Apply MDS to four points in the 2D plane:
X Y
1 5
1 4
1 1
3 3
If the distances are Euclidean, the disparities matrix is:
disparities =
[ 0. 1. 4. 2.82842712 ]
[ 1. 0. 3. 2.23606798 ]
[ 4. 3. 0. 2.82842712 ]
[ 2.82842712 2.23606798 2.82842712 0. ]
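As a quick cross-check of the matrix above, the pairwise distances can be recomputed from the four points using only the standard library. This is a minimal sketch; the helper name `pairwise_distances_2d` is made up for illustration and is not part of scikit-learn.

```python
import math

# The four points from the example above.
points = [(1, 5), (1, 4), (1, 1), (3, 3)]


def pairwise_distances_2d(pts):
    """Return the full Euclidean distance matrix as a list of rows."""
    return [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in pts] for p in pts]


disparities = pairwise_distances_2d(points)
for row in disparities:
    # Round only for display, to match the 8 decimals shown above.
    print([round(value, 8) for value in row])
```

The full script further down obtains the same matrix with `euclidean_distances` from scikit-learn.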
Perform multidimensional scaling:
mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=42, metric=True)
results = mds.fit(disparities)
coords = results.embedding_
coords =
[ -0.32503572 1.78399044 ]
[ 0.09843686 0.87890749 ]
[ 1.47842546 -1.76761757 ]
[ -1.2518266 -0.89528037 ]
stress = results.stress_
0.0045113518633979315
Now calculate stress by hand.
First, calculate the Euclidean distances between the returned coordinates:
dis = euclidean_distances(coords)
[ 0. 0.9992518 3.98326395 2.83503675 ]
[ 0.9992518 0. 2.98470492 2.22956362 ]
[ 3.98326395 2.98470492 0. 2.86622548 ]
[ 2.83503675 2.22956362 2.86622548 0. ]
And now we can calculate stress value:
stress =
((dis.ravel() - disparities.ravel()) ** 2).sum() / 2
0.0020293041020245147
So the stress calculated from the returned coordinates (0.0020293041020245147) doesn’t match the reported stress value:
results.stress_ = 0.0045113518633979315
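The mismatch can also be checked without scikit-learn by plugging the coordinates printed above into the same stress formula. The sketch below assumes the rounded 8-decimal coordinates are accurate enough for the comparison; `distance` and `raw_stress` are illustrative helper names, not library functions. Summing over each pair once is equivalent to summing the full symmetric matrix and dividing by 2.

```python
import math

# Original points and the embedding printed above (rounded to 8 decimals).
points = [(1, 5), (1, 4), (1, 1), (3, 3)]
coords = [
    (-0.32503572, 1.78399044),
    (0.09843686, 0.87890749),
    (1.47842546, -1.76761757),
    (-1.2518266, -0.89528037),
]


def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def raw_stress(embedded, original):
    """Sum of squared distance differences, counting each pair once."""
    n = len(original)
    return sum(
        (distance(embedded[i], embedded[j]) - distance(original[i], original[j])) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


real_stress = raw_stress(coords, points)
reported_stress = 0.0045113518633979315  # results.stress_ quoted above
print("recomputed:", real_stress, "reported:", reported_stress)
```

The recomputed value lands near 0.00203, well below the reported 0.00451, so rounding of the printed coordinates cannot explain the gap.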
After some debugging, it is clear that MDS returns the coordinates from the current iteration but the stress from the previous iteration.
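To illustrate how such an off-by-one can arise, here is a simplified SMACOF-style loop written from scratch. It is not scikit-learn's actual code: the function names, the fixed iteration count, and the arbitrary starting layout are all assumptions made for the sketch. The point is only the ordering inside the loop, where the stress is measured before the layout is updated, so the value returned at the end describes the previous layout.

```python
import math


def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def raw_stress(layout, delta):
    """Sum of squared (embedded - target) distances over each pair once."""
    n = len(layout)
    return sum(
        (distance(layout[i], layout[j]) - delta[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def guttman_update(layout, delta):
    """One unweighted SMACOF step (Guttman transform) in 2D."""
    n = len(layout)
    updated = []
    for i in range(n):
        x = y = 0.0
        for j in range(n):
            if i == j:
                continue
            d = distance(layout[i], layout[j])
            ratio = delta[i][j] / d if d > 0 else 0.0
            x += ratio * (layout[i][0] - layout[j][0])
            y += ratio * (layout[i][1] - layout[j][1])
        updated.append((x / n, y / n))
    return updated


def fit_with_stale_stress(delta, init, n_iter):
    """Hypothetical loop that returns the new layout with the old stress."""
    layout = init
    stress = raw_stress(layout, delta)
    for _ in range(n_iter):
        stress = raw_stress(layout, delta)      # measured BEFORE the update
        layout = guttman_update(layout, delta)  # layout moves on, stress does not
    return layout, stress


points = [(1, 5), (1, 4), (1, 1), (3, 3)]
delta = [[distance(p, q) for q in points] for p in points]
init = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # arbitrary start

layout, stale_stress = fit_with_stale_stress(delta, init, n_iter=3)
fresh_stress = raw_stress(layout, delta)  # stress of the layout actually returned
print("returned with layout:", stale_stress, "recomputed:", fresh_stress)
```

In this sketch, recomputing the stress once after the final update (or returning `fresh_stress` instead) would keep the two outputs consistent. Whether that is the right fix for scikit-learn itself depends on its actual loop and convergence check.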
Here is the script to reproduce:
from sklearn import manifold
from sklearn.metrics.pairwise import euclidean_distances
import pandas as pd
data = [['A', 1, 5], ['B', 1, 4], ['C', 1, 1], ['D', 3, 3]]
df = pd.DataFrame(data, columns=['Name', 'X', 'Y'])
# Distance matrix
disparities = euclidean_distances(df[['X', 'Y']])
mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=42, metric=True)
results = mds.fit(disparities)
coords = results.embedding_
stress = results.stress_
print('returned stress =', stress)
# Calculate stress by hand
dis = euclidean_distances(coords)
real_stress = ((dis.ravel() - disparities.ravel()) ** 2).sum() / 2
print('real stress =', real_stress)
Versions:
System:
python: 3.7.6 (default, Dec 30 2019, 19:38:36) [Clang 10.0.0 (clang-1000.11.45.5)]
executable: /usr/local/opt/python/bin/python3.7
machine: Darwin-17.7.0-x86_64-i386-64bit
Python deps:
pip: 19.3.1
setuptools: 42.0.2
sklearn: 0.21.3
numpy: 1.17.4
scipy: 1.3.2
Cython: 0.29.15
pandas: 1.0.1
EDITED by @ogrisel to strip the HTML of the reproducer and use markdown formatting instead.