@@ -418,67 +418,65 @@ Multi-dimensional Scaling (MDS)
representation of the data in which the distances respect well the
distances in the original high-dimensional space.

- In general, :class:`MDS` is a technique used for analyzing similarity or
- dissimilarity data. It attempts to model similarity or dissimilarity data as
- distances in a geometric space. The data can be ratings of similarity between
+ In general, :class:`MDS` is a technique used for analyzing
+ dissimilarity data. It attempts to model dissimilarities as
+ distances in a Euclidean space. The data can be ratings of dissimilarity between
objects, interaction frequencies of molecules, or trade indices between
countries.

There exist two types of MDS algorithm: metric and non-metric. In
- scikit-learn, the class :class:`MDS` implements both. In Metric MDS, the input
- similarity matrix arises from a metric (and thus respects the triangular
- inequality), the distances between output two points are then set to be as
- close as possible to the similarity or dissimilarity data. In the non-metric
- version, the algorithms will try to preserve the order of the distances, and
+ scikit-learn, the class :class:`MDS` implements both. In metric MDS,
+ the distances in the embedding space are set as
+ close as possible to the dissimilarity data. In the non-metric
+ version, the algorithm will try to preserve the order of the distances, and
hence seek a monotonic relationship between the distances in the embedded
- space and the similarities/dissimilarities.
+ space and the input dissimilarities.
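+
+ As a rough illustration of the basic API, a minimal sketch of embedding a
+ small made-up dataset with :class:`MDS` might look as follows (the array
+ values and parameter choices are purely illustrative)::
+
+     import numpy as np
+     from sklearn.manifold import MDS
+
+     # four hypothetical points in three dimensions
+     X = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0],
+                   [2.0, 1.0, 1.0], [3.0, 2.0, 2.0]])
+
+     # metric MDS (the default) into two dimensions
+     embedding = MDS(n_components=2, random_state=0)
+     X_embedded = embedding.fit_transform(X)  # array of shape (4, 2)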

.. figure:: ../auto_examples/manifold/images/sphx_glr_plot_lle_digits_010.png
   :target: ../auto_examples/manifold/plot_lle_digits.html
   :align: center
   :scale: 50

- Let :math:`S` be the similarity matrix, and :math:`X` the coordinates of the
- :math:`n` input points. Disparities :math:`\hat{d}_{ij}` are transformation of
- the similarities chosen in some optimal ways. The objective, called the
- stress, is then defined by :math:`\sum_{i < j} d_{ij}(X) - \hat{d}_{ij}(X)`
+ Let :math:`\delta_{ij}` be the dissimilarity matrix between the
+ :math:`n` input points (possibly arising as some pairwise distances
+ :math:`d_{ij}(X)` between the coordinates :math:`X` of the input points).
+ Disparities :math:`\hat{d}_{ij} = f(\delta_{ij})` are some transformation of
+ the dissimilarities. The MDS objective, called the raw stress, is then
+ defined by :math:`\sum_{i < j} (\hat{d}_{ij} - d_{ij}(Z))^2`,
+ where :math:`d_{ij}(Z)` are the pairwise distances between the
+ coordinates :math:`Z` of the embedded points.
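+
+ As a sketch of this objective (not of the optimizer itself), the raw stress of
+ a candidate embedding can be computed directly, assuming `delta` is a
+ precomputed dissimilarity matrix, `Z` the embedded coordinates, and the
+ disparities are taken equal to the dissimilarities::
+
+     import numpy as np
+     from sklearn.metrics.pairwise import euclidean_distances
+
+     def raw_stress(delta, Z):
+         # pairwise distances d_ij(Z) between the embedded points
+         d_Z = euclidean_distances(Z)
+         # sum over pairs i < j of (disparity - embedded distance)^2,
+         # here with disparities taken equal to the dissimilarities delta_ij
+         i, j = np.triu_indices_from(delta, k=1)
+         return np.sum((delta[i, j] - d_Z[i, j]) ** 2)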

.. dropdown:: Metric MDS

- The simplest metric :class:`MDS` model, called *absolute MDS*, disparities are defined by
- :math:`\hat{d}_{ij} = S_{ij}`. With absolute MDS, the value :math:`S_{ij}`
- should then correspond exactly to the distance between point :math:`i` and
- :math:`j` in the embedding point.
-
- Most commonly, disparities are set to :math:`\hat{d}_{ij} = b S_{ij}`.
+ In the metric :class:`MDS` model (sometimes also called *absolute MDS*),
+ disparities are simply equal to the input dissimilarities
+ :math:`\hat{d}_{ij} = \delta_{ij}`.
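+
+ As a sketch (reusing the hypothetical `X` defined earlier), metric MDS can also
+ be run on a precomputed dissimilarity matrix by passing
+ `dissimilarity='precomputed'`::
+
+     from sklearn.manifold import MDS
+     from sklearn.metrics.pairwise import euclidean_distances
+
+     delta = euclidean_distances(X)  # dissimilarities of the hypothetical data
+     mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
+     Z = mds.fit_transform(delta)
+     final_stress = mds.stress_  # final value of the stress for this embedding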

.. dropdown:: Nonmetric MDS

Non-metric :class:`MDS` focuses on the ordination of the data. If
- :math:`S_{ij} > S_{jk}`, then the embedding should enforce :math:`d_{ij} <
- d_{jk}`. For this reason, we discuss it in terms of dissimilarities
- (:math:`\delta_{ij}`) instead of similarities (:math:`S_{ij}`). Note that
- dissimilarities can easily be obtained from similarities through a simple
- transform, e.g. :math:`\delta_{ij} = c_1 - c_2 S_{ij}` for some real constants
- :math:`c_1, c_2`. A simple algorithm to enforce proper ordination is to use a
- monotonic regression of :math:`d_{ij}` on :math:`\delta_{ij}`, yielding
- disparities :math:`\hat{d}_{ij}` in the same order as :math:`\delta_{ij}`.
-
- A trivial solution to this problem is to set all the points on the origin. In
- order to avoid that, the disparities :math:`\hat{d}_{ij}` are normalized. Note
- that since we only care about relative ordering, our objective should be
+ :math:`\delta_{ij} > \delta_{kl}`, then the embedding
+ seeks to enforce :math:`d_{ij}(Z) > d_{kl}(Z)`. A simple algorithm
+ to enforce proper ordination is to use an
+ isotonic regression of :math:`d_{ij}(Z)` on :math:`\delta_{ij}`, yielding
+ disparities :math:`\hat{d}_{ij}` that are a monotonic transformation
+ of the dissimilarities :math:`\delta_{ij}` and hence have the same ordering.
+ This is done repeatedly after every step of the optimization algorithm.
+ In order to avoid the trivial solution where all embedded points
+ overlap, the disparities :math:`\hat{d}_{ij}` are normalized.
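+
+ A single disparity update can be sketched with
+ :class:`~sklearn.isotonic.IsotonicRegression`, assuming `delta` is the
+ dissimilarity matrix and `Z` the current embedding (this only illustrates
+ the isotonic step, not the full algorithm)::
+
+     import numpy as np
+     from sklearn.isotonic import IsotonicRegression
+     from sklearn.metrics.pairwise import euclidean_distances
+
+     # flatten the upper triangle: one entry per pair i < j
+     i, j = np.triu_indices_from(delta, k=1)
+     d_Z = euclidean_distances(Z)[i, j]
+
+     # disparities: monotonically increasing fit of the embedded distances
+     # d_ij(Z) against the dissimilarities delta_ij (normalization omitted)
+     disparities = IsotonicRegression().fit_transform(delta[i, j], d_Z)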
+
+ Note that since we only care about relative ordering, our objective should be
invariant to simple translation and scaling; however, the stress used in metric
- MDS is sensitive to scaling. To address this, non-metric MDS may use a
- normalized stress, known as Stress-1 defined as
+ MDS is sensitive to scaling. To address this, non-metric MDS returns
+ normalized stress, also known as Stress-1, defined as

.. math::

-    \sqrt{\frac{\sum_{i < j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i < j} d_{ij}^2}}.
+    \sqrt{\frac{\sum_{i < j} (\hat{d}_{ij} - d_{ij}(Z))^2}{\sum_{i < j} d_{ij}(Z)^2}}.

- The use of normalized Stress-1 can be enabled by setting `normalized_stress=True`,
- however it is only compatible with the non-metric MDS problem and will be ignored
- in the metric case.
+ Normalized Stress-1 is returned if `normalized_stress=True`.
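+
+ For example, a non-metric fit reporting normalized stress could be requested as
+ follows (a sketch, reusing the hypothetical `delta` from above)::
+
+     from sklearn.manifold import MDS
+
+     nmds = MDS(n_components=2, metric=False, dissimilarity='precomputed',
+                normalized_stress=True, random_state=0)
+     Z = nmds.fit_transform(delta)
+     stress_1 = nmds.stress_  # Stress-1, since normalized_stress=True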

.. figure:: ../auto_examples/manifold/images/sphx_glr_plot_mds_001.png
   :target: ../auto_examples/manifold/plot_mds.html
@@ -487,6 +485,10 @@ stress, is then defined by :math:`\sum_{i < j} d_{ij}(X) - \hat{d}_{ij}(X)`

.. rubric:: References

+ * `"More on Multidimensional Scaling and Unfolding in R: smacof Version 2"
+   <https://www.jstatsoft.org/article/view/v102i10>`_
+   Mair P., Groenen P., de Leeuw J. Journal of Statistical Software (2022)
+
* `"Modern Multidimensional Scaling - Theory and Applications"
  <https://www.springer.com/fr/book/9780387251509>`_
  Borg, I.; Groenen P. Springer Series in Statistics (1997)