8000 Clarification on Kruskal Stress as an Optimization Target in Metric and Non-metric MDS · Issue #30240 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Clarification on Kruskal Stress as an Optimization Target in Metric and Non-metric MDS #30240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dd1735 opened this issue Nov 8, 2024 · 7 comments
Labels
Documentation Needs Triage Issue requires triage

Comments

@dd1735
Copy link
dd1735 commented Nov 8, 2024

Describe the issue linked to the documentation

I am working on research involving the optimization targets used in metric and non-metric MDS, and I have some questions regarding how scikit-learn's implementation of MDS defines and calculates stress, particularly Kruskal Stress. While reviewing the official documentation, I noticed that specific formulas for stress calculations are not explicitly provided, and I would appreciate some clarification.

Non-metric MDS: My understanding is that non-metric MDS typically minimizes Kruskal Stress, defined as:

屏幕截图 2024-11-08 163808

in the reduced space. Could you confirm if scikit-learn's non-metric MDS implementation uses this definition, or if it employs an alternative method?

Metric MDS: Does metric MDS in scikit-learn also optimize for Kruskal Stress, or does it use a different stress formula? If a different approach is used, would it be possible to provide some insight or references on the stress function applied here?

Suggest a potential alternative/fix

Documentation Clarification: It would be incredibly helpful if the documentation could include specific details on the stress formulas used in both metric and non-metric MDS. This addition would help researchers and users better understand the theoretical underpinnings of the algorithm in scikit-learn.
Thank you very much for your guidance and clarification on these points. Your insights would be instrumental in my work with MDS.

@dd1735 dd1735 added Documentation Needs Triage Issue requires triage labels Nov 8, 2024
@adrinjalali
Copy link
Member

Seems this is very similar or duplicate of #15272, could you please add your thoughts there?

Also, note that #22330 tries to add to the existing module.

@adrinjalali adrinjalali closed this as not planned Won't fix, can't repro, duplicate, stale Nov 8, 2024
@dkobak
Copy link
Contributor
dkobak commented Nov 18, 2024

My understanding is that non-metric MDS typically minimizes Kruskal Stress

No -- the stress formula is optimized by metric MDS! Note that the square root and the normalization by the summed squared original distances do not change the loss minimum, so essentially the loss is simply squared error between high-dim and low-dim distances. That's minimized by metric MDS.

Non-metric MDS allows arbitrary monotonic transformation of low-dim distances, please see here: https://en.wikipedia.org/wiki/Multidimensional_scaling#Non-metric_multidimensional_scaling_(NMDS).

Note that I think non-metric MDS in sklearn is broken: #27028

@dd1735
Copy link
Author
dd1735 commented Nov 18, 2024

我的理解是,非公制 MDS 通常会最小化 Kruskal 应力

否 -- 应力公式由_公制 MDS_ 优化!请注意,平方根和由原始平方距离求和的归一化不会改变损失最小值,因此本质上损失只是高微微和低微微距离之间的平方误差。这通过度量 MDS 最小化。

非公制 MDS 允许低微距的任意单调变换,请参阅此处:https://en.wikipedia.org/wiki/Multidimensional_scaling#Non-metric_multidimensional_scaling_(NMDS)。

请注意,我认为 sklearn 中的非公制 MDS 已损坏:#27028

Thank you very much for your detailed explanation of metric and nonmetric MDS.

I have reviewed Kruskal's seminal papers from 1964: "Nonmetric multidimensional scaling: A numerical method" and "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis". These works emphasize minimizing stress as the core optimization criterion for nonmetric MDS(As shown in the figure below), and this has also been cited in the sklearn documentation.
屏幕截图 2024-11-18 193510

Your explanation mentioned that metric MDS minimizes stress. Should I understand this as Kruskal stress being originally proposed for nonmetric MDS but having a formula applicable to metric MDS as well?

Additionally, you mentioned that nonmetric MDS optimizes for rank preservation instead of directly minimizing Kruskal stress. My understanding was that Kruskal introduced the stress metric specifically as a key optimization target for nonmetric MDS. Furthermore, I noticed that the Wikipedia page you shared describes a stress formula for nonmetric MDS, replacing dij
with 𝑓(𝑑𝑖𝑗) in Kruskal's formula. Is this an alternative optimization target distinct from Kruskal stress, or simply a different representation?

I apologize for my limited understanding and truly value your clarification on this matter. Thank you for your time and expertise!

@dkobak
Copy link
Contributor
dkobak commented Nov 18, 2024

Wikipedia's formula is the same as the formulas in Kruskal 1964 papers, but the notation is different.

In Kruskal's papers, $d_{ij}$ are the embedding distances, and $\hat d_{ij}$ are some monotonic trasformation of the original distances which he calls $\delta_{ij}$. The optimization happens over embedding coordinates AND ALSO over all possible monotonic functions.

In Wikipedia, $\hat d_{ij}$ are the embedding distances, and $f(d_{ij})$ is the monotonic transformation of the original distances $d_{ij}$.

Either way, this is non-metric MDS.

If you leave out the monotonic transformation, you get "metric MDS". But I don't think this is explicitly called like that in Kruskal 1964 papers. In fact, I don't know who first coined the term "metric MDS".

@dd1735
Copy link
Author
dd1735 commented Nov 19, 2024

维基百科的公式与 Kruskal 1964 论文中的公式相同,但符号不同。

在 Kruskal 的论文中, d 我 j 是嵌入距离,而 d ^ 我 j 是原始距离的一些单调转换,他称之为 δ 我 j .优化发生在嵌入坐标上,也发生在所有可能的单调函数上。

在维基百科中, d ^ 我 j 是嵌入距离,而 f ( d 我 j ) 是原始距离的单调变换 d 我 j .

无论哪种方式,这都是非公制 MDS。

如果省略单调变换,则会得到“metric MDS”。但我不认为这在 Kruskal 1964 年的论文中是这样明确称呼的。事实上,我不知道是谁首先创造了“公制 MDS”这个词。

Thank you so much for your detailed explanation! !It has significantly clarified my understanding.

I have one further question: When we refer to "Kruskal Stress" today, does it implicitly assume the use of the monotonic transformations required for non-metric MDS? Or can the term "Kruskal Stress" now broadly refer to the squared error between the original high-dimensional distances and the low-dimensional embedding distances, regardless of whether the original distances undergo monotonic transformation?

I deeply appreciate your insights and apologize for my limited understanding. Your clarification is invaluable to me!

@dkobak
Copy link
Contributor
dkobak commented Nov 19, 2024

I am not sure. I would say that "stress" (without Kruskal's name attached to it) often refers to the loss without monotonic transformation, i.e. metric MDS loss. I guess if you say "Kruskal's stress", then maybe it assumes the monotonic transformation in there? In general, I find that terminology surrounding MDS can be very confusing/ambiguous, so I would recommend to be very precise and give a formula or explain in words what you mean exactly.

@dd1735
Copy link
Author
dd1735 commented Nov 20, 2024

I am not sure. I would say that "stress" (without Kruskal's name attached to it) often refers to the loss without monotonic transformation, i.e. metric MDS loss. I guess if you say "Kruskal's stress", then maybe it assumes the monotonic transformation in there? In general, I find that terminology surrounding MDS can be very confusing/ambiguous, so I would recommend to be very precise and give a formula or explain in words what you mean exactly.

Thank you for your clarification and the detailed explanation.
Actually,I am currently working on reproducing a paper where the authors mentioned using Kruskal's stress as the optimization criterion for MDS by R language. The formula they provided matches the one shared earlier in this issue.
image
However, they did not specify whether they used metric or non-metric MDS. Initially, I assumed Kruskal's stress was exclusively for non-metric MDS, but surprisingly, the metric MDS results from scikit-learn align better with the paper's outcomes.
This led me to ask this question to clarify which stress formula is used in scikit-learn's two MDS methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Needs Triage Issue requires triage
Projects
None yet
Development

No branches or pull requests

3 participants
0