FEA Implement classical MDS by dkobak · Pull Request #31322 · scikit-learn/scikit-learn


FEA Implement classical MDS #31322

Open · dkobak wants to merge 25 commits into main
1 change: 1 addition & 0 deletions doc/api_reference.py
@@ -691,6 +691,7 @@ def _get_submodule(module_name, submodule_name):
{
"title": None,
"autosummary": [
"ClassicalMDS",
"Isomap",
"LocallyLinearEmbedding",
"MDS",
9 changes: 9 additions & 0 deletions doc/modules/manifold.rst
@@ -489,6 +489,15 @@ coordinates :math:`Z` of the embedded points.
:align: center
:scale: 60

Apart from that, there is a variant called *classical MDS*, also known as
*principal coordinates analysis (PCoA)* or *Torgerson's scaling*, implemented
in the separate :class:`ClassicalMDS` class. Classical MDS replaces the stress
loss function with a different loss function called *strain*, which admits an
exact solution in terms of an eigendecomposition of the double-centered
dissimilarity matrix. If the dissimilarity matrix consists of the pairwise
Euclidean distances between some vectors, then classical MDS is equivalent to
PCA applied to this set of vectors.

.. rubric:: References

* `"More on Multidimensional Scaling and Unfolding in R: smacof Version 2"
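The PCA equivalence stated in the paragraph above is easy to verify numerically. The following is a minimal NumPy sketch (not part of this diff) that performs classical MDS by double-centering the squared Euclidean distance matrix and taking the top eigenpairs, then checks that the result matches PCA up to per-axis sign flips:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import pairwise_distances

X = load_iris().data
n = X.shape[0]

# Double-center the squared distance matrix: B = -1/2 * J @ D**2 @ J,
# where J = I - (1/n) * ones((n, n)) is the centering matrix.
D2 = pairwise_distances(X) ** 2
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D2 @ J

# The top eigenvectors of B, scaled by the square roots of the corresponding
# eigenvalues, give the classical MDS embedding.
eigvals, eigvecs = np.linalg.eigh(B)
top = np.argsort(eigvals)[::-1][:2]
Z = eigvecs[:, top] * np.sqrt(eigvals[top])

# With Euclidean input distances this coincides with PCA up to axis signs.
Z_pca = PCA(n_components=2).fit_transform(X)
assert np.allclose(np.abs(Z), np.abs(Z_pca))
```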
@@ -0,0 +1,3 @@
- :class:`manifold.ClassicalMDS` was implemented to perform classical MDS
(eigendecomposition of the double-centered distance matrix).
By :user:`Dmitry Kobak <dkobak>` and :user:`Meekail Zain <Micky774>`.
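For reviewers trying the branch out, here is a small usage sketch of the new estimator. The parameter names (`n_components`, `metric="precomputed"`) are taken from the example diffs in this PR and may still change before merge:

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import ClassicalMDS  # new in this PR
from sklearn.metrics import pairwise_distances

X, _ = make_s_curve(n_samples=300, random_state=0)

# Fit directly on feature vectors (pairwise Euclidean distances are
# presumably computed internally, matching the PCA equivalence above).
Z = ClassicalMDS(n_components=2).fit_transform(X)

# Or pass a precomputed dissimilarity matrix, as in plot_mds.py below.
D = pairwise_distances(X)
Z_precomputed = ClassicalMDS(n_components=2, metric="precomputed").fit_transform(D)
```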
32 changes: 30 additions & 2 deletions examples/manifold/plot_compare_methods.py
@@ -170,9 +170,37 @@ def add_2d_scatter(ax, points, points_color, title=None):
random_state=0,
normalized_stress=False,
)
S_scaling = md_scaling.fit_transform(S_points)
S_scaling_metric = md_scaling.fit_transform(S_points)

plot_2d(S_scaling, S_color, "Multidimensional scaling")
md_scaling_nonmetric = manifold.MDS(
n_components=n_components,
max_iter=50,
n_init=1,
random_state=0,
normalized_stress=False,
metric=False,
)
S_scaling_nonmetric = md_scaling_nonmetric.fit_transform(S_points)

md_scaling_classical = manifold.ClassicalMDS(n_components=n_components)
S_scaling_classical = md_scaling_classical.fit_transform(S_points)

# %%
fig, axs = plt.subplots(
nrows=1, ncols=3, figsize=(7, 3.5), facecolor="white", constrained_layout=True
)
fig.suptitle("Multidimensional scaling", size=16)

mds_methods = [
("Metric MDS", S_scaling_metric),
("Non-metric MDS", S_scaling_nonmetric),
("Classical MDS", S_scaling_classical),
]
for ax, method in zip(axs.flat, mds_methods):
name, points = method
add_2d_scatter(ax, points, S_color, name)

plt.show()

# %%
# Spectral embedding for non-linear dimensionality reduction
5 changes: 5 additions & 0 deletions examples/manifold/plot_lle_digits.py
@@ -101,6 +101,7 @@ def plot_embedding(X, title):
from sklearn.manifold import (
MDS,
TSNE,
ClassicalMDS,
Isomap,
LocallyLinearEmbedding,
SpectralEmbedding,
@@ -131,6 +132,10 @@ def plot_embedding(X, title):
n_neighbors=n_neighbors, n_components=2, method="ltsa"
),
"MDS embedding": MDS(n_components=2, n_init=1, max_iter=120, eps=1e-6),
"Non-metric MDS embedding": MDS(
n_components=2, n_init=1, max_iter=120, eps=1e-6, metric=False
),
"Classical MDS embedding": ClassicalMDS(n_components=2),
"Random Trees embedding": make_pipeline(
RandomTreesEmbedding(n_estimators=200, max_depth=5, random_state=0),
TruncatedSVD(n_components=2),
46 changes: 36 additions & 10 deletions examples/manifold/plot_manifold_sphere.py
@@ -12,7 +12,7 @@
'spread it open' whilst projecting it onto two dimensions.

For a similar example, where the methods are applied to the
S-curve dataset, see :ref:`sphx_glr_auto_examples_manifold_plot_compare_methods.py`
S-curve dataset, see :ref:`sphx_glr_auto_examples_manifold_plot_compare_methods.py`.

Note that the purpose of the :ref:`MDS <multidimensional_scaling>` is
to find a low-dimensional representation of the data (here 2D) in
@@ -21,7 +21,7 @@
it does not seek an isotropic representation of the data in
the low-dimensional space. Here the manifold problem matches fairly well
that of representing a flat map of the Earth, as with
`map projection <https://en.wikipedia.org/wiki/Map_projection>`_
`map projection <https://en.wikipedia.org/wiki/Map_projection>`_.

"""

@@ -59,12 +59,12 @@
)

# Plot our dataset.
fig = plt.figure(figsize=(15, 8))
fig = plt.figure(figsize=(15, 12))
plt.suptitle(
"Manifold Learning with %i points, %i neighbors" % (1000, n_neighbors), fontsize=14
)

ax = fig.add_subplot(251, projection="3d")
ax = fig.add_subplot(351, projection="3d")
ax.scatter(x, y, z, c=p[indices], cmap=plt.cm.rainbow)
ax.view_init(40, -10)

@@ -86,7 +86,7 @@
t1 = time()
print("%s: %.2g sec" % (methods[i], t1 - t0))

ax = fig.add_subplot(252 + i)
ax = fig.add_subplot(352 + i)
plt.scatter(trans_data[0], trans_data[1], c=colors, cmap=plt.cm.rainbow)
plt.title("%s (%.2g sec)" % (labels[i], t1 - t0))
ax.xaxis.set_major_formatter(NullFormatter())
@@ -103,7 +103,7 @@
t1 = time()
print("%s: %.2g sec" % ("ISO", t1 - t0))

ax = fig.add_subplot(257)
ax = fig.add_subplot(357)
plt.scatter(trans_data[0], trans_data[1], c=colors, cmap=plt.cm.rainbow)
plt.title("%s (%.2g sec)" % ("Isomap", t1 - t0))
ax.xaxis.set_major_formatter(NullFormatter())
@@ -112,18 +112,44 @@

# Perform Multi-dimensional scaling.
t0 = time()
mds = manifold.MDS(2, max_iter=100, n_init=1, random_state=42)
mds = manifold.MDS(2, n_init=1, random_state=42)
trans_data = mds.fit_transform(sphere_data).T
t1 = time()
print("MDS: %.2g sec" % (t1 - t0))

ax = fig.add_subplot(258)
ax = fig.add_subplot(358)
plt.scatter(trans_data[0], trans_data[1], c=colors, cmap=plt.cm.rainbow)
plt.title("MDS (%.2g sec)" % (t1 - t0))
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
plt.axis("tight")

t0 = time()
mds = manifold.MDS(2, n_init=1, random_state=42, metric=False)
trans_data = mds.fit_transform(sphere_data).T
t1 = time()
print("Non-metric MDS: %.2g sec" % (t1 - t0))

ax = fig.add_subplot(359)
plt.scatter(trans_data[0], trans_data[1], c=colors, cmap=plt.cm.rainbow)
plt.title("Non-metric MDS (%.2g sec)" % (t1 - t0))
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
plt.axis("tight")

t0 = time()
mds = manifold.ClassicalMDS(2)
trans_data = mds.fit_transform(sphere_data).T
t1 = time()
print("Classical MDS: %.2g sec" % (t1 - t0))

ax = fig.add_subplot(3, 5, 10)
plt.scatter(trans_data[0], trans_data[1], c=colors, cmap=plt.cm.rainbow)
plt.title("Classical MDS (%.2g sec)" % (t1 - t0))
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
plt.axis("tight")

# Perform Spectral Embedding.
t0 = time()
se = manifold.SpectralEmbedding(
@@ -133,7 +159,7 @@
t1 = time()
print("Spectral Embedding: %.2g sec" % (t1 - t0))

ax = fig.add_subplot(259)
ax = fig.add_subplot(3, 5, 12)
plt.scatter(trans_data[0], trans_data[1], c=colors, cmap=plt.cm.rainbow)
plt.title("Spectral Embedding (%.2g sec)" % (t1 - t0))
ax.xaxis.set_major_formatter(NullFormatter())
@@ -147,7 +173,7 @@
t1 = time()
print("t-SNE: %.2g sec" % (t1 - t0))

ax = fig.add_subplot(2, 5, 10)
ax = fig.add_subplot(3, 5, 13)
plt.scatter(trans_data[0], trans_data[1], c=colors, cmap=plt.cm.rainbow)
plt.title("t-SNE (%.2g sec)" % (t1 - t0))
ax.xaxis.set_major_formatter(NullFormatter())
23 changes: 18 additions & 5 deletions examples/manifold/plot_mds.py
@@ -49,7 +49,7 @@
distances += noise

# %%
# Here we compute metric and non-metric MDS of the noisy distance matrix.
# Here we compute metric, non-metric, and classical MDS of the noisy distance matrix.

mds = manifold.MDS(
n_components=2,
@@ -74,17 +74,23 @@
)
X_nmds = nmds.fit_transform(distances)

cmds = manifold.ClassicalMDS(
n_components=2,
metric="precomputed",
)
X_cmds = cmds.fit_transform(distances)

# %%
# Rescaling the non-metric MDS solution to match the spread of the original data.

X_nmds *= np.sqrt((X_true**2).sum()) / np.sqrt((X_nmds**2).sum())

# %%
# To make the visual comparisons easier, we rotate the original data and both MDS
# To make the visual comparisons easier, we rotate the original data and all MDS
# solutions to their PCA axes. And flip horizontal and vertical MDS axes, if needed,
# to match the original data orientation.

# Rotate the data
# Rotate the data (CMDS does not need to be rotated, it is inherently PCA-aligned)
pca = PCA(n_components=2)
X_true = pca.fit_transform(X_true)
X_mds = pca.fit_transform(X_mds)
@@ -96,17 +102,24 @@
X_mds[:, i] *= -1
if np.corrcoef(X_nmds[:, i], X_true[:, i])[0, 1] < 0:
X_nmds[:, i] *= -1
if np.corrcoef(X_cmds[:, i], X_true[:, i])[0, 1] < 0:
X_cmds[:, i] *= -1

# %%
# Finally, we plot the original data and both MDS reconstructions.
# Finally, we plot the original data and all MDS reconstructions.

fig = plt.figure(1)
ax = plt.axes([0.0, 0.0, 1.0, 1.0])

s = 100
plt.scatter(X_true[:, 0], X_true[:, 1], color="navy", s=s, lw=0, label="True Position")
plt.scatter(X_mds[:, 0], X_mds[:, 1], color="turquoise", s=s, lw=0, label="MDS")
plt.scatter(X_nmds[:, 0], X_nmds[:, 1], color="darkorange", s=s, lw=0, label="NMDS")
plt.scatter(
X_nmds[:, 0], X_nmds[:, 1], color="darkorange", s=s, lw=0, label="Non-metric MDS"
)
plt.scatter(
X_cmds[:, 0], X_cmds[:, 1], color="lightcoral", s=s, lw=0, label="Classical MDS"
)
plt.legend(scatterpoints=1, loc="best", shadow=False)

# Plot the edges
2 changes: 2 additions & 0 deletions sklearn/manifold/__init__.py
@@ -3,6 +3,7 @@
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

from ._classical_mds import ClassicalMDS
from ._isomap import Isomap
from ._locally_linear import LocallyLinearEmbedding, locally_linear_embedding
from ._mds import MDS, smacof
@@ -12,6 +13,7 @@
__all__ = [
"MDS",
"TSNE",
"ClassicalMDS",
"Isomap",
"LocallyLinearEmbedding",
"SpectralEmbedding",