8000 Revert "ENH Replacing np.where lookup by an inverted index (#9907)" · xhluca/scikit-learn@01dfe0f · GitHub
[go: up one dir, main page]

Skip to content

Commit 01dfe0f

Browse files
author
Xing
committed
Revert "ENH Replacing np.where lookup by an inverted index (scikit-learn#9907)"
This reverts commit ef71f53.
1 parent b8186a2 commit 01dfe0f

File tree

2 files changed

+7
-25
lines changed

2 files changed

+7
-25
lines changed

doc/whats_new/v0.21.rst

Lines changed: 0 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -162,15 +162,6 @@ Support for Python 3.4 and below has been officially dropped.
162162
- |Fix| Fixed a bug in the 'saga' solver where the weights would not be
163163
correctly updated in some cases. :issue:`11646` by `Tom Dupre la Tour`_.
164164

165-
:mod:`sklearn.manifold`
166-
............................
167-
168-
- |Efficiency| Make :func:`tsne.trustworthiness` use an inverted index
169-
instead of an `np.where` lookup to find the rank of neighbors in the input
170-
space. This improves trustworthiness in particular when computed with
171-
lots of neighbors and/or small datasets.
172-
:issue:`9907` by :user:`William de Vazelhes <wdevazelhes>`.
173-
174165
Multiple modules
175166
................
176167

sklearn/manifold/t_sne.py

Lines changed: 7 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -454,27 +454,18 @@ def trustworthiness(X, X_embedded, n_neighbors=5,
454454
"parameter instead.", DeprecationWarning)
455455
metric = 'precomputed'
456456
dist_X = pairwise_distances(X, metric=metric)
457-
if metric == 'precomputed':
458-
dist_X = dist_X.copy()
459-
# we set the diagonal to np.inf to exclude the points themselves from
460-
# their own neighborhood
461-
np.fill_diagonal(dist_X, np.inf)
462457
ind_X = np.argsort(dist_X, axis=1)
463-
# `ind_X[i]` is the index of sorted distances between i and other samples
464458
ind_X_embedded = NearestNeighbors(n_neighbors).fit(X_embedded).kneighbors(
465459
return_distance=False)
466460

467-
# We build an inverted index of neighbors in the input space: For sample i,
468-
# we define `inverted_index[i]` as the inverted index of sorted distances:
469-
# inverted_index[i][ind_X[i]] = np.arange(1, n_sample + 1)
470461
n_samples = X.shape[0]
471-
inverted_index = np.zeros((n_samples, n_samples), dtype=int)
472-
ordered_indices = np.arange(n_samples + 1)
473-
inverted_index[ordered_indices[:-1, np.newaxis],
474-
ind_X] = ordered_indices[1:]
475-
ranks = inverted_index[ordered_indices[:-1, np.newaxis],
476-
ind_X_embedded] - n_neighbors
477-
t = np.sum(ranks[ranks > 0])
462+
t = 0.0
463+
ranks = np.zeros(n_neighbors)
464+
for i in range(n_samples):
465+
for j in range(n_neighbors):
466+
ranks[j] = np.where(ind_X[i] == ind_X_embedded[i, j])[0][0]
467+
ranks -= n_neighbors
468+
t += np.sum(ranks[ranks > 0])
478469
t = 1.0 - t * (2.0 / (n_samples * n_neighbors *
479470
(2.0 * n_samples - 3.0 * n_neighbors - 1.0)))
480471
return t

0 commit comments

Comments
 (0)
0