8000 [MRG] set blockwise diagonals to zero for euclidean distance by amueller · Pull Request #12612 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG] set blockwise diagonals to zero for euclidean distance #12612

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 20, 2018

Conversation

amueller
Copy link
Member
@amueller amueller commented Nov 18, 2018

A shot at the discussion in #12574.

Not sure this is the right solution also the indexing looks weird to me.

@@ -473,7 +473,7 @@ def check_pairwise_distances_chunked(X, Y, working_memory, metric='euclidean'):
metric=metric)
assert isinstance(gen, GeneratorType)
blockwise_distances = list(gen)
Y = np.array(X if Y is None else Y)
Y = X if Y is None else Y
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is necessary otherwise the zeroing logic in pairwise_distances doesn't kick in.

# zeroing diagonal, taking care of aliases of "euclidean",
# i.e. "l2"
D_chunk[np.arange(D_chunk.shape[0]),
np.arange(D_chunk.shape[1])[sl]] = 0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

surely there's a better way to force inner indexing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we do D_chunk.flat[offset::len(X)+1] = 0?

@amueller amueller added this to the 0.20.1 milestone Nov 18, 2018
# zeroing diagonal, taking care of aliases of "euclidean",
# i.e. "l2"
D_chunk[np.arange(D_chunk.shape[0]),
np.arange(D_chunk.shape[1])[sl]] = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we do D_chunk.flat[offset::len(X)+1] = 0?

@@ -1271,6 +1271,13 @@ def pairwise_distances_chunked(X, Y=None, reduce_func=None,
X_chunk = X[sl]
D_chunk = pairwise_distances(X_chunk, Y, metric=metric,
n_jobs=n_jobs, **kwds)
if ((X is Y or Y is None)
and PAIRWISE_DISTANCE_FUNCTIONS.get(metric, None)
is euclidean_distances):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be introducing this for all metrics, perhaps with a FutureWarning where we detect the diagonal is not close to 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in both chunked and non-chunked? How do we know the callable we get passed is actually a metric? Should we try and detect and warn if it's not? Maybe the deprecation will be enough of a warning.

@amueller
Copy link
Member Author

Do you think this should go into 0.20.1?

@jnothman
Copy link
Member
jnothman commented Nov 19, 2018 via email

@amueller
Copy link
Member Author

Agree on all accounts. Need to fix the indexing and we need a second review ;)

@amueller amueller changed the title trying to set blockwise distance diagonals to zero [MRG] set blockwise diagonals to zero for euclidean distance Nov 20, 2018
Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's new entry for the regression?

chunks = list(pairwise_distances_chunked(X, working_memory=1,
metric=metric))
assert len(chunks) > 1
assert_array_almost_equal(np.diag(np.vstack(chunks)), 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is almost equal good enough? Set a very small tolerance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering. this has "decimal=6" by default. What is small?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's small enough :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decimal=10 is fine.

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, merging.

@ogrisel ogrisel merged commit 0b8650a into scikit-learn:master Nov 20, 2018
@amueller
Copy link
Member Author

Thanks!

@ogrisel
Copy link
Member
ogrisel commented Nov 20, 2018

Oops, I was a bit too quick, I thought @jnothman had given his +1 but it's not the case. All comments seemed addressed though.

@amueller
Copy link
Member Author

whoops, you're right :-/
Also, could you reproduce #10561? Do we know if this fixes it?

@ogrisel
Copy link
Member
ogrisel commented Nov 20, 2018

Also, could you reproduce #10561?

No.

Do we know if this fixes it?

It has to :) If not we need a more detailed feedback from someone with a machine that can reproduce the error.

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Nov 20, 2018
Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub won't let me approve posthumously.

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0