8000 [MRG] Adapts cdist jaccard to scipy 1.2.0 by thomasjpfan · Pull Request #12692 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG] Adapts cdist jaccard to scipy 1.2.0 #12692

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Nov 30, 2018

Conversation

thomasjpfan
Copy link
Member

Reference Issues/PRs

Fixes #12685

What does this implement/fix? Explain your changes.

Adjusts JaccardDistance.dist to match with scipy 1.2.0 version.

Any other comments?

The test check_pdist_bool had to be adjusted to be backwards compatible with older versions of scipy.

@amueller
Copy link
Member

that's a backwards-incompatible change, right? So now we're changing jaccard for everyone to give zero, not nan.

@thomasjpfan
Copy link
Member Author

Yes it is not backwards compatible, since scipy's change is also not backwards compatible. Following the discussion: scipy/scipy#7373, it seems the old version of JaccardDistance.dist was defined incorrectly, and 0/0 should return 0.

If we want to be strict about backwards compatibility, we can wait two versions to deprecate JaccardDistance.dist, then make change in 0.23.0.

@amueller
Copy link
Member

I'm fine with making the change, but we should note it prominently in the whatsnew.
I figure any different behavior will be even harder to figure out / debug, as we sometimes switch between scipy and distance trees by a heuristic.

@jnothman
Copy link
Member

Seeing as nearest neighbors wasn't going to work well with NaNs in any case, I'm fine with this.

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending what's new entry.

Copy link
Member
@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @thomasjpfan
I think it's always acceptable to keep consistent with scipy
I'm OK with 0.20.2

random sampling procedures.

- :mod:`sklearn.neighbors` when ``metric=='jaccard'`` (bug fix)
- :mod:`sklearn.metrics.pairwise` when ``metric=='jaccard'` and Scipy >= 1.2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not related to changes in scikit-learn so remove?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was updated to remove the second item regarding sklearn.metrics.pairwise.

@@ -788,6 +788,8 @@ cdef class JaccardDistance(DistanceMetric):
tf2 = x2[j] != 0
nnz += (tf1 or tf2)
n_eq += (tf1 and tf2)
if nnz == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add some explanations and point to the scipy issue

@@ -101,6 +103,8 @@ def check_pdist(metric, kwargs, D_true):
def check_pdist_bool(metric, D_true):
dm = DistanceMetric.get_metric(metric)
D12 = dm.pairwise(X1_bool)
if metric == 'jaccard' and LooseVersion(scipy_version) < '1.2.0':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add some explanations and point to the scipy issue

Copy link
Member
@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @thomasjpfan

@qinhanmin2014 qinhanmin2014 merged commit 7f5aa85 into scikit-learn:master Nov 30, 2018
@jnothman jnothman added this to the 0.20.2 milestone Dec 3, 2018
amueller pushed a commit to amueller/scikit-learn that referenced this pull request Dec 14, 2018
amueller pushed a commit to amueller/scikit-learn that referenced this pull request Dec 17, 2018
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test_pdist_bool_metrics[jaccard] fails on scipy-dev cron job
4 participants
0