10000 ENH Allow 0 < p < 1 for Minkowski distance for `algorithm="brute"` in `NeighborsBase` by RudreshVeerkhare · Pull Request #24750 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

ENH Allow 0 < p < 1 for Minkowski distance for algorithm="brute" in NeighborsBase #24750

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

RudreshVeerkhare
Copy link
Contributor
@RudreshVeerkhare RudreshVeerkhare commented Oct 24, 2022

Reference Issues/PRs

Fixes #22811

What does this implement/fix? Explain your changes.

  • Modified _parameter_constraints for NeighborsBase to allow p < 1 for Minkowski calculation
  • Added a flow to conditionally allow p < 1 for bruteforce algorithm and raise warning. Also to raise an exception for kd_tree and ball_tree
  • Modified logic for algorithm="auto" to assign _fit_method="brute if 0 < p < 1 for Minkowski metic.
  • Added Tests to validate all the newly added logic.

Any other comments?

If anything extra needed to be done please let me know.

@RudreshVeerkhare
Copy link
Contributor Author

@jjerphan I've added the basic functionality, let me know what else changes do I need to do...

@jjerphan jjerphan changed the title [WIP] NeighborsBase: allow p value for Minkowski calculation to be less than 1.0 (and greater than 0) for bruteforce algorithm ENH Allow 0 < p < 1 for Minkowski distance for algo="brute" in NeighborsBase Oct 25, 2022
@jjerphan jjerphan marked this pull request as draft October 25, 2022 07:41
Copy link
Member
@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for opening this PR, @RudreshVeerkhare.

You are on a good start!

I adapted the title to match the convention used within scikit-learn and turned your PR as a Draft (now preferred over prefixing PRs with [WIP]).

Here are some guidelines about what is to be tested for this change.

@jjerphan jjerphan added the Validation related to input validation label Oct 25, 2022
Copy link
Member
@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding tests, @RudreshVeerkhare.

Here are some more comments for precision. When all are addressed and a whats_new entry is added for 1.2 changelog, I think this PR will be mergeable.

@RudreshVeerkhare
Copy link
Contributor Author

@jjerphan I've incorporated your suggested changes and also added the changelog to v1.2, let me know your suggestions on this.

@jjerphan jjerphan marked this pull request as ready for review October 26, 2022 12:14
Copy link
Member
@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few last remarks before this LGTM.

@jjerphan jjerphan added the Waiting for Second Reviewer First reviewer is done, need a second one! label Oct 26, 2022
@RudreshVeerkhare
Copy link
Contributor Author

@jjerphan modified the files as per latest suggestions...

@jjerphan
Copy link
Member

Thank you.

We need another maintainer's approval before merging it — I labeled the PR accordingly.

y = np.ones(10)

model = Estimator(p=0)
with pytest.raises(ValueError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should match the error message using the match keyword in pytest.raises.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glemaitre error message for Exception raised by _validate_params is dynamic, so I'm not sure how to handle it🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if this is raised because of _valid_params because p=0 is not in the range, you don't need to test it because it will be tested by the common test for parameter validation. So you can remove this case from your test.

Comment on lines 1542 to 1558
model = Estimator(algorithm="kd_tree", p=0.1)
msg = (
'algo="kd_tree" does not support 0 < p < 1 for '
"the Minkowski metric. To resolve this problem either "
'set p >= 1 or algo="brute".'
)
with pytest.raises(ValueError, match=msg):
model.fit(X, y)

model = Estimator(algorithm="ball_tree", p=0.1)
msg = (
'algo="ball_tree" does not support 0 < p < 1 for '
"the Minkowski metric. To resolve this problem either "
'set p >= 1 or algo="brute".'
)
with pytest.raises(ValueError, match=msg):
model.fit(X, y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should make a separate test. It will allow to parameterized it and to have less redundant code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glemaitre can you elaborate more on this?🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like that:

@pytest.mark.parametrize(
    "Estimator",
    [
        neighbors.KNeighborsClassifier,
        neighbors.RadiusNeighborsClassifier,
        neighbors.KNeighborsRegressor,
        neighbors.RadiusNeighborsRegressor,
    ],
)
@pytest.mark.parametrize("algorithm", ["kd_tree", "ball_tree"])
def test_neighbors_minkowski_algo_error(Estimator, algorithm):
    """Check that we raise a proper error if `algorithm!='brute'` and `p<1`."""
    X = rng.random_sample((10, 2))
    y = np.ones(10)

    model = Estimator(algorithm=algorithm, p=0.1)
    msg = (
        f'algorithm="{algorithm}" does not support 0 < p < 1 for '
        "the Minkowski metric. To resolve this problem either "
        'set p >= 1 or algorithm="brute".'
    )
    with pytest.raises(ValueError, match=msg):
        model.fit(X, y)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the idea is to do something similar for code that is repeated in the test.

Comment on lines 1524 to 1540
model = Estimator(p=0.1, algorithm="auto")
msg = (
"Mind that for 0 < p < 1, Minkowski metrics are not distance metric. Continuing"
" the execution with `algo='brute'`."
)
with pytest.warns(UserWarning, match=msg):
model.fit(X, y)

assert model._fit_method == "brute"

model = Estimator(p=0.1, algorithm="brute")
msg = (
"Mind that for 0 < p < 1, Minkowski metrics are not distance metric. Continuing"
" the execution with `algo='brute'`."
)
with pytest.warns(UserWarning, match=msg):
model.fit(X, y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark regarding the parametrization

RudreshVeerkhare and others added 4 commits November 3, 2022 23:31
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre glemaitre self-assigned this Nov 3, 2022
Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of more suggestions.

Comment on lines 1542 to 1558
model = Estimator(algorithm="kd_tree", p=0.1)
msg = (
'algo="kd_tree" does not support 0 < p < 1 for '
"the Minkowski metric. To resolve this problem either "
'set p >= 1 or algo="brute".'
)
with pytest.raises(ValueError, match=msg):
model.fit(X, y)

model = Estimator(algorithm="ball_tree", p=0.1)
msg = (
'algo="ball_tree" does not support 0 < p < 1 for '
"the Minkowski metric. To resolve this problem either "
'set p >= 1 or algo="brute".'
)
with pytest.raises(ValueError, match=msg):
model.fit(X, y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like that:

@pytest.mark.parametrize(
    "Estimator",
    [
        neighbors.KNeighborsClassifier,
        neighbors.RadiusNeighborsClassifier,
        neighbors.KNeighborsRegressor,
        neighbors.RadiusNeighborsRegressor,
    ],
)
@pytest.mark.parametrize("algorithm", ["kd_tree", "ball_tree"])
def test_neighbors_minkowski_algo_error(Estimator, algorithm):
    """Check that we raise a proper error if `algorithm!='brute'` and `p<1`."""
    X = rng.random_sample((10, 2))
    y = np.ones(10)

    model = Estimator(algorithm=algorithm, p=0.1)
    msg = (
        f'algorithm="{algorithm}" does not support 0 < p < 1 for '
        "the Minkowski metric. To resolve this problem either "
        'set p >= 1 or algorithm="brute".'
    )
    with pytest.raises(ValueError, match=msg):
        model.fit(X, y)

y = np.ones(10)

model = Estimator(p=0)
with pytest.raises(ValueError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if this is raised because of _valid_params because p=0 is not in the range, you don't need to test it because it will be tested by the common test for parameter validation. So you can remove this case from your test.

@RudreshVeerkhare
Copy link
Contributor Author

@glemaitre I've incorporated your review 😄, let me know if anything else is needed...

@RudreshVeerkhare RudreshVeerkhare changed the title ENH Allow 0 < p < 1 for Minkowski distance for algo="brute" in NeighborsBase ENH Allow 0 < p < 1 for Minkowski distance for algorithm="brute" in NeighborsBase Nov 7, 2022
@glemaitre
Copy link
Member

I just fixed two small details in the docstring. Otherwise LGTM.
I will be merging once the CIs turn green.

@glemaitre glemaitre merged commit 13b5b61 into scikit-learn:main Nov 10, 2022
@glemaitre
Copy link
Member

Thanks @RudreshVeerkhare Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:neighbors Validation related to input validation Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KNeighborsClassifier: allow p value for Minkowski calculation to be less than 1.0 (and greater than 0)
3 participants
0