KNeighborsClassifier: allow p value for Minkowski calculation to be less than 1.0 (and greater than 0) #22811
The issue is that Minkowski with p < 1 is not a metric (it does not satisfy the triangle inequality). The question then is: do we want to support similarity measures that are not metrics?
Yes, I agree it's not a metric. My advisor, who spent his entire life studying distance (Euclidean and non-Euclidean), would say that Minkowski with p < 1 has much value. I have a problem where I need to iterate over different values of p for Minkowski, but there is a way around this issue (for p < 1): I can use a callable for my own Minkowski method and pass a parameter like this. Here "minkowski_distance" is my method...
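For reference, the callable workaround described above can be sketched like this (the "minkowski_distance" implementation and the toy data are illustrative, not the original poster's code):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def minkowski_distance(x, y, p=0.5):
    # Fractional "Minkowski" dissimilarity: perfectly computable for
    # 0 < p < 1, even though it is not a true metric (no triangle inequality).
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

# Toy data (illustrative only)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]])
y = np.array([0, 0, 0, 1])

# The extra parameter p is passed through metric_params.
clf = KNeighborsClassifier(
    n_neighbors=1,
    algorithm="brute",
    metric=minkowski_distance,
    metric_params={"p": 0.5},
)
clf.fit(X, y)
print(clf.predict([[4.5, 5.0]]))  # nearest training point is [5, 5] -> class 1
```

Because the metric is a plain Python callable, this only works with the brute-force path, and it is much slower than the built-in metrics.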
This will solve my original problem, and help anyone else who wants p < 1 for Minkowski. I have another, very closely related issue with KNeighborsClassifier; if it needs a new issue I can open it. What about adding a "weights_params" option? Just like in the code above there are "metric" and "metric_params", but for weights there is only weights (for a callable), with no weights_params. I have a case where I need to iterate over the value of p for the weights method, and I have to do something like this:
How about including a new param "weights_params", something like this:
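Until something like "weights_params" exists, one workaround is to bind the parameter into the weights callable with functools.partial; a minimal sketch (the weighting function and data are hypothetical, not the original poster's code):

```python
from functools import partial

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def inverse_power_weights(distances, p=1.0):
    # Hypothetical weighting scheme: weight each neighbor by 1 / distance**p.
    with np.errstate(divide="ignore"):
        w = 1.0 / np.power(distances, p)
    w[~np.isfinite(w)] = 1.0  # give exact matches a finite weight
    return w

X = np.array([[0.0], [1.0], [2.0], [10.0]])
y = np.array([0, 0, 1, 1])

# Iterate over p without a "weights_params" option by binding it
# into the callable before passing it to the estimator.
for p in (0.5, 1.0, 2.0):
    clf = KNeighborsClassifier(
        n_neighbors=3, weights=partial(inverse_power_weights, p=p)
    )
    clf.fit(X, y)
    print(p, clf.predict([[1.8]]))
```

The weights callable receives the array of neighbor distances and must return an array of the same shape, so any extra parameters have to be baked in via partial (or a closure).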
Thanks, -Ray
That seems like a typical example of where it's best to allow that behaviour but raise a warning.
@raymondj-pace would you mind sharing some references so we can see how this would be useful and how it would be sensible to apply kNN with a non-metric function like this?
One is here: "Combining Minkowski and Chebyshev" (arXiv): https://arxiv.org/pdf/2112.12549. Scroll down to page 69 for p = 0.5 and 0.75. The same diagrams for p < 1 are also on Wikipedia: https://en.wikipedia.org/wiki/Minkowski_distance
@scikit-learn/core-devs are we happy with the literature on p < 1 here?
I think that I would rather not support it as a built-in option. Note that you might still define your own distance using a callable passed as "metric".
This functionality is available via a custom "metric" callable.
So should we document this one instead of implementing it?
I am fine with allowing p < 1.0 in KNeighborsClassifier without a warning when using the brute-force method, as it's perfectly fine to compute the data points with the lowest pseudo-metric values. But we should still raise an exception when using ball-tree, because the triangle inequality is required there to return correct results.
Actually, a pseudometric would still satisfy the triangle inequality; it's just that d(x, y) = 0 does not imply the identity x = y.
Actually, Minkowski with p < 1 does not satisfy the triangle inequality: take the points (0, 0), (1, 1) and (0, 1) (example from Wikipedia). For p = 0.5, d((0, 0), (1, 1)) = (1 + 1)^2 = 4, but d((0, 0), (0, 1)) + d((0, 1), (1, 1)) = 1 + 1 = 2.
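The counterexample can be checked numerically:

```python
def minkowski(x, y, p):
    # (sum_i |x_i - y_i|**p) ** (1/p), computable for any p > 0
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

p = 0.5
a, b, c = (0, 0), (0, 1), (1, 1)
direct = minkowski(a, c, p)                      # (1 + 1) ** 2 = 4.0
via_b = minkowski(a, b, p) + minkowski(b, c, p)  # 1.0 + 1.0 = 2.0
print(direct, via_b)  # d(a, c) > d(a, b) + d(b, c): triangle inequality fails
```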
What's our conclusion?
I'll vote for 1., as I haven't seen a convincing use case for allowing it, and on top of that, it is already available via passing a custom metric callable.
Yes, I agree; I was correcting my previous comment, where I made bad use of the word "pseudometric". Still, brute-force kNN is well defined for p < 1, so I don't see why we should block it. But I agree that we should prevent running the ball-tree (and even more the kd-tree) algorithms, which rely on the triangle inequality.
I would allow it for brute force and raise for KD/ball-tree, but no strong opinion.
@ogrisel What is your favorite option for which you would vote?
+1.
I could live with that. What do the others think? @raymondj-pace, @jjerphan, @adrinjalali, @eschibli?
Works for me. 👍
Sounds like a good resolution to me.
Hi, if this still needs to be implemented, can I work on it? @adrinjalali @jjerphan @lorentzenchr @ogrisel
Hi @RudreshVeerkhare. Yes, this needs to be implemented, and you can work on it!
/take
Thanks @jjerphan 😁, I've started with the setup.
@jjerphan I'm done with the setup. Basically, I need to add functionality that allows values of p with 0 < p < 1 only when the algorithm is explicitly set to "brute", and also raises a warning that, with p < 1, Minkowski is not a valid metric. Is that correct? 🤔
Yes, based on what was concluded in the discussions starting from #22811 (comment), you are correct, up to a clarification of what to raise: I think @lorentzenchr meant to raise an error (not a warning) when users set an algorithm other than "brute" with p < 1. The relevant validation logic lives in scikit-learn/sklearn/neighbors/_base.py (line 585 at 60cc5b5).
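Based on this resolution, the validation could look something like the following sketch (the helper name and messages are hypothetical, not scikit-learn's actual implementation):

```python
def check_minkowski_p(p, algorithm):
    # Hypothetical validation helper following the resolution above:
    # 0 < p < 1 is only acceptable with the brute-force algorithm,
    # because the tree-based algorithms rely on the triangle inequality.
    if p <= 0:
        raise ValueError(f"p must be greater than 0, got p={p!r}")
    if p < 1 and algorithm != "brute":
        raise ValueError(
            "Minkowski with 0 < p < 1 is not a valid metric; "
            'it is only supported with algorithm="brute".'
        )

check_minkowski_p(0.5, "brute")  # accepted
try:
    check_minkowski_p(0.5, "ball_tree")
except ValueError as e:
    print(e)  # rejected: ball-tree needs the triangle inequality
```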
|
Thanks @jjerphan for the clarification. I'll start working on it and will create a WIP PR.
Describe the workflow you want to enable
I would like to be able to use the KNeighborsClassifier with something like:
neigh = KNeighborsClassifier(n_neighbors=2, p=0.1)
The error you get is:
Change the above line so that it only rejects Minkowski p values below 0:
if self.metric in ["wminkowski", "minkowski"] and effective_p < 0:
There is nothing wrong with using a p value > 0 and < 1.
Describe your proposed solution
Don't throw an exception if p is < 1.0.
Calculating the Minkowski distance is most definitely valid for p > 0 and p < 1, e.g. 0.1, 0.2, 0.3, 0.4, etc.
The only requirement is to raise the absolute difference in each dimension to the power p (i.e. **p), sum over all dimensions, and then take the p-th root of that sum:
distance = distance_sum ** (1.0 / p)
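As a quick sanity check of that formula, a minimal NumPy sketch (illustrative, not scikit-learn code):

```python
import numpy as np

def minkowski_distance(x, y, p):
    # Raise each dimension's absolute difference to the power p,
    # sum over dimensions, then take the p-th root of that sum.
    distance_sum = np.sum(np.abs(np.asarray(x, float) - np.asarray(y, float)) ** p)
    return distance_sum ** (1.0 / p)

print(minkowski_distance([0, 0], [3, 4], 2))    # p = 2 recovers Euclidean: 5.0
print(minkowski_distance([0, 0], [1, 1], 0.5))  # fractional p: (1 + 1)**2 = 4.0
```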
There are many cases where it is desirable to compute Minkowski distances with p > 0 and p < 1.
Describe alternatives you've considered, if relevant
Write my own kNN classifier with my own minkowski distance calculator:
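Since that code was not included above, here is one way such a hand-rolled classifier might look (a minimal illustrative sketch, not the author's actual code):

```python
from collections import Counter

import numpy as np

def fractional_minkowski(x, y, p):
    # Minkowski-style dissimilarity, computable for any p > 0.
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def knn_predict(X_train, y_train, x, k=3, p=0.5):
    # Brute-force kNN: rank training points by the fractional
    # dissimilarity and take a majority vote among the k nearest.
    dists = [fractional_minkowski(xi, x, p) for xi in X_train]
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [5.0, 5.0], [5.5, 4.5]])
y = np.array([0, 0, 0, 1, 1])
print(knn_predict(X, y, np.array([5.0, 4.8]), k=3, p=0.5))
```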