-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Specifying 'cosine' as metric in KDTree throws error #25364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi @01-vyom! Thanks for the issue. As far as I can see, sklearn.neighbors.KDTree can only be used with Minkowski distances (with $p \in [1, \infty )$): from sklearn.neighbors import KDTree
KDTree.valid_metrics
>>> ['euclidean', 'l2', 'minkowski', 'p', 'manhattan', 'cityblock', 'l1', 'chebyshev', 'infinity'] For some reason, the documentation here: is rendered in a way that makes it not obvious to run the code above (because of the lowercase and the underline). When KDTree (and BallTree) inherit the docs from BinaryTree, the class name is given to
scikit-learn/sklearn/neighbors/_binary_tree.pxi Lines 285 to 290 in 00f3910
So maybe changing here to
As for the user guide, I agree that, in 1.6.1.2, the text
makes it confusing to understand that you should not use "cosine" for KDTree and probably should be erased (or moved to another part of the text, such as in 1.6.1.1). |
Hi @vitaliset, Thanks for clarifying! Also, for the second one, I would say that we could move it to 1.6.1.1, but we might still need to say that it is not supported by algorithms the |
I agree that we could erase the part about the cosine metric, but I like the first phrase, and I think it is important to be somewhere:
Let's see what others have to say! |
Thanks for reporting this problem! The line you quoted is misleading and the last part (i.e. "and the metrics listed in Is one of you interesting in authoring such improvement to scikit-learn's documentation? 🙂 |
I am interested in authoring improvements. Should I create 2 separate PRs for the first (KDTree Docs.) as well as the second problem (user guide section)? Or Should I create one single PR? |
Both are performable as part of an unique PR. |
Describe the bug
I am trying to implement the
KDTree Algorithm
withcosine
as a distance metric. I first started with scipy's implementation, but it didn't supportcosine
as a metric. Then, I came across sklearn implementation. The documentation indicates that the supported metric can be found (scipy.spatial.distance) and (distance_metrics). Both of these places show that thecosine
metric is supported. Also, the user guide also suggests that cosine is supported as it tries to point outcosine
ascosine_distance
. However, it still throws an error for me when I try to use that metric.Steps/Code to Reproduce
Expected Results
No error is thrown
Actual Results
Versions
The text was updated successfully, but these errors were encountered: