-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
Q: why is the correlation metric commented in dist_metrics.pyx? #4213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
SciPy has a vectorized version of it, |
|
Thanks for the replies. I run into this issue because I wanted to run KNN classification using correlation distance as a metric, and passing any user defined function was very slow. I ended up defining my own cython function (but I wasn't aware of |
I see. This use-case and others like it could be accommodated by adding a |
Actually, it looks like the correlation distance is implemented natively for the brute-force method: clf = KNeighborsClassifier(5, algorithm='brute', metric='correlation')
X, y = np.random.random((100, 3)), np.random.randint(0, 3, 100)
clf.fit(X, y)
clf.predict(X) |
If brute-force isn't fast enough for your use-case, there do exist fast algorithms for finding neighbors with a correlation distance, but it will require a data structure that's not implemented in scikit-learn. Look u 8189 p "Kernelized Locality Sensitive Hashing" for one example of an approach that might be useful. |
@jakevdp thanks for taking time to look into this and for the info on that method! |
Hi,
in
dist_metrics.pyx
the correlation metric is commented, and the comment states that it is not a 'true' metric. While this is true as it doesn't satisfy the coincidence axiom, it is still helpful to have such a metric implemented in cython to speed up computations.Also, note that the actual formula implemented in the commented code is incorrect. The returned value at the moment is
but it should be instead
Thanks,
Matteo
The text was updated successfully, but these errors were encountered: