-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
NearestCentroid not handling properly distance metrics other than Manhattan or Euclidean. #23890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I agree with your analysis. Feel free to open a PR that throws a This could be achieved cleanly with the help of the new parameter validation/constraints API: #23462. |
Actually there is already an open PR here: #23874. It would need to be updated accordingly and the change document in the changelog (as a bugfix). |
It seems that this issue can be closed? @ogrisel |
Indeed, fixed by #24083. Let's close |
Describe the bug
Three issues, quite related.
Using a metric that is not Manhattan or Euclidean triggers the following warning:
UserWarning: Averaging for metrics other than euclidean and manhattan not supported. The average is set to be the mean.
, but I believe an error message should be thrown instead. Indeed, computing the centroids as the average of the class samples is wrong for (I think) all distances except Euclidean, so it is misleading to actually do it even with a warning. Throwing an error here would fix the 2 following issues.Calling
.predict
on aNearestCentroid
with metricwminkowski
,seuclidean
ormahalanobis
throws an error, because those metrics need params that we can't pass toNearestCentroid
. DOC Improve doc of Nearest Neighbors metric param #23806 would fix the documentation, but maybe it should be nice to handle this properly in the code.Using any string as a
metric
(even a random string likefoobar
) does not throws an error when fitting, but does when calling.predict
. Doing this with another classifier likeKNeighborsClassifier
throws an error at the fitting step.If relevant I'd like to work on fixing those issues (new contributor here :)) I noticed a stalled PR (#19262) related to implementing
partial_fit
toNearestCentroid
. Maybe I should start working on it first, as it seems to improve the code quality of_nearest_centroid.py
, so it would be an interesting base to start with.Steps/Code to Reproduce
Expected Results
Fitting a
NearestCentroid
with a metric that is not Manhattan or Euclidean should trigger a proper error message.Actual Results
Error related to the second issue:
ValueError: The 'V' parameter is required for the seuclidean metric when Y is passed.
Versions
The text was updated successfully, but these errors were encountered: