NearestCentroid FutureWarning with cosine metric · Issue #28003 · scikit-learn/scikit-learn · GitHub

NearestCentroid FutureWarning with cosine metric #28003

Closed
yohann84L opened this issue Dec 22, 2023 · 1 comment
Labels: Bug, Needs Triage (issue requires triage)

Comments

yohann84L commented Dec 22, 2023

Describe the bug

The NearestCentroid class raises a FutureWarning when used with metric='cosine'.
In fact, the metric is used for two things:

  • inside .fit() for the computation of the centroids:
    • with euclidean: centroids are computed using the mean of features
    • with manhattan: centroids are computed using the median
            if self.metric == "manhattan":
                # NumPy does not calculate median of sparse matrices.
                if not is_X_sparse:
                    self.centroids_[cur_class] = np.median(X[center_mask], axis=0)
                else:
                    self.centroids_[cur_class] = csc_median_axis_0(X[center_mask])
            else:
                # TODO(1.5) remove warning when metric is only manhattan or euclidean
                if self.metric != "euclidean":
                    warnings.warn(
                        "Averaging for metrics other than "
                        "euclidean and manhattan not supported. "
                        "The average is set to be the mean."
                    )
                self.centroids_[cur_class] = X[center_mask].mean(axis=0)
  • inside .predict(): the metric is used for the pairwise distance
    def predict(self, X):
        ...
        return self.classes_[
            pairwise_distances_argmin(X, self.centroids_, metric=self.metric)
        ]

But what if I want to use centroids computed as the mean of the features, together with the cosine (or another) metric for the pairwise distance?
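One way to get this behaviour without going through NearestCentroid is sketched below. It is only an illustration of the idea in the question, not something scikit-learn provides as such: the helper names fit_mean_centroids and predict_cosine are hypothetical, while pairwise_distances_argmin is the same function that .predict() calls above.

```python
import numpy as np
from sklearn.metrics import pairwise_distances_argmin


def fit_mean_centroids(X, y):
    """Hypothetical helper: one centroid per class, taken as the feature mean."""
    classes = np.unique(y)
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_cosine(X_new, classes, centroids):
    """Hypothetical helper: assign each row to the centroid nearest in cosine distance."""
    idx = pairwise_distances_argmin(X_new, centroids, metric="cosine")
    return classes[idx]


X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

classes, centroids = fit_mean_centroids(X, y)
pred = predict_cosine(np.array([[-0.8, -1]]), classes, centroids)
print(pred)
```

Because the centroids are computed outside the estimator, this sketch does not depend on which metrics NearestCentroid accepts, at the cost of losing the estimator API (shrink_threshold, pipelines, and so on).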

Steps/Code to Reproduce

from sklearn.neighbors import NearestCentroid
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = NearestCentroid(metric='cosine')
clf.fit(X, y)
print(clf.predict([[-0.8, -1]]))

Expected Results

No warning is raised.

Actual Results

../sklearn/neighbors/_nearest_centroid.py:150: FutureWarning: Support for distance metrics other than euclidean and manhattan and for callables was deprecated in version 1.3 and will be removed in version 1.5.
  warnings.warn(
../sklearn/neighbors/_nearest_centroid.py:201: UserWarning: Averaging for metrics other than euclidean and manhattan not supported. The average is set to be the mean.

Versions

System:
    python: 3.8.3 (default, Jun 16 2021, 12:00:44)  [Clang 12.0.5 (clang-1205.0.22.9)]
executable: /Users/yohanntvmbp/.pyenv/versions/tvmealvisionapi/bin/python
   machine: macOS-14.0-x86_64-i386-64bit
Python dependencies:
      sklearn: 1.3.1
          pip: 22.3.1
   setuptools: 62.1.0
        numpy: 1.21.5
        scipy: 1.7.3
       Cython: None
       pandas: 1.3.5
   matplotlib: 3.6.2
       joblib: 1.3.2
threadpoolctl: 3.1.0
Built with OpenMP: True
yohann84L added the Bug and Needs Triage labels Dec 22, 2023
glemaitre (Member) commented

But, what if I want to use centroids computed with the mean of features and using cosine (or other) metric to compute the pairwise distance ?

I think this is the point of the deprecation: only euclidean and manhattan should be supported (see #24083).
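For reference, a small sketch of the two metrics that remain supported, using the data from the report. It assumes nothing beyond the public NearestCentroid API and only illustrates that the metric changes how the centroids are computed (mean for euclidean, median for manhattan), as described in the issue.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

# euclidean: each centroid is the per-class mean of the features
euclid = NearestCentroid(metric="euclidean").fit(X, y)
# manhattan: each centroid is the per-class median of the features
manhattan = NearestCentroid(metric="manhattan").fit(X, y)

print(euclid.centroids_)
print(manhattan.centroids_)
print(euclid.predict([[-0.8, -1]]), manhattan.predict([[-0.8, -1]]))
```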
