NearestCentroid FutureWarning with cosine metric · Issue #28003 · scikit-learn/scikit-learn · GitHub

NearestCentroid FutureWarning with cosine metric #28003
Closed
@yohann84L

Describe the bug

The NearestCentroid class raises a FutureWarning when used with metric='cosine'.
The metric is actually used for two things:

  • inside .fit() for the computation of the centroids:
    • with euclidean: centroids are computed using the mean of features
    • with manhattan: centroids are computed using the median
            if self.metric == "manhattan":
                # NumPy does not calculate median of sparse matrices.
                if not is_X_sparse:
                    self.centroids_[cur_class] = np.median(X[center_mask], axis=0)
                else:
                    self.centroids_[cur_class] = csc_median_axis_0(X[center_mask])
            else:
                # TODO(1.5) remove warning when metric is only manhattan or euclidean
                if self.metric != "euclidean":
                    warnings.warn(
                        "Averaging for metrics other than "
                        "euclidean and manhattan not supported. "
                        "The average is set to be the mean."
                    )
                self.centroids_[cur_class] = X[center_mask].mean(axis=0)
  • inside .predict(): the metric is used for the pairwise distance
    def predict(self, X):
        ...
        return self.classes_[
            pairwise_distances_argmin(X, self.centroids_, metric=self.metric)
        ]
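
To make the first role of the metric concrete, here is a minimal sketch that fits NearestCentroid with the two supported metrics and compares the resulting centroids_. The toy data and the variable names euclidean_clf and manhattan_clf are chosen for illustration only; they are not part of the original report.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

# Toy data (illustrative), chosen so that mean and median differ per class.
X = np.array(
    [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float
)
y = np.array([1, 1, 1, 2, 2, 2])

# metric="euclidean": each centroid is the per-feature mean of its class.
euclidean_clf = NearestCentroid(metric="euclidean").fit(X, y)

# metric="manhattan": each centroid is the per-feature median of its class.
manhattan_clf = NearestCentroid(metric="manhattan").fit(X, y)

for label, mean_c, median_c in zip(
    euclidean_clf.classes_, euclidean_clf.centroids_, manhattan_clf.centroids_
):
    print(label, "mean centroid:", mean_c, "median centroid:", median_c)
```

For class 1 the mean centroid is roughly [-2, -1.33] while the median centroid is [-2, -1], so the choice of metric changes the fitted model and not only the distance used at prediction time.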

But what if I want centroids computed as the mean of the features, while using the cosine (or another) metric to compute the pairwise distances?
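
One possible workaround, sketched here under the assumption that only mean centroids and a nearest-centroid lookup are needed, is to compute the centroids by hand and call pairwise_distances_argmin directly, as predict() does. The helpers fit_mean_centroids and predict_nearest are hypothetical names introduced for this sketch; they are not scikit-learn APIs.

```python
import numpy as np
from sklearn.metrics import pairwise_distances_argmin


def fit_mean_centroids(X, y):
    """Hypothetical helper: return (classes, centroids), one mean per class."""
    classes = np.unique(y)
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_nearest(X_new, classes, centroids, metric="cosine"):
    """Hypothetical helper: label each row with its nearest centroid's class."""
    nearest = pairwise_distances_argmin(X_new, centroids, metric=metric)
    return classes[nearest]


X = np.array(
    [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float
)
y = np.array([1, 1, 1, 2, 2, 2])

classes, centroids = fit_mean_centroids(X, y)
print(predict_nearest(np.array([[-0.8, -1.0]]), classes, centroids))
```

The query point [-0.8, -1] points in the same direction as the class 1 centroid, so it is assigned to class 1. This sketch bypasses NearestCentroid entirely, so features such as shrink_threshold are not available.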

Steps/Code to Reproduce

from sklearn.neighbors import NearestCentroid
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = NearestCentroid(metric='cosine')
clf.fit(X, y)
print(clf.predict([[-0.8, -1]]))

Expected Results

No warning is raised.

Actual Results

../sklearn/neighbors/_nearest_centroid.py:150: FutureWarning: Support for distance metrics other than euclidean and manhattan and for callables was deprecated in version 1.3 and will be removed in version 1.5.
  warnings.warn(
../sklearn/neighbors/_nearest_centroid.py:201: UserWarning: Averaging for metrics other than euclidean and manhattan not supported. The average is set to be the mean.

Versions

System:
    python: 3.8.3 (default, Jun 16 2021, 12:00:44)  [Clang 12.0.5 (clang-1205.0.22.9)]
executable: /Users/yohanntvmbp/.pyenv/versions/tvmealvisionapi/bin/python
   machine: macOS-14.0-x86_64-i386-64bit
Python dependencies:
      sklearn: 1.3.1
          pip: 22.3.1
   setuptools: 62.1.0
        numpy: 1.21.5
        scipy: 1.7.3
       Cython: None
       pandas: 1.3.5
   matplotlib: 3.6.2
       joblib: 1.3.2
threadpoolctl: 3.1.0
Built with OpenMP: True
