`pairwise_distances` is inconsistent with `scipy.spatial.distance` when using `metric="matching"` · Issue #25532 · scikit-learn/scikit-learn
Closed
@magnusbarata

Describe the bug

Although the matching metric has already been removed from the documentation, pairwise_distances still accepts it. When it is used, the input arrays are converted to boolean. This is inconsistent with the counterpart functions cdist and pdist from scipy.spatial.distance (note that scipy.spatial.distance.matching has been completely removed since v1.10.0). In scipy's cdist and pdist, matching is treated as a synonym for hamming, which allows non-boolean inputs.

To address this issue, I can propose two solutions:

  1. Disallow matching as a metric. This fix would remove matching from the metrics allowed in pairwise.py and sklearn.neighbors._base.py.
  2. Allow non-boolean inputs when using matching as a metric. This fix would keep it consistent with scipy's implementation.

Once the solution is decided, I can make a PR for it.
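
Until one of these is chosen, a possible workaround is to request hamming directly, since that is the metric scipy maps matching to. The snippet below is a minimal sketch, not a confirmed fix: it assumes pairwise_distances does not apply the boolean conversion for metric="hamming", which is my understanding of the behavior rather than something verified in this report.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import pairwise_distances

x = np.array([[1, 0, -1, 1, 0, -1]])
y = np.array([[0, -1, 1, 1, 0, -1]])

# Assumption: "hamming" is not handled as a boolean metric by scikit-learn,
# so the values are compared as numbers instead of being cast to True/False.
sk_result = pairwise_distances(x, y, metric="hamming")
sp_result = cdist(x, y, metric="hamming")

print("pairwise_distances: ", sk_result)
print("scipy cdist: ", sp_result)
```

If the assumption holds, both calls report the same fraction of differing positions and no DataConversionWarning is raised.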

Steps/Code to Reproduce

import numpy as np
from sklearn.metrics import pairwise_distances
from scipy.spatial.distance import cdist

x = np.array([[1, 0, -1, 1, 0, -1]])
y = np.array([[0, -1, 1, 1, 0, -1]])
print('pairwise_distances: ', pairwise_distances(x, y, metric='matching'))
print('scipy cdist: ', cdist(x, y, metric='matching'))

Expected Results

pairwise_distances:  [[0.5]]
scipy cdist:  [[0.5]]

Actual Results

/usr/local/lib/python3.11/site-packages/sklearn/metrics/pairwise.py:2025: DataConversionWarning: Data was converted to boolean for metric matching
  warnings.warn(msg, DataConversionWarning)
pairwise_distances:  [[0.33333333]]
scipy cdist:  [[0.5]]
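
The two numbers can be reproduced by hand with plain NumPy, which shows where the discrepancy comes from. This is only an illustration of the arithmetic, not the code path either library actually runs; the variable names are made up for this example.

```python
import numpy as np

x = np.array([1, 0, -1, 1, 0, -1])
y = np.array([0, -1, 1, 1, 0, -1])

# Hamming-style distance on the raw values: fraction of positions that differ.
# Positions 0, 1 and 2 differ, so this is 3 / 6.
raw_mismatch = np.mean(x != y)

# Same fraction after casting to boolean. Both -1 and 1 become True, so
# position 2 (-1 vs 1) no longer counts as a mismatch, leaving 2 / 6.
bool_mismatch = np.mean(x.astype(bool) != y.astype(bool))

print(raw_mismatch)   # 0.5
print(bool_mismatch)  # 0.3333333333333333
```

The boolean cast collapses -1 and 1 into the same value, which is why the converted input reports one mismatch fewer than the raw input.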

Versions

System:
    python: 3.11.1 (main, Jan 23 2023, 21:39:49) [GCC 10.2.1 20210110]
executable: /usr/local/bin/python3
   machine: Linux-5.15.49-linuxkit-aarch64-with-glibc2.31

Python dependencies:
      sklearn: 1.2.1
          pip: 22.3.1
   setuptools: 65.5.1
        numpy: 1.24.1
        scipy: 1.10.0
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /usr/local/lib/python3.11/site-packages/scikit_learn.libs/libgomp-d22c30c5.so.1.0.0
        version: None
    num_threads: 5

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /usr/local/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-cecebdce.3.21.so
        version: 0.3.21
threading_layer: pthreads
   architecture: armv8
    num_threads: 5

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /usr/local/lib/python3.11/site-packages/scipy.libs/libopenblasp-r0-dff490c2.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: armv8
    num_threads: 5
