Using `NearestNeighbors` with `p < 1` and floats raises an error #26548

catanzaromj · 2023-06-08T17:10:23Z

Describe the bug

Using NearestNeighbors with p < 1 raises an error if the array X contains floats. It does not seem to raise errors if X consists of integers.

This was originally discussed in #26536

For example, this is fine:

from sklearn.neighbors import NearestNeighbors
import numpy as np
X = np.array([[1,0], [0,0], [0,1]])
neigh = NearestNeighbors(algorithm='brute',metric_params={'p':0.5})
neigh.fit(X)
neigh.radius_neighbors(X[0].reshape(1,-1), radius=4, return_distance=False)

I would expect this behavior whether X consists of floats or integers.

Steps/Code to Reproduce

from sklearn.neighbors import NearestNeighbors
import numpy as np
X = np.array([[1.0,0.0], [0.0,0.0], [0.0,1.0]])
neigh = NearestNeighbors(algorithm='brute',metric_params={'p':0.5})
neigh.fit(X)
neigh.radius_neighbors(X[0].reshape(1,-1), radius=4, return_distance=False)

Expected Results

array([array([0, 1, 2])], dtype=object)
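As a hand check of this expected output, the p = 0.5 Minkowski distances from X[0] to each row can be computed directly by exhaustive search. This is a minimal NumPy sketch, not scikit-learn's implementation. `minkowski` and `radius_neighbors_brute` are hypothetical helper names. The `<=` comparison follows the scikit-learn documentation, which states that points lying on the radius boundary are included.

```python
import numpy as np


def minkowski(u, v, p):
    # (sum_i |u_i - v_i|**p) ** (1/p); for 0 < p < 1 this is not a true metric,
    # but it can still be evaluated pair by pair.
    return np.sum(np.abs(u - v) ** p) ** (1.0 / p)


def radius_neighbors_brute(X, query, radius, p):
    # Exhaustive search: compare the query with every row and keep the indices
    # whose distance is within the radius (boundary points included).
    dists = np.array([minkowski(query, row, p) for row in X])
    return np.flatnonzero(dists <= radius), dists


X = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
idx, dists = radius_neighbors_brute(X, X[0], radius=4, p=0.5)
print(dists)  # [0. 1. 4.]
print(idx)    # [0 1 2]
```

The third point sits exactly on the boundary: (1**0.5 + 1**0.5) ** 2 = 4, so all three indices are returned.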

Actual Results

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[40], line 6
      4 neigh = NearestNeighbors(algorithm='brute',metric_params={'p':0.5})
      5 neigh.fit(X)
----> 6 neigh.radius_neighbors(X[0].reshape(1,-1), radius=4, return_distance=False)

File ~/miniconda3/envs/dimmer_env/lib/python3.9/site-packages/sklearn/neighbors/_base.py:1161, in RadiusNeighborsMixin.radius_neighbors(self, X, radius, return_distance, sort_results)
   1153 use_pairwise_distances_reductions = (
   1154     self._fit_method == "brute"
   1155     and RadiusNeighbors.is_usable_for(
   1156         X if X is not None else self._fit_X, self._fit_X, self.effective_metric_
   1157     )
   1158 )
   1160 if use_pairwise_distances_reductions:
-> 1161     results = RadiusNeighbors.compute(
   1162         X=X,
   1163         Y=self._fit_X,
   1164         radius=radius,
   1165         metric=self.effective_metric_,
   1166         metric_kwargs=self.effective_metric_params_,
   1167         strategy="auto",
   1168         return_distance=return_distance,
   1169         sort_results=sort_results,
   1170     )
   1172 elif (
   1173     self._fit_method == "brute" and self.metric == "precomputed" and issparse(X)
   1174 ):
   1175     results = _radius_neighbors_from_graph(
   1176         X, radius=radius, return_distance=return_distance
   1177     )

File ~/miniconda3/envs/dimmer_env/lib/python3.9/site-packages/sklearn/metrics/_pairwise_distances_reduction/_dispatcher.py:421, in RadiusNeighbors.compute(cls, X, Y, radius, metric, chunk_size, metric_kwargs, strategy, return_distance, sort_results)
    335 """Return the results of the reduction for the given arguments.
    336 
    337 Parameters
   (...)
    418 returns.
    419 """
    420 if X.dtype == Y.dtype == np.float64:
--> 421     return RadiusNeighbors64.compute(
    422         X=X,
    423         Y=Y,
    424         radius=radius,
    425         metric=metric,
    426         chunk_size=chunk_size,
    427         metric_kwargs=metric_kwargs,
    428         strategy=strategy,
    429         sort_results=sort_results,
    430         return_distance=return_distance,
    431     )
    433 if X.dtype == Y.dtype == np.float32:
    434     return RadiusNeighbors32.compute(
    435         X=X,
    436         Y=Y,
   (...)
    443         return_distance=return_distance,
    444     )

File sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.pyx:110, in sklearn.metrics._pairwise_distances_reduction._radius_neighbors.RadiusNeighbors64.compute()

File sklearn/metrics/_pairwise_distances_reduction/_datasets_pair.pyx:87, in sklearn.metrics._pairwise_distances_reduction._datasets_pair.DatasetsPair64.get_for()

File sklearn/metrics/_dist_metrics.pyx:285, in sklearn.metrics._dist_metrics.DistanceMetric.get_metric()

File sklearn/metrics/_dist_metrics.pyx:1252, in sklearn.metrics._dist_metrics.MinkowskiDistance.__init__()

ValueError: p must be greater than 1
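The error is raised while the Minkowski distance object is being constructed, before any distance is computed. The sketch below contrasts an assumed reconstruction of that check (inferred from the message and from p = 1 being accepted) with the relaxed rule suggested later in the thread: allow 0 < p < 1, but only for exhaustive search. Both functions and their names are hypothetical illustrations, not scikit-learn code.

```python
def check_p_strict(p):
    # Assumed reconstruction of the behaviour in the traceback:
    # any p below 1 is rejected.
    if p < 1:
        raise ValueError("p must be greater than 1")
    return p


def check_p_relaxed(p, algorithm):
    # Hypothetical relaxed rule: any positive p is accepted for brute force,
    # while other algorithms keep the p >= 1 requirement because tree-based
    # searches prune candidates using the triangle inequality.
    if p <= 0:
        raise ValueError("p must be greater than 0")
    if p < 1 and algorithm != "brute":
        raise ValueError("0 < p < 1 is only supported with algorithm='brute'")
    return p


try:
    check_p_strict(0.5)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

print(strict_error)                   # p must be greater than 1
print(check_p_relaxed(0.5, "brute"))  # 0.5
```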

Versions

System:
    python: 3.9.16 (main, Mar  8 2023, 14:00:05)  [GCC 11.2.0]
executable: /home/XYX/miniconda3/envs/dimmer_env/bin/python
   machine: Linux-5.15.0-69-generic-x86_64-with-glibc2.31

Python dependencies:
      sklearn: 1.2.2
          pip: 23.0.1
   setuptools: 67.8.0
        numpy: 1.24.3
        scipy: 1.10.1
       Cython: None
       pandas: 2.0.1
   matplotlib: 3.7.1
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/XYX/miniconda3/envs/dimmer_env/lib/python3.9/site-packages/numpy.libs/libopenblas64_p-r0-15028c96.3.21.so
        version: 0.3.21
threading_layer: pthreads
   architecture: Zen
    num_threads: 48

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/XYX/miniconda3/envs/dimmer_env/lib/python3.9/site-packages/scipy.libs/libopenblasp-r0-41284840.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: Zen
    num_threads: 48

       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /home/XYX/miniconda3/envs/dimmer_env/lib/python3.9/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
    num_threads: 48


Shreesha3112 · 2023-06-14T13:28:26Z

This issue arises because different code paths compute the distances depending on whether X has an integer or a floating-point dtype.
It looks like when X is of integer type, pairwise_distances_chunked is used for the computation, whereas when X is np.float32 or np.float64, pairwise_distances_reductions is used, which gets the MinkowskiDistance64 metric from the Cython module. The MinkowskiDistance64 class raises a ValueError when it is constructed with p < 1.
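The dispatch described above can be sketched as a dtype check. It mirrors the `X.dtype == Y.dtype == np.float64` test visible in the traceback, but `pick_backend` and the returned labels are hypothetical names used only for illustration; the real routing also depends on the metric and on sparsity.

```python
import numpy as np


def pick_backend(X, Y):
    # Hypothetical simplification: the Cython reductions only handle matching
    # float32 or float64 inputs; anything else (e.g. int64) takes the generic path.
    if X.dtype == Y.dtype and X.dtype in (np.float32, np.float64):
        return "pairwise_distances_reductions"
    return "pairwise_distances_chunked"


X_int = np.array([[1, 0], [0, 0], [0, 1]])
X_float = X_int.astype(np.float64)
print(pick_backend(X_int, X_int))      # pairwise_distances_chunked
print(pick_backend(X_float, X_float))  # pairwise_distances_reductions
```

This is why the same data succeeds as integers and fails as floats: only the float path constructs the Cython Minkowski object that rejects p < 1.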

ogrisel · 2023-06-16T15:15:11Z

I think we should allow p < 1 for all ways of computing the Minkowski-based neighbors, as long as we use an exhaustive search method (such as algorithm="brute") that does not rely on the triangle inequality.
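Using the three points from the report, a short check shows why p < 1 breaks the triangle inequality that tree-based searches rely on for pruning, while an exhaustive search, which simply evaluates every pair, is unaffected. This is a plain NumPy sketch; `minkowski` is a hypothetical helper.

```python
import numpy as np


def minkowski(u, v, p):
    # (sum_i |u_i - v_i|**p) ** (1/p)
    return np.sum(np.abs(u - v) ** p) ** (1.0 / p)


a = np.array([1.0, 0.0])
b = np.array([0.0, 0.0])
c = np.array([0.0, 1.0])

holds = {}
for p in (0.5, 1.0, 2.0):
    direct = minkowski(a, c, p)
    via_b = minkowski(a, b, p) + minkowski(b, c, p)
    # A true metric needs direct <= via_b; with p = 0.5 this gives 4.0 > 2.0.
    holds[p] = bool(direct <= via_b)
    print(p, direct, via_b, holds[p])
```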

Feel free to open a PR.

ogrisel · 2023-06-16T15:16:31Z

Out of curiosity, what is your use case for this combination of hyper-parameters?

catanzaromj · 2023-06-16T22:08:33Z

@ogrisel I was trying to explore what neighbors look like for high-dimensional datasets, and modifying the metric for that computation. I noticed that p<1 was implemented recently in #24750 (understanding that these aren't actually metrics), which is why I chose algorithm="brute".

I'm happy to open or contribute to a PR on this issue.

Shreesha3112 · 2023-06-22T06:09:27Z

@catanzaromj Are you working on this? If not, I will open PR.

catanzaromj · 2023-06-30T14:21:56Z

@Shreesha3112 I am not, but I just realized it may be getting fixed in #26568

Shreesha3112 · 2023-07-04T07:47:17Z

@catanzaromj I think #26568 deals specifically with parameter validation for NearestNeighbors. This issue still requires a PR from what I can see.

ChandraPrakash-Bathula · 2023-07-06T18:24:31Z

p should be at least 1 for the Minkowski distance metric: p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance. When I used p = 1.2, the result matched your expectation.
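The special cases mentioned above can be checked numerically: the Minkowski distance with p = 1 matches the Manhattan (L1) distance and with p = 2 matches the Euclidean (L2) distance, and a value such as p = 1.2 falls between them. A NumPy-only sketch with a hypothetical `minkowski` helper:

```python
import numpy as np


def minkowski(u, v, p):
    # (sum_i |u_i - v_i|**p) ** (1/p)
    return np.sum(np.abs(u - v) ** p) ** (1.0 / p)


u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])

manhattan = np.sum(np.abs(u - v))  # L1 distance
euclidean = np.linalg.norm(u - v)  # L2 distance
print(np.isclose(minkowski(u, v, 1), manhattan))  # True
print(np.isclose(minkowski(u, v, 2), euclidean))  # True

# For p >= 1 the distance shrinks as p grows, so p = 1.2 lies in between.
between = euclidean <= minkowski(u, v, 1.2) <= manhattan
print(between)  # True
```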

catanzaromj added Bug Needs Triage Issue requires triage labels Jun 8, 2023

ogrisel removed the Needs Triage Issue requires triage label Jun 16, 2023

glemaitre mentioned this issue Jun 29, 2023

MAINT Parameter validation for sklearn.neighbors.neighbors_graph #26568

Merged

Shreesha3112 mentioned this issue Jul 4, 2023

ENH Allow 0<p<1 for Minkowski metric #26760

Merged

jeremiedbb closed this as completed in #26760 Jul 26, 2023
