8000 Using `NearestNeighbors` with `p < 1` and floats raises an error · Issue #26548 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Using NearestNeighbors with p < 1 and floats raises an error #26548

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
catanzaromj opened this issue Jun 8, 2023 · 8 comments · Fixed by #26760
Closed

Using NearestNeighbors with p < 1 and floats raises an error #26548

catanzaromj opened this issue Jun 8, 2023 · 8 comments · Fixed by #26760
Labels

Comments

@catanzaromj
Copy link
catanzaromj commented Jun 8, 2023

Describe the bug

Using NearestNeighbors with p < 1 raises an error if the array X contains floats. It does not seem to raise errors if X consists of integers.

This was originally discussed in #26536

For example, this is fine:

from sklearn.neighbors import NearestNeighbors
import numpy as np
X = np.array([[1,0], [0,0], [0,1]])
neigh = NearestNeighbors(algorithm='brute',metric_params={'p':0.5})
neigh.fit(X)
neigh.radius_neighbors(X[0].reshape(1,-1), radius=4, return_distance=False)

I would expect this behavior whether X consists of floats or integers.

Steps/Code to Reproduce

from sklearn.neighbors import NearestNeighbors
import numpy as np
X = np.array([[1.0,0.0], [0.0,0.0], [0.0,1.0]])
neigh = NearestNeighbors(algorithm='brute',metric_params={'p':0.5})
neigh.fit(X)
neigh.radius_neighbors(X[0].reshape(1,-1), radius=4, return_distance=False)

Expected Results

array([array([0, 1, 2])], dtype=object)

Actual Results

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[40], line 6
      4 neigh = NearestNeighbors(algorithm='brute',metric_params={'p':0.5})
      5 neigh.fit(X)
----> 6 neigh.radius_neighbors(X[0].reshape(1,-1), radius=4, return_distance=False)

File ~/miniconda3/envs/dimmer_env/lib/python3.9/site-packages/sklearn/neighbors/_base.py:1161, in RadiusNeighborsMixin.radius_neighbors(self, X, radius, return_distance, sort_results)
   1153 use_pairwise_distances_reductions = (
   1154     self._fit_method == "brute"
   1155     and RadiusNeighbors.is_usable_for(
   1156         X if X is not None else self._fit_X, self._fit_X, self.effective_metric_
   1157     )
   1158 )
   1160 if use_pairwise_distances_reductions:
-> 1161     results = RadiusNeighbors.compute(
   1162         X=X,
   1163         Y=self._fit_X,
   1164         radius=radius,
   1165         metric=self.effective_metric_,
   1166         metric_kwargs=self.effective_metric_params_,
   1167         strategy="auto",
   1168         return_distance=return_distance,
   1169         sort_results=sort_results,
   1170     )
   1172 elif (
   1173     self._fit_method == "brute" and self.metric == "precomputed" and issparse(X)
   1174 ):
   1175     results = _radius_neighbors_from_graph(
   1176         X, radius=radius, return_distance=return_distance
   1177     )

File ~/miniconda3/envs/dimmer_env/lib/python3.9/site-packages/sklearn/metrics/_pairwise_distances_reduction/_dispatcher.py:421, in RadiusNeighbors.compute(cls, X, Y, radius, metric, chunk_size, metric_kwargs, strategy, return_distance, sort_results)
    335 """Return the results of the reduction for the given arguments.
    336 
    337 Parameters
   (...)
    418 returns.
    419 """
    420 if X.dtype == Y.dtype == np.float64:
--> 421     return RadiusNeighbors64.compute(
    422         X=X,
    423         Y=Y,
    424         radius=radius,
    425         metric=metric,
    426         chunk_size=chunk_size,
    427         metric_kwargs=metric_kwargs,
    428         strategy=strategy,
    429         sort_results=sort_results,
    430         return_distance=return_distance,
    431     )
    433 if X.dtype == Y.dtype == np.float32:
    434     return RadiusNeighbors32.compute(
    435         X=X,
    436         Y=Y,
   (...)
    443         return_distance=return_distance,
    444     )

File sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.pyx:110, in sklearn.metrics._pairwise_distances_reduction._radius_neighbors.RadiusNeighbors64.compute()

File sklearn/metrics/_pairwise_distances_reduction/_datasets_pair.pyx:87, in sklearn.metrics._pairwise_distances_reduction._datasets_pair.DatasetsPair64.get_for()

File sklearn/metrics/_dist_metrics.pyx:285, in sklearn.metrics._dist_metrics.DistanceMetric.get_metric()

File sklearn/metrics/_dist_metrics.pyx:1252, in sklearn.metrics._dist_metrics.MinkowskiDistance.__init__()

ValueError: p must be greater than 1

Versions

System:
    python: 3.9.16 (main, Mar  8 2023, 14:00:05)  [GCC 11.2.0]
executable: /home/XYX/miniconda3/envs/dimmer_env/bin/python
   machine: Linux-5.15.0-69-generic-x86_64-with-glibc2.31

Python dependencies:
      sklearn: 1.2.2
          pip: 23.0.1
   setuptools: 67.8.0
        numpy: 1.24.3
        scipy: 1.10.1
       Cython: None
       pandas: 2.0.1
   matplotlib: 3.7.1
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/XYX/miniconda3/envs/dimmer_env/lib/python3.9/site-packages/numpy.libs/libopenblas64_p-r0-15028c96.3.21.so
        version: 0.3.21
threading_layer: pthreads
   architecture: Zen
    num_threads: 48

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/XYX/miniconda3/envs/dimmer_env/lib/python3.9/site-packages/scipy.libs/libopenblasp-r0-41284840.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: Zen
    num_threads: 48

       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /home/XYX/miniconda3/envs/dimmer_env/lib/python3.9/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
    num_threads: 48
@catanzaromj catanzaromj added Bug Needs Triage Issue requires triage labels Jun 8, 2023
@Shreesha3112
Copy link
Contributor

This issue arises due to the different underlying computations when 'X' is of integer or float type.
It looks like, when X is integer type pairwise_distances_chunked is used for computation and when X is[np.float32, np.float64]pairwise_distances_reductions are used which gets the MinkowskiDistance64 metric from cython module. MinkowskiDistance64 class raises value error when P < 1 when it is constructed.

@ogrisel
Copy link
Member
ogrisel commented Jun 16, 2023

I think should allow p < 1 for all ways to compute the Minkowski-based neighbors as long as we use an exhaustive search method (such as algorithm="brute") that does not rely on the triangular equality.

Feel free to open a PR.

@ogrisel ogrisel removed the Needs Triage Issue requires triage label Jun 16, 2023
@ogrisel
Copy link
Member
ogrisel commented Jun 16, 2023

Out of curiosity, what is your use case for this combination of hyper-parameters?

@catanzaromj
Copy link
Author
catanzaromj commented Jun 16, 2023

@ogrisel I was trying to explore what neighbors look like for high-dimensional datasets, and modifying the metric for that computation. I noticed that p<1 was implemented recently in #24750 (understanding that these aren't actually metrics), which is why I chose algorithm="brute".

I'm happy to open or contribute to a PR on this issue.

@Shreesha3112
Copy link
Contributor

@catanzaromj Are you working on this? If not, I will open PR.

@catanzaromj
Copy link
Author

@Shreesha3112 I am not but I just realized it may be getting fixed on #26568

@Shreesha3112
Copy link
Contributor

@catanzaromj I think #26568 deals specifically with parameter validation for NearestNeighbors. This issue still requires a PR from what I can see.

@ChandraPrakash-Bathula
Copy link

P should be greater than for Minkowski distance metric and also you can use value p = 1 for Manhattan distance and p = 2 for Euclidean Distance and here is the output when I've used p = 1.2 and the result is as per your expectation.

Github

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants
0