`ValueError: Input contains NaN.` in `sklearn.manifold.smacof` · Issue #26999 · scikit-learn/scikit-learn · GitHub

ValueError: Input contains NaN. in sklearn.manifold.smacof #26999


Closed
amadanmath opened this issue Aug 3, 2023 · 4 comments · Fixed by #30514
Labels: Bug, Needs Investigation

Comments

amadanmath commented Aug 3, 2023

Describe the bug

I accidentally stumbled onto a ValueError when executing smacof. I hacked into _mds.py to save both the offending dissimilarities and the randomly generated X, then cut them down to the minimal shape that still exhibits the error. This data is included below in the MCVE.

Steps/Code to Reproduce

import numpy as np
import sklearn.manifold

dis = np.array([
    [0.0, 1.732050807568877, 1.7320508075688772],
    [1.732050807568877, 0.0, 6.661338147750939e-16],
    [1.7320508075688772, 6.661338147750939e-16, 0.0]
])
init = np.array([
    [0.08665881585055124, 0.7939114643387546],
    [0.9959834154297658, 0.7555546025640025],
    [0.8766008278401566, 0.4227358815811242]
])
sklearn.manifold.smacof(dis, init=init, normalized_stress="auto", metric=False, n_init=1)
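The dissimilarity matrix above contains two entries that differ only in the last printed digit. As a rough diagnostic, the sketch below lists such near-ties among the off-diagonal values. The helper name `near_ties` and the tolerance `1e-15` are assumptions made for illustration; they are not part of scikit-learn, and the sketch does not establish that near-ties are what triggers the error.

```python
import numpy as np


def near_ties(dissimilarities, tol=1e-15):
    """Hypothetical helper: pairs of distinct upper-triangle values closer than `tol`."""
    d = np.asarray(dissimilarities, dtype=float)
    # Off-diagonal values from the upper triangle (the matrix is symmetric).
    upper = d[np.triu_indices_from(d, k=1)]
    # np.unique sorts the values and drops exact duplicates.
    values = np.unique(upper)
    gaps = np.diff(values)
    return [
        (float(a), float(b))
        for a, b, gap in zip(values[:-1], values[1:], gaps)
        if gap < tol
    ]


dis = np.array([
    [0.0, 1.732050807568877, 1.7320508075688772],
    [1.732050807568877, 0.0, 6.661338147750939e-16],
    [1.7320508075688772, 6.661338147750939e-16, 0.0],
])

ties = near_ties(dis)
for low, high in ties:
    print(f"near-tie: {low!r} vs {high!r} (gap {high - low:.3e})")
```

Running a check like this on the full dissimilarity matrix before calling smacof may help tell whether a given dataset resembles the minimal example.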

Expected Results

No errors

Actual Results

Traceback (most recent call last):
  File ".../rep_error.py", line 14, in <module>
    sklearn.manifold.smacof(dis, init=init, normalized_stress="auto", metric=False, n_init=1)
  File ".../.direnv/python-3.9.5/lib/python3.9/site-packages/sklearn/manifold/_mds.py", line 329, in smacof
    pos, stress, n_iter_ = _smacof_single(
  File ".../.direnv/python-3.9.5/lib/python3.9/site-packages/sklearn/manifold/_mds.py", line 128, in _smacof_single
    dis = euclidean_distances(X)
  File ".../.direnv/python-3.9.5/lib/python3.9/site-packages/sklearn/metrics/pairwise.py", line 310, in euclidean_distances
    X, Y = check_pairwise_arrays(X, Y)
  File ".../.direnv/python-3.9.5/lib/python3.9/site-packages/sklearn/metrics/pairwise.py", line 156, in check_pairwise_arrays
    X = Y = check_array(
  File ".../.direnv/python-3.9.5/lib/python3.9/site-packages/sklearn/utils/validation.py", line 959, in check_array
    _assert_all_finite(
  File ".../.direnv/python-3.9.5/lib/python3.9/site-packages/sklearn/utils/validation.py", line 124, in _assert_all_finite
    _assert_all_finite_element_wise(
  File ".../.direnv/python-3.9.5/lib/python3.9/site-packages/sklearn/utils/validation.py", line 173, in _assert_all_finite_element_wise
    raise ValueError(msg_err)
ValueError: Input contains NaN.

Versions

System:
    python: 3.9.5 (default, Nov 23 2021, 15:27:38)  [GCC 9.3.0]
executable: .../.direnv/python-3.9.5/bin/python
   machine: Linux-5.4.0-148-generic-x86_64-with-glibc2.31

Python dependencies:
      sklearn: 1.3.0
          pip: 23.2.1
   setuptools: 44.0.0
        numpy: 1.25.1
        scipy: 1.11.1
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.3.1
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 32
         prefix: libgomp
       filepath: .../.direnv/python-3.9.5/lib/python3.9/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 32
         prefix: libopenblas
       filepath: .../.direnv/python-3.9.5/lib/python3.9/site-packages/numpy.libs/libopenblas64_p-r0-7a851222.3.23.so
        version: 0.3.23
threading_layer: pthreads
   architecture: Zen

       user_api: blas
   internal_api: openblas
    num_threads: 32
         prefix: libopenblas
       filepath: .../.direnv/python-3.9.5/lib/python3.9/site-packages/scipy.libs/libopenblasp-r0-23e5df77.3.21.dev.so
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: Zen
amadanmath added the Bug and Needs Triage labels Aug 3, 2023
@glemaitre
Member

I debugged a bit to find the culprit. The issue is here:

ir = IsotonicRegression()
for it in range(max_iter):
    # Compute distance and monotonic regression
    dis = euclidean_distances(X)
    if metric:
        disparities = dissimilarities
    else:
        dis_flat = dis.ravel()
        # dissimilarities with 0 are considered as missing values
        dis_flat_w = dis_flat[sim_flat != 0]
        # Compute the disparities using a monotonic regression
        disparities_flat = ir.fit_transform(sim_flat_w, dis_flat_w)
        disparities = dis_flat.copy()
        disparities[sim_flat != 0] = disparities_flat
        disparities = disparities.reshape((n_samples, n_samples))
        disparities *= np.sqrt(
            (n_samples * (n_samples - 1) / 2) / (disparities**2).sum()
        )

IsotonicRegression.fit_transform outputs NaN values, most likely because of the default out_of_bounds policy. I don't know whether it makes sense to clip by default here. I would first need to understand whether it is normal for the given data to trigger this issue.
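For readers unfamiliar with the out_of_bounds policy mentioned above, here is a minimal sketch of how it behaves in isolation. The toy data is made up for illustration; the example does not reproduce the smacof failure and is not meant to describe the change made in #30514.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Toy training data (assumption, for illustration only).
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([1.0, 2.0, 3.0])

# 0.0 and 4.0 lie outside the training range [1.0, 3.0].
x_new = np.array([0.0, 2.5, 4.0])

# Default policy: out_of_bounds="nan", so out-of-range inputs map to NaN.
default_pred = IsotonicRegression().fit(x_train, y_train).predict(x_new)

# out_of_bounds="clip": out-of-range inputs take the value at the nearest bound.
clipped_pred = (
    IsotonicRegression(out_of_bounds="clip").fit(x_train, y_train).predict(x_new)
)

print("default:", default_pred)
print("clip:   ", clipped_pred)
```

A third option, out_of_bounds="raise", raises an error for out-of-range inputs instead of returning NaN.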

adrinjalali added the Needs Investigation label and removed the Needs Triage label Aug 3, 2023
Author
amadanmath commented Sep 6, 2023

If anyone runs into this, and just wants the error to go away, a bit of rounding worked for me (given that my values were near zero). In this case,

dis = dis.round(15)
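To see what this rounding does to the matrix from the report, the sketch below compares the two nearly identical entries before and after rounding. It only inspects the input; that making these entries exactly equal is the reason the error disappears is an assumption, not something confirmed in this thread.

```python
import numpy as np

dis = np.array([
    [0.0, 1.732050807568877, 1.7320508075688772],
    [1.732050807568877, 0.0, 6.661338147750939e-16],
    [1.7320508075688772, 6.661338147750939e-16, 0.0],
])

# Round every entry to 15 decimal places, as in the workaround.
rounded = dis.round(15)

# Compare the two large off-diagonal entries before and after rounding.
print("equal before rounding:", dis[0, 1] == dis[0, 2])
print("equal after rounding: ", rounded[0, 1] == rounded[0, 2])

# The rounded matrix should still be symmetric.
print("still symmetric:      ", np.array_equal(rounded, rounded.T))
```

The number of decimals is a judgment call: it has to be small enough to merge the near-ties but large enough not to distort dissimilarities that are meaningfully different.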

Contributor
dkobak commented Dec 20, 2024

Thanks for submitting this bug. I just made a PR that fixes it (together with some other things): #30514.

Contributor
dkobak commented Jan 7, 2025

@glemaitre Happy New Year! Gentle ping to look at my PR that should fix this and other related issues: #30514. It has not received any feedback yet.

github-project-automation bot moved this from Hard to Done in Maintenance Mar 31, 2025