pca.fit_transform returns error: array must not contain infs or NaNs · Issue #18138 · scikit-learn/scikit-learn · GitHub

Closed
lukemao opened this issue Aug 11, 2020 · 16 comments

Comments

@lukemao
lukemao commented Aug 11, 2020

When I call PCA's fit_transform method, I get the error "array must not contain infs or NaNs".

Actual code:
sklearn_pca = PCA(n_components=3)
input_vec = sklearn_pca.fit_transform(normalised_tfidf)

Traceback (most recent call last):

  File "C:\Users\User\workspace\caseStudy\main.py", line 135, in <module>
    input_vec = sklearn_pca.fit_transform(normalised_tfidf)

  File "C:\Users\User\anaconda3\lib\site-packages\sklearn\decomposition\_pca.py", line 369, in fit_transform
    U, S, V = self._fit(X)

  File "C:\Users\User\anaconda3\lib\site-packages\sklearn\decomposition\_pca.py", line 418, in _fit
    return self._fit_truncated(X, n_components, self._fit_svd_solver)

  File "C:\Users\User\anaconda3\lib\site-packages\sklearn\decomposition\_pca.py", line 532, in _fit_truncated
    U, S, V = randomized_svd(X, n_components=n_components,

  File "C:\Users\User\anaconda3\lib\site-packages\sklearn\utils\extmath.py", line 354, in randomized_svd
    Uhat, s, V = linalg.svd(B, full_matrices=False)

  File "C:\Users\User\anaconda3\lib\site-packages\scipy\linalg\decomp_svd.py", line 109, in svd
    a1 = _asarray_validated(a, check_finite=check_finite)

  File "C:\Users\User\anaconda3\lib\site-packages\scipy\_lib\_util.py", line 246, in _asarray_validated
    a = toarray(a)

  File "C:\Users\User\anaconda3\lib\site-packages\numpy\lib\function_base.py", line 498, in asarray_chkfinite
    raise ValueError(

ValueError: array must not contain infs or NaNs

Checked if there are infs and NaNs in the input array:
np.any(np.isnan(normalised_tfidf))
Out[2]: False

np.any(np.isinf(normalised_tfidf))
Out[3]: False
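
The traceback above shows the finiteness check failing inside `linalg.svd(B)`, on an intermediate matrix rather than on the input itself, so a clean input does not rule out non-finite values appearing during the computation. A minimal sketch of a slightly fuller input report follows, assuming a dense numeric array; the helper name `finiteness_report` is hypothetical and not part of scikit-learn or NumPy.

```python
import numpy as np


def finiteness_report(X):
    """Summarise dtype, NaN/inf counts, and largest finite magnitude of X.

    Assumes X is dense and numeric; a sparse matrix would need to be
    inspected through its stored values instead.
    """
    arr = np.asarray(X)
    finite = np.isfinite(arr)
    return {
        "dtype": str(arr.dtype),
        "n_nan": int(np.isnan(arr).sum()),
        "n_inf": int(np.isinf(arr).sum()),
        "all_finite": bool(finite.all()),
        # Very large finite values can overflow in later products,
        # especially with low-precision dtypes.
        "max_abs": float(np.abs(arr[finite]).max()) if finite.any() else float("nan"),
    }


if __name__ == "__main__":
    sample = np.array([[1.0, np.nan], [np.inf, -4.0]])
    print(finiteness_report(sample))
```

If the report shows a fully finite input with moderate magnitudes, the input data is less likely to be the cause, and the environment (library versions, platform) becomes the more plausible suspect.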

Versions:
Python: 3.8
Anaconda: 1.9.12
Sklearn: 0.23.1

@lukemao
Author
lukemao commented Aug 11, 2020

sklearn.show_versions()
C:\Users\User\anaconda3\lib\site-packages\setuptools\distutils_patch.py:25: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first.
warnings.warn(

System:
python: 3.8.3 (default, Jul 2 2020, 17:30:36) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\User\anaconda3\python.exe
machine: Windows-10-10.0.19041-SP0

Python dependencies:
pip: 20.2.1
setuptools: 49.2.0.post20200714
sklearn: 0.22.1
numpy: 1.18.5
scipy: 1.4.1
Cython: 0.29.14
pandas: 1.0.5
matplotlib: 3.2.2
joblib: 0.16.0

Built with OpenMP: True

@lukemao
Author
lukemao commented Aug 11, 2020

Line 134 - 135:
sklearn_pca = PCA(n_components = 3)
input_vec = sklearn_pca.fit_transform(normalised_tfidf)

@lukemao
Author
lukemao commented Aug 11, 2020

Verified no NaN and inf in the input:

np.any(np.isnan(normalised_tfidf))
Out[2]: False

np.any(np.isinf(normalised_tfidf))
Out[3]: False

@lukemao
Author
lukemao commented Aug 11, 2020

Working fine on:
System:
python: 3.7.6

Python dependencies:
sklearn: 0.22.1

@lukemao
Author
lukemao commented Aug 11, 2020

worked in this env:
sklearn.show_versions()

System:
python: 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\User\anaconda3\envs\py376\pythonw.exe
machine: Windows-10-10.0.19041-SP0

Python dependencies:
pip: 20.0.2
setuptools: 45.2.0.post20200210
sklearn: 0.22.1
numpy: 1.18.1
scipy: 1.4.1
Cython: 0.29.15
pandas: 1.0.1
matplotlib: 3.1.3
joblib: 0.14.1

Built with OpenMP: True

@thomasjpfan
Member

Could you provide the dataset that caused the issue so that we can reproduce it?
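
One way to answer a request like this is to dump the exact array passed to `fit_transform` to a file that can be attached to the issue. This is only a sketch under assumptions: the input is a dense NumPy array, and the helper name `save_for_report` and the file name are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_for_report(X, path):
    """Save X as a .npy file and confirm it reloads byte-for-byte.

    Returns the final path (np.save appends ".npy" when it is missing,
    so the suffix is normalised here first).
    """
    if not path.endswith(".npy"):
        path += ".npy"
    arr = np.ascontiguousarray(X)
    np.save(path, arr)
    reloaded = np.load(path)
    # Compare raw bytes so that NaN values, if any, also count as equal.
    if (reloaded.shape != arr.shape or reloaded.dtype != arr.dtype
            or reloaded.tobytes() != arr.tobytes()):
        raise IOError("round trip through %s changed the data" % path)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_for_report(np.arange(6.0).reshape(2, 3),
                                os.path.join(tmp, "pca_input"))
        print("saved to", saved)
```

Saving the array exactly as passed to PCA, rather than the raw source data, keeps any preprocessing effects in the shared file.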

@lukemao lukemao closed this as completed Aug 17, 2020
@lukemao
Author
lukemao commented Aug 17, 2020

Issue resolved. Can not reproduce anymore

@MichalRIcar

Issue resolved. Can not reproduce anymore

Hello,

Could you share the solution, please? With sklearn > 0.22.1 I have the same issue with many random datasets that never produced such errors before.

Oddly enough, if I add a simple retry, e.g.:
try:
    x_pca = pca.fit_transform(PCA_data)
except ValueError:
    x_pca = pca.fit_transform(PCA_data)

then the second run ALWAYS passes without error...

Since I have seen people discussing this issue on forums, I believe it is worth solving in a general way.
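
The retry described above can be written as a small helper that tries a different SVD solver on the second attempt instead of repeating the same call. This is a sketch of a workaround, not a confirmed fix for this issue. It assumes the failure is specific to the randomized solver path seen in the original traceback. The helper `fit_transform_with_fallback` and its `make_estimator` argument are hypothetical names; `svd_solver="randomized"` and `svd_solver="full"` are existing options of scikit-learn's `PCA`.

```python
def fit_transform_with_fallback(make_estimator, X, solvers=("randomized", "full")):
    """Try each solver in order; return (estimator, transformed, solver_used).

    make_estimator is a hypothetical callable that maps a solver name to an
    unfitted estimator exposing fit_transform.
    """
    if not solvers:
        raise ValueError("at least one solver name is required")
    last_error = None
    for solver in solvers:
        estimator = make_estimator(solver)
        try:
            return estimator, estimator.fit_transform(X), solver
        except ValueError as exc:
            # For example: "array must not contain infs or NaNs".
            last_error = exc
    # Every solver failed; surface the most recent error unchanged.
    raise last_error
```

With scikit-learn this could be called as `fit_transform_with_fallback(lambda s: PCA(n_components=3, svd_solver=s), PCA_data)`. Recording which solver succeeded may also help when reporting the problem to the maintainers.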

@RamziRahli

Can someone share the solution, please?

@MichalRIcar

@RamziRahli, check #19285.
In my case the numpy and scikit-learn teams solved it with a new numpy version (1.20.01).

@RamziRahli

@MichalRIcar I just did and it still doesn't work :/

@thomasjpfan
Member

@MichalRIcar @RamziRahli Could you provide the dataset that causes the error for you? This will help us debug this issue.

@LouisLin725

@lukemao Hi! I ran into the same problem while processing a large dataset. May I ask how you solved it? I have tried all the debugging methods in this issue, but none of them worked. Thank you!

@liuzuo-byte

Issue resolved. Can not reproduce anymore

How did you solve it?

@deepdarkfans

An unexplained bug: with the same python, numpy, and sklearn versions on Windows and WSL2 (Ubuntu), Windows works fine but WSL2 has this issue.
I tried upgrading numpy from 1.18.5 to 1.20.2, which did not help. Then I downgraded numpy to 1.16.5, and the issue was solved.

@MaAleem08
MaAleem08 commented Apr 3, 2023

can someone share the solution please ??????

Did you find a solution?
I am stuck with the same error while doing factor analysis.
