BUG: Invalid value in inplace log #17761


Closed
larsoner opened this issue Nov 12, 2020 · 23 comments · Fixed by #17763

Comments

@larsoner
Contributor

Reproducing code example:

I cannot reproduce on my system (yet).

Error message:

Our Travis Linux builds hit a RuntimeWarning they did not hit previously (with no relevant code changes on our end):

mne/filter.py:387: in _construct_fir_filter
    h = minimum_phase(h)
../../../virtualenv/python3.8.1/lib/python3.8/site-packages/scipy/signal/fir_filter_design.py:1246: in minimum_phase
    np.log(h_temp, out=h_temp)
E   RuntimeWarning: invalid value encountered in log

The error above is in a SciPy call, but it appears to happen anywhere we do np.log(x, out=x); here is another instance:

mne/preprocessing/nirs/_optical_density.py:43: in optical_density
    np.log(raw._data[ii], out=raw._data[ii])
E   RuntimeWarning: invalid value encountered in log

If it's not clear from looking at the past few days of NumPy merges what the culprit might be, I can try to get more information out of Travis about what's actually in the offending arrays.

NumPy/Python version information:

This is on numpy-1.20.0.dev0+0ffaaf8 / 0ffaaf8 from Thursday (today) on pypi.anaconda.org/scipy-wheels-nightly. Previous builds on numpy-1.20.0.dev0+5f071c6 / 5f071c6 from Monday didn't have this problem.

@mattip
Member
mattip commented Nov 12, 2020

Do you happen to know the dtype of the input? Is it always the same dtype? We did change many of the SIMD-enhanced ufuncs over the past couple of months, but if I recall correctly float_log is not one of them.

@larsoner
Contributor Author

It should be float64 in both cases.

@larsoner
Contributor Author

... and in the SciPy code it even protects against anything close to zero:

        h_temp = np.abs(fft(h, n_fft))
        # take 0.25*log(|H|**2) = 0.5*log(|H|)
        h_temp += 1e-7 * h_temp[h_temp > 0].min()  # don't let log blow up
        np.log(h_temp, out=h_temp)  # <-- the problematic call

@mattip
Member
mattip commented Nov 12, 2020

It may be that the error state is getting set somewhere else and only getting checked after the call to np.log.
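One way to test that hypothesis (a hypothetical debugging aid, not something suggested in the thread) is to promote the warning to an exception around just the suspect call, so the traceback points at the exact ufunc that sets the flag:

import numpy as np

x = np.abs(np.random.randn(16)) + 1e-7   # strictly positive stand-in, so log itself should be clean
with np.errstate(invalid='raise'):        # turn "invalid value" into FloatingPointError for this call only
    np.log(x, out=x)                      # if this raises, the flag comes from this very call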

@larsoner
Contributor Author

In the SciPy and MNE code there are x += and x /= operations, respectively, immediately preceding the np.log(x, out=x) call. Locally with some toy arrays I cannot replicate, but it could be that the SIMD extensions on my machine don't match those of Travis, which I guess could matter here.

@larsoner
Contributor Author

And actually it looks like this is causing failures in SciPy directly as well:

https://travis-ci.org/github/scipy/scipy/jobs/743059506#L19563-L19579

That is a build for a PR that did not change the minimum_phase function or anything related to it.

@mattip
Member
mattip commented Nov 12, 2020

Between 5f071c6 and 0ffaaf8 it seems that PR gh-16247 might be the most relevant. It touched sqrt, abs, recip, square.

@larsoner
Contributor Author

One option would be to open a SciPy PR that, in one of the NumPy-nightly Travis runs, reverts that PR to see if it fixes the minimum_phase test. Do you want to give it a shot? If not, I can try to do it.

@larsoner
Contributor Author

(This would at least tell us if it's the culprit)

@charris charris added this to the 1.20.0 release milestone Nov 12, 2020
@seberg
Member
seberg commented Nov 12, 2020

@mattip we clear the floating point flags before each ufunc call, so I do not think the flags can be set anywhere else (unless there is casting involved, which would require strange input like integers, if it is possible at all).

@mattip
Member
mattip commented Nov 12, 2020

Sure, if it is not too much bother, go ahead. I think we will need a compact reproducer anyway, so it would be best to reproduce locally and cut the test down to a minimum. If it is that PR, we need an AVX512 machine.

@larsoner
Contributor Author

I think we will need a compact reproducer anyway, so it would be best to reproduce locally and cut the test down to a minimum.

Agreed, that makes sense. But perhaps it would be worth reverting gh-16247, if it does indeed fix the problem, so that downstream libraries don't have to live with the bug. Then, once a compact reproducer exists, gh-16247 can be revived and fixed?

If it is that PR, we need an AVX512 machine.

FWIW my machine has AVX512 I think:

NumPy version 1.20.0.dev0+44d66b8
NumPy relaxed strides checking option: True
NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?

And I cannot reproduce. I compile with extra_compile_args = -march=native if it matters.

@larsoner
Contributor Author

... and I cannot reproduce when installing that pip version either.

@mattip
Member
mattip commented Nov 12, 2020

Can you see what NumPy CPU features: are reported for the failing builds? You might need to add

from numpy._pytesttester import _show_numpy_info
_show_numpy_info()

@mattip
Member
mattip commented Nov 12, 2020

(This needs to be improved for the release; it is part of PR gh-17737.)

@seberg
Member
seberg commented Nov 12, 2020

Sorry for the noise.

Could you try to save the h_temp array as an npz (maybe also before the guard line)? One silly thing I notice: h_temp += 1e-7 * h_temp[h_temp > 0].min()  # don't let log blow up might not work if you have subnormal numbers, since the minimum could underflow to zero when multiplied by 1e-7. So at least in theory that line could do nothing.

EDIT: Well, probably more noise; even if there were a 0 in there, the warning should be "divide by zero", not "invalid value".
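For what it's worth, the underflow scenario is easy to show with a toy array (hypothetical values, not the real h_temp): if the smallest positive entry is subnormal, the guard adds exactly zero and a true zero still reaches the log:

import numpy as np

tiny = np.nextafter(0.0, 1.0)                  # smallest positive subnormal double (~5e-324)
print(1e-7 * tiny)                             # 0.0 -- the product underflows
h_temp = np.array([0.0, tiny, 1.0])            # toy stand-in for the SciPy array
h_temp += 1e-7 * h_temp[h_temp > 0].min()      # adds 0.0, so the guard does nothing here
print(np.log(h_temp))                          # log(0) -> -inf with a "divide by zero" warning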

@seberg
Member
seberg commented Nov 12, 2020

One last bit of noise, but I think we should ping @r-devulap.

I am a bit dubious about this code:

divide_by_zero_mask = divide_by_zero_mask | (zero_mask & load_mask);
invalid_mask = invalid_mask | negx_mask;

since the negx_mask here does not have the & load_mask (I just can't see why it should differ from the earlier version). And x, at least in principle (e.g. if the array is short), can be uninitialized. So my best guess is that we change it to:

 divide_by_zero_mask = divide_by_zero_mask | (zero_mask & load_mask); 
 invalid_mask = invalid_mask | (negx_mask & load_mask); 

Of course I wasn't able to trigger that (e.g. with a short array or similar).

@larsoner
Contributor Author

Thank goodness I could reproduce on CircleCI, where I can run an interactive SSH session to poke around afterward. In theory others can, too. In any case, the output there is:

NumPy version 1.20.0.dev0+0ffaaf8
NumPy relaxed strides checking option: True
NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F* AVX512CD* AVX512_KNL? AVX512_KNM? AVX512_SKX* AVX512_CLX? AVX512_CNL? AVX512_ICL?

I then shortened one of our tests, added before = raw._data[ii].copy(), and got this in interactive debug:

mne/preprocessing/tests/test_optical_density.py:29: in test_optical_density
    raw = optical_density(raw)
mne/preprocessing/nirs/_optical_density.py:47: in optical_density
    np.log(raw._data[ii], out=raw._data[ii])
E   RuntimeWarning: invalid value encountered in log
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /home/circleci/project/mne/preprocessing/nirs/_optical_density.py(47)optical_density()
-> np.log(raw._data[ii], out=raw._data[ii])
(Pdb) p before
array([0.98274385, 0.97102981, 1.00724412, 0.89445025, 1.04150874,
       1.11441121, 0.85187457, 0.97669253, 0.95033699, 1.08958501,
       0.9713925 , 1.17180138, 0.97692903])
(Pdb) p raw._data[ii]
array([        nan,         nan, -4.93117598, -0.11154599, -3.2022553 ,
        0.10832621, -0.16031598,         nan,         nan,  0.0857969 ,
               nan,  0.15854221,         nan])

After some more whittling, even this does it:

$ python -c "import numpy as np; x = np.array([0.98274385, 0.97102981]); np.log(x, out=x)"
<string>:1: RuntimeWarning: invalid value encountered in log

If there is only one element, there is no error. If out=x is omitted, there is no error.

I then went a step further: sudo apt-get install libopenblas-dev, cloned numpy, pip install cython, then installed it with pip install --no-use-pep517 -ve ., and replicated the error. Then I ran git bisect, iterating over my pip install and python -c lines, and found 34d2672 to be the culprit. Hopefully this, together with knowing the conditions (>= 2 elements, in-place log), is enough to fix the problem. Or if not, enough to justify reverting 34d2672 until it can be figured out.

@larsoner
Contributor Author

... and I can confirm that a git revert 34d2672 on top of latest master makes things work again.

I am out of my depth on this, but in that commit message, this line in particular seems suspicious, due to the SIMD and in-place requirements for the failure:

- fix SIMD memory overlap check for aliasing(same ptr & stride)

@seberg
Member
seberg commented Nov 12, 2020

OK, I am 99% sure I found the issue. The glibc fallback loop reuses the previous values if they are close to 1 (where the fast algorithm misbehaves).

The values that end up invalid are exactly those: the m1 & m2 mask will use the already-updated values (basically running the whole thing a second time), because the new commit made the fast loops get used for in-place operations as well.

So what is needed here is a load_mask = load_mask & glibc_mask to ensure those values are not overwritten.

To be honest, there are more bugs here: glibc_mask must also be masked with load_mask, or it may read beyond the end of the array. And the invalid_mask I mentioned above may also be wrong...
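A rough NumPy-level sketch of that failure mode (my own illustration, not code from the PR): for inputs just below 1, the fallback effectively takes the log of the already-overwritten output, and the log of a small negative intermediate is invalid, which matches the before/after arrays from the debug session above:

import numpy as np

before = np.array([0.98274385, 1.00724412])    # values taken from the debug session above
once = np.log(before)                          # [-0.0174...,  0.00722...]
twice = np.log(once)                           # [nan, -4.9311...] -- "invalid value" for the negative entry
print(once, twice)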

@mattip
Member
mattip commented Nov 12, 2020

ping @seiko2plus

@seiko2plus
Member
seiko2plus commented Nov 12, 2020

@larsoner, my bad, this bug is indeed caused by #16247 allowing vectorized aliasing (same pointer and stride), which leads to what @seberg mentioned: the previous values are reused during the fallback to libc.

I will release a fix for it. As a quick workaround, you can disable AVX512 at runtime via the environment variable NPY_DISABLE_CPU_FEATURES:

export NPY_DISABLE_CPU_FEATURES="AVX512F AVX512_SKX"
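If that route is taken, a quick way to confirm the features were actually disabled (assuming the 1.20 dev build exposes the __cpu_features__ dict) is:

# Run in the same shell where NPY_DISABLE_CPU_FEATURES was exported.
from numpy.core._multiarray_umath import __cpu_features__
print(__cpu_features__['AVX512F'], __cpu_features__['AVX512_SKX'])   # expect: False False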

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Nov 12, 2020
@charris charris removed this from the 1.20.0 release milestone Nov 12, 2020
@charris charris added this to the 1.19.5 release milestone Nov 12, 2020
@r-devulap
Member

@larsoner thanks for narrowing down the commit id!
