[MRG] test against numpy dev by ogrisel · Pull Request #5609 · scipy/scipy · GitHub

[MRG] test against numpy dev #5609


Merged
merged 2 commits into from Dec 29, 2015

Conversation

ogrisel
Copy link
Contributor
@ogrisel ogrisel commented Dec 15, 2015

This is work in progress for a fix for #5379 building upon the work done for numpy/numpy#6493.

The current scipy tests fail with:

======================================================================
ERROR: test_spearmanr (test_mstats_basic.TestCompareWithStats)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/scipy/stats/tests/test_mstats_basic.py", line 974, in test_spearmanr
    r, p = stats.spearmanr(x, y)
  File "/home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/scipy/stats/stats.py", line 3314, in spearmanr
    t = rs * np.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
RuntimeWarning: invalid value encountered in sqrt

======================================================================
ERROR: test_tie1 (test_stats.TestCorrSpearmanrTies)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/scipy/stats/tests/test_stats.py", line 673, in test_tie1
    sr = stats.spearmanr(x, y)
  File "/home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/scipy/stats/stats.py", line 3314, in spearmanr
    t = rs * np.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
RuntimeWarning: invalid value encountered in sqrt

----------------------------------------------------------------------

It might be revealing a regression in numpy master or it might be caused by something that I don't understand.

I am wondering if we should put this numpy dev entry of the scipy build matrix in allow_failure.

I have not had the opportunity to test the upload section of this PR yet as the tests fail.

@rgommers
Copy link
Member

Those tests error because they give warnings that should be filtered out, and numpy master converts DeprecationWarning and RuntimeWarning to errors to nudge us to do that filtering.

I don't see these warnings locally, but it's easy to filter with warnings.filterwarnings; there are plenty of examples in test_stats.py that can be copied.
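A minimal sketch of that filtering pattern (the warning message is taken from the tracebacks above; the `simplefilter('error')` line only mimics numpy master's warnings-as-errors behavior, and the `np.sqrt` call is just a stand-in for the scipy code under test):

```python
import warnings
import numpy as np

with warnings.catch_warnings():
    # Mimic numpy master, which promotes warnings to errors during tests.
    warnings.simplefilter('error')
    # Ignore the specific RuntimeWarning seen in the tracebacks above.
    warnings.filterwarnings('ignore',
                            message='invalid value encountered',
                            category=RuntimeWarning)
    # Without the filter this would raise; with it, we just get a nan.
    result = np.sqrt(np.array([-1.0, 4.0]))
```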

@rgommers
Copy link
Member

allow_failure may make sense, so we don't get all PR statuses red if numpy master has a temporary issue. I can't find an example of what a failed allow_failure would look like right now though.

@rgommers rgommers added the maintenance Items related to regular maintenance tasks label Dec 16, 2015
@ogrisel ogrisel force-pushed the travis-wheel branch 3 times, most recently from e974097 to df87dd4 Compare December 17, 2015 16:27
@ogrisel
Copy link
Contributor Author
ogrisel commented Dec 17, 2015

I can reproduce the runtime warning with numpy master. Will have to investigate.

@rgommers
Copy link
Member

It's not at all visible now that the test actually failed. I suspect that no one will now look at the test result against numpy master, and hence it doesn't help much. So I guess just failing is better - if numpy master has a regression that takes a while to fix, we can temporarily disable the test run. Does that make sense?

@ogrisel
Copy link
Contributor Author
ogrisel commented Dec 24, 2015

@rgommers ok, I removed the allow_failures configuration. I also changed the spearmanr code not to raise warnings on the seemingly valid inputs of test_stats. However, that might also silence warnings one might consider legitimate when the inputs are invalid.

@ogrisel ogrisel changed the title [WIP] test against numpy dev [MRG] test against numpy dev Dec 24, 2015
@ogrisel
Copy link
Contributor Author
ogrisel commented Dec 28, 2015

I have amended my commit to use the shorter travis-dev-wheels.scipy.org URL instead of the long rackcdn.com URL. If the tests pass, I think this is ready for merge. The real upload mechanism can only be tested once we merge this PR to master though.

@ev-br
Copy link
Member
ev-br commented Dec 28, 2015

While it's a minor quibble, I do think it's better to filter the warnings in the tests rather than the library code (#5621 (comment) lists the exact warnings).

@larsoner
Copy link
Member

@ev-br you mean catch the warnings using e.g. warnings.catch_warnings(record=True) and then check to make sure that any caught warnings have one of the accepted messages (or some accepted message forms)?
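Something like this hypothetical sketch of the record-and-check pattern (the accepted-message list and the triggering `np.sqrt` call are illustrative, not actual test code):

```python
import warnings
import numpy as np

# Accepted message prefixes; anything else caught is a test failure.
accepted = ('invalid value encountered',)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')     # record every warning raised
    np.sqrt(np.array([-1.0]))           # triggers the RuntimeWarning

# Fail if any caught warning does not match an accepted message form.
unexpected = [w for w in caught
              if not str(w.message).startswith(accepted)]
assert not unexpected, unexpected
```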

@ev-br
Copy link
Member
ev-br commented Dec 28, 2015

@njsmith
Copy link
njsmith commented Dec 28, 2015

IIUC the specific issue here is that scipy is using numpy.testing, and numpy.testing actually has different warning handling behavior depending on whether numpy is a released version or not. This seems kinda wrong, though -- surely what we actually want is that numpy.testing should act differently depending on whether the code being tested is a released version or not. For numpy itself it doesn't make any difference because we're using numpy.testing to test numpy, but when numpy.testing is testing scipy then it does make a difference. Maybe we should fix that?

(This is of course separate from the more general issue of how we want to handle breakage caused by numpy skew, allow_failure, etc.)

@ogrisel
Copy link
Contributor Author
ogrisel commented Dec 29, 2015

@ev-br IMHO, scipy.stats.spearmanr should not raise any warning for valid input.

With numpy 1.10.2 the following does not raise a warning:

>>> from scipy import stats
>>> stats.spearmanr([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 2.0, 3.0])
SpearmanrResult(correlation=0.94868329805051377, pvalue=0.051316701949486225)

while it does with numpy master.

The problem with the fact that numpy.testing turns warnings into exceptions is a separate issue, but at least it revealed a problem in the scipy code base IMHO.

@ogrisel
Copy link
Contributor Author
ogrisel commented Dec 29, 2015

For reference here is the behavior observed in a shell (outside of the testing environment) with numpy master:

>>> import numpy as np
>>> from scipy import stats
>>> stats.spearmanr([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 2.0, 3.0])
/Users/ogrisel/code/scipy/scipy/stats/stats.py:3314: RuntimeWarning: invalid value encountered in sqrt
  t = rs * np.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
/Users/ogrisel/code/scipy/scipy/stats/_distn_infrastructure.py:1748: RuntimeWarning: invalid value encountered in greater
  cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Users/ogrisel/code/scipy/scipy/stats/_distn_infrastructure.py:1748: RuntimeWarning: invalid value encountered in less
  cond1 = (scale > 0) & (x > self.a) & (x < self.b)

/Users/ogrisel/code/scipy/scipy/stats/_distn_infrastructure.py:1749: RuntimeWarning: invalid value encountered in less_equal
  cond2 = cond0 & (x <= self.a)
SpearmanrResult(correlation=0.94868329805051388, pvalue=0.051316701949486114)

I find this really confusing for the user. The input is valid. The fact that the internals of spearmanr compute stuff with non-finite quantities on the diagonal of the correlation matrix is not the user's problem, as those quantities are not returned as part of the output of the function call.

Here are the details of an ipdb session on numpy master:

ipdb> p rs
array([[ 1.       ,  0.9486833],
       [ 0.9486833,  1.       ]])
ipdb> p 1.0 - rs
array([[  0.00000000e+00,   5.13167019e-02],
       [  5.13167019e-02,  -2.22044605e-16]])
ipdb> p (n-2) / ((rs+1.0)*(1.0-rs))
array([[             inf,   2.00000000e+01],
       [  2.00000000e+01,  -4.50359963e+15]])

The close-to-zero negative value on the diagonal looks like a usual rounding issue to me.
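The effect is easy to reproduce in isolation. A minimal sketch (standalone, not scipy code) where a correlation that should be exactly 1.0 comes out one ulp above it:

```python
import numpy as np

rs = np.nextafter(1.0, 2.0)   # 1.0 plus one ulp, i.e. 1.0000000000000002
n = 4

# (1.0 - rs) is a tiny negative number instead of exactly zero...
arg = (n - 2) / ((rs + 1.0) * (1.0 - rs))   # large negative value

# ...so the sqrt argument is negative and the result is nan, which is
# what triggers the RuntimeWarning under numpy master.
with np.errstate(invalid='ignore'):
    t = rs * np.sqrt(arg)
print(1.0 - rs, t)   # -2.220446049250313e-16 nan
```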

@ogrisel
Copy link
Contributor Author
ogrisel commented Dec 29, 2015

Here is an alternative fix that does not change the warning handling:

diff --git a/scipy/stats/stats.py b/scipy/stats/stats.py
index 505fe92..36b44ca 100644
--- a/scipy/stats/stats.py
+++ b/scipy/stats/stats.py
@@ -3311,7 +3311,8 @@ def spearmanr(a, b=None, axis=0, nan_policy='propagate'):

     olderr = np.seterr(divide='ignore')  # rs can have elements equal to 1
     try:
-        t = rs * np.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
+        tmp = (n - 2) / ((rs + 1.0) * (1.0 - rs))
+        t = rs * np.sqrt(tmp.clip(0))
     finally:
         np.seterr(**olderr)

I can update my PR if you think this is the right fix.
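Illustrating the fix on the values printed in the ipdb session above (a standalone sketch, not the actual spearmanr code path):

```python
import numpy as np

# rs and the sqrt argument as shown in the ipdb session above.
rs = np.array([[1.0, 0.9486833],
               [0.9486833, 1.0]])
tmp = np.array([[np.inf, 2.0e+01],
                [2.0e+01, -4.50359963e+15]])

with np.errstate(invalid='ignore'):
    t_old = rs * np.sqrt(tmp)          # nan where tmp went slightly negative

t_new = rs * np.sqrt(tmp.clip(0))      # clipped entry yields 0.0 instead

# The off-diagonal entries (the actual statistic) are unchanged.
```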

@njsmith
Copy link
njsmith commented Dec 29, 2015

Yeah, the ideal would actually be that numpy.testing acts the same regardless of whether you're using a dev version of numpy... and that tests on scipy dev versions would always convert warnings into errors, regardless of what version of numpy they were running against.

@ogrisel
Copy link
Contributor Author
ogrisel commented Dec 29, 2015

@njsmith I agree. The behavior of the numpy warnings could be configured in the runtests.py utility, maybe with additional commandline flags.

@ogrisel
Copy link
Contributor Author
ogrisel commented Dec 29, 2015

I went on and pushed the clip-based fix. Let me know what you think about it.

@ev-br
Copy link
Member
ev-br commented Dec 29, 2015

OK, the spearmanr glitch is somewhat parallel to uploading wheels. I'll merge this to keep the latter ball rolling, and we can keep discussing the former, if needed. Thanks Olivier, all.

ev-br added a commit that referenced this pull request Dec 29, 2015
@ev-br ev-br merged commit eecc0ed into scipy:master Dec 29, 2015
@ev-br ev-br added this to the 0.18.0 milestone Dec 29, 2015
@ogrisel
Copy link
Contributor Author
ogrisel commented Dec 29, 2015

Thanks!

@njsmith
Copy link
njsmith commented Dec 30, 2015

See also numpy/numpy#6901, which changes numpy's test-runner to always default to release mode.

njsmith added a commit to njsmith/numpy that referenced this pull request Dec 30, 2015
Our test-runner's raise_warning mode traditionally has varied depending
on whether we have a development or release version of numpy: for
development versions we raise on warnings, and for release versions we
don't. This is all very sensible... *if* you're running numpy's test
suite. But our test-runner is also used by other packages like scipy,
and it doesn't make sense for scipy's raise_warning mode to vary
depending on whether *numpy* is a development or release version. (It
should vary depending on whether the scipy-under-test is a development
or release version.) So this commit moves the numpy-version-dependent
raise_warning logic out of the generic NoseTester class and into
numpy-specific code.

(See scipy/scipy#5609 for more discussion.)
njsmith added a commit to njsmith/scipy that referenced this pull request Dec 30, 2015
We want CI tests to fail when they use deprecated functionality or
otherwise raise spurious warnings, so enable raise_warnings="develop"
for development builds.

See scipy/scipy#5609 for related discussion, and numpy/numpy#6901 for a related
change in numpy.
@rgommers
Copy link
Member

Great to have this merged. I see that Python 3.5 wheels are appearing at http://travis-dev-wheels.scipy.org/. Something is not quite right there though - only the very latest wheel should be kept. Now pip will have no way of knowing which one is the most recent, right?

@njsmith
Copy link
njsmith commented Dec 30, 2015

IIUC it keeps the most recent 5, and inserts a timestamp into the version number so that pip can sort them properly.

@pv
Copy link
Member
pv commented Dec 30, 2015

Does pip sort the file names? Or is the part following + just ignored?

@rgommers
Copy link
Member

IIRC pip just takes a random one. But maybe it got smarter recently?

If there are only 5 files, they'll all be recent. But it's still possible to fetch a numpy and a scipy wheel from the wheelhouse that are incompatible - there's actually a pretty decent chance of that happening the moment numpy adds something to its API.

@njsmith
Copy link
njsmith commented Dec 30, 2015

Interesting question! I can't find anything definitive in a quick search, but https://bitbucket.org/pypa/pypi-metadata-formats/issues/37/ordering-of-local-segments suggests that at some point the relevant people at least decided that pip should sort the local version specifiers in the way that we want.

@ogrisel
Copy link
Contributor Author
ogrisel commented Dec 30, 2015

Based on my tests, pip picks the latest version using a deterministic ordering of the local version segment, which is why I made wheelhouse_uploader insert an upload timestamp as a prefix of the local segment (in the wheel filename only).

By default it keeps the last 5 most recent dev builds (per-project / python version / platform tags). So this should all be fine.
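A small sketch of why the timestamp prefix makes the ordering deterministic (the version strings below are made up for illustration, not actual wheelhouse filenames): fixed-width timestamps in the local segment sort lexicographically in chronological order.

```python
# Hypothetical dev-wheel versions with an upload timestamp prefixed
# to the local version segment (the part after the '+').
builds = [
    "0.17.0.dev0+20151228120000.abc1234",
    "0.17.0.dev0+20151230093000.def5678",
    "0.17.0.dev0+20151229180000.0123abc",
]

# Equal-length timestamp strings compare lexicographically the same
# way they compare chronologically, so the max is the newest upload.
latest = max(builds)
print(latest)   # 0.17.0.dev0+20151230093000.def5678
```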

@rgommers
Copy link
Member

okay, thanks for confirming @ogrisel

jaimefrio pushed a commit to jaimefrio/numpy that referenced this pull request Mar 22, 2016
(Commit message identical to the njsmith/numpy commit quoted above.)