-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG+1] Fix scipy-dev-wheels build #11251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] Fix scipy-dev-wheels build #11251
Conversation
assert_no_warnings is getting plenty of new PendingDeprecationWarning about using numpy arrays rather than matrices.
Fixes #11194 |
Or does it? |
assert_no_warnings(IsolationForest(max_samples='auto').fit, X) | ||
assert_no_warnings(IsolationForest(max_samples=np.int64(2)).fit, X) | ||
8000 with pytest.warns(None) as record: | ||
IsolationForest(max_samples='auto').fit(X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this result in a PendingDeprecationWarning as it stands?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does anything result in a PendinDeprecationWarning? There are no matrices in sklearn except for a test in the input validation.
Just to make sure, I got it right - is numpy/numpy#11305 (memmap shape issue) the issue that's causing the test failure due to If I understand correctly the first one was a regression in numpy that got fixed, why then do we need to take action on the scikit-learn side? |
Can you do a travis run here with the dev wheels to be sure? |
Good point, I just launched a Cron daily job on my branch. The results should be available here in 5-10 minutes. |
FYI: the Cron job passed. |
I'm fine with the PR which makes the test more precise, but still don't understand how it fixes the issue. Seems that the PR is only ignoring warnings instead of getting rid of warnings, so users will still get these warnings? |
Yes, that's what I was implying by #11251 (comment) |
Here is where the warning happens: scikit-learn/sklearn/ensemble/iforest.py Lines 335 to 345 in bb38539
So the warning happens in scipy and I don't see an obvious way to avoid it. A few side-comments:
All in all it feels like this warning turning into an error is not going to happen any time soon. I googled a bit and found ~2020. Here is an excerpt from this post from Nathaniel Smith on the numpy mailing list.
Here is an excerpt from this post in the same numpy thread from Ralf Gommers:
For more context, numpy/numpy#10142 (that added the PendingDeprecationWarning in |
so performing the sum is where you get the warning? or passing the matrix
to ravel? if the latter, we can avoid it by using toarray explicitly, but
should probably suggest numpy that they suppress the warning in ravel
because it is used to translate a np.matrix column/row vector into a 1d
array.
|
Yes, because |
Just for completeness: the reason we get the warning is because we are using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lesteve I just want to make sure that users won't get these warnings (seems that it's the case on my PC), otherwise I think we should keep the issue open.
The users are not going to get these warnings. There are at least two layers of protection:
|
Having said that I feel like a better way to do it would be to ignore PendingDeprecationWarning in the |
it may even be fine for assert_*warn to only check warnings endogenous to
scikit-learn....
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. +1 to merge and close the issue. Thanks @lesteve for the great work and detailed explanations.
Also +1 to filter PendingDeprecationWarning (ensure assert_*warn to only check warnings endogenous to
scikit-learn is definitely better but I wonder whether it's possible).
I don't understand, where are we using matrices? |
@amueller See #11251 (comment) from @lesteve . I think this PR makes the test more robust by only detecting certain kind of warning (at the same time solve the cron failure). Since users will not get PendingDeprecationWarning, we're going to ignore it in the |
Thanks @qinhanmin2014 hadn't seen that. Is scipy fixing the internal warnings, though, so that summing sparse matrices will return np.ndarray? I'm surprised this is the only place where we would sum a sparse matrix. Or is this the only place where we sum a sparse matrix and check for warnings? That seems very fragile. |
If I understand correctly, scipy is simply ignoring the warning.
I think it is, since ping @lesteve for a double check and more insights maybe? |
Yep agreed (a quick |
We should still consider whether assert_no_warnings deserves adjustment, but I thought this was better merged so that it's easy to identify any ongoing issues in the cron build. |
I am a user and I do get the following warning:
I'm trying to figure out which part of the code is behind this, but I haven't succeeded yet :( I suspect it has to do with EDIT This is not a minimal example, but here's how I get the warning: Create from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from numpy.random import choice
from faker import Faker
faker = Faker()
def main():
WORDS = [faker.word() for i in range(1000)]
TARGETS = ['hello', 'world']
N = 10000
X = [
list(choice(WORDS, size=4)) for i in range(N)
]
y = [choice(TARGETS) for i in range(N)]
print(X[:10])
ppl = Pipeline(
[
('cver', CountVectorizer(analyzer=set)),
('clf', LinearSVC())
]
)
ppl.fit(X, y)
return True and a testing function from content_tagging_srv import foo
def test_foo():
foo.main() When running the test using
|
@rth How can I do it from |
scipy-dev-wheels Cron daily build has been failing for a while. One error was due to a regression in numpy and has been recently fixed in numpy (see numpy/numpy#11305 for more details) and another one is fixed by this PR. Fix #11194.
https://travis-ci.org/scikit-learn/scikit-learn/builds/391183710#L2385
assert_no_warnings is getting plenty of new PendingDeprecationWarning about
using numpy arrays rather than matrices.
If you can think of a better way of tackling this, better suggestions would be more than welcome!