8000 [MRG+1] Fix scipy-dev-wheels build by lesteve · Pull Request #11251 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG+1] Fix scipy-dev-wheels build #11251

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 24, 2018

Conversation

lesteve
Copy link
Member
@lesteve lesteve commented Jun 13, 2018

scipy-dev-wheels Cron daily build has been failing for a while. One error was due to a regression in numpy and has been recently fixed in numpy (see numpy/numpy#11305 for more details) and another one is fixed by this PR. Fix #11194.

https://travis-ci.org/scikit-learn/scikit-learn/builds/391183710#L2385

AssertionError: Got warnings when calling fit: [{message : PendingDeprecationWarning('the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.',)

assert_no_warnings is getting plenty of new PendingDeprecationWarning about
using numpy arrays rather than matrices.

If you can think of a better way of tackling this, better suggestions would be more than welcome!

assert_no_warnings is getting plenty of new PendingDeprecationWarning about
using numpy arrays rather than matrices.
@jnothman
Copy link
Member

Fixes #11194

@jnothman
Copy link
Member

Or does it?

assert_no_warnings(IsolationForest(max_samples='auto').fit, X)
assert_no_warnings(IsolationForest(max_samples=np.int64(2)).fit, X)
8000 with pytest.warns(None) as record:
IsolationForest(max_samples='auto').fit(X)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this result in a PendingDeprecationWarning as it stands?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does anything result in a PendinDeprecationWarning? There are no matrices in sklearn except for a test in the input validation.

@rth
Copy link
Member
rth commented Jun 13, 2018

Just to make sure, I got it right - is numpy/numpy#11305 (memmap shape issue) the issue that's causing the test failure due to PendingDeprecationWarning ? Or are those two separate issues?

If I understand correctly the first one was a regression in numpy that got fixed, why then do we need to take action on the scikit-learn side?

@lesteve
Copy link
Member Author
lesteve commented Jun 13, 2018

Sorry this PR fixes #11194, i.e. the PendingDeprecationWarning. The other one was visible in some scipy-dev-wheels builds (e.g. this one) but has been fixed since in numpy.

@jnothman
Copy link
Member

Can you do a travis run here with the dev wheels to be sure?

@lesteve
Copy link
Member Author
lesteve commented Jun 13, 2018

Good point, I just launched a Cron daily job on my branch. The results should be available here in 5-10 minutes.

@lesteve
Copy link
Member Author
lesteve commented Jun 13, 2018

FYI: the Cron job passed.

@qinhanmin2014
Copy link
Member

I'm fine with the PR which makes the test more precise, but still don't understand how it fixes the issue. Seems that the PR is only ignoring warnings instead of getting rid of warnings, so users will still get these warnings?

@jnothman
Copy link
Member

Yes, that's what I was implying by #11251 (comment)

@lesteve
Copy link
Member Author
lesteve commented Jun 13, 2018

Here is where the warning happens:

for i, (tree, features) in enumerate(zip(self.estimators_,
self.estimators_features_)):
if subsample_features:
X_subset = X[:, features]
else:
X_subset = X
leaves_index = tree.apply(X_subset)
node_indicator = tree.decision_path(X_subset)
n_samples_leaf[:, i] = tree.tree_.n_node_samples[leaves_index]
depths[:, i] = np.ravel(node_indicator.sum(axis=1))
depths[:, i] -= 1

  • line 342: node_indicator is a CSR matrix.
  • line 344: node_indicator.sum(axis=1) is a np.matrix which causes the warning.

So the warning happens in scipy and I don't see an obvious way to avoid it.

A few side-comments:

  • PendingDeprecationWarning are ignored by default warnings.filters
  • when you import scipy.sparse, an additional filter is added which ignore any warning with a message that matches 'the matrix subclass is not the recommended way'

All in all it feels like this warning turning into an error is not going to happen any time soon. I googled a bit and found ~2020. Here is an excerpt from this post from Nathaniel Smith on the numpy mailing list.

I doubt we'll be able to remove either Python 2 or matrix support before 2020 at the earliest

Here is an excerpt from this post in the same numpy thread from Ralf Gommers:

Timeline could be something like:

  1. Now: create new package, deprecate np.matrix in docs.
  2. In say 1.5 years: start issuing visible deprecation warnings in numpy
  3. After 2020: remove matrix from numpy.

For more context, numpy/numpy#10142 (that added the PendingDeprecationWarning in np.matrix.__new__) and scipy/scipy#8887 that added the additional warnings filter to ignore warning message matching 'the matrix subclass is not the recommended way'.

@jnothman
Copy link
Member
jnothman commented Jun 13, 2018 via email

@lesteve
Copy link
Member Author
lesteve commented Jun 14, 2018

so performing the sum is where you get the warning?

Yes, because csr_matrix.sum(axis=1) returns a np.matrix.

@lesteve
Copy link
Member Author
lesteve commented Jun 14, 2018

Just for completeness: the reason we get the warning is because we are using warnings.simplefilter('always') in assert_no_warnings. At the moment we use warnings.simplefilter('always') in all our assert_*warn* functions but we ignore np.VisibleDeprecationWarnings. One possible change would be to also ignore PendingDeprecationWarning.

Copy link
Member
@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lesteve I just want to make sure that users won't get these warnings (seems that it's the case on my PC), otherwise I think we should keep the issue open.

@lesteve
Copy link
Member Author
lesteve commented Jun 14, 2018

@lesteve I just want to make sure that users won't get these warnings

The users are not going to get these warnings. There are at least two layers of protection:

  • PendingDeprecationWarning is ignored by default
  • even if you tweak warning filters to show PendingDeprecationWarning, e.g. by using PYTHONWARNINGS=always::PendingDeprecationWarning, import scipy.sparse is adding another filter that is going to ignore warnings messages matching 'the matrix subclass is not the recommended way'

@lesteve
Copy link
Member Author
lesteve commented Jun 14, 2018

Having said that I feel like a better way to do it would be to ignore PendingDeprecationWarning in the assert_*warn* functions. We already have a precedent of ignoring np.VisibleDeprecationWarning.

8000
@jnothman
Copy link
Member
jnothman commented Jun 14, 2018 via email

Copy link
Member
@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. +1 to merge and close the issue. Thanks @lesteve for the great work and detailed explanations.
Also +1 to filter PendingDeprecationWarning (ensure assert_*warn to only check warnings endogenous to
scikit-learn is definitely better but I wonder whether it's possible).

@qinhanmin2014 qinhanmin2014 changed the title [MRG] Fix scipy-dev-wheels build [MRG+1] Fix scipy-dev-wheels build Jun 14, 2018
@qinhanmin2014
Copy link
Member

ping @jnothman @rth for a second review.

@amueller
Copy link
Member

I don't understand, where are we using matrices?

@qinhanmin2014
Copy link
Member

@amueller See #11251 (comment) from @lesteve . I think this PR makes the test more robust by only detecting certain kind of warning (at the same time solve the cron failure). Since users will not get PendingDeprecationWarning, we're going to ignore it in the assert_*warn* functions (See #11251 (comment)).

@amueller
Copy link
Member
amueller commented Jun 16, 2018

Thanks @qinhanmin2014 hadn't seen that. Is scipy fixing the internal warnings, though, so that summing sparse matrices will return np.ndarray? I'm surprised this is the only place where we would sum a sparse matrix. Or is this the only place where we sum a sparse matrix and check for warnings? That seems very fragile.

@qinhanmin2014
Copy link
Member

Is scipy fixing the internal warnings, though, so that summing sparse matrices will return np.ndarray

If I understand correctly, scipy is simply ignoring the warning.
Also see #11251 (comment): For more context, numpy/numpy#10142 (that added the PendingDeprecationWarning in np.matrix.new) and scipy/scipy#8887 that added the additional warnings filter to ignore warning message matching 'the matrix subclass is not the recommended way'.

is this the only place where we sum a sparse matrix and check for warnings?

I think it is, since assert_no_warnings doesn't occur very often.

ping @lesteve for a double check and more insights maybe?

@lesteve
Copy link
Member Author
lesteve commented Jun 18, 2018

is this the only place where we sum a sparse matrix and check for warnings?

I think it is, since assert_no_warnings doesn't occur very often.

Yep agreed (a quick git grep indicates 60 assert_no_warnings in our codebase)

@jnothman jnothman merged commit 60f887e into scikit-learn:master Jun 24, 2018
@jnothman
Copy link
Member

We should still consider whether assert_no_warnings deserves adjustment, but I thought this was better merged so that it's easy to identify any ongoing issues in the cron build.

@drorata
Copy link
Contributor
drorata commented Oct 8, 2018

I am a user and I do get the following warning:

PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
  return matrix(data, dtype=dtype, copy=False)

I'm trying to figure out which part of the code is behind this, but I haven't succeeded yet :( I suspect it has to do with sklearn.feature_extraction.text.CountVectorizer which yields a sparse matrix.

EDIT

This is not a minimal example, but here's how I get the warning:

Create foo.py:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from numpy.random import choice
from faker import Faker

faker = Faker()


def main():
    WORDS = [faker.word() for i in range(1000)]
    TARGETS = ['hello', 'world']
    N = 10000

    X = [
        list(choice(WORDS, size=4)) for i in range(N)
    ]
    y = [choice(TARGETS) for i in range(N)]

    print(X[:10])

    ppl = Pipeline(
        [
            ('cver', CountVectorizer(analyzer=set)),
            ('clf', LinearSVC())
        ]
    )

    ppl.fit(X, y)
    return True

and a testing function test_foo.py:

from content_tagging_srv import foo


def test_foo():
    foo.main()

When running the test using pytest, I get the following message twice:

/Users/user/anaconda3/envs/test-env/lib/python3.6/site-packages/numpy/matrixlib/defmatrix.py:68: PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
  return matrix(data, dtype=dtype, copy=False)

@rth
Copy link
Member
rth commented Oct 8, 2018

@drorata thanks for the report, I can confirm, opened an issue in #12327 -- let's move the discussion there.

You can run your code with python -W error::PendingDeprecationWarning to error on this warning which will show you where it originates from.

@drorata
Copy link
Contributor
drorata commented Oct 8, 2018

@rth How can I do it from pytest?

@lesteve lesteve deleted the fix-scipy-dev-wheels-build branch October 9, 2018 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NumPy dev causes test errors due to use of np.matrix
6 participants
0