Tests failing on master due to tree recythonization issues #7094

Closed
nelson-liu opened this issue Jul 27, 2016 · 18 comments · Fixed by #7719

@nelson-liu (Contributor) commented Jul 27, 2016

Description

The last commit to master, 376aa50 (from #6954), is failing tests on Travis CI, even though the PR itself passed all tests. The cause is a phenomenon that people who work with the tree module know all too well: for some reason the build does not recythonize all of the dependent files, so the compiled extensions fail at import time with a "has the wrong size, try recompiling" error.

This behavior has been reported several times, e.g. at:
#4899 (comment)

Steps/Code to Reproduce

see: https://travis-ci.org/scikit-learn/scikit-learn/jobs/147795508 line 2226

Expected Results

Tests pass

Actual Results

Tests fail

======================================================================
ERROR: Failure: ValueError (sklearn.tree._tree.TreeBuilder has the wrong size, try recompiling. Expected 72, got 64)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/scikit-learn/scikit-learn/testvenv/local/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/travis/build/scikit-learn/scikit-learn/testvenv/local/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/travis/build/scikit-learn/scikit-learn/testvenv/local/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/travis/sklearn_build_ubuntu/scikit-learn/sklearn/ensemble/__init__.py", line 17, in <module>
    from .gradient_boosting import GradientBoostingClassifier
  File "/home/travis/sklearn_build_ubuntu/scikit-learn/sklearn/ensemble/gradient_boosting.py", line 36, in <module>
    from ._gradient_boosting import predict_stages
  File "sklearn/tree/_tree.pxd", line 84, in init sklearn.ensemble._gradient_boosting (sklearn/ensemble/_gradient_boosting.c:19325)
ValueError: sklearn.tree._tree.TreeBuilder has the wrong size, try recompiling. Expected 72, got 64

Misc Comments

I've been using hacky methods to get around this for a while (e.g. manually adding a change to the file that isn't being recythonized), but it also seems to work if someone clears the Travis cache. However, these are annoying "solutions" -- is there any way we can fix the behavior? Namely, it seems that _gradient_boosting.pyx isn't being recythonized when things change in the tree module.
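
For reference, the hack amounts to something like the following (a hypothetical sketch; the chosen file and the appended comment are just for illustration): make a trivial content change to the .pyx that is not being regenerated, so the build treats it as modified.

# Append a no-op comment to the stale .pyx so the build sees a content change
# and re-cythonizes (and therefore recompiles) it on the next run.
from pathlib import Path

stale = Path("sklearn/ensemble/_gradient_boosting.pyx")  # the file that was not rebuilt
with stale.open("a") as fh:
    fh.write("\n# force recythonization\n")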

@amueller (Member)

@arthurmensch you're the expert here. This is really not great at the moment...

@nelson-liu (Contributor, Author)

Travis is green again, but this remains an inconvenience. I'm closing this for now.

@rasbt (Contributor) commented Aug 7, 2016

Hm, I am not a Travis expert either, but would it maybe help to use the "Trusty" environment?

Travis CI runs each Trusty build in an isolated Google Compute Engine virtual machine that offers a vanilla build environment for every build. This has the advantage that no state persists between builds, offering a clean slate and making sure that your tests run in an environment built from scratch. Builds have access to a variety of services for data storage and messaging, and can install anything that's required for them to run.

https://docs.travis-ci.com/user/trusty-ci-environment/

It says it's in beta, but I haven't had any issues with it so far. Maybe it's worth a try?
All that needs to change are the following two lines at the top:

sudo: required
dist: trusty
language: python
matrix:
    include:
        - python: 3.5
          env: LATEST="false" TENSORFLOW="false" COVERAGE="false" NUMPY_VERSION="1.10.4" SCIPY_VERSION="0.17" SKLEARN_VERSION="0.17" PANDAS_VERSION="0.17.1" MATPLOTLIB_VERSION="1.5.1"
...

@nelson-liu reopened this Aug 7, 2016
@nelson-liu (Contributor, Author)

Hmm, how is using Trusty different from just turning off build caching altogether? Can you even do the latter?

@rasbt (Contributor) commented Aug 7, 2016

how is using Trusty different from just turning off build caching altogether?

Sorry, that I don't know ;). The reason I started using Trusty was because of other issues I had when compiling TensorFlow. If I remember correctly, there were problems with some C libraries, and Trusty is more up to date regarding the toolchain (plus, I noticed that builds are much faster). In any case, I just thought it might be interesting, but turning off caching may have the same effect here.

@nelson-liu (Contributor, Author)

Ah, I see. Thanks for the suggestion; I'd be in favor of turning off the cache / using Trusty (I'll first read a bit more of the docs ;) ).

@rasbt (Contributor) commented Aug 7, 2016

Sure. The specific issue I had with the default Travis environment was more like

ImportError: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.17' not found (required by /home/travis/miniconda/envs/test-environment/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so)

Just noticed the

no state persists between builds, ... making sure that your tests run in an environment built from scratch.

line in the Trusty docs and thought it could be relevant. I am really curious about the difference between this and the regular environment without caching (I think there must be some difference; otherwise they wouldn't state it so explicitly in the docs!?).

@arthurmensch (Contributor)

I am away from my computer until August 16th (I have only my phone right now), and I will work on it as soon as I get it back. I guess a quick workaround would be to disable the Travis cache until we fix the issue.

@raghavrv (Member)

I guess a quick workaround would be to disable the Travis cache until we fix the issue.

+1 for that!

@amueller (Member)

ping @arthurmensch :)

@raghavrv (Member) commented Oct 19, 2016

Could I kindly, dearly, gently, but adamantly re-request that the Travis cache be disabled until we fix it so it doesn't fail on PRs that work perfectly?!! :) This wastes developers' time searching for a solution that does not exist.

We save a maximum of 10 minutes by caching the compilation, but AppVeyor builds typically take 30 minutes to pass... Travis is never the bottleneck...

@raghavrv (Member) commented Oct 19, 2016

I think I know why this fails: our cythonize.py re-cythonizes only the changed Cython files, and when the generated C for a file already exists and has been compiled before, ccache doesn't recompile it.

The problem shows up for a class in a file A that has dependencies in another file B that has been edited, re-cythonized, and hence recompiled. File A is not re-cythonized and hence not recompiled (it is instead taken from the cache)... and so the class in file A complains that the dependent class in file B has the wrong size...

If my understanding is right, we need to re-cythonize all the interdependent files. (This is somewhat similar to my fix for the pxd files...)
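
A minimal sketch of the dependency-aware pass described above, assuming a naive cimport-scanning heuristic (the regex, paths, and function names are illustrative assumptions, not scikit-learn's actual build code): when a file changes, its in-tree dependents should be re-cythonized as well.

# Sketch: mark a .pyx for re-cythonization when any in-tree module it cimports
# has changed, not only when the file itself changed. Single pass only; a real
# implementation would iterate to a fixed point for transitive dependencies.
import re
from pathlib import Path

CIMPORT_RE = re.compile(r"^\s*(?:from\s+(\S+)\s+cimport|cimport\s+(\S+))", re.M)

def local_cimports(pyx_path):
    """Return the stems of sibling modules that a .pyx file cimports."""
    deps = set()
    for from_mod, plain_mod in CIMPORT_RE.findall(Path(pyx_path).read_text()):
        mod = (from_mod or plain_mod).lstrip(".").split(".")[-1]
        if (Path(pyx_path).parent / (mod + ".pxd")).exists():  # in-tree only
            deps.add(mod)
    return deps

def files_to_recythonize(changed, package_dir="sklearn/tree"):
    """Given the stems of changed files, also include their direct dependents."""
    need = set(changed)
    for pyx in Path(package_dir).glob("*.pyx"):
        if local_cimports(pyx) & need:
            need.add(pyx.stem)
    return need

# e.g. an edit to _criterion should drag its dependents along:
# print(files_to_recythonize({"_criterion"}))

The traceback above (sklearn/tree/_tree.pxd showing up in the init of sklearn.ensemble._gradient_boosting) is exactly this kind of cross-module coupling, so a real fix would also have to look across packages, not just within sklearn/tree.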

@amueller (Member)

What do you mean by dependency?

I'm pretty sure Travis is often the bottleneck, and changes to Cython code are rare relative to the total number of PRs. Fix welcome though ;)

@nelson-liu (Contributor, Author)

I think Raghav is on the right track; several times I've modified tree Cython files and gotten this error in gradient boosting.

@raghavrv (Member) commented Oct 19, 2016

What do you mean by dependency?

Not sure I used the right term here, but I mean that splitter.pyx's generated C (or maybe it needs to be re-cythonized too?) needs to be recompiled if there is a change in criterion.pyx...
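
For example, a quick way to see that C-level coupling (the path below, sklearn/tree/_splitter.pyx, is an assumption about the layout):

# List the cimport lines in _splitter.pyx to see which sibling tree modules it
# depends on at the C level; any change to those modules invalidates its C file.
from pathlib import Path

for line in Path("sklearn/tree/_splitter.pyx").read_text().splitlines():
    if "cimport" in line:
        print(line.strip())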

@raghavrv (Member) commented Oct 19, 2016

For anyone who wants to quickly verify whether the Travis cache is the reason for the failure, try this:

curl -Ls https://goo.gl/fj71qd | git apply -v --index; git commit -m "NOMERGE use new cache dir travis";

and push to your branch.

@raghavrv (Member) commented Oct 19, 2016

Ah wait, this patch is better. It will delete cythonize.dat from the cache and force recythonization of all the Cython files (see the sketch after this comment for why that works).

curl -Ls https://goo.gl/fprtqz | git apply -v --index; git commit -m "NOMERGE recythonize all";

and push to your branch.

Make sure to delete this commit later, once the tests pass, and force-push again.

Or

curl -Ls https://goo.gl/D3sF0m | git apply -v --index; git commit -m "UNDO recythonize all";

and push.
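
For context, a hedged sketch of why deleting cythonize.dat forces a full pass: the cythonize step keeps a per-file record of content hashes and skips any .pyx whose hash is unchanged, so once the record is gone every file looks new. This only illustrates the idea (the JSON hash-file format here is an assumption, not the project's actual cythonize.py):

# Illustrative skip-if-unchanged mechanism: keep a hash per .pyx and only re-run
# Cython when the hash differs or the record file is missing.
# Note: a purely per-file check like this misses the case where an unchanged
# .pyx cimports from a changed .pxd -- the failure mode in this issue.
import hashlib
import json
import subprocess
from pathlib import Path

HASH_FILE = Path("cythonize.dat")  # deleting this forces everything to rebuild

def content_hash(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def cythonize_changed(pyx_files):
    old = json.loads(HASH_FILE.read_text()) if HASH_FILE.exists() else {}
    new = {}
    for pyx in map(str, pyx_files):
        new[pyx] = content_hash(pyx)
        if old.get(pyx) != new[pyx]:
            subprocess.check_call(["cython", pyx])  # regenerate the .c file
    HASH_FILE.write_text(json.dumps(new))

# cythonize_changed(Path("sklearn/tree").glob("*.pyx"))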

@lesteve (Member) commented Oct 20, 2016

A quick stop-gap solution would be to always do make clean on Travis; this way you always recythonize from scratch and you don't have this dependency issue.

Cythonization of all files takes about 40s on my machine, so I agree the caching is not that crucial. Looking at timings on Travis, most builds take 8-12 minutes, so you are looking at saving at most about 8% of build time for each individual build.
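
For reference, roughly what wiping the Cython outputs amounts to (a sketch only; the project's actual make clean target may remove more or different artifacts):

# Remove the C/C++ files generated from .pyx sources, plus the cythonize.dat
# record, so the next build re-cythonizes and recompiles everything.
from pathlib import Path

for pyx in Path("sklearn").rglob("*.pyx"):
    for suffix in (".c", ".cpp"):
        generated = pyx.with_suffix(suffix)
        if generated.exists():
            print("removing", generated)
            generated.unlink()

record = Path("cythonize.dat")
if record.exists():
    record.unlink()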
