-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG] Move binder setup to scikit-learn/scikit-learn repo #14719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Move binder setup to scikit-learn/scikit-learn repo #14719
Conversation
d35a7a5
to
2547214
Compare
I am slightly confused but this sounds very impressive to me nonetheless :-) I wonder if you'd be interested in writing up a short post just describing this build system? |
I think the main take-home message is: notebooks don't need to be in the git repo but in the binder docker image. This may well be known (and used) already ... IMO this was the main reason #11221 stalled. |
The link looks good (the binder URL uses Binder link looks goodIn this scikit-learn example, the link looks good (the main thing to check is that it is pointing to scikit-learn/scikit-learn): Binder build failure detailsIt looks like the Binder sees the |
For me the link you posted works. So there is no build failure any more? With a I like this because it is much simpler than having two repos. One drawback could be that the image that is produced is much larger because it contains the git history of the scikit-learn repository but you take care of this by deleting the actual repo. So I think that isn't a problem. How will released versions work? It would be cool if One "easy" (but not super nice) way out is to actually build scikit-learn from the checked out repo when building the docker image. For this we'd need cython (and other build time deps). Builds would take longer and the resulting docker image might also be bigger. Another way that I think would be nicer but not sure if it is feasible is to not include scikit-learn in the Maybe there is a third way? |
scikit-image | ||
pandas | ||
sphinx-gallery | ||
scikit-learn>=0.22.dev0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my education what does --find-links https://sklearn-nightly.scdn8.secure.raxcdn.com scikit-learn
and scikit-learn>=0.22.dev0
do? Is the result that you always install the latest nightly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Maybe using the --pre
flag and the --find-links URL
flag would be enough (instead of putting the version constraint for scikit-learn).
https://scikit-learn.org/stable/developers/advanced_installation.html#installing-nightly-builds
Sorry I should have been more precise: everything works fine as far as binder is concerned (the binder link you mention is using my PR branch and works as you noticed). My comment was more a warning for reviewers of this PR that if they go to the documentation build of this PR the binder links won't work (but that this is completely expected)
I have this on my radar once things stabilize a bit. For scikit-learn, we have a branch for each stable release plus its bug-fix releases, e.g.
I was doing this at one point in #11221 but to make the build faster, I switch to using scikit-learn nightly development wheels. These development wheels are on |
Any comments on this one? I think this is a low risk PR to merge, since this will only affect the dev doc. I am reasonably certain that everything will work as expected once the PR is merged in master. If there are any problems, I'll try to fix them at short notice. Having said that, any feed-back on the |
.binder/postBuild
Outdated
TMP_CONTENT_DIR=/tmp/scikit-learn | ||
mkdir -p $TMP_CONTENT_DIR | ||
cp -r examples .binder $TMP_CONTENT_DIR | ||
rm -rf * .* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That one is scary even if it's in a postBuild
script. I can always imagine something running this file by accident. Is there a way to be more specific about the wildcard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rth I pushed a commit to avoid running outside of repo2docker
context. That should avoid people running postBuild
locally and deleting all their files in their current directory.
@betatim: to figure out whether I am in a repo2docker
context I am using the following lines below, let me know if there is a more robust way to do that:
if [[ -z "$NB_USER" ]] || [[ "$USER" != "$NB_USER" ]]; then
echo "This script is supposed to run in a repo2docker context."
echo "Exiting because this script can remove files if run locally.\n"
exit 1
fi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I don't think we currently have a good way for scripts during build or run time to detect that they are inside a repo2docker run.
I'd probably check the REPO_DIR
environment variable to know "I am in repo2docker land" as well as /.dockerenv
which is a docker specific thing to tell if you are inside a container or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your input @betatim. I think it's fine to only check whether we are inside a docker container actually so I went for checking /.dockerenv
.
More details:
- I googled a bit and it looks like
/.dockerenv
may not be reliable in the long-term. For example this seems to advise against relying on/.dockerenv
. Having said that,./dockerenv
is still used (at the time of writing) indocker/libnetwork
see this and this comment was from 3 years ago, so maybe that's robust enough for our purposes. - Another way is to check if we are running inside a docker container is to check if docker is in
/proc/1/cgroup
e.g. a small adaptation of this:
INSIDE_DOCKER=$(grep -q docker /proc/1/cgroup && echo "yes" || echo "no")
- Overall
/.dockerenv
feels simpler and robust enough for our purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. There is only one way to know if it really works though :)
Merged. If it's broken we can always easily revert to the previous solution if it's no easy to fix. |
Seems like it is working (as expected): |
Confirmed! |
FYI: I removed the |
This simplifies the setup in #11221: there is no need for a side-car repo for binder (which I realised only recently).
The advantage is that the binder configuration is now in the scikit-learn/scikit-learn repo rather than scikit-learn/binder-examples.
cc @betatim and @choldgraf that may be interested for advanced use cases of binder + sphinx-gallery. I seem to remember matplotlib was one such example because they don't have a repo for the documentation build (they just upload the
_build/html
to a server). The main points are:postBuild
step notebooks are generated from the.py
examples via thesphx_glr_python_to_jupyter.py
sphinx-gallery scriptAn example notebook from my PR branch.
Once the doc build is done, I'll double-check that the binder links work as expected.