10000 [MRG] Move binder setup to scikit-learn/scikit-learn repo by lesteve · Pull Request #14719 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG] Move binder setup to scikit-learn/scikit-learn repo #14719

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lesteve
Copy link
Member
@lesteve lesteve commented Aug 22, 2019

This simplifies the setup in #11221: there is no need for a side-car repo for binder (which I realised only recently).

The advantage is that the binder configuration is now in the scikit-learn/scikit-learn repo rather than scikit-learn/binder-examples.

  • this increases the bus factor: I am not the only person to know about how the binder setup works
  • there is no need for the Travis cron job on the sidecar repo to track scikit-learn branches

cc @betatim and @choldgraf that may be interested for advanced use cases of binder + sphinx-gallery. I seem to remember matplotlib was one such example because they don't have a repo for the documentation build (they just upload the _build/html to a server). The main points are:

  • the notebooks live in the docker image rather than in the git repo
  • binder is set-up binder on the source repo (in our case scikit-learn/scikit-learn) rather than on the documentation build repo (in our case scikit-learn/scikit-learn.github.io)
  • during the postBuild step notebooks are generated from the .py examples via the sphx_glr_python_to_jupyter.py sphinx-gallery script
  • generated notebooks needs to be in the right path in the docker image to match the links created by sphinx-gallery in the example HTML

An example notebook from my PR branch.

Once the doc build is done, I'll double-check that the binder links work as expected.

@lesteve lesteve changed the title Move binder setup to scikit-learn/scikit-learn repo [MRG] Move binder setup to scikit-learn/scikit-learn repo Aug 22, 2019
@lesteve lesteve force-pushed the binder-setup-in-scikit-learn-repo branch from d35a7a5 to 2547214 Compare August 22, 2019 13:44
@choldgraf
Copy link
Contributor

I am slightly confused but this sounds very impressive to me nonetheless :-)

I wonder if you'd be interested in writing up a short post just describing this build system?

@lesteve
Copy link
Member Author
lesteve commented Aug 22, 2019

I am slightly confused but this sounds very impressive to me nonetheless :-)

I think the main take-home message is: notebooks don't need to be in the git repo but in the binder docker image. This may well be known (and used) already ... IMO this was the main reason #11221 stalled.

@lesteve
Copy link
Member Author
lesteve commented Aug 22, 2019

Once the doc build is done, I'll double-check that the binder links work as expected.

The link looks good (the binder URL uses scikit-learn/scikit-learn). Note that the binder link is failing at build stage. This is expected because there is no .binder folder on master right now. The binder link will work when this PR is merged.

Binder link looks good

In this scikit-learn example, the link looks good (the main thing to check is that it is pointing to scikit-learn/scikit-learn):
https://mybinder.org/v2/gh/scikit-learn/scikit-learn/master?urlpath=lab/tree/notebooks/auto_examples/plot_isotonic_regression.ipynb

Binder build failure details

It looks like the Binder sees the setup.py (this is the only thing it finds to decides what to install as dependencies). The Binder build fails trying to do pip install . since cython is not installed.

@betatim
Copy link
Member
betatim commented Aug 22, 2019

Binder build failure details

For me the link you posted works. So there is no build failure any more? With a requirements.txt in .binder/ the setup.py should be ignored.

I like this because it is much simpler than having two repos. One drawback could be that the image that is produced is much larger because it contains the git history of the scikit-learn repository but you take care of this by deleting the actual repo. So I think that isn't a problem.

How will released versions work? It would be cool if https://mybinder.org/v2/gh/scikit-learn/scikit-learn/<releasedversiontag> would take you to the state of the examples that corresponds to the released version. repo2docker will checkout the right commit for you but as I understand the requirements.txt it will install the latest nightly of scikit-learn at build time, not the latest version at "release time". Is that right?

One "easy" (but not super nice) way out is to actually build scikit-learn from the checked out repo when building the docker image. For this we'd need cython (and other build time deps). Builds would take longer and the resulting docker image might also be bigger.

Another way that I think would be nicer but not sure if it is feasible is to not include scikit-learn in the requirements.txt but pip install scikit-learn==<theversionwewant> in postBuild. You'd have to do some git acrobatics to figure out if the checked out version is a tagged version for a release -> run pip install scikit-learn==<theversion> and if it isn't then try and get the nightly for the revision that has been checked out.

Maybe there is a third way?

scikit-image
pandas
sphinx-gallery
scikit-learn>=0.22.dev0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my education what does --find-links https://sklearn-nightly.scdn8.secure.raxcdn.com scikit-learn and scikit-learn>=0.22.dev0 do? Is the result that you always install the latest nightly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Maybe using the --pre flag and the --find-links URL flag would be enough (instead of putting the version constraint for scikit-learn).

https://scikit-learn.org/stable/developers/advanced_installation.html#installing-nightly-builds

@lesteve
Copy link
Member Author
lesteve commented Aug 22, 2019

For me the link you posted works. So there is no build failure any more? With a requirements.txt in .binder/ the setup.py should be ignored.

Sorry I should have been more precise: everything works fine as far as binder is concerned (the binder link you mention is using my PR branch and works as you noticed). My comment was more a warning for reviewers of this PR that if they go to the documentation build of this PR the binder links won't work (but that this is completely expected)

How will released versions work? It would be cool if https://mybinder.org/v2/gh/scikit-learn/scikit-learn/ would take you to the state of the examples that corresponds to the released version

I have this on my radar once things stabilize a bit. For scikit-learn, we have a branch for each stable release plus its bug-fix releases, e.g. 0.21.X branch was used for 0.21, 0.21.1, etc ... by cherry-picking the commits related to binder into 0.21.X and changing requirements.txt to have scikit-learn==0.21.*, this would work. In other words, https://mybinder.org/v2/gh/scikit-learn/scikit-learn/0.21.X would use scikit-learn==0.21.* and have the notebook examples from the latest 0.21.X release.

One "easy" (but not super nice) way out is to actually build scikit-learn from the checked out repo when building the docker image.

I was doing this at one point in #11221 but to make the build faster, I switch to using scikit-learn nightly development wheels. These development wheels are on https://sklearn-nightly.scdn8.secure.raxcdn.com, hence the --find-links in the requirement.txt.

@lesteve 10000
Copy link
Member Author
lesteve commented Aug 26, 2019

Any comments on this one?

I think this is a low risk PR to merge, since this will only affect the dev doc. I am reasonably certain that everything will work as expected once the PR is merged in master. If there are any problems, I'll try to fix them at short notice.

Having said that, any feed-back on the postBuild script would be more than welcome! The postBuild script converts scikit-learn .py examples to notebooks. I tried to comment the code, but let me know if some things can be clarified.

TMP_CONTENT_DIR=/tmp/scikit-learn
mkdir -p $TMP_CONTENT_DIR
cp -r examples .binder $TMP_CONTENT_DIR
rm -rf * .*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That one is scary even if it's in a postBuild script. I can always imagine something running this file by accident. Is there a way to be more specific about the wildcard?

Copy link
Member Author
@lesteve lesteve Aug 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rth I pushed a commit to avoid running outside of repo2docker context. That should avoid people running postBuild locally and deleting all their files in their current directory.

@betatim: to figure out whether I am in a repo2docker context I am using the following lines below, let me know if there is a more robust way to do that:

if [[ -z "$NB_USER" ]] || [[ "$USER" != "$NB_USER" ]]; then
    echo "This script is supposed to run in a repo2docker context."
    echo "Exiting because this script can remove files if run locally.\n"
    exit 1
fi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I don't think we currently have a good way for scripts during build or run time to detect that they are inside a repo2docker run.

I'd probably check the REPO_DIR environment variable to know "I am in repo2docker land" as well as /.dockerenv which is a docker specific thing to tell if you are inside a container or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your input @betatim. I think it's fine to only check whether we are inside a docker container actually so I went for checking /.dockerenv.

More details:

  • I googled a bit and it looks like /.dockerenv may not be reliable in the long-term. For example this seems to advise against relying on /.dockerenv. Having said that, ./dockerenv is still used (at the time of writing) in docker/libnetwork see this and this comment was from 3 years ago, so maybe that's robust enough for our purposes.
  • Another way is to check if we are running inside a docker container is to check if docker is in /proc/1/cgroup e.g. a small adaptation of this:
INSIDE_DOCKER=$(grep -q docker /proc/1/cgroup && echo "yes" || echo "no")
  • Overall /.dockerenv feels simpler and robust enough for our purposes.

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. There is only one way to know if it really works though :)

@ogrisel ogrisel merged commit 9ef4700 into scikit-learn:master Aug 28, 2019
@ogrisel
Copy link
Member
ogrisel commented Aug 28, 2019

Merged. If it's broken we can always easily revert to the previous solution if it's no easy to fix.

@lesteve lesteve deleted the binder-setup-in-scikit-learn-repo branch August 28, 2019 11:42
@lesteve
Copy link
Member Author
lesteve commented Aug 28, 2019

Seems like it is working (as expected):
https://mybinder.org/v2/gh/scikit-learn/scikit-learn/master?urlpath=lab/tree/notebooks/auto_examples/plot_isotonic_regression.ipynb

@ogrisel
Copy link
Member
ogrisel commented Aug 28, 2019

Confirmed!

@lesteve
Copy link
Member Author
lesteve commented Aug 28, 2019

FYI: I removed the scikit-learn/binder-examples repo which was not useful any more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0