[MRG] DOC: add binder links to examples #11221

lesteve · 2018-06-08T08:54:42Z

This is using the binder feature in the recent 0.2 sphinx-gallery release.

Summary of the changes:

conf.py needs to configure sphinx-gallery to create binder links
binder/requirements.txt: requirements for the docker image created on binder
build_tools/circle/push_doc.sh: binder needs the binder repo to be at the root of the repo and the binder folder is in dev/binder.

At the moment the latest scikit-learn release is installed so some examples may not work if they use some features from master.
Maybe a question for @choldgraf: if I add -f git+https://github.com/scikit-learn/scikit-learn.git to requirements.txt will it ensure that the docker image is rebuilt on each push and that I have always scikit-learn master in the docker image?

I have to say I haven't thought too much about how stable doc vs dev doc is going to work yet. For now the binder links are only visible on the dev doc anyway.

ogrisel · 2018-06-08T09:34:30Z

Maybe we should only do it for the stable doc to avoid having to keep the docker image in sync with scikit-learn master.

lesteve · 2018-06-08T09:41:23Z

Something I forgot to mention, this PR needs to be merged before the binder links actually work.

I pushed a test-binder branch in scikit-learn/scikit-learn.github.io with the doc I generated locally. If you want to play with binder a bit you can try links like this:
https://mybinder.org/v2/gh/scikit-learn/scikit-learn.github.io/test-binder?urlpath=lab/tree/notebooks/auto_examples/plot_isotonic_regression.ipynb

glemaitre · 2018-06-08T11:50:11Z

Maybe we should only do it for the stable doc to avoid having to keep the docker image in sync with scikit-learn master.

Maybe an interesting thing is to actually play with new features which are in master.

choldgraf · 2018-06-08T20:06:19Z

woot! just one thing I want to point out: mybinder.org runs on fairly limited hardware, so depending on how much number crunching or memory use happens in the examples, it may be a better/worse experience for people. I'm not sure how much sklearn uses CPU/memory-heavy examples, but just wanted to flag it.

choldgraf · 2018-06-08T20:10:07Z

and @lesteve on your question, yep that'll pull the master branch of sklearn, but if you want to keep up-to-date with updates you'll need to make a push to the docs to update the hash so that repo2docker will re-build things

jnothman · 2018-06-09T11:50:09Z

re -f git+https://github.com/scikit-learn/scikit-learn.git

https://github.com/scikit-learn/scikit-learn/archive/master.zip is always faster for pip... A nightly wheel feed would be even better to work with...

@choldgraf, where does the change to the hash go?

Is there any problem with conf.py dynamically generating the requirements.txt to include the git hash of the current HEAD?

choldgraf · 2018-06-09T15:53:02Z

@jnothman nope that'd be fine, but shouldn't be necessary (if you just specify a direct path to the github repo, it'll build from latest master). In deciding to (re)build a repository, binder will just look for the latest git commit hash on the repository, and if it's different from whatever has been built last, it'll trigger a new build

amueller · 2018-06-09T19:41:04Z

In a sense slightly less open but I'm not sure what the options are to run on google datalab or azure notebooks. Both are free and can run custom notebooks and have probably more capacity than mybinder.

qinhanmin2014 · 2018-06-11T01:22:50Z

+1 for the great PR which enables users to easily try scikit-learn at anytime and anywhere as long as the server can afford such a popular project.

Maybe an interesting thing is to actually play with new features which are in master.

+1. It will be cool to announce a feature (e.g., on twitter) with a link to the example notebook, so that users can try it immediately and provide their feedback. If it's possible, I might prefer two images, one for stable version, and one for dev version.

choldgraf · 2018-06-12T18:06:14Z

re: stable and dev, you could change the configuration in conf.py to accommodate this.

E.g.: here you could pin scikit-learn to a version for non-master documentation, and for the "latest" documentation you could have it build off of master.

lesteve · 2018-06-13T07:10:10Z

re: stable and dev, you could change the configuration in conf.py to accommodate this.

We are using a single repo (https://github.com/scikit-learn/scikit-learn.github.io) for the dev doc and the stable doc. I am not sure how binder can handle two different requirements for a single repo.

choldgraf · 2018-06-13T15:14:47Z

ahh I see - are you just putting the dev docs in a different folder? I thought they'd be built from a tag or something (e.g. what readthedocs does)

lesteve · 2018-06-13T16:17:05Z

ahh I see - are you just putting the dev docs in a different folder?

Yes. Happy to hear how other projects are doing it if there is a better (and more binder-friendly) way. I guess the constraint we have is that we are using github-pages and I think you can only use one branch (master of gh-pages) for the website.

choldgraf · Jun 13, 2018

gotcha - it sounds like the short-term solution is to either choose the latest version or "master" for the Binder links

choldgraf · 2018-06-13T18:47:56Z

my recommendation there would be to figure out which part of the sphinx-gallery docs gets more traffic (probably whatever the latest released version is) and pin the binder links against that

lesteve · 2018-10-09T14:05:21Z

So I gave this another go today, and it looks like the docker image takes a long time to be generated (I have not timed it precisely but my binder tab has been opened with "Repository scikit-learn/scikit-learn.github.io/test-binder is taking longer than usual to load, hang tight!" for ~1 hour now). I remember it was slow to generate the docker image when I worked on this originally, but not that slow I have to admit.

This is very likely related to the fact that our doc repo scikit-learn/scikit-learn.github.io is very big (cloning it with --depth 1 is taking ~4 minutes on my machine). For completeness, I tried to clone without --depth 1 and gave up after 1h30, I was at "Receiving objects: 40%".

Not sure whether we can do something about this in our doc repo. I saw that there was a recent PR in repo2docker jupyterhub/repo2docker#420 to use --depth 1 which may help in some cases.

Another option would to push the generated notebooks into a different repo. Other suggestions more than welcome!

choldgraf · 2018-10-09T15:51:04Z

@lesteve where's the branch you're working from? I didn't see a binder/ folder or requirements.txt or something in here: https://github.com/lesteve/scikit-learn/tree/add-binder-links

A few quick thoughts:

Matplotlib builds its documentation in a separate repository, partially for this reason (MPL core and MPL docs are both quite beefy). Seems to work relatively well, though it definitely adds to the complexity of deployments/updates etc.
Don't forget that the Docker image will only be built one time if you pin your Binder links to point to a specific version/commit/etc. After that, i 8000 t'll use the cached image. The other main thing that affects load times is whether the image exists on the node where a session is being launched. If not, then a Docker Pull happens, which takes some time. However this is also a one-time operation per-node, and given how popular the sklearn docs are, I don't imagine this would be much of an issue long-term.

lesteve · 2018-10-09T18:19:34Z

@choldgraf thanks for the inputs!

I am currently testing with: https://mybinder.org/v2/gh/scikit-learn/scikit-learn.github.io/test-binder?urlpath=lab/tree/notebooks/auto_examples/plot_isotonic_regression.ipynb. In other words I pushed the doc I generated locally (with the binder requirements) in the test-binder branch) in our scikit-learn/scikit-learn.github.io repo: https://github.com/scikit-learn/scikit-learn.github.io/tree/test-binder.

I am not sure I fully got your point 1. but I'll try to look at how matplotlib does it.

About 2. I thought about the caching mechanism but my understanding is that each time there is a new commit in the repo the docker image needs to gets rebuild (which makes complete sense of course). Because we have commits reasonably frequently in scikit-learn (say once a day), and because we use the same repo for the stable doc and the dev doc, I am not sure the user experience is going to be very smooth (as things stand today. The --depth 1 PR that is in repo2docker master may make help as I said) ...

To be perfectly honest, I need to run simple tests with a toy repo to make sure that I understand a bit better the binder behaviour.

choldgraf · 2018-10-09T18:29:12Z

@lesteve to clarify the caching point - you can pin a Binder link so that it is linked against a specific tag (or even commit hash). This will ensure that it isn't re-built every time a new commit is pushed to the branch. E.g. this link

https://mybinder.org/v2/gh/scikit-learn/scikit-learn.github.io/35ea7b?urlpath=lab/tree/notebooks/auto_examples/plot_isotonic_regression.ipynb

will build a Binder for that specific commit (the latest one on your test-binder branch), so it wouldn't be re-built if new commits are pushed to that branch.

So, you could pin the Binder links to the branch, and only update them every now and then when you wanted Binder to reflect the latest feature updates

jnothman · 2018-10-10T03:20:49Z

Another option would to push the generated notebooks into a different

repo. Other suggestions more than welcome! What are the disadvantages of pushing the generated notebooks into a different repo? Does binderhub take advantage of the history at all? Can we instead mirror the repo containing history to one where HEAD is always being amended with the latest version???

betatim · 2018-10-10T05:56:41Z

What are the disadvantages of pushing the generated notebooks into a different repo?

I can't think of one beyond the added maintenance burden. BinderHub doesn't care about the history of the repo, it just wants to use one particular ref (the one explicitly specified or that the branch/tag resolves to)

The --depth 1 PR that is in repo2docker master may make help as I said) ...

Unfortunately not. repo2docker will only use --depth 1 if no explicit ref was specified. BinderHub (the software behind mybinder.org) always resolves branch names to SHAs, so we always specify the ref. We have had an attempt to use a shallow clone (say depth 50 like travis), check if the ref has been cloned and if not unshallow the clone. However we reverted that as it had bugs (jupyterhub/repo2docker#120, jupyterhub/repo2docker#130). If someone wants to make a new attempt that would be great. Maybe now a more modern version of git is part of Ubuntu?

I would go with having a separate repo with generated notebooks in it or take the approach of spacy.io who added a small bit of JS to their docs to make code examples runnable (scroll down a bit). You could use this for the example gallery instead of converting them to notebooks. It degrades well when the user has no JS or mybinder.org is down.

betatim · 2018-11-16T07:20:54Z

Below the steps that I think need to happen to get this to work:

add support for repo2docker https://github.com/org/repo/archive/master.zip. This would solve the problem that currently cloning the docs repo takes too long. I think how to do it is clear, but it needs someone to actually do it. I'd be happy to help someone implement this.
add support to BinderHub (software serving mybinder.org) so that it supports the new "from zip file" feature of repo2docker. This might require a bit of discussion first to figure out what the best way of doing this is. In particular how we can support the current behaviour of only building each SHA-1 once per repo. This is something I'd like to see in BinderHub but low priority for me for the time being. I will help if someone else champions this.

A completely different way of doing would be something like what I suggested in sphinx-gallery/sphinx-gallery#419 (comment) If I had a large example gallery I think I'd take this approach. The hard part here is figuring out how exactly to make the small "side car" repository that gets launched on mybinder.org. You'd want to update this side car repo every time you want to have your examples use a new version of scikit-learn. I'd create a repository that contains a requirements.txt like the following on the master branch:

numpy
matplotlib
https://github.com/scikit-learn/scikit-learn/archive/<sha-1-of-master>.zip

and have a process that updates the SHA-1 every time a PR is merged. On branches for each release you can have:

numpy
matplotlib
https://github.com/scikit-learn/scikit-learn/archive/<name-of-the-release-tag>.zip

This way the docs for a release can use the corresponding branch of the side car repo and the dev build of the docs can use master.

lesteve · 2019-07-29T08:27:54Z

Just a FYI, after some discussion with @ogrisel and @glemaitre, here is what a simple "side car" repo would yout like: https://github.com/lesteve/test-binder-scikit-learn.

git submodule for the scikit-learn repo (binder automatically do the git submodule update --init)
notebooks get created from the python examples in the postBuild binder step (sphinx-gallery has a script for the python to notebook conversion).

For now this is for the stable documentation (environment.yml and scikit-learn git submodule). I need to think a little bit more how dev vs stable would work and how the "sidecar repo" would be updated in the doc build process on CircleCI. Any comments more than welcome!

adrinjalali · 2019-07-30T16:33:52Z

I was just thinking about this and was reminded of this PR. Even for the stable website this would be awesome to have.

lesteve · 2019-08-07T18:21:43Z

@thomasjpfan there is a 0.21.X branch in scikit-learn/binder-examples that is tracking the 0.21.X branch of scikit-learn/scikit-learn so, unless I am missing something, what you are proposing is already implemented.

thomasjpfan · 2019-08-07T18:29:46Z

@lesteve

I think binder does not know that `0.21.X` is unchanged everytime the `binder-examples` is updated nightly. It will always try to rebuild the initial docker image because of the daily `update submodule to commit` from travis ci.
On the binder website under "how it works 2." it states:

If a new commit has been made, the image will automatically be rebuilt.

EDIT: Okay I see that 0.21.X is not being updated.

lesteve · 2019-08-07T19:12:17Z

Sorry if this was not clear enough, there is no daily update of scikit-learn/binder-examples. There is a daily cron job that tries to update the scikit-learn submodule, if no update has happened on the scikit-learn/scikit-learn tracked branch, there is no commit on scikit-learn/binder-examples branch.

Look for example at the commit history of scikit-learn/binder-examples 0.21.X, it has not been updated for a few days (the submodule has not been updated since the initial commit actually):
https://github.com/scikit-learn/binder-examples/commits/0.21.X

Similar thing with scikit-learn/binder-examples master, it has not been updated on August 4 :
https://github.com/scikit-learn/binder-examples/commits/master

thomasjpfan · 2019-08-07T19:34:33Z

@lesteve The 0.21.X makes sense. 0.21.x does not update unless we are doing a release. I hope this means, binder does not rebuild the image that often.

For scikit-learn/binder-examples master, https://github.com/scikit-learn/binder-examples/commits/master it looks like its being updated everyday. Binder would most likely need to rebuild the master version quite frequently, which is okay.

lesteve · 2019-08-07T20:59:12Z

@lesteve The 0.21.X makes sense. 0.21.x does not update unless we are doing a release. I hope this means, binder does not rebuild the image that often.

Yes it does. From Tim's message above #11221 (comment):

So for the released version of scikit-learn you'd not expect that 99.9% of users wouldn't need to wait for a build. They'd get a prebuilt image. For master builds would happen more frequently because the commit that master points to changes.

betatim · 2019-08-07T21:55:48Z

When presented with a tag or branch name mybinder.org will resolve it to a SHA1, then check its cache if there is already an image for that SHA1 (and repository). If there is an image, this is what gets launched. Otherwise the image gets built and stored in the cache.

Some situations where there won't be an image in the cache:

the commit is new, it has never been built
- because we operate two clusters with two separate caches we need to build the image twice, once for each cache. This means there will be two users who need to wait for a build. (We are working on improving this)
we empty our cache on purpose (big changes or security vulnerability etc)

thomasjpfan · 2019-08-07T22:39:38Z

So for the released version of scikit-learn you'd not expect that 99.9% of users wouldn't need to wait for a build.

This statement seems to contradict itself. By removing the double negative, this states: "So for the released version of scikit-learn you'd expect that 99.9% of the users would need to wait for a build". This statement conflicts with "They'd get a prebuilt image."

@betatim Sorry if this question has already been asked. I want to verify that, for a given branch, if the commit is usually not new (like in our 0.20.X branch), then the image would most likely have been built and users would not need to wait.

betatim · 2019-08-08T05:38:39Z

You are right, that sentence with double negative is terrible and incorrect!

I confirm that if the commit doesn't/rarely changes the most likely thing to happen is that users don't have to wait for a build.

thomasjpfan

LGTM

thomasjpfan · 2019-08-09T18:18:00Z

doc/conf.py

+if 'dev' in version:
+    binder_branch = 'master'
+else:
+    match = re.match(r'^(\d+)\.(\d+)(?:\.\d)?$', version)


Passive groups... awesome. :)

To be sure:

Suggested change

match = re.match(r'^(\d+)\.(\d+)(?:\.\d)?$', version)

match = re.match(r'^(\d+)\.(\d+)(?:\.\d+)?$', version)

thomasjpfan · 2019-08-09T18:19:43Z

@betatim Thank you for your patience and your thoughtful explanations! :)

jnothman

I'm happy to merge this and see how it goes. I don't think it implies much risk.

jnothman · 2019-08-12T08:28:22Z

doc/conf.py

+if 'dev' in version:
+    binder_branch = 'master'
+else:
+    match = re.match(r'^(\d+)\.(\d+)(?:\.\d)?$', version)


To be sure:

Suggested change

match = re.match(r'^(\d+)\.(\d+)(?:\.\d)?$', version)

match = re.match(r'^(\d+)\.(\d+)(?:\.\d+)?$', version)

lesteve · 2019-08-19T04:20:59Z

I'm happy to merge this and see how it goes. I don't think it implies much risk.

Glad to hear this!

I pused a commit with your recommended change (and a minor change in the error message)

jnothman · 2019-08-19T05:32:44Z

Let's give it a whirl

lesteve · 2019-08-19T06:31:00Z

Great to see this merged. The dev doc has been uploaded, everything seems to work: there is a binder link and the binder works.

Here is an example.

Side-comment for @betatim (not crucial at all): the nbviewer preview does a 404:

betatim · 2019-08-19T08:21:56Z

Nods, we have "don't show the preview if it would show an error" on our to do list. As well as trying to get better at guessing the correct URL to construct so that the preview works :)

If there is any problem I'll open an issue ok? Or if you have a problem can you either open an issue on https://github.com/jupyterhub/mybinder.org-deploy/ or @jupyterhub/mybinder-org-operators so we get a notification?

lesteve · 2019-08-19T08:43:02Z

Great if you have this on your todo list already. Not showing the preview would be completely fine for us I think.

By the way, thinking more about this it completely makes sense that the preview does not work:

we are pointing to a specific .ipynb by using a urlpath in the binder URL
the .ipynb does not exist in the binder-examples repo: the .ipynb file is generated in postBuild from the .py file

The reason I never noticed this very minor issue before: when I was testing this PR, I was always using a binder URL without urlpath ...

ogrisel · 2019-08-19T09:58:45Z

Indeed it would be great to have a configuration file in the binder-examples repo to tell binder to not display the nbviewer based preview on that repo (or to automatically disable the preview if the ipynb file is missing).

Edit: Github was not updated with the past two replies when I first wrote this comment.

lesteve · 2019-08-20T15:02:20Z

Thanks to @betatim, there is no preview with 404 any more (mentioned in #11221 (comment))!

For example, see this binder link.

For more details about the binderhub fix: jupyterhub/binderhub#934.

lesteve force-pushed the add-binder-links branch from 1b7d223 to b089c93 Compare June 8, 2018 09:28

TomDLT added the Documentation label Jun 21, 2018

lesteve mentioned this pull request Oct 9, 2018

Scikit learn examples as LIVE Jupyter notebooks #12075

Closed

lesteve force-pushed the add-binder-links branch from b089c93 to afa4de0 Compare July 29, 2019 13:47

lesteve force-pushed the add-binder-links branch 2 times, most recently from fed0546 to 9f5ab10 Compare July 30, 2019 17:45

betatim mentioned this pull request Aug 7, 2019

adding help text loop to loading page jupyterhub/binderhub#917

Merged

thomasjpfan approved these changes Aug 9, 2019

View reviewed changes

jnothman reviewed Aug 12, 2019

View reviewed changes

Tweaks in the version regex + error message

11f711e

jnothman merged commit f0faaee into scikit-learn:master Aug 19, 2019

lesteve deleted the add-binder-links branch August 19, 2019 06:10

betatim mentioned this pull request Aug 19, 2019

404 page shown during build when pointing to a repo jupyterhub/binderhub#922

Closed

lesteve mentioned this pull request Aug 20, 2019

Allow for disabling of nbviewer jupyterhub/binderhub#816

Closed

lesteve mentioned this pull request Aug 22, 2019

[MRG] Move binder setup to scikit-learn/scikit-learn repo #14719

Merged

lesteve added a commit to lesteve/scikit-learn that referenced this pull request Aug 28, 2019

DOC: add binder links to examples (scikit-learn#11221)

b0af586

lesteve mentioned this pull request Aug 28, 2019

CI/DOC Use sphinx-gallery 0.3.1 #14833

Closed

adrinjalali mentioned this pull request Nov 7, 2019

MNT use x.y.z (no rc/beta/etc) for version in sphinx config #15566

Merged

This was referenced Jan 6, 2020

Add Binder links to examples nilearn/nilearn#2284

Closed

[WIP, ENH] Add Binder links to examples nilearn/nilearn#2285

Closed

	match = re.match(r'^(\d+)\.(\d+)(?:\.\d)?$', version)
	match = re.match(r'^(\d+)\.(\d+)(?:\.\d+)?$', version)

Uh oh!

[MRG] DOC: add binder links to examples #11221

[MRG] DOC: add binder links to examples #11221

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!