-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG] DOC: add binder links to examples #11221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Maybe we should only do it for the stable doc to avoid having to keep the docker image in sync with scikit-learn master. |
Something I forgot to mention, this PR needs to be merged before the binder links actually work. I pushed a test-binder branch in scikit-learn/scikit-learn.github.io with the doc I generated locally. If you want to play with binder a bit you can try links like this: |
Maybe an interesting thing is to actually play with new features which are in master. |
woot! just one thing I want to point out: mybinder.org runs on fairly limited hardware, so depending on how much number crunching or memory use happens in the examples, it may be a better/worse experience for people. I'm not sure how much sklearn uses CPU/memory-heavy examples, but just wanted to flag it. |
and @lesteve on your question, yep that'll pull the master branch of sklearn, but if you want to keep up-to-date with updates you'll need to make a push to the docs to update the hash so that repo2docker will re-build things |
re
@choldgraf, where does the change to the hash go? Is there any problem with conf.py dynamically generating the requirements.txt to include the git hash of the current HEAD? |
@jnothman nope that'd be fine, but shouldn't be necessary (if you just specify a direct path to the github repo, it'll build from latest master). In deciding to (re)build a repository, binder will just look for the latest git commit hash on the repository, and if it's different from whatever has been built last, it'll trigger a new build |
In a sense slightly less open but I'm not sure what the options are to run on google datalab or azure notebooks. Both are free and can run custom notebooks and have probably more capacity than mybinder. |
+1 for the great PR which enables users to easily try scikit-learn at anytime and anywhere as long as the server can afford such a popular project.
+1. It will be cool to announce a feature (e.g., on twitter) with a link to the example notebook, so that users can try it immediately and provide their feedback. If it's possible, I might prefer two images, one for stable version, and one for dev version. |
re: stable and dev, you could change the configuration in E.g.: here you could pin scikit-learn to a version for non-master documentation, and for the "latest" documentation you could have it build off of master. |
We are using a single repo (https://github.com/scikit-learn/scikit-learn.github.io) for the dev doc and the stable doc. I am not sure how binder can handle two different requirements for a single repo. |
ahh I see - are you just putting the dev docs in a different folder? I thought they'd be built from a tag or something (e.g. what readthedocs does) |
Yes. Happy to hear how other projects are doing it if there is a better (and more binder-friendly) way. I guess the constraint we have is that we are using github-pages and I think you can only use one branch (master of gh-pages) for the website. |
gotcha - it sounds like the short-term solution is to either choose the latest version or "master" for the Binder links |
my recommendation there would be to figure out which part of the sphinx-gallery docs gets more traffic (probably whatever the latest released version is) and pin the binder links against that |
So I gave this another go today, and it looks like the docker image takes a long time to be generated (I have not timed it precisely but my binder tab has been opened with "Repository scikit-learn/scikit-learn.github.io/test-binder is taking longer than usual to load, hang tight!" for ~1 hour now). I remember it was slow to generate the docker image when I worked on this originally, but not that slow I have to admit. This is very likely related to the fact that our doc repo scikit-learn/scikit-learn.github.io is very big (cloning it with Not sure whether we can do something about this in our doc repo. I saw that there was a recent PR in repo2docker jupyterhub/repo2docker#420 to use Another option would to push the generated notebooks into a different repo. Other suggestions more than welcome! |
@lesteve where's the branch you're working from? I didn't see a A few quick thoughts:
|
@choldgraf thanks for the inputs! I am currently testing with: https://mybinder.org/v2/gh/scikit-learn/scikit-learn.github.io/test-binder?urlpath=lab/tree/notebooks/auto_examples/plot_isotonic_regression.ipynb. In other words I pushed the doc I generated locally (with the binder requirements) in the test-binder branch) in our scikit-learn/scikit-learn.github.io repo: https://github.com/scikit-learn/scikit-learn.github.io/tree/test-binder. I am not sure I fully got your point 1. but I'll try to look at how matplotlib does it. About 2. I thought about the caching mechanism but my understanding is that each time there is a new commit in the repo the docker image needs to gets rebuild (which makes complete sense of course). Because we have commits reasonably frequently in scikit-learn (say once a day), and because we use the same repo for the stable doc and the dev doc, I am not sure the user experience is going to be very smooth (as things stand today. The To be perfectly honest, I need to run simple tests with a toy repo to make sure that I understand a bit better the binder behaviour. |
@lesteve to clarify the caching point - you can pin a Binder link so that it is linked against a specific tag (or even commit hash). This will ensure that it isn't re-built every time a new commit is pushed to the branch. E.g. this link will build a Binder for that specific commit (the latest one on your So, you could pin the Binder links to the branch, and only update them every now and then when you wanted Binder to reflect the latest feature updates |
Another option would to push the generated notebooks into a different
repo. Other suggestions more than welcome!
What are the disadvantages of pushing the generated notebooks into a
different repo?
Does binderhub take advantage of the history at all? Can we instead mirror
the repo containing history to one where HEAD is always being amended with
the latest version???
|
I can't think of one beyond the added maintenance burden. BinderHub doesn't care about the history of the repo, it just wants to use one particular ref (the one explicitly specified or that the branch/tag resolves to)
Unfortunately not. repo2docker will only use I would go with having a separate repo with generated notebooks in it or take the approach of spacy.io who added a small bit of JS to their docs to make code examples runnable (scroll down a bit). You could use this for the example gallery instead of converting them to notebooks. It degrades well when the user has no JS or mybinder.org is down. |
Below the steps that I think need to happen to get this to work:
A completely different way of doing would be something like what I suggested in sphinx-gallery/sphinx-gallery#419 (comment) If I had a large example gallery I think I'd take this approach. The hard part here is figuring out how exactly to make the small "side car" repository that gets launched on mybinder.org. You'd want to update this side car repo every time you want to have your examples use a new version of scikit-learn. I'd create a repository that contains a
and have a process that updates the SHA-1 every time a PR is merged. On branches for each release you can have:
This way the docs for a release can use the corresponding branch of the side car repo and the dev build of the docs can use |
Just a FYI, after some discussion with @ogrisel and @glemaitre, here is what a simple "side car" repo would yout like: https://github.com/lesteve/test-binder-scikit-learn.
For now this is for the stable documentation ( |
b089c93
to
afa4de0
Compare
I was just thinking about this and was reminded of this PR. Even for the stable website this would be awesome to have. |
fed0546
to
9f5ab10
Compare
@thomasjpfan there is a 0.21.X branch in |
On the binder website under "how it works 2." it states:
EDIT: Okay I see that |
Sorry if this was not clear enough, there is no daily update of Look for example at the commit history of Similar thing with scikit-learn/binder-examples master, it has not been updated on August 4 : |
@lesteve The For scikit-learn/binder-examples master, https://github.com/scikit-learn/binder-examples/commits/master it looks like its being updated everyday. Binder would most likely need to rebuild the master version quite frequently, which is okay. |
Yes it does. From Tim's message above #11221 (comment):
|
When presented with a tag or branch name mybinder.org will resolve it to a SHA1, then check its cache if there is already an image for that SHA1 (and repository). If there is an image, this is what gets launched. Otherwise the image gets built and stored in the cache. Some situations where there won't be an image in the cache:
|
This statement seems to contradict itself. By removing the double negative, this states: "So for the released version of scikit-learn you'd expect that 99.9% of the users would need to wait for a build". This statement conflicts with "They'd get a prebuilt image." @betatim Sorry if this question has already been asked. I want to verify that, for a given branch, if the commit is usually not new (like in our |
You are right, that sentence with double negative is terrible and incorrect! I confirm that if the commit doesn't/rarely changes the most likely thing to happen is that users don't have to wait for a build. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
doc/conf.py
Outdated
if 'dev' in version: | ||
binder_branch = 'master' | ||
else: | ||
match = re.match(r'^(\d+)\.(\d+)(?:\.\d)?$', version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passive groups... awesome. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be sure:
match = re.match(r'^(\d+)\.(\d+)(?:\.\d)?$', version) | |
match = re.match(r'^(\d+)\.(\d+)(?:\.\d+)?$', version) |
@betatim Thank you for your patience and your thoughtful explanations! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to merge this and see how it goes. I don't think it implies much risk.
doc/conf.py
Outdated
if 'dev' in version: | ||
binder_branch = 'master' | ||
else: | ||
match = re.match(r'^(\d+)\.(\d+)(?:\.\d)?$', version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be sure:
match = re.match(r'^(\d+)\.(\d+)(?:\.\d)?$', version) | |
match = re.match(r'^(\d+)\.(\d+)(?:\.\d+)?$', version) |
Glad to hear this! I pused a commit with your recommended change (and a minor change in the error message) |
Let's give it a whirl |
Nods, we have "don't show the preview if it would show an error" on our to do list. As well as trying to get better at guessing the correct URL to construct so that the preview works :) If there is any problem I'll open an issue ok? Or if you have a problem can you either open an issue on https://github.com/jupyterhub/mybinder.org-deploy/ or |
Great if you have this on your todo list already. Not showing the preview would be completely fine for us I think. By the way, thinking more about this it completely makes sense that the preview does not work:
The reason I never noticed this very minor issue before: when I was testing this PR, I was always using a binder URL without |
Indeed it would be great to have a configuration file in the Edit: Github was not updated with the past two replies when I first wrote this comment. |
Thanks to @betatim, there is no preview with 404 any more (mentioned in #11221 (comment))! For example, see this binder link. For more details about the binderhub fix: jupyterhub/binderhub#934. |
This is using the binder feature in the recent 0.2 sphinx-gallery release.
Summary of the changes:
binder
repo to be at the root of the repo and thebinder
folder is indev/binder
.At the moment the latest scikit-learn release is installed so some examples may not work if they use some features from master.
Maybe a question for @choldgraf: if I add
-f git+https://github.com/scikit-learn/scikit-learn.git
torequirements.txt
will it ensure that the docker image is rebuilt on each push and that I have always scikit-learn master in the docker image?I have to say I haven't thought too much about how stable doc vs dev doc is going to work yet. For now the binder links are only visible on the dev doc anyway.