Scikit learn examples as LIVE Jupyter notebooks by mwouts · Pull Request #12075 · scikit-learn/scikit-learn · GitHub

Scikit learn examples as LIVE Jupyter notebooks #12075


Closed
wants to merge 4 commits into from


@mwouts
mwouts commented Sep 14, 2018

The binder project and Jupytext can turn the beautiful scikit-learn example library into a collection of interactive notebooks.

See the proposed change on the README, and try it now: Binder.
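
As a rough illustration of the script-to-notebook conversion this PR relies on, here is a toy sketch using only the Python standard library. It is not Jupytext's implementation: it only understands the `# %%` cell markers of Jupytext's percent format, and the function name `percent_script_to_notebook` is made up for this example. The scikit-learn examples are written in the sphinx-gallery style, which would need different parsing.

```python
import json


def _make_cell(cell_type, lines):
    """Build one notebook cell dict from the raw script lines."""
    # Drop trailing blank lines so cells do not end with empty space.
    while lines and not lines[-1].strip():
        lines.pop()
    if cell_type == "markdown":
        # Markdown text is stored as comments in the script: strip "# ".
        lines = [l[2:] if l.startswith("# ") else l.lstrip("#") for l in lines]
        return {"cell_type": "markdown", "metadata": {}, "source": "\n".join(lines)}
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,
        "outputs": [],
        "source": "\n".join(lines),
    }


def percent_script_to_notebook(source):
    """Toy converter (hypothetical helper): '# %%' script -> notebook dict."""
    cells = []
    cell_type, lines = "code", []
    for line in source.splitlines():
        if line.startswith("# %%"):
            # A marker closes the current cell and opens a new one.
            if any(l.strip() for l in lines):
                cells.append(_make_cell(cell_type, lines))
            cell_type = "markdown" if "[markdown]" in line else "code"
            lines = []
        else:
            lines.append(line)
    if any(l.strip() for l in lines):
        cells.append(_make_cell(cell_type, lines))
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 2}


if __name__ == "__main__":
    demo = "# %% [markdown]\n# # Title\n# Some text\n\n# %%\nx = 1\nprint(x)\n"
    print(json.dumps(percent_script_to_notebook(demo), indent=1))
```

In practice one would let Jupytext do this conversion; the sketch only shows why a plain `.py` example can be opened as a notebook without storing any `.ipynb` file in the repository.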


@GaelVaroquaux
Member

There is an option in recent versions of sphinx-gallery to do this. We already use sphinx-gallery in scikit-learn, so maybe it would be good to use this option?

PS: we should explore the interplay between sphinx-gallery and Jupytext, but that discussion should happen on the sphinx-gallery tracker.

@lesteve
Member
lesteve commented Oct 9, 2018

There is an option in recent versions of sphinx-gallery to do this. We already use sphinx-gallery in scikit-learn, so maybe it would be good to use this option?

There is #11221 to do just that. Now that 0.20 has been released, maybe it is a good time to review it.

The main thing to decide is whether we want binder to use scikit-learn stable (0.20 right now) or from master.

@mwouts
Author
mwouts commented Oct 9, 2018

Hello @lesteve , good to see you here!

I am not sure I completely got the point about the scikit-learn version in binder. Do you mean the sphinx gallery binder is able to run an unpublished version of scikit-learn (is that what you mean by master)?

Second question, do you have any idea why the jupytext binder starts faster than the sphinx gallery one? Is that related to having the jupytext binder using pypi's version of scikit-learn?

@lesteve
Member
lesteve commented Oct 9, 2018

I am not sure I completely got the point about the scikit-learn version in binder. Do you mean the sphinx gallery binder is able to run an unpublished version of scikit-learn (is that what you mean by master)?

master is the development version yes.

Second question, do you have any idea why the jupytext binder starts faster than the sphinx gallery one? Is that related to having the jupytext binder using pypi's version of scikit-learn?

I suspect this is because of the caching mechanism in binder. The first time is quite slow because you need to build the docker image but next time the docker image is found and it is a lot faster.

Having said that, I think our scikit-learn.github.io repo is very big and this may make it a problem to use for binder. I'll put a few more details in #11221 to try to keep the conversation in a single place.
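
The caching behaviour suspected above (slow first launch, fast later launches) can be sketched as a toy model. Everything here is an assumption made for illustration: the class name `BuildCache`, the choice of `(repo, ref)` as the cache key, and the fake image tags. It does not reproduce how Binder actually stores or looks up images.

```python
class BuildCache:
    """Toy model of 'build once, reuse afterwards' (hypothetical, not Binder code)."""

    def __init__(self):
        self._images = {}  # maps (repo, ref) -> image tag
        self.builds = 0    # counts how many slow builds were needed

    def get_image(self, repo, ref):
        key = (repo, ref)
        if key not in self._images:
            # Cache miss: this stands in for the slow Docker image build.
            self.builds += 1
            self._images[key] = "image-{}".format(self.builds)
        # Cache hit (or freshly built): return the stored tag.
        return self._images[key]


if __name__ == "__main__":
    cache = BuildCache()
    cache.get_image("example/repo", "abc123")  # slow path: builds
    cache.get_image("example/repo", "abc123")  # fast path: reuses the image
    print(cache.builds)
```

Under this model a new commit (a new `ref`) triggers a fresh build, which would be consistent with a first launch after a push being slow again.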

@jnothman
Member
jnothman commented Oct 10, 2018

Having said that, I think our scikit-learn.github.io repo is very big and this may make it a problem to use for binder.

I haven't really understood the binderhub infrastructure. Does it fetch the repo through repo2docker via Build.get_cmd? It seems like that specifies a --ref to repo2docker which might cause it to do an unlimited depth clone which should be very slow for scikit-learn.github.io. I don't understand that "if no ref: limit the depth" logic.

I also don't know if a partial checkout could be applicable here. We should not need to check out anything but dev/_downloads or 0.20/_downloads, should we?
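
One way the partial checkout idea could look with plain git is sketched below. This is only an assumption about what is possible on the git side: it uses `git clone --sparse`, `--filter=blob:none` and `git sparse-checkout set`, which need a reasonably recent git, and nothing here implies that BinderHub or repo2docker supports it. The helper name `sparse_checkout_commands` and the repository URL in the usage example are illustrative.

```python
def sparse_checkout_commands(repo_url, subdir, dest="repo"):
    """Return git commands for a shallow, sparse checkout of one directory.

    Hypothetical helper: it only builds the argument lists and does not run them.
    """
    return [
        # Shallow clone without file contents, with only top-level files checked out.
        ["git", "clone", "--depth", "1", "--filter=blob:none", "--sparse",
         repo_url, dest],
        # Restrict the working tree to the directory we actually need.
        ["git", "-C", dest, "sparse-checkout", "set", subdir],
    ]


if __name__ == "__main__":
    # Assumed URL for the scikit-learn.github.io repository.
    url = "https://github.com/scikit-learn/scikit-learn.github.io"
    for cmd in sparse_checkout_commands(url, "dev/_downloads"):
        print(" ".join(cmd))
```

The commands could be run with `subprocess.run(cmd, check=True)`; they are returned as lists here so the sketch can be inspected without touching the network.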

@jnothman jnothman closed this Oct 10, 2018
@jnothman jnothman reopened this Oct 10, 2018
@jnothman
Member

In fact, it's a bit weird that it relies on repos at all, if it does not make use of history. All it should really need is something equivalent to read-only FTP! But if a feature request to binderhub or repo2docker is appropriate, I'd appreciate help proposing it precisely.

@GaelVaroquaux
Member

Ping @choldgraf

@betatim
Member
betatim commented Oct 10, 2018

A few of Joel's questions are answered in #11221 (comment)

I don't understand that "if no ref: limit the depth" logic.

If you run repo2docker https://github.com/norvig/pytudes/ we will do a depth one clone. However if you specify a particular SHA to use (which BinderHub always does) via repo2docker --ref 12345 https://github.com/norvig/pytudes/ we will do an unlimited depth clone.

In fact, it's a bit weird that it relies on repos at all, if it does not make use of history. All it should really need is something equivalent to read-only FTP!

Correct. Being able to use sources other than git repositories has been on our wish list for a while. Progress is slow because "Too many things to do" :-/
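
The "if no ref: limit the depth" rule described in this comment can be sketched as follows. This is a minimal illustration of the stated behaviour, not repo2docker's code: the function name `clone_command` and the exact argument layout are assumptions.

```python
def clone_command(repo_url, ref=None, dest="repo"):
    """Build a git clone command following the rule described above.

    Hypothetical helper: no ref -> depth-one clone, explicit ref -> full clone.
    """
    cmd = ["git", "clone"]
    if ref is None:
        # No specific commit requested: the tip of the default branch is
        # enough, so a shallow clone suffices.
        cmd += ["--depth", "1"]
    # With an explicit ref, history is not truncated. A plausible reason
    # (not stated in the thread) is that an arbitrary SHA may not be
    # reachable from a depth-one clone.
    cmd += [repo_url, dest]
    return cmd


if __name__ == "__main__":
    url = "https://github.com/norvig/pytudes/"
    print(clone_command(url))
    print(clone_command(url, ref="12345"))
```

Since BinderHub is said to always pass a specific SHA, it would always take the second branch, which matches the concern that a very large repository such as scikit-learn.github.io is slow to fetch.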

@choldgraf
Contributor
choldgraf commented Oct 10, 2018

what @betatim said :-) we'd love for repo2docker to support non-git code artifacts, PRs and scoping via issues welcome!

@jnothman
Member

So we should be closing this in favour of #11221?

@betatim
Member
betatim commented Nov 16, 2018

👍 for that.

@mwouts
Author
mwouts commented Nov 16, 2018

Sure, you should do whatever best fits the project. I proposed this PR because I liked the idea of contributing only two short files, and the resulting binder starts quickly (link in the first comment). But I do understand that it is not as well integrated with the sphinx gallery as #11221 will be, and that my PR does not provide fine control over the scikit-learn version.

@GaelVaroquaux
Member
GaelVaroquaux commented Nov 16, 2018 via email

@choldgraf
Contributor

@GaelVaroquaux could you explain a bit more what you mean by this? Do you mean replacing the sphinx-gallery .py -> .ipynb code with using jupytext?

@GaelVaroquaux
Member
GaelVaroquaux commented Nov 16, 2018 via email

@choldgraf
Contributor

@GaelVaroquaux I opened this back in October :-) sphinx-gallery/sphinx-gallery#424 maybe we can discuss implementation there?

@mwouts
Author
mwouts commented Nov 17, 2018

Good! I will close this PR, and instead subscribe and contribute to the two other trackers if there is anything I can help with. See you there!

@mwouts mwouts closed this Nov 17, 2018
6 participants