-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
ENH Add JupyterLite button to gallery examples #6911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Add JupyterLite button to gallery examples #6911
Conversation
I'm not really knowledge-able on this but is this a problem with the server from which these URLs scikit-image/skimage/data/_registry.py Lines 167 to 170 in f551579
are served? If so I'm not sure what to do about it or how we would use a CORS proxy. But one possibility might be to include these files in the Pyodide package of scikit-image itself. The fetcher would then find these alongside the legacy datasets included in scikit-image/skimage/data/meson.build Lines 15 to 19 in f551579
Though, that would increase the size of the package quite a bit by more than 300 MiB. |
I am not an expert but I have learned some terms recently through my work on Pyodide. There is nothing wrong with the server per se. It is more than to enable the Pyodide use case (or more generally an URL being fetched through javascript in a browser), the server needs to explicitly include some headers in the response. For example, for scikit-learn we needed to ask OpenML to add those headers, see openml/OpenML#1135 for more details. I don't think including the images in the scikit-image package is worth it, only to support the Pyodide use case. I would say that for now, the more reasonable thing to do is to add a big red "danger" markdown cell in the notebooks that use these datasets saying that this notebook is not expected to work inside Pyodide. This is easily doable with the sphinx-gallery JupyterLite integration. |
One advantage of the registry is that we can move to a different provided if that's helpful. How has OpenML been working out for you? Alternatively, I can run a CORS proxy such as: https://github.com/Rob--W/cors-anywhere |
So if the data was on github (I imagine there is a good reason it is on gitlab.com of course), we could probably modify the notebook to use
I am not going to say I use it very often and datasets are cached on the disk which may hide sporadic openml.org glitches. Having said that, I don't think we have had many user reports in scikit-learn about openml.org problems. If your idea was to move the scikit-image data to OpenML it may be also a bit of a stretch if they are not machine learning datasets but maybe 🤷.
I have to say I don't really know how much work this represents. At this point I am a bit reluctant to ask scikit-image to do too much work to support the Pyodide use case. IMO adding red "danger" cells, saying "this notebook is not expected to work inside JupyterLite you can use Binder instead" would be a reasonable first step but of course I would understand if you are reluctant to do this. If some scikit-image people have some spare time, I think it would be great to try more notebooks and report issues because there may well be more than the CORS issue I noticed 😉. |
Actually looking a bit more (in the right way), it seems like statically (a jsdelivr alternative) supports gitlab.com so we should be fine with modifying the JupyterLite notebooks to use the statically CDN. |
So the good news is that some datasets now works for example this one using the eagle photo: Right now it seems to work for all the datasets in the registry but these three:
It seems like there are size limit see "File Size Limit"section in https://statically.io/docs/using-staticzap/ (slightly weird that I get a 500 and not a 413 as the doc says but oh well ...). This will likely be the case for other CDNs e.g. jsDelivr e.g. see https://github.com/jsdelivr/jsdelivr#restrictions. |
That's already fantastic! Thanks for your work on sorting it out. For now, we could manually insert warnings into the gallery examples that utilize those functions. |
TODO:
|
Running the build locally, I cannot get a Try Jupyterlite button to appear on example. |
Maybe you have not installed jupyterlite-sphinx? I think requirements/docs.txt should have the necessary packages, but maybe I am missing something |
I can see the extension running, it's just not adding the output button.
|
You also need sphinx-gallery>=0.13 |
Got that too, no luck. But I see the extension running, e.g.:
|
Caching issue; rst files were being rebuilt, but not re-used. Looks good now. |
Woops, accidentally picked up a few other commits along the way, but this is ready to go; will clean up the commit history tonight. |
statically.io has a 30MB limit: https://statically.io/docs/using-staticzap/
826d083
to
4b27081
Compare
Please squash when merging. |
Great to see this one merged! Note that we could also have added a "danger" (red cell) in the notebooks with the If there are any issues on this, don't hesitate to ping me! |
Another thing that is worth mentioning: right now the scikit-image version available in the latest Pyodide release (0.23.2) is scikit-image 0.19.3. So when you release scikit-image 0.21, you will still have 0.19.3 in JupyterLite (see https://pyodide.org/en/stable/usage/packages-in-pyodide.html). This may be an issue if some examples rely on recent features. I recently has a quick look at packaging scikit-image 0.20 for unrelated reason (dealing with meson in Pyodide) but it looks like this needs a bit more work and also the same thing stands you will need a Pyodide release before you can use it in JupyterLite. For a better long-term solution, we can already build a Pyodide wheel but we don't have a good story on where to host it for now. Two possible paths right now:
There were two follow-ups PRs in scikit-learn that you may want to consider: |
@lesteve Fixing the version number is quite important, thanks for bringing that up. It looks like the version number is tied to pyodide, atm, so would be good if we can switch to a custom wheel that matches the docs version. Can PyPi not host pyiodide wheels yet and, if so, should we contact pypa about it? If GitHub works in the interim, happy to do that. I'm not sure about the "add wheel at build time to jupyterlite" --- sounds like it may blow up the size of our docs repo. Taking the GitHub road, can we just build these wheels as part of our release process? If you have an example somewhere, I can try to work it into our CI. |
Not yet, my understanding is that the Pyodide team plans to write a PEP to add this as a possibility, see https://discuss.python.org/t/support-wasm-wheels-on-pypi/21924 for example. In principle building a Pyodide wheel is not too hard, see
You can have a look at my ongoing PR to add a Pyodide wheel build in scikit-learn: In the scikit-image case, it is quite likely you will have issues with meson though. The quickest thing to try (if applicable as for scipy) would be to patch scikit-image (for the purpose to build the Pyodide wheel only) to use setuptools rather than meson, see Pyodide patch for scipy. On the meson inside Pyodide front, see e.g. pyodide/pyodide#3649 which is the last attempt I know of (I may have a closer look in the not too distant future since it is probably easier to make it work for scikit-image than for scipy).
Yep that is definitely a valid concern. Not that on this front the JupyterLite assets are already quite big ~15MB after removing sourcemaps according to scikit-learn/scikit-learn#26246. As a single data point, the scikit-learn Pyodide wheel is ~6MB. |
* Enable JupyterLite integration * lint * Enable JupyterLite integration in CircleCI * precommit run * Use statically CDN for downloads * See if we can avoid always importing matplotlib, pandas * Build examples by default * Remove mybinder links * Add warnings to examples that cannot run on JupterLite statically.io has a 30MB limit: https://statically.io/docs/using-staticzap/ --------- Co-authored-by: Stefan van der Walt <stefanv@berkeley.edu>
* Enable JupyterLite integration * lint * Enable JupyterLite integration in CircleCI * precommit run * Use statically CDN for downloads * See if we can avoid always importing matplotlib, pandas * Build examples by default * Remove mybinder links * Add warnings to examples that cannot run on JupterLite statically.io has a 30MB limit: https://statically.io/docs/using-staticzap/ --------- Co-authored-by: Stefan van der Walt <stefanv@berkeley.edu>
Description
close #6903
This is a draft PR (for now) that uses sphinx-gallery JupyterLite integration (new in sphinx-gallery 0.13).
Because of a CircleCI artifacts limitation, JupyterLite does not work inside CircleCI (seehttps://github.com/sphinx-gallery/sphinx-gallery/pull/977#issuecomment-1330438821 for more details), so I have built the scikit-image doc locally (without running the examples to save time) and put it on https://lesteve.github.io/scikit-image-jupyterlite/.
You can see the JupyterLite button in an example for example this one:
https://lesteve.github.io/scikit-image-jupyterlite/auto_examples/numpy_operations/plot_camera_numpy.html
Here is a direct link to this notebook inside JupyterLite.
Feel free to try any example you like in the gallery and report issues.
One issue I noticed (there may well be others) is loading the data in:
https://lesteve.github.io/scikit-image-jupyterlite/auto_examples/data/plot_3d.html
Long story short, there is no socket inside Pyodide (browser limitation) so
pyodide-http
replace all the fetch by jsXMLHTTPRequest
. This causes CORS issue because generally the server does not include CORS headers in the response. In scikit-learn examples, most of the datasets are on OpenML which has the necessary CORS headers. Besides finding (or running) a CORS proxy, I don't really know what would be a good solution for this issue ...More details about JupyterLite sphinx-gallery integration may be available in scikit-learn/scikit-learn#25887 that added this functionality for scikit-learn.
For reviewers
later.
__init__.py
.doc/release/release_dev.rst
.example, to backport to v0.19.x after merging, add the following in a PR
comment:
@meeseeksdev backport to v0.19.x
run-benchmark
label. To rerun, the labelcan be removed and then added again. The benchmark output can be checked in
the "Actions" tab.