8000 ENH Add JupyterLite button to gallery examples by lesteve · Pull Request #6911 · scikit-image/scikit-image · GitHub
[go: up one dir, main page]

Skip to content

ENH Add JupyterLite button to gallery examples #6911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 18, 2023

Conversation

lesteve
Copy link
Contributor
@lesteve lesteve commented Apr 27, 2023

Description

close #6903

This is a draft PR (for now) that uses sphinx-gallery JupyterLite integration (new in sphinx-gallery 0.13).

Because of a CircleCI artifacts limitation, JupyterLite does not work inside CircleCI (seehttps://github.com/sphinx-gallery/sphinx-gallery/pull/977#issuecomment-1330438821 for more details), so I have built the scikit-image doc locally (without running the examples to save time) and put it on https://lesteve.github.io/scikit-image-jupyterlite/.

You can see the JupyterLite button in an example for example this one:
https://lesteve.github.io/scikit-image-jupyterlite/auto_examples/numpy_operations/plot_camera_numpy.html

Here is a direct link to this notebook inside JupyterLite.

Feel free to try any example you like in the gallery and report issues.

One issue I noticed (there may well be others) is loading the data in:
https://lesteve.github.io/scikit-image-jupyterlite/auto_examples/data/plot_3d.html

Long story short, there is no socket inside Pyodide (browser limitation) so pyodide-http replace all the fetch by js XMLHTTPRequest. This causes CORS issue because generally the server does not include CORS headers in the response. In scikit-learn examples, most of the datasets are on OpenML which has the necessary CORS headers. Besides finding (or running) a CORS proxy, I don't really know what would be a good solution for this issue ...

More details about JupyterLite sphinx-gallery integration may be available in scikit-learn/scikit-learn#25887 that added this functionality for scikit-learn.

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.
  • There is a bot to help automate backporting a PR to an older branch. For
    example, to backport to v0.19.x after merging, add the following in a PR
    comment: @meeseeksdev backport to v0.19.x
  • To run benchmarks on a PR, add the run-benchmark label. To rerun, the label
    can be removed and then added again. The benchmark output can be checked in
    the "Actions" tab.

@lesteve lesteve marked this pull request as draft April 27, 2023 12:12
@lagru
Copy link
Member
lagru commented Apr 28, 2023

Long story short, there is no socket inside Pyodide (browser limitation) so pyodide-http replace all the fetch by js XMLHTTPRequest. This causes CORS issue because generally the server does not include CORS headers in the response. In scikit-learn examples, most of the datasets are on OpenML which has the necessary CORS headers. Besides finding (or running) a CORS proxy, I don't really know what would be a good solution for this issue ...

I'm not really knowledge-able on this but is this a problem with the server from which these URLs

registry_urls = {
"data/brain.tiff": "https://gitlab.com/scikit-image/data/-/raw/2cdc5ce89b334d28f06a58c9f0ca21aa6992a5ba/brain.tiff",
"data/cells3d.tif": "https://gitlab.com/scikit-image/data/-/raw/2cdc5ce89b334d28f06a58c9f0ca21aa6992a5ba/cells3d.tif",
"data/eagle.png": "https://gitlab.com/scikit-image/data/-/raw/2cdc5ce89b334d28f06a58c9f0ca21aa6992a5ba/eagle.png",

are served?

If so I'm not sure what to do about it or how we would use a CORS proxy. But one possibility might be to include these files in the Pyodide package of scikit-image itself. The fetcher would then find these alongside the legacy datasets included in

py3.install_sources([
# See legacy_datasets in _registry.py
'README.txt',
'astronaut.png',
'brick.png',

Though, that would increase the size of the package quite a bit by more than 300 MiB.

@lesteve
Copy link
Contributor Author
lesteve commented Apr 28, 2023

I'm not really knowledge-able on this but is this a problem with the server from which these URLs

I am not an expert but I have learned some terms recently through my work on Pyodide. There is nothing wrong with the server per se. It is more than to enable the Pyodide use case (or more generally an URL being fetched through javascript in a browser), the server needs to explicitly include some headers in the response. For example, for scikit-learn we needed to ask OpenML to add those headers, see openml/OpenML#1135 for more details.

I don't think including the images in the scikit-image package is worth it, only to support the Pyodide use case.

I would say that for now, the more reasonable thing to do is to add a big red "danger" markdown cell in the notebooks that use these datasets saying that this notebook is not expected to work inside Pyodide. This is easily doable with the sphinx-gallery JupyterLite integration.

@stefanv
Copy link
Member
stefanv commented Apr 28, 2023

One advantage of the registry is that we can move to a different provided if that's helpful. How has OpenML been working out for you?

Alternatively, I can run a CORS proxy such as: https://github.com/Rob--W/cors-anywhere

@lesteve
Copy link
Contributor Author
lesteve commented Apr 29, 2023

One advantage of the registry is that we can move to a different provided if that's helpful.

So if the data was on github (I imagine there is a good reason it is on gitlab.com of course), we could probably modify the notebook to use cdn.jsdelivr.net/gh which is a CORS proxy for github. I googled a bit but I don't think there is an equivalent for gitlab.com, I found jsdelivr/jsdelivr#18058 which says this is not a priority but the answer is 5 years old, maybe we can ask again? In particular, I think the Pyodide people have some contact with jsdelivr.

How has OpenML been working out for you?

I am not going to say I use it very often and datasets are cached on the disk which may hide sporadic openml.org glitches. Having said that, I don't think we have had many user reports in scikit-learn about openml.org problems. If your idea was to move the scikit-image data to OpenML it may be also a bit of a stretch if they are not machine learning datasets but maybe 🤷.

Alternatively, I can run a CORS proxy such as: Rob--W/cors-anywhere

I have to say I don't really know how much work this represents.

At this point I am a bit reluctant to ask scikit-image to do too much work to support the Pyodide use case. IMO adding red "danger" cells, saying "this notebook is not expected to work inside JupyterLite you can use Binder instead" would be a reasonable first step but of course I would understand if you are reluctant to do this.

If some scikit-image people have some spare time, I think it would be great to try more notebooks and report issues because there may well be more than the CORS issue I noticed 😉.

@lesteve
Copy link
Contributor Author
lesteve commented Apr 29, 2023

Actually looking a bit more (in the right way), it seems like statically (a jsdelivr alternative) supports gitlab.com so we should be fine with modifying the JupyterLite notebooks to use the statically CDN.

@lesteve
Copy link
Contributor Author
lesteve commented Apr 29, 2023

So the good news is that some datasets now works for example this one using the eagle photo:
https://lesteve.github.io/scikit-image-jupyterlite/lite/lab/?path=auto_examples/segmentation/plot_marked_watershed.ipynb

Right now it seems to work for all the datasets in the registry but these three:

It seems like there are size limit see "File Size Limit"section in https://statically.io/docs/using-staticzap/ (slightly weird that I get a 500 and not a 413 as the doc says but oh well ...). This will likely be the case for other CDNs e.g. jsDelivr e.g. see https://github.com/jsdelivr/jsdelivr#restrictions.

@stefanv
Copy link
Member
stefanv commented Apr 29, 2023

That's already fantastic! Thanks for your work on sorting it out.

For now, we could manually insert warnings into the gallery examples that utilize those functions.

@stefanv
Copy link
Member
stefanv commented May 15, 2023

TODO:

  • Add warning to examples that cannot run
  • Remove debugging lines
  • Remove pandas and matplotlib explicit import
  • Remove mybinder links

@stefanv
Copy link
Member
stefanv commented May 16, 2023

Running the build locally, I cannot get a Try Jupyterlite button to appear on example.

@lesteve
Copy link
Contributor Author
lesteve commented May 16, 2023

Running the build locally, I cannot get a Try Jupyterlite button to appear on example.

Maybe you have not installed jupyterlite-sphinx? I think requirements/docs.txt should have the necessary packages, but maybe I am missing something

@stefanv
Copy link
Member
stefanv commented May 16, 2023

I can see the extension running, it's just not adding the output button.

jupyterlite-core              0.1.0
jupyterlite-pyodide-kernel    0.0.8
jupyterlite-sphinx            0.8.0

@lesteve
Copy link
Contributor Author
lesteve commented May 17, 2023

You also need sphinx-gallery>=0.13

@stefanv
Copy link
Member
stefanv commented May 17, 2023

You also need sphinx-gallery>=0.13

Got that too, no luck. But I see the extension running, e.g.:

[LiteBuildApp] @jupyterlite/pyodide-kernel-extension:kernel wrote settings in /home/stefan/src/scikit-image/doc/build/html/lite/jupyter-lite.json: {'pipliteUrls': ['./extensions/@jupyterlite/pyodide-kernel-extension/static/pypi/all.json?sha256=55f1d3982adfce14adaebe30c941e4857226d00982e31d3c3a92acf14de17eac']}

@stefanv
Copy link
Member
stefanv commented May 18, 2023

Caching issue; rst files were being rebuilt, but not re-used. Looks good now.

@stefanv stefanv marked this pull request as ready for review May 18, 2023 01:04
@stefanv
Copy link
Member
stefanv commented May 18, 2023

Woops, accidentally picked up a few other commits along the way, but this is ready to go; will clean up the commit history tonight.

@stefanv stefanv force-pushed the jupyterlite-examples branch from 826d083 to 4b27081 Compare May 18, 2023 02:57
@stefanv
Copy link
Member
stefanv commented May 18, 2023

Please squash when merging.

@jarrodmillman jarrodmillman added this to the 0.21 milestone May 18, 2023
@jarrodmillman jarrodmillman merged commit 7ee3e6e into scikit-image:main May 18, 2023
@lesteve lesteve deleted the jupyterlite-examples branch May 19, 2023 14:31
@lesteve
Copy link
Contributor Author
lesteve commented May 19, 2023

Great to see this one merged!

Note that we could also have added a "danger" (red cell) in the notebooks with the notebook_modification_function. The way you did this is a bit more user friendly since you know it is not going to work before clicking the JupyterLite button.

If there are any issues on this, don't hesitate to ping me!

@lesteve
Copy link
Contributor Author
lesteve commented May 19, 2023

Another thing that is worth mentioning: right now the scikit-image version available in the latest Pyodide release (0.23.2) is scikit-image 0.19.3.

So when you release scikit-image 0.21, you will still have 0.19.3 in JupyterLite (see https://pyodide.org/en/stable/usage/packages-in-pyodide.html). This may be an issue if some examples rely on recent features.

I recently has a quick look at packaging scikit-image 0.20 for unrelated reason (dealing with meson in Pyodide) but it looks like this needs a bit more work and also the same thing stands you will need a Pyodide release before you can use it in JupyterLite.

For a better long-term solution, we can already build a Pyodide wheel but we don't have a good story on where to host it for now. Two possible paths right now:

There were two follow-ups PRs in scikit-learn that you may want to consider:

@stefanv
Copy link
Member
stefanv commented May 19, 2023

@lesteve Fixing the version number is quite important, thanks for bringing that up. It looks like the version number is tied to pyodide, atm, so would be good if we can switch to a custom wheel that matches the docs version.

Can PyPi not host pyiodide wheels yet and, if so, should we contact pypa about it?

If GitHub works in the interim, happy to do that. I'm not sure about the "add wheel at build time to jupyterlite" --- sounds like it may blow up the size of our docs repo.

Taking the GitHub road, can we just build these wheels as part of our release process? If you have an example somewhere, I can try to work it into our CI.

@lesteve
Copy link
Contributor Author
lesteve commented May 19, 2023

Can PyPi not host pyiodide wheels yet and, if so, should we contact pypa about it?

Not yet, my understanding is that the Pyodide team plans to write a PEP to add this as a possibility, see https://discuss.python.org/t/support-wasm-wheels-on-pypi/21924 for example.

In principle building a Pyodide wheel is not too hard, see
https://pyodide.org/en/stable/development/building-and-testing-packages.html

  • setup emscripten
  • install pyodide-build
  • do pyodide build

You can have a look at my ongoing PR to add a Pyodide wheel build in scikit-learn:
https://github.com/scikit-learn/scikit-learn/pull/26374/files#diff-296b76b94f1f7e114ce9412410b50fc2b38ec524dd27501a9b177e64c231de05

In the scikit-image case, it is quite likely you will have issues with meson though. The quickest thing to try (if applicable as for scipy) would be to patch scikit-image (for the purpose to build the Pyodide wheel only) to use setuptools rather than meson, see Pyodide patch for scipy. On the meson inside Pyodide front, see e.g. pyodide/pyodide#3649 which is the last attempt I know of (I may have a closer look in the not too distant future since it is probably easier to make it work for scikit-image than for scipy).

If GitHub works in the interim, happy to do that. I'm not sure about the "add wheel at build time to jupyterlite" --- sounds like it may blow up the size of our docs repo.

Yep that is definitely a valid concern. Not that on this front the JupyterLite assets are already quite big ~15MB after removing sourcemaps according to scikit-learn/scikit-learn#26246. As a single data point, the scikit-learn Pyodide wheel is ~6MB.

@lagru lagru added 📄 type: Documentation Updates, fixes and additions to documentation 🏆 type: Highlight Highlight in next release notes labels May 31, 2023
jarrodmillman added a commit to jarrodmillman/scikit-image that referenced this pull request May 31, 2023
jarrodmillman added a commit that referenced this pull request May 31, 2023
* Revert "MNT Fix typo in JupyterLite comment (#6945)"

This reverts commit 84abc1c.

* Revert "ENH Add JupyterLite button to gallery examples (#6911)"

This reverts commit 7ee3e6e.
@lagru lagru removed the 🏆 type: Highlight Highlight in next release notes label May 31, 2023
jarrodmillman pushed a commit to jarrodmillman/scikit-image that referenced this pull request May 31, 2023
* Enable JupyterLite integration

* lint

* Enable JupyterLite integration in CircleCI

* precommit run

* Use statically CDN for downloads

* See if we can avoid always importing matplotlib, pandas

* Build examples by default

* Remove mybinder links

* Add warnings to examples that cannot run on JupterLite

statically.io has a 30MB limit: https://statically.io/docs/using-staticzap/

---------

Co-authored-by: Stefan van der Walt <stefanv@berkeley.edu>
jarrodmillman pushed a commit to jarrodmillman/scikit-image that referenced this pull request Dec 5, 2023
* Enable JupyterLite integration

* lint

* Enable JupyterLite integration in CircleCI

* precommit run

* Use statically CDN for downloads

* See if we can avoid always importing matplotlib, pandas

* Build examples by default

* Remove mybinder links

* Add warnings to examples that cannot run on JupterLite

statically.io has a 30MB limit: https://statically.io/docs/using-staticzap/

---------

Co-authored-by: Stefan van der Walt <stefanv@berkeley.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📄 type: Documentation Updates, fixes and additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use sphinx-gallery JupyterLite integration?
4 participants
0