8000 DOC move notebooks to sphinx user guides and generated docs by adrinjalali · Pull Request #336 · fairlearn/fairlearn · GitHub
[go: up one dir, main page]

Skip to content

DOC move notebooks to sphinx user guides and generated docs #336

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Apr 6, 2020

Conversation

adrinjalali
Copy link
Member

Our notebooks are examples for users, and they are conceptually a part of the documentation.

This PR proposes to move them to a sphinx gallery generated example. You can call make doc (or run the command that it runs) to generate the examples locally.

@philippjfr @cmarmo I'm trying to point to the examples from the main index.rst, not sure how to do that. The auto_generated examples doesn't exist when the index.rst is processed, and therefore gives an error.

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
@philippjfr
Copy link

Thanks, I'll take a look as soon as I get a chance.

@romanlutz romanlutz requested a review from riedgar-ms March 19, 2020 03:09
@romanlutz romanlutz linked an issue Mar 19, 2020 that may be closed by this pull request
@romanlutz romanlutz added this to the Improve Testing milestone Mar 19, 2020
@cmarmo
Copy link
cmarmo commented Mar 19, 2020

@adrinjalali , may I suggest to add sphinx_gallery in the requirements.txt in this PR? Cheers.

@adrinjalali
Copy link
Member Author

How do we keep this in sync with the notebook itself?

So the idea is that there are a few ways of having this content on the docs:

  • on the notebooks
  • in a python file, as an example, have it processed with sphinx-gallery which generates an html page for it, and gives a link to the .py and the .ipynb file. The notebook is then generated by the sphinx-gallery and there's no need to have a notebook in the repo.
  • have the content, or some of it, as a user manual, in an .rst file. That'll include the output in the .rst file, and sphinx will run the code, and then check the output against the one put in the file, and error if not the same (AFAIK).

We usually have a combination of the user guide .rst file and the .py examples in sklearn. Here in this PR I've included both, for us to see how they look, and decide which path to move forward. But both of them will remove the notebooks from the repo. I've just not removed it from the repo yet in this PR.

@riedgar-ms
Copy link
Member

OK, but I think we will want to have example notebooks for users as well. So is there a way of automatically generating the notebooks from these changes, and having them appear on a webpage?

@adrinjalali
Copy link
Member Author

Yes, the examples are used to automatically generate the notebook. As an example you can see this one. There's a notebook download button on the bottom of the page.

@adrinjalali
Copy link
Member Author

sphinx-gallery is the one doing it.

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
@adrinjalali
Copy link
Member Author

The tests are failing due to some pep8 issues (line too long). In sklearn we find it hard to maintain the like length in examples, especially with math lines or links. Should we relax on the linter on the docs/ folder and exclude them from the linter?

@adrinjalali
Copy link
Member Author

I've removed the linter, and now we have the link to the example on the main index page. Let me know what you think.

@romanlutz
Copy link
Member

The tests are failing due to some pep8 issues (line too long). In sklearn we find it hard to maintain the like length in examples, especially with math lines or links. Should we relax on the linter on the docs/ folder and exclude them from the linter?

Works for me! @riedgar-ms ?

@riedgar-ms
Copy link
Member

I'm OK with upping the line length limit in the docs directory

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
@adrinjalali
Copy link
Member Author
adrinjalali commented Apr 1, 2020

@riedgar-ms
the generated stuff will be under doc/_build/html/auto_examples, and you can use make doc or what that command does (a sphinx call) to generate the docs locally. It won't look the same as what you see on readthedocs because of the default sphinx theme on the readthedocs side, but gives you a good idea. Then you can open the doc/_build/html/index.html and you'll have a link to the generated html page, which has links to the .py and the .ipynb files.

@romanlutz
The generated index.html now has a link to the generated example, because of the auto_examples/plot_binary_classification_COMPAS line in the doc/index.rst.

We can also have a quick chat about the whole sphinx, sphinx-gallery, etc if it helps.

docs/index.rst Outdated
fairlearn.reductions
fairlearn.widget

auto_examples/plot_binary_classification_COMPAS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have an 'examples' entry at this level of the table of contents, and the notebook beneath it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this look better?

ex

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the landing page, I'd suggest not opening up the TOC for each example. Just list the examples. Well, 'example' for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could there just be an 'examples' link in the sidebar TOC too. Right now, the Notebook itself is listed there:
image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it look now?

Binary Classification on COMPAS dataset
=======================================
"""
print(__doc__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this supposed to show up on the rendered webpage?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. It's there to show up when you run the .py version of the example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it print? All of the (former) notebook text?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Could remove it if you want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe leave it in for now; I think it looks a little odd, but I don't have the only vote here :-)

When you push your changes, I'll bring them over to my branch, and get the updated sample documentation.

@riedgar-ms
Copy link
Member

I've generated the docs locally as suggested, and apart from the two small layout issues I've mentioned, it looks OK to me.

I wouldn't do more than this right now, though. We need to figure out how to fit this into our build pipeline (both for testing the notebooks and autopublishing the docs), and there may also be other integration points with other resources we're putting together.

However, I doubt any of that can happen until this is in the main Fairlearn repo (unless I can suck the branch from the fork into the main repo.... I imagine git can do that somehow).

Any other comments @romanlutz ?

@romanlutz
Copy link
Member

Everything looks good to me. I'd like to proceed since we're not breaking anything existing, get it into master, so that we can see what it looks like, and then step-by-step replace existing ones.

@adrinjalali I would love to have that chat on sphinx, sphinx-gallery, etc. and I'm sure @riedgar-ms is also interested. Feel free to DM me/us on Gitter or Teams or wherever you like with possible times. I'm willing to compromise on timing since I'm 9h away from you ;-)

@MiroDudik
Copy link
Member

I like the proposal here and I think we should go ahead and merge this PR. I really like the ability to more naturally include examples in the documentation.

However, we shouldn't yet remove the current notebooks--and I also don't think we should quite yet put them into the new format. My main concern is the ease of editing the notebooks. I like developing and editing the notebook in jupyter. I see that sphinx-gallery can generate .ipynb from .py--does it also support going in the opposite direction or is there some other tool / suggested workflow that handles that without too much pain?

@adrinjalali
Copy link
Member Author

The issue is that .ipynb does not play very nice with git, and editing them are really not that easy if you don't have a notebook server up all the time (and some of us really don't, like me). But that doesn't mean I don't do exploratory analysis or coding. For that, I have two points:

  • Exploratory code IMHO should not be in the main repo, and it's worth going through the hassle of converting it into a python file before putting it in the repo.
  • Not having notebooks doesn't mean you can't do notebook style coding. For instance, in vscode you can have a .py file, and as soon as you have a #%% line, it assumes it's a new cell. You can have your cells and run them on the notebook server in the IDE and still have a .py file. I believe other IDEs have a similar feature.
  • Looking up usage instances when there are notebooks in the repo is not trivial, and changing them mostly requires running a notebook server. I ended up having to go through that pain when working on ThresholdOptimizer to make it more sklearn compatible. And since that PR was touching on the API, needed to fix the usage examples in the notebooks, and it took me quite a few CI cycles more than it should have.

So I believe once we get used to it, it shouldn't affect our productivity while working on the .py files.

* adding sphinx-gallery

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* add example as user guide

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* remove exceptions

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* cleanups

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* fix link to doc

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* remove user guide format

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* improve on pep8

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* fix after merge upstream/master

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* change flake8 settings and fix some pep8 issues

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* pep8, blacked the example

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* ignore docstring warnings

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* fix inline math

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

* minor change

Signed-off-by: adrinjalali <adrin.jalali@gmail.com>

Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
@riedgar-ms
Copy link
Member
riedgar-ms commented Apr 2, 2020

FYI, this is what the docs will look like after the change:
https://fairlearn.readthedocs.io/en/riedgar-ms-adrin-nb-copy/
(I stole @adrinjalali 's branch into this repo as a workaround)

@romanlutz
Copy link
Member

The issue is that .ipynb does not play very nice with git, and editing them are really not that easy if you don't have a notebook server up all the time (and some of us really don't, like me). But that doesn't mean I don't do exploratory analysis or coding. For that, I have two points:

* Exploratory code IMHO should not be in the main repo, and it's worth going through the hassle of converting it into a python file before putting it in the repo.

* Not having notebooks doesn't mean you can't do notebook style coding. For instance, in vscode you can have a `.py` file, and as soon as you have a `#%%` line, it assumes it's a new _cell_. You can have your cells and run them on the notebook server in the IDE and still have a `.py` file. I believe other IDEs have a similar feature.

* Looking up usage instances when there are notebooks in the repo is not trivial, and changing them mostly requires running a notebook server. I ended up having to go through that pain when working on `ThresholdOptimizer` to make it more sklearn compatible. And since that PR was touching on the API, needed to fix the usage examples in the notebooks, and it took me quite a few CI cycles more than it should have.

So I believe once we get used to it, it shouldn't affect our productivity while working on the .py files.

I share all your concerns! The process you outline here sounds promising and I'll go ahead and try that. It seems to address lots of the issues I have with notebooks.

I'll count Miro's statement above as an endorsement to complete the PR. We'll have to have a broader discussion about notebooks since some of them are examples on how to use the package, others are more like case studies.

@romanlutz
Copy link
Member

Oh I missed the two outstanding comments by @riedgar-ms . Richard, please complete the PR when you're happy with them.

adrinjalali and others added 4 commits April 3, 2020 14:18
Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
@adrinjalali
Copy link
Member Author

The DCO is wrong in failing, it's failing because of your commit to my branch @riedgar-ms (I so much dislike this DCO thing)

@riedgar-ms
Copy link
Member

The DCO is wrong in failing, it's failing because of your commit to my branch @riedgar-ms (I so much dislike this DCO thing)

I committed to your branch? In your fork? Sorry :-(

I don't even know how I managed that :-(

@adrinjalali
Copy link
Member Author

haha, yes you did. It's a good tool. It's a nice one for maintainers. Basically, if the author of the PR checks the "allow changes from maintainers", then maintainers can commit to the relevant branch of the author directly ;) and we do it pretty often in sklearn to do last fixes on people's PRs. It's odd that DCO can't understand that.

@riedgar-ms
Copy link
Member

Yes, but I thought I was updating my copy of your branch.... I was doing a PR from your branch of your fork into my branch of Fairlearn itself (so that we could view in ReadTheDocs). And I thought the conflict was on that PR, and that I'd merged into my branch.

@riedgar-ms
Copy link
Member

I've just shown this to @mesameki and @vingu . They like the basic flow, but aren't sure about converting our big 'scenario' notebooks over to this. For smaller code samples (I think that the Metrics notebook is a good candidate), they think it looks great and a valuable addition to our documentation.

I've expanded on that a bit in the Fairlearn Teams channel (or are we moving to Gitter?).

@MiroDudik
Copy link
Member
MiroDudik commented Apr 3, 2020

@adrinjalali : I think you make an excellent point re. .ipynb not playing nice with git. I suggest that for the time being, let's keep both examples and notebooks, but let's not duplicate the content between the two--except for the current example that you worked out. I'll try to see how easily I can play with examples/plot_binary_classification_COMPAS.py to get a sense of what makes sense to cover under examples and what we should cover under notebooks (if anything).

One of the issues that we are running into here is that there is no established way to talk about fairness in AI, so we are developing standards as we go and that's why our examples are more exploratory than one would like. Also--as soon as we discover that something might have bad unintented consequences, we want to be able to easily edit things--any any barrier to that will slow things down.

Given all these disclaimers: I'm good to go ahead with this PR.

@riedgar-ms : I don't think we need to move this conversation elsewhere given that the bulk of it is here (and we are reaching a consensus).

@adrinjalali
Copy link
Member Author

I think it'd be best if we could have a call on this one. Any day this week 17:30-19:30 would work for me.

@riedgar-ms
Copy link
Member

I think it'd be best if we could have a call on this one. Any day this week 17:30-19:30 would work for me.

I'm assuming that's European time? Have your clocks gone forward, @adrinjalali . What were your particular concerns?

With @MiroDudik being happy with this, we can complete the PR (after that ever-fun DCO issue is fixed). However, I don't think we'd go ahead and convert other notebooks, while we decide how best to present things. It's great to have this new option, though.

@adrinjalali
Copy link
Member Author
adrinjalali commented Apr 6, 2020

I'm assuming that's European time?

Yes, sorry, CEST time.

after that ever-fun DCO issue is fixed

that'll mean force pushing and losing the history here, which I'm not super fond of, do we have to? You can still merge I guess w/o.

However, I don't think we'd go ahead and convert other notebooks, while we decide how best to present things. It's great to have this new option, though.

I'd like to be a part of that conversation if you don't mind. It doesn't give the contributor (in this case me) a feeling of inclusion when you come and say we talked and we decided we won't do it for the other ones. I think it's better to have a discussion, hear your concerns, and here the pro side, and see what we can do. I agree that the larger notebooks shouldn't be converted to an example, but those are perfect examples of something which should go to a user guide. At the end of the day, the website is where people find information, and those user guides are important for people to understand how the package works.

@riedgar-ms
Copy link
Member

Yes, sorry, CEST time.

No problem. Invitation set.

after that ever-fun DCO issue is fixed

that'll mean force pushing and losing the history here, which I'm not super fond of, do we have to? You can still merge I guess w/o.

There's a suggested fix from the DCO bot:
https://github.com/fairlearn/fairlearn/pull/336/checks?check_run_id=558031246
However, I believe that that only works if the troublesome commit was the last one.

I'd like to be a part of that conversation if you don't mind. It doesn't give the contributor (in this case me) a feeling of inclusion when you come and say we talked and we decided we won't do it for the other ones. I think it's better to have a discussion, hear your concerns, and here the pro side, and see what we can do. I agree that the larger notebooks shouldn't be converted to an example, but those are perfect examples of something which should go to a user guide. At the end of the day, the website is where people find information, and those user guides are important for people to understand how the package works.

Understood, and my apologies for the lack of organisation on this my end.

@romanlutz
Copy link
Member

Yes, sorry, CEST time.

No problem. Invitation set.

after that ever-fun DCO issue is fixed
that'll mean force pushing and losing the history here, which I'm not super fond of, do we have to? You can still merge I guess w/o.

There's a suggested fix from the DCO bot:
https://github.com/fairlearn/fairlearn/pull/336/checks?check_run_id=558031246
However, I believe that that only works if the troublesome commit was the last one.

I'd like to be a part of that conversation if you don't mind. It doesn't give the contributor (in this case me) a feeling of inclusion when you come and say we talked and we decided we won't do it for the other ones. I think it's better to have a discussion, hear your concerns, and here the pro side, and see what we can do. I agree that the larger notebooks shouldn't be converted to an example, but those are perfect examples of something which should go to a user guide. At the end of the day, the website is where people find information, and those user guides are important for people to understand how the package works.

Understood, and my apologies for the lack of organisation on this my end.

DCO can be overridden so don't worry. I don't see your sign-off, @riedgar-ms, though, so I'll wait.

Copy link
Member
@riedgar-ms riedgar-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for showing us how to do this!

@romanlutz romanlutz merged commit 681389e into fairlearn:master Apr 6, 2020
@adrinjalali adrinjalali deleted the doc/sphinx branch April 7, 2020 08:07
@MiroDudik
Copy link
Member

[i'm playing with jupytext to try things out a bit]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

python tests instead of notebooks
6 participants
0