Reduce the size of some images in the documentation #17568
Comments
If the goal is to reduce the size of the doc repo, we should not necessarily focus on large images, but rather on images that change at each build. Images can change at each build because of different random states or because of different run durations. For example, images in this example are about 30 KB, but each one changes by 1-2 KB at every commit, which can lead to much larger diffs than large images that are rarely updated. Also, if this doc repo is too large, is there a reason not to rewrite the git history, e.g. by squashing many bot commits? |
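To make "images that change at each build" concrete, here is a minimal sketch that compares two local copies of the built docs and lists the PNGs whose bytes differ. The function names and the idea of keeping two build directories side by side are assumptions for illustration only, not something the project's tooling provides.

```python
import hashlib
from pathlib import Path


def image_digests(build_dir):
    """Map each PNG path (relative to build_dir) to its SHA-256 digest."""
    root = Path(build_dir)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.png"))
    }


def changed_images(old_build, new_build):
    """Return the PNGs present in both builds whose bytes differ."""
    old = image_digests(old_build)
    new = image_digests(new_build)
    # Only compare files that exist in both builds; added or removed
    # images are a different question from images that churn.
    return sorted(name for name in old.keys() & new.keys() if old[name] != new[name])
```

Running `changed_images` on two consecutive builds of the same commit would point at the examples whose output is not reproducible, which are the ones worth fixing first.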
This seems like a good candidate for first-time contributors. I'm adding this to the PyLadies sprint in Berlin. (The history on the website repo was removed in #21171 (comment)) To contributors: For each image, you first need to find the example. For instance, the example file for the file |
If the issue is the size of the docs repo, I suspect it's because we keep pushing dev docs into scikit-learn.github.io. If we host the dev docs in another repo and only push to scikit-learn.github.io when we release, then we can keep scikit-learn.github.io at a manageable size. This is what matplotlib does with https://github.com/matplotlib/devdocs and https://github.com/matplotlib/matplotlib.github.com . For PNGs, different versions of matplotlib will not generate the same binary even with the same data. In the simple case, the PNG generated by matplotlib will have its version in the PNG's metadata. If we switch the backend to produce SVGs, I think they can become more reproducible. For SVGs, even if there are changes, they will be in plain text, which is more manageable with git. In any case, I still think it is good to remove the randomness in the examples. |
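As an illustration of the two points above (version metadata and randomness), here is a minimal sketch of rendering a PNG reproducibly. The helper `render_png` and the scatter data are made up for this example; it assumes matplotlib's `savefig` accepts `metadata={"Software": None}` to omit the version string, and it does not claim that different matplotlib versions would render identical pixels.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np


def render_png(seed=0):
    """Render a small scatter plot to PNG bytes with a fixed random state."""
    rng = np.random.RandomState(seed)  # fixed seed: same points on every run
    points = rng.normal(size=(200, 2))

    fig, ax = plt.subplots(figsize=(4, 3))
    ax.scatter(points[:, 0], points[:, 1], s=10)

    buffer = io.BytesIO()
    # "Software": None drops the text chunk that records the matplotlib version.
    fig.savefig(buffer, format="png", metadata={"Software": None})
    plt.close(fig)
    return buffer.getvalue()
```

With the same seed and the same matplotlib installation, two calls should return identical bytes, so git would see no change. For SVG output, the `svg.hashsalt` rcParam and `metadata={"Date": None}` should play a similar role, though that is not exercised here.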
I checked the folder
I don't see any big files there. Maybe we can close this issue... |
@dmitryhits sorry for the suggestion at the sprint that maybe we can close this. I just re-read the comments and in particular this one. It seems the problem isn't so much large images. So I think we can keep this open and maybe update the top comment to reflect the focus on removing randomness from examples. |
Me and @AnnaWey will work on this issue, starting with |
Me and @TamaraAtanasoska will continue with the plot_cluster_comparison.py |
@glemaitre this is not yet ready to be closed, there are 29 images left :) I will take them on slowly in multiple subsequent PRs. |
Indeed, it has been automatically closed by the mention in the PR. Thanks @TamaraAtanasoska for noticing. Reopening. |
Here is a task list so we can keep track of where we stand with the issue, especially if someone else wants to join in. Pinging @adrinjalali as we already talked about this. One question is how we update it: do I post an updated list when I want to take on new files, and you just copy-paste it?
|
Closing this issue since all changes have been done (whenever possible). |
The documentation repository is becoming quite large (#17564 (comment)) and in particular there are 66 MB of images. It might be worth checking whether the size of the largest ones could be reduced a bit by adjusting matplotlib options in examples. In the _images/ folder,
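One matplotlib option that affects file size is the output resolution. The following is only a sketch with made-up data and a hypothetical helper `png_size`; it measures how the `dpi` argument of `savefig` changes the size of the resulting PNG, and says nothing about which value would suit the actual gallery.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np


def png_size(dpi):
    """Return the size in bytes of a sample scatter plot saved at the given dpi."""
    rng = np.random.RandomState(0)  # fixed seed so sizes are comparable
    points = rng.normal(size=(500, 2))

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(points[:, 0], points[:, 1], s=10)

    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi)
    plt.close(fig)
    return len(buffer.getvalue())


if __name__ == "__main__":
    for dpi in (50, 100, 200):
        print(dpi, png_size(dpi))
```

Lowering `dpi` or `figsize` in an example trades sharpness for smaller files, so any change would need a visual check of the rendered page.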