Add alt text to scikit-learn documentation · Issue #21214 · scikit-learn/scikit-learn · GitHub

Add alt text to scikit-learn documentation #21214


Open
reshamas opened this issue Oct 1, 2021 · 16 comments

@reshamas
Member
reshamas commented Oct 1, 2021

Describe the issue linked to the documentation

Adding alt text to images permits visually impaired users to have greater access.

Suggest a potential alternative/fix

About Alt Text

Alt text (alternative text), also known as "alt attributes" or "alt descriptions" (or, technically incorrectly, as "alt tags"), is used within HTML code to describe the appearance and function of an image on a page.

Alt text uses:

  1. Adding alternative text to photos is first and foremost a principle of web accessibility. Visually impaired users using screen readers will be read an alt attribute to better understand an on-page image.

  2. Alt text will be displayed in place of an image if an image file cannot be loaded.

  3. Alt text provides better image context/descriptions to search engine crawlers, helping them to index an image properly.

Reference

Questions

  1. For scikit-learn, what is the maximum line length for writing alt text descriptions for images?
  2. Is there a way to do a grep of the library and see how many images exist in the documentation?
  3. Can you confirm that the images are static? They are produced from code, within the documentation. Are images always the same that are produced?
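
One possible way to approach question 2 is a small script instead of a raw grep. The sketch below is only an illustration: it assumes the images of interest are inserted through reStructuredText `image` or `figure` directives in `.rst` files, and the function names and the `doc` directory in the usage note are hypothetical. Images generated by gallery examples are not written as directives in the hand-written sources, so they would not be counted this way.

```python
import re
from pathlib import Path

# Matches ".. image:: path", ".. figure:: path" and substitution
# definitions such as ".. |logo| image:: path".
DIRECTIVE_RE = re.compile(
    r"^[ \t]*\.\.[ \t]+(?:\|[^|]+\|[ \t]+)?(?:image|figure)::[ \t]*(\S+)",
    re.MULTILINE,
)


def count_directives_in_text(rst_text):
    """Count image and figure directives in one reStructuredText string."""
    return len(DIRECTIVE_RE.findall(rst_text))


def count_image_directives(doc_root):
    """Map each .rst file under doc_root (hypothetical path) to its directive count.

    Files without any image or figure directive are left out of the result.
    """
    root = Path(doc_root)
    counts = {}
    for rst_file in sorted(root.rglob("*.rst")):
        text = rst_file.read_text(encoding="utf-8", errors="replace")
        n = count_directives_in_text(text)
        if n:
            counts[str(rst_file.relative_to(root))] = n
    return counts
```

Calling `sum(count_image_directives("doc").values())` would then give a rough total for a documentation folder named `doc`, under the assumptions above.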
@adrinjalali
Member

Images are not always the same. They change when an underlying model or method used to generate the data on the image changes. But they mostly stay the same.

The maximum line length would be the same as elsewhere in the code, i.e., 88 characters.

I think this would be a nice improvement.

@thomasjpfan
Member

Images are not always the same. They change when an underlying model or method used to generate the data on the image changes. But they mostly stay the same.

During code review, we need to keep in mind that the alt text needs to be updated if the image changes. I agree most of the time the images are the same.

Another concern is whether alt text increases the barrier to entry. If a contributor is at the point of adding a new image, then it should not be too difficult to describe it. I think the hard part would be "What makes a good alt text", but that can be improved over time.

Overall, I think having alt text is a net improvement.

@reshamas
Member Author
reshamas commented Oct 1, 2021

We were discussing this challenge last night, with @isabela-pf, @MarsBarLee, @InessaPawson, and others.

  • How descriptive should we be?
  • How much domain knowledge should be relevant?
  • Is it better to describe the shapes, or the interpretation of the graph?
  • Who is the audience? Is it a subject matter expert on that topic, or someone who is objectively looking at a visualization for the first time, without specific context?

Example 1

This is my description of a heat map; it assumes the user knows what a heat map is.

The graph is a square heat map, 5x5, with axes from 0 to 250. The darkest 5 squares of the heat map run diagonally from top left to bottom right.

Example 2

Here is another commit describing clustering.

Three distinct clusters created using affinity propagation.

We are curious to see how the core-devs review these alt text descriptions and whether they agree with your primary interpretation of the visualizations.

@reshamas
Member Author
reshamas commented Oct 1, 2021

Also, this is for the next phase. Are the colors currently used in creating visualizations color-blind friendly?

@thomasjpfan
Member
thomasjpfan commented Oct 1, 2021

Also, this is for the next phase. Are the colors currently used in creating visualizations color-blind friendly?

There has been work to update our visualizations to be color-blind friendly: #5435 (The issue links to PRs that update visualizations)

@reshamas
Member Author
reshamas commented Oct 1, 2021

Also, this is for the next phase. Are the colors currently used in creating visualizations color-blind friendly?

There has been work to update our visualizations to be color-blind friendly: #5435

I think that, in addition to color specs for, say, line plots, line styles (dashed, dotted) can also be considered.

@ogrisel
Member
ogrisel commented Oct 4, 2021

In recent versions of sphinx-gallery the example thumbnails should now come with the title of the figure as alt-text:

sphinx-gallery/sphinx-gallery#668

This is better than nothing but not necessarily the best text description of the content of the image. In the context of the examples given in #21214 (comment) this should approximately match example 2, that is, give a high-level interpretation of the content of the figure but not necessarily describe the actual content of the plot in enough detail.

@ogrisel
Member
ogrisel commented Oct 4, 2021

BTW sorry @lucyleeow for not giving you feedback on that PR, I missed the notification...

@lucyleeow
Member

No worries, I'm sure you get many notifications!

@ogrisel
Member
ogrisel commented Oct 4, 2021

For instance for:

https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_plusplus.html#sphx-glr-auto-examples-cluster-plot-kmeans-plusplus-py

The alt text is "K-Means++ Initialization" but it does not describe the actual content well enough for visually impaired readers to draw the intended conclusions from the figure. A more extensive description could be "Scatter plot with 4 well separated groups of data points in a 2D space. In each group of points, a candidate point has been selected as initial centroid by k-means++. While not lying at the center of the groups, the candidates found by k-means++ span all groups yielding a reasonably robust initialization for k-means on this data." But then if we go deeper into this kind of analysis, maybe this should be in the text of the example so that it is visible to everybody, not just people reading the alt-text contents.

In particular, I am not sure how long the alt-text should be in general. I suspect that one sentence would be good in many cases, but it may be helpful to write a small paragraph to highlight the interesting aspects of the figure that are not already stated in the surrounding text.

@ogrisel
Member
ogrisel commented Oct 4, 2021

Note that some (many?) examples have plots without a title, which means that there is no meaningful alt text for the gallery thumbnails of those examples. Adding the missing titles would help both people who read the alt-text and people who "read" the image itself.

Also: when an image generated by an example is manually reused to illustrate a paragraph in the user guide, the title of the plot (if present) is not automatically reused for the user guide figure (instead the filename is used, which is not very helpful). See for instance the first image on this page:

https://scikit-learn.org/stable/_sources/modules/clustering.rst.txt

@reshamas
Member Author
reshamas commented Oct 4, 2021

It seems like there is a lot of work that can be done here:

  1. Add titles to visualizations where they are missing.
  2. Add (more verbose) descriptions / interpretations to visualizations which will be accessible to all users.
  3. Add (brief) visualization descriptions/interpretations to alt text.

@lucyleeow
Member

I think 2 & 3 may require added functionality to Sphinx Gallery? Let me know if you have something in mind, I am happy to add the functionality or have a think about how to allow this.

@ogrisel
Member
ogrisel commented Oct 7, 2021

As I understand it, 2 is about improving the content of the examples themselves to make sure that the conclusions drawn from the plots in the main text of the example are made explicit enough to be understandable in the context of either watching the graphical plot itself or only the alt-text alone. So no change required in sphinx-gallery.

3 is primarily about adding alt-text (not necessarily based on the title of the figure), to the alt attribute of the figure sphinx directives in the restructured text in sections of the user guide (outside of the gallery). This could be done manually.
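
To make the manual pass over the user guide easier to plan, one could first list the directives that still lack an alt option. The following is a rough sketch under stated assumptions: it only looks at `image` and `figure` directives, it treats the indented `:name: value` lines directly below a directive as its options, and the function name `find_missing_alt` is hypothetical. It is a line-based heuristic, not a real reStructuredText parser, so option values that continue over several lines are only partly handled.

```python
import re

DIRECTIVE_START = re.compile(
    r"^([ \t]*)\.\.[ \t]+(?:\|[^|]+\|[ \t]+)?(?:image|figure)::[ \t]*(\S+)"
)


def find_missing_alt(rst_text):
    """Return (line_number, target) for directives without a non-empty :alt: option."""
    lines = rst_text.splitlines()
    missing = []
    i = 0
    while i < len(lines):
        match = DIRECTIVE_START.match(lines[i])
        if not match:
            i += 1
            continue
        indent = len(match.group(1))
        has_alt = False
        j = i + 1
        # Walk the option block: indented lines that start with ":".
        while j < len(lines):
            line = lines[j]
            stripped = line.strip()
            line_indent = len(line) - len(line.lstrip())
            if not stripped or line_indent <= indent or not stripped.startswith(":"):
                break
            # An ":alt:" option only counts if it carries some text.
            if re.match(r":alt:\s*\S", stripped):
                has_alt = True
            j += 1
        if not has_alt:
            missing.append((i + 1, match.group(2)))
        i = j
    return missing
```

The returned line numbers are 1-based, so they can be pasted into a review comment or an editor "go to line" prompt.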

For the figures directly displayed in the rendering of the sphinx gallery itself, it could be interesting to add the possibility to write longer alt-text independently from the title of the figure.

...

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-5, 5, 100)
plt.plot(x, x ** 2)
plt.title("Quadratic function")
# alt-text: A symmetric parabolic curve with a minimum at zero that represents
# a 1d quadratic function

...

Maybe sphinx-gallery could also provide a new sphinx directive to make it easy to insert a figure in the main text of the documentation from an image generated by the gallery that would automatically reuse the accompanying alt-text (either from the matplotlib figure title or from a structured Python comment).
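
To illustrate what reading such a structured comment might involve, here is a minimal sketch. It assumes the `# alt-text:` marker proposed in this comment, which is a suggestion from this thread and not a documented sphinx-gallery feature; the function name `extract_alt_text` is hypothetical. Comment lines that directly follow the marker are treated as a continuation of the same description.

```python
ALT_MARKER = "# alt-text:"  # hypothetical marker, taken from the proposal above


def extract_alt_text(source):
    """Collect the text of every '# alt-text:' comment block in example source code."""
    alt_texts = []
    current = None  # parts of the block being read, or None outside a block

    def flush():
        # Store the finished block as one space-joined string.
        if current is not None:
            alt_texts.append(" ".join(part for part in current if part))

    for raw_line in source.splitlines():
        line = raw_line.strip()
        if line.startswith(ALT_MARKER):
            flush()
            current = [line[len(ALT_MARKER):].strip()]
        elif current is not None and line.startswith("#"):
            # A plain comment line right after the marker continues the text.
            current.append(line.lstrip("#").strip())
        else:
            flush()
            current = None
    flush()
    return alt_texts
```

Applied to the quadratic-function snippet above, this would return a single string that joins the two comment lines into one sentence.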

It would also probably be useful to add an option in sphinx-gallery to report the list of examples with figures that lack a title, maybe via warnings that could be enabled or disabled in the sphinx configuration file for instance.
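
Until such an option exists, a rough report can be approximated outside of sphinx-gallery with a static check of the example sources. The sketch below is an assumption-laden heuristic, not a sphinx-gallery feature: it only checks whether a script calls `title`, `set_title`, or `suptitle` on any object, or `set(title=...)`, and the helper names are hypothetical. A script with several figures where only one has a title would still pass.

```python
import ast

TITLE_METHODS = {"title", "set_title", "suptitle"}


def sets_a_title(source):
    """Return True if the source contains a call that looks like it sets a plot title."""
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if node.func.attr in TITLE_METHODS:
            return True
        # Also accept the keyword form, e.g. ax.set(title="...").
        if node.func.attr == "set" and any(kw.arg == "title" for kw in node.keywords):
            return True
    return False


def examples_without_title(sources):
    """Given {example_name: source_code}, return the sorted names with no title call."""
    return sorted(name for name, src in sources.items() if not sets_a_title(src))
```

Because the check parses the code without running it, it needs neither matplotlib nor the datasets used by the examples.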

@isabela-pf

I'm here to respond to some of the thoughts listed above. I'm always happy to see an active discussion around these topics, as we have far fewer resources to draw from when it comes to writing alt text for scientific diagrams specifically.

Images are not always the same. They change when an underlying model or method used to generate the data on the image changes. But they mostly stay the same.

Responding to @adrinjalali, this is good to know! We had a few contributors to the alt text mini sprint mention the same thing when we were talking.

A key part of writing helpful alt text is understanding the role and information an image provides in its surrounding context. This means that the same image used in different places might benefit from different alt text depending what it's meant to illustrate in that instance. Or, in the case that you are talking about, plots with differing content may be perfectly fine with the same alt text if they still serve the same role in the documentation.

I'm not personally a user of scikit-learn (or anything similar), but when I was reading the docs my understanding is that many of the images are examples of a process described in the preceding paragraph. The images aren't usually adding new information (like a step-by-step guide on how to use the process) or relying on the reader to understand each point on the plot. So unless the type of plot or its axes are changing, I think this might be a less critical problem for this project.

(I had a similar discussion with people about variable content in the numpy-tutorials repo when we worked on alt text there.)

Another concern is whether alt text increases the barrier to entry. If a contributor is at the point of adding a new image, then it should not be too difficult to describe it. I think the hard part would be "What makes a good alt text", but that can be improved over time.

Responding to @thomasjpfan, I agree on making it easy for contributors. Some description is better than none (with none, usually just the name of the image is read), so if I were the one reviewing PRs I'd be looking for a non-empty alt attribute as the baseline.

As for "what makes good alt text," I have some resources that might help. The main resource I've found for plots (with the help of @MarsBarLee) is the Diagram Center's checklist, which asks for the type of graph, axes, and points. Personally, I've found that we are usually dealing with far too many points to reasonably list out in the alt attribute, so I take this as a chance to describe the trend of the plot or any other defining features relevant to the text that surrounds it.

I also put together a guide collecting resources and guidelines for different types of images that we've used to help people new to alt text (ignoring the few Jupyter-specific bits).

And this is the checklist we used during the alt text mini-sprint:

The alt text has
- [ ] Correct spelling and no typos
- [ ] Periods and commas where relevant
- [ ] No more than three short, complete sentences
- [ ] A logical way of fitting in the rest of the documentation
- [ ] Consistent text and/or descriptions for the same elements in different images
- [ ] A description of any text in the image
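
A few items on this checklist could be pre-checked automatically before a human review. The sketch below is only an illustration with a hypothetical function name: it covers the non-empty baseline, the three-sentence limit, and closing punctuation, and it assumes that sentences end with `.`, `!`, or `?` followed by whitespace. Abbreviations such as "e.g." would be miscounted, and spelling, consistency, and fit with the surrounding text still need a person.

```python
import re

MAX_SENTENCES = 3  # from the checklist: no more than three short sentences


def check_alt_text(alt_text):
    """Return a list of problems; an empty list means the automated checks passed."""
    text = alt_text.strip()
    if not text:
        return ["alt text is empty"]
    problems = []
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    if len(sentences) > MAX_SENTENCES:
        problems.append(
            f"alt text has {len(sentences)} sentences (more than {MAX_SENTENCES})"
        )
    if text[-1] not in ".!?":
        problems.append("alt text does not end with sentence punctuation")
    return problems
```

For example, a bare figure title without a final period would be flagged for missing punctuation, while a one-sentence description ending in a period would pass.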

I hope that helps! Let me know if you have any other questions or if there are other ways I can help.

@reshamas
Member Author

I'm adding some resources from a presentation I attended by Wandke Consulting.

[Images: alt-text-tips, hyperlinks1, hyperlinks2]


6 participants