Add functionality to label individual bars with Axes.bar() by stefmolin · Pull Request #23525 · matplotlib/matplotlib · GitHub

Add functionality to label individual bars with Axes.bar() #23525

Merged: 11 commits into matplotlib:main, Aug 18, 2022

Conversation

stefmolin
Contributor
@stefmolin stefmolin commented Jul 30, 2022

PR Summary

Currently, if you need to label each bar in a plot, say for an animation, you have to loop over the bars in the bar container that Axes.bar() returns and call set_label() on each bar. I have an example here in a workshop I deliver. Compared with stackplot() (which has a labels argument for this), this can be a gotcha for newcomers. The docs show a label key as available on the Rectangle, but it doesn't have the expected effect of labeling the bars; rather, it labels the BarContainer:

>>> import matplotlib.pyplot as plt
>>> x = ["a", "b", "c"]
>>> y = [10, 20, 15]
>>> fig, ax = plt.subplots()
>>> bar_container = ax.barh(x, y, label=x)
>>> print([bar.get_label() for bar in bar_container])
['_nolegend_', '_nolegend_', '_nolegend_']
>>> bar_container.get_label()
"['a', 'b', 'c']"

This PR adds a labels argument to Axes.bar(), which makes it easy to label each bar and color them differently, so a legend can be created immediately after calling the bar()/barh() method.

x = ["a", "b", "c"]
y = [10, 20, 15]

fig, ax = plt.subplots()
_ = ax.barh(x, y, labels=x)
ax.legend()

Screen Shot 2022-07-30 at 6 09 23 PM

Default color behavior is preserved when labels isn't passed in:

x = ["a", "b", "c"]
y = [10, 20, 15]

fig, ax = plt.subplots()
_ = ax.barh(x, y)

Screen Shot 2022-07-30 at 6 12 38 PM

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

@jklymak
Member
jklymak commented Jul 30, 2022

Thanks for the PR. First, there is already a color kwarg for bars, so how does this interact with that? Second, there is a tick_label kwarg that seems to be what this PR is suggesting. Can you clarify how this is different?

https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.barh.html

@jklymak jklymak added the status: needs clarification Issues that need more information to resolve. label Jul 30, 2022
@stefmolin
Contributor Author
stefmolin commented Jul 30, 2022

This PR populates the label attribute on each of the bars in the bar container (i.e., to be able to call get_label() on the bar). The current behavior gives them all a label of _nolegend_:

>>> bar_container = ax.barh(x, y, label=x)
>>> print([bar.get_label() for bar in bar_container])
['_nolegend_', '_nolegend_', '_nolegend_']

When color isn't provided, passing in labels will now cycle through the colors:

x = ["a", "b", "c"]
y = [10, 20, 15]

fig, ax = plt.subplots()
_ = ax.barh(x, y, labels=["Apple", "Banana", "Cherry"])
ax.legend()

Screen Shot 2022-07-30 at 6 54 30 PM

When labels isn't provided, the colors behave exactly as they did before:

x = ["a", "b", "c"]
y = [10, 20, 15]

fig, ax = plt.subplots()
_ = ax.barh(x, y)

Screen Shot 2022-07-30 at 6 12 38 PM

If you pass in both:

fig, ax = plt.subplots()
bar_container = ax.barh(x, y, labels=["Apple", "Banana", "Cherry"], color=['blue', 'red', 'orange'])
ax.legend()

Screen Shot 2022-07-30 at 6 54 58 PM

@jklymak
Member
jklymak commented Jul 30, 2022

Thanks I see. Do people want a legend if the bars are already labeled via the ticks?

@stefmolin
Contributor Author
stefmolin commented Jul 30, 2022

My main use case was actually building animations. I use the get_label() to make sure I have the correct bar (e.g., this animation). The legend makes it easier to explain what is going on for the proposed changes.
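
A minimal sketch of that animation use case, assuming the loop-based workaround from the PR summary. The names `bars_by_label` and `update` are hypothetical and are not part of the PR or of Matplotlib; only `set_label()`, `get_label()`, `set_width()` and `get_width()` are existing Rectangle methods.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bar_container = ax.barh(["a", "b", "c"], [10, 20, 15])

# Workaround without this PR: label each Rectangle by hand.
for bar, name in zip(bar_container, ["Apple", "Banana", "Cherry"]):
    bar.set_label(name)

# Index the bars by label so a frame update can address a bar by name.
bars_by_label = {bar.get_label(): bar for bar in bar_container}


def update(new_widths):
    """Hypothetical per-frame callback: resize each named bar."""
    for name, width in new_widths.items():
        bars_by_label[name].set_width(width)


update({"Banana": 25})
banana_width = bars_by_label["Banana"].get_width()
```

In a real animation, a callback like `update` would be driven by something such as `matplotlib.animation.FuncAnimation`; that wiring is omitted here.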

@tacaswell
Member

I'm of two minds on this.

On one hand, I see how much nicer this is than having to do the loop outside, and I can totally see a use case for setting the legend and dropping the ticks / axis altogether. I also see the analogy to stackplot (even if it is a bit rough, because stackplot takes a sequence of sequences of scalars and bar only takes a sequence of scalars; a better analogy to stackplot would be extending bar to make stacked bar charts).

On the other hand I am worried about stacking yet more complexity into the public APIs!

I think that, in addition to colors, labels will need to deconflict with tick_labels (can you pass both? if you pass one is the other implied? do they have to match? do we need a way to ask for them to match?), the plain label (can you pass both labels and label? I can see arguments for both yes and no!), and the ax.bar_label method (which might need a way to ask the bars what their labels are now?).


Even if we do not take this, this is nice work. Thank you for a fully documented and tested PR out of the gate @stefmolin !


Ignore the linting error; #23527 will fix it.

@stefmolin
Contributor Author

I think that, in addition to colors, labels will need to deconflict with tick_labels (can you pass both? if you pass one is the other implied? do they have to match? do we need a way to ask for them to match?), the plain label (can you pass both labels and label? I can see arguments for both yes and no!), and the ax.bar_label method (which might need a way to ask the bars what their labels are now?).

Initially, I was trying to match the API of stackplot, but I definitely understand the concerns of making the API more complicated. For my use case, it would be perfectly acceptable for tick_labels to be used to label the bars.

My change to the bar colors was to make the legend in my examples make sense. So if we are more comfortable with just using the tick_labels already going on the axis to label the bars and not touching anything else, I'm happy to simplify the logic here 😄

@tacaswell
Member

If we promote tick_label to also label the bars I think that would break cases like:

import matplotlib.pyplot as plt

x = [1, 2, 3]
y1 = [1, 5, 7]
y2 = [3, 1, 6]

fig, ax = plt.subplots()
ax.bar(x, y1, label='G1', tick_label=['a', 'b', 'c'])
ax.bar(x, y2, bottom=y1, label='G2', tick_label=['a', 'b', 'c'])
ax.legend()

so

Maybe only do it if the overall bar does not have a label? Maybe make it opt-in like ax.bar(..., use_tick_label_as_bar_label=True) (but with a better name)?

@stefmolin
Contributor Author

Good point. Another option would be to prefix the individual bar labels with _nolegend_ and use namespacing like these for the blue ones in your example:

['_nolegend_:G1:a', '_nolegend_:G1:b', '_nolegend_:G1:c']

so essentially naming everything in the case of stacked bars as _nolegend_:{label}:{tick_label}.

That way they don't show up in the legend, and at the same time, they have unique labels.

@timhoffm
Member

I feel that making bars individually configurable was an overreach of the API of bar(), which we should not have done in the first place. That would have been better as a separate function.
But now that we are down the road, we can carefully expand - though I will not give a free-for-all ticket on individual customization.

The minimal (and possibly reasonable) extension is label supporting a list of labels (of matching length) that are assigned to the individual bars. Period.
I oppose auto-switching to color-cycling. bar() is primarily intended for same-style bars. Any bar-individual customization should be explicit.

I'm very sceptical on mixing with tick_labels. These are conceptually different things. Mixing them complicates things and I don't see a benefit. IMHO users rarely need tick_labels and a legend. And if they do, they can pass the list to both parameters.
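
As a rough sketch of "pass the list to both parameters", assuming a Matplotlib version that includes this change (3.6 or later, going by the milestone on this PR) so that label accepts a list. The data and the variable `names` are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

names = ["a", "b", "c"]

fig, ax = plt.subplots()
# The same list is passed twice: once for the axis ticks, once for the
# per-bar labels that feed the legend. Colors are set explicitly.
ax.bar(
    [1, 2, 3],
    [2, 1, 3],
    tick_label=names,
    label=names,
    color=["tab:orange", "tab:blue", "tab:green"],
)

handles, legend_labels = ax.get_legend_handles_labels()
ax.legend()
```

Here the tick labels and the legend entries stay independent; nothing links them other than the caller choosing to pass the same list.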

@timhoffm
Member

Another option would be to prefix the individual bar labels with _nolegend_ and use namespacing

We guarantee that labels starting with an underscore are not drawn in the legend:

Specific lines can be excluded from the automatic legend element selection by defining a label starting with an underscore.

@stefmolin If your only concern is giving unique IDs to bars, you can define any label you want starting with an underscore for this. With the list-of-labels API suggested above, you can easily do that - and decide yourself what your IDs look like.
I'm not clear if you propose the namespacing as a concept or automatism in matplotlib, but just to clarify, I don't think we want or need that complexity.
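
A small sketch of the underscore-ID idea, assuming the list-of-labels form of label described above (available in Matplotlib 3.6 or later). The `_id_` prefix is an arbitrary choice for this example, not a Matplotlib convention.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.bar(["a", "b", "c"], [2, 1, 3], label=["_id_a", "_id_b", "_id_c"])

# Each bar carries its own ID ...
ids = [bar.get_label() for bar in bars]

# ... but labels starting with an underscore are skipped when the
# legend entries are collected automatically.
handles, legend_labels = ax.get_legend_handles_labels()
```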

@stefmolin
Contributor Author

@timhoffm - That logic was if we were going to use the tick_labels to automatically label the bars. I agree that there is no need to impose any such logic on Matplotlib if we pass a list to label. I'll update my implementation to do just that.

@stefmolin
Contributor Author

Here are some examples of the new implementation. Note that colors are no longer altered.

  1. Passing a list of labels:
>>> import matplotlib.pyplot as plt
>>>
>>> fig, ax = plt.subplots()
>>> bar_container = ax.barh(
...     ["a", "b", "c"],
...     [10, 20, 15],
...     label=["Apple", "Banana", "Cherry"]
... )
>>> [bar.get_label() for bar in bar_container]
['Apple', 'Banana', 'Cherry']
  2. Plotting a single bar:
>>> import matplotlib.pyplot as plt
>>>
>>> fig, ax = plt.subplots()
>>> bar_container = ax.barh(
...     "a",
...     10,
...     label="Apple"
... )
>>> [bar.get_label() for bar in bar_container]
['Apple']
  3. Not passing in labels:
>>> import matplotlib.pyplot as plt
>>>
>>> fig, ax = plt.subplots()
>>> bar_container = ax.barh(
...     ["a", "b", "c"],
...     [10, 20, 15]
... )
>>> [bar.get_label() for bar in bar_container]
['_nolegend_', '_nolegend_', '_nolegend_']
  4. Plotting a stacked bar plot:
>>> import matplotlib.pyplot as plt
>>> import itertools
>>>
>>> x = [1, 2, 3]
>>> y1 = [1, 5, 7]
>>> y2 = [3, 1, 6]
>>> 
>>> fig, ax = plt.subplots()
>>> bar_container1 = ax.bar(
...     x, y1, label='G1', tick_label=['a', 'b', 'c']
... )
>>> bar_container2 = ax.bar(
...     x, y2, bottom=y1, label='G2', tick_label=['a', 'b', 'c']
... )
>>> [
...     bar.get_label()
...     for bar in itertools.chain(bar_container1, bar_container2)
... ]
['_nolegend_',
 '_nolegend_',
 '_nolegend_',
 '_nolegend_',
 '_nolegend_',
 '_nolegend_']

@timhoffm timhoffm removed the status: needs clarification Issues that need more information to resolve. label Aug 17, 2022
Member
@timhoffm timhoffm left a comment

Since the label behavior grew more complex now, it deserves explicit mention
in the Other Parameters section of the docstring (preferably right below tick_label).

I suggest something like:

label : str or list of str, optional
    A single label is attached to the resulting BarContainer as a
    label for the whole dataset.
    If a list is given, it must be the same length as *x* and
    labels the individual bars. For example this may be used with
    lists of *color*.
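
To illustrate the two cases in the suggested docstring, here is a hedged sketch, assuming a Matplotlib version with this change merged (3.6 or later). The variable names and data are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

x = ["a", "b", "c"]
y = [10, 20, 15]

fig, (ax1, ax2) = plt.subplots(1, 2)

# Case 1: a single string labels the BarContainer (the whole dataset).
whole = ax1.bar(x, y, label="fruit")
whole_label = whole.get_label()
whole_bar_labels = [bar.get_label() for bar in whole]

# Case 2: a list (same length as x) labels the individual bars,
# here combined with an explicit list of colors.
per_bar = ax2.bar(
    x, y,
    label=["Apple", "Banana", "Cherry"],
    color=["tab:red", "tab:olive", "tab:purple"],
)
per_bar_label = per_bar.get_label()
per_bar_labels = [bar.get_label() for bar in per_bar]
```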

@story645
Member
story645 commented Aug 18, 2022

Definitely support this feature, but curious about the behavior where multiple bars that are styled the same way share a label? My bias is that a feature like this could/would be used in conjunction w/ tick labels to do some sort of grouping

fig, ax = plt.subplots()

x = ['a', 'b', 'c']
y = [2, 1, 3]
l = ['A', 'B', 'A']
c = ['tab:orange', 'tab:blue', 'tab:orange']

ax.bar(x, y, label=l, color=c)

ax.legend()

I tried to pull this branch and test against it, but I could be wrong, and this is what I got:
image

and I think the optimal behavior would be something like:
image

but I wonder about implementation complexity - I think it's something like check which bars have the same label and the same vectorized properties (color, edgecolor, linewidth) and then only label the first bar? Would there be a problem w/ making the duplicates no-legend?

@timhoffm
Member

The fourth element (['A', 'B', 'A']) in @story645's first plot is indeed a bug. label should be mapped either to the individual patches or to the BarContainer itself, but not both.

I advise against trying to automatically filter duplicates. That's tedious due to normalization. It's also a bit magical: the legend entries are associated with the bars, so if you filter duplicates out, technically some bars don't have a label, e.g. 'a' would be associated with 'A' but 'c' wouldn't - it just looks the same. You could even break that by re-styling 'a' now. Then the legend would follow, but 'c' would not.
Instead, you could explicitly use ['A', 'B', '_nolegend_'], and then you know what is happening.
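
A short sketch of that explicit approach, reusing the data from the example above and assuming the list form of label (Matplotlib 3.6 or later). The variable `legend_texts` exists only to inspect the result.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

x = ["a", "b", "c"]
y = [2, 1, 3]
# The third bar repeats the style of the first, so it is explicitly
# kept out of the legend instead of relying on duplicate filtering.
labels = ["A", "B", "_nolegend_"]
colors = ["tab:orange", "tab:blue", "tab:orange"]

ax.bar(x, y, label=labels, color=colors)

legend = ax.legend()
legend_texts = [text.get_text() for text in legend.get_texts()]
```

With this choice the legend entry 'A' is tied to bar 'a' only; bar 'c' merely shares its color.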

@story645
Member
story645 commented Aug 18, 2022

I advise against trying to automatically filter duplicates.

I won't block if labels are repeated in the legend, but I think then this choice has to be clearly documented as I expect it to be a follow up feature request.

Instead, you could explicitly use ['A', 'B', '_nolegend_'], and then you know what is happening.

I'd be ok w/ this being the example of how to use this keyword to do grouping, but I think it'd be worth either expanding one of the gallery examples or adding a new one discussing this.

Also, it seems like at least one image test wouldn't hurt.

@timhoffm
Member

think then this choice has to be clearly documented as I expect it to be a follow up feature request.

I'm fine with documenting that the behavior for repeated labels is not defined and may change in the future.

Also, it seems like at least one image test wouldn't hurt.

For now, the expected behavior is exactly defined by testing the labels of the individual bars and the label of the BarContainer: "Where does the information go?". Every Artist with a label shows up in the legend. There's no additional magic here that needs testing as an image.
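
One way to check "where does the information go?" without an image comparison, sketched under the assumption of Matplotlib 3.6 or later; this is not the test added in the PR, only an illustration using `Axes.get_legend_handles_labels()`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.container import BarContainer
from matplotlib.patches import Rectangle

x = ["a", "b", "c"]
y = [2, 1, 3]

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.bar(x, y, label=["A", "B", "C"])  # labels go to the patches
ax2.bar(x, y, label="G1")             # label goes to the container

# The legend picks up whichever artists carry a visible label.
list_handles, list_labels = ax1.get_legend_handles_labels()
str_handles, str_labels = ax2.get_legend_handles_labels()
```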

@stefmolin
Contributor Author

I addressed the comments and fixed that bug:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

x = ['a', 'b', 'c']
y = [2, 1, 3]
l = ['A', 'B', 'A']
c = ['tab:orange', 'tab:blue', 'tab:orange']

ax.bar(x, y, label=l, color=c)

ax.legend()

Screen Shot 2022-08-18 at 8 43 33 AM

Instead, you could explicitly use ['A', 'B', '_nolegend_'], and then you know what is happening.

I'd be ok w/ this being the example of how to use this keyword to do grouping, but I think it'd be worth either expanding one of the gallery examples or adding a new one discussing this.

Can you provide some additional information on this?

@story645
Member
story645 commented Aug 18, 2022

There's no additional magic here that needs testing as an image.

Yeah I didn't quite grok how to test the double labeling issue, but I like @stefmolin adding it to the code tests better than an image test.

Can you provide some additional information on this?

I think this new keyword argument could be more discoverable with an addition to the gallery in the lines-bars-and-markers section showing 1) the use of this keyword 2) the use of this keyword with a mix of labels and no legend. The latter could also show off the list of colors, which is another keyword we don't have an explicit example for. Granted I can also spin this request off into a follow-up issue so this is another non-blocking request.

@tacaswell
Member

@stefmolin could you rebase this to squash out the adding / removed API change note?

@tacaswell tacaswell added this to the v3.6.0 milestone Aug 18, 2022
Member
@QuLogic QuLogic left a comment

Minus the remaining comments.

Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>
@stefmolin
Contributor Author

@tacaswell - I rebased to remove those changes.

I also incorporated the change to the docstring as suggested. Linting is failing from the latest changes on master after the rebase.

I think this new keyword argument could be more discoverable with an addition to the gallery in the lines-bars-and-markers section showing 1) the use of this keyword 2) the use of this keyword with a mix of labels and no legend. The latter could also show off the list of colors, which is another keyword we don't have an explicit example for. Granted I can also spin this request off into a follow-up issue so this is another non-blocking request.

@story645 - Should I move forward with this in a separate PR?

@story645
Member

@story645 - Should I move forward with this in a separate PR?

Yes, that would be awesome!

@QuLogic QuLogic merged commit e68c1e8 into matplotlib:main Aug 18, 2022
@QuLogic
Member
QuLogic commented Aug 18, 2022

I squash-merged, as I don't think we need the history of no-longer-implemented functionality.

@stefmolin stefmolin deleted the bar-labels branch August 18, 2022 23:08
@stefmolin stefmolin mentioned this pull request Aug 19, 2022
1 task