[Doc]: Update multiple category bar chart gallery examples · Issue #23465 · matplotlib/matplotlib · GitHub

Closed
scottshambaugh opened this issue Jul 22, 2022 · 16 comments

@scottshambaugh
Contributor
scottshambaugh commented Jul 22, 2022

Documentation Link

Applies to the following examples:

Problem

There are 2 issues here:

  1. The examples shown work well with plotting 2 categories, but there is no obvious way to extend them to more than that (especially for the grouped bar chart ones).
  2. There's an improvement to inclusivity to be made here with only 'men' and 'women' as the listed genders.

image

Suggested improvement

  • Update these gallery examples to extend easily to n categories.
  • Use an n=3 for the examples, with categories 'men', 'women', and 'other'.

For the grouped bar charts, this example handles the bar width for n categories well and can be adapted: https://matplotlib.org/devdocs/gallery/scales/log_bar.html#sphx-glr-gallery-scales-log-bar-py
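
One way the bar-width handling for n categories could look is sketched below. This is a minimal illustration, not the code of the linked gallery example: `group_offsets` and `plot_grouped` are hypothetical helper names, and the default `group_width=0.8` is an assumed value.

```python
def group_offsets(n_groups, group_width=0.8):
    """Return (bar_width, offsets) that centre n_groups bars on each tick."""
    bar_width = group_width / n_groups
    # Offsets are symmetric around 0, so the group stays centred on the tick.
    offsets = [(i - (n_groups - 1) / 2) * bar_width for i in range(n_groups)]
    return bar_width, offsets


def plot_grouped(categories, groups, group_width=0.8):
    """Draw grouped bars; `groups` maps a legend label to one value per category."""
    import matplotlib.pyplot as plt  # imported here; only needed for drawing

    bar_width, offsets = group_offsets(len(groups), group_width)
    positions = list(range(len(categories)))
    fig, ax = plt.subplots()
    # One ax.bar call per group, so adding a group needs no other changes.
    for offset, (label, values) in zip(offsets, groups.items()):
        ax.bar([x + offset for x in positions], values, bar_width, label=label)
    ax.set_xticks(positions)
    ax.set_xticklabels(categories)
    ax.legend()
    return fig, ax
```

Calling `plot_grouped(['x', 'y', 'z'], {'A': [1, 2, 3], 'B': [2, 1, 2], 'C': [3, 3, 1]})` with these arbitrary values draws three bars per tick. Adding a fourth entry to the dict narrows the bars automatically.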

@story645
Member
story645 commented Jul 22, 2022

So totally agree that expanding to 3 categories is a good way to show these generalize.

Use an n=3 for the examples, with categories 'men', 'women', and 'other'.

Since these examples use random/arbitrary data, I'd be far more comfortable using a different triplet - vegetable, mineral, animal? numpy, scipy, matplotlib? It's just that it's one thing if the example is visualizing a data set that bins folks into men/women/other, but I don't think we should be lumping together various somewhat disparate gender identities in service of an arbitrary example.

xref #23352

ETA: Penguins! The penguins data set is probably perfect for this https://allisonhorst.github.io/palmerpenguins/

@scottshambaugh
Contributor Author
scottshambaugh commented Jul 22, 2022

Good to see that was already being discussed! If the categories are being switched up, then n=2 is fine as long as it's easy to add additional ones; whoever implements this can choose an n that they think looks best.

@story645
Member

as long as we have apolitical categories

Eh, I think it's great to have examples in the gallery that reinforce the project values as stated in the code of conduct and mission statement. My objection to gender is solely that we're unnecessarily enforcing arbitrary binning and I'd love a queer friendly example that didn't do that.

@kostyafarber
Contributor
kostyafarber commented Nov 9, 2022

Would be interested in working on this. Is there any clarification on what the steps for this would be?

  • Is it just extending these charts to n=3

As I gather the gender specific examples have been replaced with the apt tea-coffee comparison. Would this mean adding another beverage... or including another triplet?


@story645
Member
story645 commented Nov 9, 2022

Adding another beverage could work, but @timhoffm raised the point that these examples would probably make more sense/be easier to follow if they were semantically meaningful.

My preference is that if these examples are reworked, then the underlying table being visualized should be very clear. I waffle between semantically meaningful and plot is purely self referential, kinda like the anatomy of matplotlib figure, but we don't really have many examples of the latter.

Eta: yeah I think expanding to 3 is fine - I think the goal is more to make it clear that the group # is arbitrary.

@kostyafarber
Contributor
kostyafarber commented Nov 10, 2022

Okay sounds good. So I guess what do we want to do going forward. Do I need to:

  • add a note somewhere that data is arbitrary?

  • expand to n=3

  • we agree on the type of third beverage (maybe juice?)

Or try to find a dataset that isn't arbitrary.

@story645
Member

Either option - 3rd beverage and semantically reasonable x/y/group or non-arbitrary dataset.

@kostyafarber
Contributor

Thanks @story645, I'll try to go for the non-arbitrary dataset (I liked your suggestion of the penguin dataset).

@kostyafarber
Contributor

What would be the best way to import this dataset? There's a repo for it on GitHub and also some packages to easily obtain it but we don't want to add extra dependencies. We could also just store the csv but not sure of this as an option?

@kostyafarber
Contributor

UPDATE:

Had a play around with the penguins data set, trying to think of a way to visualise the data in this example, as a starting point:

image

I was thinking of:

  • making the x-axis as the island on which the species is found
  • y-axis as flipper_length_mm for example
  • the stacked bar grouped by species (which there are n=3 of)

However I found that not all islands have all three species so we wouldn't have the same three groups in each stacked bar. Don't know if that's an issue since all these examples have two groups in each bar.

I understand the underlying data in these examples are arbitrary but not sure if we wanted the groups in each bar to be equal for demonstration purposes.

Also, there are only three islands, which means the # of bars will be considerably smaller, not sure if this is desired or not.

There would be some wrangling involved in the dataset which would clutter the example, don't know if we could move this somewhere as a utility function or something? (obvs can't have it inline as the dataset is somewhat large)

I guess some guidance on what to do:

  • How we want to grab the data before we display anything
  • How close do the new non arbitrary graphs have to be to the current ones?

Happy to put something together and tweak as we go.

@timhoffm
Member

How we want to grab the data before we display anything

You can have a few hard-coded lines of data like in the current example. Minimally, two rows are sufficient:

group_1 = [1, 3, 7, ...]
group_2 = [2, 1, 2, ...]

How close do the new non arbitrary graphs have to be to the current ones?

The example should illustrate the respective concept and its use as concisely as possible. As long as you achieve that (where it is possible to do so), you can change whatever you feel is necessary. For example, the error bars in the current example are definitely overkill for the first plot and should be removed.

@story645
Member

There would be some wrangling involved in the dataset which would clutter the example

So honestly what I'd prefer in the example is the post wrangled table cause I think showing the underlying table could help make it easier to parse how to stack/group the table. & then link out to the full dataset.

And that's also true if you stick w/ beverages and I think the original numbers are like that for ease but I'd actually like a null value in some places to illustrate that condition.

@scottshambaugh
Contributor Author

My intent with opening the ticket was just hoping for an example where the bars were built with a for loop, but I think there’s a lot of creative freedom in what you want to show!

@timhoffm
Member
timhoffm commented Nov 11, 2022

My intent with opening the ticket was just hoping for an example where the bars were built with a for loop, but I think there’s a lot of creative freedom in what you want to show!

That really depends on the actual example. Loops could make sense for the stacked and grouped bar examples. Though, I hope to get rid of their need soon (xref #24313).

For the bar label example, the first one could be as simple as:

import matplotlib.pyplot as plt
import numpy as np

meals = ['Breakfast', 'Lunch', 'Dinner']
coffee_customers = (27, 35, 12)
tea_customers = (25, 20, 33)

fig, ax = plt.subplots()

p1 = ax.bar(meals, coffee_customers, width=0.6, label='Coffee')
p2 = ax.bar(meals, tea_customers, width=0.6, bottom=coffee_customers, label='Tea')
ax.bar_label(p1, label_type='center')
ax.bar_label(p2, label_type='center')

ax.set_title('Number of customers')
ax.legend()

plt.show()

image

Note that it's rather by coincidence that I found a concise example close to the existing data context. This was not a boundary condition. The point is to make the example simple and focused on the topic in question.

BTW: Anybody is welcome to put this into a dedicated PR.
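
The stacked coffee/tea example above could be generalized to any number of groups with a loop, roughly as follows. This is a sketch only: `stack_bottoms` and `plot_stacked` are hypothetical names that appear nowhere in the thread, and the code assumes at least one series and equal-length series.

```python
def stack_bottoms(series):
    """For each series, return the heights already stacked beneath it."""
    running = [0] * len(series[0])  # assumes at least one series
    bottoms = []
    for values in series:
        bottoms.append(list(running))
        # Add this layer so the next series starts on top of it.
        running = [r + v for r, v in zip(running, values)]
    return bottoms


def plot_stacked(categories, groups, width=0.6):
    """Draw stacked bars; `groups` maps a legend label to one value per category."""
    import matplotlib.pyplot as plt  # imported here; only needed for drawing

    fig, ax = plt.subplots()
    bottoms = stack_bottoms(list(groups.values()))
    for (label, values), bottom in zip(groups.items(), bottoms):
        container = ax.bar(categories, values, width, bottom=bottom, label=label)
        ax.bar_label(container, label_type='center')
    ax.legend()
    return fig, ax
```

With `groups = {'Coffee': (27, 35, 12), 'Tea': (25, 20, 33)}` and the `meals` list from the example above, this should reproduce the same stacking. A third dict entry adds a third layer without further changes.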

@kostyafarber
Contributor
kostyafarber commented Nov 12, 2022

Came up with this as a mockup for the stacked bar chart (ideally I would add this to bar label example but I'm getting errors for trying to set labels on masked values).

import matplotlib.pyplot as plt
import numpy as np

islands = ['Biscoe', 'Dream', 'Torgersen']
adelie_means = (44, 56, 52)
gentoo_means = (124, 0, 0)
chinstrap_means = (0, 68, 0)
width = 0.5

fig, ax = plt.subplots()

p1 = ax.bar(islands, adelie_means, width, label="Adelie")
p2 = ax.bar(islands, gentoo_means, width, bottom=adelie_means, label="Gentoo")
# Stack Chinstrap on top of both lower segments, not just Adelie.
p3 = ax.bar(islands, chinstrap_means, width,
            bottom=np.add(adelie_means, gentoo_means), label="Chinstrap")

ax.set_title('Number of penguins by island')
ax.legend(loc='upper right')

plt.show()

Figure_1

In terms of adding a reference to the dataset in each of the examples, would it be better to do as a comment in the code or somewhere in the text description above the examples?

In terms of the reference itself, I've pulled in the dataset from https://github.com/mcnakhaee/palmerpenguins, but the original raw data was published in a paper here. Which do we think would be more useful for the user to have a reference to?

Finally, is it worth mentioning that I am doing some simple transformations behind the scenes (simple groupbys) which result in those hardcoded values (they are hardcoded so as not to clutter the examples)?

I'll apply whatever we decide to do here in the rest of the examples and open a PR.

(Aside: there is a useful column that identifies the gender of the penguin that would make sense in one of these graphs, but don't want to reintroduce gender into the examples, but in saying that don't think the connotations apply in the context of penguins? What are everyone's thoughts?)
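
On the bar-label problem mentioned at the top of this comment, one possible workaround is to pass explicit labels and leave them blank where a species is absent. This is a sketch that assumes zeros (or `None`) mark the absent segments, as in the mockup. `segment_labels` and `label_nonempty` are hypothetical helpers, and this has not been checked against the masked-array error described.

```python
def segment_labels(values, fmt="{:g}"):
    """Return one label per bar, using '' where the segment is empty."""
    # Zero and None both count as "no segment" under the assumption above.
    return [fmt.format(v) if v else "" for v in values]


def label_nonempty(ax, container, values, **kwargs):
    """Label a bar container, skipping bars whose value is zero or None."""
    return ax.bar_label(container, labels=segment_labels(values), **kwargs)
```

In the mockup this would be called as `label_nonempty(ax, p2, gentoo_means, label_type='center')`, which leaves the Dream and Torgersen segments unlabeled instead of printing `0`.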

@story645
Member
story645 commented Aug 1, 2023

My intent with opening the ticket was just hoping for an example where the bars were built with a for loop, but I think there’s a lot of creative freedom in what you want to show!

This now exists so closing this issue.

@story645 story645 closed this as completed Aug 1, 2023
@QuLogic QuLogic added this to the v3.7.0 milestone Aug 1, 2023