[Doc]: Update multiple category bar chart gallery examples #23465

scottshambaugh · 2022-07-22T02:11:12Z

Documentation Link

Applies to the following examples:

Problem

There are 2 issues here:

The examples shown work well with plotting 2 categories, but there is no obvious way to extend them to more than that (especially for the grouped bar chart ones).
There's an improvement to inclusivity to be made here with only 'men' and 'women' as the listed genders.

Suggested improvement

Update these gallery examples to extend easily to n categories.
Use an n=3 for the examples, with categories 'men', 'women', and 'other'.

For the grouped bar charts, this example handles the bar width for n categories well and can be adapted: https://matplotlib.org/devdocs/gallery/scales/log_bar.html#sphx-glr-gallery-scales-log-bar-py
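As a rough sketch of what "extend easily to n categories" could look like for a grouped bar chart: the bar width and per-series offset are derived from the number of series, so adding a series needs no other changes. The helper name `grouped_bars`, its `total_width` parameter, and all data values are invented here for illustration; this is not the code from the linked log_bar example.

```python
import matplotlib.pyplot as plt
import numpy as np


def grouped_bars(ax, group_labels, series, total_width=0.8):
    """Draw one bar per series at each group position, side by side."""
    n = len(series)
    bar_width = total_width / n        # split each group's slot evenly
    x = np.arange(len(group_labels))   # one integer position per group
    containers = []
    for i, (name, values) in enumerate(series.items()):
        # Shift bar i so the n bars of a group are centred on the tick.
        offset = (i - (n - 1) / 2) * bar_width
        containers.append(ax.bar(x + offset, values, bar_width, label=name))
    ax.set_xticks(x)
    ax.set_xticklabels(group_labels)
    return containers


# Placeholder values, invented for illustration only.
series = {
    'Series A': [3, 5, 2, 4],
    'Series B': [4, 1, 6, 2],
    'Series C': [2, 3, 3, 5],
}
fig, ax = plt.subplots()
containers = grouped_bars(ax, ['G1', 'G2', 'G3', 'G4'], series)
ax.legend()
```

Adding a fourth entry to `series` narrows every bar automatically; call `plt.show()` to display the figure.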

story645 · 2022-07-22T02:27:42Z

So totally agree that expanding to 3 categories is a good way to show these generalize.

Use an n=3 for the examples, with categories 'men', 'women', and 'other'.

Since these examples use random/arbitrary data, I'd be far more comfortable using a different triplet - vegetable, mineral, animal? numpy, scipy, matplotlib? It's just that it's one thing if the example is visualizing a data set that bins folks into men/women/other, but I don't think we should be lumping together various somewhat disparate gender identities in service of an arbitrary example.

xref #23352

ETA: Penguins! The penguins data set is probably perfect for this https://allisonhorst.github.io/palmerpenguins/

scottshambaugh · 2022-07-22T02:48:20Z

Good to see that was already being discussed! If the categories are being switched up, then n=2 is fine as long as it's easy to add additional ones; whoever implements this can choose an n that they think looks best.

story645 · 2022-07-22T03:13:27Z

as long as we have apolitical categories

Eh, I think it's great to have examples in the gallery that reinforce the project values as stated in the code of conduct and mission statement. My objection to gender is solely that we're unnecessarily enforcing arbitrary binning and I'd love a queer friendly example that didn't do that.

kostyafarber · 2022-11-09T13:34:29Z

Would be interested in working on this. Is there any clarification on what the steps for this would be?

Is it just extending these charts to n=3?

As I gather the gender specific examples have been replaced with the apt tea-coffee comparison. Would this mean adding another beverage... or including another triplet?

story645 · 2022-11-09T15:34:08Z

Adding another beverage could work, but @timhoffm raised the point that these examples would probably make more sense/be easier to follow if they were semantically meaningful.

My preference is that if these examples are reworked, then the underlying table being visualized should be very clear. I waffle between semantically meaningful and plot is purely self referential, kinda like the anatomy of matplotlib figure, but we don't really have many examples of the latter.

Eta: yeah I think expanding to 3 is fine - I think the goal is more to make it clear that the group # is arbitrary.

kostyafarber · 2022-11-10T08:24:31Z

Okay, sounds good. So I guess the question is what we want to do going forward. Do I need to:

add a note somewhere that the data is arbitrary?
expand to n=3?
agree on the type of third beverage (maybe juice)?

Or try to find a dataset that isn't arbitrary?

story645 · 2022-11-11T08:03:22Z

Either option - 3rd beverage and semantically reasonable x/y/group or non-arbitrary dataset.

kostyafarber · 2022-11-11T08:27:42Z

Thanks @story645, I'll try to go for the non-arbitrary dataset (I liked your suggestion of the penguin dataset).

kostyafarber · 2022-11-11T08:55:22Z

What would be the best way to import this dataset? There's a repo for it on GitHub and also some packages to easily obtain it but we don't want to add extra dependencies. We could also just store the csv but not sure of this as an option?

kostyafarber · 2022-11-11T10:52:26Z

UPDATE:

Had a play around with the penguins data set, trying to think of a way to visualise the data in this example, as a starting point:

I was thinking of:

making the x-axis as the island on which the species is found
y-axis as flipper_length_mm for example
the stacked bar grouped by species (which there are n=3 of)

However I found that not all islands have all three species so we wouldn't have the same three groups in each stacked bar. Don't know if that's an issue since all these examples have two groups in each bar.

I understand the underlying data in these examples are arbitrary but not sure if we wanted the groups in each bar to be equal for demonstration purposes.

Also, there are only three islands, which means the # of bars will be considerably smaller, not sure if this is desired or not.

There would be some wrangling involved in the dataset which would clutter the example, don't know if we could move this somewhere as a utility function or something? (obvs can't have it inline as the dataset is somewhat large)

I guess some guidance on what to do:

How we want to grab the data before we display anything
How close do the new non arbitrary graphs have to be to the current ones?

Happy to put something together and tweak as we go.

timhoffm · 2022-11-11T16:16:30Z

How we want to grab the data before we display anything

You can have a few hard-coded lines of data like in the current example. Minimally, two rows are sufficient:

group_1 = [1, 3, 7, ...]
group_2 = [2, 1, 2, ...]

How close do the new non arbitrary graphs have to be to the current ones?

The example should illustrate the respective concept and its use as concisely as possible. As long as you achieve that (and it is possible to do that) you can change whatever you feel is necessary. For example, the error bars in the current example are definitely overkill for the first plot and should be removed.

story645 · 2022-11-11T16:19:05Z

There would be some wrangling involved in the dataset which would clutter the example

So honestly what I'd prefer in the example is the post wrangled table cause I think showing the underlying table could help make it easier to parse how to stack/group the table. & then link out to the full dataset.

And that's also true if you stick w/ beverages and I think the original numbers are like that for ease but I'd actually like a null value in some places to illustrate that condition.

scottshambaugh · 2022-11-11T16:19:33Z

My intent with opening the ticket was just hoping for an example where the bars were built with a for loop, but I think there’s a lot of creative freedom in what you want to show!

timhoffm · 2022-11-11T16:33:47Z

My intent with opening the ticket was just hoping for an example where the bars were built with a for loop, but I think there’s a lot of creative freedom in what you want to show!

That really depends on the actual example. Loops could make sense for the stacked and grouped bar examples. Though, I hope to get rid of their need soon (xref #24313).
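For the stacked case, a loop-based version might look roughly like the sketch below: a running `bottom` array tracks the top of each stack, so the number of groups is arbitrary. The table contents and names are placeholders invented for illustration, not data from any of the gallery examples.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder table, invented for illustration: one row per group,
# one column per x-axis category.
categories = ['X', 'Y', 'Z']
table = {
    'Group 1': [3, 5, 2],
    'Group 2': [4, 1, 6],
    'Group 3': [2, 3, 3],
}

fig, ax = plt.subplots()
bottom = np.zeros(len(categories))   # running top of each stack
for name, values in table.items():
    ax.bar(categories, values, width=0.6, bottom=bottom, label=name)
    bottom += np.asarray(values)     # next group starts where this one ends
ax.legend()
```

Adding a row to `table` adds a layer to every stack without touching the plotting code; call `plt.show()` to display the figure.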

For the bar label example, the first one could be as simple as:

import matplotlib.pyplot as plt
import numpy as np

meals = ['Breakfast', 'Lunch', 'Dinner']
coffee_customers = (27, 35, 12)
tea_customers = (25, 20, 33)

fig, ax = plt.subplots()

p1 = ax.bar(meals, coffee_customers, width=0.6, label='Coffee')
p2 = ax.bar(meals, tea_customers, width=0.6, bottom=coffee_customers, label='Tea')
ax.bar_label(p1, label_type='center')
ax.bar_label(p2, label_type='center')

ax.set_title('Number of customers')
ax.legend()

plt.show()

Note that it's rather by coincidence that I found a concise example close to the existing data context. This was not a boundary condition. The point is to make the example simple and focused on the topic in question.

BTW: Anybody is welcome to put this into a dedicated PR.

kostyafarber · 2022-11-12T10:35:38Z

Came up with this as a mockup for the stacked bar chart (ideally I would add this to bar label example but I'm getting errors for trying to set labels on masked values).

import matplotlib.pyplot as plt
import numpy as np

islands = ['Biscoe', 'Dream', 'Torgersen']
adelie_means = (44, 56, 52)
gentoo_means = (124, 0, 0)
chinstrap_means = (0, 68, 0)
width = 0.5

fig, ax = plt.subplots()

# Each layer sits on top of the sum of all layers below it.
p1 = ax.bar(islands, adelie_means, width, label="Adelie")
p2 = ax.bar(islands, gentoo_means, width, bottom=adelie_means, label="Gentoo")
p3 = ax.bar(islands, chinstrap_means, width,
            bottom=np.add(adelie_means, gentoo_means), label="Chinstrap")

ax.set_title('Number of penguins by island')
ax.legend(loc='upper right')

plt.show()
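One possible way around the labelling problem mentioned for this mockup, sketched under the assumption that absent species are stored as plain zeros rather than masked values: pass explicit `labels` to `ax.bar_label` and use an empty string wherever a segment has no height. The counts are copied from the mockup above; the `annotations` dict is only there so the label text can be inspected.

```python
import matplotlib.pyplot as plt
import numpy as np

islands = ['Biscoe', 'Dream', 'Torgersen']
counts = {                       # values copied from the mockup above
    'Adelie': [44, 56, 52],
    'Gentoo': [124, 0, 0],
    'Chinstrap': [0, 68, 0],
}

fig, ax = plt.subplots()
bottom = np.zeros(len(islands))  # running top of each stack
annotations = {}
for species, values in counts.items():
    bars = ax.bar(islands, values, 0.5, bottom=bottom, label=species)
    bottom += np.asarray(values)
    # Blank out the label where the segment has no height.
    labels = [str(v) if v > 0 else '' for v in values]
    annotations[species] = ax.bar_label(bars, labels=labels,
                                        label_type='center')

ax.set_title('Number of penguins by island')
ax.legend(loc='upper right')
```

Whether an empty label is the right way to present a missing group is a judgment call; call `plt.show()` to display the figure.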

In terms of adding a reference to the dataset in each of the examples, would it be better to do as a comment in the code or somewhere in the text description above the examples?

In terms of the reference itself, I've pulled in the dataset from here https://github.com/mcnakhaee/palmerpenguins but the OG raw data is originally published in a paper here. Which do we think would be more useful for the user to have a reference to?

Finally, is it worth mentioning that I am doing some simple transformations behind the scenes (simple groupbys) which result in those hardcoded values (the reason they are hardcoded is to not clutter the examples)?
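For readers wondering what such a groupby amounts to, here is a minimal standard-library sketch of counting rows per (species, island) pair and laying the result out as one list per species. The rows below are hypothetical stand-ins, not real observations, and this is not necessarily how the hardcoded values were produced.

```python
from collections import Counter

# Hypothetical (species, island) rows; NOT real observations.
rows = [
    ('Adelie', 'Biscoe'),
    ('Adelie', 'Dream'),
    ('Gentoo', 'Biscoe'),
    ('Gentoo', 'Biscoe'),
    ('Chinstrap', 'Dream'),
]
islands = ['Biscoe', 'Dream', 'Torgersen']
species = ['Adelie', 'Gentoo', 'Chinstrap']

# A Counter returns 0 for pairs that never occur, which gives the
# zero entries for species absent from an island.
pair_counts = Counter(rows)
table = {s: [pair_counts[(s, i)] for i in islands] for s in species}
```

Each value in `table` has one entry per island, which is the shape the hardcoded tuples in the example use.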

I'll apply whatever we decide to do here in the rest of the examples and open a PR.

(Aside: there is a useful column that identifies the gender of the penguin that would make sense in one of these graphs, but don't want to reintroduce gender into the examples, but in saying that don't think the connotations apply in the context of penguins? What are everyone's thoughts?)

story645 · 2023-08-01T18:22:59Z

My intent with opening the ticket was just hoping for an example where the bars were built with a for loop, but I think there’s a lot of creative freedom in what you want to show!

This now exists so closing this issue.

scottshambaugh added the Documentation label Jul 22, 2022

kostyafarber mentioned this issue Nov 18, 2022

DOC: Update multiple category bar chart examples #24498

Merged

1 task

story645 closed this as completed Aug 1, 2023

QuLogic added this to the v3.7.0 milestone Aug 1, 2023
