[Doc]: Update multiple category bar chart gallery examples #23465
Comments
So totally agree that expanding to 3 categories is a good way to show these generalize.
Since these examples use random/arbitrary data, I'd be far more comfortable using a different triplet - vegetable, mineral, animal? numpy, scipy, matplotlib? It's just that it's one thing if the example is visualizing a data set that bins folks into men/women/other, but I don't think we should be lumping together various somewhat disparate gender identities in service of an arbitrary example. xref #23352 ETA: Penguins! The penguins data set is probably perfect for this: https://allisonhorst.github.io/palmerpenguins/
Good to see that was already being discussed! If the categories are being switched up then n=2 is fine as long as it's easy to add additional ones; whoever implements this can choose an n that they think looks best.
Eh, I think it's great to have examples in the gallery that reinforce the project values as stated in the code of conduct and mission statement. My objection to gender is solely that we're unnecessarily enforcing arbitrary binning and I'd love a queer-friendly example that didn't do that.
Would be interested in working on this. Is there any clarification on what the steps for this would be?
As I gather, the gender-specific examples have been replaced with the apt tea-coffee comparison. Would this mean adding another beverage... or including another triplet?
Adding another beverage could work, but @timhoffm raised the point that these examples would probably make more sense/be easier to follow if they were semantically meaningful. My preference is that if these examples are reworked, then the underlying table being visualized should be very clear. I waffle between semantically meaningful and a plot that is purely self-referential, kinda like the anatomy of a matplotlib figure, but we don't really have many examples of the latter. ETA: yeah I think expanding to 3 is fine - I think the goal is more to make it clear that the group # is arbitrary.
Okay sounds good. So I guess what do we want to do going forward. Do I need to:
Or try to find a dataset that isn't arbitrary.
Either option - 3rd beverage and semantically reasonable x/y/group or non-arbitrary dataset.
Thanks @story645, I'll try to go for the non-arbitrary dataset (I liked your suggestion of the penguin dataset).
What would be the best way to import this dataset? There's a repo for it on GitHub and also some packages to easily obtain it, but we don't want to add extra dependencies. We could also just store the csv, but not sure if this is an option?
UPDATE: Had a play around with the penguins data set, trying to think of a way to visualise the data in this example, as a starting point: I was thinking of:
However, I found that not all islands have all three species, so we wouldn't have the same three groups in each stacked bar. Don't know if that's an issue since all these examples have two groups in each bar. I understand the underlying data in these examples are arbitrary but not sure if we wanted the groups in each bar to be equal for demonstration purposes. Also, there are only three islands, which means the # of bars will be considerably smaller; not sure if this is desired or not. There would be some wrangling involved in the dataset which would clutter the example; don't know if we could move this somewhere as a utility function or something? (Obvs can't have it inline as the dataset is somewhat large.) I guess some guidance on what to do:
Happy to put something together and tweak as we go.
You can have a few hard-coded lines of data like in the current example. Minimally, two rows are sufficient:
The example should illustrate the respective concept and its use as concisely as possible. As long as you achieve that (and it is possible to do that) you can change whatever you feel is necessary. For example, the error bars in the current example are definitely overkill for the first plot and should be removed.
So honestly what I'd prefer in the example is the post-wrangled table, cause I think showing the underlying table could help make it easier to parse how to stack/group the table, & then link out to the full dataset. And that's also true if you stick w/ beverages. I think the original numbers are like that for ease, but I'd actually like a null value in some places to illustrate that condition.
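To make the "show the post-wrangled table, including a null value" idea concrete, here is a rough sketch rather than a proposed gallery example. The helper names `stack_segments` and `plot_stacked` are made up for illustration, treating `None` as a zero-height segment is an assumption about how missing cells should be drawn, and the counts are the ones from the mockup elsewhere in this thread.

```python
def stack_segments(table):
    """Return [(group, heights, bottoms), ...] for a {group: [values]} table.

    None marks a missing cell; it is drawn as a zero-height segment
    (an assumption made for this sketch).
    """
    n_columns = len(next(iter(table.values())))
    running = [0] * n_columns          # current top of each stacked bar
    segments = []
    for group, row in table.items():
        heights = [0 if value is None else value for value in row]
        segments.append((group, heights, list(running)))
        running = [top + h for top, h in zip(running, heights)]
    return segments


def plot_stacked(categories, table, width=0.5):
    """Draw one ax.bar call per table row, stacked on the rows before it."""
    # Imported here so the helper above can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for group, heights, bottoms in stack_segments(table):
        ax.bar(categories, heights, width, bottom=bottoms, label=group)
    ax.legend()
    return fig, ax


# The table being visualized: one row per species, one column per island.
islands = ["Biscoe", "Dream", "Torgersen"]
penguins = {
    "Adelie": [44, 56, 52],
    "Gentoo": [124, None, None],
    "Chinstrap": [None, 68, None],
}
```

Calling `plot_stacked(islands, penguins)` and then `plt.show()` would draw the figure. Keeping the table as a plain dict right above the plotting call is meant to let the reader see how each row maps to one layer of the stack.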
My intent with opening the ticket was just hoping for an example where the bars were built with a |
That really depends on the actual example. Loops could make sense for the stacked and grouped bar examples. Though, I hope to get rid of their need soon (xref #24313). For the bar label example, the first one could be as simple as:
Note that it's rather by coincidence that I found a concise example close to the existing data context. This was not a boundary condition. The point is to make the example simple and focused on the topic in question. BTW: Anybody is welcome to put this into a dedicated PR.
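For reference, a minimal bar label sketch along these lines. This is not the snippet from the comment above, which did not survive in this copy of the thread. It assumes matplotlib 3.4 or newer for `Axes.bar_label`, and `blank_zero_labels` and `plot_labeled_bars` are hypothetical helper names; blanking the label for empty bars is a design choice of this sketch, not something the thread settled on.

```python
def blank_zero_labels(values, fmt="{:g}"):
    """Format each value as a label, leaving zero or missing bars unlabeled."""
    return [fmt.format(value) if value else "" for value in values]


def plot_labeled_bars(categories, values):
    """Single-series bar chart with one text label per bar."""
    # Imported here so the label helper can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    heights = [value or 0 for value in values]   # None is drawn as height 0
    bars = ax.bar(categories, heights)
    # Axes.bar_label accepts an explicit list of label strings.
    ax.bar_label(bars, labels=blank_zero_labels(values))
    return fig, ax
```

For example, `plot_labeled_bars(['Biscoe', 'Dream', 'Torgersen'], [124, 0, 0])` would label only the first bar.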
Came up with this as a mockup for the stacked bar chart (ideally I would add this to the bar label example but I'm getting errors when trying to set labels on masked values).

import matplotlib.pyplot as plt
import numpy as np

islands = ['Biscoe', 'Dream', 'Torgersen']
adelie_means = (44, 56, 52)
gentoo_means = (124, 0, 0)
chinstrap_means = (0, 68, 0)
width = 0.5

fig, ax = plt.subplots()
p1 = ax.bar(islands, adelie_means, width, label="Adelie")
p2 = ax.bar(islands, gentoo_means, width, bottom=adelie_means, label="Gentoo")
# Chinstrap sits on top of both earlier layers, not just Adelie.
p3 = ax.bar(islands, chinstrap_means, width, bottom=np.add(adelie_means, gentoo_means), label="Chinstrap")
ax.set_title('Number of penguins by island')
ax.legend(loc='upper right')
plt.show()

In terms of adding a reference to the dataset in each of the examples, would it be better to do it as a comment in the code or somewhere in the text description above the examples? In terms of the reference itself, I've pulled in the dataset from here https://github.com/mcnakhaee/palmerpenguins but the OG raw data was originally published in a paper here; which do we think would be more useful for the user to have a reference to? Finally, is it worth mentioning that I am doing some simple transformations behind the scenes (simple ...)? I'll apply whatever we decide to do here in the rest of the examples and open a PR. (Aside: there is a useful column that identifies the gender of the penguin that would make sense in one of these graphs, but I don't want to reintroduce gender into the examples; in saying that, I don't think the connotations apply in the context of penguins? What are everyone's thoughts?)
This now exists so closing this issue.
Documentation Link
Applies to the following examples:
Problem
There are 2 issues here:
Suggested improvement
For the grouped bar charts, this example handles the bar width for n categories well and can be adapted: https://matplotlib.org/devdocs/gallery/scales/log_bar.html#sphx-glr-gallery-scales-log-bar-py
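As a rough illustration of handling the bar width for n categories (a sketch, not the code from the linked example): split a fixed total width evenly between the groups and centre the cluster on each tick. The helper names `group_offsets` and `plot_grouped` and the default `total_width=0.8` are assumptions made for this sketch.

```python
def group_offsets(n_groups, total_width=0.8):
    """Return (bar_width, offsets) so n_groups bars share one tick slot."""
    bar_width = total_width / n_groups
    # Offsets are symmetric around 0, so the cluster is centred on the tick.
    offsets = [(i - (n_groups - 1) / 2) * bar_width for i in range(n_groups)]
    return bar_width, offsets


def plot_grouped(labels, series, total_width=0.8):
    """Grouped bars for any number of series in a {name: [values]} dict."""
    # Imported here so the layout helper can be used without matplotlib.
    import matplotlib.pyplot as plt

    bar_width, offsets = group_offsets(len(series), total_width)
    positions = list(range(len(labels)))
    fig, ax = plt.subplots()
    for (name, values), offset in zip(series.items(), offsets):
        ax.bar([x + offset for x in positions], values, bar_width, label=name)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.legend()
    return fig, ax
```

Adding a third (or fourth) category then only means adding another entry to `series`; the widths and offsets follow from `len(series)`.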