blog: add xarray in bio post #775
@ianhi is attempting to deploy a commit to the xarray Team on Vercel. A member of the Team first needs to authorize it.
This post is missing a clearer narrative arc. Your personal story is a good one:
- researcher doing bio without xarray,
- discovering it but not realising its generality,
- actually adopting it,
- finding it so useful you are now proselytizing it,
- wondering about why it hasn't caught on more widely in bio.
Following something like this would tie the disparate threads together better.
src/posts/xarray-biology/index.md
Outdated
summary: 'A discussion of how Xarray fits into Biological analysis workflows'
---
If you are a biologist and work with array data (microscopy images, genomic sequences, or anything else you might currently analyze using NumPy), then you've probably spent hours juggling metadata, battling unclear axis labels, and asking questions like “Why is there a transpose here?” Imagine a tool that will solve those frustrations for you. `Xarray` is that tool.
Better to explicitly write who the target audience is (i.e. biologists who have not used xarray?)
src/posts/xarray-biology/index.md
Outdated
## What is Xarray and Why Should You Use it?
Biological data almost always has rich context and metadata associated with the actual measurements. For example: sample conditions, genetic modifications in a well, timepoints, spatial coordinates. While `NumPy` is a powerful tool, it has limitations when it comes to working with these datasets. Selecting data based on array indices, rather than the physical values, can be confusing. You know you switched the buffer at 32 minutes, but which array index is that? Similarly, keeping track of which dimension is which can be difficult without labels. You have a five-dimensional array, but there are a few transposes in this code from last week, and now you don’t remember which axis is which in the output. Managing a collection of multiple related arrays with slightly different shapes can be tricky. Imagine sending data into a batch job and trying to keep segmentations and raw images together. Or maybe you’ve tried to follow poorly commented analysis from an interesting paper and gotten lost in the details?
I don't know what this "buffer" you speak of here is. So it's important to either explain it or make it clear before this point that I am not in the target audience.
I suspect undergrad and above biologists will know what I mean here, but it's still worth coming up with a slightly different example and/or being more explicit about what I mean. Ideally, if you can understand it, then every biologist will.
fwiw:
It is essentially swapping the liquid around the sample for a different liquid, which will induce biological changes. "Buffer" has a precise meaning, but is also often used less precisely, as I have here.
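For reference, the label-based selection that the quoted paragraph describes could be sketched roughly as below. This is only an illustration under assumptions: the array shape, the one-frame-per-minute timing, and the channel names are made up, not taken from the post.

```python
import numpy as np
import xarray as xr

# Hypothetical time-lapse: 60 frames (one per minute), 2 channels, 64x64 pixels.
rng = np.random.default_rng(0)
movie = xr.DataArray(
    rng.random((60, 2, 64, 64)),
    dims=("time", "channel", "y", "x"),
    coords={
        "time": np.arange(60),          # minutes since imaging started (assumed)
        "channel": ["GFP", "mCherry"],  # assumed channel names
    },
)

# Select by physical value instead of by array index:
# everything from the liquid swap at minute 32 onward.
after_switch = movie.sel(time=slice(32, None))

# A single 2D frame, picked by time and channel labels.
frame_at_switch = movie.sel(time=32, channel="GFP")

# Reorder axes by name, so nobody has to remember integer axis positions.
channels_first = movie.transpose("channel", "time", "y", "x")

print(after_switch.sizes["time"])
print(frame_at_switch.dims)
print(channels_first.dims)
```

Something like this might also make the "which array index is that?" question concrete for readers who do not know what a buffer swap is.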
src/posts/xarray-biology/index.md
Outdated
<RawHTML filePath='/posts/xarray-biology/dataarray-repr.html' />
Just by looking at the `repr` you can probably understand a lot about the experiment without any explanation. You no longer need to mentally keep track of transposes, axis labels, and metadata because you can always check the current state!
Well, not really, because all the dimensions have single-letter names... Can you not use slightly more informative dimension names (e.g. `channel` instead of `C`), with metadata/units in the attrs?
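One possible shape for that suggestion, as a sketch only: the single-letter layout, the timepoints, the channel names, and the pixel size below are all assumptions for illustration, not the actual data behind the `repr` in the post.

```python
import numpy as np
import xarray as xr

# Assumed terse layout, similar in spirit to single-letter dimension names.
terse = xr.DataArray(np.zeros((3, 2, 16, 16)), dims=("T", "C", "Y", "X"))

# Rename the dimensions to descriptive names.
named = terse.rename({"T": "time", "C": "channel", "Y": "y", "X": "x"})

# Attach coordinate values; units travel with the coordinate in its attrs.
named = named.assign_coords(
    time=("time", [0, 5, 10], {"units": "min"}),  # made-up timepoints
    channel=["brightfield", "GFP"],                # made-up channel names
)

# Array-level metadata goes in attrs (the value here is invented).
named.attrs["pixel_size_um"] = 0.65

print(named.dims)
print(named["time"].attrs)
```

With names and units like these, the `repr` carries more of the experiment's context on its own.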
src/posts/xarray-biology/index.md
Outdated
Just by looking at the `repr` you can probably understand a lot about the experiment without any explanation. You no longer need to mentally keep track of transposes, axis labels, and metadata because you can always check the current state!
Not only does this make it easier for you to develop your analysis, but it also makes your work much more easily understandable and discoverable by others.
This doesn't really seem true. Xarray isn't a file format or a distribution method. I would either drop this or elaborate by talking about Zarr / code sharing.
I don't think I made this clear, but I'm getting at the use case where someone shares a notebook with you and you are trying to decipher how it works. I suspect that this will be very compelling; several people have brought this up to me of their own accord. I will make this clearer.
src/posts/xarray-biology/index.md
Outdated
## What has limited adoption by Biologists?
Given the benefits of switching to `Xarray`, why aren’t more biologists using it? Is it secretly not as good as this blog claims?
I think you should put something like this up at the start. The thesis of this post is that
a) Xarray is a powerful tool for bio
b) Some bio people use it
c) A lot more bio people should use it
d) Ideas for how to make that happen
This thesis should be communicated early on and clearly.
src/posts/xarray-biology/index.md
Outdated
### Technical Barriers
Once a potential user is convinced of `Xarray`'s value, they may still face technical barriers, ranging from rough edges to missing features. However, none are insurmountable. An example of a rough edge is that, as of May 2025, you cannot use integers as keys in a `DataTree`. That is a problem, as integers are a natural key to use when tracking single cell lineages. Rough edges like this one haven’t been smoothed over yet because there has not been a user base of biologists using `Xarray`, discovering them, and raising issues to get them fixed.
This should be two paragraphs. One introducing the idea that technical limitations might be preventing use by bio people, then start enumerating examples in a separate paragraph / list.
src/posts/xarray-biology/index.md
Outdated
### Data Loading/Lack of Integration
Finally, we have been limited by a lack of integration with existing software tools. First, in loading the outputs of other tools into `Xarray` (do they have a `to_xarray` method?), and second, in other tools accepting `Xarray` arrays and using the extra features. For example, [Napari](https://napari.org/stable/) has had a long-standing [open issue](https://github.com/napari/napari/issues/14) about using `Xarray` to add extra information to dimension sliders.
Why is there a lack of integration though? Lack of funding? Lack of understanding of the benefits?
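As a point of comparison for the `to_xarray` question in the quoted paragraph, pandas already ships such a method. A minimal sketch, assuming a made-up table of per-cell areas such as a segmentation tool might emit (xarray must be installed for the conversion to work):

```python
import pandas as pd

# Hypothetical tabular output: one row per (cell, timepoint).
table = pd.DataFrame(
    {
        "cell": [1, 1, 2, 2],
        "time": [0, 5, 0, 5],
        "area": [10.0, 12.0, 8.0, 9.0],
    }
)

# Index columns become dimensions; remaining columns become data variables.
areas = table.set_index(["cell", "time"]).to_xarray()

print(dict(areas.sizes))
print(float(areas["area"].sel(cell=2, time=5)))
```

A one-line conversion like this is the kind of integration the post seems to be asking other tools for.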
src/posts/xarray-biology/index.md
Outdated
**Support** other biologists learning to use Xarray. Respond to forum posts and help budding users, write and share small examples of using Xarray with biology data. Teach tutorials to your peers.
My current role is an “Xarray community Developer” focusing on biological applications. So for my part, I’m always happy to talk to you about whether Xarray might be a good fit for your biology data. Please reach out if you have a question! I’m `@ianhi` on most platforms. You can also join our new Xarray in Biology office hours [LINK], or book some time with me to talk Xarray and Biology [here](https://calendly.com/ian-earthmover/30min).
Your role is an important part of the narrative arc of the post.
Agree with this. Let's move your own introduction to the intro. The arc could be:
- Hi, I'm Ian. I'm the new Xarray community dev, I'll be focusing on bio applications. btw, thanks CZI for the funding.
- In this post, I'll be surveying the landscape of Xarray for bio applications. This post is for you if: ...
for more information, see https://pre-commit.ci
Reviewing so vercel will let me approve deployment
✅ Deploy Preview for xarraydev ready!
### When Can You Use Xarray?
As great as Xarray sounds, it does have limitations. Xarray is an array library; it's in the name! So, if your data is tabular and the tabular ecosystem is working well for you, then keep using that!
Xarray does handle collections of 1D arrays perfectly well though.
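A small sketch of what that can look like, assuming invented per-cell measurements (the cell IDs, variable names, and threshold are hypothetical):

```python
import xarray as xr

# Several 1D arrays sharing one labeled dimension, much like table columns.
cells = xr.Dataset(
    {
        "area": ("cell", [10.0, 25.0, 8.0]),
        "mean_intensity": ("cell", [0.2, 0.9, 0.4]),
    },
    coords={"cell": [101, 102, 103]},  # made-up cell IDs
)

# Filter on one variable; every variable stays aligned by cell label.
large = cells.where(cells["area"] > 9, drop=True)

# Hand the same data to the tabular ecosystem when that is more convenient.
as_table = cells.to_dataframe()

print(large["cell"].values.tolist())
print(list(as_table.columns))
```

So the post could say that tabular users lose nothing by staying put, while still noting that moving between the two is cheap.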
I have suggested a few changes. Looks good overall!
I have spent the last several months interviewing scientists and software developers across many fields of Biology. I also contributed biology-related fixes to Xarray and Zarr, attended conferences, and studied existing use cases of Xarray. These conversations and experiences are the basis of the research I have been doing on how Xarray can be used in biological applications.
This post contains a summary of my findings. I will introduce the concepts of Xarray at a high level with biological context and give examples where it is already in use. Then, based on the interviews I conducted, I will explain what has limited adoption. Finally, I will describe what we (Biologists and Xarray contributors) can do to increase adoption.
Very clear preview ++
Thank you @kmdalton for the review. Context for the non-biologists: Kevin is a bona fide biologist who develops cutting-edge computational tools for structural biology. I think this is finished now. To include multiple links in the banner I changed the code a bit. We can always revert that later. But right now it looks like this: [screenshot]
Good work, @ianhi!
Co-authored-by: Joe Hamman <jhamman1@gmail.com>
A blog post primarily aimed at getting biologists interested in Xarray as well as laying out the start of a roadmap to
Hopefully this will accomplish the following:
Images:
I've played around a bit with ChatGPT to generate some ideas for a fun image for the intro section. I didn't end up with anything I loved, but some of the better options that came out of it were: