blog: add xarray in bio post by ianhi · Pull Request #775 · xarray-contrib/xarray.dev · GitHub

blog: add xarray in bio post #775

Merged 21 commits on Jun 6, 2025

Contributor
@ianhi commented May 1, 2025

A blog post primarily aimed at getting biologists interested in Xarray as well as laying out the start of a roadmap to

Hopefully this will accomplish the following:

  1. Get some biologists interested
  2. Drive some traffic to Xarray biology office hours/send interested but wary people my way
  3. Be a shareable thing to send to someone you meet doing bio who is interested but hasn't seen their colleagues using Xarray
  4. Serve as a jumping-off point for more detailed roadmap-style plans with concrete actions for better Xarray support for the biological field.

Images:

I've played around a bit with ChatGPT to generate some ideas for a fun image for the intro section. I didn't end up with anything I loved, but some of the better options that came out of it were:

image

image

image

vercel bot commented May 1, 2025

@ianhi is attempting to deploy a commit to the xarray Team on Vercel.

A member of the Team first needs to authorize it.

@ianhi changed the title from "[DRAFT] blog: add start of xarray in bio post" to "blog: add xarray in bio post" May 5, 2025
@ianhi marked this pull request as ready for review May 5, 2025 05:10
vercel bot commented May 5, 2025

Name Status Preview Comments Updated (UTC)
xarray-dev ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jun 5, 2025 8:59pm

Member
@TomNicholas left a comment

This post is missing a clearer narrative arc. Your personal story is a good one:

  • researcher doing bio without xarray,
  • discovering it but not realising its generality,
  • actually adopting it,
  • finding it so useful you are now proselytizing it,
  • wondering about why it hasn't caught on more widely in bio.

Following something like this would tie the disparate threads together better.

summary: 'A discussion of how Xarray fits into Biological analysis workflows'
---

If you are a biologist who works with array data (microscopy images, genomic sequences, or anything else you might currently analyze using NumPy), then you've probably spent hours juggling metadata, battling unclear axis labels, and asking questions like “Why is there a transpose here?” Imagine a tool that solves those frustrations for you. `Xarray` is that tool.
Member

Better to explicitly write who the target audience is (i.e. biologists who have not used xarray?)


## What is Xarray and Why Should You Use it?

Biological data almost always has rich context and metadata associated with the actual measurements, for example sample conditions, genetic modifications in a well, timepoints, and spatial coordinates. While `NumPy` is a powerful tool, it has limitations when it comes to working with these datasets. Selecting data based on array indices, rather than physical values, can be confusing. You know you switched the buffer at 32 minutes, but which array index is that? Similarly, keeping track of which dimension is which can be difficult without labels. You have a five-dimensional array, but there are a few transposes in this code from last week, and now you don’t remember which axis is which in the output. Managing a collection of multiple related arrays with slightly different shapes can be tricky. Imagine sending data into a batch job and trying to keep segmentations and raw images together. Or maybe you’ve tried to follow a poorly commented analysis from an interesting paper and gotten lost in the details.
Member

I don't know what this "buffer" you speak of here is. So it's important to either explain it or make it clear before this point that I am not in the target audience.

Contributor Author

I suspect undergrad and above biologists will know what I mean here, but it's still worth coming up with a slightly different example and/or being more explicit about what I mean. Ideally, if you can understand it, then every biologist will.

fwiw:
It is essentially swapping the liquid around the sample for a different liquid, which will induce biological changes. "Buffer" has a precise meaning, but is also often used less precisely, as I have here.
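
As a minimal sketch of the point made in the paragraph under discussion, the following contrasts index-based selection in NumPy with label-based selection in Xarray. The acquisition interval, channel names, and array shape are made up for illustration and do not come from the post.

```python
import numpy as np
import xarray as xr

# Hypothetical time-lapse: 10 frames, one every 8 minutes, 2 channels, 4x4 pixels.
minutes = np.arange(10) * 8  # 0, 8, ..., 72
data = np.random.default_rng(0).random((10, 2, 4, 4))

images = xr.DataArray(
    data,
    dims=("time", "channel", "y", "x"),
    coords={"time": minutes, "channel": ["GFP", "mCherry"]},
    attrs={"time_units": "minutes"},
)

# NumPy style: you have to remember that minute 32 is index 4 on axis 0.
after_switch_numpy = data[4:]

# Xarray style: select by the physical value instead of the index.
after_switch = images.sel(time=slice(32, None))

# Dimension names travel with the data, so a transpose stays readable.
reordered = images.transpose("channel", "time", "y", "x")
```

Printing `reordered.dims` at any point shows the current axis order, which is the "check the current state" workflow the post describes.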


<RawHTML filePath='/posts/xarray-biology/dataarray-repr.html' />

Just by looking at the `repr` you can probably understand a lot about the experiment without any explanation. You no longer need to mentally keep track of transposes, axis labels, and metadata because you can always check the current state!
Member

Well not really, because all the dimensions are single-letter names... Can you not use slightly more informative dimension names? (e.g. channel instead of C) With metadata/units in the attrs?


Not only does this make it easier for you to develop your analysis, but it also makes your work much more easily understandable and discoverable by others.
Member

This doesn't really seem true. Xarray isn't a file format or a distribution method. I would either drop this or elaborate by talking about Zarr / code sharing.

Contributor Author

I don't think I made this clear, but I'm getting at the use case where someone shares a notebook with you and you are trying to decipher how it works. I suspect that this will be very compelling; several people have actually brought this up to me of their own accord. I will make this clearer.


## What has limited adoption by Biologists?

Given the benefits of switching to `Xarray`, why aren’t more biologists using it? Is it secretly not as good as this blog claims?
Member

I think you should put something like this up at the start. The thesis of this post is that
a) Xarray is a powerful tool for bio
b) Some bio people use it
c) A lot more bio people should use it
d) Ideas for how to make that happen

This thesis should be communicated early on and clearly.


### Technical Barriers

Once a potential user is convinced of `Xarray`'s value, they may still face technical barriers, ranging from rough edges to missing features. However, none are insurmountable. An example of a rough edge is that, as of May 2025, you cannot use integers as keys in a `DataTree`. That is a problem, as integers are a natural key to use when tracking single-cell lineages. Rough edges like this one haven’t been smoothed over yet because there has not been a user base of biologists using `Xarray`, discovering them, and raising issues to get them fixed.
Member

This should be two paragraphs. One introducing the idea that technical limitations might be preventing use by bio people, then start enumerating examples in a separate paragraph / list.


### Data Loading/Lack of Integration

Finally, we have been limited by a lack of integration with existing software tools. First, in loading the outputs of other tools into `Xarray` (do they have a `to_xarray` method?), and second, in other tools accepting `Xarray` arrays and using the extra features. For example, [Napari](https://napari.org/stable/) has had a long-standing [open issue](https://github.com/napari/napari/issues/14) about using `Xarray` to add extra information to dimension sliders.
Member

Why is there a lack of integration though? Lack of funding? Lack of understanding of the benefits?
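
For tools whose output is a pandas `DataFrame`, one bridge that already exists is pandas' own `DataFrame.to_xarray`. The sketch below assumes a hypothetical tracking tool that emits one row per cell per timepoint; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical long-format output: one row per (cell, time) measurement.
tracks = pd.DataFrame(
    {
        "cell": [3, 3, 3, 7, 7, 7],
        "time": [0, 8, 16, 0, 8, 16],
        "area": [10.0, 12.5, 15.0, 8.0, 9.5, 11.0],
    }
)

# Index levels become dimensions; remaining columns become data variables.
ds = tracks.set_index(["cell", "time"]).to_xarray()

# Label-based lookup now works on both dimensions.
area = ds["area"].sel(cell=7, time=8).item()
```

Tools that return their own result types would need an equivalent conversion step, which is the first integration gap the excerpt describes.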


**Support** other biologists learning to use Xarray. Respond to forum posts and help budding users, write and share small examples of using Xarray with biology data. Teach tutorials to your peers.

My current role is an “Xarray community Developer” focusing on biological applications. So for my part, I’m always happy to talk to you about whether Xarray might be a good fit for your biology data. Please reach out if you have a question! I’m `@ianhi` on most platforms. You can also join our new Xarray in Biology office hours [LINK], or book some time with me to talk Xarray and Biology [here](https://calendly.com/ian-earthmover/30min).
Member

Your role is an important part of the narrative arc of the post.

Contributor

Agree with this. Let's move your own introduction to the intro. The arc could be:

  • Hi, I'm Ian. I'm the new Xarray community dev, I'll be focusing on bio applications. btw, thanks CZI for the funding.
  • In this post, I'll be surveying the landscape of Xarray for bio applications. This post is for you if: ...

@ianhi force-pushed the xarray-bio-roadmap branch from 4e10ba8 to 2f50fd5 May 8, 2025 20:45
Member
@TomNicholas left a comment

Reviewing so vercel will let me approve deployment

netlify bot commented Jun 3, 2025

Deploy Preview for xarraydev ready!

Name Link
🔨 Latest commit e735dc1
🔍 Latest deploy log https://app.netlify.com/projects/xarraydev/deploys/6841fe18ab558a0008ac794c
😎 Deploy Preview https://deploy-preview-775--xarraydev.netlify.app

### When Can You Use Xarray?

As great as Xarray sounds, it does have limitations. Xarray is an array library; it's in the name! So, if your data is tabular and the tabular ecosystem is working well for you, then keep using that!
Contributor

Xarray does handle collections of 1D arrays perfectly well though.

@kmdalton left a comment

I have suggested a few changes. Looks good overall!


I have spent the last several months interviewing scientists and software developers across many fields of Biology. I also contributed biology-related fixes to Xarray and Zarr, attended conferences, and studied existing use cases of Xarray. These conversations and experiences are the basis of the research I have been doing on how Xarray can be used in biological applications.

This post contains a summary of my findings. I will introduce the concepts of Xarray at a high level with biological context and give examples where it is already in use. Then, based on the interviews I conducted, I will explain what has limited adoption. Finally, I will describe what we (Biologists and Xarray contributors) can do to increase adoption.

Very clear preview ++

@ianhi force-pushed the xarray-bio-roadmap branch from d3ce5ce to 6f652f5 June 5, 2025 16:01
Contributor Author
ianhi commented Jun 5, 2025

Thank you @kmdalton for the review. Context for the non-biologists: Kevin is a bona fide biologist who develops cutting-edge computational tools for structural biology.


I think this is finished now.

To include multiple links in the banner I changed the code a bit. We can always revert that later. But right now it looks like this:

image

@ianhi force-pushed the xarray-bio-roadmap branch from 830db32 to f6e6ed9 June 5, 2025 16:12
@kmdalton left a comment

Good work, @ianhi!

ianhi and others added 2 commits June 5, 2025 12:53
Co-authored-by: Joe Hamman <jhamman1@gmail.com>
Co-authored-by: Joe Hamman <jhamman1@gmail.com>
@jhamman merged commit 0fe2627 into xarray-contrib:main Jun 6, 2025
7 checks passed
@ianhi deleted the xarray-bio-roadmap branch June 6, 2025 14:59
5 participants