Abstract
The growth of omic data presents evolving challenges in data manipulation, analysis and integration. Addressing these challenges, Bioconductor provides an extensive community-driven biological data analysis platform. Meanwhile, tidy R programming offers a revolutionary data organization and manipulation standard. Here we present the tidyomics software ecosystem, bridging Bioconductor to the tidy R paradigm. This ecosystem aims to streamline omic analysis, ease learning and encourage cross-disciplinary collaborations. We demonstrate the effectiveness of tidyomics by analyzing 7.5 million peripheral blood mononuclear cells from the Human Cell Atlas, spanning six data frameworks and ten analysis tools.
Data availability
Human Cell Atlas peripheral blood mononuclear single-cell data were downloaded from the CELLxGENE database. The weblink for each sample is listed in Supplementary Table 1. The samples analyzed are accessible through the Human Cell Atlas. Metadata and gene-transcript abundances for these datasets from the CuratedAtlasQuery database are accessible at sample_metadata.0.2.3.parquet. CELLxGENE sample accession codes are available in Supplementary Table 1. Source data are provided with this paper.
Code availability
The tidyomics homepage is https://github.com/tidyomics (ref. 31), which links to the constituent packages. The tidyomics meta-package is available from Bioconductor at bioconductor.org/packages/tidyomics/. The tidySummarizedExperiment, tidySingleCellExperiment and tidySpatialExperiment packages are available from Bioconductor at bioconductor.org/packages/tidySummarizedExperiment, bioconductor.org/packages/tidySingleCellExperiment and bioconductor.org/packages/tidySpatialExperiment/, respectively. The code used to benchmark workflow efficiency and to analyze peripheral blood mononuclear cells from the Human Cell Atlas is available at github.com/tidyomics/tidyomics_paper. Source data for Fig. 2h are also available at github.com/tidyomics/tidyomics_paper.
References
1. Tarazona, S., Arzalluz-Luque, A. & Conesa, A. Undisclosed, unmet and neglected challenges in multi-omics studies. Nat. Comput. Sci. 1, 395–402 (2021).
2. Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004).
3. Wickham, H. et al. Welcome to the Tidyverse. J. Open Source Softw. 4, 1686 (2019).
4. Li, P. Computation and Visualization of Package Download Counts and Percentiles [R package packageRank version 0.8.3] (R Project, 2023).
5. Çetinkaya-Rundel, M. et al. An educator’s perspective of the tidyverse. Preprint at https://doi.org/10.48550/arXiv.2108.03510 (2021).
6. Lee, S., Cook, D. & Lawrence, M. plyranges: a grammar of genomic data transformation. Genome Biol. 20, 4 (2019).
7. Mangiola, S., Doyle, M. A. & Papenfuss, A. T. Interfacing Seurat with the R tidy universe. Bioinformatics https://doi.org/10.1093/bioinformatics/btab404 (2021).
8. Mangiola, S., Molania, R., Dong, R., Doyle, M. A. & Papenfuss, A. T. tidybulk: an R tidy framework for modular transcriptomic data analysis. Genome Biol. 22, 42 (2021).
9. Mu, W. et al. bootRanges: flexible generation of null sets of genomic ranges for hypothesis testing. Bioinformatics 39, btad190 (2023).
10. Keyes, T. J., Koladiya, A., Lo, Y.-C., Nolan, G. P. & Davis, K. L. tidytof: a user-friendly framework for scalable and reproducible high-dimensional cytometry data analysis. Bioinform. Adv. 3, vbad071 (2023).
11. Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).
12. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
13. Davis, E. S. et al. matchRanges: generating null hypothesis genomic ranges via covariate-matched sampling. Bioinformatics 39, btad197 (2023).
14. Amezquita, R. A. et al. Orchestrating single-cell analysis with Bioconductor. Nat. Methods 17, 137–145 (2020).
15. Ko, M. E. et al. FLOW-MAP: a graph-based, force-directed layout algorithm for trajectory mapping in single-cell time course datasets. Nat. Protoc. 15, 398–420 (2020).
16. Righelli, D. et al. SpatialExperiment: infrastructure for spatially-resolved transcriptomics data in R using Bioconductor. Bioinformatics 38, 3128–3131 (2022).
17. Wang, Y. et al. Spatial transcriptomics: technologies, applications and experimental considerations. Genomics 115, 110671 (2023).
18. Rozenblatt-Rosen, O. et al. Building a high-quality Human Cell Atlas. Nat. Biotechnol. 39, 149–153 (2021).
19. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).
20. Aran, D. et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 20, 163–172 (2019).
21. Fernández, J. M. et al. The BLUEPRINT Data Analysis Portal. Cell Syst. 3, 491–495.e5 (2016).
22. Xu, W. et al. Mapping of γ/δ T cells reveals Vδ2+ T cells resistance to senescence. EBioMedicine 39, 44–58 (2019).
23. Law, C. W. et al. RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR. F1000Res. https://doi.org/10.12688/f1000research.9005.3 (2016).
24. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
25. Lewis, M., Goldmann, K., Sciacca, E., Cubut, C. & Surace, A. glmmSeq: General Linear Mixed Models for Gene-Level Differential Expression. R package (2022).
26. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
27. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2016).
28. International Multiple Sclerosis Genetics Consortium. Multiple sclerosis genomic map implicates peripheral immune cells and microglia in susceptibility. Science 365, eaav7188 (2019).
29. Wang, Y.-F. et al. Identification of 38 novel loci for systemic lupus erythematosus and genetic heterogeneity between ancestral groups. Nat. Commun. 12, 772 (2021).
30. Mangiola, S. et al. A multi-organ map of the human immune system across age, sex and ethnicity. Preprint at bioRxiv https://doi.org/10.1101/2023.06.08.542671 (2023).
31. tidyomics. GitHub https://github.com/tidyomics (2024).
Acknowledgements
We acknowledge the Bioconductor and tidyverse communities, on whose software and coding paradigms this work is based and without which it would not have been possible. We also thank the tidyomics community for their feedback and contributions. We thank V. Carey for his support and feedback on the project, and M. Ritchie for his continued support and feedback. Human illustrations were created with BioRender.com. S.M. was supported by the Victorian Cancer Agency Early Career Research Fellowship (ECRF21036). M.I.L. was supported by the Chan Zuckerberg Initiative (EOSS3-0000000057). A.T.P. was supported by the National Health and Medical Research Council (NHMRC) Senior Research Fellowship (1116955) and Investigator Grant (2026643). A.T.P., S.M. and W.H. were supported by the Lorenzo and Pamela Galli Medical Research Trust and the Galli Next Generation Discoveries Initiative. K.L.D. is the Anne T. and Robert M. Bass Endowed Faculty Scholar in Pediatric Cancer and Blood Diseases of the Stanford Maternal Child Health Research Institute and the Harriet and Mary Zelencik Endowed Faculty in Children’s Cancer and Blood Diseases. P.-P.A. was supported by the Cancéropole GSO and Intergroupe Français du Myélome. R.G. was funded by a project grant from the Swiss National Foundation. M.M. was supported by the NHGRI and NCI of the National Institutes of Health under award numbers U41HG004059 and U24CA180996. This work was supported by an ASPIRE award from the Mark Foundation for Cancer Research and the B+ Foundation. The research benefited from support from the Victorian State Government Operational Infrastructure Support and Australian Government NHMRC Independent Research Institute Infrastructure Support. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
S.M. proposed the study, and S.M. and M.I.L. designed the study. W.J.H. and S.M. developed the novel tidy adapters for transcriptomics; W.J.H., T.J.K., S.M. and M.I.L. performed the analyses. W.J.H., T.J.K., H.L.C., J.S., C.S., E.S.D., N.S., L.M., B.T., A.A.N., M.K., Q.C., V.Y., W.M., J.-E.P., I.M., M.H.R., P.-P.A., P.P., C.-L.P., M.T., R.G., M.M., S.L., M.L., S.C.H., G.P.N., K.L.D., A.T.P., M.I.L. and S.M. contributed to the ecosystem’s development and ongoing improvement. S.M., M.I.L., A.T.P., K.L.D., S.C.H., M.L., M.M. and R.G. acted as the supervisory team. S.M., M.I.L. and A.T.P. contributed equally and jointly led the study. W.J.H. and T.J.K. contributed equally. All authors contributed to writing the manuscript.
Ethics declarations
Competing interests
R.G. has received consulting income from Takeda and Sanofi, and declares ownership in Ozette Technologies. M.K. is an employee of and declares ownership in Achilles Therapeutics. The other authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Bo Li and Judith Zaugg for their contribution to the peer review of this work. Primary Handling Editor: Lei Tang, in collaboration with the Nature Methods team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Table 1
List of samples used in peripheral blood mononuclear cell analysis.
Source data
Source Data Fig. 2
Source data used to create the benchmarking plot Fig. 2h.
About this article
Cite this article
Hutchison, W.J., Keyes, T.J., The tidyomics Consortium. et al. The tidyomics ecosystem: enhancing omic data analyses. Nat Methods 21, 1166–1170 (2024). https://doi.org/10.1038/s41592-024-02299-2
DOI: https://doi.org/10.1038/s41592-024-02299-2