Contents
Proceedings of SciPy 2023
The 22nd annual SciPy conference was held at the AT&T Center in Austin, Texas, July 10-16, 2023. 25 peer-reviewed articles were published in the conference proceedings. The full proceedings, posters and slides, and organizing committee can be found at https://
Data Reduction Network
Multidimensional categorical data is widespread but not easily visualized using standard methods. For example, questionnaire data generally consists of questions with categorical responses. Popular methods of handling categorical data include one-hot encoding and enumeration; the latter imposes an unwarranted and potentially misleading notional order on the data. To address this, we introduce a novel visualization method named Data Reduction Network.
Haoyin Xu, Haw-minn Lu, José Unpingco
https://doi.org/10.25080/gerudo-f2bc6f59-012
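The two encodings named in the Data Reduction Network abstract above can be contrasted in a minimal sketch. The color responses are hypothetical illustration data, not taken from the paper, and the sketch does not implement the Data Reduction Network itself; it only shows why enumeration implies an order that one-hot encoding avoids.

```python
import numpy as np

# Hypothetical categorical responses (assumption, for illustration only).
responses = np.array(["red", "green", "blue", "red"])

# Enumeration: map each category to an integer code.
# np.unique sorts the categories, so blue=0, green=1, red=2.
categories, codes = np.unique(responses, return_inverse=True)

# One-hot encoding: one indicator column per category.
one_hot = np.eye(len(categories), dtype=int)[codes]

# Pairwise distances under each encoding.
enum_dist = np.abs(codes[:, None] - codes[None, :])
onehot_dist = np.linalg.norm(one_hot[:, None, :] - one_hot[None, :, :], axis=-1)

# Under enumeration, red is "farther" from blue than from green,
# an ordering the data never stated.
print(enum_dist[0, 2], enum_dist[0, 1])
# Under one-hot encoding, all distinct categories are equally far apart.
print(onehot_dist[0, 2], onehot_dist[0, 1])
```

The cost of one-hot encoding is one extra dimension per category, which is part of what makes multidimensional categorical data hard to visualize directly.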
libyt: a Tool for Parallel In Situ Analysis with yt
In the era of exascale computing, storage and analysis of large-scale data have become both more important and more difficult. We present libyt, an open-source C++ library that allows researchers to analyze and visualize data using yt or other Python packages in parallel during simulation runtime.
Shin-Rong Tsai, Hsi-Yu Schive, Matthew J. Turk
https://doi.org/10.25080/gerudo-f2bc6f59-011
Pandera: Going Beyond Pandas Data Validation
Data quality remains a core concern for practitioners in machine learning, data science, and data engineering, and many specialized packages have emerged to fulfill the need of validating and monitoring data and models. This paper outlines pandera’s motivation and challenges that took it from being a pandas-only data validation framework to one that is extensible to other non-pandas-compliant dataframe-like libraries.
Niels Bantilan
https://doi.org/10.25080/gerudo-f2bc6f59-010
aPhyloGeo-Covid: A Web Interface for Reproducible Phylogeographic Analysis of SARS-CoV-2 Variation using Neo4j and Snakemake
The gene sequencing data, along with the associated lineage tracing and research data generated throughout the Coronavirus disease 2019 (COVID-19) pandemic, constitute invaluable resources that profoundly empower phylogeography research. To optimize the utilization of these resources, we have developed an interactive analysis platform called aPhyloGeo-Covid.
Wanlin Li, Nadia Tahiri
https://doi.org/10.25080/gerudo-f2bc6f59-00f
PyQtGraph - High Performance Visualization for All Platforms
PyQtGraph is a plotting library with high performance, cross-platform support and interactivity as its primary objectives. These goals are achieved by connecting the Qt GUI framework and the scientific Python ecosystem.
Ognyan Moore, Nathan Jessurun, Martin Chase, +2
https://doi.org/10.25080/gerudo-f2bc6f59-00e
itk-elastix: Medical image registration in Python
Image registration plays a vital role in understanding changes that occur in 2D and 3D scientific imaging datasets. In this paper, we introduce itk-elastix, a user-friendly Python wrapping of the mature elastix registration toolbox.
Konstantinos Ntatsis, Niels Dekker, Viktor van der Valk, +5
https://doi.org/10.25080/gerudo-f2bc6f59-00d
Spatial Microsimulation and Activity Allocation in Python: An Update on the Likeness Toolkit
Understanding human security and social equity issues within human systems requires large-scale models of population dynamics that simulate high-fidelity representations of individuals and access to essential activities (work/school, social, errands, health). Likeness is a Python toolkit for spatial microsimulation and activity allocation.
Joseph V. Tuccillo, James D. Gaboardi
https://doi.org/10.25080/gerudo-f2bc6f59-00c
The Pandata Scalable Open-Source Analysis Stack
As the scale of scientific data analysis continues to grow, traditional domain-specific tools often struggle with data of increasing size and complexity. We introduce the Pandata open-source software stack as a solution, emphasizing the use of domain-independent tools at critical stages of the data life cycle, without compromising the depth of domain-specific analyses.
James A. Bednar, Martin Durant
https://doi.org/10.25080/gerudo-f2bc6f59-00b
MDAKits: A Framework for FAIR-Compliant Molecular Simulation Analysis
The reproducibility and transparency of scientific findings are widely recognized as crucial for promoting scientific progress. The MDAKits framework provides a cookiecutter template, best practices documentation, and a continually validated registry.
Irfan Alibay, Lily Wang, Fiona Naughton, +4
https://doi.org/10.25080/gerudo-f2bc6f59-00a
Using Blosc2 NDim As A Fast Explorer Of The Milky Way (Or Any Other NDim Dataset)
Large multidimensional datasets are widely used in various engineering and scientific applications. We have added support for large dimensional datasets to Blosc2, a compression and format library.
Project Blosc, Francesc Alted, Marta Iborra, +3
https://doi.org/10.25080/gerudo-f2bc6f59-000
Emukit: A Python toolkit for decision making under uncertainty
Emukit is a highly flexible Python toolkit for enriching decision making under uncertainty with statistical emulation. It is particularly pertinent to complex processes and simulations where data are scarce or difficult to acquire.
Andrei Paleyes, Maren Mahsereci, Neil D. Lawrence
https://doi.org/10.25080/gerudo-f2bc6f59-009
vak: a neural network framework for researchers studying animal acoustic communication
The study of acoustic communication is being revolutionized by deep neural network models. To support this work, we developed vak, a neural network framework designed for acoustic communication researchers.
David Nicholson, Yarden Cohen
https://doi.org/10.25080/gerudo-f2bc6f59-008
EEG-to-fMRI Neuroimaging Cross Modal Synthesis in Python
Electroencephalography and functional magnetic resonance imaging are two ways of recording brain activity. We developed a Python package, EEG-to-fMRI, which provides cross-modal neuroimaging synthesis functionality.
David Calhas
https://doi.org/10.25080/gerudo-f2bc6f59-007
Using Numba for GPU acceleration of Neutron Beamline Digital Twins
Digital twins of neutron instruments using Monte Carlo ray tracing have proven to be useful in neutron data analysis and verifying instrument and sample designs. In this paper, we present a GPU accelerated version of MCViNE using Python and Numba to balance user extensibility with performance.
Coleman J. Kendrick, Jiao Y. Y. Lin, Garrett E. Granroth
https://doi.org/10.25080/gerudo-f2bc6f59-006
Bayesian Statistics with Python, No Resampling Necessary
TensorFlow Probability is a powerful library for statistical analysis in Python. Using TensorFlow Probability’s implementation of Bayesian methods, modelers can incorporate prior information and obtain parameter estimates and a quantified degree of belief in the results.
Charles Lindsey
https://doi.org/10.25080/gerudo-f2bc6f59-005
Biomolecular Crystallographic Computing with Jupyter
To further advance the use of Jupyter in biomolecular crystallographic computing, we developed a collection of code fragments that use the vast Computational Crystallography Toolbox (cctbx) library for novel analyses. We made versions of this library for use in JupyterLab and Colab.
Blaine H. M. Mooers
https://doi.org/10.25080/gerudo-f2bc6f59-004
An Accessible Python based Author Identification Process
Author identification, also known as ‘author attribution’ and more recently ‘forensic linguistics’, involves identifying the true authors of anonymous texts. In this paper, we replicate the analysis in a much more accessible way, using modern text mining methods and Python.
Anthony Breitzman
https://doi.org/10.25080/gerudo-f2bc6f59-003
A Modified Strassen Algorithm to Accelerate Numpy Large Matrix Multiplication with Integer Entries
We present a Strassen-type algorithm for multiplying large matrices with integer entries. The algorithm is the standard Strassen divide-and-conquer algorithm, but it crosses over to NumPy when either the row or column dimension of one of the matrices drops below 128.
Anthony Breitzman
https://doi.org/10.25080/gerudo-f2bc6f59-002
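The crossover strategy described in the Strassen abstract above can be sketched as follows. This is a minimal illustration written from the abstract alone, not the author's implementation: the function name `strassen`, the `crossover` parameter, and the zero-padding of odd dimensions are assumptions made here; only the threshold of 128 comes from the abstract.

```python
import numpy as np

CROSSOVER = 128  # threshold stated in the abstract


def strassen(a, b, crossover=CROSSOVER):
    """Multiply integer matrices a (n x m) and b (m x p), Strassen style."""
    n, m = a.shape
    m_b, p = b.shape
    if m != m_b:
        raise ValueError("inner dimensions do not match")

    # Cross over to NumPy once any dimension drops below the threshold.
    if min(n, m, p) < crossover:
        return a @ b

    # Assumption: pad odd dimensions with one zero row/column so that
    # every matrix splits into four equally sized blocks.
    if n % 2 or m % 2 or p % 2:
        a_pad = np.pad(a, ((0, n % 2), (0, m % 2)))
        b_pad = np.pad(b, ((0, m % 2), (0, p % 2)))
        return strassen(a_pad, b_pad, crossover)[:n, :p]

    h, k, q = n // 2, m // 2, p // 2
    a11, a12, a21, a22 = a[:h, :k], a[:h, k:], a[h:, :k], a[h:, k:]
    b11, b12, b21, b22 = b[:k, :q], b[:k, q:], b[k:, :q], b[k:, q:]

    # Seven recursive products instead of the eight of the naive block method.
    p1 = strassen(a11 + a22, b11 + b22, crossover)
    p2 = strassen(a21 + a22, b11, crossover)
    p3 = strassen(a11, b12 - b22, crossover)
    p4 = strassen(a22, b21 - b11, crossover)
    p5 = strassen(a11 + a12, b22, crossover)
    p6 = strassen(a21 - a11, b11 + b12, crossover)
    p7 = strassen(a12 - a22, b21 + b22, crossover)

    # Assemble the four blocks of the result.
    c = np.empty((n, p), dtype=np.result_type(a, b))
    c[:h, :q] = p1 + p4 - p5 + p7
    c[:h, q:] = p3 + p5
    c[h:, :q] = p2 + p4
    c[h:, q:] = p1 - p2 + p3 + p6
    return c


rng = np.random.default_rng(0)
a = rng.integers(-50, 50, size=(300, 257))
b = rng.integers(-50, 50, size=(257, 260))
print(np.array_equal(strassen(a, b), a @ b))
```

Because the entries are integers, the result can be compared exactly against `a @ b`, with no floating-point tolerance needed.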
Python Array API Standard: Toward Array Interoperability in the Scientific Python Ecosystem
The Python array API standard specifies standardized application programming interfaces and behaviors for array and tensor objects and operations. The establishment and subsequent adoption of the standard aims to reduce ecosystem fragmentation and facilitate array library interoperability.
Aaron Meurer, Athan Reines, Ralf Gommers, +15
https://doi.org/10.25080/gerudo-f2bc6f59-001