Abstract
Large-scale experimental science workflows need a unified, interactive, real-time platform that can manage a distributed set of resources connected to High Performance Computing (HPC) systems. Such a tool should offer the ease of use and interactivity of a web science gateway while letting scientists build custom, ad hoc workflows in a composable way. The Jupyter platform can play a key role here: it can ingest and analyze real-time streaming data, integrate with HPC resources in a closed loop, and support interactive, ad hoc analyses with running workflows.
We want to enable high-quality, reproducible, human-in-the-loop science using HPC and Jupyter at the National Energy Research Scientific Computing Center (NERSC). Achieving that goal is challenging in the general case because scientific workflows and data vary significantly in size and type between disciplines. Highly reproducible science, let alone human-in-the-loop interactive scientific workflows, requires work in many areas. Here we focus on some basic elements of an improved interactive HPC experience: creating reusable recipes and workflows with Notebooks, sharing and cloning Notebooks, and parallelizing and scaling scientific code that requires HPC and uses Jupyter.
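As a rough illustration of the "reusable recipe" and "cloning" ideas, the sketch below treats a Notebook as the JSON document it is on disk and produces a clean, re-parameterized copy. It is a minimal sketch using only the Python standard library, not the tooling described in the paper: the function names, the cell contents, and the `n_frames` parameter are hypothetical, and overwriting the cell tagged `parameters` is a simplification of the convention used by tools such as papermill.

```python
import copy
import json


def make_recipe_notebook():
    """Build a minimal nbformat-4 notebook as a plain dict (hypothetical recipe)."""
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [
            {
                "cell_type": "code",
                "id": "params",
                "metadata": {"tags": ["parameters"]},
                "execution_count": 1,
                "source": "n_frames = 10\n",
                "outputs": [],
            },
            {
                "cell_type": "code",
                "id": "analysis",
                "metadata": {},
                "execution_count": 2,
                "source": "total = sum(range(n_frames))\nprint(total)\n",
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": "45\n"}
                ],
            },
        ],
    }


def clone_with_parameters(notebook, **params):
    """Return a deep copy with outputs cleared and the tagged cell rewritten.

    Assumption: the recipe marks exactly one code cell with the tag
    "parameters"; that cell's source is replaced by the new assignments.
    """
    clone = copy.deepcopy(notebook)  # never mutate the shared original
    for cell in clone["cells"]:
        if cell["cell_type"] != "code":
            continue
        cell["outputs"] = []            # drop stale results from the source run
        cell["execution_count"] = None  # mark the cell as not yet executed
        if "parameters" in cell["metadata"].get("tags", []):
            cell["source"] = "".join(
                f"{name} = {value!r}\n" for name, value in params.items()
            )
    return clone


original = make_recipe_notebook()
clone = clone_with_parameters(original, n_frames=1000)
serialized = json.dumps(clone, indent=1)  # text that could be saved as .ipynb
```

The deep copy keeps the shared recipe untouched, so several users can derive their own variants from the same starting Notebook.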
Acknowledgements
This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231.
We wish to thank the Jupyter team; Colin Ophus, Benjamin Savitzky, and Steven Zeltmann at NCEM; and Dilworth Parkinson at ALS Beamline 8.3.2. We would also like to thank Lindsey Heagy for the geoscience Notebook example.
Copyright information
© 2020 This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply
About this paper
Cite this paper
Henderson, M.L., Krinsman, W., Cholia, S., Thomas, R., Slaton, T. (2020). Accelerating Experimental Science Using Jupyter and NERSC HPC. In: Juckeland, G., Chandrasekaran, S. (eds) Tools and Techniques for High Performance Computing. HUST 2019, SE-HER 2019, WIHPC 2019. Communications in Computer and Information Science, vol 1190. Springer, Cham. https://doi.org/10.1007/978-3-030-44728-1_9
DOI: https://doi.org/10.1007/978-3-030-44728-1_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44727-4
Online ISBN: 978-3-030-44728-1
eBook Packages: Computer Science, Computer Science (R0)