Accelerating Experimental Science Using Jupyter and NERSC HPC

  • Conference paper
Tools and Techniques for High Performance Computing (HUST 2019, SE-HER 2019, WIHPC 2019)

Abstract

Large-scale experimental science workflows require a unified, interactive, real-time platform that can manage a distributed set of resources connected to High Performance Computing (HPC) systems. What is needed is a tool that offers the ease of use and interactivity of a web science gateway while giving scientists the ability to build custom, ad hoc workflows in a composable way. The Jupyter platform can play a key role here: it can ingest and analyze real-time streaming data, integrate with HPC resources in a closed loop, and support interactive ad hoc analyses alongside running workflows.

We want to enable high-quality, reproducible, human-in-the-loop science using HPC and Jupyter at the National Energy Research Scientific Computing Center (NERSC). Achieving that goal is challenging in the general case because scientific workflows and data vary significantly in size and type between disciplines. Highly reproducible science, let alone human-in-the-loop interactive scientific workflows, requires work in many areas. We focus here on some basic elements of an improved interactive HPC experience: creating reusable recipes and workflows with Notebooks, sharing and cloning Notebooks, and parallelizing and scaling scientific code that requires HPC from within Jupyter.

Acknowledgements

This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231.

We wish to thank the Jupyter team; Colin Ophus, Benjamin Savitzky, and Steven Zeltmann at NCEM; and Dilworth Parkinson at ALS Beamline 8.3.2. We would also like to thank Lindsey Heagy for the geoscience Notebook example.

Author information

Corresponding author

Correspondence to Matthew L. Henderson.

Copyright information

© 2020 This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply

About this paper

Cite this paper

Henderson, M.L., Krinsman, W., Cholia, S., Thomas, R., Slaton, T. (2020). Accelerating Experimental Science Using Jupyter and NERSC HPC. In: Juckeland, G., Chandrasekaran, S. (eds) Tools and Techniques for High Performance Computing. HUST 2019, SE-HER 2019, WIHPC 2019. Communications in Computer and Information Science, vol 1190. Springer, Cham. https://doi.org/10.1007/978-3-030-44728-1_9

  • DOI: https://doi.org/10.1007/978-3-030-44728-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-44727-4

  • Online ISBN: 978-3-030-44728-1

  • eBook Packages: Computer Science, Computer Science (R0)
