From YouTube: Jupyter demo video

Description

Presented by Matthew Henderson (with Shreyas Cholia and Rollin Thomas)
June 3, 2020

Large-scale "Superfacility"-type experimental science workflows require a unified, interactive, real-time platform that can manage a distributed set of resources connected to High Performance Computing (HPC) systems. Here we demonstrate how the Jupyter platform plays a key role in this space: it provides the ease of use and interactivity of a web science gateway while giving scientists the ability to build custom, ad-hoc workflows in a composable way. Using real-world use cases from the National Center for Electron Microscopy (NCEM), we show how Jupyter facilitates interactive analysis of data at scale on NERSC HPC resources.

Jupyter Notebooks combine live executable code cells with inline documentation and embedded interactive visualizations. This allows us to capture an experiment in a fully contained, executable Notebook that is self-documenting and incorporates live rendering of outputs and results as they are generated. The Notebook format lends itself to a highly modular and composable workflow, in which individual steps and parameters can be adjusted on the fly. Additionally, the Jupyter platform supports custom applications and extensions that live alongside the core Notebook interface.
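As a minimal illustration of that cell-level composability (an invented example, not from the talk), a parameter cell and an analysis cell can be kept separate, so a scientist can tweak the parameters and re-run only the downstream step:

```python
# "Cell 1": parameters, adjustable on the fly and re-run independently
threshold = 0.5
bins = 32

# "Cell 2": a self-contained analysis step that consumes those parameters
def histogram(values, bins, threshold):
    """Count values >= threshold into `bins` equal-width buckets over [0, 1)."""
    counts = [0] * bins
    for v in values:
        if v >= threshold:
            counts[min(int(v * bins), bins - 1)] += 1
    return counts

# "Cell 3": run the step; in a Notebook the result renders inline
data = [i / 100 for i in range(100)]
counts = histogram(data, bins, threshold)
```

Editing `threshold` in the first cell and re-executing only the later cells is the kind of ad-hoc, modular iteration the Notebook format encourages.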

We will use real-world science examples to show how we create an improved interactive HPC experience in Jupyter, including:
- Improvements to the NERSC JupyterHub Deployment
- Scaling up code in a Jupyter notebook to run on HPC resources through the use of parallel task execution frameworks
- Demonstrating the use of the Dask task framework as a backend to manage workers from Jupyter
- Enabling project-wide workflows and collaboration through sharing and cloning Notebooks, and their associated software environments
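To sketch the scaling-up step above: Dask's `distributed.Client` mirrors Python's standard `concurrent.futures` interface (`submit`, `map`, futures), so notebook code written against that interface can move from local threads to Dask-managed HPC workers with little change. A stdlib-only stand-in (the `analyze` function is a placeholder, not code from the talk):

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(frame):
    # Placeholder per-frame analysis; in practice this would be the
    # expensive microscopy-data computation being scaled out.
    return frame * frame

frames = range(8)

# Locally, a thread pool fans the work out; with Dask, a
# distributed.Client connected to a scheduler plays the same role.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(analyze, frames))
```

Because the executor is the only moving part, swapping in a Dask client backed by HPC workers leaves the notebook's analysis code untouched.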
We will also discuss related projects and potential future directions.