From YouTube: 2022-09-13 - Karan Vahi - Pegasus Workflow Management System

Description

NERSC Data Seminars Series: https://github.com/NERSC/data-seminars

Title:
Pegasus Workflow Management System

Speaker:
Karan Vahi, Information Sciences Institute, University of Southern California

Abstract:
Workflows are a key technology for enabling complex scientific computations. They capture the interdependencies between processing steps in data analysis and simulation pipelines, as well as the mechanisms to execute those steps reliably and efficiently. Workflows can capture complex processes to promote sharing and reuse, and also provide the provenance information necessary for the verification of scientific results and for scientific reproducibility.

Pegasus (https://pegasus.isi.edu) is used in a number of scientific domains doing production-grade science. In 2016 the LIGO gravitational wave experiment used Pegasus to analyze instrumental data and confirm the first detection of a gravitational wave. The Southern California Earthquake Center (SCEC), based at USC, uses a Pegasus-managed workflow infrastructure called CyberShake to generate hazard maps for the Southern California region. In 2021, SCEC conducted a CyberShake study on the DOE Summit system that used a simulation-based earthquake rupture forecast (ERF) for the first time. Overall, the study required 65,470 node-hours (358,000 GPU-hours and 243,000 CPU-hours) of computation, with Pegasus submitting tens of thousands of remote jobs automatically and managing 165 TB of data over the 29-day study. Pegasus is also used in astronomy, bioinformatics, civil engineering, climate modeling, earthquake science, molecular dynamics, and other complex analyses.

Pegasus users express their workflows in an abstract representation devoid of resource-specific information. Pegasus plans these abstract workflows by mapping tasks to available resources, augmenting the workflow with data management tasks, and optimizing the workflow by grouping small tasks into more efficient clustered batch jobs. Pegasus then executes this plan. If an error occurs at runtime, Pegasus automatically retries the failed task and provides checkpointing in case the workflow cannot continue. Pegasus can record provenance about the data, software, and hardware used. Pegasus provides a foundation for managing workflows in different environments, using workflow engines that are customized for a particular workload and system. Pegasus has well-defined support for major container technologies such as Docker, Singularity, and Shifter, allowing the jobs in a workflow to run inside containers of the user's choice.

Pegasus's most recent major release, Pegasus 5.0, is a substantial improvement over previous releases. Pegasus 5.0 provides a brand-new Python3 workflow API, developed from the ground up, so that in addition to generating the abstract workflow and all the catalogs, it now allows you to plan, submit, monitor, analyze, and generate statistics for your workflow (a sketch of this API follows the abstract).
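Below is a minimal, hedged sketch of what a workflow written against the Pegasus 5.0 Python3 API (the Pegasus.api module) might look like, covering the abstract-workflow, catalog, container, and plan/submit/monitor steps the abstract describes. The file names, executable path, site name, and container image are illustrative assumptions, not details from the talk.

#!/usr/bin/env python3
# Minimal sketch of the Pegasus 5.0 Python workflow API (Pegasus.api).
# Paths, the site name, and the container image below are hypothetical.
from Pegasus.api import *

# Abstract workflow: tasks and data dependencies, no resource-specific info.
wf = Workflow("sketch")

fa = File("f.a")  # raw input (logical file name)
fb = File("f.b")  # processed output

job = (
    Job("preprocess")
    .add_args("-i", fa, "-o", fb)
    .add_inputs(fa)
    .add_outputs(fb)
)
wf.add_jobs(job)

# Replica catalog: where the input physically lives (hypothetical path).
rc = ReplicaCatalog()
rc.add_replica("local", fa, "/path/to/f.a")
wf.add_replica_catalog(rc)

# Transformation catalog: the executable, optionally run inside a
# container of the user's choice (Docker, Singularity, or Shifter).
container = Container(
    "app-env",
    Container.SINGULARITY,
    image="docker://example/app:latest",  # hypothetical image
)
tc = TransformationCatalog()
tc.add_containers(container)
tc.add_transformations(
    Transformation(
        "preprocess",
        site="condorpool",
        pfn="/usr/bin/preprocess",  # hypothetical executable
        is_stageable=False,
        container=container,
    )
)
wf.add_transformation_catalog(tc)

# Plan, submit, monitor, analyze, and get statistics from one script.
wf.plan(submit=True)
wf.wait()
wf.analyze()
wf.statistics()

In this sketch, plan() compiles the abstract workflow into an executable one (adding data staging tasks and, where configured, clustering small jobs), submits it for execution, and the same Workflow object then drives monitoring, failure analysis, and statistics.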

Bio:
Karan Vahi is a Senior Computer Scientist in the Science Automation Technologies group at the USC Information Sciences Institute. He has been working in the field of scientific workflows since 2002 and has been closely involved in the development of the Pegasus Workflow Management System. He is currently the architect and lead developer for Pegasus, in charge of its core development. His work on implementing integrity checking in Pegasus for scientific workflows won the Best Paper and the Phil Andrews Most Transformative Research Award at PEARC19. He currently leads the Cloud Platforms group at CI Compass, an NSF CI Center, which includes CI practitioners from various NSF Major Facilities (MFs) and aims to understand the current practices for cloud infrastructure used by MFs and to research alternative solutions. https://www.isi.edu/directory/vahi/

Host of Seminar:
Hai Ah Nam, Advanced Technologies Group
National Energy Research Scientific Computing Center (NERSC)
Lawrence Berkeley National Laboratory