From YouTube: 2021-03-09 - Shane Snyder - Darshan: Enabling IO Understanding in an Evolving HPC Landscape

Description

NERSC Data Seminars Series: https://github.com/NERSC/data-seminars

Title: Darshan: Enabling Application IO Understanding in an Evolving HPC Landscape

Speaker:
Shane Snyder (Argonne National Laboratory)

Abstract:
Darshan is a lightweight application I/O characterization tool that captures detailed statistics describing an application's I/O workload. Installed and enabled by default at many production HPC facilities (including NERSC), Darshan has become an invaluable tool for users, system administrators, and I/O researchers investigating and tuning the I/O behavior of applications. While Darshan initially focused on instrumenting file-based APIs (e.g., POSIX, MPI-IO) for MPI applications, much recent work has extended it to new contexts that are increasingly relevant to the HPC community, including object-based storage APIs (e.g., DAOS) and non-MPI computational frameworks (e.g., Spark, TensorFlow). In this seminar, we describe how users can leverage Darshan to better understand the I/O behavior of their applications. We explain how users can produce Darshan instrumentation data for their applications and how to further analyze this data, focusing specifically on the Cori system at NERSC. We also cover new and upcoming features that aim to extend Darshan to new I/O instrumentation contexts for HPC, including instrumentation modules for the HDF5 and DAOS libraries and support for instrumenting non-MPI applications and frameworks. Finally, we walk through a couple of Darshan log analysis examples to illustrate the types of I/O insights that can be gained from Darshan log data and analysis tools.

Bio:
Shane Snyder is a software engineer in the Mathematics and Computer Science Division of Argonne National Laboratory. He received his master's degree in computer engineering from Clemson University in 2013. His research interests primarily include the design of high-performance distributed storage systems and the characterization and analysis of I/O workloads on production HPC systems.