From YouTube: 2021-11-02 - Hariharan Devarajan - Characterizing I/O of large scale scientific Deep Learning apps

Description

NERSC Data Seminars Series: https://github.com/NERSC/data-seminars

Title:
Characterizing I/O behavior of large-scale scientific deep learning applications

Speaker:
Hariharan Devarajan, Lawrence Livermore National Laboratory

Abstract:
Deep learning has proven successful for a wide range of tasks, and its popularity has produced numerous open-source deep learning software tools. Deep learning has been applied to a broad spectrum of scientific domains such as cosmology, particle physics, computer vision, fusion, and astrophysics. Scientists have done a great deal of work to optimize the computational performance of deep learning frameworks. However, the same cannot be said for I/O performance. Because deep learning algorithms rely on large volumes and a wide variety of data to train neural networks accurately, I/O is a significant bottleneck in large-scale distributed deep learning training. In this talk, I will share our experiences running large-scale DL applications on the Theta supercomputer, along with a detailed investigation of the I/O behavior of various scientific deep learning workloads. Additionally, I will showcase our DLIO Benchmark, which accurately represents the class of applications we characterized, to foster I/O research on these applications. I will share key results and insights we discovered in modern scientific DL applications, including access patterns, integration with scientific data formats, and I/O scalability on production supercomputers. Finally, I will highlight key pain points in performing I/O characterization of DL applications and discuss research directions to address them.

Bio:
Hariharan Devarajan is a postdoctoral researcher at Lawrence Livermore National Laboratory. He received his Ph.D. in Computer Science from the Illinois Institute of Technology, advised by Dr. Xian-He Sun. His research focuses on accurate I/O characterization of distributed applications and on building highly configurable storage systems for large-scale distributed systems. He has worked on I/O optimizations in several domains, such as scientific simulations, AI, and big data analytics, and specializes in designing solutions for hierarchical storage environments. He has received best paper awards at HPDC and CCGrid.

Host of Seminar:
Suren Byna
Computational Research Division
Lawrence Berkeley National Laboratory