From YouTube: 2021-04-16 - Zachary Ulissi, Larry Zitnick - The Open Catalyst 2020 Dataset & Community Challenges


NERSC Data Seminars Series:

Title: The Open Catalyst 2020 (OC20) Dataset and Community Challenges

Zachary Ulissi, Assistant Professor of Chemical Engineering at Carnegie Mellon University
Larry Zitnick, Research Scientist at Facebook AI Research

The Open Catalyst Project aims to develop new ML methods and models to accelerate the catalyst simulation process for renewable energy technologies and improve our ability to predict activity/selectivity across catalyst composition. To achieve that in the short term, we need participation from the ML community in solving key challenges in catalysis. One path to interaction is the development of grand challenge datasets that are representative of common challenges in catalysis, large enough to excite the ML community, and large enough to take advantage of and encourage advances in deep learning models. Similar datasets have had a large impact in small molecule drug discovery, organic photovoltaics, and inorganic crystal structure prediction. We present the first open dataset from this effort on thermochemical intermediates across stable multi-metallic and p-block doped surfaces. This dataset includes full-accuracy DFT calculations across 53 elements and their binary/ternary materials on various low-index facets. Adsorbates span 56 common reaction intermediates with relevance to carbon, oxygen, and nitrogen thermal and electrochemical reactions. Off-equilibrium structures are also generated and included to aid in machine learning force field design and fitting. Collectively, this is the largest systematic dataset that bridges organic and inorganic chemistry, and it will enable a new generation of catalyst structure/property relationships. We will discuss fixed train/test splits that represent common chemical challenges and an open challenge website, both intended to encourage competition and buy-in from the ML community.

Zachary Ulissi is an Assistant Professor of Chemical Engineering at Carnegie Mellon University. He works on the development and application of high-throughput computational methods in catalysis, machine learning models to predict catalyst properties, and active learning methods to guide these systems. Applications include energy materials, CO2 utilization, fuel cell development, and additive manufacturing. He has been recognized nationally for his work, with honors including the 3M Non-Tenured Faculty Award and the AIChE 35-under-35 award, among others.

Larry Zitnick is a research scientist at Facebook AI Research in Menlo Park. His current areas of interest include scientific applications of AI, language and vision, and object recognition. He serves on the board of the Common Visual Data Foundation, whose mission is to aid the computer vision community in creating datasets and competitions. Previously, he spent 12 great years at Microsoft Research, and he obtained a PhD in Robotics from CMU's Robotics Institute.

Host of Seminar:
Brandon Wood
Data & Analytics Group
National Energy Research Scientific Computing Center (NERSC)
Lawrence Berkeley National Laboratory