From YouTube: Nicholas Schrock: Dagster - An open source Python library for building data applications at Crunch

Description

👉 Learn more about the talk and download the slides at https://crunchconf.com/speaker/NicholasSchrock#talks
📬 Sign up to our newsletter so you won't miss the updates about the next Crunch Data Conference: http://eepurl.com/dGwi1f

We introduce Dagster, an open source Python library for building ETL processes, ML pipelines, and similar software systems, all of which we call data applications.

Data applications are graphs of functional computations that consume and produce data assets. Dagster models the semantics of these applications with a unified type system, a data dependency graph, a configuration system, and a structured API for emitting events such as data quality tests and materializations, along with high-quality developer tools built on those abstractions. Builders can use the tools they know (e.g. Spark jobs for data engineers, SQL statements for analysts, Python for data scientists), and the application can be deployed to arbitrary orchestration engines (such as Airflow, Dask, or Kubernetes-based execution) in a pluggable fashion.

Captured Live on Ustream at https://www.ustream.tv/channel/JUMjvCF2ucj
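
To make the idea of a "graph of functional computations" more concrete, here is a minimal sketch in plain Python. It does not use Dagster and is not Dagster's API: the `Step` class, the `run_graph` function, and the event tuples are hypothetical names invented for illustration. It only assumes the concepts named in the description: typed outputs, a data dependency graph, per-step configuration, and emitted events.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Step:
    """One functional computation in the graph (hypothetical, not a Dagster class)."""
    name: str
    fn: Callable[..., Any]
    output_type: type            # stand-in for a richer type system
    deps: Tuple[str, ...] = ()   # names of upstream steps whose outputs are inputs


def run_graph(steps: List[Step], config: Dict[str, Dict[str, Any]]):
    """Run steps in dependency order; return (results, events)."""
    by_name = {step.name: step for step in steps}
    # Map each step to its predecessors and let graphlib order them.
    order = TopologicalSorter({s.name: set(s.deps) for s in steps}).static_order()

    results: Dict[str, Any] = {}
    events: List[Tuple[str, str]] = []
    for name in order:
        step = by_name[name]
        inputs = [results[dep] for dep in step.deps]
        # Per-step configuration is passed as keyword arguments.
        value = step.fn(*inputs, **config.get(name, {}))
        if not isinstance(value, step.output_type):
            raise TypeError(f"{name} produced {type(value).__name__}, "
                            f"expected {step.output_type.__name__}")
        results[name] = value
        events.append(("materialization", name))  # illustrative event record
    return results, events


def load_numbers(count):
    return list(range(1, count + 1))


def square(numbers):
    return [n * n for n in numbers]


def total(squares):
    return sum(squares)


# Steps are declared out of order on purpose; the dependency graph decides the order.
steps = [
    Step("total", total, int, deps=("square",)),
    Step("square", square, list, deps=("load_numbers",)),
    Step("load_numbers", load_numbers, list),
]
results, events = run_graph(steps, {"load_numbers": {"count": 3}})
print(results["total"])
print(events)
```

Because each step declares its inputs, its output type, and its configuration up front, the runner (rather than the step author) decides execution order and can record what was produced. This is roughly the separation the description points to when it says the same application can be handed to different orchestration engines.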