Dagster Dagster Talks, Panels, & Interviews Open Meetings

6 Jul 2023

On June 8th, 2023, Sandy Ryza, lead engineer on the Dagster project, gave a presentation at the DATA + AI Summit in San Francisco. The talk was entitled "The Future of Data Orchestration: Asset-Based Orchestration".

We are happy to share the key points of the talk in the video below.

Sandy's thesis: Data orchestration is a core component of any batch data processing platform, yet we've been using patterns that haven't changed since the 1980s. Sandy introduces a new pattern and way of thinking about data orchestration known as asset-based orchestration, with data freshness sensors to trigger pipelines.
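As a rough illustration of the freshness idea, the sketch below picks which assets need a refresh by comparing each asset's last update time with a maximum allowed lag. It is plain Python, not Dagster's API: the function `stale_assets`, the asset names, and the lag values are all hypothetical.

```python
from datetime import datetime, timedelta


def stale_assets(last_updated, max_lag, now):
    """Return the names of assets whose last update is older than their allowed lag."""
    return sorted(
        name
        for name, lag in max_lag.items()
        # An asset that has never been updated counts as stale.
        if name not in last_updated or now - last_updated[name] > lag
    )


# Hypothetical state: when each asset was last refreshed.
now = datetime(2023, 6, 8, 12, 0)
last_updated = {
    "orders": datetime(2023, 6, 8, 11, 30),
    "daily_report": datetime(2023, 6, 7, 9, 0),
}
# Hypothetical policy: how old each asset is allowed to get.
max_lag = {
    "orders": timedelta(hours=1),
    "daily_report": timedelta(hours=24),
    "ml_model": timedelta(days=7),
}

to_refresh = stale_assets(last_updated, max_lag, now)
print(to_refresh)
```

A sensor in this style would run the check periodically and trigger a run only for the assets it returns, rather than rerunning a whole pipeline on a fixed schedule.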

1 participant
15 minutes

24 May 2023

In this video we revisit Dagster and look at how this workflow orchestration system has changed with recent updates (from version 0.15 to 1.3.1).
Dagster is an orchestrator designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports.
We cover Software-Defined Assets, the model Dagster is now pushing towards. By default, pipeline outputs are stored as pickle files in the Dagster home folder. What if we want to store the outputs in a database table, or in a readable file such as a CSV or Parquet file? Dagster provides input and output managers (IO managers) that handle reading and writing data to storage systems. Using IO managers, we can save outputs to a file system or store our data as tables in a database. We define file-based (CSV/Parquet) and database IO managers.
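To make the IO manager idea concrete, here is a minimal plain-Python sketch of the pattern: one object owns both writing an asset's output to storage and reading it back for downstream steps. It does not import Dagster. Dagster's own `IOManager` base class defines methods named `handle_output` and `load_input` that receive context objects, whereas the toy `CsvIOManager` below borrows those names with a simplified, hypothetical signature.

```python
import csv
import tempfile
from pathlib import Path


class CsvIOManager:
    """Toy IO manager: stores each asset's rows as <base_dir>/<asset_name>.csv."""

    def __init__(self, base_dir):
        self.base_dir = Path(base_dir)

    def _path(self, asset_name):
        return self.base_dir / f"{asset_name}.csv"

    def handle_output(self, asset_name, rows):
        # Write a non-empty list of dicts; the first row's keys become the header.
        with open(self._path(asset_name), "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)

    def load_input(self, asset_name):
        # Read the rows back; CSV carries no types, so every value is a string.
        with open(self._path(asset_name), newline="") as f:
            return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as tmp:
    manager = CsvIOManager(tmp)
    manager.handle_output(
        "products",
        [{"id": 1, "name": "widget"}, {"id": 2, "name": "gadget"}],
    )
    file_exists = (Path(tmp) / "products.csv").exists()
    loaded = manager.load_input("products")

print(loaded)
```

Swapping this class for one that writes Parquet files or database tables changes where the data lives without touching the code that computes it, which is the point of separating storage from asset logic.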

Link to previous video: https://www.youtube.com/watch?v=t8QADtYdWEI&t
Link to GitHub repo: https://github.com/hnawaz007/pythondataanalysis/tree/main/dagster-project/etl

Get started with Dagster in three quick steps: install Dagster, define assets, and materialize assets.

Create a virtual environment: python -m venv env
Activate the virtual environment (Windows): env\Scripts\activate

To install Dagster into an existing Python environment, run:
pip install dagster dagit

To create a new project, run:
dagster project scaffold --name my-dagster-project

Additional libraries required: Pandas, psycopg2

To run Dagster, start the web UI and the daemon, each in its own terminal:
dagit
dagster-daemon run

Access Dagit UI on port 3000: http://127.0.0.1:3000

Topics covered in this video:
==================================
0:00 - Introduction to Dagster
2:11 - Create a new Dagster project
3:03 - Dagster Project Structure
4:18 - Software Defined Assets
5:35 - Install Required Libraries
5:58 - Source DB Connection
6:27 - Source Asset
10:05 - File IO Manager
14:16 - Second Asset
16:19 - Parquet IO Manager
16:26 - Database IO Manager
19:05 - Materialize Assets

1 participant
21 minutes

8 Feb 2023

Many data engineers are looking to get past the limitations of Apache Airflow, the incumbent in the data orchestration layer. Dagster proposes a new paradigm centered on Data Assets and the tools to support a full development lifecycle that radically boosts the productivity of data engineering teams.

1 participant
7 minutes

8 Feb 2023

INFOSTRUX is a Dagster technology partner. They help organizations select and implement the best possible technology to meet their business goals. In this video, Nasko Grozdanov (Manager, Engineering) explains the business benefits of using Dagster over Airflow.

1 participant
5 minutes

14 Apr 2022

ABOUT THE TALK

This talk discusses software-defined assets, an approach to orchestration and data management that makes it drastically easier to trust and evolve data assets, like tables and ML models.

In traditional data platforms, code and data are only loosely coupled. As a consequence, deploying changes to data feels dangerous, backfills are error-prone and irreversible, and it’s difficult to trust data, because you don’t know where it comes from or how it’s intended to be maintained. Each time you run a job that mutates a data asset, you add a new variable to account for when debugging problems.

Dagster proposes an alternative approach to data management that tightly couples data assets to code - each table or ML model corresponds to the function that’s responsible for generating it. This results in a “Data as Code” approach that mimics the “Infrastructure as Code” approach that’s central to modern DevOps. Your git repo becomes your source of truth on your data, so pushing data changes feels as safe as pushing code changes. Backfills become easy to reason about. You trust your data assets because you know how they’re computed and can reproduce them at any time. The role of the orchestrator is to ensure that physical assets in the data warehouse match the logical assets that are defined in code, so each job run is a step towards order.
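The coupling between an asset and the function that generates it can be sketched in a few lines of plain Python. This is a toy model, not Dagster's implementation: the `asset` decorator, the `ASSETS` registry, and `materialize_all` are hypothetical names written for this illustration, and the example assets are made up. It borrows one convention that Dagster's real `@asset` decorator also uses, namely that a parameter name declares a dependency on the upstream asset of the same name. It needs Python 3.9+ for `graphlib`.

```python
import inspect
from graphlib import TopologicalSorter

ASSETS = {}  # asset name -> function that computes it


def asset(fn):
    """Register a function as the definition of the asset named after it."""
    ASSETS[fn.__name__] = fn
    return fn


@asset
def raw_orders():
    return [{"id": 1, "amount": 20}, {"id": 2, "amount": 35}]


@asset
def order_total(raw_orders):
    # The parameter name declares a dependency on the raw_orders asset.
    return sum(row["amount"] for row in raw_orders)


@asset
def report(order_total):
    return f"total revenue: {order_total}"


def materialize_all():
    """Compute every registered asset, upstream assets first."""
    deps = {
        name: set(inspect.signature(fn).parameters)
        for name, fn in ASSETS.items()
    }
    values = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = {dep: values[dep] for dep in deps[name]}
        values[name] = ASSETS[name](**inputs)
    return values


values = materialize_all()
print(values["report"])
```

Because every asset is defined by code in the repository, rerunning `materialize_all` reproduces the same assets from the same definitions, which is what makes backfills and changes easier to reason about.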

Software-defined assets is a natural approach to orchestration for the modern data stack, in part because dbt models are a kind of software-defined asset.

Attendees of this session will learn what it looks like to build and maintain a warehouse or data lake of software-defined assets with Dagster.

ABOUT THE SPEAKER

Sandy is a software engineer at Elementl, building Dagster. Previously, he led machine learning and data science teams at KeepTruckin and Clover Health. He's a committer on Spark and Hadoop, and co-authored O'Reilly's Advanced Analytics with Spark.

ABOUT DATA COUNCIL:
Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from top open source projects and startups.

FOLLOW DATA COUNCIL:
Twitter: https://twitter.com/DataCouncilAI
LinkedIn: https://www.linkedin.com/company/datacouncil-ai/
Eventbrite: https://www.eventbrite.com/o/data-council-30357384520

1 participant
27 minutes

13 Apr 2022

As organizations grow, so do the complexity and number of tools. In small data teams it is not uncommon to be responsible for more systems than there are team members. Without a strong orchestration layer to hold these systems together, many workflows are held together with one-off jobs and shaky deployments.

Dagster is a modern framework for defining jobs that treats the data platform as an application. Jobs can be parameterized and iterated on locally before moving into the cloud and interacting with production services. Using Dagster as the foundation for a modern data platform ensures that all data stakeholders can easily interact with their data and build out specific pipelines to maximize value without having to worry about underlying infrastructure.
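One way to picture "parameterized and iterated on locally" is a job whose settings and external services are passed in rather than hard-coded. The sketch below is plain Python under that assumption, not Dagster code: `InMemoryWarehouse`, `load_orders_job`, and the config keys are hypothetical names chosen for this example.

```python
class InMemoryWarehouse:
    """Stand-in for a production warehouse, suitable for local runs and tests."""

    def __init__(self):
        self.tables = {}

    def write(self, table, rows):
        self.tables[table] = list(rows)


def load_orders_job(config, warehouse):
    """Keep orders at or above a configurable threshold and write them out."""
    kept = [o for o in config["orders"] if o["amount"] >= config["min_amount"]]
    warehouse.write(config["target_table"], kept)
    return len(kept)


# Local run: small inline data and an in-memory warehouse.
local_warehouse = InMemoryWarehouse()
written = load_orders_job(
    {
        "orders": [{"id": 1, "amount": 5}, {"id": 2, "amount": 50}],
        "min_amount": 10,
        "target_table": "big_orders",
    },
    local_warehouse,
)
print(written, local_warehouse.tables)
```

In production, the same function would receive a client for the real warehouse that exposes the same `write` method, so the job logic itself does not change between environments.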

Presenter: Dennis Hume @Dutchie

1 participant
32 minutes

4 Mar 2022

The last few weeks have featured a loud debate in the data community - should the data stack be bundled, unbundled, or rebundled? We're going to dive into this debate with Nick Schrock (Elementl) and Scott Breitenother (Brooklyn Data Co). This should be full of fireworks, so tune in!

Streamed live on YouTube and LinkedIn

4 participants
1:01 hours

1 Dec 2021

The modern data stack is often defined by the type of technologies that exist within it. Cloud-based, open source, low/no code tools, ELT, and reverse ETL. But surely there’s more to it… isn’t there?

What holds the modern data stack together and makes it the architecture of choice for so many data-driven enterprises? Join Tim, Juan and special guest Nick Schrock, founder of Elementl, creator of Dagster, and co-creator of GraphQL, to chat about all things MDS.

3 participants
60 minutes

19 Nov 2021

Tune in to MAD Data Podcast to hear from data engineers, leaders of data teams, and everyone who has a stake in their data quality. Be sure to check out our upcoming guest lineup at: https://databand.ai/mad-data-podcast/
. . .
Nick Schrock, Founder & CEO of Elementl, and Scott Breitenother, Founder of Brooklyn Data Co., discuss the evolution of data from Big Data to Big Complexity: what's next now that the data industry has solved the problem of data storage? While the modern data stack has been embraced as every data team's "must-have" for addressing 'modern data problems,' Nick and Scott muse on the struggles that continue to plague data teams and the next wave of potential in data infrastructure innovation. With one problem solved, a new era of possibility and complexity is now unleashed.
. . .
Listen to MAD Data on -

Simplecast: https://mad-data.simplecast.com/

Google Podcast: https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5zaW1wbGVjYXN0LmNvbS9OU3FITE5kZQ?sa=X&ved=0CAMQ4aUDahcKEwjA-qzZoKX0AhUAAAAAHQAAAAAQAQ&hl=en

Amazon Music: https://music.amazon.com/podcasts/2969426c-15f3-49e9-a54a-13edcf6e6768/mad-data-a-data-quality-podcast-by-databand-ai

Spotify: https://open.spotify.com/show/3V0dhexP4cSYIRBlKLf9KV

4 participants
52 minutes

1 Nov 2021

Nick will cover the principles and origin of Dagster. Dagster is a new type of workflow engine: a data orchestrator. Moving beyond just managing the ordering and physical execution of data computations, Dagster considers the entire data application lifecycle. Practitioners in Dagster build data-aware dependency graphs designed for local development and testing; deploy those graphs to a multi-tenant, cloud-native orchestration engine; and then monitor and observe the data assets produced by those computations.

This talk will also cover our new major release, which makes significant changes to our core API that dramatically improve usability and ergonomics.

3 participants
51 minutes

21 Jun 2021

Nick Schrock, Founder and CEO of Elementl, spoke with FirstMark's Matt Turck in a virtual fireside chat at Data Driven NYC in June 2021. They spoke about Dagster, open source, and much more.

Data Driven NYC is a monthly event covering Big Data and data-driven products and startups, hosted by Matt Turck, partner at FirstMark Capital.

2 participants
28 minutes

17 Jun 2021

Nick Schrock covers the principles and origin of Dagster. Dagster is a new type of workflow engine: a data orchestrator. Moving beyond just managing the ordering and physical execution of data computations, Dagster considers the entire data application lifecycle. Practitioners in Dagster build data-aware dependency graphs designed for local development and testing; deploy those graphs to a multi-tenant, cloud-native orchestration engine; and then monitor and observe the data assets produced by those computations.

In this talk, Nick discusses how Dagster differentiates itself across the three stages (dev & test, deploy & execute, monitor & observe) of the application lifecycle. Through a demo and code snippets, the talk shows how the Dagit web UI and Dagster programming model can power a variety of data practitioners.

Related video:
Data Driven NYC talk on Fundamentals of Data Engineering (https://youtu.be/mPSzL8Lurs0)

#data #dataengineering #airflow #prefect #cloudnative #cloudcomputing #cloud #fundamentalsofdataengineering
-------------------------
About Nick Schrock

Nick Schrock is the founder and CEO of Elementl, the company behind Dagster. Previously, Nick worked at Facebook, where he co-created GraphQL. Nick believes deeply in the power of well-designed developer tools to make engineers more productive, accelerate their careers, make their lives more enjoyable, and transform the organizations in which they work.

7 participants
1:10 hours

26 May 2021

This talk was given at one of four live meetups as part of Data + AI Summit 2021 on May 26, 2021.

Title: Introduction, principles and origin of Dagster by Nick Schrock

Abstract: Nick will cover the principles and origin of Dagster (https://dagster.io/). Dagster is a new type of workflow engine: a data orchestrator. Moving beyond just managing the ordering & physical execution of data computations, Dagster considers the entire data application lifecycle. Practitioners in Dagster build data-aware dependency graphs designed for local development and testing; deploy those graphs to a multi-tenant, cloud-native orchestration engine; and then monitor and observe the data assets produced by those computations.

In this talk, Nick will cover how Dagster differentiates itself across the three stages (dev & test, deploy & execute, monitor & observe) of the application lifecycle. Through a demo and code snippets, the talk aims to show how the Dagit web UI and Dagster programming model can power a variety of data practitioners.

Speaker: Nick Schrock is the founder and CEO of Elementl, the company behind Dagster. Previously, Nick worked at Facebook, where he co-created GraphQL.

2 participants
30 minutes

19 May 2021

Eric Anderson interviews Nick Schrock about Dagster, the open-source data orchestrator for machine learning, analytics, and ETL. Nick is the founder and CEO of Elementl, and is well known for creating the Product Infrastructure group at Facebook, which spawned GraphQL and React. On today's episode of Contributor, Nick explains how he set out to fix an inefficiency he identified amid the complexity of the data infrastructure domain.

Find show notes and previous episodes at https://www.contributor.fyi/

2 participants
40 minutes

18 Apr 2021

Building ML Pipelines with Dagster: The role of the orchestrator in machine learning
Dagster is an orchestrator that puts data at the center. While orchestrators typically focus on sequencing computations in production, Dagster brings orchestration to the entire ML development lifecycle.
Slides: https://yadi.sk/i/DuZkASxS7ZbvDA
Sandy Ryza, Software Engineer @ Elementl, working on Dagster

All talks & tutorials from Machine Learning REPA Week 2021:
- All talks at Track 1 - Machine Learning Product and Team Management: https://youtube.com/playlist?list=PLlxErbAvYYLBa2uQkROxn4OZJvGQACn8i
- All talks at Track 2 - ML pipelines automation. Code and Data version control. Reproducibility. MLOps: https://youtube.com/playlist?list=PLlxErbAvYYLDRP6cHtVP76f2g5Yoh6c5R

Links:
- Learn more about ML REPA: https://mlrepa.com/
- Learn more about LeanDS: https://leands.ru/
- Learn more about DataTalks.Club: https://datatalks.club/
- Machine Learning REPA Week 2021 Online Conference: https://mlrepa.com/mlrepa-week-2021

Join us:
DataTalks.Club #mlrepa - https://datatalks-club.slack.com/archives/C01Q6698JTV
Slack ODS.ai #mlrepa - https://opendatascience.slack.com/archives/C019A9H5V0X

2 participants
26 minutes

14 Dec 2020

dbt defined an entirely new subspecialty of software engineering: Analytics Engineering. But it is one discipline among many: analytics engineers must collaborate with data scientists, data engineers, and data platform engineers to deliver a cohesive data platform. In this video, Nick Schrock of Elementl talks about how orchestrating dbt with Dagster allows you to place dbt in context, de-silo your operational systems, improve monitoring, and enable self-service operations.

1 participant
25 minutes

24 Nov 2020

Max Gasner, co-author of Dagster, a data orchestration framework for ETL, talks about the principles of building reliable data applications with Dagster. Sign up for the next Data Quality Meetup: https://bit.ly/3yiUH2H
Join our Meetup Group: https://www.meetup.com/data-quality-meetup/

1 participant
8 minutes

9 Sep 2020

ABOUT THE TALK (https://www.datacouncil.ai/talks/dagster-workflows-for-data-science-machine-learning-and-data-engineering)

Dagster is an open-source Python library for building data applications: ML, analytics, ETL, and more. We will describe how Dagster can enable a unified stack of tooling across a wide dynamic range of use cases: from a laptop to a Kubernetes cluster; from local development to monitoring in production; from data science to data engineering to machine learning.

ABOUT THE SPEAKER

Nick Schrock is the founder and CEO of Elementl, a company aiming to reshape the data management ecosystem, and the creator of Dagster, a new programming model for data processing. Previously, Nick was a Principal Engineer and Director of Engineering at Facebook. In that time, Nick co-created GraphQL, and led its implementation and adoption across the entire organization and product line. He also formed the Product Infrastructure group, whose engineers, in addition to GraphQL, created React, React Native, and many other broadly-used developer technologies, both inside Facebook and the technology industry at large.

1 participant
34 minutes

31 Oct 2019

👉 Learn more about the talk and download the slides at https://crunchconf.com/speaker/NicholasSchrock#talks
📬 Sign up to our newsletter so you won't miss the updates about the next Crunch Data Conference: http://eepurl.com/dGwi1f

We introduce Dagster, an open source Python library for building ETL processes, ML pipelines, and similar software systems, all of which we call data applications.

Data applications are graphs of functional computations that consume and produce data assets. Dagster provides abstractions and tools for modeling the semantics of these applications: a unified type system, a data dependency graph, a configuration system, a structured API for emitting events such as data quality tests and materializations, and high-quality developer tools built on those abstractions. Builders can use the tools they know (for example, Spark jobs for data engineers, SQL statements for analysts, Python for data scientists), and the application can be deployed to arbitrary orchestration engines (such as Airflow, Dask, or Kubernetes-based execution) in a pluggable fashion.

Captured live on Ustream at https://www.ustream.tv/channel/JUMjvCF2ucj
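The idea of a structured API for emitting events can be pictured with a computation that yields typed event objects instead of writing free-form log lines. The following is a plain-Python sketch under that assumption, not the Dagster API described in the talk: the `QualityCheckEvent` and `MaterializationEvent` classes and the `clean_users` step are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QualityCheckEvent:
    label: str
    success: bool


@dataclass
class MaterializationEvent:
    asset_name: str
    row_count: int


def clean_users(rows):
    """Drop rows without an email, emitting structured events along the way."""
    cleaned = [r for r in rows if r.get("email")]
    # A data quality test reported as data, so tools can display or alert on it.
    yield QualityCheckEvent(label="no_rows_dropped", success=len(cleaned) == len(rows))
    # A record that an asset was produced, with some metadata about it.
    yield MaterializationEvent(asset_name="clean_users", row_count=len(cleaned))


events = list(
    clean_users(
        [{"email": "a@example.com"}, {"email": ""}, {"email": "b@example.com"}]
    )
)
for event in events:
    print(event)
```

Because the events are typed objects rather than strings, developer tools built on top can filter, chart, or alert on them without parsing logs.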

2 participants
47 minutes

19 Jun 2019

Get the slides: https://www.datacouncil.ai/talks/dagster-a-new-programming-model-for-data-processing

ABOUT THE TALK

This talk introduces Dagster, an open source framework for building and modeling data processing computations. Data processing systems typically span multiple runtime, storage, tooling, and organizational boundaries. But all the stages in a data processing system share a fundamental property: they are directed, acyclic graphs (DAGs) of functional computations that consume and produce data assets. Dagster defines a standard for containerizing, describing, and operating these computations. That standard is opinionated and informed by industry best practices, leading to more testable, more reliable, better structured data systems.

By defining a standard, one can build these computations in tools that users know and love, such as Jupyter Notebooks (via Papermill), dbt, and Spark, and leverage that standard to build high-quality developer- and ops-facing tools to inspect, operate, and monitor those computations. These tools range from our beautiful introspection and execution tool Dagit to tools that schedule these computations on systems ranging from Airflow to Lambda, among others. Dagster embraces the chaotic reality of modern data management, and is an abstraction designed for incremental adoption within an increasingly heterogeneous ecosystem. We describe both the technology and the technical and organizational insights gained from production use of Dagster.

ABOUT THE SPEAKER

Nick Schrock is the founder and CEO of Elementl, a company aiming to reshape the data management ecosystem, and the creator of Dagster, a new programming model for data processing. Previously, Nick was a Principal Engineer and Director of Engineering at Facebook. In that time, Nick co-created GraphQL, and led its implementation and adoption across the entire organization and product line. He also formed the Product Infrastructure group, whose engineers, in addition to GraphQL, created React, React Native, and many other broadly-used developer technologies, both inside Facebook and the technology industry at large.

2 participants
39 minutes
