Dagster / Dagster Talks, Panels, & Interviews

These are all the meetings we have in "Dagster Talks, Panel…" (part of the organization "Dagster"). Click into individual meeting pages to watch the recording and search or read the transcript.

6 Jul 2023

On June 8th of this year, Sandy Ryza, lead engineer on the Dagster project, gave a presentation at the Data + AI Summit in San Francisco. The talk was titled "The Future of Data Orchestration: Asset-Based Orchestration".

We are happy to share the key points of the talk in the video below.

Sandy's thesis: data orchestration is a core component of any batch data processing platform, yet we've been using patterns that haven't changed since the 1980s. Sandy introduces a new pattern and way of thinking about data orchestration, known as asset-based orchestration, which uses data freshness sensors to trigger pipelines.
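To make the freshness idea concrete, here is a minimal plain-Python sketch. It illustrates the concept only and is not Dagster's API. Each asset declares how stale it is allowed to get, and a sensor-style check reports which assets need refreshing, either because they are too old or because an upstream asset has newer data. The names `Asset`, `max_lag`, and `stale_assets` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Asset:
    """Hypothetical asset record: a name, its upstream assets, and a freshness limit."""
    name: str
    deps: List[str]
    max_lag: timedelta


def stale_assets(
    assets: Dict[str, Asset],
    last_materialized: Dict[str, datetime],
    now: datetime,
) -> List[str]:
    """Return the assets a freshness sensor would ask the orchestrator to refresh."""
    stale: List[str] = []
    for name, asset in assets.items():
        ts: Optional[datetime] = last_materialized.get(name)
        if ts is None or now - ts > asset.max_lag:
            # Never materialized, or older than its freshness limit allows.
            stale.append(name)
        elif any(last_materialized.get(dep, datetime.min) > ts for dep in asset.deps):
            # Fresh enough by the clock, but an upstream asset has newer data.
            stale.append(name)
    return stale


# Example: a raw table that should be at most 1 hour old feeds a daily report.
now = datetime(2023, 6, 8, 12, 0)
assets = {
    "raw_orders": Asset("raw_orders", [], timedelta(hours=1)),
    "daily_revenue": Asset("daily_revenue", ["raw_orders"], timedelta(hours=24)),
}
last = {
    "raw_orders": now - timedelta(hours=2),     # past its 1-hour limit
    "daily_revenue": now - timedelta(hours=3),  # within 24 hours, but older than raw_orders
}
print(stale_assets(assets, last, now))  # ['raw_orders', 'daily_revenue']
```

In a real deployment, a check like this would run periodically and launch runs for the assets it returns. This sketch only reports them.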
  • 1 participant
  • 15 minutes
pipelines
pipeline
workflow
workflows
data
process
inputs
flow
debugging
observability

24 May 2023

In this video we revisit Dagster and look at the changes to this workflow orchestration system introduced by recent updates (from version 0.15 to 1.3.1).
Dagster is an orchestrator designed for developing and maintaining data assets, such as tables, datasets, machine learning models, and reports.
We will cover Software-Defined Assets, since Dagster is pushing toward Software-Defined Assets. By default, pipeline outputs are stored as pickle files in the Dagster home folder. What if we want to store the outputs in a database table, or in a readable file such as a CSV or Parquet file? Dagster provides input and output managers (IO managers) that read and write data to storage systems. Using IO managers, we can save outputs to a file system or store our data as tables in a database. We will define CSV/Parquet file IO managers and a database IO manager.
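The core idea of an IO manager is to separate *what* an asset computes from *where* and *how* its output is stored. The following plain-Python sketch illustrates that pattern using only the standard library. It is not Dagster's `IOManager` class, and the class names and the `run_asset` helper are hypothetical. See the linked GitHub repo for the video's actual implementation.

```python
import csv
import os
import pickle
import tempfile
from typing import Any, Callable, Dict, List


class PickleIOManager:
    """Hypothetical stand-in for the default behaviour: one pickle file per asset."""

    def __init__(self, base_dir: str) -> None:
        self.base_dir = base_dir

    def _path(self, asset_name: str) -> str:
        return os.path.join(self.base_dir, f"{asset_name}.pkl")

    def handle_output(self, asset_name: str, obj: Any) -> None:
        with open(self._path(asset_name), "wb") as f:
            pickle.dump(obj, f)

    def load_input(self, asset_name: str) -> Any:
        with open(self._path(asset_name), "rb") as f:
            return pickle.load(f)


class CsvIOManager:
    """Hypothetical manager that writes a list of row dicts to a human-readable CSV file."""

    def __init__(self, base_dir: str) -> None:
        self.base_dir = base_dir

    def _path(self, asset_name: str) -> str:
        return os.path.join(self.base_dir, f"{asset_name}.csv")

    def handle_output(self, asset_name: str, rows: List[Dict[str, Any]]) -> None:
        if not rows:
            raise ValueError("CsvIOManager expects at least one row")
        with open(self._path(asset_name), "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)

    def load_input(self, asset_name: str) -> List[Dict[str, str]]:
        # CSV carries no types, so every value is read back as a string.
        with open(self._path(asset_name), newline="") as f:
            return list(csv.DictReader(f))


def run_asset(name: str, fn: Callable[..., Any], io_manager: Any, *deps: str) -> None:
    """Load upstream outputs through the manager, run the asset function, store the result."""
    inputs = [io_manager.load_input(dep) for dep in deps]
    io_manager.handle_output(name, fn(*inputs))


with tempfile.TemporaryDirectory() as tmp:
    io_manager = CsvIOManager(tmp)
    run_asset("customers", lambda: [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}], io_manager)
    run_asset("customer_names", lambda rows: [{"name": r["name"]} for r in rows], io_manager, "customers")
    names = io_manager.load_input("customer_names")
    print(names)  # [{'name': 'Ada'}, {'name': 'Lin'}]
```

You can replace `CsvIOManager(tmp)` with `PickleIOManager(tmp)` to change where and how data is stored without touching the asset functions. That decoupling is the point of the pattern.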

Link to previous video: https://www.youtube.com/watch?v=t8QADtYdWEI&t
Link to GitHub repo: https://github.com/hnawaz007/pythondataanalysis/tree/main/dagster-project/etl

Get started with Dagster in three quick steps:
Install Dagster, define assets, and materialize assets.

Create a virtual environment: python -m venv env
Activate the virtual environment (Windows): env\Scripts\activate (on macOS/Linux: source env/bin/activate)

To install Dagster into an existing Python environment, run:
pip install dagster dagit

Command to create a new project:
dagster project scaffold --name my-dagster-project

Additional libraries required: pandas, psycopg2

To run Dagster, issue the following commands, each in its own terminal:
dagit
dagster-daemon run

Access the Dagit UI on port 3000: http://127.0.0.1:3000


💥Subscribe to our channel:
https://www.youtube.com/c/HaqNawaz

📌 Links
-----------------------------------------
#️⃣ Follow me on social media! #️⃣

🔗 GitHub: https://github.com/hnawaz007
📸 Instagram: https://www.instagram.com/bi_insights_inc
📝 LinkedIn: https://www.linkedin.com/in/haq-nawaz/
🔗 https://medium.com/@hnawaz100

-----------------------------------------

#Python #ETL #Dagster


Topics covered in this video:
==================================
0:00 - Introduction to Dagster
2:11 - Dagster create new project
3:03 - Dagster Project Structure
4:18 - Software Defined Assets
5:35 - Install Required Libraries
5:58 - Source DB Connection
6:27 - Source Asset
10:05 - File IO Manager
14:16 - Second Asset
16:19 - Parquet IO Manager
16:26 - Database IO Manager
19:05 - Materialize Assets
  • 1 participant
  • 21 minutes
daxter
dagster
workflow
project
implemented
stored
dexter
orchestrator
dbio
updates

8 Feb 2023

Many data engineers are looking to get past the limitations of Apache Airflow, the incumbent in the data orchestration layer. Dagster proposes a new paradigm centered on Data Assets and the tools to support a full development lifecycle that radically boosts the productivity of data engineering teams.
  • 1 participant
  • 7 minutes
pipelines
workflow
workflows
pipeline
airflows
debugging
dagster
models
cumbersome
development

8 Feb 2023

INFOSTRUX is a Dagster technology partner. They help organizations select and implement the best possible technology to meet their business goals. In this video, Nasko Grozdanov (Manager, Engineering) explains the business benefits of using Dagster over Airflow.
  • 1 participant
  • 5 minutes
workflows
daxter
automation
advantageous
quicker
sophisticated
collaborate
dexter
deployments
plane

14 Apr 2022

ABOUT THE TALK

This talk discusses software-defined assets, an approach to orchestration and data management that makes it drastically easier to trust and evolve data assets, like tables and ML models.

In traditional data platforms, code and data are only loosely coupled. As a consequence, deploying changes to data feels dangerous, backfills are error-prone and irreversible, and it’s difficult to trust data, because you don’t know where it comes from or how it’s intended to be maintained. Each time you run a job that mutates a data asset, you add a new variable to account for when debugging problems.

Dagster proposes an alternative approach to data management that tightly couples data assets to code: each table or ML model corresponds to the function that’s responsible for generating it. This results in a “Data as Code” approach that mimics the “Infrastructure as Code” approach that’s central to modern DevOps. Your git repo becomes your source of truth for your data, so pushing data changes feels as safe as pushing code changes. Backfills become easy to reason about. You trust your data assets because you know how they’re computed and can reproduce them at any time. The role of the orchestrator is to ensure that physical assets in the data warehouse match the logical assets defined in code, so each job run is a step towards order.
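One way to picture "physical assets matching logical assets" is as a reconciliation loop. The sketch below makes two assumptions that do not come from the talk. First, each asset definition carries a code version. Second, storage records which code version produced each stored asset. Under those assumptions, it plans a recomputation, in dependency order, of every asset whose stored version differs from the code or whose upstream is being recomputed. This is a conceptual illustration built on Python's `graphlib` (3.9+), not how Dagster implements it.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def reconcile(
    deps: Dict[str, Set[str]],
    code_versions: Dict[str, str],
    stored_versions: Dict[str, str],
) -> List[str]:
    """Return the assets to recompute, upstream assets first.

    deps:            asset -> set of assets it reads from
    code_versions:   logical state, i.e. the version of each asset's definition in git
    stored_versions: physical state, i.e. the code version that produced the stored data
    """
    plan: List[str] = []
    recomputed: Set[str] = set()
    for asset in TopologicalSorter(deps).static_order():
        out_of_date = stored_versions.get(asset) != code_versions[asset]
        upstream_recomputed = bool(deps.get(asset, set()) & recomputed)
        if out_of_date or upstream_recomputed:
            recomputed.add(asset)
            plan.append(asset)
    return plan


deps = {
    "orders": set(),
    "customers": set(),
    "revenue": {"orders"},
    "dashboard": {"revenue", "customers"},
}
code_versions = {"orders": "v1", "customers": "v1", "revenue": "v2", "dashboard": "v1"}
stored_versions = {"orders": "v1", "customers": "v1", "revenue": "v1", "dashboard": "v1"}
print(reconcile(deps, code_versions, stored_versions))  # ['revenue', 'dashboard']
```

After the plan runs and the new versions are recorded, physical state matches the code, so calling `reconcile` again returns an empty list.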

Software-defined assets is a natural approach to orchestration for the modern data stack, in part because dbt models are a kind of software-defined asset.

Attendees of this session will learn what it looks like to build and maintain a warehouse or data lake of software-defined assets with Dagster.

ABOUT THE SPEAKER

Sandy is a software engineer at Elementl, building Dagster. Previously, he led machine learning and data science teams at KeepTruckin and Clover Health. He's a committer on Spark and Hadoop, and co-authored O'Reilly's Advanced Analytics with Spark.

ABOUT DATA COUNCIL:
Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from top open source projects and startups.

FOLLOW DATA COUNCIL:
Twitter: https://twitter.com/DataCouncilAI
LinkedIn: https://www.linkedin.com/company/datacouncil-ai/
Eventbrite: https://www.eventbrite.com/o/data-council-30357384520
  • 1 participant
  • 27 minutes
devops
evolving
workflow
frameworks
dag
infrastructure
dbt
terraform
data
kubernetes

13 Apr 2022

As organizations grow, so do the complexity and number of their tools. In small data teams, it is not uncommon to be responsible for more systems than there are team members. Without a strong orchestration layer to hold these systems together, many workflows are held together with one-off jobs and shaky deployments.

Dagster is a modern framework for defining jobs that treats the data platform as an application. Jobs can be parameterized and iterated on locally before moving into the cloud and interacting with production services. Using Dagster as the foundation for a modern data platform ensures that all data stakeholders can easily interact with their data and build out specific pipelines to maximize value without having to worry about underlying infrastructure.
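The "iterate locally, then move to the cloud" workflow typically relies on keeping a job's logic separate from the services it talks to. Below is a plain-Python sketch of that separation. It is not Dagster's resource API. The job takes a config dictionary and a storage object, and during local runs a hypothetical in-memory store stands in for a production service. `KeyValueStore`, `InMemoryStore`, and `run_job` are all illustrative names.

```python
from typing import Any, Dict, List, Protocol


class KeyValueStore(Protocol):
    """Hypothetical interface for wherever the job writes its results."""

    def put(self, key: str, value: float) -> None: ...


class InMemoryStore:
    """Local stand-in for a production service such as a warehouse table or object store."""

    def __init__(self) -> None:
        self.data: Dict[str, float] = {}

    def put(self, key: str, value: float) -> None:
        self.data[key] = value


def run_job(config: Dict[str, Any], store: KeyValueStore, orders: List[Dict[str, Any]]) -> None:
    """Sum order amounts per region, restricted to the regions listed in the config."""
    regions = set(config["regions"])
    totals: Dict[str, float] = {}
    for order in orders:
        region = order["region"]
        if region in regions:
            totals[region] = totals.get(region, 0.0) + float(order["amount"])
    for region, total in totals.items():
        store.put(f"revenue/{region}", total)


# Local iteration: same job logic, in-memory store, small hand-written input.
local_store = InMemoryStore()
run_job(
    {"regions": ["east"]},
    local_store,
    [
        {"region": "east", "amount": 10},
        {"region": "west", "amount": 5},
        {"region": "east", "amount": 2.5},
    ],
)
print(local_store.data)  # {'revenue/east': 12.5}
```

In production, the same `run_job` would receive a store backed by a real service. Only the object passed in changes, so the job's logic is what was tested locally.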

Presenter: Dennis Hume @Dutchie
  • 1 participant
  • 32 minutes
dagstr
dagster
dagsta
dag
dagstar
orchestration
data
daemon
setups
infrastructure

4 Mar 2022

The last few weeks have featured a loud debate in the data community - should the data stack be bundled, unbundled, or rebundled? We're going to dive into this debate with Nick Schrock (Elementl) and Scott Breitenother (Brooklyn Data Co). This should be full of fireworks, so tune in!

Streamed live on YouTube and LinkedIn
  • 4 participants
  • 1:01 hours
bundling
rebundling
airflow
fuzzing
dang
federated
cloud
orchestrating
providers
increasingly

1 Dec 2021

The modern data stack is often defined by the type of technologies that exist within it. Cloud-based, open source, low/no code tools, ELT, and reverse ETL. But surely there’s more to it… isn’t there?

What holds the modern data stack together and makes it the architecture of choice for so many data-driven enterprises? Join Tim, Juan, and special guest Nick Schrock, founder of Elementl, creator of Dagster, and co-creator of GraphQL, to chat about all things MDS.
  • 3 participants
  • 60 minutes
congratulations
conversation
having
thanks
cocktails
great
fellow
guests
announcing
busy

19 Nov 2021

Tune in to MAD Data Podcast to hear from data engineers, leaders of data teams, and everyone who has a stake in their data quality. Be sure to check out our upcoming guest lineup at: https://databand.ai/mad-data-podcast/
. . .
Nick Schrock, Founder & CEO of Elementl, and Scott Breitenother, Founder of Brooklyn Data Co., discuss the evolution of data from Big Data to Big Complexity: what's next now that the data industry has solved the problem of data storage? While the modern data stack has been embraced as every data team's "must-have" for addressing "modern data problems," Nick and Scott muse on the struggles that continue to plague data teams and the next wave of potential in data infrastructure innovation. With one problem solved, a new era of possibility and complexity is now unleashed.
. . .
Listen to MAD Data on -

Simplecast: https://mad-data.simplecast.com/

Google Podcast: https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5zaW1wbGVjYXN0LmNvbS9OU3FITE5kZQ?sa=X&ved=0CAMQ4aUDahcKEwjA-qzZoKX0AhUAAAAAHQAAAAAQAQ&hl=en

Amazon Music: https://music.amazon.com/podcasts/2969426c-15f3-49e9-a54a-13edcf6e6768/mad-data-a-data-quality-podcast-by-databand-ai

Spotify: https://open.spotify.com/show/3V0dhexP4cSYIRBlKLf9KV
. . .
View our upcoming guest lineup at: https://databand.ai/mad-data-podcast/
. . .
#dataquality #dataops #datapipelines #dataobservability #datareliability #dataengineering
  • 4 participants
  • 52 minutes
data
harper
scott
personally
mike
come
consultancy
interviewed
nascent
dagsar

1 Nov 2021

Nick will cover the principles and origin of Dagster. Dagster is a new type of workflow engine: a data orchestrator. Moving beyond just managing the ordering and physical execution of data computations, Dagster considers the entire data application lifecycle. Practitioners in Dagster build data-aware dependency graphs designed for local development and testing; deploy those graphs to a multi-tenant, cloud-native orchestration engine; and then monitor and observe the data assets produced by those computations.
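Unlike a plain task graph, a "data-aware" dependency graph carries information about the data flowing along each edge. Here is a minimal plain-Python sketch of that idea. It assumes each node declares the Python type it produces and the types it expects from its inputs. The graph is type-checked before anything runs, and each output is checked again at runtime. That fast feedback is what makes local development and testing cheap. `Node`, `check_graph`, and `run_graph` are hypothetical names, not Dagster's API.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict


@dataclass
class Node:
    fn: Callable[..., Any]
    output_type: type
    inputs: Dict[str, str] = field(default_factory=dict)        # parameter -> upstream node
    input_types: Dict[str, type] = field(default_factory=dict)  # parameter -> expected type


def check_graph(nodes: Dict[str, Node]) -> None:
    """Fail before execution if an upstream output type doesn't fit a consumer's input."""
    for name, node in nodes.items():
        for param, upstream in node.inputs.items():
            produced = nodes[upstream].output_type
            expected = node.input_types.get(param, object)  # undeclared inputs accept anything
            if not issubclass(produced, expected):
                raise TypeError(f"{name}.{param} expects {expected.__name__}, "
                                f"but {upstream} produces {produced.__name__}")


def run_graph(nodes: Dict[str, Node]) -> Dict[str, Any]:
    """Type-check the graph, then execute nodes in dependency order."""
    check_graph(nodes)
    graph = {name: set(node.inputs.values()) for name, node in nodes.items()}
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        node = nodes[name]
        value = node.fn(**{param: results[up] for param, up in node.inputs.items()})
        # Runtime check: the value must match the type the node declared.
        if not isinstance(value, node.output_type):
            raise TypeError(f"{name} returned {type(value).__name__}, "
                            f"declared {node.output_type.__name__}")
        results[name] = value
    return results


graph = {
    "raw": Node(fn=lambda: [3, 1, 2], output_type=list),
    "total": Node(fn=lambda xs: sum(xs), output_type=int,
                  inputs={"xs": "raw"}, input_types={"xs": list}),
}
print(run_graph(graph)["total"])  # 6
```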

This talk will also cover our new major release, which makes significant changes to our core API that dramatically improve usability and ergonomics.
  • 3 participants
  • 51 minutes
dexter
cisco
meetup
presentation
interact
insights
dashboard
hosted
workflow
cloud

21 Jun 2021

Nick Schrock, Founder and CEO of Elementl, spoke with FirstMark's Matt Turck in a virtual fireside chat at Data Driven NYC in June 2021. They spoke about Dagster, open source, and much more.

Data Driven NYC is a monthly event covering Big Data and data-driven products and startups, hosted by Matt Turck, partner at FirstMark Capital.
  • 2 participants
  • 28 minutes
workflow
technologies
data
monitoring
important
exploratory
elemental
discussion
integrate
daxter

17 Jun 2021

Nick Schrock covers the principles and origin of Dagster. Dagster is a new type of workflow engine: a data orchestrator. Moving beyond just managing the ordering and physical execution of data computations, Dagster considers the entire data application lifecycle. Practitioners in Dagster build data-aware dependency graphs designed for local development and testing; deploy those graphs to a multi-tenant, cloud-native orchestration engine; and then monitor and observe the data assets produced by those computations.

In this talk, Nick discusses how Dagster differentiates itself across the three stages (dev & test, deploy & execute, monitor & observe) of the application lifecycle. Through a demo and code snippets, the talk shows how the Dagit web UI and Dagster programming model can power a variety of data practitioners.

Related video:
Data Driven NYC talk on Fundamentals of Data Engineering (https://youtu.be/mPSzL8Lurs0)

#data #dataengineering #airflow #prefect #cloudnative #cloudcomputing #cloud #fundamentalsofdataengineering
-------------------------
About Nick Schrock

Nick Schrock is the founder and CEO of Elementl, the company behind Dagster. Previously, Nick worked at Facebook, where he co-created GraphQL. Nick believes deeply in the power of well-designed developer tools to make engineers more productive, accelerate their careers, make their lives more enjoyable, and transform the organizations in which they work.
  • 7 participants
  • 1:10 hours
workflow
functionality
infrastructure
technologies
api
data
orchestrators
stakeholders
dag
commoditization

26 May 2021

This talk was given at one of four live meetups as part of Data + AI Summit 2021 on May 26, 2021.

Title: Introduction, principles and origin of Dagster by Nick Schrock

Abstract: Nick will cover the principles and origin of Dagster (https://dagster.io/). Dagster is a new type of workflow engine: a data orchestrator. Moving beyond just managing the ordering & physical execution of data computations, Dagster considers the entire data application lifecycle. Practitioners in Dagster build data-aware dependency graphs designed for local development and testing; deploy those graphs to a multi-tenant, cloud-native orchestration engine; and then monitor and observe the data assets produced by those computations.

In this talk, Nick will cover how Dagster differentiates itself across the three stages (dev & test, deploy & execute, monitor & observe) of the application lifecycle. Through a demo and code snippets, the talk aims to show how the Dagit web UI and Dagster programming model can power a variety of data practitioners.

Speaker: Nick Schrock is the founder and CEO of Elementl, the company behind Dagster. Previously, Nick worked at Facebook, where he co-created GraphQL.
  • 2 participants
  • 30 minutes
orchestrating
orchestrator
infrastructure
workflow
data
complexity
api
organization
stakeholders
operationally

19 May 2021

Eric Anderson interviews Nick Schrock about Dagster, the open-source data orchestrator for machine learning, analytics, and ETL. Nick is the founder and CEO of Elementl and is well known for creating the Product Infrastructure group at Facebook, which spawned GraphQL and React. On this episode of Contributor, Nick explains how he set out to fix an inefficiency he identified amid the complexity of the data infrastructure domain.

Find show notes and previous episodes at https://www.contributor.fyi/
  • 2 participants
  • 40 minutes
daxter
dagster
dax
modern
developer
overseeing
understanding
schrock
enterprise
warehousing

18 Apr 2021

Building ML Pipelines with Dagster: The role of the orchestrator in machine learning
Dagster is an orchestrator that puts data at the center. While orchestrators typically focus on sequencing computations in production, Dagster brings orchestration to the entire ML development lifecycle.
Slides: https://yadi.sk/i/DuZkASxS7ZbvDA
Sandy Ryza, Software Engineer @ Elementl, working on Dagster

All talks & tutorials from Machine Learning REPA Week 2021:
- All talks at Track 1 - Machine Learning Product and Team Management: https://youtube.com/playlist?list=PLlxErbAvYYLBa2uQkROxn4OZJvGQACn8i
- All talks at Track 2 - ML pipelines automation. Code and Data version control. Reproducibility. MLOps: https://youtube.com/playlist?list=PLlxErbAvYYLDRP6cHtVP76f2g5Yoh6c5R

Links:
- Learn more about ML REPA: https://mlrepa.com/
- Learn more about LeanDS: https://leands.ru/
- Learn more about DataTalks.Club: https://datatalks.club/
- Machine Learning REPA Week 2021 Online Conference: https://mlrepa.com/mlrepa-week-2021

Join us:
DataTalks.Club #mlrepa - https://datatalks-club.slack.com/archives/C01Q6698JTV
Slack ODS.ai #mlrepa - https://opendatascience.slack.com/archives/C019A9H5V0X

#mlrepa #machinelearning #datascience #reproducibility #mlops #artificialintelligence #ai #python #deeplearning #technology #programming #coding #bigdata #computerscience #data #dataanalytics #tech #datascientist #pythonprogramming #ml #developer #software #robotics #innovation #coder #datavisualization #analytics #neuralnetworks #leands #automation
  • 2 participants
  • 26 minutes
dagstr
modeling
data
sophisticated
flows
managed
dbt
process
debugging
visualize

14 Dec 2020

dbt defined an entirely new subspecialty of software engineering: Analytics Engineering. But it is one discipline among many: analytics engineers must collaborate with data scientists, data engineers, and data platform engineers to deliver a cohesive data platform. In this video, Nick Schrock of Elementl explains how orchestrating dbt with Dagster allows you to place dbt in context, de-silo your operational systems, improve monitoring, and enable self-service operations.
  • 1 participant
  • 25 minutes
dbt
intuitive
discussion
daxter
dag
users
introduce
contexts
personas
models

24 Nov 2020

Max Gasner, co-author of Dagster, a data orchestration framework for ETL, talks about the principles of building reliable data applications with Dagster. Sign up for the next Data Quality Meetup: https://bit.ly/3yiUH2H
Join our Meetup Group: https://www.meetup.com/data-quality-meetup/
  • 1 participant
  • 8 minutes
data
apps
developer
dbt
orchestrator
dagster
debug
robust
monitoring
execution

9 Sep 2020

ABOUT THE TALK (https://www.datacouncil.ai/talks/dagster-workflows-for-data-science-machine-learning-and-data-engineering)

Dagster is an open-source Python library for building data applications: ML, analytics, ETL, and more. We will describe how Dagster can enable a unified stack of tooling across a wide dynamic range of use cases: from a laptop to a Kubernetes cluster; from local development to monitoring in production; from data science to data engineering to machine learning.

ABOUT THE SPEAKER

Nick Schrock is the founder and CEO of Elementl, a company aiming to reshape the data management ecosystem, and the creator of Dagster, a new programming model for data processing. Previously, Nick was a Principal Engineer and Director of Engineering at Facebook. In that time, Nick co-created GraphQL, and led its implementation and adoption across the entire organization and product line. He also formed the Product Infrastructure group, whose engineers, in addition to GraphQL, created React, React Native, and many other broadly-used developer technologies, both inside Facebook and the technology industry at large.

  • 1 participant
  • 34 minutes
dagster
dag
orchestrator
abstractions
data
dbt
systems
augment
workflow
developers

31 Oct 2019

👉 Learn more about the talk and download the slides at https://crunchconf.com/speaker/NicholasSchrock#talks
📬 Sign up to our newsletter so you won't miss the updates about the next Crunch Data Conference: http://eepurl.com/dGwi1f

We introduce Dagster, an open source Python library for building ETL processes, ML pipelines, and similar software systems, all of which we call data applications.

Data applications are graphs of functional computations that consume and produce data assets. Dagster provides abstractions and tools for modeling the semantics of these applications by providing a unified type system, a data dependency graph, a configuration system, a structured API for emitting events such as data quality tests and materializations, and high-quality developer tools built on those abstractions. Builders can use the tools they know (e.g. Spark jobs for data engineers, SQL statements for analysts, Python for data scientists), and the application can be deployed to arbitrary orchestration engines (such as Airflow, Dask, or Kubernetes-based execution) in a pluggable fashion.

Captured live on Ustream at https://www.ustream.tv/channel/JUMjvCF2ucj
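The following sketch shows what a configuration system and a structured event API can provide. It is written in plain Python and is not Dagster's actual API. The computation validates its config before touching any data. It then reports a data quality check and a materialization as typed event records, not log strings, so that tools can inspect and act on them. `QualityCheck`, `Materialization`, `validate_config`, and `clean_orders` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union


@dataclass(frozen=True)
class QualityCheck:
    """Hypothetical structured event: the outcome of a data quality test."""
    name: str
    passed: bool
    detail: str


@dataclass(frozen=True)
class Materialization:
    """Hypothetical structured event: a data asset was written."""
    asset: str
    row_count: int


Event = Union[QualityCheck, Materialization]


def validate_config(config: Dict[str, Any]) -> float:
    """Reject bad configuration before any data is touched."""
    min_amount = config.get("min_amount")
    if not isinstance(min_amount, (int, float)) or min_amount < 0:
        raise ValueError("config['min_amount'] must be a non-negative number")
    return float(min_amount)


def clean_orders(
    config: Dict[str, Any], rows: List[Dict[str, Any]]
) -> Tuple[List[Dict[str, Any]], List[Event]]:
    """Drop small orders, then report a quality check and a materialization as events."""
    min_amount = validate_config(config)
    kept = [row for row in rows if row["amount"] >= min_amount]
    missing_ids = sum(1 for row in kept if row.get("id") is None)
    events: List[Event] = [
        QualityCheck("ids_present", missing_ids == 0, f"rows without id: {missing_ids}"),
        Materialization("clean_orders", len(kept)),
    ]
    return kept, events


rows = [{"id": 1, "amount": 5.0}, {"id": None, "amount": 12.0}, {"id": 3, "amount": 0.5}]
kept, events = clean_orders({"min_amount": 1}, rows)
for event in events:
    print(event)
```

Because the events are typed values, a UI or monitoring tool could filter for failed `QualityCheck` records directly, with no need to parse log text.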
  • 2 participants
  • 47 minutes
concepts
etl
implementation
abstractions
development
applications
api
dataflow
processing
elemental

19 Jun 2019

Get the slides: https://www.datacouncil.ai/talks/dagster-a-new-programming-model-for-data-processing


ABOUT THE TALK

This talk introduces Dagster, an open source framework for building and modeling data processing computations. Data processing systems typically span multiple runtime, storage, tooling, and organizational boundaries. But all the stages in a data processing system share a fundamental property: they are directed acyclic graphs (DAGs) of functional computations that consume and produce data assets. Dagster defines a standard for containerizing, describing, and operating these computations. That standard is opinionated and informed by industry best practices, leading to more testable, more reliable, and better-structured data systems.

By defining a standard, one can build these computations in tools that users know and love, such as Jupyter Notebooks (via Papermill), dbt, and Spark, and leverage that standard to build high-quality developer- and ops-facing tools to inspect, operate, and monitor those computations. These tools range from our beautiful introspection and execution tool, Dagit, to tools that schedule these computations on systems ranging from Airflow to Lambda, among others. Dagster embraces the chaotic reality of modern data management and is an abstraction designed for incremental adoption within an increasingly heterogeneous ecosystem. We will describe both the technology and the technical and organizational insights gained from production use of Dagster.

ABOUT THE SPEAKER

Nick Schrock is the founder and CEO of Elementl, a company aiming to reshape the data management ecosystem, and the creator of Dagster, a new programming model for data processing. Previously, Nick was a Principal Engineer and Director of Engineering at Facebook. In that time, Nick co-created GraphQL, and led its implementation and adoption across the entire organization and product line. He also formed the Product Infrastructure group, whose engineers, in addition to GraphQL, created React, React Native, and many other broadly-used developer technologies, both inside Facebook and the technology industry at large.

  • 2 participants
  • 39 minutes
introspect
confusing
facebook
emerging
users
concerns
data
abstraction
scientist
daxter