Dagster Dagster in Media Open Meetings

23 Jan 2023

In this video, Ben Pankow, Software Engineer at Elementl, will show you how to streamline your process using dagster to manage your Airbyte connections and orchestrate syncs with downstream computation using DBT. He'll take you step-by-step through setting up Airbyte with Dagster from scratch, so even if you're new to these tools, you'll be able to follow along. And for those who are more experienced, we'll also show you how dagster and Airbyte can elevate your project to new heights. Don't miss out on this game-changing demo.

Subscribe to our newsletter: https://airbyte.com/newsletter?utm_source=youtube
Learn more about Airbyte: https://airbyte.com

#dataorchestration #data #communitycall

3 participants
21 minutes

dag

data

workflow

dexter

dashboard

analysts

orchestrator

interfacing

dbt

advanced

23 Jan 2023

A productionalized notebook integrated with an orchestration platform provides an excellent balance of reproducibility, flexibility, and intent in a way that will be quickly consumable. This tutorial is valuable to data scientists and data engineers. This setup makes it easy to take notebooks from exploratory to production, but even easier to debug and ensure quality over time. This tutorial will show how you can achieve:

Time-saving in initiating jobs: Allowing users to seamlessly transition an exploratory workflow created within a Noteable notebook, into a productionalized scheduled workflow in Dagster.
Time and Cost Saving for debugging failed runs: Allowing users to immediately dive into a live running notebook at the point of failure, with all of the in-memory state preserved. This saves the users' time, as well as saves companies' compute costs by not requiring debugging to re-execute previous steps of the workflow.

Bios:
Pierre Brunelle
Pierre Brunelle is the CEO and Co-Founder of Noteable, a collaborative data notebook that enables data-driven teams to use and visualize data, together. Prior to Noteable, Pierre led Amazon’s notebook initiatives both for internal use as well as for SageMaker. He also worked on many open source initiatives including a standard for Data Quality work and an open source collaboration between Amazon and UC Berkeley to advance AI and machine learning. Pierre helped launch the first Amazon online car leasing store in Europe. At Amazon Pierre also launched a Price Elasticity Service and pushed investments in Probabilistic Programming Frameworks. And Pierre represented Amazon on many occasions to teach Machine Learning or at conferences such as NeurIPS. Pierre also writes about Time in Organization Studies. Pierre holds an MS in Building Engineering from ESTP Paris and an MRes in Decision Sciences and Risk Management from Arts et Métiers ParisTech.

Jamie DeMaria
Jamie is a software engineer working on Dagster. She has also built data analysis tools (using Dagster!) for a robotics startup and developed software to train mission planners for the Mars Curiosity rover.

===

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.

Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps

5 participants
1:25 hours

workflow

documenting

workshop

tasks

methodology

project

tableau

tutorial

provide

presentation

8 Dec 2022

This is an ML pipeline with both CI & CD components*.

The story today is about CD - Continuous Deployment**. I’m assuming as the Data Scientist I don’t have access to the production environment (no “one-click deploy to prod” for me)***.

I develop my model (new feature branch) locally, using our dev database and local compute. I track experiments with MLflow and orchestrate my code with Dagster.

Once satisfied with my new model, I do the following to deploy:
- I check my code into a feature-branch in source control
- Open a pull request to the dev branch of our code base
- Kick back and watch

This starts a CI job via Github actions that:
- Builds my project
- Test my code
- Deploys to dev environment (teeny little deployment :)

If successful, another team member will merge my code into the dev branch.

Upon the merge into dev, this automatically triggers (dare I say, continuously) a deployment job:
- Deploys my code to Staging
- In my case, this job re-runs my ML pipeline and tests on staging data
- The benefit here is that typically staging data is closer to production data than whatever I was using in dev

If successful, this job initiates a manual review process for deployment to prod:
- Prompts an admin to review my code and choose whether to run the final job in the workflow: deploying to production.

- As the admin, I approve the deployment and the CD jobs finishes by training and deploying the model in prod.

Lots of hand-waving in this example, but I hope it helps show the git-based workflow moving between environments and the larger theme of the significant work required to actually deploy an ML project.

* This is part of my ongoing saga is to get closer to something resembling an actual production deployment instead of the notebook-based fit/predict/API patterns you tend to see

** I have a tendency of using CI/CD interchangeably (read: incorrectly). Setting up this example has really helped clarify where

*** In my examples, I only move code between environments, never models. This is the pattern I see most often with ML teams. It’s possible you might deploy your model from dev to staging to prod

Continual - we're lucky to work with ML teams that care about software engineering best practices. If that sounds like you, please hit us up.
#python #ml #dagster #mlflow

Feel free to connect with me on LI: https://www.linkedin.com/in/gustafrcavanaugh/

1 participant
7 minutes

workflow

workflows

mlflow

process

deployments

staging

automated

project

grc

github

31 Oct 2022

Practica de Data engineer usando tecnologias open source

1 participant
10 minutes

docker

launcher

problem

configuration

host

executed

ron

duster

daxter

tutorial

31 Oct 2022

Practica de Data engineer usando tecnologias open source

1 participant
28 minutes

tutorial

creating

project

application

installing

copying

cd

docker

problem

connect

31 Oct 2022

Practica de Data engineer usando tecnologias open source

1 participant
20 minutes

doctor

host

ports

documentation

services

docker

use

analyze

network

papa

31 Oct 2022

Practica de Data engineer usando tecnologias open source

2 participants
19 minutes

daxter

docker

repository

installed

loading

daemon

problem

process

duster

notes

31 Oct 2022

Practica de Data engineer usando tecnologias open source

1 participant
21 minutes

tutorial

doctor

error

execute

configure

host

repository

copy

communicates

data

31 Oct 2022

Practica de Data engineer usando tecnologias open source

2 participants
26 minutes

installing

tbt

bt

dvd

project

bit

problem

execute

git

plugins

31 Oct 2022

Practica de Data engineer usando tecnologias open source

2 participants
14 minutes

dependencies

installed

installing

dbt

dvt

docker

dvd

xt

tutorial

taking

31 Oct 2022

Practica de Data engineer usando tecnologias open source

1 participant
31 minutes

script

copy

importing

implement

configure

project

dvd

profile

adding

environment

31 Oct 2022

Practica de Data engineer usando tecnologias open source

1 participant
7 minutes

docker

airbite

models

airbag

connect

warehouse

executed

daxter

project

platform

31 Oct 2022

Practica de Data engineer usando tecnologias open source

1 participant
9 minutes

clone

repository

daxter

project

platform

analyze

init

tools

documentation

deploy

27 Sep 2022

Today we will be interviewing EvolutionIQ and why they decided to use Dagster

Tomas Vykruta - https://www.linkedin.com/in/tvykruta/
Karan Uppal - https://www.linkedin.com/in/karanuppal/

If you enjoyed this video, check out some of my other top videos.

Top Courses To Become A Data Engineer In 2022
https://www.youtube.com/watch?v=kW8_l57w74g

What Is The Modern Data Stack - Intro To Data Infrastructure Part 1
https://www.youtube.com/watch?v=-ClWgwC0Sbw

If you would like to learn more about data engineering, then check out Googles GCP certificate
https://bit.ly/3NQVn7V

If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.

https://seattledataguy.substack.com/

Or check out my blog
https://www.theseattledataguy.com/

And if you want to support the channel, then you can become a paid member of my newsletter
https://seattledataguy.substack.com/subscribe

Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio

_____________________________________________________________
Subscribe: https://www.youtube.com/channel/UCmLGJ3VYBcfRaWbP6JLJcpA?sub_confirmation=1
_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.

*I do participate in affiliate programs, if a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.

3 participants
1:09 hours

innovation

curious

evolution

exciting

today

thoughts

humanity

consultancy

google

ai

Dagster / Dagster in Media

23 Jan 2023

23 Jan 2023

8 Dec 2022

31 Oct 2022

31 Oct 2022

31 Oct 2022

31 Oct 2022

31 Oct 2022

31 Oct 2022

31 Oct 2022

31 Oct 2022

31 Oct 2022

31 Oct 2022

27 Sep 2022