Dagster / Dagster in Media

These are all the meetings we have in "Dagster in Media" (part of the organization "Dagster"). Click into individual meeting pages to watch the recording and search or read the transcript.

23 Jan 2023

In this video, Ben Pankow, Software Engineer at Elementl, will show you how to streamline your process using Dagster to manage your Airbyte connections and orchestrate syncs with downstream computation using dbt. He'll take you step-by-step through setting up Airbyte with Dagster from scratch, so even if you're new to these tools, you'll be able to follow along. And for those who are more experienced, he'll also show how Dagster and Airbyte can elevate your project to new heights. Don't miss out on this game-changing demo.

Subscribe to our newsletter: https://airbyte.com/newsletter?utm_source=youtube
Learn more about Airbyte: https://airbyte.com

#dataorchestration #data #communitycall
  • 3 participants
  • 21 minutes
dag
data
workflow
dexter
dashboard
analysts
orchestrator
interfacing
dbt
advanced

23 Jan 2023

A productionalized notebook integrated with an orchestration platform provides an excellent balance of reproducibility, flexibility, and intent in a way that will be quickly consumable. This tutorial is valuable to data scientists and data engineers. This setup makes it easy to take notebooks from exploratory to production, but even easier to debug and ensure quality over time. This tutorial will show how you can achieve:

Time-saving in initiating jobs: Allowing users to seamlessly transition an exploratory workflow created within a Noteable notebook, into a productionalized scheduled workflow in Dagster.
Time and Cost Saving for debugging failed runs: Allowing users to immediately dive into a live running notebook at the point of failure, with all of the in-memory state preserved. This saves the users' time, as well as saves companies' compute costs by not requiring debugging to re-execute previous steps of the workflow.

Bios:
Pierre Brunelle
Pierre Brunelle is the CEO and Co-Founder of Noteable, a collaborative data notebook that enables data-driven teams to use and visualize data, together. Prior to Noteable, Pierre led Amazon’s notebook initiatives both for internal use as well as for SageMaker. He also worked on many open source initiatives including a standard for Data Quality work and an open source collaboration between Amazon and UC Berkeley to advance AI and machine learning. Pierre helped launch the first Amazon online car leasing store in Europe. At Amazon Pierre also launched a Price Elasticity Service and pushed investments in Probabilistic Programming Frameworks. And Pierre represented Amazon on many occasions to teach Machine Learning or at conferences such as NeurIPS. Pierre also writes about Time in Organization Studies. Pierre holds an MS in Building Engineering from ESTP Paris and an MRes in Decision Sciences and Risk Management from Arts et Métiers ParisTech.

Jamie DeMaria
Jamie is a software engineer working on Dagster. She has also built data analysis tools (using Dagster!) for a robotics startup and developed software to train mission planners for the Mars Curiosity rover.

===

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.

Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps
  • 5 participants
  • 1:25 hours
workflow
documenting
workshop
tasks
methodology
project
tableau
tutorial
provide
presentation

8 Dec 2022

This is an ML pipeline with both CI & CD components*.

The story today is about CD - Continuous Deployment**. I’m assuming as the Data Scientist I don’t have access to the production environment (no “one-click deploy to prod” for me)***.

I develop my model (new feature branch) locally, using our dev database and local compute. I track experiments with MLflow and orchestrate my code with Dagster.

Once satisfied with my new model, I do the following to deploy:
- I check my code into a feature-branch in source control
- Open a pull request to the dev branch of our code base
- Kick back and watch

This starts a CI job via GitHub Actions that:
- Builds my project
- Tests my code
- Deploys to dev environment (teeny little deployment :)

If successful, another team member will merge my code into the dev branch.

Upon the merge into dev, this automatically triggers (dare I say, continuously) a deployment job:
- Deploys my code to Staging
- In my case, this job re-runs my ML pipeline and tests on staging data
- The benefit here is that typically staging data is closer to production data than whatever I was using in dev

If successful, this job initiates a manual review process for deployment to prod:
- Prompts an admin to review my code and choose whether to run the final job in the workflow: deploying to production.

- As the admin, I approve the deployment and the CD job finishes by training and deploying the model in prod.
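
The gated flow described above can be sketched in a few lines of plain Python. This is only an illustration of the ordering of the gates (CI on the pull request, the staging re-run after the merge, then manual approval before prod); the function name `run_promotion` and the three callables are hypothetical and stand in for the real GitHub Actions jobs, the Dagster pipeline run, and the admin review, none of which are modeled here.

```python
from typing import Callable, List


def run_promotion(
    ci_passes: Callable[[], bool],
    staging_passes: Callable[[], bool],
    admin_approves: Callable[[], bool],
) -> List[str]:
    """Walk a change through the gates and stop at the first one that fails.

    Each argument is a hypothetical stand-in for a real check:
    ci_passes      -> build, test, and dev deployment on the pull request
    staging_passes -> re-running the ML pipeline against staging data
    admin_approves -> the manual review before the production deployment
    """
    events: List[str] = []

    # Gate 1: CI runs when the pull request to the dev branch is opened.
    if not ci_passes():
        events.append("ci: failed")
        return events
    events.append("ci: built, tested, deployed to dev")
    events.append("merge: feature branch merged into dev")

    # Gate 2: the merge triggers the staging deployment and pipeline re-run.
    if not staging_passes():
        events.append("staging: failed")
        return events
    events.append("staging: pipeline re-run on staging data")

    # Gate 3: production only happens after an explicit manual approval.
    if not admin_approves():
        events.append("prod: approval declined")
        return events
    events.append("prod: model trained and deployed")
    return events


if __name__ == "__main__":
    for line in run_promotion(lambda: True, lambda: True, lambda: True):
        print(line)
```

Only code moves through these gates, never a trained model, which matches the pattern described in the third footnote: the model is retrained in each environment.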

Lots of hand-waving in this example, but I hope it helps show the git-based workflow moving between environments and the larger theme of the significant work required to actually deploy an ML project.

* This is part of my ongoing saga to get closer to something resembling an actual production deployment instead of the notebook-based fit/predict/API patterns you tend to see

** I have a tendency of using CI/CD interchangeably (read: incorrectly). Setting up this example has really helped clarify where

*** In my examples, I only move code between environments, never models. This is the pattern I see most often with ML teams. It’s possible you might deploy your model from dev to staging to prod

Continual - we're lucky to work with ML teams that care about software engineering best practices. If that sounds like you, please hit us up.
#python #ml #dagster #mlflow

Feel free to connect with me on LI: https://www.linkedin.com/in/gustafrcavanaugh/
  • 1 participant
  • 7 minutes
workflow
workflows
mlflow
process
deployments
staging
automated
project
grc
github

31 Oct 2022

Data engineering practice using open source technologies
  • 1 participant
  • 10 minutes
docker
launcher
problem
configuration
host
executed
ron
duster
daxter
tutorial

31 Oct 2022

Data engineering practice using open source technologies
  • 1 participant
  • 28 minutes
tutorial
creating
project
application
installing
copying
cd
docker
problem
connect

31 Oct 2022

Data engineering practice using open source technologies
  • 1 participant
  • 20 minutes
doctor
host
ports
documentation
services
docker
use
analyze
network
papa

31 Oct 2022

Data engineering practice using open source technologies
  • 2 participants
  • 19 minutes
daxter
docker
repository
installed
loading
daemon
problem
process
duster
notes

31 Oct 2022

Data engineering practice using open source technologies
  • 1 participant
  • 21 minutes
tutorial
doctor
error
execute
configure
host
repository
copy
communicates
data

31 Oct 2022

Data engineering practice using open source technologies
  • 2 participants
  • 26 minutes
installing
tbt
bt
dvd
project
bit
problem
execute
git
plugins

31 Oct 2022

Data engineering practice using open source technologies
  • 2 participants
  • 14 minutes
dependencies
installed
installing
dbt
dvt
docker
dvd
xt
tutorial
taking

31 Oct 2022

Data engineering practice using open source technologies
  • 1 participant
  • 31 minutes
script
copy
importing
implement
configure
project
dvd
profile
adding
environment

31 Oct 2022

Data engineering practice using open source technologies
  • 1 participant
  • 7 minutes
docker
airbite
models
airbag
connect
warehouse
executed
daxter
project
platform

31 Oct 2022

Data engineering practice using open source technologies
  • 1 participant
  • 9 minutes
clone
repository
daxter
project
platform
analyze
init
tools
documentation
deploy

27 Sep 2022

Today we will be interviewing EvolutionIQ about why they decided to use Dagster

Tomas Vykruta - https://www.linkedin.com/in/tvykruta/
Karan Uppal - https://www.linkedin.com/in/karanuppal/


If you enjoyed this video, check out some of my other top videos.

Top Courses To Become A Data Engineer In 2022
https://www.youtube.com/watch?v=kW8_l57w74g

What Is The Modern Data Stack - Intro To Data Infrastructure Part 1
https://www.youtube.com/watch?v=-ClWgwC0Sbw

If you would like to learn more about data engineering, then check out Google's GCP certificate
https://bit.ly/3NQVn7V

If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.

https://seattledataguy.substack.com/

Or check out my blog
https://www.theseattledataguy.com/

And if you want to support the channel, then you can become a paid member of my newsletter
https://seattledataguy.substack.com/subscribe


Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio

_____________________________________________________________
Subscribe: https://www.youtube.com/channel/UCmLGJ3VYBcfRaWbP6JLJcpA?sub_confirmation=1
_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.

*I do participate in affiliate programs. If a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.
  • 3 participants
  • 1:09 hours
innovation
curious
evolution
exciting
today
thoughts
humanity
consultancy
google
ai