From YouTube: Manage your data pipelines with Dagster | Software Defined Assets | IO Managers | Updated project

Description

In this video we will revisit Dagster and cover the changes to this workflow orchestration system introduced by recent releases (the update from version 0.15 to 1.3.1).
Dagster is an orchestrator that's designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports.
We will cover Software Defined Assets, since Dagster is pushing users toward them. By default, our pipeline outputs are stored as pickle files in the Dagster home folder. What if we want to store the outputs in a database table, or in a readable file such as a CSV or Parquet file? Dagster provides input and output managers (IO managers) that read and write data to storage systems. With IO managers we can save outputs to a file system or store our data as tables in a database. We will define CSV/Parquet file IO managers and a database IO manager.

Link to previous video: https://www.youtube.com/watch?v=t8QADtYdWEI&t
Link to GitHub repo: https://github.com/hnawaz007/pythondataanalysis/tree/main/dagster-project/etl

Get started with Dagster in three quick steps: install Dagster, define assets, and materialize assets.

Create a virtual environment: python -m venv env
Activate the virtual environment (Windows): env\Scripts\activate
On macOS/Linux: source env/bin/activate

To install Dagster into an existing Python environment, run:
pip install dagster dagit

To create a new project, run:
dagster project scaffold --name my-dagster-project

Additional libraries required: pandas and psycopg2 (pip install pandas psycopg2)

To run Dagster, issue the following commands, each in its own terminal:
dagit
dagster-daemon run

Access the Dagit UI on port 3000: http://127.0.0.1:3000


💥 Subscribe to our channel:
https://www.youtube.com/c/HaqNawaz

📌 Links
-----------------------------------------
#️⃣ Follow me on social media! #️⃣

🔗 GitHub: https://github.com/hnawaz007
📸 Instagram: https://www.instagram.com/bi_insights_inc
LinkedIn: https://www.linkedin.com/in/haq-nawaz/
🔗 Medium: https://medium.com/@hnawaz100

-----------------------------------------

#Python #ETL #Dagster


Topics covered in this video:
==================================
0:00 - Introduction to Dagster
2:11 - Dagster create new project
3:03 - Dagster Project Structure
4:18 - Software Defined Assets
5:35 - Install Required Libraries
5:58 - Source DB Connection
6:27 - Source Asset
10:05 - File IO Manager
14:16 - Second Asset
16:19 - Parquet IO Manager
16:26 - Database IO Manager
19:05 - Materialize Assets