From YouTube: Dagster Demo for Genpact - April 2023
You know, a challenge that a lot of data teams face is that they're working with orchestrators that really weren't built for data engineering. Airflow, as an example, is a very general-purpose orchestrator. Dagster, on the other hand, was built to exist as an orchestrator specifically for data teams, and so throughout today's demo we're going to look at some of the benefits that come from being a data orchestrator, not just a general-purpose orchestrator.

The first thing I want to highlight out of the gate is that we're able to track the lineage and metadata of the actual data assets that the pipeline is producing. That's ultimately what stakeholders usually care about. No one comes to a data engineer and says, "I think task X failed, and that's why everything is bad today." They come to data engineers and say, "This dataset looks wrong. What's going on?"
A lot of the other tools in a data platform operate against datasets, and so the orchestrator being aware of datasets allows us to integrate more seamlessly with those tools, whether they're on the extract-and-load side like Airbyte, in the transformation layer like dbt, or even further downstream, like Jupyter notebooks that are creating models. Those all fit much more seamlessly with a tool that's aware of data assets than with an orchestrator that's just thinking about tasks. And this is another view of what I just said: essentially, Dagster gives us the ability to do many of these things because we're aware of data assets.

The other theme throughout here is that Airflow was built before a lot of software engineering best practices really hardened, and so things like local testing and CI/CD can be very challenging with some of those other orchestration tools, whereas Dagster was built to really adopt those best practices.
All right, so I want to show you what this looks like in practice, so I'm going to switch over to Dagster, and I'm going to work backwards: I'll start with what a production data platform looks like in Dagster, and then we'll work our way back to the actual code that you'd write to get there.

So here is Dagster Cloud. Dagster Cloud is one way that many teams use Dagster in production. Basically, what Dagster Cloud entails is a control plane that we run, which tracks things like who's logged in and what runs have occurred, while the actual execution plane, where these jobs are fired off, tends in most cases to be within an AWS, Azure, or GCP cloud environment that is close to the data. So we have that hybrid architecture. What we're looking at is a timeline of jobs that have been running, and we can jump into one of these jobs.
Right away, that's where you'll start to see the data-first orientation of Dagster as a tool. Here we have a pipeline: it starts by extracting some raw data from an API, loading it into a warehouse, and then orchestrating some dbt transformations to clean up that data. Behind the scenes there are still these raw tasks that the orchestrator is running: go extract the data from the API, then run some dbt transformations. But whereas with tools like Airflow you only get that view, in Dagster there's also an awareness of the datasets that are being created as the pipeline runs.

So if, as I mentioned, a stakeholder comes to the data engineering team and says, "We're doing all this work against a daily order summary table, and today that table doesn't look right," in Dagster you can immediately go straight to that table's definition. You can see where it fits relative to everything else within the pipeline, and you get a lot of rich information: the last time the data was updated, who owns it,
what the schema looks like, and, in the case of dbt, even what the raw SQL definition was. So right out of the gate you can start to see how Dagster is different from other tools, because it has that data awareness.

A couple of other things I'll mention here. You can see these pointers to other parts of the data platform. And in addition to looking at just a specific job (this one's set up to run every three hours), we can view the run logs for this specific job and see what events are happening over time.
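As a rough sketch of what that kind of cadence looks like in code (the job name and asset selection here are hypothetical placeholders, not from the demo), a Dagster schedule pairs an asset job with a cron expression:

```python
from dagster import AssetSelection, ScheduleDefinition, define_asset_job

# Hypothetical job that materializes every asset in an "orders" group.
orders_job = define_asset_job(
    "orders_job", selection=AssetSelection.groups("orders")
)

# "Run this every three hours" as a standard cron expression.
orders_schedule = ScheduleDefinition(job=orders_job, cron_schedule="0 */3 * * *")
```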
In addition to all of those table-stakes capabilities of a great orchestration platform, you can also see all of the assets, everything related to your data estate, in one place. Here we have the global view that our original job was linking out to. Here is that original job we were looking at, but now you can see those other pointers out to the rest of the data platform. As Fraser mentioned, the reason this type of view is so critical is that in most organizations the data estate spans many different teams. While the data engineers might be responsible for the raw extract and some of the initial transformations of the data in the warehouse, you'll often see other teams building on top of that data: maybe a data science team that's building models, or an embedded analytics team, perhaps inside a marketing organization, that's responsible for some KPIs used within your BI tools. So it's important that across these different teams you're able to see how data is flowing throughout the platform, and in Dagster this is really obvious, because we have that view of the assets, how they're connected together, and what the lineage looks like.

The one other thing I'll mention, which people get really excited about with this asset-first view of the world, is the different ways you can automate work.
Our original job was set up on that standard cron schedule: run this every three hours. But within Dagster you can also do event-driven orchestration, so you can have these pipelines be updated based on external events, or you can respond to SLAs.
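As a minimal sketch of what event-driven orchestration can look like (the job, directory path, and sensor name are hypothetical, not from the demo), a Dagster sensor can request runs in response to an external event:

```python
import os

from dagster import RunRequest, define_asset_job, sensor

# Hypothetical job that materializes the pipeline's assets.
orders_job = define_asset_job("orders_job")

@sensor(job=orders_job)
def new_orders_file_sensor(context):
    # Fire a run whenever a new file lands in a drop directory;
    # run_key de-duplicates, so each file triggers at most one run.
    for filename in os.listdir("/data/incoming"):
        yield RunRequest(run_key=filename)
```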
As an example, take this average-order KPI. Maybe this is something that lives in an executive dashboard and needs to be updated really frequently, and so in Dagster you can specify what we call a freshness policy, and then Dagster figures out what else needs to happen inside of your platform for this policy to actually be met. From a data engineering perspective, it becomes much easier to actually meet those stakeholder SLAs, as opposed to trying to figure out, across these different teams, what single cron schedule or DAG needs to be written so that we get data at the right time.
Often when we're talking to teams, they're kind of guessing: maybe my Fivetran sync will take an hour, and then my dbt transformations will take 30 minutes, so if I start one of those things at 7:30, I can update a dashboard at 10. That really is a complicated game to be playing as your data platform grows, and we believe that the orchestrator, which has knowledge of how all these things interconnect, can simplify that process significantly for data engineers.
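A minimal sketch of what declaring that kind of SLA looked like in Dagster releases around the time of this demo (the asset and upstream names here are placeholders):

```python
from dagster import FreshnessPolicy, asset

@asset
def daily_order_summary():
    # Placeholder for the upstream table.
    return [{"order_total": 20.0}, {"order_total": 35.0}]

@asset(
    # Ask Dagster to keep this KPI no more than 60 minutes stale
    # relative to its upstream data; Dagster works out which upstream
    # materializations need to happen to satisfy the policy.
    freshness_policy=FreshnessPolicy(maximum_lag_minutes=60),
)
def average_order_kpi(daily_order_summary):
    totals = [row["order_total"] for row in daily_order_summary]
    return sum(totals) / len(totals)
```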
So that's what those freshness policies are all about, in addition to regular cron scheduling and event-driven pipelines. That's kind of the high-level view of how things are orchestrated. Before I look at the code, I did want to jump back into these run logs just for a second, to give you a feel for what Dagster is providing as jobs are running. When we execute an asset, or a pipeline that creates assets, Dagster is going to go through this process.
I mentioned the hybrid architecture: spinning up compute in an environment that's close to the data, then executing the commands to actually create those data assets. It'll keep track of all the logs, and then it'll also read the metadata we were looking at before. In terms of what these commands are, what's actually being executed, we see people creating Dagster assets in three types of ways.

One way is that you can just write regular Python code and then incorporate that pipeline code into a Dagster project using these really simple function decorators. If you have code that does, say, extraction of data and then transforms that data, adopting Dagster is super straightforward: you just add these asset decorators, and that creates the lineage graph we were looking at before, between an extract asset and a transform asset.
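A minimal sketch of that decorator approach (the function bodies are placeholders standing in for real extract and transform code):

```python
from dagster import asset

@asset
def raw_orders():
    # Stand-in for code that extracts raw records from an API.
    return [
        {"id": 1, "status": "complete", "total": 20.0},
        {"id": 2, "status": "pending", "total": 35.0},
    ]

@asset
def cleaned_orders(raw_orders):
    # Naming the upstream asset as a parameter is what draws the
    # edge in the lineage graph between the two assets.
    return [o for o in raw_orders if o["status"] == "complete"]
```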
That's one way, and if you're ever scanning through the Dagster documentation, you'll probably see that approach presented front and center as a great way for teams to get started. We also, though, see a lot of teams that don't actually want to process the data within the orchestration tool; they just want to outsource the actual data manipulation and work to other systems. In Dagster, similar to what we saw before, you can decorate Python functions where those functions are executing tasks in other systems. Our integrations work the way I was just describing but keep you from having to write all the boilerplate code yourself. So if you're doing, for example, that use case where you're just orchestrating Fivetran syncs and then doing transformations in dbt, it's often just a one-liner to pull those existing projects into Dagster, and then you get the benefits of the lineage graph I was showing you and all the automation capabilities.
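A sketch of those one-liners as the dagster-airbyte and dagster-dbt integration packages exposed them around this time (the connection ID, table names, and project path are placeholders):

```python
from dagster_airbyte import build_airbyte_assets
from dagster_dbt import load_assets_from_dbt_project

# Each synced table from an Airbyte connection becomes an asset.
airbyte_assets = build_airbyte_assets(
    connection_id="your-airbyte-connection-id",
    destination_tables=["orders", "customers"],
)

# Every model in the dbt project becomes an asset, wired to the
# upstream tables it selects from.
dbt_assets = load_assets_from_dbt_project(project_dir="path/to/dbt_project")
```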
So we've tried to make it pretty straightforward to get going, and in fact, this would be fully functional code for the process I just described, where you're pulling something in from Airbyte, transforming it with dbt, and then even fitting a machine learning model in Python. All of that is built in pretty straightforward code, and that's something we're really proud of. If you've ever had to fight with constructing a DAG in Airflow, even Airflow 2, with some of the advances they're trying to make there, we hope that you can appreciate the developer-first experience that Dagster is creating for you.

The one other piece I'll mention, in terms of what this coding experience looks like, is that we've worked really hard to adopt those software development best practices that I mentioned.
And so, in Dagster, here's that same data platform, but now just running on my laptop. You can see that the assets here say they've never been materialized, because I haven't run anything yet, but I can get a feel for what the structure of the pipeline is and really quickly identify if I have any typos or syntax errors, or if I've incorrectly structured the dependencies between my data assets.
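A minimal sketch of that local loop, assuming the project exposes a standard Definitions object (the asset here is a placeholder): loading the module locally renders the same asset graph without materializing anything.

```python
from dagster import Definitions, asset

@asset
def orders_model():
    # Placeholder body; nothing runs until you materialize it, but the
    # asset graph (and any mis-wired dependencies) shows up immediately
    # when the definitions are loaded locally.
    ...

# Served by the local webserver (e.g. `dagster dev -f this_file.py`).
defs = Definitions(assets=[orders_model])
```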
Then, after I've done that, I can start to harden things by simply checking my work and any changes into version control, and so I'll show you what that looks like. For this project I'm playing around with, I've got all my code inside of a GitHub repo, and then I'm using CI/CD, in this case GitHub Actions.
So before, we were looking at prod; now I can look at just the staging environment for my new model, and what that allows me to do is actually run the model. We've set up our staging environment so that it's going to read from our production warehouse but write to a staging bucket, and that gives me a lot of confidence that the results here are actually similar to what they'd look like if I merged this into production, as opposed to just having to guess what's going to happen and then fighting those fire drills when I merge it into production and everything breaks. These branch deployments, as we call them, give you a way to adopt those CI/CD best practices.
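A sketch of how that read-from-prod, write-to-staging split can be wired up (the bucket names and asset are hypothetical; the assumption here is the environment variable Dagster Cloud sets inside branch deployments):

```python
import os

from dagster import Definitions, asset

# Dagster Cloud sets this variable to "1" inside branch deployments,
# so the same code can write to a staging bucket on branches and to
# the production bucket once merged.
IS_BRANCH = os.getenv("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT") == "1"
OUTPUT_BUCKET = "staging-bucket" if IS_BRANCH else "prod-bucket"

@asset
def daily_order_summary_model():
    # Reads still target the production warehouse; only the write
    # destination switches between environments.
    print(f"writing results to {OUTPUT_BUCKET}")

defs = Definitions(assets=[daily_order_summary_model])
```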
Then, of course, when I actually merge this model into production, I can tap into all of those things we were just talking about: setting things up on a schedule, and running things in a hybrid environment where we're able to tap into large compute substrates to execute things at scale. Dagster in production has all of the things that you'd expect, like run retries, concurrency limits, and alerting, and in addition, Dagster Cloud has a lot of those enterprise checkboxes that big organizations are looking for: granular role-based access control, for example, and an audit log of everything that's changed in the system.