From YouTube: Apache Airflow to Dagster Migration Event
Description
Many data engineering teams are frustrated with the limitations of Apache Airflow and are looking to switch to a more contemporary solution. While Dagster offers many advantages over Airflow, a migration can seem like a daunting effort.
Luckily, with the release of recent tooling - namely the dagster-airflow library - making the switch from Airflow to Dagster can take just a few days, and a proof of concept can take just one afternoon.
Join the Elementl team along with data practitioners from Dagster's open-source and Cloud user base, including some of Dagster's implementation partners, as we discuss the best ways to make the transition from Airflow to Dagster.
Fundamentally, teams are excited to build workflows in Dagster, and they're tired of doing it in Airflow.
Tasks... it wasn't a system that people loved going to and working with. There were a lot of shim scripts to push stuff onto the remote system. There was no CI/CD or testing.
There's no version control for DAGs. This feature has been talked about a lot; for instance, if you delete a task from your DAG, all the metadata that's related to it is lost now.
A lot of times our instances would just be completely unresponsive, which forced us to go in and restart all of the Kubernetes clusters as well, the entire Airflow Composer cluster.
Hi everyone, thanks for coming. My name is Nick Schrock, CTO and founder of Elementl, the company behind Dagster, and welcome to Airflow Migration Day. Airflow is a widely used technology that was a big step forward for the data ecosystem when it was released. However, times have changed, and technological choices must change with them. The demands on data teams in today's world are intense, and they have overwhelmed Airflow's ability to serve those teams. It's time to move on to better solutions.
It makes putting on this type of event really fun and really effective, and it just requires a simple strategy: put our customers on camera and let it rip. There are a ton of great takeaways from those talks, but two facts stick out. The first is that in 2023, data platforms are utterly essential to the success of modern organizations, and the second is that the teams charged with building and deploying those data platforms face fundamental challenges executing on this mission, especially when they use Airflow.
We'll talk about three foundational challenges that teams typically face: the first, developer experience; the second, fragmentation, both across tools and teams; and the third, technical debt. Let's start with developer experience. You'll hear all of our customers talk about the quantum-leap improvement in developer experience when they adopt Dagster. When we talk to most data teams, the state of their development life cycles is pretty abysmal. Most tools were not built with the local development life cycle in mind. They were not built with testability as a first-class concept.
The results are extremely slow, fragile development life cycles. This developer experience problem must be solved in order to move the data ecosystem forward, and it's truly hard to overestimate the number of dimensions that improved developer experience delivers on. I think the most obvious one is that engineers are more productive and efficient, but this just scratches the surface.
This, in turn, frees the engineer's mind to think bolder and bigger. Productivity is not just making existing tasks more efficient; in software engineering, it makes entirely new things possible. Put another way, productivity transforms the mindset of an engineer from scarcity and fear to abundance and confidence.
This leads to the next challenge facing data teams. Data platforms are typically extremely fragmented across tools and technologies. Platform teams integrate dozens of tools across teams in an attempt to assemble them into a coherent data platform. In order to synchronize operations, they might adopt Airflow, but they quickly find themselves drowning under the complexity once they adopt it. We are left with a distorted data platform with disconnected tools and siloed teams.
Developer experience does an important thing, and you'll hear this from our customers: it actually incentivizes teams across the org to adopt the technology to improve productivity and happiness, and that unleashes new organizational dynamics. You'll hear our customers use the word collaborative. You don't hear that word often enough when it comes to data platforms. With Dagster, the organization has shared context on all assets under management and a single pane of glass to debug technical issues that cross organizational boundaries.
Today, you're going to hear from customers who have overcome that challenge. Stephen from Whatnot is going to explain how they transitioned their mission-critical, multi-persona data platform to Dagster, while incrementally building trust in the tool and incrementally delivering value at every step along the way. Then we're going to hear from Gu from Group 1001, and he's going to describe the magical experience of using our Airflow adapter layer, which is releasing to 1.0 today. It is a toolkit where, with minimal work, you can import your existing DAGs into Dagster and execute them natively.
Hello, I'm Sandy, and I'm the lead engineer on the Dagster project. Before Dagster I spent years as a data engineer and machine learning engineer, and I used Airflow extensively in those roles. I joined the Dagster project in large part because of my frustrations with Airflow: I found that I was spending more of my time fighting with it than working with my data.
The point of a data pipeline is typically to produce and maintain a set of data assets, like tables, files, or machine learning models. Accomplishing that usually requires modeling a graph of computations and intermediate data that get you from the source data you're starting with to the data products you're trying to create. Airflow helps out with this because it's a workflow engine: it models a graph of tasks and executes them on a fixed schedule.
It was the first Python-based workflow engine to have a full web interface, which set it on the road to becoming one of the most popular tools for running data pipelines. But first does not always mean best. Airflow is designed in a way that we believe actually makes it a poor fit for this task of building and maintaining data pipelines.
First, it schedules tasks but doesn't understand that tasks are built to produce and maintain data assets. And second, it's focused on production environments that support heavyweight infrastructure with long-running processes. This makes pipelines hard to work with in local development, unit tests, continuous integration, code review, or debugging.
Second, it results in poor reliability, because if you can't catch errors before your changes make it to production, you'll catch them in production. And third, it makes it hard to understand what's going on when a pipeline is deployed, because it mainly gives you visibility into what tasks have run, not what data assets have been updated.
Dagster takes a broader view. It was designed to assist with the holistic task of developing pipelines of data assets and evolving those pipelines over time. We believe that taking this broader view can make data teams dramatically more productive and make data pipelines dramatically more reliable. To make this more concrete, let's start by zooming in on these phases of the development life cycle. What's the difference between Dagster and Airflow when developing data pipelines?
Developing with Airflow is difficult because Airflow pipelines are heavyweight and difficult to run quickly as part of an iterative development loop. All Airflow runs go through its scheduler loop, which means that to run any pipeline in Airflow, you need a long-running scheduler process that's monitoring a database, and after launching a run, you need to wait for the scheduler to see it. Also, to avoid dependency conflicts, most guides recommend defining Airflow tasks with operators like the KubernetesPodOperator, which dictate that the task gets executed in a particular environment, like Kubernetes. When a DAG is written in this way, with the pipeline bound to a particular execution environment, it's near impossible to run it locally or as part of continuous integration, unless you want to set up a Kubernetes cluster on your laptop. Dagster, on the other hand, was built from the start to support rapid development and prototyping of data pipelines. Dagster's programming model encourages separating business logic from infrastructure.
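That separation is the key to everything that follows. Here is a minimal, framework-free sketch of the principle (the names are illustrative, and this is not Dagster's actual API): the business logic is a pure function, and all I/O goes through an injected storage object, so the same pipeline can run against production infrastructure or an in-memory test double.

```python
def top_users(events: list[dict]) -> list[str]:
    """Pure business logic: no I/O, trivially unit-testable."""
    counts: dict[str, int] = {}
    for event in events:
        counts[event["user"]] = counts.get(event["user"], 0) + 1
    return sorted(counts, key=counts.get, reverse=True)


def run_pipeline(storage) -> None:
    """Thin orchestration layer: all reads and writes go through `storage`."""
    events = storage.read("events")
    storage.write("top_users", top_users(events))


class InMemoryStorage:
    """Test double standing in for heavyweight infrastructure (e.g. S3)."""

    def __init__(self, data):
        self.data = dict(data)

    def read(self, key):
        return self.data[key]

    def write(self, key, value):
        self.data[key] = value
```

In production the same run_pipeline could be handed an object backed by real storage; in a unit test it gets the in-memory double, and no cluster is required.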
This means that you can have a pipeline that runs distributed across Kubernetes in production, but also run it within a single Python process during a unit test, without sacrificing dependency isolation. Dagster execution is extremely lightweight: it doesn't require any long-running services or schedulers if you don't want them. If you do want to access Dagster's UI, you can just type "dagster dev" on the command line and be up and running.
Dagster also has rich testing APIs, which make it easy to write unit tests for any component of a data pipeline and to stub out external services that the pipeline interacts with. Another big difference between Dagster and Airflow is the abstractions they offer for building and operating data pipelines. Dagster sees the goal of a data pipeline as producing a set of data assets, like tables, files, or machine learning models. Dagster's programming model and user interface are heavily focused on that goal, so it allows you to think in assets when you're building and operating your data pipelines.
Airflow, on the other hand, is primarily a task orchestrator. An Airflow DAG is a workflow of tasks connected by execution dependencies. Airflow recently introduced a dataset abstraction, but it's bolted loosely on top, not a core part of the operating model or programming model. Thinking in assets allows you to express your intentions more directly, which means less boilerplate code.
As an example, here's a comparison of the same data pipeline written in both Airflow and Dagster. The pipeline has one data asset that's derived from another data asset. With Airflow's APIs, you need to tell Airflow that the task building the second asset should run after the task building the first asset, and then also read from the first asset in the second task. It's a lot to keep track of. In Dagster's APIs, you just express the dependency between the assets in one place.
After you've written your data pipeline, you typically use your orchestrator's web UI to monitor it. Airflow's UI is primarily concerned with what tasks ran, but Dagster's web UI, which is pictured here, also focuses on the data that was produced by those tasks. It makes it easy to include metadata about that data and track how it evolves over time.
Another benefit of Dagster's asset focus is that it enables much deeper integrations with modern data stack tools. For example, consider dbt, which is a tool that helps analytics engineers write SQL to build tables. Airflow focuses on tasks, so it represents the entire dbt graph as a single node in its DAG. In Dagster, dbt models are easy to represent as Dagster assets.
Fundamentally, teams are excited to build workflows in Dagster, and they're tired of doing it in Airflow, and that's a significant element in a choice of technology: the ability to attract talent to a particular piece of work over the long term, and also to minimize fatigue and churn. And then, when teams are using Dagster versus Airflow, we see a far higher rate of development; they're able to close the loop
far quicker. Dagster has a really amazing platform for being able to develop locally as well as in your deployment, using things like branch deployments, and teams are able to iterate very quickly, try ideas out very quickly, put them in production, and have them work. And then what I'm seeing with teams is the testing of the orchestration, the automation, and the workflows that you're able to build.
It's not an afterthought, right? You don't have to bring that to the table; Dagster does it for you. With other tools, you often have to create that yourself. And finally, and this has been my experience a few times already: when you encounter issues, we're able to get in touch with the team directly. They have an open Slack channel where you're able to communicate with them and resolve issues that you'd think would take you a week or something like that, and actually get them resolved very quickly.
The reason why we see people go for Dagster instead of Airflow is honestly kind of a multi-faceted question. The first thing is that the experience for end users, the user interface and all that, is, from our point of view, more approachable than Airflow. You get the information faster; things like error logs are just a click away instead of three clicks away.
So it's details like those; it's an appealing platform that is pleasant to use, and that matters more than you think on a day-to-day basis. Then we find that we have to write less code in order to produce the end result that we want. The dependencies being determined implicitly, rather than often having to be specified explicitly, avoids redundant code and makes it easier for teams to collaborate.
The availability of a platform like Dagster Cloud Serverless really means that teams can consider using it far earlier than Airflow. In our experience, you have this sort of low barrier to entry to start using the product, even for small teams, and it's able to grow with you. It has all the same abilities to deploy hybrid or on-prem as you would with a more complex tool like Airflow, but you're able to approach it far earlier in your journey.
It's really kind of a game changer for us. Typically with dbt, you'd go from something that's a little bit too rudimentary to then having to make the massive jump to using Airflow, and you would only do that for really sophisticated and quite large teams, right? Whereas with Dagster, you're able to start from day one if you want, even with a team of one, even a very small team.
If you're using something like Dagster Cloud Serverless, you're able to have that tool right away, and the level of technical debt that you're experiencing is very low. So for us, it's just far broader in terms of the teams we can introduce it to, and we're seeing teams that are very small asking us excitedly about it, asking us to implement it for them. I mean teams of one, teams of three, things like that, because for them the trade-off between technical debt and the value they get is overwhelmingly advantageous.
We've implemented, and are implementing, Dagster for some of our clients, and they're very happy with the tool. I would say that with Dagster there's a lot more collaboration that gets introduced. This collaborative effort in developing data products is something where Dagster outshines Airflow any day. Previously, with Airflow, you would have these segmented teams that would essentially manage a lot of its components, but with Dagster you can get a single unified view; you can give a lot of teams visibility into the data pipelines.
In terms of DAGs, there's a lot less code to write, especially with Dagster's built-in classes and API methods. They're very efficient, reusable, and pluggable, following DRY principles, so Dagster basically forces you to think in a very modular fashion, but Airflow, I feel, still has a long way to go to reach this level of modularity.
There is a lot of flexibility around deployments as well. The biggest one is branch deployments, which save a ton of effort and dev time when you're developing collaboratively and when you are doing integration and end-to-end testing for your data pipelines. It automatically replicates the production environment without the data engineers having to do anything or set up the infrastructure.
Rather than thinking of workflows as essentially just a bunch of tasks, Dagster treats these jobs, or DAGs, as data assets and helps you build on top of these data assets, whether it's a dbt model, a data frame, or any transformation. It then further leverages these assets and builds features like the asset catalog and observability logs right into its UI, which is very helpful.
So if you're looking to bring a new technology into an organization, you really need to answer two questions. The first is: is this the ideal end state? When you adopt this technology, does it actually solve your problems and the problems of your stakeholders? Do you think it will be durable over a long period of time?
Second: is there a timely and practical path to get there? When you think about the path to adopt the technology, is it even practical to get to that ideal end state? We spent the last year or so talking to Airflow shops about Dagster, and the vast majority of them would say: hey, Dagster might be the ideal end state for us, and hopefully the talk from Sandy earlier convinced you of that too. The real challenge here was the second question: the timely and practical path to get to that ideal.
You can extend them with software-defined assets or Dagster's ops and jobs. You don't need to modify your existing Airflow code at all to take advantage of this, and you don't have to run your existing Airflow instances anymore; this runs entirely within Dagster. And yes, this actually works: we've gone to production with design partners on this, and as crazy as it sounds, it does actually work.
So this includes our beautiful modern UI. This includes taking advantage of software-defined assets; fast local development, one of the most common Airflow pain points; testing, another extremely common Airflow pain point; and branch deployments. That is a feature we shipped last year that gets you an ephemeral staging environment for every pull request, making testing and code review very easy.
What's important here is that this enables you to migrate your code incrementally over time. Before this integration, oftentimes you had to run two separate instances, your Airflow instance and your Dagster instance, keep them running in parallel for a while, and then do one big migration; you basically had to do the whole thing at once.
Now, this is all really great, and we did launch it to 1.0 recently, but we are still early. So while this does support many of the features that you would expect from Airflow, like schedules and XComs, there still are some rough edges here. So join our Slack channel or check out the docs at docs.dagster.io to learn more. Thanks for listening, and I'm going to hand it off to Odette, who's going to demo this.
What I'm going to show you is how to take all of these DAGs and migrate them into Dagster, and you'll see that you don't need to set up any infrastructure and you'll be able to run and test this locally, very simply. This example is actually available on GitHub, and you'll be able to follow along, as well as use this as a template and replace your own DAGs in here. So what I have here is a bunch of different folders. I'll start off by showing you a bunch of the different DAGs; I'm going to show you this tutorial DAG.
So this should look like standard Airflow code, and what we'll be doing is taking this Airflow DAG and turning it into a Dagster job, very simply. You won't need to change any of this code; what we'll be doing is actually wrapping Dagster around it, and you'll see that there are no code changes and no infrastructure setup needed to both edit and run this locally, as well as on Dagster Cloud. Next, let's take a look at the tool
that's making this happen. So I have this dagster migration folder over here, as well as this definitions file. We'll be using a library called dagster-airflow, which takes all of this Airflow DAG code without changing that code a bit (you don't need to write any Dagster code either), and is going to create the definitions from those DAGs in one or two different commands.
Now that we've peeked behind the scenes to see the different Airflow DAGs, as well as the tool, dagster-airflow, that's going to make this happen, let's open up the README file and see how to run this locally. What I have here is two different commands that we're going to run. The first is going to install dagster-airflow and all the different dependencies, as well as the dagster migration package itself, and the second is dagster dev, which launches the UI. So let's get started and execute these two commands.
You keep whatever workflows you had up and running, and now you can transition over to things like Dagster assets, which Tim will be demoing in the next video, very slowly, with everything working and your core business operations untouched.
Let's run it. So right now we're running all of these Airflow DAGs locally; a couple of easy commands got us here, so you'll be able to really test this out on your own Airflow DAGs yourself. We do recommend that you start off locally before going into Cloud, just to make sure everything's working. And that's it: everything has now been migrated to Dagster jobs, and we're executing them locally. Now that we saw how to run this locally on Dagster, let me show you how to run this on Dagster Cloud. This is our serverless option.
We also have hybrid, so you'll be able to use your own infrastructure, but our Dagster Cloud Serverless option requires zero infrastructure setup on your end. What I'm going to do is actually just run these couple of commands, and you'll be able to see that everything is pushed to Dagster Cloud and you'll be able to run all of the different Airflow jobs there.
So here I have my Dagster Cloud, and you'll see that we have different code locations, which will enable your teams to really collaborate here. No longer do you need to have one person editing code on their end, unable to look and see what else is going on in the rest of the organization.
Here you can have all your different code locations, which can correspond to the different projects being worked on, and you can really have that strong collaboration between all of your data team, as well as your analytics and machine learning teams. So here we also have our dagster migration folder; you'll see all of the different Airflow DAGs that have been converted to jobs here, and you'll be able to run them as well.
So the last thing I'm going to show you today is how to unit test your code. In Airflow, you might have been pushing changes to production with low to no confidence in how they would actually work, without a lot of testing to make sure that everything was working fine with your data and that your production instance stayed intact.
If you have any side effects that are being created as part of your existing Airflow DAGs, you'll be able to test that, and once you get into Dagster and the different assets, you'll be able to run even more robust tests on what you're expecting as part of your data pipeline or machine learning pipeline. So you'll really have confidence that you can unit test every little piece of your pipeline before moving to production, and when you go to production, you'll know that you have tested it and there should be no issues there.
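The kind of per-piece unit test being described can be very small once a step's logic is a pure function. A framework-free sketch (the names are illustrative):

```python
def parse_story_titles(raw_stories: list[dict]) -> list[str]:
    """One small pipeline piece: extract non-empty titles from raw records."""
    return [story["title"] for story in raw_stories if story.get("title")]
```

Because the function touches no orchestrator and no external service, a test is just a direct call with hand-built inputs, which is what makes testing every little piece practical.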
So here I have a bunch of different code. You'll see that we're testing to make sure that everything was successful. We have the different package in here, which is dagster-airflow, which loads in all of our different DAGs as definitions, and what I'm going to do is actually run this to see that everything was executed successfully.
Hi, my name is Tim, and I'm also a developer advocate. My colleague Odette just showed you how easy it is to lift and shift your Airflow DAGs and run them within Dagster. Now, that's great and all, and you're already getting some benefits from using Dagster, but, as others have mentioned, there's so much more you can do once you move to an asset-focused view of your data pipelines.
Let's take a DAG written for Airflow and move its logic into pure software-defined assets in Dagster. While doing so, we will cover our recommendations and best practices for refactoring. Before diving in, it is important that we make sure we know the right lingo between Airflow and Dagster. Thankfully, if you head on over to our documentation, click on Integrations, and navigate to the Airflow page, we have some resources prepared for you.
Here is a table that covers similar terminology between Airflow and Dagster, along with the differences. The big ones worth highlighting for this migration are that there are no operators in Dagster; we use normal Python functions, which allow us to test our logic better. IO managers are Dagster's way to set up connections with external services such as Amazon S3.
You'll also see how it leads to cleaner code and, once again, it's easier to test. Assets are Dagster's bread and butter: Dagster has extensive metadata-driven features built around assets, such as partitions, lineage, and declarative scheduling. You may have tried Airflow's datasets, but software-defined assets will help you enjoy working with your data pipelines again. Now that you know the concepts, we can start talking about patterns for refactoring Airflow logic into its analogous Dagster code.
I
We
will
refactor
our
dag
one
task
at
a
time
to
use
a
less
airflow
code
and
more
of
daxter's
Concepts
piece
by
piece
without
breaking
production
because
of
our
Daxter
airflow
utility
package,
we
can
connect
newly
made
assets
with
the
assets
that
are
made
from
the
airflow
Dag
in
this
case,
we'll
start
from
the
beginning
of
the
dag
and
work
sequentially,
one
task
at
a
time
to
the
end
of
zag
replacing
each
piece
of
it.
Using
this
pattern,
we
can
replace
the
airflow
task
with
dagster
assets
at
our
own
pace.
Let's go see this in action. Here is the DAG whose logic we'll be refactoring today. There are two tasks: one task queries Hacker News for the IDs of all the current top stories and saves this list into an AWS S3 bucket; the next task reads that newly made file, iterates through the IDs, and gets the titles for each story.
Let's cut and paste this task from the Airflow section and move it over to a new function in our Dagster project. Now we'll clean up this code to make it into a true software-defined asset. We'll first add the asset decorator; then we'll return the IDs rather than sending them to S3, because now we can use IO managers, which we talked about recently, to deal with reading and writing data for us. Finally, we'll patch up the asset dependencies so that our new asset replaces the old asset made by our utility function.
Once those steps are finished, let's run it to test that it works. You can see that, since we added the asset and removed the old task, it works just like it did before. This is code that can be readily pushed to production, meaning that you can take your time and you don't have to do this refactor all in one sweep. Once you've validated that the logic produces the right output, you can move on to refactoring the next task, until you've migrated all of your Airflow DAG to software-defined assets.
The way I would explain it: if we did it in Airflow, it would probably take us maybe two or three months to even get the data in place where we can actually begin to figure out the sort of insight to build a visualization, to build a dashboard. With Dagster, instead of two or three months, what about three days?
That's the difference it would make to move over to Dagster: instead of three months, we can do it in three days. It simply accelerated the development cycle. From that perspective: ease of development, ease of troubleshooting and monitoring, and the fact that we can actually reliably make changes without breaking any of our existing processes. That's the way I would sell it; that's the way I've been selling it to the business.
If I could take your idea from a three-month request down to three days, then there's no question to ask: let's do it, because then we can deliver more insight and deliver value faster, because speed is actually what we care about. Speed is really how we can drive our competitive advantage within our company.
Our entire architecture really centered around Airflow, with our own Python processes that were running via the KubernetesPodOperator. We ran into a load of different issues. The biggest pain point I've generally seen, number one, is from a stability standpoint: we've had numerous outages whereby Airflow just simply wouldn't run our jobs.
So from a stability standpoint it was a big issue, and a lot of times our instances would just be completely unresponsive, which forced us to go in and restart all of the Kubernetes clusters as well, the entire Airflow Composer cluster. So stability was a very, very big one for us with Airflow. The second of the biggest problems that I've seen, number two, was really regarding our end-to-end cycle time, from a development experience perspective, when running any sort of changes to our DAGs in Airflow.
Just due to the complexity of using Airflow version one, we never actually tested our changes locally. We would just make a code change; we didn't really test it locally, we would just hope that it worked.
That is already eight minutes to get from the code change to having it deployed, because even when you deploy to Airflow, even once it's done deploying, because of the slowness of the scheduler in refreshing the DAG metadata, it takes a good two minutes to refresh, and even when you start the job, it takes about two minutes for the pod to actually start up and run the job. So we're talking about that kind of end-to-end cycle time just to do one iteration of a change, not even a full end-to-end cycle.
Actually, when we set it up and configured it, working with the Elementl engineers, with Joe, in about, what was it, four hours, I think, four hours later, once we got it configured, we were able to run Airflow 1.10.15 locally in our development sandbox on our Macs, which we've never done before, executing our KubernetesPodOperator DAG using the Docker image that we had from our Google Cloud repository that we pulled down locally. We were able to execute that KubernetesPodOperator locally in under four hours, just testing it out.
B
What I definitely recommend from a best-practice perspective is: take a look at Dagster, take a look at its ability to run the Airflow migration, and think of it as a stepping stone. It's a way to really be a stopgap. You can migrate, lift and shift, all your existing DAGs onto the Dagster platform, and you get to leverage Dagster's really well-thought-out end-to-end SDLC experience.
B
So we will cover the migration standpoint, but you can actually use the platform to reinvent and redesign those pipelines in a new manner that will get you better agility and better scale. That's the way I kind of frame it, and that's our actual plan and vision for using Dagster.
B
We don't plan to keep our Airflow instance for the long term, but nevertheless, in the short to medium term, we're able to leverage Dagster to really get off of Airflow, and then we're able to rebuild these pipelines such that we'll have the agility and scale that we need going forward. That's how I would think of it from a migration standpoint: how you can ease into it, because it de-risks things. Since you're just doing a lift and shift, there's very little risk in my view.
J
Hey, I'm Joe, an engineer here at Elementl. I'm going to go over a few considerations for you to keep in mind as you're making your migration from Airflow to Dagster. As you've heard from Tim and Odette, the migration tooling and the dagster-airflow library are really powerful, and they give you a lot of leverage. But making migrations is complicated, and so we really recommend that you first focus on getting things running locally and establishing that tight iteration loop.
J
The serverless deployment option is going to be amazing if you're a team that doesn't want to manage its own infrastructure, or isn't comfortable doing so. The built-in environment variables feature, unique to Cloud, is going to make porting things like Airflow variables and the secrets needed by Airflow connections a lot more seamless. The built-in CI/CD is going to.
D
Hi, I'm Stephen Bailey, a data engineer at Whatnot. Whatnot is a livestream shopping platform, kind of like Twitch meets eBay, where we allow enthusiasts to make a living selling things they're passionate about.
D
If you're in the position I was in, where you're evaluating orchestrators and you're going to commit to one for the foreseeable future, I would recommend, well, first of all, I would say Dagster is fully production ready. Coming into this exploration, you have to ask the question of whether a startup is going to be stable enough for you to build your platform on. We've had zero issues with stability in the platform from the beginning.
D
What I'd say is you want to focus on the value-add part, which is an ergonomic experience for your developers and just the agility of getting new pipelines out and plugging them into your existing architecture. To that extent, I think Dagster has a couple of really great features to offer. The first is the built-in GitHub Actions; the second is the serverless deployments.
D
So if you can avoid thinking about infrastructure for your first transition, for your first several pipelines, that's a great win, and it means you can focus on building out your graph of jobs and pipelines. The third thing I recommend is using the new asset functionality. That's really where Dagster starts to differentiate itself from other competitors, and it also becomes a little more compatible with other modern tools like dbt.
D
So the project plan was to lift and shift everything; that was the goal. We started with a couple of key pipelines, moving them over to Dagster and really getting the Dagster workflow and developer experience down, including things like: how does the Python package look? How do we set up the CI/CD? How do we set up notifications? How do we just get accustomed to all of the new APIs?
D
So we took a couple of trial pipelines, moved them over, ran them in production, then deprecated them in the Airflow instance. Then there was this period of just migrating pipeline after pipeline until we had just a couple of residual ones left in Airflow that we kept going for quite a while, just as a sort of fallback in case anything happened. We migrated our first set of pipelines and then kept a number of them going at the same time as the Dagster pipelines were running, just as a sort of backup in case anything went wrong.
D
I think our experience was a little bit like dating, in that it takes a couple of months to build trust in the relationship, trust with the new infrastructure that we stood up. We had some Kubernetes issues where pods were getting killed when things were getting scaled, and so the migration just takes a little bit of time for everyone to build confidence in it.
D
But then what we found was that there's definitely a tipping point where having everything in Dagster became a much faster, much more visible workflow. You'd go back to our Airflow workflow and kind of be like, oh, what is this? At that point we were really able to pull everything off the Airflow instance and just make it in Dagster, and during that process we improved most of our data pipelines as we pulled them over.
D
You know, it's mostly Python modules, so we pulled over the functions, just kind of wrapped them and ran them like we would in Airflow. But during that process it was a great opportunity to go and fix some pipelines, to harden configurations that were a little squeaky in the Airflow pipelines, and it was really a good opportunity for us to put that long-term vision of a very stable platform in place and just refresh some of our core processes.
D
All of these data assets that you have out there can be dagsterized and connected to your other processes, and that's a huge unlock for our team. It allows us to move to a much more event-driven architecture, where dbt refreshes things and then everything downstream of those pieces refreshes, and that's the sort of next level of orchestration that we really wanted to get to: things being much more event-driven, not just a bunch of schedules where we didn't really know
D
what was hitting what. And so I would recommend starting there. So, you know: ignoring infrastructure if you can; using all of the built-in CI/CD and actions that make things easy and nice; and then focusing on building out your asset graph and leveraging all of that. Those would be the three things that I would recommend.
D
There are some features in Dagster, like the assets, that make it a little easier and make it not just a net move but a net improvement. The migration challenge is going to be proportional to your DAGs, to the size of your footprint in the orchestrator, and that can be a challenge.
D
I think the other piece there is what your DAG dependencies are, because if you have triggers in Airflow where one DAG is triggering another, and then, you know, custom alerting is built on top of that, those things do take time to migrate over.