From YouTube: Dagster Introduction
Description
Warning: This video is fairly out of date due to the rapid development of Dagster.
This is supposedly a quick introduction to the Dagster tooling for Python.
Dagster is useful for data engineering and, more generally, for any expensive, parallelizable Python script.
https://dagster.io/
https://airflow.apache.org/
Hey guys, today I'm going to do a short little intro to Dagster. Dagster is kind of like Airflow: it's a tool that can take your Python scripts and help you parallelize them, as well as do things like schedule them, monitor them, and a little bit of memoization as well. Dagster is a data-centric DAG orchestrator, so it's a little bit different from Airflow, and I'm going to get into how it's different later.
So why might you want to use this with your scripts? Well, it helps you parallelize your scripts, and it gives you the ability to easily run them on clusters as well.
You can export Dagster pipelines to Airflow DAGs, and then you can run them everywhere Airflow runs, so on Kubernetes clusters and stuff like that. Dagster also gives you free monitoring and limited memoization. There is more complete memoization available, but it's not quite ironed out yet; Dagster is fairly new.
Dagster also gives you a flexible configuration system and scheduler, which is useful for when your scripts take a long time to run and you want to set them up to run one after another, or in parallel, or you want to trigger a script at a certain time of day, and you want to do all of this on the same machine, so the scripts are aware of each other and they don't all run at the same time.
So, all you need to do to turn your Python script into a Dagster script is take the functions that your script is made of and turn those into solids, which really just means you wrap them in a Dagster decorator called solid. Dagster's functions are called solids, and you create those by wrapping your functions in this decorator.
In this decorator you can define the inputs and outputs, including types, for your functions, and then you combine these solids to create pipelines, which are the DAGs in Dagster. Then you get the nice YAML configuration framework that you can use to configure both your pipeline, i.e. your runtime environment (like which executor you're going to use, and whether you want it to run in parallel or not), as well as the actual solids themselves, i.e. your experiments; that could be which dataset you're going to operate on, or how you want to handle that dataset.
Dagster is somewhat data-analysis centric, but it's useful for any Python script that takes a while to run. Dagster also comes with Dagit, a web interface that you can use to monitor everything. You can use it to write your configuration YAML files, and it has a nice little checker and helpful hints for writing those YAML files: it'll let you know, live, if there are any errors in them, and what it's expecting that you're not giving, and all that kind of stuff. All right.
I set up a little script that I'm going to turn into a Dagster script. It's just, I don't know, a really simple pretend-to-be-expensive script, and by expensive I mean it takes a lot of time.
expensive_setup takes 10 seconds and just returns some complicated data that it's set up, and then the expensive_analysis function takes another 10 seconds to do something with the complicated data, and then in the main function you just pass one to the other and run them both.
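The starting script is roughly this (the sleeps are shortened here from the 10 seconds in the video, and the exact return values are made up):

```python
import time


def expensive_setup():
    # Pretend to be expensive; the video sleeps for 10 seconds here.
    time.sleep(0.01)
    return "complicated data"


def expensive_analysis(data):
    # Another pretend-expensive step that does something with the data.
    time.sleep(0.01)
    return f"analyzed {data}"


if __name__ == "__main__":
    # Pass one to the other and run them both.
    print(expensive_analysis(expensive_setup()))
```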
So we have expensive_setup and expensive_analysis, and in here you can see all of the pipelines you've created. We created the expensive_pipeline, which we're in right here. We haven't set anything else up, but we can run this pipeline if we want; oh, that's a sneak peek ahead. So we haven't set up any kind of configuration, so there's no configuration required, and we can just launch our script, and you can see it's working. This is the running-scripts page, or the runs page, and you can see it's working through... oh, here, if I load it up, it's working through expensive_analysis right now, and it finished expensive_setup a second ago, and you can see how long everything takes and what relies on what.
So that's one of the cool features of Dagster, and the reason it cares about your types and all that information (this is another of the ways it differs from Airflow, which I'll talk more about later) is that it cares about the inputs and outputs of the functions and how they compose.
Yeah, so we can see our runs; we had a success. Dagster has a nice logger as well, which we will use later. You can hook it up to a database, like Postgres, for when you have a whole bunch of stuff running in parallel, so that you don't get a file lock. So I'm going to start expanding this script to show you some of the features of Dagster.
So the first thing we're going to do is turn on logging. Each solid decorator provides your function with a context argument, and that argument is used to access all of the Dagster features from within a solid, i.e. from within one of your functions; so, for example, context.log.info. There we go; that's how you log information in Dagster.
You can also use Dagster to simplify how your DAG is organized: you can use something called dynamic outputs.
Okay, we're also going to need our DynamicOutputDefinition, and we need a name, so let's call this one common_data, and then let's call this one just data. What you can do is, if expensive_setup were to return, for example, 10 different data frames, each of which you want to do something different with, then you can create 10 different outputs of the same kind that will all be operated on by whatever your next set of functions is.
Then we're going to want to have a value (in this case, I don't know, let's just make it i), and we're also going to need a mapping key. The mapping key is used to identify which of the data outputs this is, e.g. in the display.
Okay, and then, since we have multiple outputs now, we need to define what the original one is, so this one will just be a standard output definition whose name is common_data.
Okay, so what we've done here is we have two sets of outputs. The first output, which we can actually put above, is just the original: it's just the one standard output that we had before, the "this is a complicated data" string, and we defined that as the common_data output; common_data is the name I gave to that output value. And then we have a second output, which is the DynamicOutputDefinition, which we call data.
So we'll have our common_data first, and then we're going to follow that up with the actual data values, and then we're going to want to run expensive_analysis once per data output, and for that we're also going to want a lambda x.
Okay, so x in this case is one of the data (let's just make this name plural), one of these integers in this case. In expensive_analysis we might want to handle both of these, so we'll take in common and one of these specific cases; so let's give it common and data. All right: "undefined name data_"...
Over in the overview, here we go. Okay, I don't know why that didn't work before. You can see that there are two values being passed from expensive_setup to expensive_analysis, and the kind of overlaid expensive_analysis here indicates that one of these is a dynamic output that gets acted on multiple times.
I should lower that down from 10 seconds; it's a bit long to wait. And currently we haven't set up parallelism yet, so it's all going to run in series, but you should see that expensive_analysis with the integer zero (the zero that's displayed here is the mapping key) is being run. So this will run 10 more times. You can also terminate runs; and then, this one failed.
So what you can do is re-execute from failure, which sometimes doesn't work great with dynamic outputs, but it seems to be working now.
Yes, okay, so it wants a different IO manager for the memoization; we will set that up as well. Okay, yeah, let's do that. Actually, the first thing I'm going to do is set up some configuration.
This is so we can adjust how long these functions take to run, so let's make a time variable and give it an integer type. The way you take in configuration variables is you define a dictionary of them in your solid decorator, and then you can access them from the context. So, for example, instead of sleeping 10, we're going to sleep for context.solid_config["time"]; I always forget what this is called, solid_config.
"time" is what we called it up here, and let's do this down here as well, so this one is going to take a time variable as well.
Now we need a configuration in order to run this script. This could be which dataset you want to use, or which features you want to enable, or what kind of analysis you want to do, and the nice thing about Dagit is it has all these helpers for constructing these YAML files, so you can make it fill in, as best as it can, what it thinks you need as a config. So in this case, let's make things take five seconds instead of ten. Okay.
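The run config he fills in through Dagit's editor would look roughly like this YAML (solid names as used in the script; the exact keys depend on what each solid declares):

```yaml
solids:
  expensive_setup:
    config:
      time: 5
  expensive_analysis:
    config:
      time: 5
```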
Now we're also going to want to parallelize this, because, as you saw last time, doing all this in series is going to take some time. In order to do that, we are going to need to configure our pipeline.
So in the pipeline configuration you can select which executor you're going to use. The standard one just executes solids one at a time, in series, and then there are all sorts available: there's a Dask executor, and there's a standard multiprocessing executor, which is the one I'm going to use, and for that we're going to need the multiprocess executor.
We're also going to need a ModeDefinition. And, so that some of the caching works better, we're going to need to use the fs_io_manager, which we also need for the multiprocessing executor, because the multiprocessing executor needs to cache between solids: it needs to cache the output of a solid in order to feed the input to a new process.
Yeah, okay, very good. In this mode we're just going to include one resource definition, via resource_defs, and it's going to be the one that we wanted, which is going to be, I don't know, the io_manager.
Now, these are cool, because the IO manager is a little class that gets called every time a solid finishes, to save that solid's output. So this default one here just pickles the result and then unpickles it later, but you can create your own very easily, to use something like Feather if you're mainly working with pandas data frames, and then that speeds up the caching and makes your whole multiprocessing script much faster.
Oh, so we're going to want the default one, but we're also going to want the multiprocessing one.
Okay, now, when we reload this bad boy...
Okay, now we have our solids set up, so we can also set up our pipeline, or sorry, our execution environment, and for that we can look in the sidebar here for what it suggests. So we are looking for execution, and we're going to want to configure the multiprocess executor.
We're also going to need to set up the IO manager for this pipeline.
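The matching execution section of the run config, roughly (legacy schema; max_concurrent is optional and shown just for illustration):

```yaml
execution:
  multiprocess:
    config:
      max_concurrent: 4
```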
Okey-dokey, here we go. So expensive_setup is just one function that gets run once, and then its 10 outputs (the dynamic outputs we created, called data) get split into these 10 runs that now get run in parallel, because they're all independent of one another. You can see these start to finish, and when you click on one you can see all sorts of information.
Let me see if I can get something more obvious, including our logged output, and you can go back and figure out how much time it took and all that kind of stuff. Yeah, that's an overview of how Dagster works. You don't only need to run Dagster scripts from inside Dagit.
You can also run them using the CLI, and you can also run them directly from within Python as well, and then, instead of loading YAML files, you can feed it Python dictionaries, if I'm not mistaken. So you can build this into your larger Python infrastructure.
So, that's a quick little demo of how this stuff works. As you create more pipelines, they will show up here, so you're not limited to one pipeline per file or anything like that, and you can see your run history, and you can set up a scheduler, so you can run things at certain times of the day.
So, Dagster versus Airflow. Dagster is a lot easier to set up; Airflow, I found, had a quite involved setup, but Dagster is much more immature. Also, I like the Dagster Python interface much more: the decorator setup, and the way you combine functions into pipelines just using functional programming.
I find it to be much more intuitive than how Airflow sets up its execution dependencies. So that's the shallow overview, but what really separates the two is that Airflow has execution dependencies and Dagster has data dependencies. In Airflow, the scheduler knows nothing, and wants to know nothing, about the information passing between items in its DAG; it just wants you to explicitly define those execution dependencies.
(I haven't used Airflow much, so this is all from reading various blog posts.) Whereas in Dagster it's just like a functional program, and that allows you to include the typing and testing as well, and it ideally makes it much easier to cache in Dagster, and also to set up state for testing, although that's still a little bit immature, because Dagster is so new. That's how I would differentiate these two, though. So, the pros of Dagster; I've gone over a couple of these things already.
The flexible pipeline configuration system is great. I use this YAML configuration extensively when I set up a data analysis script, to allow that script to run on a whole bunch of different datasets with dataset-specific parameters, so you're not sitting there with a directory full of Python scripts with one or two lines changed for different datasets. And then it's easy to parallelize your scripts.
The other thing is that Dagster is very lightweight. I guess I've gone over this before as well, but it's literally just a couple of pip dependencies away. And then the cons are that everything is experimental, so dynamic pipelines aren't terribly strong, because you can't nest them very well, as far as I could find. So, for example, say you had two functions in a row, each of which output 10 of something that you wanted to iterate over,
so within those 10 you want to split each out into 10 again, and then let's say you have another function that wants to operate across that gap; Dagster doesn't handle that very well. You always have to be working directly within your one dynamic output at a time. It's also very awkward for exploratory data analysis, so you can't drop into IPython
at any point. It's very much, I would say, a data engineering tool, as opposed to an exploratory data science tool, but it works great if you do test-driven development. So that's my hopefully quick (I don't know how long this is... oh, half an hour) overview of Dagster. Well, if you stuck around for the whole thing, have a good one and enjoy your day.