Description
Slides:
0.9.0 Slides: https://docs.google.com/presentation/d/17ts4HfRUEqLbBF52DZ66G1QHI1PutK2MnY_cnlTWbug/edit#slide=id.g9380d0b962_0_0
0.10.0 Slides: https://docs.google.com/presentation/d/1qwV2i_4wp-72HsCeQza1ZEP02QPoz8KLIC4czcNzw5s/edit#slide=id.g94347a5a1e_0_14
Prezi Slides: https://prezi.com/view/kveaLi8KasReSs4pyP5l/
A
Alrighty, I think we'll get started here. I assume everyone can hear me. Thank you, and welcome to the first Dagster community meeting; we're planning on making this a monthly affair. We want to keep it pretty lightweight. For today we actually have quite a bit of prepared content, but as we go on, we want to make this more community-driven and lightweight.

A
So we'll just give updates on what's going on, we can get feedback from you, and we can answer any questions. We're also interested in getting more community-created content, so if you're interested in speaking or want to share what you've been working on, please feel free to do so. So everyone can see, I'm doing an old-school slide.
A
Max is going to talk about 0.9.0 and what we think are relevant features for you; feel free to give feedback, and we'll do Q&A at the end. Then Sandy is also going to talk about our plans for 0.10.0: we're doing these major releases every few months, and we're targeting that for mid-November. And then Tamas from Prezi is going to talk about how they're using Dagster at Prezi and scaling it on Kubernetes. Then we'll wrap it up and have time for Q&A. So, a loose agenda.
A
Oh, by the way, I should probably introduce myself: I'm Nick Schrock, I'm the founder of Elementl. Thank you all for coming and for being users; if you're not users, thank you for coming anyway. And with that, I will hand it off to Max, who's going to talk about relevant features from 0.9.0 and what we've been working on the last couple of months.
B
Hi everybody, thank you for coming. My name is Max, I'm an engineer at Elementl, and I want to talk a little bit about 0.9.0, what we did, and introduce some of the new features.
B
Our 0.9.0 release, codenamed "laundry service," was a shorter release than 0.8.0 before it, or than 0.10.0 after it is going to be. We focused on cleanup and hardening. I think we should mention at the top level that we've officially dropped support for Python 3.5, for two reasons: no one in the community was using it, and supporting 3.5 is actually more work than supporting Python 2. We've also added full support for Python 3.8, which is largely thanks to PySpark finally being 3.8-compatible.
B
I want to give a short shout-out to all of our community contributors: thank you so much; contributions are ramping up. If you would like to contribute, please let us know, and we can help you do that and make it easier for your contributions to merge back into the code base. I want to start with some of the internal work that we did in 0.9.0. The biggest thing is much better user code isolation.
B
If
user
code,
like
the
code
inside
a
solid
definition
or
inside
a
resource,
init
function,
could
not
crash,
the
framework
could
not
crash
tag
it
even
if
it
said
false,
and
so
we've
done.
We've
we've
isolated
user
code
at
the
process
level
and
now
all
the
code
that
you
write
in
like
a
solid
definition
or
any
other
function.
That's
executed
by
the
framework
is
executed
in
a
separate
process
and
that
process
communicates
with
the
framework
over
grpc
for
inter-process
communication.
B
So this is pretty exciting. It makes operating the framework much more robust. It also opens the door to a couple of cool infrastructural things that are coming down the pipe, or in some cases are already here. The first of these is executing user code in containers. That's something that is already present on master and that you can start playing with; the containers will communicate with the framework over gRPC.
B
In the future, there's the possibility of doing more extravagant things, like writing Dagster-native code in other languages. For instance, you could have a Scala solid, but as long as it communicated over gRPC with the same protocol, it would still be possible to execute that user code alongside the Python code that you expect. So that's a pretty neat internal change that's going to enable a bunch of features coming down the pipe.
B
We also did a lot of work to harden the Kubernetes deploy strategy, not because Kubernetes is the blessed or only way to deploy Dagster, but because we think it's important that people be able to deploy in a robust way on Kubernetes. That starts with a bunch of small improvements and bug fixes: the Helm deploy is much more flexible, and the scheduling is much more stable.
B
We're building towards seamless, no-downtime user code deploys; some of that has already landed, and some of that is coming in 0.10.0, and I think Sandy will talk a little bit more about that. We've also separated out some of the code into separate packages. Previously, concerns were a little bit mushed together in dagster-celery between using Celery as a run queue, using Celery to launch user code in separate containers, and the Kubernetes-specific stuff; that has all been separated out.
B
We have also introduced support for Kubernetes-native cron, so if you're running on Kubernetes, you no longer need to use system cron. I want to talk about two user-facing API changes as well that we think are pretty exciting. The first of these is experimental: it's hooks. We've had a lot of demand for the ability to run arbitrary code on solid success or failure, and hooks are a way of doing that.
B
So here's an example of writing little hooks that, on failure or success of a solid, send a message to Slack. Hooks like this can be attached either to pipelines or to specific solid invocations inside of a pipeline. This is still an experimental API, and it's very much in flux; we'd really appreciate you playing around with it and letting us know how, or if, the API needs to change. I'm very excited to see what people do with this; it's going to be very powerful.
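For reference, here is a minimal sketch of that kind of hook. The hook, the solid, and the slack resource below are stand-ins, not the exact code from the slides:

```python
from dagster import ModeDefinition, failure_hook, pipeline, resource, solid

@resource
def slack(_):
    # Stand-in for a real Slack client resource (e.g. dagster_slack.slack_resource).
    class FakeSlack:
        def chat_postMessage(self, channel, text):
            print(f"[{channel}] {text}")
    return FakeSlack()

@failure_hook(required_resource_keys={"slack"})
def slack_on_failure(context):
    # Runs whenever a solid in the decorated pipeline fails.
    context.resources.slack.chat_postMessage(
        channel="#pipelines", text=f"Solid {context.solid.name} failed"
    )

@solid
def do_stuff(_):
    raise Exception("boom")

@slack_on_failure  # pipeline-level: fires for every solid in the pipeline
@pipeline(mode_defs=[ModeDefinition(resource_defs={"slack": slack})])
def my_pipeline():
    do_stuff()
```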
B
The second feature I want to talk about is the configured wrapper. This is a very flexible idea, but basically, if you take some configurable object, like a solid, you may have chunks of config that you really don't want to repeat every time that solid is invoked. You want to package the config up with the object; the same goes for resources and the other configurable objects in the Dagster API.
B
So
what
configured
lets
you
do
is
is
exactly
that
you
can
take
some
solid
and
some
chunk
of
config
package
them
together
and
then
that
basically
constitutes
a
new,
solid
definition
that
you
can
reuse,
but
that
already
has
the
config
fragment
packaged
with
it,
and
this
is
a
very
flexible
idea.
It
also
includes
the
possibility
to
have
custom
config
functions
which
take
in
some
config
specified
on
the
configured
object
and
then
transform
it
to
to
config.
That
is
valid
for
the
wrapped
object.
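A rough illustration of the idea, with a made-up resource and config fields (not from the talk):

```python
from dagster import configured, resource

@resource(config_schema={"region": str, "bucket": str})
def s3_client(context):
    # Pretend this returns a real S3 client for the configured region/bucket.
    return dict(context.resource_config)

# Bake the region in once; the new resource only asks for the bucket.
# The config function maps the outer config onto the wrapped schema.
@configured(s3_client, config_schema={"bucket": str})
def us_east_s3_client(outer_config):
    return {"region": "us-east-1", "bucket": outer_config["bucket"]}
```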
B
So
we
think
this
is
really
going
to
enable
a
lot
of
code,
reuse
and
reduce
the
possibility
for
errors
in,
like
repeated
config,
fragments,
again
very
excited
to
see
what
everyone
does
with
this
api
new
in
I
know,
or
rather
newly
committed
to,
we
want
to
kind
of
formally
express
a
deprecation
policy.
I
talked
about
hooks
as
being
experimental,
that's
part
of
what
we're
we're
doing
here,
but
we
are
formally
committing
to
you
guys
not
to
break
public
apis
in
minor
releases
by
public
apis.
B
We will mark APIs as deprecated before they're broken, and we have new code-level support for that: when you use a deprecated API, you should get a warning printed to the console. Please enable warnings in your test suites if they're not already enabled; that'll help you keep track of our plans to deprecate APIs. And we've introduced an experimental wrapper, so when you use an experimental API, you'll also get a warning.
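One generic way to surface those warnings in a pytest suite (standard pytest setup, not something specific from the talk):

```python
# conftest.py: make deprecation warnings visible during tests.
import warnings

def pytest_configure(config):
    # Show every DeprecationWarning instead of Python's default "once" policy.
    warnings.simplefilter("always", DeprecationWarning)
```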
B
I want to briefly touch on a couple of notable public API changes in 0.9.0. These are mostly just renamings, and I know that they induce a degree of thrash; thank you for putting up with it, but we think it's going to make the platform much easier for new users to start using and to understand. Some of these are just renamings, like config to config_schema. We've renamed Materialization to AssetMaterialization, in part to highlight that materializations are meant to play with the asset system.
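A before-and-after sketch of those two renames, with an illustrative solid (not from the slides):

```python
from dagster import AssetMaterialization, Output, solid

# 0.8.x style (for comparison):
#   @solid(config={"path": str})
#   def save(context):
#       yield Materialization(label="users_table")
#       yield Output(None)

# 0.9.0 style:
@solid(config_schema={"path": str})
def save(context):
    yield AssetMaterialization(
        asset_key="users_table",
        description=f"written to {context.solid_config['path']}",
    )
    yield Output(None)
```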
B
Some
of
them
are
sort
of
signposts
of
more
thorough,
going
conceptual
changes
that
are
coming
down
the
pipe,
for
instance,
the
system
storage,
which
is,
is,
I
think,
a
very
difficult
thing
to
understand,
and
a
difficult
thing
to
build.
A
custom
system
storage
is
becoming
an
intermediate
storage
and
that's
paving
the
way
to
changing
slightly
the
way
that
it
it
works
with
the
internals
so
that
it's
it's
easier
to
understand
and
easier
to
build
them.
B
And eventually, perhaps, it will incorporate some of what is currently done by the run launcher; calling it an executor is a little bit more true to that. And then there are some just terrible naming choices that we've changed, like input_hydration_config, which really is a barrier for someone trying to adopt the system. We're now calling that a loader, which makes a lot more sense. Thank you for putting up with these; they are some of the things you'll have to grep for if you are migrating to 0.9.0. So thank you.
B
If there are any questions about 0.9.0, I'm very happy to talk about it later, at the end of everyone's presentations. And now I'll turn it over to Sandy to talk about all the exciting stuff that we're planning to do over the next couple of months. Thank you so much.
D
Awesome. So my name is Sandy, I'm an engineer at Elementl, and I'm going to spend some time on what we're thinking about for the upcoming 0.10.0 release.
D
For those of you who aren't familiar with Dagster's release cycle, we do a major release like 0.10.0 roughly once every three months, so what I'm going to be talking about here is the set of features we're thinking about, roughly, for November. We're not going to get to every one of these items, so don't interpret this as a concrete plan. Instead, it's a set of areas that we think are important, and we're talking about them because we want to understand what you think is important, and also what we're missing.
D
So, how do we decide what to work on? The first thing we asked ourselves was: what are the parts of our system that give our users the biggest headaches? There are some difficulties that come up over and over again in Slack questions, GitHub issues, and direct conversations with users.
D
Some
of
these
panes
just
require
sanding
down
some
rough
edges,
with
the
docks
that
others
require.
Bigger
architectural
changes
to
less
mature
parts
of
the
system.
The
second
thing
we
asked
ourselves
was:
how
can
we
push
the
envelope
of
an
orchestration
system?
So
what
are
the
situations
where
the
dagster
way
of
looking
at
things
lets
us
offer
some
powerful
capability?
That's
maybe
even
beyond
what
our
users
would
have
expected.
So
these
are
opportunities,
really
double
down
on,
what's
unique
about
daxter
and
try
to
offer
something
really
exciting.
D
So
where
are
these
pans
and
opportunities
located
in
the
system?
We
think
of
dexter
in
roughly
three
layers,
the
top
layer
is
the
world
of
data
assets.
So
these
are
the
data,
warehouse
tables
or
machine
learning.
Models
that
we're
actually
building
our
pipelines
to
produce
the
middle
layer
is
the
world
of
execution.
D
So
that
is
solids
pipelines.
These
are
the
set
of
primitives
that
dagstr
actually
uses
for
running
stuff
and
holding
it
all
up
is
the
world
of
deployment
and
instance,
management,
one.
Second,
everything
is
required
to
keep
to
basically
set
up
a
production,
dashboard
installation
and
keep
it
going.
D
I
just
realized
that
my
face
is
hidden
there.
We
go.
Here's
me
so
for
each
of
these
layers,
I'm
going
to
give
an
overview
of
the
improvements
that
we're
thinking
about
and
then
maybe
dive
in
deeper
to
a
few
of
the
interesting
ones.
D
The second pain we hear about a lot is around pipelines writing intermediates to some sort of data warehouse or data lake. Many users have expressed confusion about the relationship between the asset and intermediate concepts that we have in Dagster. Both of these concepts represent data that's produced by Dagster pipelines, but they function in slightly different ways, so we think there's an opportunity to smooth out some of those wrinkles. And then there's our exciting-opportunities bucket.
D
The
big
thing
we're
thinking
about
is
what
we're
calling
version
based
memorization
kind
of
a
mouthful,
but
what
this
means
is
tracking
the
version
of
the
code
at
each
step
in
a
pipeline.
So
then
you
can
avoid
re-running
steps
whose
outputs
you
already
computed
in
a
previous
run,
the
same
capability
could
actually
also
make
backfills
easier
to
manage
so
talking
about
backfills
for
a
minute.
This
is
something
that
we
actually
added
to
dagster
recently
and
we
call
it
the
step
partition
matrix
for
any
partition
run.
D
You
can
find
this
view
and
dag
it,
and
we
have
for
the
rows
the
set
of
steps
for
that
pipeline
and
then
for
the
columns,
the
set
of
partitions
that
that
pipeline
has
been
run
over
and
then
you
can
click
on
any
particular
partition
and
understand
all
of
the
runs
that
affected
that
partition
in
the
past.
D
So
what
this
is
really
useful
for
is
basically
understanding.
Where
are
the
gaps?
Where
did
I
have
failures
that
made
it
so
that
certain
steps
were
unable
to
run
for
certain
prior
partitions?
So
I
can
come
in
later.
Look
at
all
the
runs
that
affected
those
particular
partitions
and
then
under
and
then
rerun
ones
that
are
likely
to
cause
problems.
D
Then you might make some changes to the compute function for the third solid. When you want to try out these changes, it's kind of a waste of time and resources to rerun the solids that haven't changed, but most engineers actually just end up re-running the entire pipeline, because it's difficult to track what's changed and what stayed the same.
D
Ideally,
we
would
only
auto
run
the
set
of
steps
that
are
actually
stale,
because
those
are
the
ones
that
we're
sort
of
interested
in
the
new
results
from
so
the
version
based
memorization
feature
allows
you
to
tag
any
solid
with
a
version.
The
idea
is
that
the
version
should
stay
the
same
as
long
as
the
solid's
compute
function
stays
the
same.
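A hypothetical sketch of that tagging. The feature was still being designed at the time of this talk, so the API shown here is illustrative:

```python
from dagster import solid

@solid(version="2")  # bump this whenever the compute function changes
def clean_data(_, raw):
    # If a previous run already produced this step's output under
    # version "2", a memoized re-execution could skip the step and
    # reuse the stored output instead of recomputing it.
    return [row for row in raw if row is not None]
```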
D
This
is
particularly
useful
if
you're
sort
of
in
like
a
tight
dev
loop,
where
you're
going
back
and
forth
between
making
changes
to
a
pipeline
and
then
rerunning
that
pipeline,
especially
if
you
know,
as
is
the
case
in
most
pipelines,
changes
that
you
make
in
one
step
have
impacts
on
downstream
changes,
and
you
end
up
having
to
make
changes
to
multiple
steps
to
make
sure
everything
works.
Instead
of
re-running
the
entire
pipeline
each
iteration,
you
can
simply
rerun
the
stuff
that
has
changed.
D
Another
place,
as
we
talked
about
the
versions
can
be
helpful,
is
managing
backfills.
So
often
we
run
backfills
when
we
made
a
code
change
and
there
were
a
set
of
prior
partitions
that
we
generated
with
an
old
version
of
our
code.
So
we
can
leverage
that
same
step,
partition,
matrix
view
that
we
showed
earlier
in
the
flesh
to
give
us
an
understanding
of
which
partitions
are
stale
after
a
code
change.
So
you
know
in
in
this
little
visualization.
D
All
of
the
yellow
steps
are
stale,
because
we've
made
a
code
change
to
the
solids
that
generate
those
outputs.
D
Dynamic orchestration refers to situations where, for example, someone wants to kick off a task for every file that's discovered in a directory. Right now, Dagster is limited because it requires users to specify a fixed set of tasks when they define a pipeline. The idea behind dynamic orchestration would be that you could actually determine how many parallel tasks to run based on information that's only available at execution time.
D
Crosstag
dependencies
are
when
two
pipelines
don't
make
sense
to
merge
into
a
single
pipeline,
but
the
latter
has
maybe
some
data
dependencies
on
the
former,
and
we
don't
want
to
begin
execution
of
the
ladder
pipeline
until
the
former
pipeline
is
completed.
D
In
the
exciting
execution
opportunities
bucket
one-
and
you
know
this
is
a
little
bit
more
speculative-
is
multi-container
orchestration.
The
idea
here
would
be
to
enable
users
to
assemble
pipelines
out
of
tasks
that
each
live
in
their
own
containers.
This
could
make
it
so
that
the
set
of
dependencies
for
different
tasks
within
a
pipeline
aren't
tied
to
each
other.
So.
D
It's a more speculative feature, and we have to think about what it would look like to build, but it could also be a very cool capability. The second, more speculative feature is what we call event-driven scheduling. This refers to launching pipeline runs, instead of at a fixed tick, in response to external events: maybe new data has landed in a storage bucket, and we want to kick off a pipeline to process that data.
D
So,
lastly,
for
the
deployment
layer,
we
don't
have
as
many
exciting
new
features,
but
it's
maybe
the
most
important
area
to
address
pain,
because
if
you
just
have
trouble
managing
their
attacks
or
instances
in
production,
there's
not
a
ton,
they
can't
accomplish
with
dexter
at
all.
The
main
things
we've
heard
in
terms
of
difficulties
in
deployment
are
difficulties
operating
the
scheduler
difficulties
deploying
on
kubernetes
and
difficulties.
Managing
large
numbers
of
pipeline
runs,
so
I'm
going
to
dive
a
tiny
bit
into
what
we've
observed
about
the
scheduler.
D
We
observed
a
couple
of
main
challenges
with
axiorys
current
scheduler,
so
first
of
all
is
the
fact
that
it
depends
on
cron
for
the
actual
scheduling.
This
creates
kind
of
a
split
brand
scenario
where
you
have
to
manage
both
the
dagster
process,
that's
managing
the
crown
process
and
and
then
cron
itself.
D
The
second
one
is
sort
of
a
availability
concern
where
node
failures
can
actually
make
the
scheduler
miss
ticks.
So
you
know,
while
the
scheduler
node
is
down,
it
won't
be
launching
any
tick.
So
when
it
comes
back
up,
it
doesn't
know
that
it
missed
those
ticks.
D
We
think
that
the
best
way
to
address
both
of
these
is
by
making
the
scheduler
a
first
class
component
of
dagster
itself,
so,
instead
of
sort
of
trying
to
outsource
it
to
kron,
we
want
to
build
the
scheduler
into
dagster,
because
scheduling
is
so
important
to
what
dagster
does
so.
This
will
both
make
operating
the
schedule.
E
D
A
Great. Thanks, Sandy, and thanks, Max, for putting that together. There's a ton of exciting stuff going on, and I think 0.10.0 is going to be pretty epic, actually, in terms of things to come. I highly encourage everyone to play with both hooks and configured; configured especially, I think, solves a lot of common issues that a lot of people mention, and it can pay enormous dividends. For example, with configured:
A
If
you
have
tons
of
repeated
blocks
of
config
in
your
config
files
that
never
change
between
runs.
I
highly
encourage
you
to
look
at
configured
to
kind
of
capture
that
in
code
and
it
kind
of,
makes
everything
simpler
and
then
I'm
personally
really
this
version
based
memoization.
This
version
based
memorization
stuff,
is
going
to
be
incredible,
so
both
for
dev
loops
and
for
backfills.
So
it's
great
stuff
next
on
top,
is
tamas.
A
So
tomas
tomas
is
a
staff
engineer
at
prezi,
they've
been
working
with
daxter
since
effectively
the
beginning
of
the
year
migrating
their
existing
entire
production
system
to
dagster
and
they've
been
unbelievable
partners
through
this,
and
if
any
of
you
are
using
the
kubernetes
infrastructure,
you
have
tomas
and
crew
to
thank
for
grinding
out
a
lot
of
the
rough
edges
before
you
had
to,
and
so
we
are
all
personally
indebted
to
tomas
and
crew.
A
So without further ado, I will hand it off to Tamas, to both show off the features of Prezi itself in this presentation and explain the data infrastructure behind it.
E
Yeah, hi. Thanks for the nice words. I'm super happy to be here and to show you what we did in the Dagster world, and what our journey to Dagster, and migrating to Dagster at Prezi, looked like. So let's start with how we started.
E
I
think
the
like
the
data
data
engineering
and
the
data
team
started
around
like
eight
years
ago
at
prezi
we
started
like
I
think,
like
most
of
the
companies,
where
we
have
a
bunch
of
shell
script
and
scheduled
with
chrome,
of
course,
at
some
point,
as
the
uti
jobs
started
to
grow,
we
basically
we
figured
out
that
it
won't
scale,
so
we
had
to
come
up
with
some
kind
of
solution
on
that
it
was
like.
Six
years
ago
we
were
looking
around
on
the
open
source
bird.
E
We call this Flowkeeper; that's what you can see on the screen. This is our homegrown orchestrator, and one of the main design decisions, and why we decided to create a new one instead of going with some existing one, is simplicity. One of the requirements from our users was basically that they did not want to write code to have a pipeline, and that's why we came up with a new orchestrator with a JSON-based config.
E
This is a pretty simple JSON descriptor, what you can see here. You can define the scheduling type (there are two types, daily and hourly schedules), you can define the inputs your job is using, you can give some kind of friendly name, and there is a path you can define, in this case an S3 path. You can also define what kind of data sets your job will generate. So in this case, this job takes some S3 location as input, and it will produce some kind of Redshift table. And you should know that these inputs and outputs are really what we use to build up the whole dependency graph in our orchestration.
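For illustration, here is a made-up descriptor in that spirit. The field names are hypothetical; the talk describes the general shape, not an exact schema:

```python
# Hypothetical Flowkeeper-style job descriptor, shown as a Python dict.
job_descriptor = {
    "name": "load_user_events",              # friendly name
    "schedule": "daily",                      # "daily" or "hourly"
    "type": "redshift_load",                  # one of the predefined job types
    "tier": 2,                                # lower tier = scheduled earlier
    "inputs": [
        {"s3_path": "s3://prezi-data/events/{date}/"},
    ],
    "outputs": [
        {"redshift_table": "analytics.user_events"},
    ],
}
```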
E
We
did
not
go
with
that
concept,
but
you
can
see
somewhere
else
or
like
a
like
other
orchestrator,
where
you
can
just
basically
define
your
job
names
and
that's
the
way
how
you
define
dependencies
between
the
two
jobs.
Here
we
basically
went
in
the
past
that
that
you
only
have
to
know
what
kind
of
data
sets
you
want
to
work
with
and
based
on
that,
we
will
figure
out
the
dependencies
and
which
job
it
needs
to
be
connected.
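That matching, connecting each job's inputs to whichever job produces the same location, could be sketched like this (simplified, hypothetical code, not Prezi's actual implementation):

```python
def infer_dependencies(descriptors):
    """Connect jobs whose input locations match another job's outputs."""
    # Map every produced location (S3 path or table name) to its producer.
    producers = {}
    for job in descriptors:
        for output in job["outputs"]:
            for location in output.values():
                producers[location] = job["name"]

    # A job depends on the producer of each of its input locations.
    deps = {job["name"]: set() for job in descriptors}
    for job in descriptors:
        for inp in job["inputs"]:
            for location in inp.values():
                if location in producers:
                    deps[job["name"]].add(producers[location])
    return deps
```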
E
So,
basically,
if
you
define,
if
you
said
that
your
input
is
this
s3
location,
what
you
can
see
here-
and
we
saw
that
other
jobs
generated
the
same
as
your
location.
We
connected
the
two
jobs.
Basically,
this
this
this
was
or
how
we
set
up
the
dependencies
between
these
jobs.
I
think
it's
pretty
simple
and
it's
it's.
It's
worked
for
us
because
usually
the
user
knows
what
they
are
working
on
a
bit.
E
And we defined a couple of predefined job types that you could use; the one in this example is a Redshift load.
E
Basically, you specify an input, and we load the input data into Redshift with the parameters you can see down there. We have a few job types, like Redshift load and Redshift transform, which basically runs a SQL script, and we have Spark jobs, Python jobs, and a few others. We also defined tiers: every data set is put in some kind of tier, which is basically its priority. What does that mean?
E
You can imagine that you have a bunch of data sets, especially if you have hundreds of data sets and hundreds of ETL jobs. Then it can happen that you have two jobs that could run at the same time, but the resource you want to run them on can't handle running the two in parallel. In that case, you have to make sure the more important data set will be ready earlier, and this is what tiers mean here: the lower the tier, the earlier the job gets scheduled, if possible. Another thing I failed to mention is the job type (in this case a Redshift load); the job type also defines the resource we are going to use, so in this case Redshift. Even in our homegrown scheduler we had these resource queues, where we basically made sure that you can't overload the resources the jobs are using. You can imagine: if you have hundreds of heavy jobs that can run in parallel, and you ran them all on Redshift, you would most probably kill it. So this was the state of things; this was our own scheduler.
E
So things were looking good, and it seemed like users really liked it, and we ended up with a dependency graph like this. We had around 900 jobs, and if you have 900 jobs, then you will face a few issues. That's why we were really thinking about whether we wanted to fix those in our current homegrown orchestrator, or look for some open source alternatives. Here's why we decided against improving our homegrown orchestrator.
E
Another thing is that this orchestrator was running on one EC2 machine, and if it dies, then we are in trouble: we would have to start a new machine and set up everything there. There is also the problem that, because we run all of our jobs on one machine, two jobs can interfere with each other.
E
You
can
imagine
if
one
job,
basically
that
it's
too
high
cpu
load
or
just
eat
up
the
disk
space,
or
even
first,
when
when
basically,
you
have
some
vp
users
and
they
just
start
expecting
that
they
can
write
to
a
temporary
folder
and
one
job
without
defining
the
as
a
dependency
between
each
other
one
just
put
down
some
files
there
and
the
other
one
expects
to
pick
it
up
and,
of
course,
the
infrastructure
that
crazy
is
moving
to
kubernetes
so
or
data
usage
structure
needed
to
move
as
well
to
kubernetes.
E
Also, if our users wanted to test their jobs, mostly they had to log into one machine, copy their file over, and try it out from that specific machine, and we wanted to provide a way better user experience. That was the time when we talked with the Dagster team, and they convinced us to try out their tool and see how it works for us, and that's when we decided: okay, let's try to migrate to this new system.
E
But
of
course,
if
you
want
to
migrate
to
a
new
system,
you
don't
want
to
write
all
of
your
epi
jobs
from
scratch.
So
what
what
was
our
first
requirement
when
we
try
to
move
to
dexter
is
basically
to
being
able
to
keep
or
descriptors
and
migrating
and
using
it
for
generating
solids
in
dexter.
So,
basically,
what
we
wanted,
we
had
a
car
and
we
wanted
to
replace
the
engine
a
way
better
engine
and
very
more
reliable
engine,
and
this
is
what
we
did
so
keeping
our
job
descriptors.
E
First
of
all,
we
use
the
job
descriptor
and
started
to
generate
solids
from
it.
How
this
looks
like
first
of
all,
we
generated
a
solid
config,
which
I
now
saw
that
it
should
be
config
schema
which
basically,
if
you
treat
solid
as
a
function
which
has
parameters
then
config
config
are
the
parameters
and
its
types
and,
as
you
can
see
here,
we
had
the
original
json
descriptor,
but
you
can
see
down
there.
It's
a
redshift
transform
and
we
generated
a
nice
schema.
A
config
schema
for
that.
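A simplified sketch of that generation step. The factory and schema fields here are assumptions; the talk shows Dagit screenshots rather than code:

```python
from dagster import Field, solid

def solid_from_descriptor(descriptor):
    """Generate a dedicated solid for one Flowkeeper job descriptor."""
    @solid(
        name=descriptor["name"],
        config_schema={
            # Each predefined job type gets its own parameters; a
            # Redshift transform mainly needs the SQL file to run.
            "sql_file": Field(str, description="SQL to run on Redshift"),
            "tier": Field(int, default_value=3),
        },
    )
    def _solid(context):
        context.log.info(f"Running {descriptor['type']} job")
        # ... check inputs, then execute the actual job ...
    return _solid
```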
E
What you can see on the right side is a screenshot from Dagit. For every descriptor, we generate one specific solid, so it's rigid: here you can't change "Redshift transform" to any other type, because the inputs, and even the processing, wouldn't make sense. And there are all the parameters which can be used in the Redshift transform.
E
So in this case, that's the SQL file parameter, which basically says which SQL file needs to be run on Redshift when you're running this job.
E
Of
course
now
so
now
you
have
a
function
and
you
have
all
the
parameters,
so
you
have
a
solid
and
or
and
and
the
config
schemas,
but
you
need
all
the
parameters
or
the
values
that
you
want
to
pass.
That,
and
this
is
this,
these
are
the
presets
we
also
generating
the
preset
email
and
from
or
json
descriptor
like.
If
you
check
here
on
the
right
side,
this
is
this
one
is
generated,
one.
E
The
left
side
is
basically
one
which
is
in
or
json
and,
as
you
can
see
there,
we
generated
a
nice
preset
where
we
say
that,
where
we
pre-fill
all
the
values,
what's
what
are
what
are
in
the
json
descriptor
and
later
on?
If
you
want,
of
course,
on
the
playground,
you
can
you
can
change
it
if
you
want
to
run
some
test
run,
but
but
basically
you
don't
have
to
do
anything,
we
do
it.
We
pre-fill
it
for
you
like.
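In Dagster terms, such a generated preset might look roughly like this (hypothetical values; the real ones come from each JSON descriptor):

```python
from dagster import PresetDefinition

# One preset per descriptor, pre-filling the solid's config values.
generated_preset = PresetDefinition(
    name="from_descriptor",
    run_config={
        "solids": {
            "load_user_events": {
                "config": {"sql_file": "sql/user_events.sql", "tier": 2}
            }
        }
    },
    mode="default",
)
```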
E
We have the preset; then we have the solid body. The solid body is predefined by us: it gets all the properties from the solid presets, and based on that, we decide what kind of job type we need to run. So if it's a Redshift transform, then we will run a Redshift transform, and we do some other steps as well. In the solid body, basically, what we do is check the inputs and do the actual job execution.
E
Now you have a nice solid with the configs, the presets, and the body, but you have to define the dependencies between the solids, and what kinds of inputs and outputs there are. Here, as well, we are using the JSON descriptor, and as you can see, we are generating a typed input. In this case, because it's a Redshift table, we generate a Redshift Flowkeeper path type, as it's called in this example.
E
It's
called
in
this
example,
and
and
that's
where
we
are
generating
for
the
second
input
and
also
we
generate
the
output
and
when
we
are
generating
the
dependency.
Basically,
what
we
do
we
do.
The
same
depends
and
dependency
setup.
What
we
what
I
mentioned
earlier,
basically
based
on
the
inputs
and
outputs
output
paths
and
table
names,
we
look
up.
We
job
generate
that
and
we
do
the
connection
between
the
solids.
Based
on
that-
and
here
you
go,
there
is
a
nice
small
pipeline
defined.
E
And
last
but
not
least,
we
also
add
some
solid
metadata
which
not
needed
for
the
solid
itself,
but
it's
more
like
like
dexter
as
the
orchestrator,
and
also
because
we
want
to
add
some
nice
tagging
onto
these
solids.
So
just
a
few
examples
here
when
we
set
the
max
returns.
Basically,
this.
This
is
what
what
which
says
that,
how
many
times
we
want
to
retry
a
failing
job
before,
failing,
actually
and
and
stopping
retrying,
and
also
we
set
the
tier
here
and
based
on
the
this
tier.
E
Based on this tier, we also set the Dagster priority for the orchestrator, and we also set the Dagster Celery queue based on the job type, which I mentioned before, for resource-based scheduling.
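For example, tags along these lines. The dagster/priority and dagster-celery/queue tag keys are real tags consumed by the Celery executor; the retry tag name and the values are illustrative:

```python
from dagster import solid

@solid(
    tags={
        "flowkeeper/max_retries": "3",       # hypothetical custom retry tag
        "dagster/priority": "2",             # consumed by the Celery executor
        "dagster-celery/queue": "redshift",  # route to the Redshift queue
    }
)
def redshift_transform(context):
    context.log.info("running transform")
```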
E
That was one of the requirements we wanted to achieve from the beginning, because what we saw with other orchestrators is that they start to have problems if you have hundreds of jobs or solids in one pipeline, and that's why you basically have to split your pipeline into multiple pipelines and do the connections between those pipelines.
E
What
we
did
we
got
all
of
the
descriptors
and
loaded
into
dexter,
and
let
me
show
you
how
this
looks
like
the
whole
pipeline
put
it
into
texture.
E
There is this nice selector where you can just select a subset of the pipeline, which can be super useful, especially if you're trying to understand your pipeline, or if you want to change some job and you're interested in what other jobs could be affected by that change, or even if you're doing some kind of debugging where you're interested in, if this job failed, what else can be affected.
E
So
I
think
it's
a
pretty
cool
thing,
so
now
we
have
all
of
the
jobs
and
and
we
we
can
load
into
dexter
all
of
these
and
we
can
generate
from
our
jobs,
solids
and
all
of
the
services
can
be
loaded
into
dexter.
E
But
another
thing
what
we
wanted
to
achieve,
like
the
similar
user
experience,
what
we
have
currently
or
even
better
here
as
well-
and
here
is
the
workflow-
what
we
come
up,
how
you,
how
you
develop
your
ether
on
your
with
a
job?
E
So
basically
the
workflow
is
the
following:
you
as
a
user.
You
start
working
on
your
new
shiny
ethereal
job.
You
start
a
local
development
environment.
Local
development
environment
is
basically
doc
dexter
running
in
docker
and
locally
so
and
there
you
can
start
working
on
your
job
testing
and
even
you
can
go
to
access
services
with
your
own
credential.
When
you
are
happy
with
your
jobs
with
your
job,
you
have
to
create
a
pull
request
in
github,
and
then
somebody
reviews
that
and
in
the
meantime,
as
well
jenkins,
runs
a
check
on
this
job.
E
What
we
do,
what
we
are
actually
checking
it's
another.
I
think
pretty
nice
feature
indexer,
that
you
can
introduce
modes
as
well,
you
you
can
create
multiple
modes
and
we
introduce
this
test
mode
where
which
actually
not
touching
any
of
the
resources,
but
what
it
does.
E
It
just
runs
the
whole
pipeline
and
basically
checks
if
there
are
circular
dependencies,
if
there
is
any
config
issues
and
and
if
we
are
able
to
run
the
whole
pipeline
without
running
on
actual
resources,
which
is
cool,
if
that
passed,
then
you
can
deploy,
we
are
using
the
kubernetes
executor
with
select
as
a
salary
executor.
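A rough sketch of that kind of test mode. The resource and mode names here are made up; the idea is just swapping the real resources for no-op stand-ins so the whole pipeline can execute:

```python
from dagster import ModeDefinition, pipeline, resource, solid

@resource
def redshift(_):
    raise NotImplementedError("real Redshift client goes here")

@resource
def fake_redshift(_):
    # No-op stand-in: lets the whole pipeline execute without touching
    # the real cluster, so config and wiring still get checked.
    class Fake:
        def run_sql(self, sql):
            pass
    return Fake()

@solid(required_resource_keys={"redshift"})
def transform(context):
    context.resources.redshift.run_sql("select 1")

@pipeline(
    mode_defs=[
        ModeDefinition("prod", resource_defs={"redshift": redshift}),
        ModeDefinition("test", resource_defs={"redshift": fake_redshift}),
    ]
)
def etl_pipeline():
    transform()
```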
E
So what happens in this case: in the end, what you do is basically just commit a JSON file into a repo. Based on that, we do a test run, and if everything is fine, then we create a Docker image from all of these descriptors and deploy it to Dagster as user code, separately.
E
In
the
salary
queue
in
various
resource
resource
queues,
like
we
defined
a
separate
queue
for
regis
for
presto
and
hadoop
and
python
in
hadoop
that
one
where
like
spark
and
jobs,
are
running
and
basically
in
this
way,
we
can
make
sure
that
these
cues,
when
when
a
job,
is
executing
from
the
red
shift
keys,
then
we
can
make
sure
that
only
like
five
parallel
jobs
is
running
and
it
can
happen
can't
happen
that
we
were
overrunning
a
ratchet
cluster
and
no
one
else
can
be
basically
querying
it,
which
is
not
a
good
thing
if
it
would
happen
and
another
benefit
running
on
kubernetes.
E
All
of
these
jobs
are
running
in
a
separate
pod,
which
is
nice,
because
jobs
can
interfere
with
each
other
if
it's
using
to
manage
memory
cpu,
whatever
the
pod
got
killed,
but
the
other
jobs
can
run,
which
is
cool
and
also
another
cool
thing
in
the
salary
executor
that
all
this
prioritization
is
there.
So
it's
even
three
dollar
priority
settings
incident
treat
priority
settings
and
it's
super
nice,
but-
and
we
also
got
a
few
nice
additional
values
using
dexter.
One.
Is
this
nice
data,
lineage
visualization?
E
That's what I showed you before. The other one is pipeline performance monitoring, which is pretty nice, because most of the time you want to figure out whether your pipeline is running slower than expected, and then you're interested in why, and in which solid your job runs longer than before: maybe somebody committed a change there which caused this, or maybe there is some issue with your Hadoop cluster, or whatever.
E
I
think
it's
a
pretty
nice
ui,
where
you
can
basically
see
the
logs
immediately
and
you
have
this
nice
filter
as
well
twittering
down
what
you
are,
and
here
the
the
solid
selectors,
where
you
can
only
see
that
portion
of
the
pipeline.
What
you
are
really
interested
in.
E
And
the
testing
capability,
it's
super
nice.
Actually,
that's
what
I
showed
you
in
the
in
the
github
example
or
the
junkies
example.
So
it's
super
cool
and
and
we
we
can
make
sure
that
we
are
letting
way
less
gear
begin
with
this
running
the
whole
pipeline
in
a
test
run
and,
of
course,
it's
nice
type
and
config
checking
which
comes
automatically.
E
So
this
is
where
we
are
and
actually
but
we
still,
we
are
still
working.
So
this
migration
is
still
in
progress.
So
basically
we
are
now
at
10
percentage.
So
we
migrated
that
10
percent
of
all
of
our
jobs.
We
are
slowly
migrating.
We
are
basically
marketing
a
few
jobs
testing
if
it
works
fine
and
then
going
back
and
and
trying
to
migrate,
more
jobs.
E
This
type
of
to
migrate
their
own
jobs,
improve
backfield
capabilities.
I
was
super
happy
seeing
that
there
will
be
a
bunch
of
improvements
around
that
we,
we
really
would
like
to
see
that
and
and
yeah.
This
is
something
what
we
as
we
are
working
on,
to
improve
that
and
introduce
other
quality
checks.
E
So
currently,
as
I
told
you,
quality
checks
is
basically
if
there
is
a
file
or
not
or
that
if
there
is
a
table
or
not
or
if
there
is
at
least
one
row
in
the
table
or
not,
but
we
can.
We
would
like
to
introduce
more
sophisticated
quality
checks
as
well
later
on
and
last
but
not
least,
thank
you
dexter
team.
I
think
it's
super
nice
and
I
I
we're
really
happy
with
the
cooperation
and
all
of
these
things
what
you
implemented.
A
Yeah, thanks so much, Tamas; that was excellent. You guys have done a ton of interesting work, and thank you likewise for being great partners, and also for setting a new standard in presentation production values: that was an unreal journey through time and space, so we're going to have to up our game, I think. But yeah, I just wanted to open it up.
A
You
know
we
have
about
five
minutes
left
and
wanted
to
open
up
for
any
questions
feel
free
to
either
put
it
in
the
chat
or
to
speak
up.
There's
not
too
many
people
here.
So
I
think
we
can
manage
it,
and
you
know
questions
about.
Oh,
and
I
know
stafford
tenno
plans
are
anything
that
tomas
has
worked.
A
Okay, looks like everyone's a bit shy. We'll wait a few more seconds.
A
Okay, cool. Well, everyone, feel free to follow up; obviously we're in Slack all day, every day. Oh, here we go: "Regarding the version memoization, will this include changes in run config?" Hugo, you may want to speak up; that's a fairly generic question, and I'm not sure exactly what type of run config changes you mean.
C
Yep, so the question essentially is: you illustrated how, using that feature, you'll be able to rerun pipelines and only re-run solids that have had code changes. My question is: would that functionality also apply if, say, you changed configurations for solids down the pipeline? Would you be able to use that to only rerun those solids which have had configuration changes? Does that make sense?
A
Yeah, it makes total sense. Sandy, do you want to weigh in on that?
D
Hey, I'm sorry, do you mind saying that one more time?
C
Sure, yeah. The question is: if you're making code changes, you'll be able to rerun the pipeline and only rerun solids that have had code changes. Would that apply to making changes in the run config of solids down the pipeline?
D
Got it. Yeah, that's a great question, and the answer is yes. The version for a particular step would be based on the version of the solid definition itself, and it would also be based on all of the run config that affects that step: the configuration for that particular solid, and then the configuration for any resources that that solid depends on.
A
Cool, thank you, Hugo. And with that, thank you all for coming, and thank you to all the presenters. Tamas, you put an incredible amount of effort into that, so thank you very much. And, you know, Sandy and Max, we're going to have to up our slide deck game, right? But both of you did a great job as well. So thanks to the whole team, and we're looking forward to producing all this stuff for you in the next few months.
A
I think there's a lot of exciting stuff coming, so thank you all, and we will be posting this online as well.