Description
This is a kickoff meeting for the Dagster community announcing Dagster 0.8.0. This is the biggest release for the project since inception.
So, first order of business: I just want to thank everyone here for being early users. Our purpose and mission is to serve you all and ensure that you're successful, but we also couldn't do our jobs without your feedback and without your willingness to take a bet on a system that's still in development. We're deeply appreciative of how positive your engagement has been, how patient you've been with us, and how good your feedback has been. So thank you very much for that.
A
So
what
are
we
going
to
talk
about
today?
Kind
of
the
bulk
of
the
meeting
will
be
updates
on
the
OEO
release
that
we've
codenamed
in
the
zone,
and
it
really
is
the
biggest
and
most
significant
release
since
project
inception.
Both
in
terms
of
the
features
exposed
to
you
as
well
as
kind
of
the
core
architectural
changes
that
we
think
will
improve
stability
as
well
as
open
up
our
design
space
for
the
future.
We also want to get a sense of the community. Hopefully you can start talking to each other, because a lot of our success will depend on developing an active community where people can help each other out, support each other, and build tools for each other. And then we want to talk about community growth: we're about to enter a new phase of the project where we'll be more public. That's in the interest of all of us, and it also requires help from the community, so I want to talk about that a bit.
So let me summarize the big topic areas we're going to cover. What are our core architecture changes? As you'll see, they're pretty significant, even though they didn't require breaking changes, and we're really excited about the direction. We completely revamped Dagit, and we think it's a really positive change. We also added a totally new feature which is still in its infancy, but which I think indicates, in a lot of ways, the direction we're going to go. And I'm going to describe a grab bag of things that we think are relevant to everyone.
Likewise, we want to continue to improve our operational stability, and we think a lot of these changes do that. Organically, we've had a few different archetypes of users and teams approach us, and one that we're very excited about is teams that want Dagster to enable the other teams they serve to interact better with each other.
Very typically, it's a data platform team that has data science constituents, analyst constituents, and data engineering constituents, and we think the changes we've made will really enable that multi-team platform. We also eliminated some duplicative concepts — that kind of accretion happens over time — and we think this opens up our design space significantly for future development. So what did we actually do to execute on those motivations?
A
One
is
host
user
process
separation,
so
this
specifically
solves
the
problem
daggit
being
in
process
with
user
code,
and
this
is
a
major
architectural
change.
We
enable
multiple
repositories,
so
this
will
be
the
natural
seam
by
which
you
know
a
data
platform
team
can
serve
a
couple
different
teams
who
have
kind
of
their
own
namespace
of
pipelines.
We consolidated the notions of "start" and "launch," which were confusing both to our users and, frankly, to the Elementl team itself — so that's very useful. And we've also started to serialize our pipelines and execution plans as metadata, which has some interesting applications.
Okay, so let's dig into the actual features here. Prior to 0.8.0, the architecture was like this: you had a user-defined repository, which was loaded in-process by Dagit — or any "host tool," in our terminology — and this had a number of problems.
One is that user dependencies were intermixed with system dependencies. In particular, we have a fairly heavyweight GraphQL stack in Dagit, and that brings in a ton of dependencies, which sometimes conflicted with user dependencies. Those user dependencies were also intermixed with each other — so even a data science team, which has a wholly different stack than your data engineering team, was totally intermixed.
The big problem there, as I mentioned before, is that we had to restart Dagit whenever user code changed, and, as I just mentioned, you could not separate different user environments from each other. And in general, hosting user code in a process that should stay up forever is kind of risky.
A user crash could bring down Dagit. The separation also gives us the opportunity to provide fixed images for tools such as Dagit that can be used off the shelf without modification — so there's lots of exciting stuff enabled here. So what did we end up doing?
A
Well,
you
still
write
your
code
and
you
have
to
load
it
into
a
user
process,
but
instead
of
it
instead
of
Daggett
loading
it
into
process,
we
communicate
over
an
API.
Currently,
this
is
kind
of
just
a
shell,
a
command-line
tool.
We
will
quickly
be
moving
this
to
G
RPC.
Actually,
so
this
API
is
used
to
both
query
metadata,
so
Daggett
will
query
the
user
process
for
the
shape
of
pipelines
and
other
artifacts
in
the
system.
The API is also used to instigate computation, meaning both executing pipelines and executing subsets of execution plans. Oh — this section reminds me: for those who are just joining, or those listening on YouTube, I should have stated this at the beginning. This presentation is targeted towards people who have quite a bit of familiarity with the system.
The other thing this enables is a multi-repository world. Now that we have an API, we can query different repositories that live in two different environments, and this is really exciting because, for example, these can be totally separate Python environments — in fact, different Python versions. So imagine you have one team that's stuck on Python 2.7, but you want to migrate the rest of the system. Well, this allows you to do that.
A
So
in
this
example,
you
have
on
the
left
Dexter
using
the
flat
modern
design,
which
indicates
Python
3
and
this
other
environment.
Up
top,
let's
say
it's
a
data
science
team
or
something
that's
using
the
legacy:
Python
2
7,
environment
and
that
works
fine
yeah,
and
then
you
can
docker
eyes
it.
Alright — we don't support that yet, but expect that support, at least in early form, within the next week or two, where we'll effectively communicate with a Docker container over this gRPC interface. And just to tease this: it opens up the design space to actually support additional languages in the future that can implement the full Dagster spec. So this is a really exciting direction for us.
You can also have multiple repositories within the same — what we call — repository location. Lots of folks do this, where they want to standardize their environment but still have multiple teams for logical organization. So we support both of those operating modalities.
Okay, so to support this new concept, we added workspaces. A workspace is both a file format and an abstraction, and it defines the collection of repositories and repository locations that a host tool interacts with. Users express their workspace using this new YAML-based format, which replaces repository.yaml — we will support backwards compatibility for a while, with the timeline for removal TBD. So what does this new format look like?
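For reference, a minimal workspace.yaml along these lines might look like the following (a sketch from memory of the 0.8.0 docs — treat the exact key names as illustrative, and the file names as placeholders):

```yaml
# workspace.yaml — defines the repository locations a host tool loads
load_from:
  # load a repository from a Python file (the repository function
  # can be auto-discovered, so no function name is required)
  - python_file: repo_one.py
  # or load from an installed Python module
  - python_module: my_team_repos
```

Each entry becomes a repository location, which is why separate entries can live in separate Python environments.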
The simple form is very familiar to the old repository.yaml — the only real differences are the key names. It just says: load the repository from this Python file. And, as you'll see, we also added auto-discovery of repositories in a Python module, which means you no longer have to specify the name of the function where the repository lives — something that certainly drove me crazy.
And now you'll see here — this is a preview; we'll go through Dagit more carefully in a bit — there's this new switcher, so you can switch between your different repositories. It actually shows you that they're coming from different Python environments. So if we instigated execution from a pipeline here, it would actually execute within Python 2.7, and over here execution happens within Python 3.6 or 3.7, whatever is configured. And so, yeah, this works.
Okay, this next one is in the weeds, and the technical details don't matter that much, but we now preserve pipeline structures historically. What that means is that previously, if you had a pipeline — say, named foo — and you executed it and then changed its shape, often you couldn't view it historically: if you clicked on it from the runs view, it would show the current pipeline shape, which was actually pretty misleading.
So now we actually persist a metadata format for the pipeline structure, and this is both content-addressable and normalized for efficient storage. That means — don't worry — we're not going to over-persist anything: if you run the same pipeline shape a thousand times, its serialized representation is only stored once. But I think the big takeaway is the direction we're going: we want your instance DB to be an immutable log of everything.
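The content-addressing idea can be sketched in a few lines of plain Python — this illustrates the storage technique, not Dagster's actual implementation: serialize the pipeline's shape canonically, hash it, and key the stored snapshot by that hash, so identical shapes are stored once no matter how many runs reference them.

```python
import hashlib
import json

class SnapshotStore:
    """Content-addressable store: identical snapshots are persisted once."""

    def __init__(self):
        self._snapshots = {}  # snapshot_id -> serialized snapshot
        self._run_index = {}  # run_id -> snapshot_id

    def record_run(self, run_id, pipeline_shape):
        # Canonical serialization: sorted keys so equal shapes hash equally.
        blob = json.dumps(pipeline_shape, sort_keys=True)
        snapshot_id = hashlib.sha1(blob.encode()).hexdigest()
        self._snapshots.setdefault(snapshot_id, blob)  # dedupe on content
        self._run_index[run_id] = snapshot_id
        return snapshot_id

    def shape_for_run(self, run_id):
        # Historical view: the shape as it was when that run happened.
        return json.loads(self._snapshots[self._run_index[run_id]])
```

Running the same shape a thousand times stores one blob; changing the shape stores a second one, and old runs still resolve to the old shape.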
Alright — as I was saying: pipeline structures are persisted historically, content-addressed and normalized for efficient storage. But the big takeaway is that the instance DB is an immutable log of everything that's happened in your data application. That's kind of our philosophy as we mature the system.
What we really want to be able to do is this: any time a computation is instigated, you can go back and really understand what happened and why it happened. This is a layer on which we'll build more advanced lineage tools, compliance tools, and so on — a great fundamental base to build on.
Next: the Dagit revamp. The big philosophical change with Dagit is that we're much more pipeline-centric now, rather than functional-area-centric, and we think this is a much better way to structure the system.
We'll navigate to a pipeline, and here you can click through to this overview page, which I think is incredibly useful. You can click on a pipeline and almost instantaneously get context on its shape — and also, if you were a good engineer and actually provided a rich description of the thing (which I have not), you'd see that here too. You can see: oh, this thing is running on a schedule, it last ran on June 13th, it has all these assets — which we'll get into.
We think this is a really exciting direction, where you can quickly navigate to a pipeline and get immediate context on everything that's going on. We also have a much improved run page — performance has been dramatically improved, which is great. And a lot of you have really been using tags to great effect, so they've become more important to the system.
Alright, let's talk about the asset management piece. You'll notice there's a new thing you can do: when you yield a materialization from a solid, there's a new concept called an asset key. Previously we had this thing called a label, which we still support, but that was pure metadata displayed in your event log. An asset key has much more precise semantics: we actually index asset keys, and that lets you build up a collection of assets.
You could create asset keys if you had a scheme for identifying, say, the emails that you send. So these can be more traditional data assets, like a table or a partition somewhere, but you can also use them for a whole number of purposes — we try not to tailor these things too tightly to one use case, so that the community can innovate on top of them.
I fully expect there will be use cases you all come up with that we don't anticipate, because the universe of data applications is so heterogeneous — there are novel things out there. The moment you start indexing these assets, you get access to this asset manager. You'll notice the asset keys are surfaced in a typeahead, and you can navigate to one of them.
You can see the last run that touched it, the last time a materialization happened, and its properties — and you can graph the properties you annotate it with over time. Obvious uses are tracking the number of database rows over time, or the number of bytes stored. But again, this is really flexible, and we anticipate a lot of uses.
This really gives us the opportunity to build a novel system of record for metadata, and we believe the novelty is the linkage between computation, assets, and the programming model. The moment the leaf developer yields a materialization with an asset key, it automatically gets indexed in the system. And this is just the beginning of this direction.
This one doesn't have any graphs, but we can see the last time it was materialized. Another interesting thing is that asset keys are actually hierarchical. You can see a key here — something like cost dashboards, dot, traffic dashboard — so I can navigate down, and this ends up being a dynamic folder structure. You can imagine an asset key scheme for your S3 data that mimics how you lay out your partitions, while still letting you navigate directly to the asset key in general.
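A sketch of how hierarchical keys produce that dynamic folder structure — my own illustration, not Dagster internals: treat each dotted key as a path of components, then group indexed keys by prefix for browsing, while keeping direct substring search.

```python
class AssetIndex:
    """Groups dotted asset keys into a browsable folder structure."""

    def __init__(self):
        self._keys = set()

    def index(self, asset_key):
        # e.g. "dashboards.traffic" -> ("dashboards", "traffic")
        self._keys.add(tuple(asset_key.split(".")))

    def browse(self, *prefix):
        # List the immediate children under a prefix, like folders.
        depth = len(prefix)
        return sorted(
            {key[depth] for key in self._keys
             if key[:depth] == prefix and len(key) > depth}
        )

    def search(self, fragment):
        # Direct navigation: find any full key containing the fragment.
        return sorted(".".join(k) for k in self._keys if fragment in ".".join(k))
```

So a partition scheme like `warehouse.events.daily` browses like nested folders but is still reachable by typing any fragment of the key.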
We think this is going to be super useful. The idea is: imagine you have some file in S3 and you have no idea where it came from, when it was last updated, what computations touched it, and so on. You can navigate to this asset manager, type in part of the S3 key, and figure out what's going on. We think this is going to be an incredibly powerful tool, and it's fairly obvious how you can extend it — to data quality, to lineage, to more sophisticated metadata properties, et cetera.
Alright, some additional things we've added — I won't belabor them too much. Prior to 0.8.0, the scheduler was one of the rougher parts of the system, and for those of you who have worked with us on that, thank you so much for bearing with us. We've effectively built better tools for debugging and reliability, so I hope that helps.
What I mean by that is: we've always had these re-execution features, but they were actually quite confusing, buried, and had a lot of problems. So let's go to the playground here. This is a very simple pipeline that has four solids, and we've configured it to actually persist the intermediates — and now we can launch this computation.
We don't really express the lineage visually, but it's formally modeled in the system, and we group runs to show that we've been retrying this stuff. You can say, for example: actually, I want to start from this solid and everything after it. So you can see we've selected those two things, we've actually filtered the event log by that, and now we can re-execute only those two steps.
A
So
we
think
this
is
going
to
be
super
useful
both
for
local
development,
as
well
as
for
operational
standpoint.
I
don't
have
time
to
do
an
advanced
EMA,
but
you
can
actually
imagine
a
world
where
there's
a
long-running
computation
and
if
Forks
and
then
one
of
the
forks
fails-
and
you
know
you
have
to
debug
something
fix
something.
Yuri
instigate
computation
of
that
fork
thing
that
failed.
Now
you
have
to
running
computations
that
are
all
part
of
what
we
call
the
same
run
group.
A
If
you
go
to
the
runs
page,
you
can
see
the
lineage
information
here.
So
if
you
wanted
to
see
all
the
runs
related
to
some
root
idea,
you
could
actually
click
on
it
by
the
way
guys,
we
should
probably
add
the
root
remedy
to
the
actual,
the
okay.
So
we
think
that's
a
really
fun
feature
again,
that's
kind
of
like
just
getting
started.
So
please
give
us
feedback
on
that.
Okay, so we have improved PySpark support. I'm not going to dig into it, because not everyone uses PySpark, but if you do use PySpark, your world has gotten a hell of a lot better. You can express your PySpark computations more abstractly — you just express your business logic — and we currently support EMR, which we wrote, and Databricks, which is a community contribution.
To the contributor, if you're on the line: thank you very much for that extraordinary work. You can shift the computation between your local machine, EMR, and Databricks with no business-logic changes whatsoever, which is pretty much a game changer.
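That "shift computation without touching business logic" idea is essentially a strategy pattern: the business logic is a plain function, and the execution target is a pluggable launcher chosen by configuration. A minimal sketch with hypothetical names — not the dagster-pyspark API:

```python
def word_lengths(words):
    # Pure business logic: knows nothing about where it runs.
    return [len(w) for w in words]

class LocalLauncher:
    def run(self, fn, *args):
        return fn(*args)

class EmrLauncher:
    """Stand-in: a real launcher would ship fn to a cluster step."""
    def run(self, fn, *args):
        # ...submit to the cluster; here we just simulate it locally.
        return fn(*args)

LAUNCHERS = {"local": LocalLauncher(), "emr": EmrLauncher()}

def execute(target, fn, *args):
    # Only the `target` configuration changes between environments.
    return LAUNCHERS[target].run(fn, *args)
```

Swapping `"local"` for `"emr"` changes where the work runs, not what the function says.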
We also have a new, experimental programming model called Lakehouse. I won't dig into that in any sort of detail right now, but think "dbt for Spark code" — not just SQL, but that same sort of philosophy — and we think that's a really exciting direction.
So if you want to be on the bleeding edge of PySpark support, we're really interested in working with users, and we think we can be a massive game changer for those developers. I also want to emphasize that we now have a fully supported — though still in development — Dagster-native orchestration cluster, meaning that if you want to distribute orchestration across a cluster in Kubernetes, we do have an out-of-the-box-ish solution for that. It has a ton of nice properties.
Every step can be configured to — or by default, I believe, does — execute in its own ephemeral pod, so you have total per-step isolation. You can set resource limits, such as memory limits, on a per-step basis. You can set up queues so that you have parallelism limits: imagine you only want to run four outstanding queries against Redshift at any one time — you can limit that parallelism using this system. And you can actually do in-flight code updates if you configure it properly.
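The Redshift example — capping outstanding work against one resource — is the classic semaphore pattern. Dagster's queues implement this at the infrastructure level; this plain-Python sketch just illustrates the mechanics:

```python
import threading

class LimitedPool:
    """Runs tasks, but never lets more than `limit` run concurrently."""

    def __init__(self, limit):
        self._sem = threading.Semaphore(limit)
        self._active = 0
        self._peak = 0
        self._lock = threading.Lock()

    def run(self, task):
        with self._sem:  # blocks while `limit` tasks are in flight
            with self._lock:
                self._active += 1
                self._peak = max(self._peak, self._active)
            try:
                return task()
            finally:
                with self._lock:
                    self._active -= 1

def run_all(pool, tasks):
    # Launch every task on its own thread; the pool enforces the cap.
    threads = [threading.Thread(target=pool.run, args=(t,)) for t in tasks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool._peak
```

Twelve queued "queries" against a limit of four never see more than four running at once.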
We have some API and naming improvements, and this will continue to evolve. We renamed environment_dict to run_config. For those of you who have been in the system from the beginning: we used to have a run config, then got rid of it, and now we've reintroduced it.
We think that's a better final end state for config in the system. On all of our definitions — solid definition, resource definition, and so on — we renamed config to config_schema. We think this makes it much clearer that you're passing a schema into one API, and then passing the body of the config, which must conform to that schema, into another API. And then we have a new decorator for repositories, which is much more similar in spirit to our other definitions in the system.
This is actually very convenient: before, you defined a function that returned a thing; now the information is encoded in a much more concise manner. This also enables auto-discovery, which is really nice. You can just point Dagit at a Python module — either a file or an installed module — and it will discover the repositories and load them. A small thing, but very, very nice. As I mentioned earlier in the presentation, that function-name requirement was driving me crazy, and I'm sure it drove some of you crazy too.
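Auto-discovery of this kind is straightforward to sketch in plain Python — this is illustrative only, not Dagster's @repository decorator or loader: the decorator marks repository functions, and the loader scans a module for marked objects instead of requiring their names up front.

```python
import types

def repository(fn):
    # Mark the function so a loader can find it without knowing its name.
    fn._is_repository = True
    return fn

def discover_repositories(module):
    # Scan every attribute of the module for marked repository functions.
    return [
        obj for obj in vars(module).values()
        if callable(obj) and getattr(obj, "_is_repository", False)
    ]

# Build a throwaway module to demonstrate discovery.
mod = types.ModuleType("my_repos")

@repository
def my_repo():
    return ["pipeline_a", "pipeline_b"]

mod.my_repo = my_repo
```

A workspace entry then only needs to name the module; the loader finds the decorated function on its own.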
Earlier in the life of the system, we really emphasized that Dagster can be used as a software abstraction over Airflow — and that's still true — but it's clear that we are seen as an alternative to Airflow, and we're going to reframe our communication, and some of our systems, to accurately address that.
There are kind of two components here. One: we have Dagster-native execution environments — we have folks running on the Kubernetes-plus-Celery package, and we have folks also using Dask as an orchestration cluster.
So you don't need to use Airflow as an execution engine. And now we have a feature where you can automatically ingest Airflow DAGs and DagBags into Dagster and execute them on that Dagster-native infrastructure. Let me quickly talk about how we think about this: we're trying to account for a couple of organizational dynamics in the way these things get adopted.
Often it's the leaf developers, as we call them, who see our fancy new tool — they see the Gantt chart and say, "I want to use this" — but there's an existing ops team that runs all the infrastructure, and they say, "Oh, cute, they want to use another tool. We're not going to do that." So we have one path we call "inject," which has existed since well before this release: leaf developers can write to the Dagster API, and then you can put a function call in your DAG file in Airflow that compiles that Dagster pipeline to Airflow operators, running on the Airflow infrastructure. You do have to add a little bit of infrastructure to support that, but you can use Airflow as the execution engine and still use the Dagster-native tools.
That's useful for wedging your way into the system. But then at some point, maybe the ops people — whether they're the ones who instigated this or they start to see the value — say, "Maybe we should move our whole system over." Then the dynamic inverts: now the ops people want the leaf teams to migrate their code, and those teams say, "We're not going to do that." That's where the path we call "ingest" comes in.
We're going to push out a blog post a week from today — a big, long one like last year's — and we'd love help from you all publicizing it that day: upvotes, retweets, all that stuff. We're also going to start producing technical content, and as a nice corollary to that, we believe our documentation will dramatically improve from here on out. We did a lot of work in 0.8.0, and we plan on doing a lot more.
Our team is growing, and we're kicking off an internal planning process. Right now, as early adopters, you have the opportunity to really influence our roadmap — and we want you to. So please proactively reach out to us and give us your feedback: what's your biggest pain point, where do you see the opportunities? We will incorporate that feedback into our planning process. And with that, we're kind of wrapping up here — I know that was probably a firehose of content.
[Q&A] David Gamba — not David Katz — you're muted. ...That is definitely a direction we're looking in, and we're also looking for feedback on exactly what your use cases are — whether it's S3 or other event sources. Next question: Netflix has a project along these lines — a lattice of pipelines, with backfills reversed in time?
Yeah — I was thinking of "lattice" more in the materialized-view world, but I think in the pipeline sense it's a kind of DAG. [The question continues:] Say I have a node and a materialized asset, and the code is wrong. We redeploy the Python code, and we want to simply backfill that node — which of course will yield a new materialized asset. If I have downstream nodes in the DAG that depend on that materialized asset — for example, yesterday's data on a node — could I backfill those too?
It's a super interesting thought. I do think this asset abstraction gives us the opportunity to model those dependencies. I can't, just off the top of my head, think through all the implications of that interrelation, but one of the directions we can certainly focus on is going for fully incremental computation.
A question on the Kubernetes/Celery executor and run launcher: if the workers spawn separate Kubernetes jobs, why do we need to use Celery workers at all? Maybe Nate or Alex can speak to this, but I believe you can also just use a pure Kubernetes setup. Effectively, why we use Celery is that Celery implements a number of features — like resource pooling and prioritization — that we felt we could use out of the box rather than implementing our own version of. Nate, do you want to speak to this?
[Nate] That's pretty much it. We wanted a way to control the parallelism — the number of concurrent Kubernetes jobs being spawned during execution of a pipeline — and we use the Celery workers as that intermediate layer. But there's no reason you couldn't have a version of this that just invokes the jobs directly.
Thanks, Phil. And Demetri, to get to your question — I didn't see that one before and skipped it: there are interesting services out there focused on what you'd call choreography between microservices. Cadence comes to mind — I think their feature set is way more invasive than ours, but they have this interesting kind of durable-function
architecture, which allows you to suspend the entire choreography for days at a time, and Dagster really isn't oriented to do that. But I think you're right that while we are currently focused on consuming and producing data assets, you can actually model a number of different software systems using this framework. A build pipeline, for example, could be easily modeled on Dagster, and certain strains of choreography between microservices could also be orchestrated.
Let's go to Demetri's question about Dagit's separation from pipeline user code: what happens if the pipeline is updated in the repository while it's being executed by Dagit? This depends on the semantics of your run launcher — that's the most important thing, I would say. For example, the default run launcher that comes out of the box is process-based: once you have launched a run, the process — and the code it's running — is fixed, and that's just the way it works.
But there are also configurations where it automatically picks up code that's been updated. For example, in the Dagster-Celery-Kubernetes execution environment, if you upload a new Docker container that contains your user code, then as execution unfolds, the updated container will be automatically pulled by the workers. So the answer is: it's very context-dependent.
The other interesting question to ask here — and I think we still have some internal confusion, and there's still work to do — is exactly what the difference is between a pipeline and a composite solid, because they're very, very similar abstractions. I think work along that line could clarify things as well. So to answer your question: yes, we're definitely thinking about it.
One of the premises — the thesis — of the system is that these aren't just a bunch of independent pipelines: this is a data application with complex, interrelated parts. So it's certainly in line with our mission. Thanks for all the nice questions. Next, from Demetri: have you implemented the Docker-in-Docker option?
[Nate] We're pretty focused on trying to mature the Celery/Kubernetes integration and get that to a good, production-grade spot, but I think we will continue exploring other models for execution. If I remember correctly, you were looking at ECS execution, and that's certainly something on our roadmap — to get something in place for execution on ECS.
Okay: can we perform pipeline re-execution from the point of failure? Yes — that's in the re-execution menu. We have three options: full pipeline, selected subset, and "restart me from where I failed." So we directly cover that use case, but it sounds like we need to surface that feature more clearly.
Okay, we're about five minutes away from the end. Here we go, great: what would be your recommended execution environment for production at the moment — Celery plus K8s? Yeah, I think that's our most well-supported execution environment, and it's a part of the system we've really matured. But one of our theses is that we really want to be execution-environment independent, so we want to be very clear in communicating that.
A
This
is
like
one
vertically
one
vertically
integrated
kind
of
configuration
we
support,
but
you
don't
need
it
because
you
know
for
a
lot
of
people
could
raise
the
overkill,
probably
for
most
people
that
use
it.
And
you
know
we
have
users
who,
for
example,
wrote
a
custom,
run
launcher
to
execute
and
allocate
computational
resources
on
their
own
custom
pass.
So
you
know
if
you
want
to
use
communities
if
you
want
to,
if
you
already
have
a
kinase
cluster
you
can
deploy
compute
to
you
know
that
would
be
a
recommended
execution
environment.
Probably,
but
you
know.
A
A
A
A
Can we get two minutes on where gRPC fits and when to use GraphQL? I think the broad way to frame it is that GraphQL is the interface to the instance itself, whereas gRPC is the way that inter-process communication happens within an instance. So, for example, if you're building a tool that's a peer tool to Dagit, you will almost certainly be on the GraphQL interface — imagine we build a CLI remote control; that would operate over the GraphQL interface. Whereas something like a run launcher that is instigating computation within a container — that's inside the instance, so to speak — would use the gRPC interface. But this is still an evolving architecture and it's still settling, so that framing may change.
Great question, Joe. I expect that anyone who works on this for a while and has any sort of domain-specific or application-specific logic for allocating compute resources will end up writing their own run launcher. So, for example, some folks want to tag certain runs and then, based on that, allocate specific compute resources — say, with GPUs. It's very difficult for us to do that generically.
I expect that anyone who doesn't take an off-the-shelf deployment — teams running on custom infrastructure — will end up building their own run launcher. Schedulers are interesting: I do believe the instigation framework — the thinking I mentioned with respect to event-based pipeline instigation — will inevitably become more pluggable, but I think it would be quite brave to write a new scheduler implementation at this moment. Alex, you may want to weigh in to back up that assessment — or not.
Great — alrighty, everyone. We can wrap this up now. Thank you so much for coming, and again, thank you for being early users of the system — keep that feedback coming. I hope this presentation was useful, and if you found it useful, we can keep on doing these sorts of events.