From YouTube: Dagster Community Meeting - Featuring Thinking Machines and 0.12.0 Features | July 13, 2021
Description
In our eighth Dagster Community Meeting, we heard from Carlson Cheng at Thinking Machines on running Dagster for ML pipelines in production. For our new 0.12.0 release, our team presented the new features and the road to 1.0.
👨🏫 Today's Agenda 👩🏫
Introduction: 0:00
Carlson Cheng at Thinking Machines: 1:26
0.12.0 and Road to 1.0: 19:37
Q&A: 43:08
Special Announcement: 45:11
🌟 Socials 🌟
Check out our GitHub ➡️ https://github.com/dagster-io/dagster
Check out our Documentation ➡️ https://docs.dagster.io/
Join our Slack ➡️ http://dagster-slackin.herokuapp.com
Visit our Website ➡️ https://dagster.io/
Follow us on Twitter ➡️ https://twitter.com/dagsterio
A
Thank you for coming to the July 2021 Dagster community meeting. Today we have a great agenda. We have two speakers. One is Carlson Cheng from Thinking Machines; they're one of our earliest and most enthusiastic users, and they've deployed Dagster across a bunch of different companies in Southeast Asia. They especially use it for ML training pipelines, which is a really exciting use case. Then Sandy from the Elementl team is going to speak about the 0.12.0 release, and in particular a bunch of new experimental but core APIs that will replace a lot of our core abstractions and fix a lot of long-running issues in the system, which we're really excited about. But we're also going to be asking a lot of you in the next six months to eventually move code, so we want to go over that and talk about the value of it. Then some Q&A, and then a special announcement after that. So without further ado, Carlson, do you want to take over?
B
All right, sure. Let me just share a screen, so I might need some permissions. Okay, got it. So hello, everyone, I'm Carlson from Thinking Machines; I'm the head of the ML engineering team there. Thanks for inviting us over so that we can talk about how we use Dagster for running ML pipelines in production.
B
Just a brief intro to Thinking Machines: we are a global technology consultancy building AI and ML solutions and data warehousing platforms to solve high-impact problems for our clients. Our clients range from large corporations in Southeast Asia to global nonprofit organizations, and our main goal is to empower these business users with valuable data and insights so that they can make better decisions.
B
We're internationally recognized in the field of data science; we've presented at top machine learning conferences, most recently ICML and NeurIPS 2020, where we were awarded a best paper award at one of the NeurIPS machine learning workshops for our research on geospatial poverty estimation using satellite imagery.
B
We'll be focusing more on the third use case, which is MLOps, starting with one of our projects where we've used Dagster: building a smart, unified search app. This app consolidates a number of data sources and allows users to search through these sources and get relevant information.
B
One example of this: a user would ask our application "how do I apply for vacation leave?", and our application, using ML algorithms, would then surface the most relevant section of the employee handbook, highlighting the steps you would need to actually file for a leave. Our use cases extend from there, also allowing our users to query relevant entities like people and companies.
B
A user could then search for the company GameStop and get relevant information regarding that company. Our search app is composed of three main search features. One is semantic search, which, based on your search query, gives you the most relevant FAQ document.
B
Second is Q&A search, which is our initial example, where you get the most relevant section of the employee handbook for your question. And third is entity search, giving you the most relevant person or company information. All of these search results are piped into a search ranker: an additional model that prioritizes, among your search results, which one to list as the top, most relevant result.
B
Here's a simple architecture for our project. You can see that our training and test sets are piped into the automated training and evaluation pipelines that we've built using Dagster. Based on this, we can do continuous training on newer data, build new models that we then stage inside an S3 bucket, and from there redeploy our web application servers with these new models as our users use our application.
B
Their interactions then give us more relevant information and user feedback, so that we can rebuild our models and create more fine-tuned models for their application needs. We also have an additional step here, where we created a few Dagster pipelines for meta pipeline monitoring.
B
This is our ML automation workflow in general: it goes from POC to dev to prod. Focusing on the first stage, the proof-of-concept phase is primarily done inside Jupyter notebooks, where our data scientists can fully test out their different ML approaches and run experiments. After that, they finalize their ML methodology, and that's when we start migrating their Jupyter notebooks into Dagster pipelines, where we can do further fine tuning and polishing.
B
So how does this work in practice? On the left side you can see the Jupyter notebook that a typical data scientist would create. In this case, after they've done some initial data prep, they start doing hyperparameter optimization. They're using a module here called Optuna, which is used for hyperparameter search, given a set number of trials, say 100 trials.
B
They would then get the best-scoring model, with its hyperparameters and accuracy score, and that's what we save and export as our best model from that number of trials.
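For reference, here is a minimal sketch of the kind of Optuna search being described; the dataset, model, and parameter ranges are illustrative stand-ins, not Thinking Machines' actual notebook code:

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in training data; the real notebook would use the prepared train set.
X_train, y_train = make_classification(n_samples=500, random_state=0)

def objective(trial):
    # Optuna suggests a candidate set of hyperparameters for each trial.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 10, 200),
        "max_depth": trial.suggest_int("max_depth", 2, 16),
    }
    model = RandomForestClassifier(**params, random_state=0)
    return cross_val_score(model, X_train, y_train, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)  # a set number of trials, e.g. 100

# The best-scoring model's hyperparameters and accuracy score, as described above.
print(study.best_params, study.best_value)
```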
B
This is a typical thing that you would get from a data scientist's notebook, and once we finalize and polish it, we can actually start moving it over to our Dagster pipelines. The convenient thing here is that you can pretty much just copy-paste your notebook code into a Dagster solid. Since Dagster is very Pythonic, it doesn't really require you to write anything extra; most Dagster solids are just Python functions.
B
You can pretty much just port it over to your Dagster solid. The additional steps, which you do in coordination with your data scientists, are to add descriptions to your solid definition and its input and output definitions. This is important because we'll need these definitions later on when we're validating and debugging our pipeline inside the UI. Some additional steps are adding logs and Dagster assets for ML metadata tracking; more on this later.
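A minimal sketch of what that porting step can look like with the 0.12-era solid APIs; the names, types, and descriptions here are hypothetical:

```python
from dagster import InputDefinition, OutputDefinition, solid

@solid(
    description="Train the model exactly as in the notebook and export it.",
    input_defs=[
        InputDefinition("train_set", dagster_type=list, description="Prepared training rows."),
    ],
    output_defs=[
        OutputDefinition(str, name="model_path", description="Path of the exported best model."),
    ],
)
def train_model(context, train_set):
    context.log.info(f"Training on {len(train_set)} examples")  # the extra logging step
    model_path = "/tmp/best_model.pkl"
    # ... notebook training code pasted here, then serialize the best model ...
    return model_path
```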
B
Some learnings we've had when running ML pipelines alongside existing Dagster pipelines: usually, when we're porting our Jupyter notebooks into more standard ML pipelines, we already have an existing Dagster infrastructure in place for more traditional ETL and ELT pipelines. So we don't need to create a whole new platform for our ML workflows; we can just make use of our existing Dagster infrastructure and add our ML pipelines there.
B
One thing to take note of is that we should organize our pipelines into logical groups. For example, you would have a group for your different ETL pipelines for source A and source B, and then other groups for your ML pipelines, say for a specific model X and another model Y. We make use of Dagster's repositories feature, which helps us isolate the individual groups, and this also further helps us isolate the dependencies for each of these pipelines, so you can avoid conflicts.
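A hedged sketch of that grouping using Dagster repositories; the pipeline names are hypothetical, and each repository would typically be loaded from its own environment via workspace.yaml so the groups' dependencies stay isolated:

```python
from dagster import pipeline, repository, solid

@solid
def extract_source_a(_):
    return ["rows from source a"]

@pipeline
def source_a_etl():
    extract_source_a()

@solid
def train_model_x(_):
    return "model artifact"

@pipeline
def model_x_training():
    train_model_x()

# One repository per logical group; Dagit shows each group separately.
@repository
def etl_repo():
    return [source_a_etl]

@repository
def ml_repo():
    return [model_x_training]
```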
B
Some additional learnings we've had: Dagster makes it very easy to move over to prod, since the pipeline implementation is pretty much the same whether you're running on, say, your local machine or your Kubernetes production environment. There are very minimal changes when moving your pipelines over; most of the changes are done inside the high-level Dagster configurations, but on the pipeline level you don't really need to change much to port it over.
B
Speaking of production, moving on to that: this is where we can fully make use of our automated model training and evaluation pipelines, producing new models and deploying them to our servers, where we can do further monitoring and get new data. For pipeline monitoring for MLOps, we make use of Dagster's asset materialization feature so we can keep track of ML metadata. Coming back to our initial example, we have a code snippet here where we create an asset called generated_model.
B
This is important because later on, when you're checking your Dagster UI, you can view the metadata for each of your training runs. Here your latest run can show you your file name, the hyperparameters of the best-scoring model, and your accuracy score. So your data scientists can keep track of the scores of each of your training runs, and we can easily see whether a certain run is performing well.
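A minimal sketch of the asset materialization step being described, using the metadata dictionary form; the asset key follows the talk's generated_model example, while the path and metadata labels are hypothetical:

```python
from dagster import AssetMaterialization, Output, solid

@solid
def export_best_model(context, best_params: dict, accuracy: float):
    model_path = "s3://my-model-bucket/generated_model.pkl"  # hypothetical staging path
    # ... serialize and upload the best model here ...
    yield AssetMaterialization(
        asset_key="generated_model",
        description="Best model from this training run.",
        metadata={
            "file name": model_path,
            "hyperparameters": str(best_params),
            "accuracy": accuracy,  # shows up per training run in the Dagster UI
        },
    )
    yield Output(model_path)
```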
B
Based on the UI, we also really appreciate the pipeline definitions. Since Dagster pipelines are very data-aware, we can see the input and output coming through each of the solids, so we can keep track of how our data is processed and changes throughout the pipeline. That's as opposed to the Airflow UI, where the data is abstracted away from you: you don't really get to see how the data is processed in your pipelines, and overall it makes for a more intimidating UI to work with.
B
Data scientists can also re-run the pipelines, or just subsets of the pipelines, for further debugging. For additional pipeline monitoring, we make use of Slack notifications. In general, our company uses Slack for day-to-day communication, so this allows us to spend less time manually checking the UI for pipeline success or failure messages, and we find out when something happens as soon as it happens.
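A hedged sketch of the failure-notification side of this, using the pipeline failure sensors that shipped in 0.12.0 together with the Slack SDK; the channel, token variable, and Dagit URL here are hypothetical:

```python
import os

from dagster import PipelineFailureSensorContext, pipeline_failure_sensor
from slack_sdk import WebClient

@pipeline_failure_sensor
def slack_on_pipeline_failure(context: PipelineFailureSensorContext):
    run = context.pipeline_run
    # Post the failed pipeline, the error message, and a link back to the run.
    WebClient(token=os.environ["SLACK_BOT_TOKEN"]).chat_postMessage(
        channel="#data-alerts",
        text=(
            f"Pipeline {run.pipeline_name} failed: {context.failure_event.message}\n"
            f"Debug the run at https://dagit.example.com/instance/runs/{run.run_id}"
        ),
    )
```

The dagster-slack integration also ships a prebuilt helper for this pattern, if you'd rather not hand-roll the message.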
B
Just an example here: we can see that a Dagster ELT ingestion pipeline ran and succeeded, although it didn't pick up any new data; this still counts as a success for us. We can see the elapsed time, which lets us flag issues with CPU resources if it's ever incredibly high, and we can see the S3 link to the ML model path that the pipeline created and used. Similarly, we can check pipeline errors whenever they happen: we can see the specific pipeline that failed, which solid actually failed, and even the error message that showed up in that solid. Further on, we created a handy link here that sends us over to the Dagster run itself.
B
There we can do further debugging. As an extra step on top of pipeline monitoring, we do meta pipeline monitoring, where we've created a Dagster pipeline that checks other pipelines. How we do this is we build a separate Dagster pipeline to regularly do a health check on our production pipelines and summarize their status in Slack.
B
As you can see here, we get a summary on a day-to-day basis, giving us all of the different production pipelines that we have and their success scores over the past few runs. We can easily see that some of the pipelines are working as expected, while some are not doing as well and might need to be flagged for further debugging. We even have the last success date, which helps us further check whether there are any issues based on that date.
B
How we do this is we have a pipeline that accesses the Dagster database, primarily the runs table. We issue a simple SQL query that just checks the status of the pipeline runs for each pipeline, based on, say, the past 10 runs of that pipeline.
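A hedged sketch of that health-check query; the connection string is hypothetical, the column names follow the runs table schema in Dagster's Postgres run storage, and the real pipeline would limit this to the most recent N runs per pipeline:

```python
import os

import psycopg2

# Summarize recent run outcomes per pipeline straight from Dagster's runs table.
HEALTH_CHECK_SQL = """
SELECT pipeline_name,
       COUNT(*) FILTER (WHERE status = 'SUCCESS') AS successes,
       COUNT(*) AS total_runs,
       MAX(update_timestamp) FILTER (WHERE status = 'SUCCESS') AS last_success
FROM runs
GROUP BY pipeline_name;
"""

with psycopg2.connect(os.environ["DAGSTER_PG_DSN"]) as conn:
    with conn.cursor() as cur:
        cur.execute(HEALTH_CHECK_SQL)
        for name, successes, total, last_success in cur.fetchall():
            # This per-pipeline summary is what gets posted to Slack each day.
            print(f"{name}: {successes}/{total} recent runs succeeded; last success {last_success}")
```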
B
So, in conclusion, why we think Dagster works for ML pipelines in production: data scientists overall get a very user-friendly UI, which enables them to run data pipelines without fear; they can easily monitor and debug their pipelines from the UI. Secondly, Dagster is very versatile: we would usually have a Dagster infrastructure that already supports ETL and ELT pipelines, and we can easily extend that to support ML pipelines. This removes the overhead of setting up something completely new for our ML workflows.
B
And thirdly, Dagster uniquely works for MLOps, because unlike other orchestrators, Dagster has features that support MLOps on top of automating our training and evaluation.
B
Just some extra things that we're planning on working on next: in the future, we plan on migrating to gRPC servers inside Kubernetes, so that we can separate the pipeline code from the core Dagster infrastructure. This helps us update our pipelines separately from the Dagster daemons, like the scheduler and the sensors, so that we can avoid redeploying them all together. This is just a step toward more process isolation inside Kubernetes.
B
We also want to try out dynamic orchestration for ETL, allowing us to generate solids dynamically at runtime instead of having to define them manually. This has the bonus of making things very easy to check inside the UI, since it lets you view those dynamic solids a lot more easily than manually defined ones.
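A hedged sketch of what that could look like with Dagster's dynamic orchestration, written here with the newer op/graph-style APIs; the file names and ops are hypothetical:

```python
from dagster import DynamicOut, DynamicOutput, graph, op

@op(out=DynamicOut(str))
def discover_sources():
    # Fan out one DynamicOutput per source discovered at runtime.
    for path in ["source_a.csv", "source_b.csv"]:
        yield DynamicOutput(path, mapping_key=path.replace(".", "_"))

@op
def ingest(path: str) -> str:
    return f"ingested {path}"

@graph
def dynamic_etl():
    # Each dynamic output gets its own mapped ingest step, visible in the UI.
    discover_sources().map(ingest)

dynamic_etl_job = dynamic_etl.to_job()
```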
B
So yeah, that's pretty much it. Thank you for listening, and I'm free for any questions later on. Thanks.
A
That was fantastic, Carlson. Anyone in the audience, feel free: we can take brief questions now. You can put anything in the chat, or, if you're feeling brave, you can just unmute and pop in; this is a pretty unregulated Zoom call.
C
I actually have a quick question; this is Rebecca here. Thanks for the presentation; it's really cool to see how you guys use it. I just had a question about your Slack notifications about the pipelines that run. The summary one looks really great. For the ones that report on individual pipelines, I may have misunderstood, but if that's what they're doing, does it get spammy? Is it something that's helpful for you to monitor systems and that kind of stuff, or how do you use that?
B
Yeah, for the Slack notifications in general, the most important ones are the ones that actually fail. Success notifications might not be as important for us; those are just an additional check that we do. But the ones that actually do fail, those are the ones that we tag users on. We also automate the tagging functionality, so whenever we have a pipeline error, we tag the relevant person for that pipeline.
A
Cool, there are no further questions. If any more come to mind, you can always pop them into the chat and we'll be able to get to them at the end of the meeting. I'm going to hand this off to Sandy now, who's going to present on the new core APIs that are now released in 0.12.0.
D
All right, hello everybody. My name is Sandy; I'm an engineer at Elementl, and I lead the team that builds and maintains the core Dagster APIs. I'm going to talk to you about a set of changes and improvements that recently arrived in the project: we just released Dagster 0.12.0 last week.
D
The release includes a bunch of stuff that we're really excited about. On the left are the new features; these are additions to Dagster that make it easier to build reliable and observable data pipelines. Pipeline failure sensors help address our most upvoted GitHub issue of all time.
D
Solid-level retries are a core orchestration feature that we had been missing and are excited to include. A new set of testing APIs offers really nice and elegant ways to verify any of the functions you provide to build Dagster definitions. And dbt and MLflow are two of the systems most commonly used with Dagster. On the right we have a set of more fundamental changes; I'm going to spend the bulk of this presentation on those.
D
One of the things we've heard is that people grasp the basics of constructing a pipeline very quickly, but it takes them quite a while to understand modes, presets, partition sets, composite solids, and the like. Part of what's difficult here is inherent complexity in the problem domain that we're helping to model, but part of what's difficult is also that many of the concepts are similar. So, for example, modes and presets are both ways of specializing pipelines to particular execution environments.
D
Pipelines and composite solids are both ways of defining dependency graphs of solids. The relationship between these three concepts — pipelines, composite solids, and solids — can inspire a decent bit of confusion. For example, users ask us why they can't nest pipelines inside of their pipelines, and solids and composite solids, while named similarly, work very differently.
D
The main difference is that the code inside a solid runs when the pipeline actually runs, but the code inside a composite solid runs when the pipeline is being defined, and that can be especially tricky to grasp given their similar names. Coming from a different direction, but one that's ultimately related, is a difficulty we've heard about using resources in tests. One of the core goals of Dagster's resource system is to make it easy to test pipelines.
D
The idea is that you can supply different implementations, injecting pieces of your environment that might not actually exist inside of a unit test. But it can be very difficult to actually take advantage of the resource system in unit tests, and that's because all resources need to be supplied to the pipeline at the place where the pipeline is defined. So, for example, here's a test where we'd like to construct a mock resource, supply some particular values that are relevant to that test, and execute a pipeline with it.
D
That's a problem, because we can't necessarily anticipate all the ways that we're going to want to test a pipeline at the time we're defining that pipeline. Another separate but related point of awkwardness is that instances typically include modes and presets that cannot or should not be launched on them. This is a screenshot from a production Dagit instance, but it's displaying a local partition set.
D
If a pipeline includes a prod mode and a local mode, the Dagit running in production will display both of those pipeline modes, even though in many setups the local mode should never actually be used in that environment. And then, last but not least, one of the most persistent pieces of critical feedback we've gotten about Dagster's APIs has just been the name.
D
Solid. People who've spent a lot of time with Dagster mostly get used to it, but new users often find it difficult to understand what the name "solid" has to do with executing graphs of data computations. This is a quote from one of our users. So we asked ourselves: would we be comfortable shipping a 1.0 release with these issues outstanding? For us, 1.0 means a stable set of APIs that users can expect to remain the same for a very long time.
D
Before making that commitment to stability, though, we want to make sure we can confidently say that our APIs are as intuitive and simple as they can be. So I'm going to jump in and talk about these core changes that we're planning on making — changes intended to bring our APIs to the point where we feel comfortable releasing 1.0 and committing to them for a very long time.
D
None of these changes are set in stone; I'm going to make an appeal at the end of this talk for you to try these out while they're still experimental and give us your feedback, so we can change them and fix issues that you encounter. So, jumping in: graph and job are a pair of new abstractions that we're planning to introduce. They're going to replace pipelines, modes, presets, and partition sets. Before we talk about them, let's look a little bit at how pipelines are structured in Dagster's current APIs.
D
Every pipeline includes a set of solids and the dependencies between those solids; that's what's included in the body of the function that's used to define the pipeline. This is the part of the pipeline that stays constant no matter where or how the pipeline is running, because it's not bound to any particular environment. We sometimes call it the logical part of the pipeline, as opposed to the physical specialization of a pipeline that is tied to a particular set of resources or config.
D
In your production environment, you might include a resource that represents your production database, whereas in a development environment you might include a resource that represents your development database. Pipelines often also include presets, each of which corresponds to one of the pipeline's modes and supplies configuration for the pipeline; that's another way of specializing pipelines to particular environments, but this one focusing on configuration instead of on resources.
D
With the new APIs, instead of defining a pipeline with modes and presets, you define a graph, and then you build jobs from that graph, each of which is specialized for a particular environment. A graph is the logical piece of the pipeline: it's a DAG of logical computations. A job is a specialization of that graph to a particular environment. It's an operational unit, something that you might want to monitor, something that you might want to execute, tied to development or production.
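A minimal sketch of that shape under the new experimental APIs; the resource and op names are hypothetical:

```python
from dagster import graph, op, resource

@op(required_resource_keys={"warehouse"})
def sync_tables(context):
    context.log.info(f"syncing against {context.resources.warehouse}")

@graph
def sync():
    sync_tables()

@resource
def prod_warehouse(_):
    return "prod-db-connection"  # stand-in for a real connection

@resource
def dev_warehouse(_):
    return "dev-db-connection"

# One logical graph, specialized into one job per environment.
prod_job = sync.to_job(name="sync_prod", resource_defs={"warehouse": prod_warehouse})
dev_job = sync.to_job(name="sync_dev", resource_defs={"warehouse": dev_warehouse})
```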
D
Here's how graphs and jobs fit together. Up on top we've got a representation of the data model of a single job, and down here we've got some code that defines a graph and creates three jobs that reference it. To connect the diagram with the code: the graph is the set of solids and their dependencies. It corresponds to the logical components that we talked about when we were talking about pipelines. It's a logical object that can be referenced by multiple jobs, as well as embedded inside other graphs.
D
The job is a single operational unit, usually bound to a particular environment: your production job contains production resources and production config, your dev job contains dev resources and dev config, and so on. You create a job by taking a graph, invoking to_job on it, and supplying the set of resources and config that correspond to that job.
D
You end up with a job that references the graph you invoked to_job on and has these additional environmental pieces. Each schedule or sensor points to a particular job.
D
We also require that no more than one schedule or sensor points to any particular job. This results in a simpler Dagit experience: in the new left navigation pane, we simply show a list of jobs, and those jobs can have icons next to them to represent their schedules or sensors.
D
This corresponds to the fact that when you're working with Dagster in any production or even development environment, you're typically zoomed in on a particular job: you want to understand all the runs of your production job, or you want to relaunch your development job as part of your development workflow. So the UI becomes a lot more focused on jobs, although it does still allow you to connect a set of jobs that all correspond to the same graph.
D
This change has a few positive consequences; it makes life easier in a few different ways, and I'm going to go through some of them. This will be a bit of a whirlwind of code, so don't feel bad if you miss one or two of these things. The first implication of this change is that repositories will be able to selectively include jobs built from the graph. This means that your production instance no longer needs to be cluttered with the dev modes of all your pipelines.
D
What's going on in this code example is that we're defining two different repositories. Our development instance can reference the development repository and only show the development jobs, and then our production instance, through our production workspace.yaml, can reference the prod repository and only show the production jobs.
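A sketch of that split, reusing the hypothetical prod_job and dev_job from the earlier sketch; each instance's workspace.yaml would then point at the matching repository:

```python
from dagster import repository

@repository
def prod_repo():
    # The production instance loads only production jobs -- no dev clutter.
    return [prod_job]

@repository
def dev_repo():
    return [dev_job]
```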
D
A second benefit is solving the testability problem that we talked about earlier. On the left we have what it looks like to build mock resources for tests with the old APIs, and on the right we have the new APIs.
D
You can now execute a graph with resources that you constructed inside a unit test. It's no longer required to define all the possible resource parameterizations at the pipeline definition site, so you can construct resources inside your tests that have particular attributes relevant to that particular test and execute the pipeline with those. Part of the advantage here is requiring less boilerplate.
D
As you can see, there's less code on the right than on the left. The other part is actually enabling usages that were really awkward or nearly impossible with the old APIs. Now you can have 10 different tests that each construct their own resources, and you don't need to anticipate all 10 of those tests at the site where the pipeline is defined, and then see 10 different modes corresponding to those tests when you view the pipeline in Dagit.
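A minimal sketch of that testing pattern; the graph and the stand-in resource value are hypothetical:

```python
from dagster import graph, op

@op(required_resource_keys={"warehouse"})
def count_rows(context):
    context.log.info(f"found {len(context.resources.warehouse)} rows")

@graph
def rowcount():
    count_rows()

def test_rowcount():
    # Build the mock right inside the test -- no mode declared at definition time.
    result = rowcount.execute_in_process(resources={"warehouse": ["row1", "row2"]})
    assert result.success
```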
D
Another advantage is using pointers instead of strings to target jobs from schedules and sensors. Currently in Dagster, the way it works is that if you want to point a sensor or a schedule at a job, when you construct the sensor you supply the name of the pipeline as well as the mode of that pipeline.
D
This is a little bit error-prone, because if you mess up the name — if you type one of the characters wrong — it's difficult for your IDE to tell you about it. And the sensor object itself doesn't actually have a reference to the pipeline, so if you want to go and verify something with that sensor, you have to grab that reference from somewhere else and make sure those are synced up.
D
So, with the new APIs, instead of providing the pipeline name and mode as strings, you now point directly to Python objects when defining a schedule or sensor. This means you can discover errors earlier, because linters can tell you if your schedule points to a pipeline that doesn't exist. It also makes the code briefer. And then, arguably most importantly, the sensor object has a reference to the pipeline that it's targeting, so if you want to test that sensor, all you need is a reference to that sensor object.
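A sketch of that, again reusing the hypothetical prod_job from above:

```python
from dagster import RunRequest, ScheduleDefinition, sensor

# The schedule holds a direct reference to the job object -- no name strings.
nightly_sync = ScheduleDefinition(job=prod_job, cron_schedule="0 3 * * *")

@sensor(job=prod_job)
def new_file_sensor(_context):
    # ... check an external system; request a run when something new appears ...
    yield RunRequest(run_key="file_123")  # hypothetical run key

# A test can reach the targeted job straight through the schedule object.
assert nightly_sync.job.name == "sync_prod"
```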
D
Yet another advantage is that graphs can now be nested inside of other graphs, so graphs replace both pipelines and composite solids. This used to not make sense, because nesting a pipeline with multiple modes inside another pipeline has all sorts of thorny implications: you end up with a sort of combinatorial explosion of modes, where each mode in the sub-pipeline corresponds to a mode in the parent pipeline. But by exposing graphs as a logical concept that does not involve modes, we can now provide a single abstraction for composition.
D
A graph can include a graph, and that graph can include any number of graphs; then, ultimately, you take the top-level graph, build a job out of it, and supply resources at that point that apply to the entire hierarchy of graphs.
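A minimal sketch of that composition; the ops are hypothetical:

```python
from dagster import graph, op

@op
def extract():
    return [1, 2, 3]

@op
def transform(rows):
    return [r * 2 for r in rows]

@op
def load(context, rows):
    context.log.info(f"loaded {len(rows)} rows")

# An inner graph...
@graph
def extract_transform():
    return transform(extract())

# ...nested inside an outer graph: the same abstraction at every level.
@graph
def etl():
    load(extract_transform())

# Resources and config are supplied once, at the top of the hierarchy.
etl_job = etl.to_job()
```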
D
Last of all, it's now possible to uniformly apply a mode across all the pipelines in an environment without needing to provide it to each graph individually. Suppose you have a set of resources — maybe your standard production resources, including your production database and production credentials to some set of systems. In the past, if you wanted all of your production pipelines to reference those, you had to individually include, on each of those pipelines, a mode that referenced those resources.
D
With graphs and jobs, you can instead build a job from each graph and apply that shared set of resources in one place. So, to recap the benefits: no more string pointers; the ability to embed graphs inside other graphs, with a single abstraction for execution and composition; a better Dagit experience that allows you to focus on production jobs; and less boilerplate overall.
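A sketch of applying one shared set of production resources across several graphs in a single place, reusing hypothetical pieces from the earlier sketches:

```python
from dagster import repository

PROD_RESOURCES = {"warehouse": prod_warehouse}  # defined once, shared everywhere

ALL_GRAPHS = [sync, etl]  # the graphs from the sketches above

@repository
def prod_jobs_repo():
    # Build one production job per graph, with the same resources applied.
    return [g.to_job(name=f"{g.name}_prod", resource_defs=PROD_RESOURCES) for g in ALL_GRAPHS]
```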
D
So, with these changes, we asked ourselves: would we feel comfortable releasing 1.0, going back to our original criteria?
D
Do we feel like we're supplying the most intuitive and simple set of APIs that we can? I think the changes I just talked about are a massive leap forward for simplicity. But what about intuitiveness? One thing kept nagging us.
D
That thing was the name "solid". Releasing Dagster 1.0 with solid as the core abstraction would mean committing to a name that most of our users have met with confusion and aversion. It would mean many more years of having to explain the term to people and watching them squint as they try to understand how it relates to this process of executing graphs of data computations.
D
As with the changes I discussed above, this is currently experimental, and we're planning to maintain backwards compatibility for a long, long time. We don't make this change lightly, because we know it will mean changing a lot of code, but in the long run we think it's important for making the project as accessible and successful as possible, and for making the core abstractions as intuitive to understand as they can be.
D
The op decorator is also going to support a briefer way of defining inputs and outputs. In the current APIs we have these fairly verbose input definitions and output definitions; in the new APIs we just have these simple ins and outs, which allows the emphasis to be on the actual values supplied to these definitions instead of these kind of enormous strings.
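A sketch of the briefer form, mirroring the earlier hypothetical solid:

```python
from dagster import In, Out, op

@op(
    ins={"train_set": In(list, description="Prepared training rows.")},
    out=Out(str, description="Path of the exported best model."),
)
def train_model(train_set):
    # ... training code as before ...
    return "/tmp/best_model.pkl"
```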
D
Here's what the timeline looks like. We released 0.12.0 last week, and it includes graph, job, and op as experimental changes; pipeline, solid, mode, and preset are still the main stable APIs.
D
0.12.0 also includes an opt-in UI, which I'll talk about a couple of slides down, that allows you to focus on the job-centric Dagit experience I showed a little window of earlier.
D
In 0.13.0, our plan is essentially to make graph, job, and op stable, after we receive feedback from you, and I'll talk about that in a minute.
D
So our plan is to make graph, job, and op the new stable APIs. Pipeline, solid, mode, and preset will no longer be the preferred APIs, but again, they're going to stick around for a long time, so that we're not forcing people to immediately change their code. Also in 0.13.0, the new UI will be opt-out, so you'll be defaulted to the job-focused UI but will be able to revert to the old view. And our docs and tutorial are going to focus on graphs and ops.
D
We think these new APIs are a big improvement, but before we switch over to them, it's really important for us to hear how they work out for you. We would love for you to try them out and give us your honest feedback. None of this is yet set in stone, and you can have a lot of influence over what the final product looks like.
D
You don't have to switch over all at once. If you do want to try these new APIs, it means two things: the first is converting code to the new APIs, and the second is switching the appearance of Dagit. I'll post a link in Slack to a migration guide we wrote that explains how to take code written using the pipeline APIs and translate it into the graph APIs. It goes example by example, situation by situation.
D
I think you'll see that many of the APIs end up looking quite a bit simpler. Dagit now has a toggle that allows you to switch to a view based on the new APIs. You can find it by clicking on the gear in the top right; it takes you to a page where you can flip the switch. As I mentioned before, the big difference is that when working in Dagit, you'll now usually be working within a single job.
D
That means when you're looking at a list of runs, you'll be seeing runs for a particular job; you won't be distracted with modes and partition sets from other jobs. And Dagit, as I mentioned, will still be able to load pipelines defined using the old APIs; this works essentially by flattening them into multiple jobs if they have multiple modes.
D
So again, we'd love for you to try it out and give us your feedback, and we're pretty excited about the simplicity and intuitiveness that these changes have the opportunity to bring. That's all I have. Any questions?
E
Hi, thank you very much; it was really interesting. I have a question about the part where you showed us that in the repository you are iterating over a list of graphs and changing the mode — yeah, exactly this one. Could you change the annotations here and set other parameters, or is it only for modes?
D
E
I have another question: can we partially use these new abstractions side by side with the old ones?
D
Yes, you can. You can have a repository that includes both pipelines and jobs, and op is just a rename, so you can build a graph out of ops, you can build a pipeline out of ops, and you can build a graph out of solids.
A
Yeah. I just want to also reiterate the point that Sandy made: we highly encourage you to start using these today. We think there are immediate improvements, among a number of things. Once you unclutter your instance of irrelevant modes, you kind of can't go back; that feels really, really good. And especially the testing APIs are, I would say, dramatically better, and for those who care about that, you'll find immediate ergonomic improvements.
A
This is your opportunity: you live in this tool, so it's your opportunity to give feedback and shape the future of it, and we take the feedback super seriously. So yeah, please reach out to us with feedback if you didn't feel comfortable asking questions here, and start using it as soon as possible; we're really excited to work with you on that. We also really appreciate the patience of everyone here.
A
So we will be asking you to do work, and we sincerely appreciate it, but we think it's good for the long-term health of the system and the community. And then you never have to explain to any of your colleagues what the hell "solid" means, which is a nice bonus. So thanks again, and Sandy and the practitioner team have done amazing work on this; I think it's a dramatic improvement in the system, so thank you to everyone on the team there.
A
So we have one additional announcement here, and that is: we are a company, and we have to eventually make money and have a commercial product, and we have been actively working on that for months. We are working on Dagster Cloud, and we're announcing a closed beta today. That means there's a waitlist you can sign up for, and we are looking for, and working with, early design partners to improve the system.
A
This is a hosted version of Dagster, and the goal here is to enable our users to effortlessly deploy and operate Dagster. We hear feedback all the time that you can get local Dagit running in six lines of Python, and it's kind of immediately empowering: you're learning the concepts, executing on your laptop, super fun.
A
Then you have to deploy this thing, and you kind of hit this wall; it's really challenging to do. We want to have a centralized, hosted service that makes that as smooth as the Dagit experience on your laptop. So with this system, we will host the scheduler, the web server (Dagit), and the metadata database on your behalf.
A
You will never have to run dagster instance migrate ever again — everyone's favorite thing. We will handle version upgrades on the database side while maintaining backwards compatibility.
A
So it will not compel you to upgrade your code. There's also dynamic workspace management: instead of it being driven from a yaml file, you'll be able to dynamically add to your workspace using command line utilities. Authorization and RBAC will also be included in Dagster Cloud. And your data and your code are still owned by you, so you can run Dagster Cloud and still run the actual compute on your laptop, or in your VPC, in a Kubernetes cluster.
A
We're super excited about this; the team's done amazing work. It's really smooth to spin up and it's really fun to use, actually. So yeah, you can become a design partner. Right now there's a live link on dagster.io for Dagster Cloud.
A
There's also a link at the top of the page, and you can just go and plop your name in a form and we'll reach out to you, or you can just DM me if you want and we can start chatting. So yeah, it's a really exciting day for us.
A
We're starting to unveil what's going to be our commercial platform to the world. With that, if there are any follow-up questions, we can talk about them, or we can end the meeting; we'll wait one minute for any additional questions for any of the speakers — Carlson, Sandy, or myself.
A
Well, I got a private message from someone saying they're pumped up about Dagster Cloud, so thanks, Peter. Okay, so this is a ton to absorb, especially the new 0.12.0 core API changes that will serve as the core of 1.0. So again, please play with it and reach out to us; we're excited to engage with you on Slack to really suss this out and iron out the kinks, but we think it's a massive improvement. And I think we can close out the meeting.