From YouTube: Dagster with Nick Schrock
Description
Eric Anderson interviews Nick Schrock about Dagster, the open-source data orchestrator for machine learning, analytics, and ETL. Nick is the founder and CEO of Elementl, and is well-known for creating the Product Infrastructure group at Facebook, which spawned GraphQL and React. On today’s episode of Contributor, Nick explains how he set out to fix an inefficiency he identified amongst the complexity of the data infrastructure domain.
Find show notes and previous episodes at https://www.contributor.fyi/
A
Well, welcome to the show. We've got Nick Schrock here with us today, who is the creator of Dagster. I'm sure many of you know Nick. Thanks for joining.
B
Thanks for having me.
A
Nick, if I recall, we met years ago, and I was privy to the early days of Dagster to a degree. Maybe you can start us off and tell everyone what Dagster is.
B
Yeah, so Dagster is a data orchestration platform. Within every company of any non-trivial size, or enterprise, there's typically a data platform, meaning that there's a bunch of different people using tons of different tools, putting data products into production. In order to do so, they have to work together, because typically one team creates data products and another team consumes those, and so on and so forth.
B
So these data orchestrators, or workflow engines, become critical pieces of infrastructure in these enterprises, and we just thought there's a huge opportunity here. The overall theme we're seeing, and this has taken years, is that people in the data domain are in the process of realizing that this is a software engineering discipline, and you really want to have a full, proper software engineering life cycle with a delightful developer experience, testing, and fast deployment. And then, in data,
B
we think you need to also track the produced assets. So Dagster is an orchestration platform that really thinks about this problem. A lot of people call this software engineering unification by different terms. Some people call it DataOps, some people call it the modern data stack. But whichever term you use, we believe we're the orchestrator that fits into that theme.
A
Yeah, and I like how you describe this: there's new awareness around how critical this is. I think there was a time when people just wrote scripts, threw them over the fence, and called it good, and there was some acknowledged and accepted messiness to this world.
B
Totally.
B
Like, for example, I don't have a billion Twitter followers, I'm not a Chamath or something, but one tweet I wrote that went super viral recently was about how it's time for the term "data cleaning" to be officially retired, because it isn't that. Data cleaning is the work, and what it is is a process of putting data products into production. It's hard and it takes real engineering. And just referring to it as cleaning,
B
not that there's anything wrong with cleaners, but it undersells the sophistication and the importance and the criticality of what this is. A model you might throw away, but all the data products that you produce leading up to that live forever and are incredibly important assets to an organization.
A
Yeah. Now let's figure out how you got to Dagster. I'd be curious to go back to GraphQL, if you don't mind, because I feel like that's where some of us know you from, and how that led you to figure out the need for this Dagster thing, and then the early days of Dagster.
B
Totally. So yeah, as you mentioned, I worked at Facebook from 2009 to 2017, and I believe I was the one who started this group called Product Infrastructure. It was started with three engineers, but it really expanded and grew, and that team's job was to serve our application developers and make them more efficient and productive. And out of that group, a very special group of people. I didn't have anything to do with the project.
B
Both legacy or traditional enterprises, as well as companies that are in SoMa or in the Valley. And I mean, it was a hundred percent: their biggest technical liabilities and challenges were around, some called it the data infrastructure, some called it the ML infrastructure, but this entire domain was just a mess, and it is critically important to the functions of these businesses and to society at large. I looked into the tooling and how people were working, and effectively,
B
what I saw was the biggest mismatch between the complexity and criticality of a problem domain and the tools and processes to support that domain that I've ever seen in my engineering career. The only thing that came close was web development in the early 2010s, when everyone was just whining about IE6 and everything was awful, and engineers didn't even want to engage with it, and the JavaScript community didn't take itself seriously. Now it might be the opposite problem: it might take itself too seriously. Yeah.
B
You know, the same thing needed to happen in the data domain, and a lot of the maladies, both cultural and technological, are similar. So, for example, in 2010 people would be like, "Oh, I'm just a JavaScript script kiddie," and it wasn't considered real engineering, and so on and so forth. I think there's a lot of analogy to people calling themselves data cleaners or data janitors and kind of being in love with the tools they hate.
B
There's almost like this Stockholm syndrome that goes on there, so I thought there were very similar properties. And then, when I see a developer experience dumpster fire like that, I'm kind of drawn to it like a moth to a flame. I love serving other engineers and making their lives better, and when I see inefficiencies like that, I just get kind of mad and frustrated on other people's behalf, and I really want to work on it.
A
Not to overdraw the analogy, but that's a little bit of what you did at Facebook, no? There was a group of engineers, and you asked how you could make their lives easier, and you came up with this spec, which codified a relationship and kind of enforced expectations. In some ways, that's a little bit of what you maybe saw in this situation: here's a bunch of engineers, and there's no expectations,
A
there's no system, there's no discipline around the work they're doing.
B
Yeah.
B
So, you know, we're definitely trying to do the same thing here, and an open source community is a way to engage with that.
A
So I'm curious. We haven't quite talked about how you built Dagster yet, but we now have your understanding of the need and how you're engaging with folks. As you started forming this idea of Dagster, was it obvious from day one that this was going to be an open source project, that you had to build a community? Or how did you go through that process of thinking?
B
B
B
We wanted to install with pip, and it's normative that those are open source. The code is in the developer's stack, like their physical computation stack, so there are very practical reasons. And then I just like open source communities. I like the vibe that comes out of them. I like the kind of deep connection that people have with open source communities. But I'm no open source zealot. I think it's perfectly acceptable to build proprietary technology that supports open source, and, yeah,
B
we will, you know, that is kind of on our roadmap as well. But what you're really trying to do in open source, in a way, is develop an open standard.
B
So you want your API to be standardized, and that was also a big takeaway from the GraphQL experience. What was interesting about GraphQL is that we open sourced a document. That was the primary artifact that we open sourced. We open sourced a document and some JavaScript code that was only written to execute that document. It was not a production system at Facebook at all. So in the GraphQL experience, the most important thing was not software. The most important thing was this document we open sourced.
B
And I don't want to speak for Lee and Dan, the co-creators, but if someone implemented a proprietary service behind that document, go for it. If it delivers value to developers, good for them. So yeah, I'm not an open source zealot in any sort of way, but I truly enjoy working in those communities, and it lends itself to the type of problems that I want to work on.
A
Yeah. So you're knee-deep with these folks and their data problems. How soon do the beginnings of Dagster emerge?
B
So I originally was calling the system, and I started to prototype some ideas that were totally disconnected from Dagster, maybe summer 2017. Then I honed in on orchestration around early 2018 and was actually prototyping stuff. I was talking to Abe Gong a lot about this. He was the founder of Superconductive and the creator of Great Expectations. So I was getting lots of context in the data space through him, and then kind of launched Dagster.
B
I raised some money in May 2018 just to do exploratory work, and, well, actually, pardon me, I didn't hire any full-time folks until early 2019. So I was around building some prototypes, working with a contractor, exploring ideas, and then the project really started to take off when we did our more public rollout in mid-2019.
B
We wrote a Medium article and started getting traction, and we started working on getting early design partners then, and it's been kind of off to the races since then.
A
Yeah, and it was probably fairly clear in the beginning that there would be some workflow element to this, because that's the work that needs to get done, and then other elements kind of emerged to support the workflow piece.
B
Totally.
B
So the initial insight, which still holds true today, concerns the DAGs that people build and host. The incumbent system, which I like to call adjacent to us because we have different views of what these systems are, is Airflow. And when I've seen an Airflow installation with thousands of DAGs and tens of thousands of tasks, I don't see a bunch of graphs.
B
I see a full application. Yeah, just in the same way that, in front end 10 years ago, people would say, "Oh, these are just some scripts," and then it took the folks behind React and those tools to be like, "Actually, no, these are full applications and they need a full software engineering process."
B
B
But it became clear that, in order to accomplish everything we wanted to accomplish, we needed to do more vertical integration and take over more of the infrastructure. So that's been a really interesting journey, to make it into a more full-fledged, full-stack system.
A
Yeah, and it requires a little more resources, but at the same time you can deliver maybe a better user experience. I imagine that's what you're getting at: you try the programming model on top of existing stuff, and it never quite solves the problem in the way you wanted.
B
You need vertical integration for the scheduler to get the reliability that you want, and we also wanted to enable fundamentally new run-scheduling capabilities. We're enabling not just time-based schedules, like most systems do. We're really investing heavily in event-based schedules, so runs that kick off because something happened in the world other than time proceeding. In order to implement those features in the way we wanted, we needed to take more control.
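The difference between a time-based schedule and an event-based one can be illustrated with a minimal sketch. This is not Dagster's API: the `EventSensor` class, its `poll` and `launch` callbacks, and the file names are all hypothetical, chosen only to show a run being launched because something new appeared in the world rather than because the clock advanced.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class EventSensor:
    """Hypothetical event-based trigger (an illustration, not Dagster's API)."""

    poll: Callable[[], List[str]]   # reports the event keys visible right now
    launch: Callable[[str], None]   # starts one run for one event key
    seen: Set[str] = field(default_factory=set)

    def tick(self) -> List[str]:
        """Check the outside world once and launch runs only for unseen events."""
        launched = []
        for key in self.poll():
            if key not in self.seen:
                self.seen.add(key)
                self.launch(key)
                launched.append(key)
        return launched


# A stand-in for "something happened in the world": files landing in a folder.
landing_zone = ["orders_jan.csv"]
runs: List[str] = []
sensor = EventSensor(poll=lambda: list(landing_zone), launch=runs.append)

first = sensor.tick()    # a new file is visible, so one run is launched
second = sensor.tick()   # time has passed but nothing new arrived: no run
landing_zone.append("orders_feb.csv")
third = sensor.tick()    # the new file triggers exactly one more run
```

A time-based schedule would have launched a run on every tick. Here the second tick launches nothing, because the trigger depends on remembered state about which events have already been handled, which is part of why this kind of feature pushes logic into the scheduler itself.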
A
You know, when people want to go to their tables and run queries, the things that keep those tables up to date are these tasks and workflows and processes. But increasingly, as you point out, with the event triggering they're intertwined into much more than just the BI system or the data warehouse. They're plumbing everything to everywhere.
B
In our view, yeah. There's been some consolidation around cloud data warehouses for sure, but any data processing or data movement has to be managed by one of these systems. That encompasses data warehousing, yes, but it also encompasses moving data from any source to any other source, and there's a very heterogeneous set of databases. You might want to move data to a time series database, you want to move data back into SaaS apps, you're moving it out of SaaS apps. Then there's the entire ML space, where you're producing models.
B
Those models in turn affect the behavior of the pipelines, which produce further models. Once you put a tool like this in the hands of developers, the use cases expand as well, because so many computations can be modeled as directed graphs.
B
So we just see the purview and scope of these systems expanding. The workflow engines or orchestrators are effectively the circulatory system of data in an enterprise, and in the modern world that means they're the circulatory system of the business itself. These data platforms in these companies are utterly critical, incredibly sophisticated, and undertooled, in our view.
A
A
You know, most of compute before now, I guess, was apps, with transactional databases behind them. There was a time when Hadoop emerged and people got excited about big data, maybe for the first time: let's store all the things and then we can query them. But a lot of those promises went unfulfilled. And now, with machine learning, I guess what I'm pointing to is that the compute of old was largely serving apps and running transactional databases, and the compute of the future feels like it's
A
increasingly all data. The primary workload of cloud is now data, whether that's all the machine learning stuff we're doing or just all the plumbing to move data from where it's captured to where it needs to be consumed. And so I feel like you saw these problems and fixed them, but at the same time you're riding this wave as the world explodes in data and data becomes a primary workload.
B
I mean, apps are gonna be important, and they're still getting a lot of workloads. But yeah, I see the same thing in terms of the importance of, and investment in, data infrastructure and data processing more broadly being just a massive, massive trend.
B
I mean, you see that in terms of investment activity, which is out of control, as well as the amount of investment that companies are making into these tools. And you also see it in the performance of these companies in public markets. Snowflake famously had a blockbuster IPO, but the metric that people are really wowed by, and I think rightfully so, is what's called net dollar retention, which effectively means:
B
If you kept the exact same customers and held them a year, how much more revenue would you get out of them, including churn? Theirs, I believe, is 160% or so, which means that even if they stopped acquiring new customers, the business would still be growing 60% year over year, which is completely wild. And the reason why it's an indication is that Snowflake is usage based, which means the more you use the tool, the more you pay. So the fact that these data companies have such high NDRs is another metric
B
that indicates that usage of these systems is just exploding. So you can feel it anecdotally when you talk to engineers, and then you can see it verifiably in metrics in private and public markets.
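The net dollar retention arithmetic described above can be worked through with a small sketch. The customer names and revenue figures are made up for illustration (chosen to reproduce the roughly 160% mentioned in the conversation), and the function name is hypothetical. The idea is to take only the customers that existed a year ago, count churned customers as zero, and divide their revenue now by their revenue then.

```python
from typing import Dict


def net_dollar_retention(then: Dict[str, float], now: Dict[str, float]) -> float:
    """Revenue today from last year's customers divided by their revenue last year.

    Customers acquired since then are ignored; churned customers count as 0.
    """
    revenue_then = sum(then.values())
    if revenue_then <= 0:
        raise ValueError("the starting cohort must have positive revenue")
    revenue_now = sum(now.get(customer, 0.0) for customer in then)
    return revenue_now / revenue_then


# Illustrative figures only.
last_year = {"acme": 100.0, "globex": 50.0, "initech": 50.0}
this_year = {
    "acme": 220.0,      # usage grew, so the bill grew
    "globex": 100.0,
    # "initech" churned and contributes nothing
    "newco": 75.0,      # new customer: excluded from the metric
}

ndr = net_dollar_retention(last_year, this_year)
growth_without_new_customers = ndr - 1.0

print(f"NDR: {ndr:.0%}")
print(f"Growth from existing customers alone: {growth_without_new_customers:.0%}")
```

With these numbers the cohort goes from 200 to 320, so the ratio is 1.6: the existing customer base alone grows 60% even though one customer churned and new customers are excluded.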
A
Got it. Well, I digressed a little bit. So, as we tell your story, you've now got a workflow-plus kind of system coming together, and you've been engaging with these partners that you've been working with for some time. What are some of the learnings? What are some of the things you see in the field as people engage with Dagster for the first time?
A
B
Well, I think the first thing that people feel, the gut punch so to speak, is that the spin-up process is effectively instantaneous. You write six lines of code and you type one thing in your command line, and you have our full graphical tool loaded up and you can start executing your pipelines immediately. That's just a dramatically faster and easier spin-up experience than anything, and it doesn't just save time.
B
It's also kind of an indication of the value proposition of the system: that we've really thought about local development and ensuring that individual developers and practitioners who are using the orchestrator and developing these graphs feel empowered. They can just locally develop, they can do everything themselves, and we have a fast developer loop. So in our Slack, people are like, "Oh my god, I can actually execute one of my tasks in isolation and test all this business logic without pushing this to prod."
B
So I think it was right for us to focus on this spin-up and local development experience and testing, because people really feel that immediately. And then the other thing is that developers deserve well-designed, consumer-grade tooling. They live in these tools, and they're a very important constituency, so we've put a lot of time and effort into making the ergonomics of our UI very nice, and people definitely appreciate that. So that's another kind of learning.
B
I guess the lesson, I like the word lesson better than learning, is: never underestimate your developers, especially in these open source communities. It's just really exciting to engage with people and have them surprise you. Like one of our partners,
B
my team is probably like, "Can you stop talking about them?", but I love the way they use the product, is Good Eggs. They're doing all sorts of crazy stuff. My favorite is that they've actually trained ops people who work on their warehouse floor to use our tool to manually kick off computations and retry them, because they have people manually entering stuff into Google spreadsheets, and then they want to ingest that. The data platform team did some work using Dagster to make it so that they could,
B
you know, correct errors in the Google Sheets and kick off pipelines, and then report back to the people: "Hey, you have to fix this and then re-kick it off." And they can do that with no intervention from the data platform team. So we're able to enable self-service operations across much broader use cases than I would have anticipated. And I think it's kind of related to that.
B
The wide spectrum of people who use these data platforms in the enterprise is crazy. So this is nothing about Dagster, but a story that sticks out, or stands out, is from Netflix. They have an internal notebooking platform based on a technology called Papermill, and they actually have, I always need to double check this, I probably will email him, Matt Seal, he's now the CTO of Noteable, because I quote this stat and I almost don't believe it, but they have something like 15 of all
A
B
A
To some degree, I mean, in part, everyone's job is to make sense of data and then show their work. They validate their work through data.
B
Yeah, so it's interesting. In terms of software systems I've worked on, it's this very interesting mix: some of the users require deep technical complexity and a deep engagement with the product, but they are also serving entire wide swaths of other constituents of the companies who might be less technical but still need to engage with these systems in some sort of coherent manner that's not just emailing around Excel spreadsheets or whatever.
B
So, you know, the heterogeneity of the populations we serve has also been an interesting takeaway.
A
Yeah, maybe say a few words on that. I think some of us are used to organizations where there's a central team that runs all the production data workflows. Data scientists might generate things and hand them over, or maybe someone on the BI team might write new queries, but if they want them to be scheduled or to have some maintenance, they hand them over to the central data team.
A
Is this how things still operate with Dagster, or how do those roles change?
B
I mean, yeah, people use the tools in different ways, but the early cohorts of the user base who it really clicked with were people that described themselves as data platform engineers, or who in reality act as data platform engineers. Their entire mission in life is to make it so that practitioners can self-serve their own data products effectively. So there's this platform relationship, because what you don't want is this old world.
B
We had data scientists, and they would prototype stuff in notebooks and then throw it over the wall to data engineers, who would then productionize it. This is very similar to the artificial distinction 20 years ago between dev and test, where you'd have developers who had no responsibility for testing their code. Other humans would do that. It was this, now in retrospect, totally insane system, and in 20 years this whole thing where data scientists do something and throw it over the wall
B
B
So the people who really, I think, embrace Dagster and are getting a lot of value from it are either people or teams where those two jobs are considered kind of two separate things, but, yeah, the world should be moving to empowering stakeholders to own their data products end to end. Just as an example, another tool which I think fully embraces this, which is kind of an open source hit in data, is dbt, which, instead of saying, like,
B
B
You take analysts and you put them in a software engineering process with a tool that intuitively makes sense to them. As a result, one, it's great for those people's careers, because they become engineers and they can control their own destiny in a way, but it also just makes the systems work better. Typically, you have a layer of people in a dbt workflow who are owning their data products in the data warehouse end to end, and then they're operationalized within the context of a broad data platform.
B
So in a lot of ways, one of our users kind of told us, "What dbt did for our SQL, Dagster did for our Python," for the users who use Python. And we consider ourselves very aligned in values with dbt, so we took that as a very strong compliment.
A
No, it makes a lot of sense, and I think, for those who know dbt, that makes a lot more sense and helps to understand the situation. You said something earlier that I've been thinking about since: that these data products, these data workflows, become an application.
A
What does a Dagster app span? Given that all these processes are sometimes intertwined with other processes throughout an organization, what are the bounds of a Dagster application? Is that the whole organization, or do you find they're discrete apps within organizations?
B
That's a great
B
question, and we're still figuring out the right way to articulate this, so I guess I'll spin-test this. Effectively, we define a data application as graphs of computation that consume and produce data assets. That's what they do. We call it a pipeline because it's more familiar to people, but what we think of as the application is this limited set of pipelines within a single team that consume and produce data.
B
They want loose coupling between those things, so we think of a data platform as a graph of data applications, and those themselves are then graphs of more granular compute that people structure. That's how we think of it. But we don't want to try to redefine every single noun that people know as a data application.
B
A
But yet, in many ways, this is how the world has gone with applications. We ran into this when I was at Google Cloud: defining what an application was gets a little tricky today, because microservices rely on other services throughout an organization, such that, yeah,
A
you just have a whole slew of interconnected microservices. I think we ended up just calling them services in the end, because "applications" was a tricky word to use. So the analogy continues, and it reinforces the fact that you're onto something, in that you're following a similar path as the application world.
B
Yeah, well, I hope it makes sense to people. Defining terminology is always hard when you're in a domain that's rapidly reinventing itself. There's often this kind of Cambrian explosion of terms that can be difficult for customers, users, and the companies themselves to navigate. It sounds like you experienced a similar challenge at Google.
A
Yeah, yeah. So Nick, as we near the end of our conversation, take us to where the project is today and any things that we should be looking forward to, and wrap with how people can get involved.
B
Totally.
B
So if you want to get involved, we're an open source project, so we have a GitHub, you can sign into our Slack, and we have a really high-quality doc site. Those are the three primary touch points.
B
B
So, although we have nothing to announce there, it's certainly top of mind, shall we say. And that's actually been interesting to see as well, because what we see increasingly is that having a hosted product and a commercial solution is a prerequisite for adoption, rather than people self-hosting something and it being like pulling teeth to get them to pay a company money to do stuff, which is an interesting shift in the industry. Three or four years ago,
B
that was not true at all, so that's interesting. One thing we're really excited about, and are really doubling down on, is that we're the only orchestrator that has integrated data observability capabilities, and we think this is a really powerful
B
element of the system. The entire reason why the system exists is to produce data products, and to us it didn't make sense for an orchestrator not to be aware of the data products it's producing. It enables incredibly simple, intuitive use cases that you can't do anywhere else. For example, we have pipelines in the system that are aware that they're producing a specific database table or data lake table somewhere, while the people who consume that table generally have no idea what pipeline produces it.
B
They don't know, they don't care, no one cares, but they still need to know who to talk to or what's happening with this thing if they see it go wrong. So we have this simple tool called an asset catalog. You can go into Dagster and type in the name of your table. It pops up, you can click on it, and you see graphs about it.
B
You can see what pipeline produced it and when it was last produced. You can see its lineage, meaning what assets it came from. That's very straightforward for us to encode, because we already have the dependency graph of computation, so it's actually very straightforward to derive the asset lineage graph. This is one of the capabilities that has really enabled less traditional stakeholders to engage in the system, because you can even kind of an op
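The claim that asset lineage falls out of the computation dependency graph can be sketched in a few lines. This is an illustration under assumptions, not how Dagster implements its asset catalog: the `STEPS` structure, the step and table names, and both function names are hypothetical. Each step declares the assets it reads and writes, and lineage is derived by walking those declarations backwards.

```python
from typing import Dict, List, Set

# Hypothetical pipeline: each step declares the assets it reads and writes.
STEPS: Dict[str, Dict[str, List[str]]] = {
    "ingest_orders": {"reads": [], "writes": ["raw_orders"]},
    "clean_orders": {"reads": ["raw_orders"], "writes": ["orders"]},
    "daily_revenue": {"reads": ["orders"], "writes": ["revenue_by_day"]},
}


def producer_of(steps: Dict[str, Dict[str, List[str]]]) -> Dict[str, str]:
    """Map each asset to the step that writes it ("what pipeline produced it")."""
    return {asset: name for name, spec in steps.items() for asset in spec["writes"]}


def upstream_assets(steps: Dict[str, Dict[str, List[str]]], asset: str) -> Set[str]:
    """Every asset the given asset was derived from, directly or transitively."""
    # Direct parents: whatever the producing step read in order to write the asset.
    parents: Dict[str, Set[str]] = {}
    for spec in steps.values():
        for written in spec["writes"]:
            parents.setdefault(written, set()).update(spec["reads"])

    # Depth-first walk over the parent links to collect the full lineage.
    lineage: Set[str] = set()
    stack = [asset]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in lineage:
                lineage.add(parent)
                stack.append(parent)
    return lineage


produced_by = producer_of(STEPS)
revenue_lineage = upstream_assets(STEPS, "revenue_by_day")
```

Someone who only knows the table name `revenue_by_day` can look up `produced_by["revenue_by_day"]` to find the responsible step and `revenue_lineage` to see which upstream tables it depends on, without reading any pipeline code.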
B
B
They care about their assets, so we're very excited about these metadata capabilities built on this. And we've actually organized the company around the dynamic I told you about, which is that there are people who are responsible for data assets and there are people responsible for building the platforms that support those people. So we have a platform team and a practitioner team, and that is their goal and their mission.
A
Fantastic. Yeah, you mentioned something there that sparked a thought.
A
I noticed from the beginning of Dagster that you played nice with other tools, which was important because the data landscape has so many random things talking to other things. But with a lot of the value propositions you're offering, the more things I do in Dagster, the more metadata I get, the more my assets make sense, the more lineage, and these benefits kind of compound.
A
I imagine that's the growth path in an organization: a team adds you, it complements the existing things, the team's pretty excited because it works nicely, and then folks discover that the more they do in Dagster, the more they get these compounding benefits of metadata and lineage.
B
Yeah, totally. We're still early on the journey, right? There's a lot of stuff in Dagster, and we're always working on what we call the progressive disclosure problem: introducing these things sequentially in a place that makes sense. I also want to make clear that, even though we have capabilities in the asset domain, it's not like we're trying to take over that world. We call our asset catalog an operational asset catalog, meaning very specifically the way the assets relate to the orchestrator.
B
We fully expect to integrate with tools like Amundsen and DataHub, which are entirely different products, which have way more complex ontologies than we want to support. Likewise, we're aware of data quality, but we have no interest in competing with the likes of Great Expectations or Monte Carlo, et cetera. We want to be able to integrate with these tools.
B
B
For example, data quality is interesting, right? Data quality has to play with an orchestrator, because you have to schedule the data quality test and the computation, and there's actually a lot of value in the orchestrator knowing about that. But, for example, Great Expectations, which is a tool I know well, has an entire world of its own: they have their own DSL for specifying all these declarative data quality tests, and the SaaS tool that they're working on is built
B
on top of that. And that's way outside the scope of what we want to work on. So we want to plug into that, have a great place for them to notch into, and give users a ton of control over how they use a tool like Great Expectations. We want to expose the capabilities and do what we think makes sense to vertically integrate with an orchestration system, but not supplant and take over the entire world.
A
No, makes a lot of sense. Fantastic. Nick, thank you so much for the discussion today. It's been a whirlwind of experience in the last few years since we met. Appreciate you coming on the show.
B
Totally.