From YouTube: Fireside Chat: Nick Schrock (Founder & CEO, Elementl) with Matt Turck (Partner, FirstMark)
Description
Nick Schrock, Founder and CEO of Elementl, spoke with FirstMark's Matt Turck in a virtual fireside chat at Data Driven NYC in June 2021. They spoke about Dagster, open source, and much more.
Data Driven NYC is a monthly event covering Big Data and data-driven products and startups, hosted by Matt Turck, partner at FirstMark Capital.
A
Nick Schrock is the founder of Elementl, the company behind the popular open source project Dagster. Prior to Elementl, Nick was a principal engineer and director at Facebook from 2009 to 2017, where he founded the product infrastructure team and co-created GraphQL, another very popular open source project.
A
Welcome, Nick.

B
Thanks for having me.

A
Very good. So I'd love to start with a discussion of the current state of the data ecosystem, basically the premise for the creation of Dagster and then Elementl. What were the problems that you saw that you wanted to address?
B
Yeah, that's a great question. It really is a fundamental thing. If you go to, for example, a conference about machine learning or data science, you often hear people say, "I spend 90% of my time data cleaning and 10% of my time doing my job," and that represents a fundamental problem in the entire ecosystem: people feel like they're not doing what they're supposed to be doing, and they express that in terms of that pain.
B
But it's actually often a more complex issue than that, and the analogy that I had in my head was to front-end engineering in the early 2010s, where what you would hear people say is, "I spend 90% of my time fighting the browser and 10% of my time doing my job." For those of you who knew any web developers in 2010, if you mentioned IE6,
B
you might get an intensely emotional reaction in terms of browsers, for example. That kind of stuff doesn't happen anymore, and not just because IE6 doesn't exist or because the browsers got better. It was really a holistic ecosystem problem, and it required tooling on many different dimensions.
A
And just to double-click on that: I think you've said that if you look at the evolution of the data space, a lot of the early wins over the last few years have been around managing scale, and that we're now switching to a phase where the primary challenges are higher in the stack, around productivity, testing, integration. Is that fair?
B
I think that's fair. You can think of it sort of like Maslow's hierarchy of needs. When the amount of data started to explode, there was a very critical need: you couldn't even process it on a technical level, you could not scale out compute, and that's why in the big data revolution you called it "big" data. You started with Hadoop, then Spark, and now the cloud data warehouse.
B
So now, if you have petabytes of data, you are able to process that efficiently at scale, but the problems are higher up on Maslow's hierarchy, so to speak. Now it's: okay, we can actually process the data. Does it have high quality? Are the people who actually build the computations that process that data productive? Can you track that data throughout the ecosystem? And so on and so forth.
B
There was an amazing engineering achievement throughout the 2010s, which was solving these massive, purely technical scale problems. But now we're talking about organizational scale, dealing with complexity, dealing with developer productivity, and any number of other dimensions.
B
Yes, that's the favorite question any engineer gets: explaining what they do without confusing the hell out of everyone. The way I would explain it is that these data systems involve multiple types of people. Often you would have, say, a data engineer, an analytics engineer, and a data scientist. They use different tools and they have different skill sets, but in these systems all data has to come from somewhere and go somewhere.
B
What would literally happen is that station one in the factory would take usually two hours to complete its task, and then the next person would start after that, but there was no way to actually manage those dependencies. So, literally, it would say: okay, it usually finishes in two hours, and the first thing started at 1 AM, so I'm going to start every day at 4 AM. That's the situation in the factory analogy.
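The difference between guessing at start times and actually managing dependencies can be sketched in a few lines of Python. This is a toy illustration of the idea, not any real scheduler's API; the station names and the cron-style schedule are invented. The cron-style approach hard-codes a 4 AM start and breaks on any day when station one runs long, while the dependency-aware version simply runs each station the moment its upstream finishes.

```python
# Toy sketch: cron-style scheduling vs. dependency-aware execution.
# All names and durations here are invented for illustration.

def run_station(name, upstream_output=None):
    """Pretend to run one factory station and return its output."""
    return f"{name}({upstream_output})" if upstream_output else name

# Cron-style: station_two just starts at a fixed time and *hopes* that
# station_one has finished. If station_one takes three hours instead of
# two, station_two reads incomplete data.
CRON_SCHEDULE = {"station_one": "01:00", "station_two": "04:00"}

# Dependency-aware: the graph encodes "station_two needs station_one",
# so each station starts exactly when its inputs are ready.
DEPENDENCIES = {
    "station_one": [],
    "station_two": ["station_one"],
    "station_three": ["station_two"],
}

def run_in_dependency_order(deps):
    done, outputs = set(), {}
    while len(done) < len(deps):
        for station, upstream in deps.items():
            if station not in done and all(u in done for u in upstream):
                inp = outputs[upstream[0]] if upstream else None
                outputs[station] = run_station(station, inp)
                done.add(station)
    return outputs

print(run_in_dependency_order(DEPENDENCIES)["station_three"])
```

The dependency-aware runner never needs a clock at all: the edges of the graph are the schedule.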
A
B
Right. So I gave the example where there's one assembly line and three stations. Well, that's not the way it ends up working. You end up having thousands of assembly lines, sometimes with thousands of stations, and they have all sorts of crazy interconnections. And what's even more interesting about it is that the interaction I described between one person and the next.
B
It happens both on a macro scale, meaning teams interrelate in the same way, but also the individual practitioner will build their own little assembly line, because it makes sense for them to do it. As they're figuring out what their data is, they might say: oh, this is an intermediate data product that will be generally useful, and I kind of want to have a checkpoint right there in the process.
B
So what ends up happening is that these systems explode in complexity, and you need to be able to take the assembly line offline, test it on test data, and put it back. If something goes wrong, you need to halt the assembly line, maybe start it from a certain point, move things around. And then it also makes sense for this assembly line to be aware of and track the things that are actually coming out of it. So: okay, I have this widget over here; it came from this previous intermediate, which came from this one, etc.
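The "this widget came from this intermediate, which came from this" idea is provenance tracking. A minimal sketch, with invented artifact names and no real library, of how an assembly line could record where each artifact came from so that any output can be walked backwards to its sources:

```python
# Toy lineage tracker: each produced artifact records its direct parent,
# so any output can be traced back to its raw input. Names are invented.

lineage = {}  # artifact name -> parent artifact name (None for raw inputs)

def produce(name, parent=None):
    """Register an artifact and remember which artifact it came from."""
    lineage[name] = parent
    return name

def trace_back(artifact):
    """Walk the recorded lineage from an artifact back to its raw input."""
    chain = [artifact]
    while lineage.get(chain[-1]) is not None:
        chain.append(lineage[chain[-1]])
    return chain

raw = produce("raw_events")
intermediate = produce("cleaned_events", parent=raw)
widget = produce("daily_report", parent=intermediate)

print(trace_back(widget))  # ['daily_report', 'cleaned_events', 'raw_events']
```

A real orchestrator records much richer metadata per edge, but the back-link structure is the same.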
B
So we think this orchestrator is really the central leverage point, the thing that makes sense to be the core of the data platform. It's also the place where people actually build their machines, their software programs, to continue with the factory analogy. This is where they actually build their machines, and so it's critical to think of it not just as an operational tool, but as a place where people can productively work.
A
Great. And so when did you start Dagster? Tell us about the project.
B
Yeah, so I formed the company in kind of an exploratory sense in 2018, but we didn't really start working on it in earnest until late 2018, early 2019, and then we launched the project publicly in the summer of 2019.
B
So I guess the question was: how did it start? I was looking around and exploring, taking an interest in the space, and talking a lot to people.
B
I talked a lot with Abe, one of the next panelists, during this time, and that's where we really honed in on orchestration as a critical part. So I just started building, experimenting with stuff, building out in the open, getting feedback, working with people, and that was really the genesis of the project.
B
And the genesis of the project was really the insight that people think of data sets as physical things, right, like a table in a database. But in the modern world, where you're really applying software engineering processes to data, all that data ends up being computed. And so what we thought is that you should move the primary focus of energy from the produced data set to the process that produces it, because maybe you throw away that data set and you need to recompute it, reproduce it.
B
It's really the computation which matters; that's the primary focus. And we thought that you could really make the focus of the orchestrator this sort of virtual data set concept, and that was kind of the genesis of the project.
A
I'd love to continue down that path and maybe double-click on how Dagster works, some of the high-level concepts. There's, for example, the concept of a solid, which is, I think, your atomic unit. Do you want to talk about that and how Dagster works in general?
B
Sure. So Dagster is an open source Python project. If you're a programmer, you can just type a simple command, pip install dagster, and you're off to the races. You write a little bit of code, and what you effectively do is build these functions, which we call solids, that define a computation, meaning a step in the factory, to keep with that analogy. Then you can construct graphs out of those, and the moment you use our APIs and structure your code this way, you immediately have access to all sorts of tooling, without any infrastructure, on your laptop; you don't need to deploy anything.
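Dagster's real decorator names have changed across releases, so rather than quote its API from memory, here is a toy sketch of the concept being described: a "solid" is just a plain Python function tagged so that tooling can discover it, and a graph is those functions wired output-to-input. None of these names are Dagster's; they are invented for illustration.

```python
# Toy sketch of the solids-and-graphs idea (NOT Dagster's actual API).

def solid(fn):
    """Mark a plain function as a step in a computation graph."""
    fn.is_solid = True  # tag the function so tooling could discover it
    return fn

@solid
def extract():
    return [3, 1, 2]

@solid
def transform(rows):
    return sorted(rows)

@solid
def load(rows):
    return f"loaded {len(rows)} rows"

def run_graph():
    # The graph is ordinary function composition: outputs flow to inputs,
    # which is what lets a tool draw, schedule, and test the computation.
    return load(transform(extract()))

print(run_graph())  # loaded 3 rows
```

Because each step is an ordinary function, any step can be run in isolation against test data, which is the local dev-and-test loop described below.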
B
And then you can also use that tooling as almost like a local IDE for these graphs. One of the things that we really focus on with Dagster is being able to execute these computations, these programs, in different environments: you can develop on your laptop and then deploy to a piece of infrastructure without having to change the core business logic. That's something we really focused on, and it's a critical piece of this. So you develop locally, you have this fast local development loop, you can test things, and then you can deploy to any infrastructure. From that point, you're scheduling your computations, so you're saying, for example, I want to run this every day.
B
Or, in a first-class way, I want to run this whenever something upstream changes. And then you have all sorts of what we consider consumer-grade tools to monitor and observe those computations. You can track the process as it unfolds, using a live Gantt chart viewer, for example, and we also track the assets that are produced by those computations and can back-link and say:
B
Oh, this thing came from this computation, which is extremely useful when you're actually figuring out what's going on in these systems. So we take this through the entire process, thinking about: okay, what's the fast develop-and-test lifecycle? There's a huge opportunity there for massive improvements.
B
How do you deploy it reliably and allow multiple teams to use common infrastructure? That's a critical thing, since we think the orchestrator is the center of the data platform. And then, lastly, monitoring and observing those things, both the computations and the produced assets, an asset being a table, an ML model, any physical materialization of something.
A
Great. Who's a good customer or type of user for Dagster? Do you need to have a certain type of infrastructure in place? Do you need to be a certain size? Do you need a specific type of talent on the team? Who's a good profile?
B
Yeah, so we've really found two classes of user who really gravitate toward the system.
B
One is what we consider an emerging title, the data platform engineer. A lot of people self-title that way, and a lot of data engineers act as data platform engineers. What they see is that, hey, inside every company there's a data platform, whether they acknowledge it or not, and this data platform is where all these people come together: the data engineers can work with the data scientists, and all this stuff can execute on time.
B
You can have a single point of management for all the important data and all the heterogeneous tools. So that user, maybe they'll start out and say: okay, all we need is an ingest tool like Fivetran.
A
B
With what's called the modern data stack, once they need to do anything outside of that, they need an orchestrator, and they want an orchestrator that's in line with the values of those tools. A lot of those users have really gravitated towards Dagster. As one of our users said: what dbt did for our SQL, Dagster did for our Python, in very concrete ways. And I would say the second type of user that we really find starting to gravitate towards this
B
are people building end-to-end model training pipelines. They want to work in Python, they want to use tools like pandas and scikit-learn, they want a fast development workflow, and they want an orchestrator because they need it, and this tool really speaks to them. Often they have to roll their own infrastructure.
B
And one of the things that Dagster does is attempt to thoughtfully think about the interface between what we call practitioners, those who are responsible for the production of data assets, and infrastructure folks. Those are kind of two jobs, and sometimes one human is doing both, and the way to make that manageable is to have nice software abstractions that deal with it.
A
Great, thanks. Help us understand how you position Dagster in the orchestrator segment. Like all categories in the data world, there are other folks: we had Jeremiah from Prefect, for example, at a prior event; there's Airflow; and then there are the historical ones like Luigi; we had Kedro at the event as well. So for folks, what are the bright lines in terms of thinking about Dagster in comparison to some of those others?
B
It's a great question, and as you know, positioning is always an evolving art. But I think the primary difference, and I'll focus on Airflow and Prefect and start with those, Airflow being definitely the dominant incumbent in the space as it's traditionally defined, is that they don't consider the full lifecycle of developing data products.
B
They view their mandate as very narrow: purely the operational use cases of ordering, this comes after that, and that comes after that, plus the operational complexity in that. So you need to know how to retry things, and so on and so forth.
But we think the graphs that the orchestrator encodes are, one, complex enough that they need a fully thought-out local development lifecycle, and two, in some ways the structure of the applications themselves, especially the ones that are written in Python. So we really think about the dev and test lifecycle. We also want to be data-aware, and that means that we have data dependencies that are encoded, meaning that not only do we order the machines, we also know that this input comes in from upstream and this output comes out, without getting into details.
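The distinction between merely ordering the machines and encoding data dependencies can be made concrete. In an ordering-only model the scheduler knows only "B runs after A", and any data must travel through a side channel the framework can't see; in a data-aware model the edge itself carries A's output into B. A toy sketch, with invented names and no real library:

```python
# Ordering-only: the edge says "run b after a" but carries no data;
# the tasks smuggle state through an external side channel that the
# scheduler knows nothing about.
side_channel = {}

def a_task():
    side_channel["rows"] = [1, 2, 3]

def b_task():
    return sum(side_channel["rows"])  # implicit, untracked dependency

# Data-aware: the dependency *is* the data. The framework can now see
# the edge, render lineage, and test b_solid with a fake input.
def a_solid():
    return [1, 2, 3]

def b_solid(rows):
    return sum(rows)

a_task()
ordering_result = b_task()
data_aware_result = b_solid(a_solid())
assert ordering_result == data_aware_result == 6
```

Both styles compute the same number, but only the second one lets tooling answer "where did this value come from?" without reading the function bodies.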
B
This is, we believe, a much more natural programming model for practitioners. Dagster also really embraces the fact that these technologies are inherently multi-tenant, meaning that multiple teams want to be able to deploy to common shared infrastructure, and Airflow just wasn't designed like that: all the teams share a Python environment.
B
These are all very technical things, but effectively one team can push up one mistake and bring down the entire system, which seems like a bad thing. And then Airflow is not data-aware. Prefect's lineage, I'd say, is very direct: Jeremiah was a primary contributor to Airflow, wanted to make some changes, wasn't able to do so, and went and started Prefect. And just like Airflow, Prefect views itself as a very strictly operational tool.
B
Their framing is about positive and negative engineering, right: we take care of the negative engineering, which is effectively ordering those computations, retrying things, and taking care of errors, which is effectively new words to describe exactly what Airflow's goal is. And then they also describe their system as an insurance company.
B
What's notable about that is that insurance companies don't make you more productive. I mean, I've never met an insurance company that makes you happier, for certain. And it doesn't really speak to the full end-to-end process: it plugs into an existing system, deals with this operational complexity, and that's its only remit. We believe that orchestration is so central that you need to think of it in terms of the full lifecycle, and especially to think of it as a productivity tool.
A
Great, thank you. So what's next? You guys are a thriving startup. I know you did some integrations with dbt and with Great Expectations, who we're going to speak with in a few minutes. What's next on the roadmap for the next year or so?
B
Yeah, a year is a long time as a startup, and you can never predict the future. I think we'll get to this, but Elementl is a venture-backed commercial company, so at some point we will need a revenue model, a business model. So, spoiler alert, we are working on that aspect of the business, but I can't talk about it in too much detail.
B
On the open source roadmap, I think there are two things.
B
One is that we've been in what I'll call an open, applied R&D phase of the company, where we've been really working with a set of targeted design partners and not hyper-focused on growing adoption too quickly, because we didn't want to have too many partners while we knew the technology was still changing a bit. We're about to land some changes that will really set us up for kind of a pre-1.0 release.
B
We've learned a ton from the last year and a half of work, and we really feel like we're settling on a set of abstractions and concepts that will serve as the foundation of the technology for years and years to come. So in the short term, that is really our focus. And beyond that, once we have that stable core, we also want to focus on expanding the set of stakeholders who can interact with the system, because we've seen a lot of early users do this really effectively, where non-technical ops folks are able to self-serve workflows without any intervention from the data platform team, which is just incredibly valuable, and we expect to double down on things like that. So that's kind of what we're thinking, in very broad terms.
A
Very, very good, okay. Well, as we're getting close to the end of the allocated time, why don't we finish with a couple of rapid-fire questions which have nothing to do with Dagster. So, first question: outside of Dagster, what is something in the data ecosystem, whether that's a tool or a project or a company, that you just love and think is the coolest thing?
B
I'll give probably a common answer, which is dbt. It's really taking over a big part of the stack, and I've known that team since 2018, when they gave a conference talk and I ran up to them because I was super excited about what they were saying, because I felt like we shared a lot of alignment on values in terms of the way you can structure these things.
B
You have data dependencies and all this stuff, but they were very specialized for SQL. And what's really cool is, I think, they're really on the forefront of saying: hey, the way to do this is not to try to remove analysts from the equation; the way to do this is to bring engineering processes into their lives and, so to speak, upskill them. That not only makes them more productive, it actually allows them to re-title themselves and be sufficiently differentiated that they actually make more money.
B
They associate their careers with the technology, and I really admire technologies like that. I feel like my previous work in front end is like that, where GraphQL and its kind of sibling team, React, show up on people's resumes, and people really invest in them in their careers. So I really like technologies that aren't just more efficient but help with careers. And then, actually...
A
You were only going to say one? Oh, okay. And last question, one minute or less: how does somebody like you learn? What are some of the data learning resources you recommend, whether that's a newsletter or a podcast or a conference, whatever comes to mind, just one or two quick ones?
B
Podcasts, totally. I really like Software Engineering Daily and the Data Engineering Podcast. And then the other thing is that you shouldn't fear reaching out to someone who's on one of these podcasts. Just email them and say: hey, I thought what you said was super interesting. As someone who's been on podcasts, if I ever get that email, I'm super excited to talk to that person. So, both of those.
A
Okay, wonderful. Look, this was great, super interesting. The popularity of Dagster and the number of times it comes up in conversation is really impressive, considering that you guys haven't been doing this for very long. So congratulations on everything you've built. Excited to see how you progress over the next few months and years, and I hope you come back soon and tell us about all the great things you've achieved.