From YouTube: Data Quality Meetup: Building Reliable Data Apps
Description
Max Gasner, co-author of Dagster, a data orchestration framework for ETL, talks about principles of building reliable data applications with Dagster. Sign up for the next Data Quality Meetup: https://bit.ly/3yiUH2H
Join our Meetup Group: https://www.meetup.com/data-quality-meetup/
Hi everybody, good morning, good afternoon, and good evening. My name is Max. I work at Elementl and I'm a core developer on Dagster, which is an open source data orchestrator. Today I'd like to share the orchestrator's perspective on the broad goal that we're all working towards: building more reliable data apps. By a data app, I mean a graph of computations that consumes and produces data assets, where the nodes in the graph and the assets they produce can be wildly heterogeneous.
These processes span persona and team boundaries, and they often involve multiple compute environments. As a consequence, everything is hard. All the pieces of the ordinary software development life cycle are hard: developing and testing data apps is hard, deploying and executing them is hard, observing their operations is hard. So what we're trying to do with Dagster is build a platform that makes it easier to work with the graph of computations and assets that makes up a modern data app, through all these stages of the application life cycle.
So how do you actually build a system that makes all this easier? I want to talk about a couple of design principles and how they cash out in practice. First of all, in order to make developing and testing easy, we need to bring some of the lessons from software engineering into the data domain. For lack of a better phrase, this is what we call functional data engineering. What it means in practice is that DAGs should be typed, so that issues with the data flowing between processes can be caught early.
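The idea of a typed DAG can be sketched in plain Python, outside any framework. This is a toy illustration of the principle, not Dagster's actual type system: each node carries type hints, and edges between nodes are validated before anything runs, so a mismatch fails fast instead of corrupting data downstream.

```python
from typing import Callable, get_type_hints

# Two nodes in a tiny graph, with typed inputs and outputs.
def extract() -> list[int]:
    return [1, 2, 3]

def transform(rows: list[int]) -> float:
    return sum(rows) / len(rows)

def check_edge(upstream: Callable, downstream: Callable, param: str) -> None:
    """Fail fast if upstream's output type doesn't match downstream's input."""
    out_type = get_type_hints(upstream).get("return")
    in_type = get_type_hints(downstream)[param]
    if out_type != in_type:
        raise TypeError(
            f"{upstream.__name__} produces {out_type}, but "
            f"{downstream.__name__} expects {in_type} for '{param}'"
        )

check_edge(extract, transform, "rows")   # ok: list[int] matches list[int]
```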
The application needs to be able to run on a really wide range of systems in order to be robust in the presence of many different user personas, tools, and needs. You need a system that isolates user code execution by design, so that if an analyst makes an error in one pipeline, it doesn't take down the production cluster; the failure is isolated to that particular pipeline. And you need the right basic primitives: for instance, a scheduling system that allows for conditional triggers and one-off execution without hacks, and a notion of execution.
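As a sketch of what a conditional trigger means, here is a minimal sensor-style loop in plain Python. The shape is invented for illustration and is not Dagster's sensor API: a poll function watches for a condition (say, new files landing) and requests exactly one run per new event, which also gives you one-off execution without cron hacks.

```python
def make_sensor(condition, launch_run):
    """Fire launch_run once per new key returned by condition()."""
    seen = set()
    def poll():
        for key in condition():
            if key not in seen:       # fire once per new item, never twice
                seen.add(key)
                launch_run(key)
    return poll

arrived = ["2024-01-01.csv"]          # e.g. files landing in a bucket
launched = []
poll = make_sensor(lambda: list(arrived), launched.append)

poll()                                # one run for the new file
poll()                                # nothing new: no duplicate run
arrived.append("2024-01-02.csv")
poll()                                # exactly one more run
print(launched)                       # ['2024-01-01.csv', '2024-01-02.csv']
```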
It's the boundaries between the testable units that are each written in different tools that we think are some of the biggest problems for the quality of data applications today. So in this example, we have dbt models and dbt tests running alongside an ingest process written in pure Python, some analysis running in notebooks, and a process that uploads some results to Slack, with all the dependencies between these processes made explicit.
"My chart looks weird; what pipeline run produced it?" Or, "what do the last five charts look like?" And finally, to answer questions like these, you need to build in a rich metadata system so that users can ask detailed questions about app operations. That way you can have, for instance, longitudinal views of data quality metrics or SQL execution times.
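A rich metadata system can be pictured as a structured event log that every step writes into, which later queries slice longitudinally across runs. The schema below is invented for illustration and is not Dagster's event log format:

```python
# Each step emits structured metadata (here, a row count and a
# duration) tagged with its run, into an append-only event log.
events: list[dict] = []

def record(run_id: str, step: str, **metadata) -> None:
    events.append({"run_id": run_id, "step": step, **metadata})

for run_id, rows in [("run-1", 1000), ("run-2", 990), ("run-3", 410)]:
    record(run_id, "ingest", row_count=rows, duration_s=1.2)

# Longitudinal view of one quality metric across runs; the drop in
# run-3 would stand out immediately on a chart of this series.
history = [e["row_count"] for e in events if e["step"] == "ingest"]
print(history)  # [1000, 990, 410]
```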
So hopefully, if we can make these three design principles real, give people a programming model that makes it easier to do the right thing when they're writing their business logic and easier to develop and test that logic, make sure that the orchestrator really is the kind of platform people need to be able to integrate with, and expose the graph as an interface that all of the personas using the data application can use to self-serve, debug, and observe their operations.
Then the difficulty of building a modern data application is going to get a little bit easier, and that's what we're working towards. Happy to take questions in the chat, and thank you so much. I hope this has piqued your interest in Dagster and, more importantly, given you some food for thought about what's coming next in data orchestration and how the orchestrator itself can help you achieve higher quality in your data systems. Thanks so much, and please be in touch if you'd like to learn more about what we're up to.