From YouTube: VMware: Monitoring Driven Development with Dagster
Description
David Laing, Staff Data Engineer at VMware, presents on monitoring-driven asset development.
See full Feb 8, 2022 Community Meeting here: https://www.youtube.com/watch?v=fYJBN6MAtbE
Yeah, so, as Sandy mentioned, I'm talking about monitoring-driven development, which is a play on the ideas of test-driven development from the agile software development movement. I think that in the data engineering space we're watching a set of paradigm shifts unfold.
If you're like me, you probably started with a project mindset: you were thinking about projects where you delivered some kind of data artifact, maybe a report, to a stakeholder.
There was always some number of features wanted by some specific date, and we all know how that goes. Success was your stakeholder accepting the data artifact that you gave them, and then you moved on to the next project. Then, I think, as the automation tools started to mature, we moved to what I call the pipelines mindset: we were basically using automation to do the manual steps that we had been doing before, but we still had something of a project mindset.
We were just doing more automation when delivering that project; we were still trying to deliver something, automatically, by a certain date. Success moved from acceptance by the stakeholder to your pipeline succeeding or not succeeding, and when you were finished, you moved on to making the next pipeline. Replay that cycle a couple of times, and I'm sure we've all had the experience where you have a whole bunch of pipelines, some of them are failing, and no one is quite sure who cares, or whether the failure is a problem at all.
The next shift makes the data asset a first-class citizen: we talk about the data asset that our team maintains and that our stakeholders use for things like drawing those reports. We shift into a sort of incremental delivery pattern, where we get feedback and engagement from our stakeholders and use that to guide how we improve that data asset.
We also shift the automation: instead of automation that's just used to create the asset, we also have automation that validates that the asset is available and meets the quality criteria that our stakeholders expect. Only once we have all of that in place do we do the familiar work of creating pipelines to create the asset. And then we take a much more iterative approach, where we ask: are our stakeholders asking for more data in this asset?
Then let's prioritize that work. Are our quality monitors showing us that the existing asset isn't being updated frequently enough, or that there are errors? Well, let's focus our attention on that.
In the pipelines mindset, the pipeline was going to do some work, say the daily job, and it had a set of operations that we strung together very nicely. Some assets sort of fell out of the bottom of this pipeline, but our focus was very much on building the pipeline: is the pipeline succeeding? I think that if we make that paradigm shift to thinking about assets, which the new features in Dagster 0.13 and now 0.14 help us to do, we can shift into much more of a data-product-focused mindset. We can start by publishing an MVP asset to the Dagster asset catalog and putting it front and center in front of our stakeholders, to get feedback early on, before we've even written any pipelines to generate that asset, because, as we all know, changes at this stage are significantly cheaper than changes later in the process.
We can also start, and this is the monitoring-driven part, by writing the monitors that validate that our asset meets some level of quality. We expect them to fail in the beginning, but now, once we've implemented some logic, we know when we're finished, because these monitors are succeeding.
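As a rough sketch of that "write the monitor first, expect red" idea in Dagster (the table name and the `table_exists` helper are hypothetical, not from the talk):

```python
from dagster import job, op


def table_exists(name: str) -> bool:
    """Hypothetical stand-in for a real warehouse lookup."""
    return False  # nothing exists yet, so the monitor starts out red


@op
def active_customers_table_should_exist() -> None:
    # Written before any creation logic, so this check fails at first
    # and goes green once the asset actually lands in the data mart.
    assert table_exists("analytics.active_customers"), (
        "active_customers has not been created yet"
    )


@job
def monitor_active_customers():
    active_customers_table_should_exist()
```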
It's also a great way to demonstrate to our stakeholders why our data assets are trustworthy. We can show them: this is the automation that validates that the data assets are good, and this is how frequently they're good and how frequently they're bad. Only at that stage do we get on to actually implementing the asset creation logic.
So this is the traditional create-the-pipeline sort of step, but it's done in a way that gives us a much clearer target, and we know when we've reached that target thanks to the previous two steps. And then we don't finish and move on to the next project or pipeline.
Now we iterate on that asset. We get feedback from the stakeholders and from the automation, and we use that to prioritize adding more data, perhaps improving the quality, and so forth. Success in this paradigm looks like a team owning some data products, long-lived data assets with good quality and monitoring, and you always know what's important to the team and where you should focus your attention.
All right, that's lots of theory; let me show you some code. This, I hope, will be merged into the standard set of examples once this pull request gets merged, under the monitoring-driven software-defined assets example.
So, the first step here: publish the MVP asset to get some early stakeholder feedback. We want the simplest possible thing that we can write in Dagster. We then want to write some logic to validate the properties of that asset, and we can even use the same validation graph to drive our acceptance tests while we're developing.
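A minimal sketch of such an MVP software-defined asset, assuming a pandas DataFrame whose column names are illustrative rather than from the talk:

```python
import pandas as pd
from dagster import asset


@asset
def active_customers() -> pd.DataFrame:
    # MVP: publish only the proposed schema, with no real rows yet, so the
    # asset shows up in the catalog and stakeholders can react to its shape
    # before any pipeline logic exists.
    return pd.DataFrame(
        {
            "date": pd.Series(dtype="datetime64[ns]"),
            "customer_id": pd.Series(dtype="int64"),
            "is_active": pd.Series(dtype="bool"),
        }
    )
```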
All right, so look at this. We want to publish this very basic asset schema, with no actual data in it yet, into our asset catalog and get our stakeholders engaged. Is this the thing they were expecting? Is this the definition they mean when they talk about a customer being active, for example? Is the data it presents helpful for them?
They can come back to this asset catalog page to see the state of the data asset. So here we go: we filled in one partition, and we can see that one of the six partitions is now filled with some test data. But more importantly, we can also start showing them how we're going to validate the quality of this data.
In this case the implementation I did was the simplest possible thing: I just made a view in my data mart with some static data in it. So it happens to work for that date, but obviously it won't work for any future dates.
So I can write a Dagster pipeline that is focused not so much on the creation of the data, but on the validation of the data that's in the mart. I can have a set of graphs around validating specific data assets, and within those, a particular set of conditions: it should conform to some schema, it should contain the current date, and so on. And then there's the actual implementation of these.
Here's the implementation of one of those validation operations, the "it should contain current data" check. You can put whatever logic you wish in here to validate whatever it is you're validating.
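A hedged sketch of what such a "should contain current data" op could look like; the asset key, the stubbed mart query, and the staleness rule are all assumptions, but the `AssetObservation` event is Dagster's mechanism for attaching run metadata to an asset:

```python
from datetime import date

from dagster import AssetObservation, Failure, op


def latest_date_in_mart() -> date:
    """Hypothetical stand-in for something like
    SELECT max(date) FROM analytics.active_customers."""
    return date(2022, 2, 1)


@op
def should_contain_current_data(context) -> None:
    latest = latest_date_in_mart()
    # Link what this validation run observed back to the asset catalog.
    context.log_event(
        AssetObservation(
            asset_key="active_customers",
            metadata={"latest_date": str(latest)},
        )
    )
    if latest < date.today():
        raise Failure(f"active_customers is stale: latest date is {latest}")
```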
And so this just looks like a regular Dagster pipeline. If something went wrong in the asset validation, it would show up here as a problem against this particular partition, and you could see what the cause of that problem was.
And then finally, I mentioned that we can take this same job, the one that we ran manually over here and that we can have Dagster's scheduling capabilities run for us automatically, and run it in a pytest context to help drive our actual implementation development.
I'm using the pytest parametrize feature to execute this monitoring job and then validate that all of the steps succeed. So now I get a set of tests that I can use to guide my implementation work.
All right, so that's phase one. Just to recap: we talked about publishing the asset in the asset catalog as one of the first things you do, and what we're trying to do is get early feedback from our stakeholders and trigger the kind of questions that always come up. Can you actually access this location in the data mart? That's a big one I bumped into a lot. And the perennial: what do you mean when you say "a customer"? Especially in a corporate environment: is the customer the subsidiary, the parent company, or the one that pays the bills? You can also have conversations about the required grain. We've assumed daily here; maybe this actually needs to be done hourly, or maybe it's much slower and we just need something quarterly. Having those conversations early on can really impact how you go about implementing things.
Then we talked about writing a validation job where each operation is a check on some aspect of the data quality, and using the AssetObservation type to link the metadata from that validation run into the asset catalog. And finally, using the same validation pipeline, but run inside a unit-test context, to serve as your acceptance tests while you're implementing the features. All right, phase two.
Now that you have set the stage, it's time to get on to the bit that used to be step one, which is actually developing the logic to put the asset in the right place. What I think we should be doing under this paradigm is to first deploy our monitoring and see that it's failing, and failing in a way that we expect. Are the error messages we see in the monitoring helpful?
Maybe we don't have a fully automated process in the beginning, because we're still learning about the problem that we want to automate; you can't automate something you don't understand. And you definitely don't want to be worrying about performance optimizations or anything like that; those are things to invest in once this asset is used and important, not at this early stage. And then finally, we can use the monitoring, and the associated acceptance tests, to tell us when an implementation is good enough.
Looking at the time, I'm going to skip over the phase two demo, but let me just give you the recap. You would have deployed your monitoring, and it would have failed for some amount of time; then, once you deployed an implementation that worked, your monitoring would have started to succeed.
Your first implementation can be really simple. Perhaps rather than writing a whole lot of logic to compute the daily information, you just do that manually, stick it in a CSV or a spreadsheet, and have that be imported.
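As a sketch, that deliberately naive first implementation might be nothing more than importing a hand-maintained CSV (the path and asset name are assumptions):

```python
import pandas as pd
from dagster import asset


@asset
def active_customers() -> pd.DataFrame:
    # Deliberately naive first implementation: the daily numbers were
    # computed by hand and checked in as a CSV; the "pipeline" just
    # imports them. That can be enough to turn the monitors green.
    return pd.read_csv(
        "data/active_customers_manual.csv", parse_dates=["date"]
    )
```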
And then phase three is to iterate on your solution and to prioritize where you focus based on the feedback that you're getting from your stakeholders and from your automated asset monitoring. Are they asking for more data? Are they asking for a different schema? Maybe they want to change the update schedule.
Some further ideas. You could add notifications based on the monitoring job, to let you know about problems before your stakeholders notice them; a sketch follows below.
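One hedged way to wire that up is a Dagster failure hook on the monitoring job; `send_alert` and the failing check are hypothetical stand-ins, not from the talk:

```python
from dagster import HookContext, failure_hook, job, op


def send_alert(message: str) -> None:
    """Hypothetical notifier; swap in Slack, email, PagerDuty, etc."""
    print(f"ALERT: {message}")


@failure_hook
def notify_on_monitor_failure(context: HookContext) -> None:
    # Fires whenever an op in the monitoring job fails, so the team
    # hears about a broken asset before the stakeholders do.
    send_alert(f"monitoring op {context.op.name} failed")


@op
def a_failing_check() -> None:
    raise Exception("simulated monitor failure")


@job(hooks={notify_on_monitor_failure})
def monitored_validation_job():
    a_failing_check()
```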
There's no reason that your monitoring job should be limited to monitoring your own assets. As we saw, the monitoring operations are just Python functions; you can do whatever you like in them. Maybe there's a set of upstream assets that you depend on that would be helpful to monitor, or maybe you want to monitor something downstream.
Is the data getting downstream without being corrupted? That might be something you could monitor as well. And then finally, you can start to use the monitoring as the beginnings of some kind of data service level agreement with your stakeholders. You might say something like: this particular data asset will be stale no more than 10 days per quarter. And you have a nice measuring system, because the days when your monitors failed are the days when that asset was stale.
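As a toy illustration of that measurement, assuming you've already extracted from the run history the days on which the monitors failed:

```python
# Days this quarter on which the monitoring job failed, i.e. days on
# which the asset was stale (illustrative values, not real run history).
stale_days = {"2022-01-03", "2022-01-04", "2022-01-10"}

SLA_MAX_STALE_DAYS_PER_QUARTER = 10

sla_met = len(stale_days) <= SLA_MAX_STALE_DAYS_PER_QUARTER
print(f"{len(stale_days)} stale days this quarter; SLA met: {sla_met}")
```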
To visualize that: here's our monitoring system, and we're running it on a schedule. I think this is quite subtle, but important: you want to pick a schedule that's relevant to your stakeholders, which doesn't necessarily have to be the same schedule that the automation creating the asset runs on.
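A sketch of giving the monitors their own stakeholder-facing schedule, reusing the hypothetical `validate_active_customers` job from earlier; the cron expression is an assumption:

```python
from dagster import ScheduleDefinition

from monitoring_jobs import validate_active_customers  # hypothetical module

# Run the monitors every weekday at 07:00, before stakeholders open their
# reports, regardless of what cadence the asset-creation job runs on.
validation_schedule = ScheduleDefinition(
    job=validate_active_customers,
    cron_schedule="0 7 * * 1-5",
)
```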
All right, so recapping. We talked about how this whole idea of monitoring-driven development is part of the paradigm shift that we're seeing, and that Dagster is helping to drive: the move away from pipelines and towards assets. The approach you follow is to start by publishing information about the asset up front, so you can get stakeholder feedback nice and early, you can drive that engagement, and you can make changes when it's cheapest to make changes.
We're asking our stakeholders to place trust in our data assets, so we should have a mechanism for verifying why they should give us that trust. And it's only once you've done steps one and two that you get to what used to be step one, which is to actually implement the asset creation logic.
But now you've got a nice set of guides to tell you what you're aiming for, and you know when you're there: when the monitoring job starts to pass. And then, instead of finishing and moving on to the next project, we iterate to try and make this asset better.
Based on feedback we add more data, or maybe we improve the quality or the reliability of something. Success in this new paradigm looks like a team that has a bunch of long-lived data assets with a well-known quality, some kind of SLI driven by our monitoring. And then we're thinking about selling those same assets to additional stakeholders, rather than being on the perpetual treadmill of just doing the next project.
And that's it. I'm not sure about the timing; if there are questions, please just add them to the Zoom chat and I'll answer them, and I'll stick around at the end to answer questions as well.

Thank you, David. That was fantastic.