From YouTube: Troubleshooting your Data Workflows: a live debugging session using Noteable and Dagster
Description
Data engineers waste a lot of time troubleshooting long-running pipelines and know only too well the frustration of minor errors consuming hours of work. In this practical tutorial, we will demonstrate an innovative solution for dramatically shortening testing cycles and reducing the number of reruns required, boosting developer/practitioner productivity, and reducing frustration on the team.
Join Noteable's CTO & Co-Founder Matthew Seal and Elementl's Jamie DeMaria for this virtual event.
Matt: All right, so one of the things we wanted to introduce here is troubleshooting data workflows with Noteable and Dagster. We really want to walk through a live example, guide folks through how this can be achieved, and let you follow along with some of the blog posts and the content we're walking through. So, from the beginning, I want to talk a little bit about the pain points you can run into doing data pipeline work. One of the really big pain points when we're doing ETL, and especially scheduled ETL, is that when you have errors to troubleshoot, the feedback loop between the error and the data engineer can get very slow and long. You could have a data pipeline step in the middle that takes hours or even days to fully execute.
So when you get errors, you often have to get creative about how you reproduce a minimal copy of what happened in order to fix the problem. Related to that, this creates a really fragile chain of tools that's owned by multiple teams. Let's take, as a real example, something I debugged in the past and helped people walk through: it involved three different teams, and the root of it was a fragile chain of tools owned by many teams.
Something goes wrong in your ETL and you have to track it down. Well, the problem is your Tableau report isn't refreshing. Okay, you go back and look at the SQL extract query, and ask where that data is coming from. You look, and it's actually just pulling from a copy of data that came from Druid. So you go to Druid, and what Druid pulled was ultimately sourced from data populated by Spark, and you keep going back and back and back, all the way to the original event.
So part of this is to walk through how, when you use notebooks, you can actually capture the intent of what was executed. You can have the collection of queries, and on top of that, when we go through the demo here, we'll show you that once you combine Noteable and Dagster, you can actually land in a live session that has the real context in memory for you to play with and manipulate. This gives a lot better visibility into the tool chain.
A
You're
using
it
allows
you
to
share
with
your
your
constituents
that
are
using
that
content
to
manipulate
and
edit
it
to
their
own
needs,
and
it
reduces
the
friction
for
for
having
to
have
lots
of
data
engineers
in
the
middle
for
many
types
of
data
recovery
and
problems
and
we'll
kind
of
walk
through
a
few
examples
here
as
we
go
and
what
we're
using
today,
just
to
kind
of
outline
we're
using
git
pod.
Otherwise,
you
can
use
a
local
virtual
environment
if
you're
comfortable
with
that
pod's
a
great
place
to
start
for.
A
If
you
want
to
get
a
clean
slate,
that's
going
to
be
consistent,
we're
using
dagster,
which
is
the
the
primary
product
here
built
by
Elemental
and
then
we're
using
some
other
open
source,
libraries,
Paper
Mill
and
then
some
extensions
that
talk.
Allow
paper
mill
to
talk
to
to
notable
paper
mill
is
the
Headless
notebook
executor
that
runs
in
Python,
so
sway
the
Run
notebooks
programmatically,
and
we
have
a
nice
little
plugin
that
just
slots
right
into
that
technology
called
origami.
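
For readers following along, here is a minimal sketch of the kind of programmatic execution Papermill does; the notebook paths and the parameter are illustrative placeholders, not the workshop's actual files.

```python
import papermill as pm

# Execute a notebook headlessly: Papermill reads the input notebook, injects
# the given parameters into a copy, runs it, and writes the executed output.
pm.execute_notebook(
    "analysis.ipynb",         # placeholder input notebook
    "analysis-output.ipynb",  # executed output copy
    parameters={"dataset_url": "https://example.com/iris.csv"},  # hypothetical parameter
)
```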
So, on shortening that feedback loop of errors you can run into: one of the things you often need to do is go inspect the most recent run, meaning what was compiled and what is being executed at the end of a particular pipeline. What often happens is that you'll get an error, and most of the time the error is a trivial one.
A
It's
in
something
like
a
column,
got
renamed
or
there's
a
new
row
in
the
data,
that's
causing
a
problem
or
something
along
those
lines
and
to
be
able
to
live
troubleshoot
this
on
the
actual
data
that
you
had
pulled
in
and
collected
locally,
in
order
to
evaluate,
what's
different
with,
if
the
air
Message
doesn't
tell
you
immediately,
it
can
be
really
valuable
and
really
shortening
this
feedback.
Loop
saves
you
a
ton
of
time,
saves
you
a
ton
of
resources
over
time.
We can say this with confidence: in places where we built internal tools for doing these types of things, not even to this extent, just having visibility into the rerun, let alone a live session, it saved a ton of time. So let's talk a little bit about how we're going to achieve this and what the relationship is between these tools. We've talked about Papermill, I've talked about Dagster, and we haven't talked a whole bunch about Noteable yet.
A
Do
you
might
be
familiar
with
that?
As
the
is
the
scheduling
dag?
It
has
a
concept
of
asset
resolution
where
it's
going
to
go,
find
and
build
you
assets
based
on
definitions,
you've
provided
in
order
to
accumulate
workflow
notable,
is
a
notebook
platform
that
really
provides
a
great
enhanced
experience
over
what
you
typically
get
in
open
source
Jupiter,
it's
based
on
Jupiter
under
the
hood
and
it
it
runs
through
and
does
a
bunch
of
quality
of
life
improvements,
as
well
as
a
lot
of
features
that
are
well
and
Beyond.
A
The
scope
of
a
notebook
into
the
scope
of
data
engineering
needs
and
data
analyst
analysts
sorry
analytics
needs
on
the
the
UI
side.
It's
a
really
great
platform
for
turning
your
exploratory
work
into
production,
work
and
really
kind
of
working
through
a
lot
of
the
the
issues
that
you
would
have
in
clearly
running.
A
The
open
source
offerings
in
Notebook
space
Here, Papermill in the middle is going to be doing the translation of the parameters Dagster is providing: it's going to apply those against the notebook version and give you an immutable copy of the notebook run that materializes the results of the asset. On failure, you'll have a window of time to go log in and play with the live context that has the error, before it automatically shuts down.
So, for the workshop here, we're going to productionize the Jupyter notebook that analyzes Iris data. We're going to start with just a basic example using a Jupyter notebook and see how that would work in Dagster; we're going to do some introduction to the data pipelines that are used, in particular task-focused versus asset-focused pipelines; and then we're going to cover data pipelines in Dagster and the APIs and concepts around them.
Jamie: Great. So yeah, like Matt said, we're just going to start off with a bit of an overview of what data pipelines are, and we'll go through the process of designing the data pipeline that we're eventually going to be implementing in the more hands-on workshop portion of this. So, just starting off in a broad sense.
A really classic example of a data pipeline is the ETL pipeline. The first step of this pipeline is that we fetch data from an external source; then we might need to do some transformation on the data, to clean it up, or join data sets together, or whatever we may need to do to get our data into a usable form; and then we need to store that data in a data warehouse.
So today, in the workshop, what we're basically going to be doing is analyzing the canonical Iris data set within a Jupyter notebook, and we want to make the process of doing that analysis and running the Jupyter notebook part of our data pipeline. So we're going to start, in this design process, by replacing the steps of this template ETL pipeline with the steps we'll need to complete our Iris analysis and make that a self-contained data pipeline.
The first thing we'll need to do is actually fetch the Iris data set, and we may do that with a tool like Airbyte, which allows you to ingest data easily without writing a bunch of custom API calls. We're going to take a little bit of liberty here, because the canonical Iris data set doesn't really change, but we want to make our example pipeline feel more like a real-world example.
We're just going to pretend for a little bit that there's a group of scientists consistently publishing new data about different species of flowers to a public database, and we always want to be doing our Iris analysis on the latest data. So, at the start of our pipeline, we're going to be refetching the data from this database, and then, once we have the data, we want to transform it or clean it up.
B
All
of
this,
like
data
that
we've
received
from
the
flowers
database
into
the
data
we
want
to
analyze
and
for
this
step
we
might
use
some
kind
of
specific
data
transformation
tool
like
DBT
and
the
last
step
stays
the
same.
We
load
our
data
into
a
data
warehouse
and,
let's
say
we're
using
snowflake.
The next thing we need to do is actually add our Jupyter notebook into this pipeline. So let's add a step at the very end that fetches the data from Snowflake and executes the Jupyter notebook, and this Jupyter notebook is going to do our actual analysis of the Iris data. Through this initial design of our data pipeline I've mentioned a couple of different tools, like Airbyte and dbt, and if you aren't familiar with some or all of them, that's completely fine.
So let's take a step back and think about what might happen once this pipeline is up and running. Pretty soon, we might get a request to analyze daffodils too. So we'll add some more code to the step that fetches data from the flower database to also get the daffodil data set, we'll update the remaining tasks to handle this data as well, and we'll end up with a new table in our Snowflake database.
Our task to fetch the flower data from the flower database produces a table of data for each species of flower; the data we're storing in Snowflake is exactly the transformed data from our dbt tasks; and, finally, the step to execute the Jupyter notebook produces an executed notebook file, and that executed notebook file is what we're actually going to be sharing around with our co-workers.
So these data sets and the Jupyter notebook, in these green boxes, are what we actually care about in this data pipeline, and these are the data assets. If you'll remember from earlier, a data asset can be any deliverable from a data pipeline; for example, in this case, we've got tables and we've got a Jupyter notebook.
A data asset doesn't necessarily have to be what's produced at the very end of your data pipeline; here, we're producing data assets at every step of our data pipeline. So, now that we've uncovered these assets in this pipeline, let's try constructing the same pipeline, but with our focus on the data assets rather than on the tasks.
The first thing we're going to need to do is figure out what objects we'll be modeling in the pipeline. Let's start by focusing on our legend: we'll need a way to represent our external source data, which in this case is the flowers database; we'll need a way to represent the data assets we'll be creating in our pipeline; and we'll need the connections, or edges, in our pipeline.
The edges will connect assets that have data dependencies, meaning that the asset at the end of the edge requires data from the asset at the beginning of the edge. Finally, it'll still be useful to understand how each asset is created, so we'll also document the operation required to create each asset along the edge connecting the assets.
Overall, this pipeline looks larger, but it's actually doing the exact same thing as the task-focused data pipeline; this one is just a lot more descriptive and aware of the data that it's processing. We can look at this and, at a glance, know exactly what data is available to work with and how those data sets relate to each other.
Before we move on, let's do a quick visual comparison of these two data pipelines we've designed. There's a lot going on on this slide, but the main point is to demonstrate that when we model our data pipeline focused on the tasks we want to execute, we get a data pipeline that looks dramatically different from one where we focus on the assets the pipeline should produce. There are some situations where a task-focused data pipeline is the correct approach.
But at Dagster we tend to believe that the majority of data pipelines should be asset-focused. Again, the point of a data pipeline is to produce data assets that you use at your organization, so it makes sense that it would be helpful to have those data assets be the primary thing you model when you think about your data pipeline.

By modeling your data pipeline with tasks, you actually end up increasing your cognitive burden, because you may have to relearn what the data pipeline does, and that increases the amount of time you have to spend getting your job done. Conversely, when you have your data pipeline modeled with assets as the primary object, you get immediate insight into all the data you have available, and it can be much easier to figure out where a new task fits into the pipeline.
So how does Dagster fit into all of this? Dagster provides a framework to build and execute asset-focused data pipelines, and Dagster also has support for task-focused data pipelines, since, like I mentioned, there are cases where that makes the most sense. But again, we think most data pipelines should focus on assets, so that's what we're going to be doing today. Using Dagster's APIs, you can create these graphs of assets that span across the different technologies in your data platform.
So, in the workshop today, I will be writing a data pipeline that results in a Jupyter notebook. Let's go over some of the main Dagster concepts we'll be working with to build our data pipeline; the most important one is the software-defined asset. In Dagster, a software-defined asset is a declaration, in software, of a data asset you expect to exist. It's a way to write in software that you expect a data asset, like an ML model, a table in a database, or a Jupyter notebook, to exist.
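
As a rough sketch of what such a declaration looks like with Dagster's asset decorator (the URL and the asset name mirror the workshop's Iris example but are illustrative, not necessarily the repo's exact code):

```python
import pandas as pd
from dagster import asset

@asset
def raw_iris_data() -> pd.DataFrame:
    # Declaring an asset: the decorated function's body describes how the
    # asset is computed, and the function's name becomes the asset's name.
    return pd.read_csv(
        "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
        header=None,  # the classic Iris CSV ships without a header row
    )
```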
So then Dagster is going to execute the iris_data function, and there's a little magic going on here: Dagster knows that the raw_iris_data parameter passed into iris_data corresponds to the output of the raw_iris_data asset. So Dagster is going to load the raw_iris_data asset from storage and provide it as input to the iris_data function. Then we execute this function, add these column names, and return the result; again, Dagster will store that to a persistent location, and the persistent location is actually important here.
The next time we want to materialize these two assets, the new return values will be written to the same location as the previous values, and this means that if we want to manually look at the most recent data, we know the exact location to look at. There's no more staring at a bucket of data files and wondering which one has the most recent data. The persistent storage location also allows us to materialize an asset without necessarily materializing the upstream assets.
Dagster also has built-in support for working with notebooks through the dagstermill library, and dagstermill is just a thin wrapper around Papermill that allows Jupyter notebooks to be directly executed from Dagster pipelines. So, instead of copying the development work you do in a Jupyter notebook into a Python function, Dagster can just run the Jupyter notebook as part of your data pipeline, and you don't need to translate that notebook into some other format and potentially lose readability.
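
The dagstermill helper used later in the workshop looks roughly like this; the asset name, notebook path, and group name are placeholders rather than the repo's exact values:

```python
from dagster import file_relative_path
from dagstermill import define_dagstermill_asset

# Declare a Jupyter notebook as a Dagster asset; dagstermill wraps the
# Papermill execution details for you.
iris_notebook = define_dagstermill_asset(
    name="iris_kmeans_notebook",  # placeholder asset name
    notebook_path=file_relative_path(__file__, "notebooks/iris-kmeans.ipynb"),
    group_name="template_tutorial",  # optional grouping shown in dagit
)
```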
So, let's move on to the workshop portion of this. Like I mentioned, we're going to be productionizing a notebook that analyzes the Iris data set, and, more specifically, we're going to be writing a slimmed-down version of the data pipeline we just designed. We'll create two assets, one for the Iris data set and another for the Jupyter notebook, and the Jupyter notebook will use the Iris data set asset as its input data. We'll be working directly with the canonical Iris data set.
So we don't need to do any data transformation steps in our pipeline, and, additionally, we won't be using any external tools to fetch the data set; we'll just be using Dagster and the dagstermill library to execute our Jupyter notebook. Then, at the end of the workshop, we're also going to start executing a Noteable notebook, and we'll explore some of the features there as well. For each step in the workshop, I'll basically talk through what we're going to do, demo it on my computer, and then go back to a slide.
That slide lists the tasks to complete. We'll wait a couple of minutes for each step so that everyone has enough time to complete it, and if you run into any issues, you can just put them in the Zoom chat, and either myself or Matt will try and help you out. I think, since we're in Zoom, what might be kind of helpful is, once you're done with a task, to use the little raise-hand feature or one of the reactions, just so I can get a good idea of who's done, so we can move on. Cool. And then, during the first step of the workshop, we'll download the code we'll be using; that will also have a README in it that contains all of the steps for the workshop, and you also get a fully completed version of the project as well.
So the first thing we'll need to do is actually just get our environment set up. I'm going to be using a tool called Gitpod that creates a fresh Python environment that I can use directly in my browser. If you're up to try that out, I would definitely recommend giving it a try; it'll help us all be using the same setup, and that'll make any issues we may run into easier to fix, since we'll be working from a consistent place.
B
You'll
be
sort
of
dropped
into
something
that
looks
a
lot
like
a
vs
code.
Editor
like
this.
So
once
you're
here
give
me
like
a
little
react
and
then
we
can
move
on
cool,
okay.
It
looks
like
we're
looking
good,
so
the
next
thing
we'll
need
to
do
is
like
install
the
example
code
we'll
be
working
with
and
all
the
required
dependencies.
So
I
have
all
of
these
commands
here
in
this
text
file.
B
If
you
want
to
start
executing
them
along
with
me,
but
I'll
also
go
back
to
that
slide
once
I'm
done
so
we'll
just
start
by
just
like
upgrading
pip,
just
to
make
sure
everything,
I
don't
know,
goes
a
little
bit
more
smoothly
and
we'll
need
to
First
install
Dexter
and
once
we've
installed
Dexter
we'll
actually
have
access
to
a
CLI
tool.
We
have
that
will
allow
you
to
like
download
custom
and
like
fully
supported
dagster
example
projects.
B
So
that's
what
we're
going
to
be
doing
here.
Let
me
actually
make
this
a
little
bigger,
so
you
can
see
the
full
command
we're
going
to
be
downloading
an
example
called
tutorial
notebook
assets,
and
then
we
can
do
that.
Might
be
helpful
if
I
throw
these
in
the
chat
yeah.
Let
me
do
that.
Great, okay. So, once we have the example code downloaded, we can just move into this new folder that's been created, and then there's a setup.py file in there that we can use to install all the required dependencies.
B
I'm
doing
this
pip
install
and
it
takes
like
30
seconds
or
so
so
over
here
you
guys
can
get
started
and
then
we
can
move
on
all
right
great.
So
let's
just
take
a
minute
I'm
going
to
kind
of
walk
you
through
sort
of.
What's
in
this
project
you
downloaded
and
we
can
pull
up
kind
of
the
files
that
we'll
be
working
with
so
in.
Let
me
make
this
a
little
bigger,
so
you
can
see
the
file
name's
a
little
easier.
So
in
the
stacks
here,
not
a
little
demo
folder.
B
We
have
kind
of
two
subfolders,
there's
tutorial
finished
that
contains
a
fully
completed
version
of
the
workshop
today.
So
you
can
use
that
to
like
get
a
sneak
peek
into
what
we'll
be
doing
or
just
kind
of
see.
The
final
version,
but
where
we'll
be
working
is
in
this
tutorial,
template
folder.
B
So
in
here
we've
got
a
couple.
Other
subfolders,
the
ones
that
are
important,
are
in
the
notebook
subfolder,
which
is
where
our
jupyter
notebook
that
does
the
analysis
of
our
Iris
data
set.
Is
we'll
go
through
this
in
just
a
minute,
but
you
can
open
that
up
in
your
text
editor.
You
should
see
this
comment
at
the
top
saying
that
we're
filling
it
out
as
part
of
the
pi
data
workshop
and
that,
if
you
see
that
comment
there
there
you
know
you're
in
the
right
folder.
Most of what we'll actually be doing is uncommenting code blocks, to help keep this workshop mostly bug-free, but I'll be going through exactly what we're doing at every step, so that you can understand what all the code is doing. So I recommend just having both of these files open in your text editor, so that they're easy to get to.
This cell here we'll get to later in the workshop. What we're going to do (actually, let me start running this) is get into some descriptive analysis of our data. We'll just start exploring our data set, understanding what's there, and getting an idea of what the data looks like. I didn't start at the actual top cell; okay, now it should work. All right, here we go. We're looking at our data, and we're going to make this plot here.
B
That
gives
us
an
idea
of
like
what
our
data
looks
like
and
how
the
different
axes
of
the
data
kind
of
compare
to
each
other.
B
And
then
we'll
get
into
our
actual
k-means
analysis,
so
we'll
run
our
clustering
algorithm
and
then
we're
going
to
do
some
more
plotting
so
that
we
can
understand
like
how
our
clustering
did.
If
we
scroll
down
to
the
very
bottom,
we'll
see
a
plot
with
our
results,
and
we
can
see
that,
like
one
of
our
species
of
Iris,
data
is
very
easily
distinguishable
from
the
other
two,
but
the
other
two
are
still
a
little
mixed
up,
which
means
we
might
need
to
do
some
more
like
complicated
analysis
to
separate
them.
So the first thing we're going to do is scroll down a little bit, and we're going to uncomment this code block under TODO 1. When you do that, this one line is still going to stay commented, and that is fine; we're going to get to it in a minute. So let's walk through what this code is doing. We want to start by making an asset for the Jupyter notebook we just looked at, using the Dagster API we went over in the presentation.
B
Doing
that
might
look
something
like
this:
we
would
have
our
asset
decorator,
the
name
of
our
asset,
the
code
to
execute
our
notebook
and
then
maybe
we
would
return
our
executed
notebook,
but
you'll
notice.
I
just
have
this
comment
here
code
to
execute
the
notebook
and
that's
because
it's
actually
quite
complex
to
to
execute
a
notebook,
and
so
the
diagram,
Library
kind
of
helps
abstract
away
that
complexity,
and
it
just
gives
you
a
helper
function
that
will
just
return
this
whole
asset
for
you.
B
So,
instead
of
having
to
write
this
out
yourself,
you
just
get
to
call
this
helper
function
and
it
does
all
this
work
for
you.
We
don't
need
that.
So
let's
look
at
kind
of
what
we're
providing
to
the
helper
function.
So
we
have
our
defined
diagonal
asset
function.
We
give
it
the
name.
We
want
our
asset
to
have
and
then
we
give
it
the
path
to
our
notebook
file
and
then
the
last
thing
we're
doing
in
this
case
is
giving
it
a
group
name.
This
is
sort
of
an
optional
parameter.
I will again wait here for a minute or so, and then we will move on. So now we can move on to actually materializing our asset in dagit. To do that in Gitpod, we need to actually start dagit running, which we'll do in the terminal.
A really quick tour before we get to materializing our asset. This first page we're on is sort of your home page, a timeline view, and it'll give you an overview of all of the recently run data pipelines or assets that Dagster is executing. We haven't run anything yet, so this is blank. So we'll go up here to this hamburger menu in the top left, and we'll see our two different repositories, or Dagster projects, that we're running.
B
So
we
have
one
for
the
finished
version
of
the
project,
and
then
we
have
our
our
template
project,
which
is
where
we're
working
right
now.
So,
let's
open
up
that
one
we'll
have
you'll
see
this
like
Ping
notable
job.
This
is
here
to
help
us
like
test
our
connection
to
notable
later
in
the
workshop.
If
you
need
it
and
then
here
in
this
asset
groups,
you'll
see
a
template
tutorial
asset
group,
and
that
is
from
the
the
group
name
we
added
to
our
asset
earlier.
B
So
let's
click
on
this
asset
group
and
we
can
see
our
notebook
asset
right
here.
So
we
can
click
on
this
asset
and
this
right
side
panel
will
pop
open
with
some
more
information
about
the
asset.
We
got
our
description.
We
can
click
this
view,
Source
notebook
button
and
it'll
open
up
a
preview
of
our
notebook
and
we
can
see,
see
the
contents
and
then,
if
we
close
this,
we
can
click
the
materialize
button
and
this
will
actually
execute
our
notebook.
B
So
let's
go
ahead
and
click
that
and
then
this
view
windows
won't
pop
up
and
you
can
click
this
view
button
to
watch
the
asset
materialize.
If
you
kind
of
miss
this
button
at
first,
you
can
come
back
down
here
and
click
this
little
hash.
That
appears
on
the
asset
itself,
so
we'll
go
here.
B
I
took
a
little
bit
of
time,
so
I
missed
watching
it
actually
execute,
but
we
can
see
that
our
notebook
has
executed
successfully
and
then,
if
we
go
back
to
our
sort
of
main
asset,
page
and
click
on
this,
we'll
have
some
additional
metadata
about
the
asset
and
we
can
actually
click
this
to
see.
The
executed
version
of
The
Notebook.
So I will give you all a minute to go through that process yourself, and then we can move on to the next steps in making our data pipeline. All right. So we've executed our notebook, but if we go back to our notebook file, we'll see that the logic to fetch our data is still in this notebook, and that means that every time we execute this notebook, we're refetching our data set. That may be a really costly operation that we may not want to perform every time we execute the notebook.
So the next thing we're going to do is factor out this data fetching into its own asset, and we want to do that for a couple of reasons. The first, like I mentioned, is not necessarily having to refetch the data every time we execute this notebook. It'll also help us if we ever want to add a second notebook that's also analyzing the Iris data set.
Instead of having to copy the data loading logic into that notebook, and potentially worry about the copies getting out of sync, with maybe one notebook analyzing a slightly different data set than the other, we can instead just have one asset that has the Iris data set, and both notebooks will use that asset.
This asset looks a lot more like the ones we went over in the presentation portion: we have our asset decorator, we give our asset the name, and then we just tell it what we want it to do. Again, we have this group name here, just to help with organization. So we have this Iris data set asset now, but we still haven't told our notebook to use it. To do that, we'll just scroll down a little bit, and we'll uncomment the line that has the TODO 3 comment in it.
So I'll uncomment this. This tells Dagster that we should be using the Iris data set asset in our notebook, and we do this by specifying a dictionary in a parameter called ins. If you'll recall from our notebook, we are storing the fetched Iris data set in a variable called iris.
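
A hedged sketch of what that ins mapping looks like, extending the earlier dagstermill sketch; the asset and variable names follow the description above, but the exact code in the template may differ:

```python
from dagster import AssetIn, file_relative_path
from dagstermill import define_dagstermill_asset

iris_notebook = define_dagstermill_asset(
    name="iris_kmeans_notebook",  # placeholder asset name
    notebook_path=file_relative_path(__file__, "notebooks/iris-kmeans.ipynb"),
    group_name="template_tutorial",
    # Map the upstream iris_dataset asset onto the notebook's `iris` variable.
    ins={"iris": AssetIn("iris_dataset")},
)
```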
So we can go ahead and do those two things, and then we will make one more change in our notebook file, and then we'll be ready to materialize these assets again. So again, wait here for a minute or two. Cool, that was easy.
B
We've
got
our
Irish
data
set
being
fed
into
our
notebook
asset,
but
we
need
to
do
one
final,
like
small
change
in
our
notebook,
so
that
we
know
not
to
actually
execute
this
code.
So
we
could
just
kind
of
like
delete
this,
and
that
would
be
fine.
But
then,
if
we
like
maybe
wanted
to
like
Scandal
and
execute
this
notebook,
we
might
run
into
issues
and
we'd
have
to
like
copy
this
back
in.
B
So
if
we
just
cut
and
paste
this
block
of
code
into
the
cell,
that
says
it's
been
tagged
with
parameters
instead
of
executing
this
code,
it'll
actually
get
overwritten
and
we'll
be
pulling
in
the
value
from
the
Irish
data
asset.
B
So
the
reason
we're
like
cutting
and
pasting
code
here
is
that
in
gitpod
there's
not
really
like
a
good
way
to
add
parameters
or
add
tags
to
to
Jupiter
notebook
cells.
So
in
like
the
real
world,
if
you
were
doing
this,
you
would
be
either
like
in
your
Jupiter
kernel
or
in
like
your
vs
code
editor,
and
you
would
just
add
the
parameters.
Tag
to
the
cell,
where,
where
this
Iris
data
set
fetching
is,
is
happening,
but
instead
we've
provided
a
cell.
So, right, you can't see the parameters tag on the cell, but I promise you it is there, and when we execute the notebook, it's in the metadata and it'll be found; you just can't see it in Gitpod. So, again, you just need to cut this block of code and move it into the cell that's been tagged with parameters.
B
That's
been
tagged
with
parameters,
okay,
so
now
we
can
go
back
to
daggit
and
we
can
click
this
reload
definitions
button,
and
this
is
going
to
just
basically
like
look
at
the
new
assets
we've
created
and
like
pull
them
in,
so
we
can
see
them
in
dag
it.
So
we
should
see
our
Iris
data
set
asset
now
and
then
that
is
Upstream
of
our
jupyter
notebook.
So
we're
going
to
materialize
these
assets,
but
like
real
quickly,
just
to
like,
hopefully
go
over.
What's going to happen is that Dagster is going to execute this Iris data set asset, fetch that data, and then store it in local storage. Then it's going to execute this notebook asset, where we have this new parameter cell and we've set up the input mapping of our upstream asset onto the notebook variable.
B
It's
going
to
inject
the
the
content
of
our
Iris
data
asset
as
the
IRS
parameter,
and
then
we'll
be
doing
our
analysis
on
that
on
that
asset.
So
we'll
click
the
materialize
all
button
and
then
again
we
can
click.
This
view
button
to
kind
of
watch.
This
all
execute
it's
a
little
crowded
on
my
screen,
because
it's
kind
of
tiny,
but
you
can
kind
of
watch
these
assets
execute
up
here.
B
Okay,
great,
so
that's
done
so
now
we
can
go
back
to
our
main
doctor.
Page
and
again
we
can
click
on
this
view,
notebook
button
to
see
our
executed
notebook
and
you
can
see
that,
like
we've
injected
the
cell,
that's
got
some
kind
of
funky
code
going
on
that's
like
pulling
in
the
asset
that
we
materialized
before
this
one.
B
So
give
everyone
just
like
a
minute
to
do
that,
and
then
we
can
move
on.
So
we
sort
of
completed
the
sort
of
like
indexer
and
Jupiter
a
portion
of
this
and
we're
actually
going
to
kind
of
move
on
to
start
looking
at
notable.
We're going to go through a very similar process to what we just did, but with a Noteable notebook. So the first thing we actually need to do is make a Noteable account: you can go to noteable.io and make a free account, and then we can go ahead and start creating a notebook there, and then we'll execute that in Dagster.
B
But
first
thing
to
do
is
we
can
just
make
an
account
and
then
we'll
go
through
the
rest
of
the
setup
after
that,
once
you're
in
notable
you'll,
see
kind
of
a
page
that
looks
like
this
you'll
be
in
like
your
space
and
then
once
you're,
there
you're
good,
and
then
we
will
move
on.
B
Okay,
so
the
next
thing
we'll
need
to
do
is
actually
upload
the
notebook
we
were
just
working
with
to
notable.
So
if
you're
in
gitpod,
you
can
download
The
Notebook
we've
been
working
with
over
in
the
left
sidebar
you
can
right
click,
the
name
of
the
notebook
and
then
there's
this
download
button
here
and
then
that
will
download
The
Notebook
and
if
you're
working
locally
in
your
own
virtual
environment,
you
should
just
have
it
on
your
computer.
And then we can drag and drop our downloaded notebook into this little upload window, and then it should be uploaded. Then we can just open that up and see the same notebook, just in the Noteable UI.
Yeah, in Gitpod, it's in the file navigation; the notebook file in there is the Jupyter notebook we've been working with, and then you just right-click, and the download option is there.
So the last thing we need to do is actually get an API token from Noteable. Again, to do that from within Noteable, you can click on your profile and go to Settings, and then on the left side there will be this API token section. You can generate a new API token and copy the value, and then, once you have that value copied, we want to go back to your terminal.
And then, once you've done that, you can just restart dagit with the same command as before.
Matt: These tokens are basically like any other GitHub token or other service token you'd have, so you can delete them and make new ones. They're a way for a machine to act on your behalf in the Noteable ecosystem while that token is valid, and the default there, if you see it, is set up for one-year tokens, so you don't have to worry about them expiring in the short term; you can just play with it.
Jamie: Okay, yeah, I had to go through and set up my token as well, so thanks for keeping an eye on that. Okay, great. So now we can actually make an asset for our new Noteable notebook. So, back in Gitpod, back in the same file where we've been making all of our other assets, you can scroll down to the very bottom, and there's this TODO 5.
So we can uncomment this code block, and this is doing pretty much exactly the same thing as our define_dagstermill_asset call, but instead we're defining a Noteable Dagster asset. Again, we're giving a name to the asset, specifying our input mappings, and then giving it the group name. The main difference is that, instead of a notebook file path, we need to give it a notebook ID, and we'll go over getting that notebook ID in the next steps. You can just leave it like this for right now.
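
As a sketch of the shape of that declaration; note that the helper name here is an assumption based on the description in the session, not a confirmed API, so check the workshop repo for the real import and call:

```python
from dagster import AssetIn

# Hypothetical sketch: mirrors define_dagstermill_asset, but points at a
# Noteable cloud notebook by ID instead of a local file path.
noteable_iris_notebook = define_noteable_dagster_asset(  # assumed helper name
    name="iris_notebook_noteable",           # placeholder asset name
    notebook_id="<paste-your-notebook-id>",  # copied from the Noteable notebook URL
    ins={"iris": AssetIn("iris_dataset")},
    group_name="template_tutorial",
)
```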
So the last thing we need to do for running this asset is actually get that notebook ID, and we can do that by going back to the notebook we uploaded and just getting it from the URL. So, in Noteable, just open up the notebook we uploaded, and then look at the very top.
B
You
should
have
sort
of
this
ID
kind
of
in
the
middle
of
the
URL,
and
you
can
just
copy
that
and
then
go
back
over
to
where
your
asset
is
defined,
and
we
will
just
replace
the
value
of
this
notebook
ID
variable
with
the
value
we
copied.
So now we can go back to Gitpod. I did not restart dagit, so if you also did not restart dagit, you can just do that. If you're in Gitpod, I think it'll reopen that little window, if you want to open it; I'm not 100% sure how Gitpod works, but it might be starting a new port-forwarding thing each time. So, once we're back in dagit, you can go back to our asset group, and we'll see our new Noteable asset.
If you restarted dagit before we uncommented that Noteable asset, you may need to click this Reload Definitions button, and then you should see it up here. So we have our Noteable notebook, and it's going to be using the Iris data set as input. So we can execute this: we can hold down the shift key and click both the Iris data set and the Noteable notebook, and this will materialize both of those assets together. So we'll go ahead and click Materialize.
Matt: And the notebook that's running in this parameterized run is being executed live, by the way.

Jamie: I think this one is parameterized on the input, rather than the...

Matt: Oh, okay, got it. So this notebook here is live-running, and you can have multiple people sitting and watching at the same time; you can have multiple editors at the same time, if you want to pair-debug something that comes up. It's an ephemeral copy of the original notebook you had, for this particular Dagster materialization, right?
Jamie: Yeah. So, I mean, if you wanted to test that out, you could go back to Noteable and open up the notebook you uploaded, and you'll see that it's just sitting there, not being executed, while this copy of the notebook is the one being executed.
B
Okay,
there
we
go
so
it's
run
successfully.
We
can
go
back
to
daggit,
we
can
click
on
it
and
in
the
right.
Sidebar
we'll
have
this
link
to
notable
that
shows
kind
of
the
that'll
like
reopen
that
ephemeral,
notebook
and
show
you
like
the
last
executed
version.
B
So
we
can
wait
there
for
a
minute
just
to
finish
like
executing.
B
Your
different
notebooks
and
then
we'll
jump
into
like
actually
doing
like
some
live,
debunking,
debugging
and
sort
of
like
what
it
looks
like
if
your
notebook
kind
of
fails
halfway
through
and
how
you
would
deal
with
that,
all
right,
so
yeah
into
live
debugging.
So
in
order
to
actually
have
something
to
debug,
we
need
to
like
introduce
an
error
into
our
notebook.
B
So
we
can
go
back
to
the
notebook
we
uploaded
you
shouldn't
like
I,
don't
know
this
is
the
one
that
was
executed.
We
can
execute
exit
out
of
that
and
go
back
to
kind
of
the
like
Source
version
of
our
notebook
that
we
originally
uploaded
to
notable
and
I'm
just
going
to
scroll
down
a
couple
of
cells,
maybe
after
this
Iris
dot,
head
cell
I'll,
add
a
new
cell
and
I'm
just
going
to
raise
an
exception.
Imagine what this process would be like with a more challenging bug, one where you might have to actually dig into your data to figure it out; for now, let's just throw an exception.
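
The injected failure is just a one-line cell; something like this is enough (the message text is arbitrary):

```python
# New notebook cell added purely to simulate a pipeline failure.
raise Exception("simulated failure for the live-debugging exercise")
```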
I'll add that here, and then we can wait for a second to get everyone set up with a notebook that will fail, and then we'll go through, materialize this notebook, and then debug it. Great. So we're going to go back to dagit; we can just click on our notebook and materialize it.
We won't need to re-materialize the Iris data set, because we already have that data. So we'll click on our notebook, click Materialize Selected, and then we can click on the View button, and what we should see is this notebook fail. We have this notebook link here; again, we're going to wait for the notebook to fail, and then we can click on this link and start trying to debug it.
You can imagine a situation where, you know, maybe you have this notebook running on a schedule, so you're not watching it execute every time, and you see that the last scheduled execution of the notebook failed. You can go back into the logs, find this link, and then get into the live notebook and start debugging.
Matt: And by default, that live notebook will stay around for 90 minutes. So after 90 minutes, it'll shut down the context and clean up; but for now, from this point on, we have 90 minutes to go jump into the live session. Otherwise, you'll get the non-live copy of what happened.
Jamie: Cool. So we also get some logs in dagit; our bug is very simple, so we could find it just based off of these logs, but if our bug were a lot more nefarious than that, we can click on our live notebook, and we will be dropped into this live notebook here. We can maybe delete this, and we can see that we can still investigate our data. Everything is already loaded into memory, so if there were issues with our data, we could start popping in here and trying to figure out what's going on. So we found our bug: we had that exception in there, and we can just change it to something else, or delete it, whatever. And then, if you want, you can just start executing the remainder of the cells in your notebook, maybe just to make sure that things continue to run.
Oh, there it is. Great, okay.
And yeah, my fix of changing the exception was saved, and we're all set from there.
So, going through that process of doing that little live-debugging exercise is the last step in the workshop. So we can stop here and do a little Q&A, if you all have any questions, and then we'll be done.