From YouTube: Dagster Hot Takes: Notebooks
Description
Stop guesstimating when your notebooks should run.
Prefer to read: https://gist.github.com/slopp/7c64973c70b12ffbd506d4e64d80d65e
Resources posted below, or star us on GitHub to get started.
- Star Us: https://github.com/dagster-io/dagster
- Full Example Project: https://github.com/dagster-io/hooli-data-eng-pipelines
If you have data that's updated at 8 AM every Monday morning, for example, you might set up a notebook to run every Monday at 9 AM. That might work most of the time, but invariably one day the 8 AM data load will be late, and the notebook will either fail or run and show results based on stale data. Both outcomes can cause a lot of confusion and problems for the stakeholders who rely on those notebooks.
So today, I want to look at how we can schedule notebooks in Dagster and eliminate a lot of the guesswork around when notebooks should run. Here I have a notebook. I'm using VS Code to edit and interact with it instead of a local Jupyter server, basically because it's better in every way, so I guess you get two hot takes for the price of one today. This is just a regular Jupyter notebook, so I can run the cells, and in this case we're looking at some daily order data.
Forgive me: it's not a realistic dataset or a great model, but it's a standard use case that you, as a data scientist, might use a notebook for. The problem with this notebook is that it relies on external assets: we're pulling in our daily order data as well as the coefficients from a trained forecasting model. This notebook clearly doesn't live in isolation, so it shouldn't be scheduled in isolation either. So let's take a look at what it would entail to add it to Dagster.
So the first thing I'm going to do is define a new asset in my Dagster project that represents this notebook. All I'm doing is pointing at the existing notebook file that I was just running locally and specifying which inputs the notebook is going to use: the daily order summary and the order forecast model. The only thing I need to change in the notebook itself is to go into the cell where I reference those inputs and mark it as a parameter cell.
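Marking a cell as the parameter cell follows the papermill convention that dagstermill builds on: at execution time, a new cell carrying the pipeline's values is inserted right after the cell tagged `parameters`, so those values override the defaults you use while developing locally. The sketch below models that behavior in plain Python, with notebook cells as source strings executed in one shared namespace. It is an illustration of the idea, not dagstermill's implementation; the `Cell` class, the helper functions, and the cell contents (including `daily_order_summary` and `slope`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A hypothetical stand-in for a notebook code cell."""
    source: str
    tags: list[str] = field(default_factory=list)


def inject_parameters(cells: list[Cell], params: dict) -> list[Cell]:
    """Return a copy of `cells` with an injected-parameters cell placed right
    after the first cell tagged "parameters" (papermill-style behavior)."""
    injected = Cell(
        source="\n".join(f"{name} = {value!r}" for name, value in params.items()),
        tags=["injected-parameters"],
    )
    for i, cell in enumerate(cells):
        if "parameters" in cell.tags:
            return cells[: i + 1] + [injected] + cells[i + 1 :]
    # No parameters cell: put the injected values at the top instead.
    return [injected] + cells


def run(cells: list[Cell]) -> dict:
    """Execute cells top to bottom in one shared namespace, like a kernel."""
    namespace: dict = {}
    for cell in cells:
        exec(cell.source, namespace)
    return namespace


# Hypothetical notebook: the tagged cell holds defaults for local runs.
notebook = [
    Cell("import statistics"),
    Cell("daily_order_summary = [10, 12, 11]\nslope = 0.5", tags=["parameters"]),
    Cell("forecast = statistics.mean(daily_order_summary) + slope"),
]

# Values the orchestrator would supply (hypothetical) override the defaults.
ns = run(inject_parameters(notebook, {"daily_order_summary": [20, 22, 24], "slope": 1.0}))
print(ns["forecast"])  # 23.0
```

Because the injected cell runs after the defaults, the notebook still works unchanged when you open and run it by hand, which is why tagging one cell is the only edit the notebook needs.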
Now that I've done that, I can launch Dagit and view my data pipeline locally.
So here's my local pipeline. We can see all of the upstream data that gets transformed to create the daily order summary, and I also have the model that's being trained. Both of those become inputs to my model notebook. Now, locally, I can go ahead and materialize, or run, this entire data pipeline.
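Materializing the whole pipeline means running each asset only after everything it reads from has been produced. One way to picture that ordering is a topological sort of the asset graph. The sketch below uses the standard library's `graphlib` (Python 3.9+); it is not how Dagster plans runs internally, and every asset name in it is a hypothetical stand-in for the assets shown in the UI.

```python
from graphlib import TopologicalSorter

# Each asset maps to the set of assets it reads from.
# All names are hypothetical stand-ins for the pipeline in the video.
asset_deps = {
    "orders_raw": set(),
    "users_raw": set(),
    "orders_cleaned": {"orders_raw"},
    "users_cleaned": {"users_raw"},
    "daily_order_summary": {"orders_cleaned", "users_cleaned"},
    "order_forecast_model": {"orders_cleaned"},
    "model_notebook": {"daily_order_summary", "order_forecast_model"},
}

# static_order() yields every asset after all of its dependencies.
order = list(TopologicalSorter(asset_deps).static_order())
print(order)
```

Several orders are valid (the two raw assets can run in either order, for instance); the only guarantee is that the notebook comes after both of its inputs.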
My daily order summary and my forecasting model have automatically been replaced with the code needed to actually use the order summary and the model from my pipeline, but I still have the ability to do everything else in the notebook, such as viewing the predictions versus the observed data, as well as the trend for my model. If I needed to make some changes, I could change the model and then just re-execute this notebook rendering step, and Dagster's smart enough to use the upstream assets without having to rerun the entire pipeline.
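Re-executing only the notebook step works because each upstream asset's last materialized value is stored, so a downstream step can load it instead of recomputing it (in Dagster, persisting and loading asset values is the job of IO managers). The sketch below models that with an in-memory dict standing in for persistent storage. The asset graph, the toy compute functions, and the names `ASSETS`, `storage`, and `materialize` are all hypothetical.

```python
from typing import Callable

# Hypothetical asset graph: name -> (upstream names, compute function).
ASSETS: dict[str, tuple[list[str], Callable[..., object]]] = {
    "daily_order_summary": ([], lambda: [20, 22, 24]),
    "order_forecast_model": ([], lambda: {"slope": 1.0}),
    "model_notebook": (
        ["daily_order_summary", "order_forecast_model"],
        lambda summary, model: summary[-1] + model["slope"],
    ),
}

storage: dict[str, object] = {}  # stands in for persisted asset values
compute_calls: list[str] = []    # records which assets actually ran


def materialize(name: str) -> object:
    """Compute one asset, loading its inputs from storage.

    An upstream asset is computed only if it has never been stored."""
    deps, fn = ASSETS[name]
    inputs = []
    for dep in deps:
        if dep not in storage:
            materialize(dep)
        inputs.append(storage[dep])
    compute_calls.append(name)
    storage[name] = fn(*inputs)
    return storage[name]


materialize("model_notebook")  # first run computes all three assets
compute_calls.clear()
materialize("model_notebook")  # re-run: upstream values come from storage
print(compute_calls)  # ['model_notebook']
```

Keeping storage separate from computation is the design choice that makes this cheap: editing the notebook changes only the last step, so only that step needs to run again.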
Now, once I'm in production, this notebook interface is going to look the same: I still have my notebook as a first-class asset, but instead of running things in an ad hoc fashion, I have a couple of options. I can run things on a schedule, or I could run things using a Dagster sensor when the upstream data has changed. You'll also notice that Dagster is smart enough to indicate that our daily order summary has been updated, and so the model notebook should be updated as well. We call this approach reconciliation scheduling, and it means you never have to worry about when your notebook should run. All you have to specify are the inputs the notebook depends on, and Dagster will keep everything up to date.
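The rule behind reconciliation scheduling can be stated simply: an asset needs updating when it has never been materialized, when one of its inputs was materialized more recently than it was, or when one of its inputs is itself out of date. The sketch below applies that rule to a dependency graph using plain timestamps. It is a simplified model of the idea, not Dagster's implementation; Dagster has exposed this behavior through APIs that changed across releases (for example `build_asset_reconciliation_sensor` in earlier 1.x versions, and automation conditions later), so check the documentation for your version. The asset names and timestamps are hypothetical.

```python
from graphlib import TopologicalSorter


def stale_assets(deps: dict[str, set[str]], last_materialized: dict[str, float]) -> list[str]:
    """Return the assets that need updating, in dependency order.

    An asset is stale if it was never materialized, if any input was
    materialized after it, or if any input is itself stale."""
    stale: list[str] = []
    for asset in TopologicalSorter(deps).static_order():
        mine = last_materialized.get(asset)
        parents = deps.get(asset, set())  # root assets may not be keys
        if (
            mine is None
            or any(p in stale for p in parents)
            or any(last_materialized.get(p, float("-inf")) > mine for p in parents)
        ):
            stale.append(asset)
    return stale


# Hypothetical graph mirroring the video: the notebook reads the summary and the model.
deps = {
    "daily_order_summary": {"orders_raw"},
    "order_forecast_model": set(),
    "model_notebook": {"daily_order_summary", "order_forecast_model"},
}
# Hypothetical timestamps in hours: the raw orders landed late, at 8.5,
# after the summary and the notebook last ran.
last = {
    "orders_raw": 8.5,
    "daily_order_summary": 7.0,
    "order_forecast_model": 6.0,
    "model_notebook": 7.5,
}
print(stale_assets(deps, last))  # ['daily_order_summary', 'model_notebook']
```

Unlike a fixed 9 AM schedule, this check is driven by when the data actually arrived: a late load simply means the summary and the notebook are reported as stale later, and they run once their inputs are fresh.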
So hopefully this gets you excited about bringing notebooks into your orchestration tool. If so, check out Dagster on GitHub, give us a star, and we'd love to help you get going. Thanks!