From YouTube: How to Orchestrate your Airbyte ELT Jobs with Dagster
Description
In this edition of the Community Call, Owen will show off the new Airbyte Dagster integration, allowing you to orchestrate Airbyte and the rest of your modern data stack easily.
Check out our announcement here: https://airbyte.com/blog/orchestrate-your-airbyte-elt-jobs-with-dagster
Subscribe to our newsletter: https://airbyte.com/newsletter?utm_source=youtube
Learn more about Airbyte: https://airbyte.com
A
All right, hey everybody — how's it going? I really appreciate everyone being here. I know people are still trickling in right now, but I'm going to do a quick overview of what we have planned for today and a quick introduction to our guest. Basically, Airbyte has been working with Dagster, which is an orchestration platform that allows you to orchestrate a lot of things on the modern data stack.
A
I don't want to take too much time away from him, so Owen, if you want to take it away and start presenting — thank you so much for being here.
B
Awesome — I appreciate the invite. It's always exciting to present this stuff to a new community, so I'm going to go ahead and share my screen.
Yeah, thanks for the introduction. I'm a software engineer at Elementl; we're the company behind Dagster, which is an open-source data orchestration tool.
At a high level, I'm going to explain what orchestration is, and then from there I'll just jump into the demo. Also, I can hear comments, but I can't see them on my screen, so if there's something important for me to adjust in a moment, let me know.
A
We'll do a Q&A session, so everyone, if you have any questions, feel free to leave them in the questions tab — if you look at the bottom, there's a questions tab where you can create a question, and we'll answer it directly at the end and make sure Owen hears it. And yes, this will indeed be recorded; it will be put on our YouTube channel to live in perpetuity.
B
Without further ado, then — and I appreciate that. At its most basic, orchestration is when you have a set of tasks and you want to run them in a particular order. If you have two tasks, one might depend on the other, which means you don't want to run task 2 until task 1 completes — and if task 1 fails, don't run task 2. These tasks can be anything. For example, task 1 might be "load new data into a database"; that would use a tool, something like Airbyte. Task 2 might be something like "transform those tables once they're there" — again, this could use a modern data stack tool like dbt, or it could be really anything; the tasks are completely arbitrary. Another example: maybe first you want to store some log file you have on your local machine to S3 or something, and only once that's done do you want to delete your local copy. It's really important to get the order right there, and an orchestrator is something that can help with that.
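(For illustration only — a minimal sketch of that two-task ordering expressed in Dagster; the op names and bodies here are placeholders, not code from the talk.)

```python
from dagster import In, Nothing, job, op

@op
def load_new_data():
    # Task 1: e.g. kick off an Airbyte sync (stubbed out in this sketch).
    ...

@op(ins={"start": In(Nothing)})
def transform_tables():
    # Task 2: e.g. run dbt. The Nothing-typed input expresses pure ordering.
    ...

@job
def two_task_pipeline():
    # transform_tables starts only after load_new_data completes successfully;
    # if load_new_data fails, transform_tables never runs.
    transform_tables(start=load_new_data())
```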
B
Other features of orchestrators that aren't strictly necessary, but that you'll generally find, would be scheduling — being able to kick off these runs at a particular time; alerting — if something fails, you want to know about it; and history — being able to look back over time and ask, you know, how long did this thing take at a given point in time, and so on. Seeing all that, the question might come up: why would you want to use an orchestrator — or even, is Airbyte an orchestrator? Airbyte actually offers an integration with dbt.
B
It allows you to say, okay, once this Airbyte sync completes, then I want to run this particular dbt project — and in a sense that is orchestration, and it's actually really useful for some set of use cases. However, often the reality is a bit more complex than can be expressed by that simple a dependency structure. Really, your entire data platform is a collection of these interdependent tools; it's not just Airbyte and dbt.
B
Usually you have other stuff mixed in, and where this gets really tricky is when you have custom code written in Python. So instead of just running Airbyte and then dbt afterwards, maybe you use Airbyte to take data from a database and put it on S3; once it's there, you use Spark to transform that partition in a particular way; then you do further transformation.
B
You might have machine learning use cases that also need to run on the data produced by these things, and as the data platform grows and the business logic becomes more complex, the dependencies between these different tools become a lot more complex in turn. It quickly becomes unmaintainable to just have these running on their own separate cron schedules or in their own separate tools. The second you need to do something like a backfill, it can be extremely arcane to know the correct incantation of scripts you need to run in a particular order to get everything working again. Similarly, debugging workflows can be really challenging, because you need to page through all these different tools to see where the error occurred, when the only symptom you have is "hey, some number looks weird." This is really where an orchestrator shines as these complexities grow. So that's the general orchestration view — what about Dagster?
B
What's different or interesting about Dagster as opposed to other orchestration tools? I could go on about feature differences and so on for a while, but I think the most salient point is that, fundamentally, the philosophy behind Dagster is that the orchestrator isn't just responsible for running tasks — it should be responsible for understanding them. It's a centralized place where you're defining all the dependencies between your tasks.
B
So it should have some sense of why you're actually running them, and I think the easiest way to explain what that means is with the demo — so you're free from me going through slides, and I'll just hop right into it. All of the code for this is available on my personal GitHub, so if you're interested in looking at that, feel free, and we're also going to have a recipe going out this week or next week that will show how to get all of this up and running on your local machine.
B
If you're interested, do check those out. Okay — with that, I'm going to do this demo in maybe a slightly backwards order: I'll show off the UI first and then explain the code that's used to generate what you're seeing. Dagster is completely free and open source, and that actually includes the UI tool as well. You can run the UI tool, which is called Dagit, on your local machine — no sign-up or anything like that required; you just install Dagit. So this is one of the files in that GitHub repo I was sharing, and I'm just going to point Dagit at it and spin it up. That'll take a second — but now it's up, so I can tab over here, and this is what Dagit looks like. At a high level, we've defined a single job here, and the purpose of this job is to take data from two different sources: GitHub and Slack.
B
It's going to run that through dbt to transform it, join that into a single unified metric, and then we're going to do some custom Python stuff at the end. Apologies to any data scientists who are watching this right now, because I'm not one, but it'll give you at least a sense of a data-sciencey workflow that you might experience. If we zoom in a little bit, you can see Dagster already gives you a ton of metadata about everything that's going on within this job.
B
For example, we can get a more complete description of this sync-GitHub step — whether runs have succeeded or failed, and so on — all this sort of operational information. This is really helpful, especially if you're not the person who wrote this job, because you might have no idea what all these things are; a name only goes so far. So having all this metadata available to you right there is really useful.
B
Taking a step back, there's sort of a naive question that I think people should ask about most of their data pipelines, which is: what is the point? Why have I defined this thing? Of course this is a demo, so the actual data flowing through here isn't that important, but if you imagine someone who's written something similar in reality — why would they have done it? It's not just so that they can run tasks; that's not their overarching goal. It's to achieve a particular result, and there are a few different ways we can understand that result.
B
One is that at the end of all this data science stuff we have this generate-chart step, and all that's doing is fitting some model to the data we're observing in dbt and generating a chart that represents how well the model fits the actual data. That's one thing a person writing something like this would care about: the chart they generate at the end. Another thing they care about is the models — the tables in the database that dbt is creating. When you run dbt, it creates or updates a bunch of tables in a database, and we probably care about those as well; analysts might want to make sure they're up to date and have accurate information. And finally, we also care about the data that's being moved by Airbyte from the source to our data warehouse.
B
So we probably care that the raw GitHub and Slack data is up to date. And if you ask a traditional orchestrator "what is the point of this data pipeline?", it often has no insight into the fact that those are the things actually being created by these steps. So we have a term for the things we care about: data assets. Traditionally that's something like a table in a database, a machine learning model, some report, and so on — and we actually consume metadata about those assets and let you visualize them and track them over time. We can see here that this job actually creates six assets. We have an asset for the Slack channel messages — that's one of the things Airbyte is syncing in the sync-Slack step. We have this Airbyte GitHub commits asset — again, that's the raw data Airbyte is moving. Then the dbt project has three models in it.
B
It's just some daily rollups on top of those tables, which are then joined together into a single metric, and then finally we have this chart that we create at the end, which is also a data asset. If we click on one of these, we see not only the most recent time that this chart was created, but also the fit function that was used to create it, as well as the chart itself — so you can take a quick peek at that. It's not the prettiest thing in the world.
B
We can also see this historically — Dagster gives you this sort of longitudinal view of the actual assets you're creating. For example, if I go all the way back in time to 9:32 PM last night, I can grab a different version of this asset and see what the chart looked like back then. And I promise these are different, even though they look very similar.
B
Again, instead of finding the asset this way, if we just know that there's some Airbyte GitHub commits asset, we can search for it and see the longitudinal information for this Airbyte-created asset over time. Airbyte is a great tool here because it gives us lots of metadata to work with — it's not like the user is manually inputting all this information.
B
Every run of Airbyte gives you tons of really important metadata that you can track over time. We can see how many bytes or how many records were created and track that over time; we also get schema information that, again, we can track over time. If something changes, this is a really powerful debugging tool — you can see the exact point in time at which the schema changed or at which a data spike occurred. So an orchestrator isn't just good for looking at things.
B
So, for example, if I go to the sync-GitHub step, we actually ingest those logs directly into Dagster, so you can view them as the thing is running to get an idea of what's happening. And that's useful not only while you're running something — you also get a historical record. Again, for debugging workflows this is invaluable: being able to look back and see if there were any warnings at a certain point in time; you get all the dbt transformation information, all of that. So this is going to take a little bit — there we go, actually.
A
I have a quick question for you: how is Dagster able to generalize having log output like that? Because you can orchestrate a bunch of different things — you can probably get dbt logs too — how did you generalize having this window that always shows you the log output?
B
The answer is that the tool needs to provide it to us. Luckily, Airbyte's API provides the log information, and as long as we can get that from a request, we can insert it into our own system. Under the hood, we're just re-emitting it on the standard output stream, and you can see this is actually a generic window showing all the standard out generated for this step — it just so happens that the step only runs Airbyte, and we're the ones producing the standard out.
A
And do you have a similar thing for capturing metadata? You mentioned you can have that longitudinal data where we send you a bunch of metadata and you're just capturing it through the API. So would you do that for any tool — basically look for the metadata individually and then capture it so that you can display graphs?
B
Exactly, yeah — the more metadata the tool provides, the better, which is one of the reasons we're really happy with Airbyte and how this integration turned out. And yes, as you mentioned, we also capture dbt output.
B
Actually, because dbt output is a little smaller, we show it inline; Airbyte can produce hundreds or thousands of log lines, so we don't want to show that in the same view where we're seeing all this operational information. But again, you can look back at this over time. So once we've run our step, we'll get some new asset materializations — for example, for this Slack sync we can see that we found three new records since the last time I ran it, that it's in incremental mode, and so on.
B
So that's a quick view of basic Dagster functionality, and I'll just hop into the code now. This is what the code looks like — it all fits on one screen. The reason it can be this condensed is that we're using pre-built integrations for a lot of this work. We're using three libraries here. The first one is dagster-airbyte; this was contributed by an Airbyte team member, which we're very grateful for, and we're very happy with how it turned out.
B
Then we're also using the dagster-dbt integration to run our dbt step, and because this is all running on my local machine, where we have a Postgres database where all the transformation stuff is happening, we're using dagster-postgres to eventually read from that. The first thing of import here is where we're defining our Airbyte sync operations.
B
We import this airbyte_sync_op thing, and because it's just a generic op, we need to configure it with a particular Airbyte connection ID so it knows which connection to kick off, and then we give it a name so that it shows up nicely in Dagit and people can understand what it is. And then we do the same exact thing for Slack.
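(For illustration — a minimal sketch of what configuring those ops with the dagster-airbyte library might look like; the connection IDs are placeholders, not values from the talk.)

```python
from dagster_airbyte import airbyte_sync_op

# Configure the generic op once per Airbyte connection; the `name` is what
# shows up in Dagit. The connection IDs below are placeholders.
sync_github = airbyte_sync_op.configured(
    {"connection_id": "<github-connection-uuid>"}, name="sync_github"
)
sync_slack = airbyte_sync_op.configured(
    {"connection_id": "<slack-connection-uuid>"}, name="sync_slack"
)
```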
B
We just take the generic op and point it towards the Slack connection. Then we do a very similar thing for dbt — this time it doesn't actually need any configuration; we just give it a name here, and even that is optional. If I didn't do it, it would just show up as "dbt run" in Dagit.
B
So this is maybe a little bit more understandable. I'll show the Python ops and how they're defined in a second — most of that code is just normal Python, so I didn't include it in this file — but here's how we define the dependencies between these things. We've defined the operations we want to compute, so how do we define the order and how they connect to each other? We do that using what's called a job: in Dagster, the collection of ops is the job. We also have the option here to define particular resources, and those resources are how Dagster communicates with the relevant APIs; they can be swapped out at will.
B
So you can imagine having the same exact dependency structure between things, but saying, hey, instead of pointing at my localhost Airbyte resource, maybe I want to point at the Airbyte server I have running in production when I'm actually running this for real. This lets you separate the concern of what the dependency structure is from how it's going to run — what things it's actually going to hit. We configure a dbt resource to point at a particular dbt project, and then we're looking at a local Postgres database.
B
So we give it a particular database connection string. This is going to be used in one of the Python ops — I'll show that in a second — and yes, it is just hard-coded for my local machine, but Dagster does make it easy to read from environment variables instead. So don't do this in the real world, please.
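(For illustration — a rough sketch of how those resources might be defined; the resource keys, host, port, paths, and connection string are assumptions for the sketch, not values from the talk.)

```python
from dagster import resource
from dagster_airbyte import airbyte_resource
from dagster_dbt import dbt_cli_resource

# Hypothetical resource that just hands a database connection string to the
# Python ops so they can read dbt's output tables with pandas.
@resource(config_schema={"con_string": str})
def db_con(init_context):
    return init_context.resource_config["con_string"]

# Local development resources; a production deployment would swap these out
# (e.g. point "airbyte" at the production host, read secrets from env vars).
local_resource_defs = {
    "airbyte": airbyte_resource.configured({"host": "localhost", "port": "8000"}),
    "dbt": dbt_cli_resource.configured({"project_dir": "./dbt_project"}),
    "db_con": db_con.configured({"con_string": "postgresql://localhost:5432/demo"}),
}
```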
A
Yeah — sorry to jump in again here, but just to touch on another question that someone asked, since this is an interesting place to do it: you could have a mix of local and cloud products, right? Say you were running Airbyte Cloud, or maybe Dagster Cloud or something — you could use this Airbyte resource config to point it at anything and still run Dagster locally. And then, I guess in reverse, you'd have this in Dagster Cloud and point it — but then I guess you wouldn't be able to point it at something local, right? So I guess the question would be—
B
I think, generally, the pattern we'll see is that all of the code to run either in production or locally would just be in the same Git repo — although Dagster also has a concept called repositories, which lets you segment which versions of those jobs you see in different environments. I don't want to get too bogged down in the weeds here, but—
B
One version of that graph would be pointed at all my local stuff, so I would give it those resources, and a different version of that graph would be configured with all of my prod stuff. When I'm running in prod, I point at the prod version of that graph, and when I'm running locally, I point at the local version.
A
Awesome.
B
No, no — happy to answer more as they come up. But yeah, so we've defined all the resources that are relevant to running this thing; now we just need to define the dependency structure. The first bit is defining the fact that this dbt step depends on these Airbyte steps. The way we do that is we take our transform-slack-github op — that was the dbt run op we configured — and we say that it starts after sync_github and sync_slack.
B
This is slightly different syntax than what you're going to see for the Python stuff, simply because dbt and Airbyte don't actually need to pass any data between each other. dbt knows how to read tables; it doesn't need to be passed a copy of the data in a table or anything in order to function, so there's no actual data flowing between those steps. But when you do Python transformations, data actually does need to get passed between the steps.
B
So we might pass a pandas DataFrame from one step to another, and therefore data needs to flow.
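(For illustration — a rough sketch of how that job wiring might look, reusing the resource definitions sketched earlier; the job name, the signatures of read_dbt_output / get_fit_params / generate_chart, and the list-style fan-in on start_after are assumptions, not code from the talk.)

```python
from dagster import job
from dagster_dbt import dbt_run_op

@job(resource_defs=local_resource_defs)
def airbyte_dbt_analysis():
    # dbt only needs ordering relative to the Airbyte syncs (a Nothing-typed
    # dependency), so no data is passed into it.
    dbt_done = dbt_run_op.alias("transform_slack_github")(
        start_after=[sync_github(), sync_slack()]
    )
    # The custom Python ops, by contrast, pass real data between steps
    # (a pandas DataFrame, then the fitted parameters).
    df = read_dbt_output(start_after=dbt_done)
    generate_chart(df, get_fit_params(df))
```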
Then we have these custom Python ops. I have a file called ops.py, and 99% of this file is just me messing around with various data-sciencey Python libraries, but the first op I've defined here is called read_dbt_output, and you can see it has a required resource key of db_con.
B
That's just a connection string that pandas can use to read SQL from a particular place — and this DB connection string could also be a Snowflake connection string. This is how you can define an op that does the same thing regardless of what it's pointed at: I don't need to separately create a read_dbt_output_postgres and a read_dbt_output_snowflake op and then create different jobs for each of those scenarios. I can just say, okay, this is a generic way to read some data from a dbt project that was run.
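(For illustration — a minimal sketch of what such an op might look like; the table name and the Nothing-typed start_after input are assumptions. The point is simply that pandas reads SQL through whatever connection string the db_con resource supplies.)

```python
import pandas as pd
from dagster import In, Nothing, op

@op(ins={"start_after": In(Nothing)}, required_resource_keys={"db_con"})
def read_dbt_output(context) -> pd.DataFrame:
    # The resource is just a connection string, so the same op works against
    # Postgres, Snowflake, or anything else pandas/SQLAlchemy can talk to.
    con_string = context.resources.db_con
    return pd.read_sql("SELECT * FROM daily_summary_metrics", con=con_string)
```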
B
Then we have this get_fit_params thing. Again, this is purely a normal Python function: it takes in a DataFrame — in this case a DataFrame containing that summarized daily metric data — and then uses some library to fit a curve and generate the set of parameters that most closely matches the data. And this is really powerful: you're not thinking about how you're storing this data or how you're passing it from task to task. These things actually get serialized and passed between different processes, but that's completely invisible to you as you're programming — you can think purely in terms of what Python code you want to write and how you want to transform your input into your output.
B
And then we have this generate-chart thing, and this is where that asset idea comes into play. We take in both a DataFrame containing the observed data and the fit params we generated in the previous step, to match things together, and then we plot that — all of this is just plotting stuff — and we save the plot to a particular storage path. And then this is how you tell Dagster: hey, there's some persistent asset that I want you to keep track of. We didn't do that in the previous ops, but if you want to track something over time, this is how you would do it: you give an asset materialization a name — I've just called it the analysis chart — and then you give it metadata like the fit function that was used and the storage location, and this all just gets rendered in Dagit.
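(For illustration — a minimal sketch of emitting that kind of asset event from inside an op; the asset key, metadata entries, and storage path are placeholders, and the exact metadata API has shifted across Dagster versions, so treat the details as assumptions.)

```python
from dagster import AssetMaterialization, MetadataValue, op

@op
def generate_chart(context, df, fit_params):
    storage_path = "/tmp/analysis_chart.png"
    # ... fit the curve, plot observed data vs. the fit, save to storage_path ...
    context.log_event(
        AssetMaterialization(
            asset_key="analysis_chart",
            metadata={
                "fit_params": str(fit_params),
                "storage_path": MetadataValue.path(storage_path),
            },
        )
    )
```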
I'm running low on time, but just very quickly: I think one of the reasons it's so important that Dagit can be run locally is the local dev experience.
B
If I'm developing on this and I want to change one of these functions, it can be pretty time consuming to say, okay, I'm going to rerun the entire thing just to see if this one change works — these runs might take minutes, you know, up to an hour. What you can actually do is run subsets of the graph at will.
B
So if I come back here and run from the selected step, this will now run with the new version of the code — and we'll see in a second that my new version of the code is not very good... and it has now failed. So this is actually a really nice local dev experience.
B
You can iterate really quickly — it's kind of a Jupyter-notebook-esque thing where you get all the stuff above it working, and then you just iterate and keep going to fix the issues as they come up. So I'm going to stop there. I'm happy to answer questions or show more stuff that people are interested in.
A
Yeah, that's awesome — that's really cool! I really appreciate your presentation; it was very detailed and thorough, and I think it was really great. Let's try to go through these questions quickly. Ari asks: how do you serialize data assets between ops? Can you configure that and override it to collect metadata about assets?
B
That is a great question. We have an abstraction called an IO manager. By default, when you run locally, that IO manager just pickles the Python object and writes it to a file, and then on the other end it reads that file and unpickles it — but that's completely customizable. We have built-in integrations for things like S3, so it does the same protocol but against S3 instead of the local filesystem. And yes, you can emit metadata during that.
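(For illustration — a minimal sketch of a custom IO manager along those lines: a local pickle-based example with a placeholder storage layout, not the built-in implementation.)

```python
import os
import pickle

from dagster import IOManager, io_manager

class LocalPickleIOManager(IOManager):
    """Pickles each op output to a file and unpickles it for downstream ops."""

    def _path(self, context):
        # One file per step output, keyed by run and step (placeholder layout).
        return f"/tmp/dagster_storage/{context.run_id}_{context.step_key}.pkl"

    def handle_output(self, context, obj):
        path = self._path(context)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    def load_input(self, context):
        # For inputs, read the file written by the corresponding upstream output.
        with open(self._path(context.upstream_output), "rb") as f:
            return pickle.load(f)

@io_manager
def local_pickle_io_manager(_):
    return LocalPickleIOManager()
```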
A
Then Bogdan asks: does Dagster have to be installed locally? Which I think is asking — where are all the places that Dagster can be installed?
B
Yeah, so you don't need to install it locally. It is useful if you want to develop like that, but some people just prefer to develop in pure Python — you can test ops as regular Python functions; there's no need to run Dagit for that. You do need to install the Dagster Python library in order to develop code, but that's the same as any normal library. With Dagster Cloud you can get access to the same Dagit interface without running anything on your local machine.
B
Running it locally is purely a convenience, though — most people will deploy Dagit to some server, push code there, and interact with it like that.
A
All right, awesome — honestly, I think you did a fantastic job of summing up all the information. So, some last housekeeping: this video will be uploaded to YouTube within the next day or two, and you can go and check out any of the awesome information that Owen has bestowed upon us.
A
We will also, as Owen mentioned, have the recipe for implementing this and deploying it locally — in the same way Owen has shown you — out next week, and additionally we will have some documentation on this on the Airbyte website. Owen, are there any last shout-outs, calls to action, or anything else you want to say to the community before you go?
B
No — I just really want to thank the Airbyte team, honestly, first of all for getting the integration going and then for being super organized on all of this community outreach stuff. We really welcome people to join our Slack; that's the best way to reach us if you have support questions or are just curious about this. But yeah, that's it.
A
Awesome. It was really great having you here — honestly, a really fantastic presentation. So thank you so much again, Owen.