From YouTube: How to Orchestrate Airbyte Syncs with Dagster | Community Call 16 w/ Ben Pankow & Shawn Wang
Description
In this video, Ben Pankow, Software Engineer at Elementl, shows how to streamline your process by using Dagster to manage your Airbyte connections and orchestrate syncs with downstream computation using dbt. He takes you step by step through setting up Airbyte with Dagster from scratch, so even if you're new to these tools, you'll be able to follow along. And for those who are more experienced, he also shows how Dagster and Airbyte can elevate your project to new heights. Don't miss out on this demo.
Subscribe to our newsletter: https://airbyte.com/newsletter?utm_source=youtube
Learn more about Airbyte: https://airbyte.com
#dataorchestration #data #communitycall
A
Super nice to meet you, and as I noted, this call has been a long time coming, so I'm excited for people to get introduced to Dagster and see what you've been working on with the awesome integrations. I work with Krishan on the developer experience team here at Airbyte, where we help run events, make content, and basically help anyone discovering data integration, data pipelines, and data engineering along the way, and provide them whatever content they need to get familiar with Airbyte.
A
We also do stuff with data warehouses and a fair amount of the data ecosystem, but we're big personal fans of Dagster. In fact, we have a blog post talking about how we're moving our internal data team's stack to something that is orchestrated by Dagster, and I know that the Dagster team has been cooking up something cool with your integration. So maybe you want to give a little intro of yourself and Dagster, and then we can take it from there.
B
Thanks again, Shawn, for the introduction. As mentioned, my name is Ben, and I'm a software engineer at Elementl, which is the company behind Dagster, an open-source data orchestrator. Today I'll be walking through how you can use Dagster to orchestrate Airbyte, along with some of your other modern data stack (MDS) tools, using our recently updated integration.
B
The goal of a data platform or a data pipeline is to produce some sort of result: some persistent output, an artifact that we call a data asset. Depending on your use case, this could be one of a wide variety of things. Maybe it's a BI dashboard that helps guide some decision making in your organization.
B
Maybe it's an ML model, maybe it's a notebook that one of your analysts is using, or maybe it's something like a file in S3 or even a database table in your cloud warehouse.
B
All of these things are data assets produced by your pipeline, and the reason a pipeline exists is to create one of these assets to be used down the line by someone or some other process. These data assets often have dependencies. Let's say we have some sort of BI dashboard showing the performance of our product. That dashboard is likely going to rely on some sort of metrics.
B
Maybe that's a daily active users metric, and those metrics are probably transformed from some raw data: maybe actual session data that we're ingesting, transforming into daily active user metrics, and then displaying on a dashboard. In the world of the modern data stack, we have a lot of great tools that are purpose-built for building specific kinds of assets or doing specific kinds of transformations.
B
So here already we have three different tools, but in a larger data platform this can grow to an even larger number of tools. Maybe you want to do some computation in Python or use PySpark; these are more purpose-built tools that you have to juggle, and this is where an orchestrator comes in. At its core, an orchestrator's job is to schedule.
B
...and you don't want to rebuild your dashboard when the data is broken, which falls into the second category: observability. Most orchestrators provide a layer of observability on top of all of these tools, making it easier to debug failures, to get alerts when one part of your pipeline fails, and to get history, being able to view all the times that something has run, to help figure out when something might have gone wrong. So then we move on:
B
Where does Dagster fit in? What's the sales pitch for Dagster in the realm of orchestrators? The first point is that Dagster is data-aware and asset-focused. A lot of more traditional orchestrators are what we call task-focused. If we look at the previous set of data assets, we talked about a dashboard that's powered by some metrics from some ingested data. There's an underlying set of steps to actually produce these assets: we'd run Airbyte to ingest our data, we'd run dbt to generate our metrics, and then we'd trigger a dashboard update to get our finalized dashboard. Most traditional task-based orchestrators confine themselves to working only at this lower-level, task-based view. With Dagster, we like to take a holistic view of both the data assets and the tasks that lie under them, because we believe that if you're a stakeholder who cares about a dashboard, or even a member of the team who's...
B
The other facet that sets Dagster apart, which hopefully you'll see in the demo, is that it's built for the engineering workflow: for local development using the same tooling and the same interface that you'll use when you deploy to production, and also for the ability to test, to write unit tests, to easily input sample data.
B
You
know
deploy
to
a
staging
environment
that
sort
of
thing
so,
with
that
all
being
said,
I'm
gonna
move
into
a
demo,
hopefully
that'll
help
to
illuminate
some
of
the
things
I
quickly
talk
through
the
code
for
this
demo
is
available
on
my
GitHub
here
here,
so
github.com
earbud
Community
demo.
So
what
I'll
do
is
kind
of
first
show
what
it
looks
like
to
use
everybody
with
dagster
and
then
I'll
kind
of
move
into
the
code
base
and
show
you
know
how
we
would
actually
build
out.
B
We'll start by going to a terminal and running dagit. This is going to kick off the local web interface for Dagster. You can get this and run it locally: just pip install dagster and pip install dagit. If we open up the UI in our browser, we're greeted by this update-activity-dashboard job. Here we get a DAG showing the relationship between the different assets in our job.
B
It's a pretty straightforward data ingest job: we're loading some data from GitHub using Airbyte, and also from Slack, then we're using dbt to generate some aggregate tables, rolling up some daily data for Slack messages and GitHub commits, and then building a central table that we'll use to power our Hex dashboard. Each node on this graph, as I mentioned, represents an asset, so you can click on one of these nodes to view information about that asset.
B
So
we
can
see
you
know
the
last
time
the
asset
was
materialized,
so
the
last
time
that
this
commits
table
was
recreated
by
air
byte
was
you
know
about
an
hour
ago.
We
get
a
glimpse
into
the
schema
of
this
table
in
Snowflake.
You
know
if
we
click
on
one
of
these
DBT
assets,
we
get
something
similar.
We
also
get
some
time
series
metadata,
showing
you
know
every
time
this
asset
was
created.
How
long
did
it
take
to
produce?
B
This
is
all
kind
of
data
that,
in
this
case,
is
automatically
added
by
dagster,
but
can
also
be
user
specified
metadata.
So
if
you
wanted
to
attach
any
you
know
metadata,
you
want
to
keep
track
of
to
one
of
your
assets.
You
can
totally
do
that
and
have
it
show
up
in
the
UI.
And
finally,
we
have
this
dashboard
asset,
which
actually
represents
kind
of
a
version
of
our
hex
dashboard
that
was
recreated
here
at
12
30..
B
So
we
can
go
ahead
and
click
on
the
URL
there
to
view
our
dashboard
at
the
time
that
it
was
generated.
So
here
we
have,
you
know,
commits
over
time
graph,
pretty
Bare
Bones
here,
but
you
can
imagine
this
is
you
know
some
sort
of
dense
dashboard
here,
providing
some
some
useful
information
for
some
outside
stakeholder.
So
one
thing
I'd
like
to
kind
of
showcase
here,
is
the
ability
for
you
to
kind
of
get
insight
on
your
assets
without
leaving
the
UI
without
having
to
delve
into
the
code.
B
So
if
I'm
you
know
an
outside
stakeholder
who
maybe
cares
about
this
text,
dashboard
I
don't
have
to
go
dig
through
the
code
to
see
how
it's
produced.
I
can
just
go
to
the
kind
of
dedicated
page
for
that
asset
and
I
can
see
every
time
it's
been
produced.
Historically
I
could
you
know
maybe
jump
to
one
of
the
runs
if
I
wanted
to
view
some
logs
I
could
go
ahead
and
and
open
up
a
version
of
that
dashboard
from
that
particular
point
in
time.
B
So
here's
kind
of
a
historical
version
of
the
dashboard
you,
you
can
see,
there's
a
slight
difference
here,
since
this
is
from
a
couple
hours
earlier
and
you
can
even
go
in
and
view
lineage
for
a
specific
asset.
So,
if
I
didn't
know
anything
about
how
this
dashboard
was
sourced,
I
could
go
to
this
lineage
Pane
and
see
you
know
it's
coming
from
this
DBT
table
which
in
turn
is
coming
from
these
Airbus
assets.
B
So
if
I
go
back
to
this
job
page,
let's
go
ahead
and
see
what
it
looks
like
to
actually
trigger
an
airbite
sync-
and
you
know
a
DBT
Transformations
from
dagster
I
can
either
go
ahead
and
click
this
materialize.
All
button
which
is
going
to
you
know
take
all
of
these
assets
figure
out
what
underlying
tasks
are
needed
to
produce
them?
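Conceptually, Materialize All resolves the asset graph into an execution order. The following is a toy sketch of that resolution using only the standard library; the asset names mirror the demo's graph, but the code is illustrative and is not Dagster's actual implementation:

```python
from graphlib import TopologicalSorter

# Upstream dependencies for each asset, mirroring the demo's graph
# (Airbyte tables feed dbt models, which feed the Hex dashboard).
deps = {
    "github_commits": set(),                                   # Airbyte sync output
    "slack_messages": set(),                                   # Airbyte sync output
    "daily_commits": {"github_commits"},                       # dbt model
    "daily_messages": {"slack_messages"},                      # dbt model
    "activity_summary": {"daily_commits", "daily_messages"},   # dbt model
    "hex_dashboard": {"activity_summary"},                     # dashboard refresh
}

def materialization_order(deps):
    """Return one valid order in which to materialize every asset."""
    return list(TopologicalSorter(deps).static_order())

print(materialization_order(deps))
```

The syncs always come first and the dashboard last, because every other asset is in its upstream closure.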
C
We can see that our syncs will be kicking off in just a moment.
B
We
can
see
you
know
both
of
our
air
byte
sinks.
Here
have
started.
You
know,
they're
starting
to
produce
logs
and
dagster
integrates
pretty
tightly
with
air
byte,
we
stream
the
log
information
directly
from
Air
byte.
So
if
something
were
to
go
wrong,
you
know
you
can
access
the
logs
directly
through
Dexter
and
you
can
also
get
kind
of
high
level
structured
information
from
our
structured
event
log
here.
So
once
the
air
writes
and
completes
we'll
get
some
kind
of
metadata
telling
us.
B
You
know
how
many
new
rows
we're
seeing
that
sort
of
thing
in
this
more
structured,
so
this
is
going
to
take
a
little
bit
to
to
actually
get
going,
but
I
wanted
to
delve
just
briefly
into
the
code
to
show
you
know
how
easy
this
is
to
get
going.
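Under the hood, triggering and monitoring a sync comes down to calls against Airbyte's REST API. Below is a minimal standard-library sketch of that trigger-and-poll loop; the endpoint paths ("connections/sync", "jobs/get") and response shapes are assumptions based on Airbyte's open-source API, and this is a sketch of the pattern, not the dagster-airbyte source:

```python
import json
import time
import urllib.request

def airbyte_post(host, port, path, body):
    """POST a JSON body to a (hypothetical) Airbyte API path and parse the reply."""
    req = urllib.request.Request(
        f"http://{host}:{port}/api/v1/{path}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def run_sync(connection_id, post, poll_interval=5):
    """Kick off a sync for one connection and block until it finishes.

    `post(path, body)` is injected so the loop can be exercised
    without a live Airbyte server.
    """
    job = post("connections/sync", {"connectionId": connection_id})["job"]
    while job["status"] in ("pending", "running"):
        time.sleep(poll_interval)
        job = post("jobs/get", {"id": job["id"]})["job"]
    if job["status"] != "succeeded":
        raise RuntimeError(f"Airbyte sync ended with status {job['status']}")
    return job
```

Streaming logs back, as the integration does, would be another call in the same polling loop.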
B
This
is
all
the
code
that's
needed
to
get
this
example
working.
So
you
know
we
have
a
single
python
file
here
and
we
have
a
couple
kind
of
segments
that
are
generating
each
of
our
assets
so
to
set
up
our
earbud
assets.
Here,
all
we
need
to
do
is
tell
dagster
how
to
access
our
air
byte
instance.
So
here
we're
pointing
you
know
at
our
local
earbud
instance.
B
This
is
a
plugable
system,
so
you
know
in
our
production
environment
we
were,
you
know
pointing
towards
our
production
earbud
instance
and
then
we're
just
telling
diagster
to
load
all
of
the
earbud
assets
automatically
from
the
irbite
instance.
You
can
see.
There's
you
know
a
bunch
of
optional
inputs
here.
You
can
use
to,
of
course,
those
assets
to
look
the
way
you
want
them
to
here
we're
you
know
changing
what
the
asset
names
look
like,
but
for
the
most
part,
this
loading
just
kind
of
happens
automatically.
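The "make the asset names look the way you want" knob boils down to mapping each connection's stream (table) names to asset keys. Here is a toy standard-library sketch of that mapping with an optional prefix and renaming function; the connection and stream names are made up for illustration, and this is not the dagster-airbyte implementation:

```python
def airbyte_asset_keys(connections, key_prefix=None, rename=None):
    """Derive one asset key per destination table across all connections.

    connections: mapping of connection name -> list of stream (table) names.
    key_prefix:  optional list of path components to prepend to every key.
    rename:      optional function applied to each table name.
    """
    rename = rename or (lambda name: name)
    keys = []
    for streams in connections.values():
        for table in streams:
            keys.append((key_prefix or []) + [rename(table)])
    return keys

# Hypothetical connections mirroring the demo: GitHub and Slack syncs.
connections = {
    "github": ["commits", "issues"],
    "slack": ["messages"],
}
keys = airbyte_asset_keys(connections, key_prefix=["raw"], rename=str.lower)
print(keys)
```

The automatic loader effectively runs something like this over every connection it discovers via the API.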
B
You can see we're doing something very similar here with dbt. We have a dbt project defined here with some model SQL files, and all we do is point Dagster at the files and it will do the work of automatically generating the assets for you. Even for the Hex asset, really all we need to do, using our Hex integration, is point at the project ID and tell Dagster what data that notebook depends on, and that's all we need to get that asset up and running. Then we can define a job to recreate everything, attach a schedule to it, and tell Dagster all of the assets that we want to load, and that's really all we need to get a smaller example like this up and running. I wanted to make sure there was enough time for questions, and I know we got a little bit of a late start, so I'll take a pause there, but I'm happy to delve into any other part of this demo or answer questions if anyone has any.
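The "attach a schedule" step above amounts to rerunning the job on a fixed cadence (this demo runs every 10 minutes, as mentioned later). A tiny sketch of that tick computation using only the standard library, purely illustrative and not Dagster's scheduler:

```python
from datetime import datetime, timedelta

def next_tick(now, interval_minutes=10):
    """Round `now` up to the next multiple of `interval_minutes` past the hour."""
    minutes = (now.minute // interval_minutes + 1) * interval_minutes
    base = now.replace(minute=0, second=0, microsecond=0)
    return base + timedelta(minutes=minutes)

print(next_tick(datetime(2023, 1, 17, 12, 34)))  # 2023-01-17 12:40:00
```

In practice you would hand the orchestrator a cron expression and let it compute these ticks for you.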
A
Yeah, so if anyone is watching along, feel free to drop in some questions, but I can drop in a few. I think this is a really good high-level demo overall. What was involved in building this Dagster Airbyte integration? I would love to learn a little bit more about the process, and perhaps what could be better.
B
Yeah, a little bit from behind the curtain: most of our integrations, Airbyte included, boil down to Dagster talking to whatever service through whatever API surface that service has. Airbyte in this case has a really good API, which made it pretty easy for us not only to do things like kick off Airbyte syncs through the API, but also to stream a lot of that metadata back.
B
So,
whether
that's
you
know
incorporating
the
earbud
logs
into
diagster's
run
logs,
but
also
to
provide
some
of
this
metadata
so
like
being
able
to
show
us,
you
know
what
is
the
scheme
of
our
tables
look
like
directly
in
dagster
or
even
to
show
us.
You
know
how
many
new
rows
were
added
to
our
table
every
time
we
ran
this.
So
you
know
under
the
hood
a
lot
of
this
integration
work
was
really
just
interfacing,
with
their
bytes
Avi.
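Surfacing "how many new rows" is a matter of pulling counters out of the job payload the API returns. Here is a small sketch of that extraction; the payload shape below, with per-attempt recordsSynced counters, is an assumption about Airbyte's job API, and the sample numbers are made up:

```python
def records_synced(job_payload):
    """Sum the records synced across all attempts of one Airbyte job."""
    return sum(
        attempt.get("recordsSynced", 0)
        for attempt in job_payload.get("attempts", [])
    )

# Hypothetical response for a job that needed two attempts.
payload = {
    "job": {"id": 42, "status": "succeeded"},
    "attempts": [
        {"recordsSynced": 0},      # first attempt failed early
        {"recordsSynced": 1250},   # retry moved the data
    ],
}
print(records_synced(payload))  # 1250
```

The orchestrator can then attach this number to the materialization event so it shows up in the asset's metadata history.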
B
We have the ability to basically validate the outputs of each of your assets, and one of the things that we're actually working on internally right now is a notion of SLAs for assets: rules that you can set for assets that will alert if they're violated.
B
Yeah,
so
that's
kind
of
a
yeah,
that's
another
kind
of
way
that
the
slas
can
be
used
so
right
now,
you
know
this
example
is
run
on
a
you
know,
strict
time
schedule,
so
we've
decided
you
know,
hey
we're
going
to
rerun
all
these
assets
every
10
minutes,
but
you
can
also
have
dagster
kind
of
dynamically
figure
out
when
to
run
certain
assets
by
telling
it
I
want
this
asset
to
be
ready
by
9
A.M
every
morning
that
way,
you're
not
like
defining
explicitly
when
every
asset
runs
you're,
just
giving
dagster
a
list
of
criteria
and
it'll
kind
of
figure
out
behind
the
scenes,
one
to
run
everything
yeah.
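"Ready by 9 a.m." can be read as a deadline that propagates backwards through the dependency chain: each upstream step must start early enough for everything downstream to finish. A toy sketch of that backward computation follows; the step durations are invented, and this is not Dagster's freshness logic:

```python
from datetime import datetime, timedelta

def latest_start(deadline, durations):
    """Given a deadline and the ordered durations (in minutes) of each
    pipeline step, return the latest time the first step may start."""
    total = sum(durations)
    return deadline - timedelta(minutes=total)

# Hypothetical chain: Airbyte sync (15 min) -> dbt (10 min) -> dashboard (5 min),
# with the dashboard due at 9:00 a.m.
deadline = datetime(2023, 1, 17, 9, 0)
print(latest_start(deadline, [15, 10, 5]))  # 2023-01-17 08:30:00
```

Declaring the deadline and letting the orchestrator derive start times is what makes this style declarative rather than cron-driven.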
A
The ultimate in declarative assets, yeah.
B
If we go back to the code here, this is one of the ways you can define Airbyte assets: automatically generating them from your Airbyte instance. What Dagster does here is talk to the Airbyte API directly, figure out what syncs are present, and then generate assets from those based on the tables they're producing. You can do things like filter down the list of connections that you care about. There are also ways to build Airbyte assets manually, if you'd rather have each of the assets defined explicitly: you can use something like build_airbyte_assets here, which takes an explicit connection ID. It's a little bit less magical, but it might be nicer for cases where you want everything explicitly defined in code.
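The two styles, automatic discovery with a filter versus an explicit allow-list, can be sketched side by side as a selection step over connections. The connection data below is invented, and the real dagster-airbyte functions take different arguments; this just illustrates the trade-off:

```python
def select_connections(connections, connection_filter=None, explicit_ids=None):
    """Pick which Airbyte connections to turn into assets.

    connections:       mapping of connection id -> connection name.
    connection_filter: optional predicate on the name (automatic style).
    explicit_ids:      optional fixed list of ids (manual, build-style).
    """
    if explicit_ids is not None:
        return {cid: connections[cid] for cid in explicit_ids}
    if connection_filter is not None:
        return {cid: name for cid, name in connections.items() if connection_filter(name)}
    return dict(connections)

connections = {"c1": "github_to_snowflake", "c2": "slack_to_snowflake", "c3": "test_pipeline"}
# Automatic: everything except test connections.
prod = select_connections(connections, connection_filter=lambda n: not n.startswith("test"))
# Manual: exactly the connection you name, nothing discovered.
manual = select_connections(connections, explicit_ids=["c1"])
```

The filter style picks up new connections as they appear; the explicit style never surprises you.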
A
Know
one
thing
one
thing
I
feel
like
is
under
appreciated
when
people
look
at
Solutions
like
this
is
the
fact
that
you
don't
have
to
statically
define
the
asset
once
I
feel
like
because
you
have
control
of
the
API
and
I've
seen
some
of
our
most
advanced
users
do
this,
which
is
essentially
based
on
some
parameters
that
come
into.
Let's
say
your
dagster
run
spin
up
new
air
byte
instances
and
assets,
dynamically
I
I
feel
like
you
have
that
control
I
I,
don't
know.
If
there's
any
foot
guns
that
you
would
flag
for.
B
Yeah,
that's
actually
something,
you
know
not
that
specific
use
case,
but
we've
had
a
lot
of
people
playing
with
recently
kind
of
like
dynamically
spinning
up
and
tearing
down
infrastructure
like
as
part
of
a
run.
So,
yes,
you
know
one
thing
we've
had
people
do
is
kind
of
have
ephemeral
development
environments
that'll,
like
Fork
a
snowflake
schema.
You
can
kind
of
test
against
production
without
actually
interrupting
your
production
database.
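One lightweight way to realize that fork is simply to derive a per-branch schema name and run the whole pipeline against it. The naming convention below is invented for illustration:

```python
import re

def schema_for_branch(base_schema, branch):
    """Production writes to the base schema; any other branch gets its own copy."""
    if branch in ("main", "master"):
        return base_schema
    # Sanitize the branch name into a valid schema identifier.
    suffix = re.sub(r"[^a-z0-9_]", "_", branch.lower())
    return f"{base_schema}_dev_{suffix}"

print(schema_for_branch("analytics", "main"))          # analytics
print(schema_for_branch("analytics", "fix/dbt-join"))  # analytics_dev_fix_dbt_join
```

Tearing the environment down is then just dropping the derived schema once the branch is merged.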
A
So
it's
funny,
because
dagster
Cloud
also
kind
of
does
that
for
dagster
itself
right,
like
that's
your
sort
of
deploy,
preview
thing,
I,
don't
know
if
you
want
to
talk
about
that
a
little
bit
and
show
it
off
really
I
mean
I.
Think
it's
the
most
impressive
thing
that
that
you've
shipped
last
year,
yeah.
B
So, maybe a little bit of a plug: everything I've shown here is the open-source version of Dagster, which you can run locally and host yourself. But we also have a hosted version of Dagster called Dagster Cloud, which eases some of the deployment burden and also has enterprise features, things like RBAC. One of the other features is something called branch deployments: the ability to dynamically fork your entire Dagster environment with pull requests.
A
Production, yeah. Cool. I think part of the value proposition is that you're essentially innovating on all the things that Airflow doesn't really consider part of its core feature set, and I think that's pretty cool. What, I guess, is part of the roadmap, or what should people look forward to in 2023 when thinking about the Dagster story, in order to decide whether or not they should bet on...
B
...this horse. Yeah, that's a great question. A lot of the roadmap is stuff we're still internally debating, but among the things you can look forward to: one area we've been spending a lot of time on recently is our integrations. Here you've seen our integrations with Airbyte, dbt, and Hex, but we're hoping to build out even deeper integrations with a lot of the core tools that people are using as part of the MDS and as part of a lot of these data use cases. I think one of the things that sets Dagster apart from some more traditional orchestrators is the level of depth in our integrations, so we surface a lot of metadata.
B
...energy building out our set of integrations there. We're also doing a lot of work to improve the core Python ergonomics of the Dagster APIs; you maybe saw a little bit of that in the code preview I showed, but that's another area where we're focusing a lot of energy. And then there are the SLAs that we talked about a little bit earlier, making it even easier to just declare: "hey, I...
A
It really is kind of everything that you would want, taking the software engineering mindset, right? I feel like Nick, when he spoke at our conference, was talking a little bit about that, and bringing that to data engineering, which is kind of starved for all that innovation and progress.
A
You
know
on
our
ends,
we're
working
on
basically
the
bringing
the
API
that
you
that
you
use
here
in
sort
of
local
environments,
to
our
everybody
Cloud
instance,
so
just
kind
of
similarly
like
what
we
want
to
encourage
using
Daxter
using
orchestrators
in
general
as
part
of
your
production
quality
deployment
of
a
data
pipeline
right
over
the
sort
of
data
architecture-
and
you
know
all
the
all
the
Integrations
that
you
guys
have
is
just
really
incredible
to
see
so.
I
posted
it
up
in
the
chat
for
people
to
explore.
A
Awesome
I
think
that
about
covers
it
any
parting
words
for
the
community,
any
call
section
what
what
they
should
check
out
next.
B
Yeah
well,
first
of
all,
you
know
thanks
so
much
for
giving
me
the
opportunity
to
share
what
we've
been
working
on
here.
I
don't
have
too
much
to
share
other
than
if
you're
interested
in
trying
this
out,
you
know,
dagster.io
is
where
you
can
find
out
more.
Our
slack
Community
is
really
the
the
biggest
place
to
kind
of
engage
with
us
and
the
broader
dagster
community.
So
if
you
want
to
play
around
with
dagster,
you
have
any
questions
that
come
up.
The
slack
Community
is
probably
the
best
place
to
ask.