From YouTube: Supercharge dbt: You might not need dbt Cloud!
Description
Join the Dagster team for a special showcase of new dbt functionality included in Dagster 1.4, which makes Dagster a strong alternative to dbt Cloud. This session will benefit data professionals who work with dbt models and are looking to manage them as part of a larger pipeline. If you are looking to migrate off dbt Cloud or are simply exploring dbt alternatives, this is a session you will not want to miss.
A: Hi, I'm Pete, CEO at Elementl; we make Dagster, an open source data orchestrator, and I want to welcome you to our event today focused on dbt. But before we get into all the content, I wanted to first touch on something that we care very deeply about here, which is data engineering. We view data engineering as a discipline, not a job title. There are all sorts of different people in the organization with titles like data engineer, machine learning engineer, data platform engineer, analytics engineer, even data scientist and data analyst.
A: They are all often participating in the process of building, maintaining, and leveraging data pipelines, and so we view this as a discipline, not a job title. And the state of this discipline of data engineering today has a lot of challenges. First, there are a lot of different tools, so everyone participating in this process has to learn multiple tools; they have to jump through many of them in order to do their job.
A: Another problem is that there are multiple different data teams within the organization. Often they have titles like ML engineer, analytics engineer, or data engineer, and they often have entirely different stacks, which makes it difficult for them to collaborate, even though they are doing very similar activities, often operating on the same data sets, and serving similar, if not the same, stakeholders.
A: This siloing creates a lot of problems. And finally, many of the tools that data engineers, or people practicing the discipline of data engineering, have available to them have a really low-quality developer experience: they often lack local development (we see teams testing in production all the time), and version control and CI/CD are often not ubiquitously adopted. That's why we started the Dagster project: to accelerate the adoption of software engineering best practices and to solve a lot of the other problems that we were seeing among folks practicing the discipline of data engineering.
A: So just a quick recap for those who are not familiar with Dagster, to set the table: Dagster is a data orchestrator, so you write and test your data pipelines in Python. You might be doing your raw compute and transformations within Python, or you might be orchestrating systems outside of Python, based on SQL or Scala or another language like that, but fundamentally you're defining your data pipelines in Python and you're testing them locally and in the CI process. Then Dagster will run and monitor your computation for you. You can launch runs, and Dagster will monitor them and restart them when they fail. You can put your data pipelines on a schedule. You can kick off runs of data pipelines based on external signals using a feature called sensors. You can partition them, etc.
A: Dagster will run them for you, and it will give you a beautiful UI to monitor that computation and also hook up to alerting systems to let you know when something's going wrong. And then finally, one of our distinguishing features is that we track data lineage and metadata within the tool itself, so you can inspect every asset's status, its schema, its metadata, and its dependencies, all from one place. It's really convenient for engineers working on the data platform, as well as their stakeholders, to self-identify issues when they come up.
A: A lot of people are using Dagster: all sorts of different companies all over the world, across all sorts of industries, sizes, and stages, and we integrate with basically every data tool out there. One thing that we noticed was that one integration really rose above the rest in terms of adoption within the Dagster community, and that was dbt.
A: Over half of our Cloud users use dbt in at least one of their pipelines, and so dbt has become one of our most important integrations. That's why we have spent the last couple of months really focusing on making a lot of improvements to our integration and trying to set the standard for orchestration with dbt. So I wanted to share our agenda with you today.
A: Pedram is going to take over from me, and he's going to talk about the challenges that organizations are facing with dbt at scale. Sandy is going to make the case that Dagster is the best way to orchestrate dbt, and I agree with him. Then Rex is going to do a demo of our new dbt integration, and finally Ben is going to show us what this enables: both the current set of features that Dagster has and the future of Dagster, and how dbt can plug into that and leverage all the great features that everybody using Dagster has been able to take advantage of so far.
B: Thanks, Pete. Hi, I'm Pedram Navid, head of data engineering at Dagster. I've been a longtime user of dbt, and I've spent a lot of time talking to people in the community about the joys and pains of building data pipelines. While dbt is a great tool that has really changed how we build SQL-based data pipelines, I've also seen some common pain points as teams grow and their data needs become more complex.
B: First, let's talk about vendor lock-in. Perhaps not an obvious topic, but over the past year we've seen dbt adopt a less permissive licensing model, making certain features exclusively available to dbt Cloud users. Now, while dbt Core remains open source, many of the features required to make it usable, such as orchestration and deployment, as well as upcoming features like dbt Server and metrics, are all part of the cloud-only product.
B: This increases the risk of vendor lock-in, and along with things like last year's price hikes and new limitations on the non-enterprise plans, such as a one-project limit and two concurrent jobs, it has become a real concern within the community. Another issue we've heard a lot about arises as teams build more complex pipelines and integrations, especially with the rise of things like AI and ML.
B: They soon begin to realize that there's more to life than Jinja and SQL. Data teams are increasingly working cross-functionally with ML and ops teams, but dbt Cloud is essentially an operational silo: there's little to no understanding of the source data that feeds your models, and instead we rely on patchwork freshness tests, for example, to hope that upstream data has been refreshed. Now, dbt does support using Snowpark or PySpark for Python models, but in doing so there's now a tight coupling between your dbt models and your Python code.
B: Probably most importantly, dbt just isn't a full-lifecycle tool for data engineers. Jinja only gets you so far, and things that appear simple on the surface become very cumbersome at scale. This is the most common pressing concern when I talk to teams, and I really believe that data teams deserve better tools that support engineering best practices. dbt has certainly given us a head start on that from where we were before.
B: I think it's still not enough. For all the great aspects of SQL (and you know I love SQL; I defend it all the time), some things are just better in Python. Unit testing, for example: if you have complex logic that you want to wrap in a Python function, that's a very natural thing to do, and it's very easy to test. But you can't really test logic in SQL; all you can test is the output of these data transformations in a live system. The same goes when it comes to things like observability and notifications.
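As a tiny illustration of that point (the function and values here are made up, not from the talk): logic that lives in a Python function can be unit tested directly, with no database involved, whereas the equivalent SQL expression can only be checked by running it against a warehouse and inspecting the output.

```python
def cents_to_dollars(amount_cents: int) -> float:
    """Convert an integer price in cents into dollars."""
    return amount_cents / 100

def test_cents_to_dollars():
    # A plain unit test: no warehouse, no fixtures, runs in milliseconds.
    assert cents_to_dollars(250) == 2.50
    assert cents_to_dollars(0) == 0.0
```

The SQL version of the same division can only be validated by materializing a table and checking the rows that come out.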
B: dbt is limited to cron schedules, which results in teams creating large buffers between successive tasks just to create the illusion of dependency. If you've ever scheduled an integration at 1 AM, then run your dbt models at 4 AM, and then your operational tasks, like notebooks and reverse ETL, at 6 AM, you know exactly what I mean: you just hope and pray that nothing takes longer than those windows. And while YAML itself is an excellent language for configuration, we've really started to overload it.
C: Hi, I'm Sandy, and I'm the lead engineer on the Dagster project. If you're not familiar with it, Dagster is an orchestrator. Its goal is to help data practitioners orchestrate the computations and data that make up their data pipelines: to schedule them, to run them in the right order, to give visibility into failures, and to help pick up where they left off.
C: The alignment between the way that Dagster thinks about data pipelines and the way that dbt thinks about data pipelines means that Dagster can orchestrate dbt much more faithfully than other general-purpose orchestrators like Airflow, and at the same time, Dagster is able to compensate for dbt's biggest limitations. dbt is rarely used in a vacuum: the data transformed using dbt needs to come from somewhere and go somewhere, and a data platform needs more than just dbt.
C: Like dbt, Dagster puts data assets at the center: Dagster pipelines are graphs of connected data assets, so data lineage comes automatically, because the references between your tables are part of how you define your data pipeline. This means that Dagster can understand a dbt project at a really deep level.
C: Unlike other orchestrators, Dagster doesn't need to run each dbt model in a separate task, which incurs a lot of overhead. But dbt models aren't the only kind of data asset that Dagster works with: a Dagster asset could be a table ingested using a tool like Fivetran, Stitch, or Airbyte; it could be a machine learning model, a dataset of images, or a file; and you can compute Dagster assets using any Python code running on any platform.
C: Orchestration has been Dagster's primary job since its inception, and over that time it's grown to handle a very long tail of orchestration needs. For example, to determine when to run your dbt models, you often need to rely on logic that's specific to your use case: you might have a particular way to check whether new source data has arrived, or need to incorporate a specific business calendar into your scheduling. In Dagster, you can write arbitrary Python code that triggers runs of your dbt models.
C: Last of all, Dagster is open source. We offer a cloud product where we deploy Dagster for you and offer extra features for teams, but fundamentally you're not locked in. For example, if we change our pricing in a way that you believe is unfair, you always have the option to take your Dagster pipelines and deploy them on your own using the open source project. With that, I'm going to pass it off to Rex, who will demo how this all works in much more depth.
D: We're going to start off with a jaffle shop dbt project and supercharge it piece by piece with Dagster. We'll first scaffold the project to allow Dagster to load your entire dbt project as software-defined assets. We provide a built-in utility to do this in our integration library; here we're creating a new project to contain the scaffolded code.
D: We can see the individual assets in our current project and how they relate to one another. Currently this just contains all the assets modeled in our jaffle shop project, and we can search for models and filter accordingly. So, for example, we can filter for any model that matches the word "customers", and we can also do a case-sensitive match for the actual customers model and show all models upstream of it.
D: Now, to ensure that our data assets are up to date, we want to run them on a schedule. In our existing scaffold, we're given a schedule that only materializes our dbt assets, but we just added two new upstream and downstream assets. So let's incorporate all of our assets into the same schedule: here we'll create a new schedule that selects all of our data assets and materializes them on a daily cadence.
D: To recap: one, you can define your pipelines in Python instead of just Jinja and SQL. Two, to integrate your data platform, you can understand how all of your tools and data assets relate to one another in one central control plane, and you can configure alerting and monitoring that works for you and your team. And three, you can materialize your selected data assets together at desired times with Dagster. Knowing how and when your data will update will never be a guessing game.
E: Thanks, Rex. My name is Ben, and I'm an engineer working on Dagster. Dagster provides powerful orchestration capabilities that let you schedule your dbt models alongside other data assets, but that's just scratching the surface of what it can do. Dagster is built to provide fine-grained control and deep insight into your data platform, so today I'll showcase a few upcoming features that will help you supercharge your use of dbt.
E: We've already seen how you can set up Dagster to regularly update your dbt models. But what happens when you make changes to your models? Ideally, the table corresponding to the model would be re-materialized so that it matches our SQL change. Our downstream data, the tables, models, and reports that are built off of our model, might be out of date, and ordinarily it can be a nightmare to track down and update everywhere our models are used.
E: Let's take a look at how Dagster's auto-materialization functionality can help. Let's say we notice something is wrong in our payments model: it looks like we're incorrectly converting prices in cents to dollars. Let's go ahead and fix this bug and save our change. Because we've been computing this field incorrectly, our downstream models have also been flawed. Thankfully, since Dagster is aware of all of our data dependencies, it can automatically mark our assets as out of date. Let's go ahead and reload our code in Dagster; Dagster will automatically recognize that our SQL has changed. For those of you familiar with dbt's state:modified selection, this works in a similar way.
E: One difference is that Dagster will mark as out of date all affected assets in your data pipeline, not just those written in dbt. Here we can see that an asset which uses Python to generate a report shows as out of date, too. If we wanted, we could now manually re-materialize our assets to bring them back up to date.
E: Coming soon to Dagster is the ability, if you so choose, to instruct Dagster to automatically rebuild models and their downstream assets when their SQL has changed. Instead of manually re-materializing our assets, we can see that Dagster's auto-materialization functionality has automatically kicked off a run to rebuild just the changed model and its downstream assets. We can see that our dbt models which depend on the changed model, as well as the Python report asset, are being updated.
E: Let's say I wanted to add a new dbt model to collect all of our pending orders in one place. I've already tested it against mock data locally, but I want to make sure that it's going to work properly on real production data, so I've gone ahead and created a pull request. Since I've set up branch deployments on my GitHub repository, Dagster Cloud has created an ephemeral, sandboxed environment where I can test my code. Opening up my branch deployment, I'm greeted with an asset graph which now shows my new dbt model.
E: This is an ephemeral environment which coexists alongside my production deployment. It's running against a copy-on-write clone of my Snowflake schema, so I can kick off the materialization of my model, which reads from real production data, without having to worry about anything being written to production.
E: Branch deployments are just one way to get insight into what's going on in your data platform. For many data teams dealing with scale, tracking down resource usage and cost is a priority. When you have a lot of stakeholders who are building and scheduling computations, it's not always easy to get a handle on where to focus your effort to save money. For these users, we're building a feature for Dagster Cloud that helps track down sources of wasteful compute and spend out of the box.
E: Let's say that I'm a data platform engineer supporting a team of other stakeholders who have built jobs using dbt, Snowflake, and other tools. A recent budget review meeting showed that our Snowflake bill has been spiking recently. Let's go to the new reporting tab in Dagster to get a handle on what's going on.
E: There are a couple of different ways that we can investigate. Dagster provides a way to filter jobs, assets, and asset groups by costs, like Snowflake credits, or by performance metrics, like compute time or number of retries. Since we're looking at a Snowflake billing change, let's go to the Snowflake credits tab. It looks like one of our asset groups has had a major increase in consumption over the past couple of days, so let's drill down into that asset group.
E: First, let's see if any of our assets are being re-materialized more often. It seems like the re-materialization count has stayed the same, though, so let's look at duration instead. It looks like one of our assets, orders, has increased in runtime pretty significantly over the past couple of days.
E
We
can
click
over
to
the
asset
index,
there's
asset
catalog
and
jump
to
the
assets.
Definition
in
our
GitHub
repo,
let's
tab
over
to
the
get
blame
and
see
if
anything
has
changed,
it
looks
like
someone
has
recently
changed
the
DVT
model
to
add
a
pretty
expensive
join.
It
seems
like
it
might
be
our
culprit.