Description
Chris and Radovan discuss next steps for decoupling the dbt DAG in Airflow.
B
Yeah, great. Regarding the data engineering internship, we just started on the dbt and Airflow infrastructure things here. So at the beginning, as I said, dbt is one DAG from this list, and it's executed in exactly the same way as any other DAG here. Our Airflow is deployed on a cluster, and it has three components: the database, the web server, and the scheduler.
B
Yeah, I will do that, thanks for the reminder. I stopped sharing and started the recording, and yeah.
Great, so once more: dbt running on Airflow. For now it's just one monolithic structure with a lot of tasks inside. Actually, just one step back about Airflow: it has three components, as I said, the web server, the scheduler, and the database. What is most important for us is the scheduler, the component behind the web server that manages how things are orchestrated, and all data is persisted in the database layer.
B
Long story short, any of these DAGs is set up to be spun up in the same way, and dbt is no different. In that case, as I said, when we call the task, it spins up a Kubernetes pod in our cluster and everything is done inside that pod, I mean everything is executed there. That is a little bit more difficult to maintain and manage, but the great advantage is that you're not limited to the machine where Airflow is deployed. So you detach orchestration from execution: in this case you have Airflow, and you say, okay, this will be done.
B
I want to execute dbt, and then a pod spins up and everything is done on that pod. It can be scaled, practically speaking, infinitely, so we have no problem with any issue when it comes to memory or things like that. And also, if you take a look into the code here, you will see the command we run when we run dbt, let's say for non-product models and a smaller warehouse.
B
This will run one pod on Kubernetes, this will run the second one, and also, let's say, dbt test will run the third one, etc., but long story short: we actually run the dbt command as a script, it spins up on the cluster, and on that cluster everything is executed and the data is transformed. We also define which environment, which profile, what we exclude, what we want to include, and some additional flags as per the dbt documentation, and that's it.
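Not from the recording, but as a rough illustration of what that looks like: such a dbt invocation could be assembled like this before being handed to the pod. The profile, target, and tag names below are hypothetical placeholders, not the project's real ones; `--target`, `--profile`, `--select`, and `--exclude` are standard dbt CLI flags.

```python
def build_dbt_command(action, target, profile, select=None, exclude=None):
    """Assemble a dbt CLI invocation as an argument list, the form a
    Kubernetes pod would receive as its command. Placeholder names only."""
    cmd = ["dbt", action, "--target", target, "--profile", profile]
    if select:
        cmd += ["--select", *select]
    if exclude:
        cmd += ["--exclude", *exclude]
    return cmd

# Hypothetical example: run only Salesforce-tagged models on a dev target.
command = build_dbt_command("run", target="dev", profile="analytics",
                            select=["tag:salesforce"],
                            exclude=["tag:product"])
print(" ".join(command))
# → dbt run --target dev --profile analytics --select tag:salesforce --exclude tag:product
```

Keeping the invocation as a plain argument list is convenient because the same structure can be reused per environment by swapping only the target and profile.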
B
So actually you can do the same: you can create a mimic of the production environment on the local cluster you mentioned, which we talked about last time. You're also able to run dbt locally, and of course it will target not production but your private schema, like the one Chris prepared. Okay.
B
I don't know, did you find any obstacle or any showstopper or blocker for us, or are we good to go with that, let's say, set of tables? Because I think it's imported or extracted using Fivetran, if I'm not wrong, and it lands in the raw layer. So from that raw layer we need to decompose the dbt jobs.
B
So, from my point of view, if you can go with Salesforce, what we need to do is a couple of things: analyze the models, tag all of them, see how the dependencies go, and then create a command. We can run it first locally and then wrap it up inside a new DAG in Airflow, right? Yeah, that's how I see this.
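The dependency analysis described above can be sketched as a small graph walk; the model names below are hypothetical stand-ins for the real Salesforce models, and in practice dbt's own graph selector (`dbt ls --select +model`) performs this resolution for you.

```python
def upstream_closure(model, parents):
    """Collect a model plus all of its upstream dependencies,
    mimicking what dbt's '+model' graph selector resolves to."""
    seen = set()
    stack = [model]
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            stack.extend(parents.get(m, []))
    return seen

# Hypothetical slice of the DAG: a fact model fed by one staging model.
parents = {
    "fct_opportunities": ["stg_salesforce__opportunities"],
    "stg_salesforce__opportunities": [],
}
print(sorted(upstream_closure("fct_opportunities", parents)))
# → ['fct_opportunities', 'stg_salesforce__opportunities']
```

Once every model in such a closure carries a shared tag, the whole subset can be selected with a single `tag:` selector instead of listing models by hand.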
A
Let's see, so where is the Stitch integration with Salesforce, because...
B
Everything is nicely described here for each data source we have, but we will stick with Salesforce, and it's Stitch. We will also have one lesson to cover how to determine, define, or predefine something in Stitch versus Fivetran. It can also be a good candidate to take a brief look in Stitch at what we have regarding Salesforce: the data is landing in the raw schema, and you can also see what the ETA for this data is.
B
No, actually Stitch is a SaaS platform where you define all of the things, source and target, and everything runs on their resources. So it's the typical SaaS story: okay, I want to get the data from Salesforce, these are my credentials, and this is the set of tables you can actually import into your data warehouse. Then you say: okay, I want this table, I don't want this table. I have a target, like Snowflake in our case, and I want to put everything into the database.
A
So, actually, is there a way of triggering the dbt DAG from Stitch? So once the Stitch process has run, trigger it, use that as an input for the dbt stuff, yeah.
B
Actually, I think dbt is blind in that case; you can't tell dbt. Usually, you know, if you go to Stitch and trigger your data import, you also approximately know how much time is needed to finish the extraction. So, based on that, you can, let's say, schedule your dbt run, yeah. Or you can be smarter and use some sensors or something like that to check: okay, is this done or not, and then, based on that parameter, run dbt.
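The sensor idea can be sketched as a simple poll loop. Here `check` is a hypothetical callable, for example one querying Stitch's API or the latest loaded-at timestamp in the raw table; in Airflow this role is normally played by a sensor operator with `poke_interval` and `timeout` settings.

```python
import time

def wait_for_extraction(check, timeout=3600, poke_interval=60):
    """Poll a completion check until it returns True, so the downstream
    dbt run only starts once the load is confirmed; give up after
    `timeout` seconds. `check` is a hypothetical callable."""
    waited = 0
    while waited < timeout:
        if check():
            return True
        time.sleep(poke_interval)
        waited += poke_interval
    raise TimeoutError("extraction did not finish in time")
```

Compared with a fixed start time, this makes dbt start exactly when the extraction is done rather than relying on an estimate of how long the import usually takes.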
B
This will be a point of fine-tuning for us, because as of now, from what I know, we usually estimate the time when we want to start something, and dbt is just one more structure for now. So in that case it runs after: okay, the Postgres pipeline is done, this job is done, the Fivetran integration is done, and then dbt will run. So you can consider Stitch and Fivetran the same type of tool: SaaS pipelines for the extraction part of ETL processing.
B
To ensure everything is working fine locally, we dbt run those commands. So once we have the command, what we actually want to run, we then incorporate the command into a new Airflow DAG. But we need to be sure everything is cool here first, and then move on to the DAG creation, which will give more context about the data engineering stuff here, yeah, so yep.
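Wrapping the verified command into the new DAG, in a KubernetesPodOperator-style setup, roughly means building task arguments like these; the image name and namespace are hypothetical placeholders, and the real operator would be instantiated inside the Airflow DAG file.

```python
def dbt_pod_task(task_id, dbt_args, image="dbt-runner:latest",
                 namespace="data-eng"):
    """Build the keyword arguments one might pass to Airflow's
    KubernetesPodOperator so each dbt command runs in its own pod,
    keeping orchestration separate from execution. Image and
    namespace are illustrative placeholders."""
    return {
        "task_id": task_id,
        "namespace": namespace,
        "image": image,
        "cmds": ["dbt"],
        "arguments": dbt_args,
        "get_logs": True,
    }

# Hypothetical task: one pod that runs only Salesforce-tagged models.
spec = dbt_pod_task("dbt_run_salesforce",
                    ["run", "--select", "tag:salesforce"])
```

One pod per dbt command (run, test, and so on) is what keeps the memory and scaling concerns off the Airflow machine itself, as discussed earlier.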
B
We could exclude it physically, but at this level I would say that for the dbt sharding it's enough to define, okay, what we actually want to load, to see whether we can somehow improve something, whether it's okay or not. But the main point and main focus is to extract it from that big bunch in dbt and have something like dbt-salesforce, right? And then at that level we will schedule everything: when we want to run, how we want to run, what the command is. We'll actually have the command as the outcome of this issue, and that's it.
B
Then we can run it locally on your own environment or whatever, and ensure we include everything needed for that run. Yeah, and after that, once you do a test, we can move to the DAG creation, which will be a separate, let's say, working session for us: real data engineering stuff, mainly Python coding plus Airflow. So that's just my assumption; anything you think you should add here, please add, feel free to rephrase or add more details, whatever you need. But we can start from here if you're okay, yeah.
A
I will do that. I think Dennis mentioned that there was some requirement: some people were looking for the Salesforce data to be delivered into a certain table more frequently than once a day. So, great, I think that would be it: find out which table they were looking for and whether it's... yeah, and then maybe take it that far, because we wouldn't need to run everything that relies on Salesforce data more frequently than once a day.
A
It was only a sort of subset of it, something to do with opportunities, maybe. Yeah, definitely, I agree: load the Salesforce data, sorry, transform it.
B
So we need to align with that part. And also, once you write down everything regarding the team epic, we can think: okay, we can make the Salesforce loading more frequent than it is now, because, as I said, you can schedule it in Stitch very, very simply. So we'll cover and touch on that part as well; we'll start from that. But still, I would say, what we can do is done, right.