From YouTube: dbt Integration Improvements
Description
Shirshanka Das and Gabe Lyons (Acryl Data) share recent improvements to the dbt + DataHub integration.
Learn more about DataHub: https://datahubproject.io
Join us on Slack: http://slack.datahubproject.io
Follow us on Twitter: https://twitter.com/datahubproject
Gabe: So Shirshanka and I are very excited to talk about some improvements that we've made to the dbt integration with DataHub. Just to set the context: dbt, as many of you folks know, is the tool that allows you to write transformations on your data inside of a data warehouse. However, dbt is not the one storing the data itself; you'll be running dbt on something like Snowflake, BigQuery, Postgres, et cetera, where the data is stored.
Gabe: Now, the downside of doing both of these ingestions historically has been that we've created independent dataset entities for the dbt nodes and the warehouse nodes. This comes up in a couple of places. In search, you would see duplicate results: the data warehouse table and the dbt table both coming up. You'd also see a more expanded lineage graph, with lineage edges between the data warehouse nodes and the dbt nodes themselves. And finally, you'd see independent entity pages: the dbt nodes and warehouse nodes would each have their own entity page, with some of the metadata on the dbt page and some of the metadata on the data warehouse page.
Gabe: We listened to your feedback and decided to make an improvement that really addresses these concerns. This is our solution: you can see here the dbt node on the left and the warehouse node on the right, and we thought, if you view these as the same thing, let's just bring them together. And you can see that once we brought them together, they're just as happy as these two primates.
Gabe: So after this we have merged entity pages, merged search results, and merged lineage, and we really get that joy you saw in those two primates. Of course, it would not be a town hall without a live demo, so let's cut to that and I can show you what this looks like in action. Going to my local DataHub, I've loaded in the jaffle_shop dbt project, run on BigQuery, and I can show you how exploring this metadata looks in this merged world.
Gabe: So if I type "customers" into search, instead of seeing distinct results, I'll see combined results. I can see here the merged dbt and BigQuery nodes for customers, customer source, et cetera; instead of duplicate search results, I'm just seeing one for each. In addition, I want to call out that we still preserve the filters you had before: if you just want to search for dbt nodes, or just for BigQuery nodes, you'll still be able to discover these entities with either filter.
Gabe: In addition, going to the entity page, you can see that it shows metadata merged from both the dbt node and the BigQuery node, so you get the best of both worlds. Here in the schema, you can see schema descriptions that are coming from dbt and also usage information that's coming from BigQuery. Similarly, going through the various tabs, we're able to pull in merged metadata from both BigQuery and dbt: I'm getting view definitions and properties from dbt, but also things like queries and stats that are coming from BigQuery. Finally, jumping into lineage, you get to see the merged lineage UI in action: when I look at the lineage between various entities, I no longer see duplicate nodes.
Gabe: Instead, I see merged nodes. One thing I want to call out, as you can see here: this is an ephemeral dbt node, and since it doesn't actually have any backing in BigQuery, you're not going to see a BigQuery counterpart, simply because there is no equivalent BigQuery table. So the system is able to understand that some nodes are persisted to both dbt and BigQuery, while others only exist in the dbt world.
Gabe: I just wanted to briefly talk about how this works under the hood. We have these various pages, search, the entity view, and lineage, and they all present this concept of a merged entity to you. To make this happen, we created a new aspect on entities called siblings, which lets an entity declare that it has another sibling entity that exists inside of our metadata graph. It also lets us annotate which sibling is the primary, for breaking ties between metadata, and which is the secondary.
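To make that concrete, here is a minimal sketch of what writing that aspect could look like with the DataHub Python emitter. The URNs, dataset names, and server address are illustrative placeholders, and the actual dbt ingestion source wires this up for you, so treat this as a sketch of the mechanism rather than the integration's own code.

```python
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import SiblingsClass

# Hypothetical URNs for a dbt model and the BigQuery table it materializes.
dbt_urn = make_dataset_urn(platform="dbt", name="jaffle_shop.customers", env="PROD")
bq_urn = make_dataset_urn(
    platform="bigquery", name="my-project.jaffle_shop.customers", env="PROD"
)

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Each side declares the other as a sibling; the dbt node is marked
# primary, so its metadata wins when the two sides disagree.
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=dbt_urn,
        aspect=SiblingsClass(primary=True, siblings=[bq_urn]),
    )
)
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn=bq_urn,
        aspect=SiblingsClass(primary=False, siblings=[dbt_urn]),
    )
)
```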
Gabe: In terms of next steps, there are a few things we want to keep improving when it comes to merging the dbt and warehouse nodes. One is that we haven't completed the visual merging: there are a few places where you'll still see them as distinct objects, namely in autocomplete, and the browse cards are also still showing individual objects, so you'll still see the separate dbt nodes in browse.
Gabe: This is something we plan on adding shortly. And then finally, we also want to explore how we can use the siblings metadata pattern for other types of relationships. As I showed you on the previous slide, we have this concept of associating entities together and then presenting a combined view of them, and this isn't necessarily a dbt-specific concept; we see it being valuable in other areas as well.
Gabe: For example, you may have ingested multiple datasets or remote entities for something that you still view as a single concept. Think of sharded datasets, or datasets where one reference was brought in from one source and another reference from another source; although the references were slightly different, you may view them as the same thing.
Gabe: So at this point I'm going to hand over to Shirshanka, and he's going to share some other very cool updates on the dbt front.
Shirshanka: So now that we've seen all of the amazing stuff, there's a lot of love in the chat, Gabe, so check it out. And Mark, to your question about whether we can apply the same thing to Kafka and Hive and other kinds of sibling entities: let's chat. We definitely feel like this might be an interesting way to combine views together and things like that.
Shirshanka: So, moving on to dbt tests. I think most of you who are dbt power users are hopefully also familiar with dbt tests, but I'll give you a quick primer. You define tests right alongside your dbt model, but the way we were treating them was just like datasets: when you ingested your dbt catalog, the tests would show up in the catalog, except they would be subtypes of tests. So you'd search for "customer" and all these tests would show up.
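For a sense of where those test entities come from, here is a minimal sketch, assuming a compiled dbt project so the usual target/manifest.json exists, that lists the test nodes dbt records alongside the models:

```python
import json
from pathlib import Path

# dbt writes manifest.json into target/ when you compile or run the project.
manifest = json.loads(Path("target/manifest.json").read_text())

# Test nodes live alongside models in the manifest, tagged by resource_type;
# these are the nodes that were showing up in the catalog as datasets.
for unique_id, node in manifest["nodes"].items():
    if node["resource_type"] == "test":
        print(unique_id)
```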
Shirshanka: So next, this is how you run a dbt test: you just run dbt test, and it runs the tests and tells you what happened. The challenge with a lot of this, obviously, is that the output only stays there; it's very hard to know where all of it went, to keep a record of every single test you've run against your warehouse, and to do something actionable with it.
Shirshanka: So here's what we're doing now. The nice thing about dbt test is that once you run it, it actually generates results in the target directory, in the run_results.json file. So when we built the dbt tests integration, we basically asked: what's the simplest way for people to consume this?
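For reference, run_results.json is plain JSON, so a few lines are enough to see what it holds; a minimal sketch assuming a standard dbt project layout:

```python
import json
from pathlib import Path

# dbt test writes its outcomes to target/run_results.json.
results = json.loads(Path("target/run_results.json").read_text())

for result in results["results"]:
    # Each entry carries the test's unique_id and a status such as
    # "pass", "fail", or "error".
    print(f"{result['unique_id']}: {result['status']}")
```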
Shirshanka: You have the timeline view, so if you click into any of these, you should see when those evaluations were last run and whether they succeeded or failed. On some of them you'll see the view logic node, and if you click on it and scroll up a little bit, you'll be able to see the logic for the node as well. So, for example, in this case there was a SQL statement, and you can basically see the SQL that backs the assertion.
Shirshanka: So the best way to integrate with the DataHub ingestion system is to produce your catalog.json, your manifest.json, and the run_results.json that comes out of the dbt test command. You can either keep them on a local file system or drop them into S3, and then in the dbt recipe you can just point at those S3 artifacts; the recipe will pull them in and publish these results into DataHub. It's a great follow-on step to running your dbt model generation and your dbt tests.
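A minimal sketch of that recipe, written here as a Python dict run through DataHub's programmatic ingestion pipeline. The bucket, paths, and server are placeholders, and the exact config key for test results has varied across DataHub releases, so check the dbt source docs for your version.

```python
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "dbt",
            "config": {
                # Artifacts produced by dbt; local paths or s3:// URIs both work.
                "manifest_path": "s3://my-bucket/dbt/manifest.json",
                "catalog_path": "s3://my-bucket/dbt/catalog.json",
                "test_results_path": "s3://my-bucket/dbt/run_results.json",
                # The warehouse the dbt models actually run against.
                "target_platform": "bigquery",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()
```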
Shirshanka: So you can just have a three-step process now: dbt run, then dbt test, and then push metadata to DataHub from the artifacts you just generated, so you'll get them into DataHub pretty much live and instantaneously.
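That three-step flow, sketched as a small script; the recipe filename is hypothetical, and in practice you may well want to publish results even when tests fail.

```python
import subprocess

# 1. Build the models, 2. run the tests, 3. push the generated artifacts
# to DataHub with a recipe like the one sketched above.
subprocess.run(["dbt", "run"], check=True)
# check=False so a failing test still lets us publish its results.
subprocess.run(["dbt", "test"], check=False)
subprocess.run(["datahub", "ingest", "-c", "dbt_recipe.yml"], check=True)
```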