►
From YouTube: DataHub 201: Metadata Enrichment
Description
Maggie Hays (Acryl Data) provides an overview of all of the ways in which you can enrich your metadata within DataHub during the July 2022 Town Hall.
Learn more about DataHub: https://datahubproject.io
Join us on Slack: http://slack.datahubproject.io
Follow us on Twitter: https://twitter.com/datahubproject
A
All
right
so
we're
going
to
move
on
to
our
last
session
every
once
in
a
while,
we
will
do
kind
of
a
data
hub
101,
which
is
kind
of
more
of
like
the
basics
of
the
fundamentals
of
data
hub
today.
We're
actually
doing
a
data
hub
201,
so
a
little
bit
more
advanced
and
we're
going
to
focus
on
metadata
enrichment
and,
like
shanka
said
today,
is
very
heavy
on
it.
We're
talking
a
lot
about
ingestion,
we're
talking
a
lot
about
metadata
ingestion
or
enrichment.
Excuse
me!
So,
let's,
let's
dig
on
into
this.
A
So
let's
set
the
scene
really
quick!
You
and
your
team
have
successfully
ingested
a
metadata
into
data
hub.
We
have.
Your
stack
is
pretty
straightforward.
You
have
snowflake
for
your
warehouse,
so
we've
ingest,
we've
ingested,
schemas
tables
usage,
profiling.
Basically
everything
that
shoshanka
just
covered
from
dbt
we've
ingested
our
models
and
dbt
test
outcomes
and
looker
we've
ingested
explores,
looks
and
dashboards.
A
So
that's
great.
We
have
kind
the
foundation
there,
but
now
it's
actually
time
to
push
it
one
step
further
and
start
to
enrich
your
metadata.
So
if
we
take
a
look
at
our
at
just
a
sample
page,
so
this
is
our
pet
profiles.
Data
set.
What
do
we
mean
by
metadata
enrichment
like
what
are?
What
are
the
ways
that
you
can
do
it?
What
are
some
of
the
options
there?
A
The
next
piece
of
it
is
ownership
in
data
hub.
We
have
a
few
different
concepts
of
ownership,
so
it
could
be
anything
from
a
technical
steward
to
a
business
user
or
a
stakeholder,
but
really
just
helping
people
understand
who
to
go
to
when
you
have
questions
and
who's
responsible
for
those
entities.
A
The
next
part
of
it
is
tags,
so
these
are
more
kind
of
like
social,
social
tags
or
kind
of
ways
to
kind
of
organize
or
group.
Your
your
entities
together
in
a
less
governed
way,
so
a
little
bit
more
flexible.
A
There
and
again
you
can
add
tags
at
the
data
set
level
or
even
down
to
the
column
level
if
you'd
like
the
next
one
is
glossary
terms,
so
these
are
going
to
be
more
of
our
governed
business
terms
that
really
describe
how
the
how
the
data
entity
or
the
the
data
hub
entity
excuse
me
relates
back
to
a
business,
either
measure
or
definition,
etc.
Again,
those
can
be
defined
at
the
data
set
level
or
the
com
level
and
then,
last
but
not
least,
the
domain.
A
A
So,
first
and
foremost,
we
talk
about
this
idea
of
shift
left
a
lot
and
we're
going
to
go
into
detail,
but
so,
first
and
foremost
you
you
can
enrich
the
source
and
then
ingest.
You
can
transform
your
metadata
prior
to
ingestion.
A
You
can
do
a
bulk
import
using
our
csv
enrichment.
You
can
do
ongoing
enrichments
vr
api
or
you
can
just
do
manual
enrichment,
ad
hoc
as
needed
from
the
hub
ui.
So
let's
actually
dig
into
these
a
little
bit
when
we
talk
about
shift
left.
What
we're
saying
is
if
and
when
you
are
defining
your
metadata
or
enriching
that
metadata
at
its
source.
We
want
to
extract
that
as
much
as
possible,
so
for
this.
In
this
example,
in
this
you
know
very
simple
stack.
We
have
snowflake
dbt
and
looker
in
snowflake.
A
One
way
that
you
can
start
to
define
or
enrich
this
metadata
is
by
providing
column,
level
descriptions
or
table
level
descriptions
which
we
will
automatically
extract
for
dbt.
There's
this
idea
of
meta
annotations.
So
this
is
where
we
can
start
to
map
ownership.
We
can
start
to
map
tags
terms
domains
it's
super
flexible.
A
This
is
also
true
for
our
support
for
protobuf,
where
you
can
annotate
the
schema
there
and
do
some
mapping
that
way,
and
then
last
but
not
least
in
looker.
If
you
are
annotating
or
adding
descriptions
within
your
look,
ml
will
automatically
extract
that.
So
it
could
be
anything
from
that
field.
Level
description,
adding
tags
to
it,
etc.
A
So
the
idea
here
is
that
and
really
this
is
the
the
preferred
and
kind
of
like
most
bulletproof
way.
I'd
say
to
to
consistently
enrich
your
metadata,
where
you
really
want
to
do
it,
where
the
code
is
changing
and
evolving
as
the
as
the
ingestion
or
the
creation
or
transformation
of
data
evolves
over
time.
A
The
next
part
is
this
idea
of
a
transformer,
so
this
is
something
that's
supported
within
data
hub.
The
idea
is
that
when
you
are
connecting
to
a
source-
so
let's
say
snowflake,
maybe
there's
some
logical
grouping
of
ownership
by
schema,
or
maybe
there's
some
logical
or
maybe
there's
like
a
pattern
around
assigning
pii
to
columns
that
have
the
word
email
in
them.
The
idea
here
in
a
transformer
is
that,
when
you're
ingesting
from
a
singular
source,
you
apply
patterns
or
kind
of
rely
on
those
patterns
to
apply
either.
A
You
know
ownership
domain
domain
description.
You
know
kind
of
all.
The
things
we
had
talked
about
and
the
benefit
here
is
that
this
is
executed
every
time
your
ingestion
runs.
So
as
there's,
let's
say
a
new
table
added
to
your
schema.
If
it
has
that
pattern
and
it
matches
it,
it's
going
to
automatically
apply
your
tag
or
your
glossary
term,
your
owner,
etc.
A
So
it
really
just
keeps
up
with
the
evolution
of
the
data
at
source,
but-
and
this
is
a
great-
this
is
a
great
use
case
if
you're,
maybe
not
actually
managing
those
descriptions
or
owners
or
annotations
within
the
source
itself.
So
it's
just
kind
of
that
intermediate
step
before
you
actually
ingest
it
into
datahub.
A
We've
talked
about
csg
csv
import
a
couple
of
times
during
town
hall.
This
is
really
great
for,
after
the
fact
enrichment,
so
after
you've
actually
ingested
your
data
into
data
hub.
Maybe
you
actually
have
a
you
know
a
google
sheet,
doc,
that's
floating
around
and
people
are
kind
of
maintaining
some
idea
of
like
ownership
or
descriptions
there.
This
makes
it
really
easy
to
kind
of
crowd
source
that
information
and
then
bulk
ingest.
It
back
into
data
hub
and
what's
also
nice
here-
is
that
it
can
go
across
sources,
so
you
can
do
it.
A
You
can
have.
You
know,
do
a
single
bulk
update
for
snowflake
and
dbt
and
look
are
all
at
the
same
time
and
it
isn't
specif,
it's
not
specific
to
like
you,
don't
have
to
be
super
specific
about
pattern
matching
or
you
know
kind
of
source
by
source.
So
you
know
this
is
an
example
here
of
us
enriching
resources
across
looker
across
snowflake
and
dbt
we're
applying
glossary
terms,
we're
applying
one
or
many
tags.
You
can
also
do
the
same
thing
apply
one
or
many
terms
same
thing
around
owner.
A
You
can
do
one
or
many
and
you
can
provide
the
description.
So
all
of
this
is
additive
as
you
ingest
it.
Maybe
as
you're
collecting
this
information,
you
can
ingest
and
add
to
your
your
kind
of
metadata
model
there
and,
like
I
said
this
is
just
really
great
for
kind
of
like
bootstrapping,
your
initial
ingest
of
of
entities
into
data
hub
and
then
crowdsourcing
from
folks
within
your
within
your
organization.
A
The
next
one
is
api
enrichment,
so
this
is
going
to
be
more
kind
of
like
ongoing
or
programmatic
enrichment.
This
is
really
great
when
folks
are
starting
to
manage
definitions
as
code.
So
as
that
code
evolves,
you
can
actually
emit
or,
as
those
definitions
evolve,
you
can
emit
those
changes
back
into
data
hub.
Maybe
you
have
some
very
predictable
kind
of
deployment
processes
and
you
know
going
from
maybe
testing
and
staging
them
into
production.
A
A
This
is
also
relevant
to
our
kind
of
like
circuit
breaker,
similar
to
our
circuit
breaking.
A
So
you
know
when
you're,
when
your
dags
evolve
as
those
relationships
evolve,
you
can
emit
lineage
edges
and
you
know
really
just
manage
these
things
as
the
operational
tasks
evolve
and
the
pipelines
that
support
them
over
time
and
then
last
but
not
least,
you
can
always
just
go
into
the
ui,
and
you
know
manage
these
definitions
or
manage
these
this
this
enrichment
manually,
and
so
this
is
where
you
know
we
provide
the
ability
to
edit
table
level
description,
calm
level,
description
from
the
ui.
A
You
can
also
add
links.
So
if
there
may
be
some
external
resources
that
are
helpful,
you
can
point
out
to
those
you
can
add
your
owners.
You
can
add
your
tags,
you
can
add
your
terms
and
add
your
domain
and
all
of
that
fun
stuff.
Of
course,
this
is
going
to
get
a
lot
easier
once
we
have
the
bulk
editing
rolled
out
that
john
demo'd,
but
this
is
just
a
great
way
to
you
know
kind
of
like
last,
but
do
a
little
bit
more
ad
hocker
as
needed
enrichment
as
you
go.
A
So
with
that
in
mind,
there's
no
kind
of
one
right
way
or
one
perfect
way
to
enrich
your
metadata
within
within
data
hub.
So
we
encourage
you
to
you,
know
kind
of
mix
and
match
those
different
ways,
depending
on
your
stack,
depending
on
your
your
workflows
and
then
really
just
find
what
works
for
you.
So
you
can
keep
it
as
up-to-date
and
accurate
as
things
go
along
now.
Let's
go
enrich
some
metadata
folks.