From YouTube: DataHub Product Updates: Aug 27 2021 Community Town Hall
Description
Shirshanka Das (Acryl Data) gives an overview of recent product improvements during the DataHub Community Town Hall on August 27, 2021
A
The first product improvement that we would like to share is business glossary, phase one. I'll do a quick demo, but before that, just a quick intro to business glossary itself: it's really a way of representing a tree of concepts that are useful for attaching to existing datasets or fields. So, for example, at your company, you may decide to have a taxonomy that has Classification as a top-level node and, within that, terms like Confidential, Highly Confidential, or Sensitive that live within the Classification node.
A
Similarly, you can have another node called Clients and Accounts, and all client and account stuff lives under there, like an Account. An Account can also contain a Balance, which itself might be a term under the Clients and Accounts node. You can also have relationships across these taxonomies. For example, a balance or an account might be confidential or might have highly confidential data, and so you can have a relationship between the Account term and the Highly Confidential term.
A
So let me show you how you would describe something like this using the most favorite language of data engineers, YAML, and I'll get into a terminal. This is a PR I opened earlier today. It shows you how you can create a business glossary using YAML and then ingest it as a standard source.
A
So first, let's look at the recipe. If everyone is familiar with the recipe, this is what it looks like: you have a source and a sink. The sink looks exactly the same: the destination is datahub-rest. The source in this case is of type datahub-business-glossary. This is a new source that I've created, and it takes a config file, which is the business glossary itself.
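A recipe of the shape described here might look like the following sketch. The source and sink type names are the ones mentioned in the talk; the glossary file path and the server address are placeholder assumptions, not values shown in the demo.

```yaml
# Ingestion recipe sketch: read a business glossary file and push it to DataHub.
source:
  type: datahub-business-glossary
  config:
    # Assumed path to the glossary YAML file; adjust to your own checkout.
    file: ./business_glossary.yml

sink:
  type: datahub-rest
  config:
    # Assumed address of a locally running DataHub metadata service.
    server: http://localhost:8080
```

With the DataHub CLI installed, a recipe like this is typically run with `datahub ingest -c <recipe file>`.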
A
All right. Firstly, we have a version identifier and then a source, which kind of describes who is even specifying this glossary. In my case I decided that DataHub is the author of the glossary; in your company's case, you might have your company's name here. Then come default owners for all things in the glossary. These could be users; they could also be groups.
A
You can have a URL; for example, this could point to the GitHub location where this business glossary file is stored. Below that you have nodes, and contained within nodes you have terms. So let's look at the nodes I created. I created a node called Classification, just like I showed you in the slide deck before. It's got a description, and it has terms within it with names like Sensitive, Confidential, and Highly Confidential, with some descriptions attached to them.
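Putting the pieces described so far together, the top of such a glossary file might look roughly like this. It is a sketch rather than the file from the demo: the owner name, the URL, and the description texts are made-up placeholders, and the exact spelling of the term names is an assumption.

```yaml
version: 1
# Who is specifying this glossary; use your company's name here.
source: DataHub
# Default owners for everything in the glossary (users and/or groups).
owners:
  users:
    - jdoe            # hypothetical user name
# Where this glossary file lives (placeholder URL).
url: "https://github.com/example-org/example-repo/"
nodes:
  - name: Classification
    description: A set of terms related to data classification
    terms:
      - name: Sensitive
        description: Sensitive data
      - name: Confidential
        description: Confidential data
      - name: HighlyConfidential
        description: Highly confidential data
```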
A
The owner in this case is a group, and as we go further down you see Gender is inheriting the Sensitive classification. Further down, we have another node called Clients and Accounts. I copied this actually from FIBO, and that's why, when I define the term, I say that the term source is external and the source ref is FIBO.
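The ownership, inheritance, and external-source details mentioned here could be expressed along these lines. This is an illustrative fragment, not the file from the demo: the `PersonalInformation` node name is inferred from the search shown later in the talk, the group name and descriptions are hypothetical, the `Classification` terms are assumed to be defined elsewhere in the same glossary, and placing the group owner on `Email` is a guess, since the transcript does not say which term it belongs to.

```yaml
nodes:
  - name: PersonalInformation
    description: Terms describing personal information
    terms:
      - name: Email
        description: An individual's email address
        # Owner override for this term: a group instead of a user.
        owners:
          groups:
            - privacy-team        # hypothetical group name
      - name: Gender
        description: The gender of an individual
        # Cross-node relationship: Gender is a kind of Sensitive data.
        inherits:
          - Classification.Sensitive
  - name: ClientsAndAccounts
    description: Terms adapted from FIBO
    terms:
      - name: Account
        description: A client account
        # Term defined by an external standard rather than in-house.
        term_source: "EXTERNAL"
        source_ref: FIBO
        inherits:
          - Classification.HighlyConfidential
        contains:
          - ClientsAndAccounts.Balance
      - name: Balance
        description: The amount held in an account
        term_source: "EXTERNAL"
        source_ref: FIBO
```

Relationships refer to other terms by a dotted `Node.Term` path, which is what lets a term in one node point at a term in another.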
A
The Balance term is another term contained in FIBO, so it also comes from an external source. So this is what a YAML file looks like, and you could either have all of this glossary in a single file or you could split it up across multiple files, as long as you preserve the overall structure. Then you can imagine checking this into source control and just managing it.
A
Hold a deep breath while ingestion runs. There you go: we ingested 11 terms. Now let's go into the business glossary and see if we see them. There you go, we have our business glossary terms. The top-level nodes are in here. We can go into them and see the terms within them. We see that ownership has been ingested as well, and we can go into each one of these things, like, you know, Email. It's got a source; you can go view the source.
A
Similarly, we can go into accounts and go to the Account term, and when we go to related terms, we see that it contains a Balance, just like we had described, as well as inherits the Highly Confidential term. So that, in a nutshell, is how you can produce and load an entire business glossary into DataHub.
A
So what good is a business glossary if you don't attach it to datasets and fields? Let's look at a particular dataset that we might want to attach a business glossary term to. Here is a user account dataset. It seems to have some tags like Operational, and it seems to have a field called email, but we don't see any terms attached to them. It would be nice to know that the user account is actually of type Account and the email in here is actually of type Email.
A
The demo works: we see tags and terms attached. The Account term has been attached to the user account dataset, and the Email term has been attached to the email field. If I go into the Email term now, I'll see that there are related entities, like datasets, that are already attached to it, and if I go to the Account term, I'll see that the same dataset is also attached to that term.
A
You can also search for these things. I can just type "email" and I'll get a helpful drop-down where autocomplete shows me that I can search for PersonalInformation.Email, and if I search for it, I not only get the glossary term, but I also get the datasets that have that term attached to them. So that, in a nutshell, is the business glossary demo.
A
In addition to that, we also went in and improved our representation of highly structured, nested schemas. So now you can finally ingest DataHub's own Kafka topics and actually explore the metadata schema itself. We've done a lot of work in representing structs as well as unions well, so you should be able to actually browse the schema pretty nicely and understand what this very complicated schema looks like.