Description
David Araujo, Data Governance Product Manager at Confluent, shares the importance of data governance within streaming data platforms during Metadata Day 2022.
Learn more about DataHub: https://datahubproject.io
Join us on Slack: http://slack.datahubproject.io
Follow us on Twitter: https://twitter.com/datahubproject
Hello everyone, it's great to be here today to talk about metadata within the context of real-time data or, as we prefer to call it here at Confluent, data in motion. My name is David, and I basically spend my days thinking about metadata and about tools to manage metadata in the Kafka ecosystem.

The company I work for, Confluent, builds a data streaming platform that is changing the way businesses around the world operate. We help customers identify, realize, and unlock more value by leveraging real-time data streams across the entire enterprise. However, there's a catch. To get to this point, meaning to safely expand the use of real-time data streams across the business, we believe, our customers believe, and I think we all believe that governance is required.

So why do we think governance is important specifically for real-time data? This came as something of a realization. Over the years, we've been fortunate to work with customers across the globe and across different industries on their journey to become an event-centric enterprise, where Kafka is at the heart of all the data movement across the business. Kafka is interesting because it typically has this viral effect.

This really means that data governance, or, as I personally prefer to call it, data enablement, will at some point become a main concern and challenge as you move along this event streaming maturity journey towards a central nervous system of real-time data. So it was clear that growth, scale, and governance go hand in hand.

In our case in particular, the more streams of data, the more Kafka topics, the more Kafka clusters, developers, lines of business, and data products (in the new data mesh world) you have, the more controls, visibility, and self-service data discovery and access you will need. And again, we believe this growing complexity equates to the need for data governance and data enablement.

Plus, and I think we haven't talked much about this here today, data regulations across the world are putting more pressure on how data is managed. That applies not just to data at rest, meaning data in databases and other systems, but increasingly to data in motion, these real-time streams of data, as businesses move towards the real-time paradigm as a way to stay competitive.

We here in the Kafka community, and I think all of us here in the metadata community, think that metadata is a key component in addressing some of these challenges. Lucky for us, in the Kafka ecosystem we have a lot of metadata. If you start looking under the surface of all this data in motion, you see an ecosystem of metadata, a whole world of metadata, which in our case we decided to leverage to build a complete, open, and controlled data in motion platform.

In our case, we have topic metadata: the topics are where we store all these real-time streams of data. We have metadata for the connectors that bring data into Kafka and push it out of Kafka. We have ksqlDB metadata; ksqlDB is the stream processing database on top of Kafka. We have Kafka cluster metadata.

We have user-defined metadata, like business concepts, glossaries, etc. We have environment metadata: production environments, test environments, staging environments. We have relationship metadata, the lineage aspect, which is very important: how these different objects connect and relate to each other, and being able to surface that. We have quality metadata.
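
Relationship (lineage) metadata is naturally a graph, and "surfacing" it usually means walking that graph. As a minimal sketch, assuming a simple in-memory adjacency list with hypothetical entity names (these are illustrative, not Confluent's actual identifiers or data model), finding everything downstream of a topic is a breadth-first traversal:

```python
from collections import deque

# Hypothetical lineage edges: each entity maps to the entities it feeds.
LINEAGE = {
    "connector:postgres-source": ["topic:orders"],
    "topic:orders": ["ksqldb:orders_enriched", "connector:s3-sink"],
    "ksqldb:orders_enriched": ["topic:orders_enriched"],
    "topic:orders_enriched": [],
    "connector:s3-sink": [],
}


def downstream(entity: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every entity reachable from `entity`, in breadth-first order."""
    seen = {entity}
    order = []
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in seen:  # guard against cycles and duplicates
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream("topic:orders", LINEAGE))
# → ['ksqldb:orders_enriched', 'connector:s3-sink', 'topic:orders_enriched']
```

The same traversal run on reversed edges answers the upstream question ("where did this data come from?"), which is the other half of what lineage is typically used for.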

In our case, we are able to monitor and observe the streams of data, and so provide quality metrics and metadata about what we see, and more. With that, we decided here at Confluent to commit to making metadata a first-class citizen on the platform, and a key part of our governance strategy is what we call the Stream Catalog. This is a metadata repository that makes sense of data in motion and the Kafka ecosystem around it.

The Stream Catalog is built on top of an entity type system that defines the different types of objects created on Confluent and the Kafka platform: the schemas, the fields in the schemas, the topics, the connectors; basically all the metadata types I explained in the previous slide. Technical metadata from these entities flows into the catalog in near real time.
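
To make the idea of an entity type system concrete, here is a minimal sketch of a typed, in-memory catalog. Every name in it (`EntityType`, `Entity`, `Catalog`, `upsert`, the qualified names, and the `PII` tag) is hypothetical and chosen for illustration; it is not the Stream Catalog's actual model or API. It only shows the general pattern: typed entities keyed by a qualified name, technical attributes updated as change events arrive, and user-defined tags layered on top.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityType(Enum):
    """Hypothetical entity types, mirroring the objects named in the talk."""
    SCHEMA = "schema"
    FIELD = "field"
    TOPIC = "topic"
    CONNECTOR = "connector"


@dataclass
class Entity:
    qualified_name: str
    entity_type: EntityType
    attributes: dict[str, str] = field(default_factory=dict)  # technical metadata
    tags: set[str] = field(default_factory=set)  # user-defined, business metadata


class Catalog:
    """In-memory stand-in for a metadata repository keyed by qualified name."""

    def __init__(self) -> None:
        self._entities: dict[str, Entity] = {}

    def upsert(self, entity: Entity) -> None:
        # Metadata arrives as change events; the latest version of an entity wins.
        self._entities[entity.qualified_name] = entity

    def by_type(self, entity_type: EntityType) -> list[Entity]:
        return [e for e in self._entities.values() if e.entity_type is entity_type]

    def tagged(self, tag: str) -> list[Entity]:
        return [e for e in self._entities.values() if tag in e.tags]


catalog = Catalog()
catalog.upsert(Entity("prod:orders", EntityType.TOPIC, {"partitions": "6"}))
catalog.upsert(Entity("orders-value:customer_email", EntityType.FIELD, tags={"PII"}))
print([e.qualified_name for e in catalog.tagged("PII")])
# → ['orders-value:customer_email']
```

Keying on a qualified name keeps ingestion idempotent: replaying the same change event simply overwrites the entity instead of duplicating it.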

We expose a set of RESTful APIs and a GraphQL API. The GraphQL API is something we invested in recently, because we found it extremely powerful for exploring and navigating all these data relationships. We also expose the catalog through the Confluent platform UI; it's embedded directly rather than being a separate thing. But Kafka is only one piece of the puzzle, and we know that.

We know there is a rich ecosystem and community around metadata, and our plan and strategy here is very simple: be open, share, and collaborate with other, more general-purpose metadata systems. This idea of metadata sharing, of having metadata flow through different systems, is key to our strategy for helping our customers with the growing challenges around governing data.
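
One way to picture metadata flowing between systems is as a stream of self-describing change events that any consumer can parse. The sketch below assumes a hypothetical, system-neutral event shape (`MetadataChange`, its field names, and the `stream-catalog` source label are all invented for illustration; the talk does not specify a format) and shows a lossless JSON round trip. Publishing such events to a Kafka topic would be one natural transport, but that is an assumption here, not something stated in the talk.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MetadataChange:
    """Hypothetical, system-neutral metadata change event."""
    source_system: str
    entity_type: str
    qualified_name: str
    aspects: dict[str, object] = field(default_factory=dict)


def to_wire(change: MetadataChange) -> str:
    # Sorted keys give a stable payload that downstream consumers can diff.
    return json.dumps(asdict(change), sort_keys=True)


def from_wire(payload: str) -> MetadataChange:
    # A consuming metadata system rebuilds the event from the JSON payload.
    return MetadataChange(**json.loads(payload))


event = MetadataChange(
    source_system="stream-catalog",
    entity_type="topic",
    qualified_name="prod:orders",
    aspects={"tags": ["PII"], "owner": "orders-team"},
)
payload = to_wire(event)
assert from_wire(payload) == event  # nothing is lost in transit
print(payload)
```

Keeping the event shape neutral, rather than tied to either system's internal model, is what lets several metadata tools consume the same feed.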

So, just to finalize. In summary, our mission is to provide our customers with a self-service data in motion platform where people are enabled to do more, and do better, with their company's real-time streams of data: a platform that welcomes growth and helps set the guardrails to control that growth, but in a way that doesn't lock the data down. This is very important to us as we think about modern data governance.

It's about having a balanced view between locking the data down and unlocking the data, so that governance really helps promote innovation and helps businesses set themselves apart by building modern, real-time products and services. So that's all for today; it was a very quick talk.