From YouTube: Mar 19, 2021: DataHub Community Meeting (Full)
Description
Full version of the DataHub Community Meeting on Mar 19th, 2021
Welcome - 00:00
Project Updates by Shirshanka - 01:49
- 0.7.0 Release
- Project Roadmap
Demo Time: Themes and Tags in the React App! by Gabe Lyons - 09:36
Use-Case: DataHub at Wolt by Fredrik and Matti-Pekka - 19:28
Poll Time: Observability Mocks! - 37:08
General Q&A from sign-up sheet, Slack, and participants - 51:12
Shirshanka: Can everyone see my screen all right? Great. I had an amazing talk this week at DataOps Unleashed, where I presented for 30 minutes, and at the end of the 30 minutes I found out that I was speaking to an audience that wasn't there. So I'm starting to learn how to ask for feedback before starting. All right.
Welcome, everyone, to the, I guess, the third community meeting of the year, and as you notice, things are... We are going to go through a pretty packed agenda. We have project updates, a demo... Oh, that's showing up. Okay.
Awesome, all right. So we're going to go over the project updates, go through a quick demo of themes and tags, a case study of DataHub at Wolt, and then a quick poll with the community on some observability mocks that we've been working on, and then Q&A based on questions that have come in and questions that come up during the session.
I think DataHub has always been an awesome project, but we never really put it together to explain to people what it's all about, so I'm really happy with how this turned out. We've also done some interesting work in creating a live demo environment, and Dexter did a lot of this work, so thanks for that, Dexter. We have essentially a job that picks up the latest bits from master, from the main branch on GitHub, and deploys it out to a demo environment. So you can go in there and check it out. Does anyone know if recording is on? Because otherwise I can just turn it on.
It's on? It's on, okay, cool. So we've got the demo environment going; check it out. It gets refreshed every morning, but we also have the ability to push a button and deploy it at any point. Since the last time we talked, I think, five...
Since the last time we've spoken, I think five-plus new POCs are being done at different companies, so this is great news. We published the roadmap: we changed from kind of a visionary, two-year roadmap to a very targeted six-month roadmap. Go take a look at it. It's again on datahubproject.io; scroll down to the bottom and you'll see a link for the roadmap. It includes pretty much everything that the community had asked for from the developers.
The big things are going to be the no-code metadata model changes. It's going to be a big, hard project, and we're going to spend a fair bit of time working with the community on getting it out, and there's a bunch of features that I won't get into in detail here. And big news: we have a new release. Finally! It's been a long time coming, about three months since we last put out a release. It's 0.7.0, and there are a ton of new commits in it, about 200.
I promise we won't go this long before creating a release the next time, and as with most major releases, you'll probably see a minor release coming out in the next week or so as we catch up on any small bugs or small feature edits that we missed.
I was looking at the number of contributors, and it was quite nice to see that we have 24 unique contributors to the project for this time period. That gives us a lot of happiness, because it means that the project is actually getting the kind of community of not just adopters and users, but also people contributing changes back. So this is great, and we want to see more of that. All right, so what was in the release?
The first and most important thing was, of course, the new React application. If you haven't been paying attention, we completely rewrote the application from the ground up using React. This will help us move to a more modern stack and allow the community to give us many more contributions.
Another big thing was the metadata ingestion framework. We were always amazing at getting data in through Kafka, but a lot of people were not quite sure how to get all this metadata into Kafka in the first place. So we built a metadata ingestion framework in Python. It's actually some of the most beautiful Python code I've seen, so please go check it out. I love what Harshal has done with it, and a lot of folks are giving contributions back, so this is amazing. We already have quite a few sources: Athena and Druid were contributed very recently, and I know that dbt is cooking, so I'm super excited about that.
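For a sense of how the ingestion framework is driven, a small recipe file declares a source and a sink. The exact config keys vary by connector and version, so treat the values below as an illustrative sketch rather than an exact configuration:

```yaml
# Illustrative recipe: pull metadata from Athena and push it to DataHub's REST endpoint.
source:
  type: athena
  config:
    aws_region: us-east-1   # illustrative values
    work_group: primary
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```

A recipe like this would typically be run with the framework's CLI, e.g. `datahub ingest -c recipe.yml`.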
On product features, the big enterprisey one was SSO. A lot of people have been asking for this. John put in a ton of effort and got the SSO integration done on the React app. He tested it out with OIDC, and unfortunately John is not here today, but we're going to do a deeper dive on this at a future date.
We had contributions from Expedia on the ML model ecosystem, so machine learning models: they're actually using it to store metadata about all of their models, and hopefully it'll become kind of a thing and we'll see DataHub becoming the AI metadata store of choice. We're also getting contributions on the data flow and data job ecosystem, and that's something the Wolt folks have been doing; we'll hear more from them later. And the big breaking change is that we have finally shed our Elasticsearch 5 ghost and we have moved to Elasticsearch 7. The LinkedIn team has actually been moving on that, and John has contributed one of the scripts that helped them migrate from Elasticsearch 5 to 7. So hopefully you don't need to do these big migrations, but it's better to get them done early, before we start getting into a lot more exciting features. So get on with it, and hopefully we'll see you all on 0.7 soon. Awesome.
I did put in a couple of slides on SSO. We're not going to get into detail, but we've checked it and tested it out with Google SSO and Okta, and there are documents on the hosted docs, so go read up on them, and we will do an SSO office hours next week.
Gabe: Thanks for sharing; it's really exciting hearing about everything that's included in this release, a lot of really, really cool stuff. So I'm going to dive into two of the things that are included in the release in the React app; they're available to use now, and they've been merged in.
So I'm going to give you a brief overview of how the features work and show you how you can get started using both tags and themes. Although I am also on the West Coast, I actually wake up around this time to go biking quite often, so it feels very natural to be awake right now.
Can everyone see this? Good. So both of these features, we know people have been requesting from the community for a long time, and I'm really excited to have finished them up. For tags, I want to give a special shout-out to Fredrik and Matti-Pekka from Wolt for helping clarify the spec and the needs for this, and for driving that RFC. It's really awesome collaborating with both of you.
Tags will have shared definitions, so that when you apply these labels to disparate entities, you can be sure that everyone knows they're talking about the same thing. They can be applied at the entity level, but for datasets you can also apply them at the schema field level. In addition, we index these tags in Elasticsearch, so that if you apply a tag to a dataset, you can then recover it by searching for the tag or by filtering on that tag. I'm going to give you a brief overview of how you can use that now.
So if we go over to my DataHub here, I'm at an airport traffic dataset, and you can see that I've already applied a tag to this entity. This tag could have been applied either by the ingestion pipeline or from the UI, and this is showing a direction that we want to... actually, let me see, there's some typing issue out here.
I don't know if we need that anyway. So this tag can be applied from the ingestion pipeline or from the UI, and this is showing the direction that we want to take DataHub moving forward, which is that DataHub is going to start becoming more of an interactive surface as well as a read-only surface. If you look at the MCE example JSON file in the metadata-ingestion directory, you'll see some example MCE events that are ingesting tags, but a much easier and more intuitive way is just applying them through the UI.
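For reference, the tag-bearing part of such an MCE boils down to a GlobalTags aspect on the entity's snapshot. A minimal sketch (the tag URN here is made up for illustration) looks like:

```json
{
  "com.linkedin.common.GlobalTags": {
    "tags": [
      { "tag": "urn:li:tag:Legacy" }
    ]
  }
}
```

Ingesting an MCE with this aspect produces the same result as adding the tag by hand in the UI.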
So you can see, here's this tag, "Legacy," that we've applied to the airport traffic dataset; we're saying, you know, this is a legacy dataset. But say I wanted to indicate that this dataset also needed an owner assigned to it. I could go into this add-tag flow and type in "owner" or "ownership," and this is going to search our repository of tags, and in the typeahead any tag related to ownership would actually come up right here.
Is Siri talking to me? Since no tag exists, we can go ahead and create one. So we can say, okay, this guy is going to need ownership. I seem to be having trouble hearing. We'll create this tag and give it a description, and when we create this tag, it's going to be generated as its own entity, so that if we wanted to reference this tag from other entities, we'd be able to say this element needs ownership.
Once I've created that, it appears on the dataset, and if I click on it, we get brought to the tag's page. We can see who created it, read that description again, and then see statistics on how many other entities this tag has been applied to. And when I click on this, it actually brings us to a search filtering for datasets that have this tag, so you can see that it's already been indexed: we can already start searching for it, and we can use tags to filter.
And, as I said, you can also apply tags at the schema level. So if we go into our schema and into the tags column, a little "Add tag" button will pop up. We add that, and we get practically the same flow. So if we want to say that this field needs better documentation, I can add this "Needs Documentation" tag, which had already been created on another dataset; it's discoverable here. We click on that and add it, and there, the tag has been applied to the schema.
So the second feature that I was going to demo is themes. We know that this has also been something the community has been requesting for a while, and this lets you customize your DataHub instance so that it has a little more of the look and feel that you prefer, or you could customize it to look a little more like your internal organization's themes. Things that we allow you to customize are styling, like background color, line color, font color, the font, things like that.
You can also customize assets, so you can insert a personal logo, and you can customize things like a welcome message as well, and we're going to start expanding what other things are customizable over time. Saxo Bank is currently contributing a change that will let you customize menu items using the same configuration.
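The customizations Gabe describes live in a single theme config file. The keys below are only illustrative of the kind of settings he mentions (colors, a logo asset, a welcome message), not an exact schema:

```json
{
  "styles": {
    "primary-color": "#FF5A5F",
    "background-color": "#FFFFFF"
  },
  "assets": {
    "logoUrl": "/assets/my-org-logo.png"
  },
  "content": {
    "title": "Welcome back to DataHub"
  }
}
```

Pointing the React app at a config like this is how the stock dark and light themes, and custom ones, are produced.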
I think dark and light are a little predictable and easy, but just to show you the flexibility that themes allow, I went ahead and created a theme of my own. I used to work at Airbnb, and there I made contributions to Airbnb's internal version of DataHub, which we call Dataportal, and I wanted to see: okay, can I create a DataHub theme that would fit in at Airbnb? And again, this is just using the same theme config that created the dark theme and the light theme, and the instructions are linked here.
If I search for an element like airport traffic, you can see the banner is also able to be styled, and labels get their own highlights as well. So this hopefully gives you a sense of the flexibility that theming allows. Have fun, go check it out, try a theme of your own, and again, look forward to chatting about this in the Slack channel.
Shirshanka: Wow, that was amazing, Gabe. Are you planning to replace Airbnb's Dataportal with this?
Awesome, that looked amazing. Cool, so a couple of things: the tags are also up on the demo site. Please go there and add some tags; be gentle, we have not added bad-word filters or anything like that; this is a friendly community. Every morning we are going to replace the tags with the fresh, default set of tags, so you're going to see your tags get wiped away.
So please don't get too attached to them on the demo website, but feel free to go in and play around. And for themes, I think we just have the default light theme up on the website, but yeah.
I think one of the great things that happened with the React office hours that we ran a couple of weeks ago was that we had a lot of conversations about React, but then also a lot of conversations about other things about DataHub. So we're actually thinking about how to run community office hours where people can just drop in and talk about anything data, so stay tuned for that as well.
Fredrik, do you want me to present the slides?
Fredrik: All right, yes. Thank you for the opportunity to come and share some of our learnings from working with DataHub. My name is Fredrik, and I'm joined here by my colleague Matti-Pekka; we're from a company called Wolt.
Wolt is a technology company that operates a food ordering and food delivery platform, not unlike DoorDash, for example. We were founded in 2014 in Helsinki, Finland, and we currently operate in 23 countries and over 150 cities: mostly the Nordics, the Baltics, Eastern Europe, some Mediterranean countries, Asian countries like Japan, for example, and lately we've also entered Germany.
Matti-Pekka: All right, thank you. Hello, I am Matti-Pekka. I met some of you in the last town hall. To give you a bit of context on the data pipeline we are running: we have now been moving most of our basic ETL workloads over to Kafka, so we have Kafka connectors on top of our operational databases.
Then we use Kafka to store and distribute the data, and we have an in-house-developed streaming framework, kind of a model repository, ingesting the data and then uploading it to Snowflake. Alongside that, we utilize Airflow heavily, especially for interacting with external APIs and third-party systems, but also for other of our internal, more batch-based systems. Snowflake is our main data lake slash warehouse, kind of an in-between model there.
From DataHub's perspective, we have this luxury of storing only quite well-structured data. We don't really have a data lake where we dump everything; we do some sensible pre-processing of the data before we land it in Snowflake, so it's in a usable state with our other datasets. To make sense of all of this, we started last year to evaluate different open-source and other options for managing all of the metadata we have, as we've seen previously in these presentations.
Shirshanka: Matti, I had one question, which I can of course ask: the circle that says Metamorphosis, is that the Confluent term, or is that your own system?
Matti-Pekka: That's our own system. In it we have these SQLAlchemy-model-based stream-parsing classes that we can register automatically as soon as we create them, and then start consuming the data from Kafka.
Shirshanka: Great. I actually didn't know that the term was overloaded at the moment. Got it.
Matti-Pekka: So we created this service and SDK called Celeste for our internal use, and this is the main way we want to interact with DataHub, and also to expose the whole storing-and-interacting part of the metadata service to our internal developer users, like engineers who want to publish their metadata in DataHub. The main benefit we saw from this approach was the easier integration with DataHub, especially when we started working on this.
E
There
were
no
existing
framing
works
like
currently,
so
you
need
to
parse
the
kafka
messages
and
the
overall
schemas
by
hand.
So
this
helps
then
the
end
user
a
bit
and
also
allows
us
not
to
have
kafka
dependencies
on
each
of
the
project
that
wants
to
actually
use
this,
and
then
this
allows
us
to
reduce
the
complexity
of
the
metadata
models.
The service itself is simply a REST API written in Python, which accepts simple POST requests and then, based on the payload, sends updates to Kafka to be processed by the MCE consumer.
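Celeste is Wolt-internal, so its real API isn't public; as a hedged sketch of the shape described here (a Python client that builds a simplified dataset payload and POSTs it to the metadata service, which relays it to Kafka), the endpoint, field names, and helper below are all illustrative assumptions:

```python
import json

def build_dataset_payload(platform, name, fields):
    """Build a simplified dataset description with schema fields inlined.

    Hypothetical payload shape: Celeste's actual format is internal to Wolt.
    """
    return {
        "entity": "dataset",
        "platform": platform,
        "name": name,
        "schema": [{"field": f, "type": t} for f, t in fields],
    }

def submit(payload, post=None):
    """Serialize and send the payload to the metadata service.

    The service would forward it to Kafka for the MCE consumer. `post` is
    injectable so the sketch can be exercised offline; real usage might be
    something like: post = lambda body: requests.post(url, data=body).
    """
    body = json.dumps(payload)
    if post is None:
        return body  # offline mode: just return the serialized request body
    return post(body)

payload = build_dataset_payload(
    "snowflake", "orders", [("id", "NUMBER"), ("ts", "TIMESTAMP")]
)
```

The point of the indirection is that producing teams never touch Kafka directly; they only build a small dict and hand it to the service.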
But for our end users, say our business users, the UI would of course be the DataHub UI. This service is for storing the data and, for example, fetching downstream lineage information for some entity. Here's a small example of how it looks in practice: we have a simple Python SDK that you can then use in, for example, the Metamorphosis project I mentioned.
We simply run a small Python scraper script that iterates over the models, creates the required payloads, and then sends them to the service. For example, it looks like this: we have a simplified dataset format that includes the schema information directly, instead of having the schema fields as different entities, which might in some cases require intermediate entities to link them together.
Shirshanka: I think we've been making progress on making the Python SDK easier and easier to use, and this is probably something that we will motivate Matti to contribute back as well. It's amazing, right, to have this easy way to create metadata right from inside Python.
Matti-Pekka: All right, yeah, I think we can just go on.
Fredrik: Yes, thank you. Just generally, as stated already... oh yeah, sorry. So our vision for our metadata ingestion is basically to be able to cover the whole data realm: starting at the operational databases and third-party APIs, to the warehouse, Snowflake, and all the way to individual machine learning models and Looker dashboards.
And as stated, we have this schema repository that we utilize heavily in this. We can easily add metadata to that and then use our Celeste SDK to push updates, and as already stated, we've been proposing this global tags feature addition to DataHub, and now we're exploring how to utilize those most efficiently: playing with the idea of using them as sensitivity classes on, for example, schema fields or on higher-level entities. And as you saw in the diagram earlier, we're heavy users of Airflow, and we use it to transform and move data around, so support for Airflow was a must for us. That's why we participated in the addition of that feature.
Let's go to the next slide; I'll try to be quick. So one aspect of us taking a metadata store or metadata catalog into use is that we're moving towards a sort of data mesh architecture. I hope most of the people in this crowd know what that means, but basically we wanted individual teams to take more ownership of their own data production, the quality of that data, and the flow of that data, with the core team that we're part of then just providing the tools, the infrastructure, and the monitoring solutions around those pipelines, while keeping some kind of control over the whole data realm, or the data platform, and still having it in a manageable state. It's crucial for us to bring this sort of data discovery and data catalog into the mix, and the use cases for us, I mean, are pretty clear.
D
I
guess
most
of
these
are
quite
common
for
all
of
you,
but
you're,
obviously,
gonna
use
it
for
for
data
catalog
and
for
data
discovery
for
our
end
users,
compliance
use
cases,
governance,
ml,
lineage,
as
said
and
and
yeah
keeping
track
of
ownership
of
the
datasets
or
within
the
different
teams,
and
then
sort
of
one
one
thing
that
we're
probably
gonna
start
looking
into
more
more
extensively.
D
Quite
soon
is
sort
of
utilizing
the
downstream
lineage
and
sort
of
ownership
and
and
sort
of
stakeholder
relationships
for
alerting
sort
of
taking
downstream
actions.
I'm
thinking
about,
like
stopping
machine
learning
training,
runs
if,
if
we
notice
that
a
data
set
just
is
going
stale
or
things
like
this
all
right,
let's
go
to
the
last
slide.
I
think
we
have
a
maybe
a
few
minutes
so
yeah.
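The alerting idea Fredrik describes reduces to a small graph walk: given lineage edges and a dataset flagged stale, collect everything downstream (an ML training run, a dashboard) that should be alerted or paused. The entity names and edges below are made up purely for illustration:

```python
from collections import deque

# Hypothetical lineage: upstream entity -> list of downstream entities.
LINEAGE = {
    "db.orders": ["dwh.orders_clean"],
    "dwh.orders_clean": ["ml.demand_model_training", "looker.orders_dashboard"],
}

def downstream_of(entity, lineage):
    """Breadth-first walk collecting every entity downstream of `entity`."""
    seen, queue = set(), deque([entity])
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# If db.orders goes stale, both the training run and the dashboard are affected,
# so a training run could be paused before it consumes stale data.
affected = downstream_of("db.orders", LINEAGE)
```

In a real deployment the edges would come from the metadata service's lineage API rather than a hard-coded dict.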
I think we're beyond the stage of a proof of concept at Wolt; we're heavily invested in this product and in the community already. The experience so far has been super great, and we see a very bright future for this product.
We're currently ingesting, or working on ingesting, the warehouse tables, and now we're extending that outwards to our Kafka topics and operational databases, and downstream as well, into Looker dashboards and so on. And maybe one of the cooler features that we want to look into is this downstream side: utilizing downstream lineage for alerts and monitoring. And then at a later phase we'll start onboarding our business users to use it as a data discovery tool.
Shirshanka: I think this is great, and we would love to get some of these user avatars as part of our team.
Cool. We have about 20 minutes left in the session, and I am going to try something interesting this time.
Which is observability, and Fredrik and Matti talked about it a little bit as well. One of the next steps, after setting up a data discovery catalog or platform or whatever you want to call it, is that people can find things, and they can look at some lineage and figure out who to talk to and which dataset is related to which dashboard.
But then often we've seen that the next question that comes up is being able to really trust that this dataset is indeed the right one for you to depend on for the rest of your life... I mean, the rest of your, you know, next project. And the question is: why would you depend on this dataset? What makes this dataset so amazing that you actually want to use it? That's really the difference between a dataset that seems like the most likely candidate for you to use versus a dataset that clearly seems like a trusted one. And we feel, and we've seen this at multiple iterations of this journey at different companies, that the operational signals coming from the actual active part of the data ecosystem give you that important signal that allows you to understand whether you should trust this dataset or not. So that's something that's also on our roadmap.
If you look, we've got integration with data quality tools and things like that coming up, and instead of working on it, finally releasing it, and then realizing that the community wanted something slightly different, we wanted to flip it around: do a few designs, do a few mocks, share them with the community, and get some feedback from you early on, so that we get the chance to actually build the product in the right way for the community.
For the Hive dataset in question, you're able to see a quick summary of what that dataset is all about. It's not just that it's a Hive dataset, but the fact that it's, like, a fact table, or maybe we want to call it a log table or something like that: it's basically an immutable set of facts being appended to the dataset over time, and there are some annotations about whether it's a daily-partitioned dataset or not.
What's the status of this dataset? When was it last updated? What were the checks that recently ran? These checks might be operational checks, like, hey, it was recently updated, it's a daily dataset, and it seems to have landed on time, so that looks good; but also that it has passed validation: there were data quality checks that were run, and all of them seem to have passed successfully, whatever your rules are, and there are no active issues.
The next, more expanded view of that same page is: you know, you might search for the dataset, so this is not the lineage way of getting to the dataset, but just the standard search, and then you click and you're into the dataset detail page. Very similar: you can see the same card below. Most people are familiar with the little array of tabs that shows up below the dataset entity page, right? And most of us are familiar with the schema tab and, you know, the ownership tab. But imagine that we put the ownership up to the right, so it's always there and available, and schema can actually stay on as a tab in this array of tabs, but we add on a summary which includes this kind of operational health.
So that's kind of what the landing page is, how we're imagining it, and then you can click into one of these tabs, let's say the events tab.
Actually, this one is, yeah, scrolling down a little bit. So you're up in the entity detail page and you scroll down from the summary into the events section, and you can see a timeline of events plotted on a graph that shows when this dataset landed. Now, for streaming datasets there might be a different view that makes sense; this is a fact-oriented warehouse table, right, so you might want to look at it in that way, you know.
Typically, you know, the standard stuff: it typically lands at 5:00 a.m., but the SLA is that it should be ready by 6 a.m. But, you know, a few days ago there was a problem and it actually showed up at 7:00 a.m. or 8 a.m., so it's late; but the good news is that the validation checks were actually run on that partition, and it was actually validated as being a correct partition of the dataset.
A
You
have
the
overall
statistics
about
the
data
set,
including
a
quick
histogram
or
a
quick
trend
line
that
shows
you
how
many
rows
are
being
added
per
day
and
whether
that
is
in
line
with
what
you
usually
get
and
things
like
that.
The
goal
again
is
to
have
a
very
simple
and
easy
way
to
understand.
If
this
data
set
is
behaving
as
expected,
because
every
data
set
is
going
to
be
different,
there
will
be
some
data.
Sets
that
are
weekly
data
sets
be
hourly
and
then
for
streaming.
A
Data
sets
it's
a
whole
new
world
right,
so
that's
entity,
detail
page.
We expect to allow customization of what you think are important events, and to allow you to plug in different producers that can say different things about the dataset, so that it's all combined in one place, and multiple tools can probably be emitting important updates about the dataset. Up on the right, you'll see this "report an issue" button, and that is meant for humans to essentially say, hey, I think there's a problem with this dataset. You can imagine how that workflow will look: you report a problem, the problem gets logged and routed to the right owner; maybe that's an integration with your ticketing provider, whether that's ServiceNow or Jira or what have you; and then that reflects back in the active issues that are going on with the dataset. So the next time someone goes and finds the dataset, they can actually see, oh, looks like there's some problem that someone else has already filed, and they can go check on how that's going. So that's the events tab. Stats is exactly what it says.
Column counts, like the width of the dataset itself; partitions; trend lines looking back one month, three months, whatever makes sense, at a row level or at a dataset level; and then, going deeper at the column level, understanding histograms for each individual column, and understanding if there are nulls being found and whether they are expected or not. That naturally leads to a data-quality-related question, but we're looking for feedback on how people would like to see this evolve, and I'll drop in a Google form.
The next tab over is validation. A lot of people in the community are using Great Expectations, so that's the one we're showing over here; we have on our roadmap integration with data quality tools like Great Expectations.
So once you're looking at a dataset, you can drop into the validation tab and it shows you all of the assertions that have been run: assertions that are running, failing, succeeding, what those assertions even are, and you're able to see at a glance how things are going. This is sort of like your CI/CD view of the dataset, right?
So I'll give you just a little bit of time to click into it, bring it up, and don't context-switch; I'll go back to the slide deck so you can see it really quick.
Okay, awkward silence done. Hopefully everyone at least has the form open. We're going to drop it on the Slack channel as well, so you can always get back to it, and obviously the slides are going to be public, so you can go back and look at them and tell us what you think. This work will be done over the next, you know, month or two, and as you can see from the roadmap, we do have quite a bit of work ahead of us, so the earlier we get feedback in, the more we can make sure that this thing is going to look just right for your team and for your company.
All right, so that's pretty much it from a scheduled-programming perspective. We are not yet announcing the next town hall, but I'll drop it in; you know, it's the standard third Friday of every month, but we'll figure out the exact timing, whether we move it by one hour; we'll get some feedback from the community around how they felt the timing was for them. For me, the coffee was great, so I'm feeling fine right now.
I took a quick look at the Q&A section, and there were a couple of questions that popped up. One was around business glossary, and the second was around moving from Confluence to DataHub. I can address the first one really quick: business glossary is something that we've been working on with the community for a while.
Now, I think Saxo Bank and Thoughtworks have been collaborating with the community on it. Recently, with the addition of the tags feature, we had a renewed discussion around it, because we had this big debate about whether tags are the same thing as business glossary terms or different things. Where we netted out, at least design-wise, was that we will keep tags similar to global tags, or hashtags.
But with glossary terms you can browse into a specific section of the taxonomy, attach it, and give it a different kind of semantic meaning. So we think the application process, like how you apply terms to datasets or to fields, is going to look very similar, but the way you manage the taxonomy itself is probably going to be much more interesting; there will be hierarchies.
One of the things on the roadmap is role-based access control, or fine-grained access control on metadata elements themselves. That will allow for tag governance and for business glossary governance, so that you can essentially demarcate owners of a certain taxonomy, and that then allows this feature to actually make sense for a lot of companies.
Okay, since we don't have the person who asked the question, I can imagine what it was about. There's probably a ton of documentation about datasets being written in wikis, and people are basically asking, hey, how do we get that documentation into DataHub, or vice versa?
One of the easiest features that we can think of is to just allow editing Markdown directly in the dataset description page, so that you can have a living document that you're maintaining about the dataset, just like we have our READMEs and other things in our GitHub repositories. That is definitely something that is on our short-term roadmap, and that should be coming; but deeper integration, maybe back and forth between Confluence and DataHub, is not something that we have planned for.
Audience member: A question: when is the office hour, the coming one, please? Because we're very interested in the SSO support.
Shirshanka: Cool, absolutely. Just watch out on the Slack channel; do not mute notifications. I don't like doing @here, so if everyone is paying attention to the Slack channel, I'll stop doing it, but a lot of times people tell me, oh, I didn't know there was a town hall. So, okay, I guess I'll have to do an @here just to remind people. So we will announce town halls, and sorry, office hours, in the Slack channel. Thank you; just pay attention to it, and we'll give a wide range of times so that people can hop in and out.
Fredrik: Yeah, yeah, sort of, and they include documentation there as well.
Shirshanka: Yeah, yeah, that would be interesting. Keep the ideas coming and keep the contributions coming. I think it's been just amazing to see how the community has taken this project and taken it to places that we ourselves are not able to; small teams, right, everyone busy with many other things. So this is amazing, and I'm really excited about dbt and Looker.