From YouTube: Aug 27 2021: DataHub Community Meeting (Full version)
Description
Full version of the DataHub Community Meeting on Aug 27th 2021
00:00 Welcome
01:13 Project Updates and Callouts by Shirshanka
- Accomplishments in Aug
04:55 Business Glossary Demo by Shirshanka
- 0.8.12 Upcoming Release Highlights
15:40 Users and Groups Management (Okta, Azure AD)
21:48 Demo: Fine Grained Access Control by John Joyce (Acryl Data)
38:41 Case-Study: DataHub @ Warung Pintar and Redash integration by Taufiq Ibrahim (Bizzy Group)
56:48 New User Experience by John Joyce (Acryl Data)
A: Welcome everyone to the August edition of the DataHub community meeting. Let's get right to it; we have a packed agenda as usual. First off I'll go through the project updates and, as usual, talk about what we've accomplished. In August we've been busy building toward the upcoming release, 0.8.12.
Overall, from a commit perspective, I was just looking at the commits since the last town hall and we've actually got more than 150, so we're continuing our 100-plus commits per month rate. We've got more than 20, almost 24, committers from 13 different companies and six new contributors. So welcome to all of you; we're looking forward to more contributions from each one of you, and we'll keep building interesting features together. In terms of the biggest highlights, we have Business Glossary phase one.
John will talk you through how it has been built and what capabilities you're going to have. Then, similarly, on the users and groups track we've got a bunch of work done on the integrations with Okta and Azure AD, as well as just-in-time provisioning, which we'll walk through as well. Typically we go over product improvements, integrations and developer experience improvements; we've had work in all three tracks and I'll cover them next, but first some community call-outs.
A
A
Call
out
frederick
for
continuing
to
improve
our
injection
code
base
recently
contributed
the
ability
for
you
to
extend
and
bring
in
your
own
sql
parser
when
analyzing
local
queries.
I
think
it's
going
to
be
quite
interesting
to
do
that,
then
we
have
toffiki
tawfiq
ibrahim
and
chris
colson
also
known
as
data
science
chris
for
collaborating
on
the
redash
contribution,
and
then
we
have
simon
orimus,
walior
and
serif,
who
have
been
consistently
asking
great
questions.
A
You
know
we
love
stack
traces
and
we
love
troubleshooting
issues,
but
we
also
like
talking
about
things
like
how
should
data
meshes
be
modeled
and
what
does
mlaps
look
like,
and
so
it's
been
great
having
high
quality
conversations
as
well
in
our
community
talking
about
the
community
dan
vestobe
excel
david
schmidt
and
chris
coulson
have
been
helping
the
community
out
generously
with
their
time
when
people
have
issues
helping
them
out
with
solutions.
So
thank
you
all
of
you
for
doing
that.
A
It
takes
a
village
to
keep
the
community
growing
and
thanks
a
lot
for
continuing
to
do
that
and
last,
but
not
the
least
dimitri
boykin,
for
giving
great
feedback
on
our
last
town
hall.
Mlaps
integration
we're
going
to
continue
building
out
better
lineage
integration
between
features
and
data
sets,
as
well
as
other
systems
in
the
ecosystem,
so
stay
tuned.
For
that.
A
Moving
on
the
first
product
improvement
that
we
would
like
to
share
is
a
business
glossary
phase.
One
and
I'll
do
a
quick
demo,
but
before
that,
just
a
quick
intro
to
business
glossary
itself,
it's
really
a
way
of
representing
a
tree
of
concepts
that
are
useful
for
attaching
to
existing
data
sets
or
fields.
So,
for
example,
at
your
company,
you
may
decide
to
have
a
taxonomy
that
says:
classification
as
a
top
level
node
and
within
that
terms,
like
confidential
or
highly
confidential
or
sensitive,
that
live
within
the
classification
node.
Similarly, you can have another node called Clients and Accounts, and all client and account terms live under there, like Account; and an Account can also contain a Balance, which itself might be a term under the Clients and Accounts node. You can also have relationships across these taxonomies. For example, a Balance or an Account might be confidential or might hold highly confidential data, and so you can have a relationship between the Account term and the Highly Confidential term.
So first, let's look at the recipe. If you're familiar with recipes, this is what it looks like: you have a source and a sink. The sink looks exactly the same (the destination is datahub-rest), and the source in this case is of type datahub-business-glossary. This is a new source that I've created, and it takes a config file, which is the business glossary itself.
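For reference, a recipe along those lines might look like the sketch below; the glossary file path and the server URL are placeholders, so verify the exact option names against the ingestion docs for your version.

```yaml
# Hypothetical sketch: ingest a business glossary file into DataHub.
source:
  type: datahub-business-glossary
  config:
    file: ./business_glossary.yml      # placeholder path to the glossary definition

sink:
  type: datahub-rest
  config:
    server: http://localhost:8080      # placeholder DataHub endpoint
```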
A
All
right
very:
firstly,
we
have
a
version
identifier
and
then
a
source
which
kind
of
describes
who
is
even
specifying
this
glossary.
So
in
my
case
I
decided
that
datahub
is
the
author
of
the
glossary
in
your
company's
case.
You
might
have
your
company's
name
here
and
then
default
owners
for
all
things
in
the
glossary.
These
could
be
users,
they
could
also
be
groups.
A
You
can
have
a
url,
for
example,
this
could
point
to
the
github
location
of
where
this
business
glossary
file
is
stored
and
then
below
that
you
have
nodes
and
then
contained
within
nodes.
You
have
terms
so,
let's
look
at
the
nodes
I
created,
I
created
a
node
called
classification,
just
like
I
showed
you
in
the
slide
deck
before
it's
got
a
description
and
it
has
terms
within
it
like
names,
the
terms
have
names
like
sensitive
confidential,
highly
confidential,
with
some
descriptions
attached
to
them.
Inherits is really an is-a relationship, so what I'm basically saying is that emails are classified as confidential. The owner in this case is a group, and as we go further down you see Gender inheriting the Sensitive classification. Further down we have another node called Clients and Accounts. I actually copied this from FIBO, and that's why, when I define the term, I say that the term source is external and the source ref is FIBO. I even give a link to where this term is defined in the FIBO glossary, and then I have some specializations of the term: it inherits Highly Confidential, and it contains another term, the client account Balance term, which I define next.
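Putting those pieces together, a glossary file in that spirit could look roughly like the sketch below. The property names (version, source, owners, url, nodes, terms, inherits, contains, term_source, source_ref, source_url) are recalled from the docs of that era, and all of the values are placeholders, so treat this as approximate and check the business-glossary source documentation.

```yaml
# Hypothetical sketch of a business glossary definition file.
version: 1
source: DataHub                      # who authored this glossary
owners:
  users:
    - datahub                        # placeholder default owner
url: "https://github.com/example/glossary.yml"    # placeholder location of this file
nodes:
  - name: Classification
    description: Terms describing data classification
    terms:
      - name: Sensitive
        description: Sensitive data
      - name: Confidential
        description: Confidential data
      - name: HighlyConfidential
        description: Highly confidential data
  - name: ClientsAndAccounts
    description: Client and account concepts (adapted from FIBO)
    terms:
      - name: Account
        description: A financial account
        term_source: "EXTERNAL"
        source_ref: FIBO
        source_url: "https://spec.edmcouncil.org/fibo/"   # placeholder external link
        inherits:
          - Classification.HighlyConfidential
        contains:
          - ClientsAndAccounts.Balance
      - name: Balance
        description: The balance of an account
```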
Hold a deep breath while ingestion runs... there you go, we ingested 11 terms. Now let's go into the business glossary and see if we can find them. There you go: we have our business glossary terms. The top-level nodes are in here, we can go into them and see the terms within them, and we see that ownership has been ingested as well. We can go into each one of these; remember Email? It's got a source, and you can go view the source.
A
Similarly,
we
can
go
into
accounts
and
go
to
the
account
term,
and
when
we
go
to
related
terms,
we
see
that
it
contains
a
balance
just
like
we
had
described
as
well
as
inherits
the
highly
confidential
term.
So
that,
in
a
nutshell,
is
how
you
can
produce
and
load
an
entire
business
glossary
into
data
hub.
This dataset seems to have a field called email, but we don't see any terms attached to it, and it would be nice to know that the user account dataset is actually of type Account and that the email field in here is actually of type Email. So let's ingest those terms. For that I'll go back to the terminal; in this case I have a nicely prepared datasets.json file. If you're familiar with how MCEs look, you should be right at home here.
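For readers who haven't seen one, a metadata change event (MCE) that attaches a glossary term to a dataset might look roughly like the fragment below; the dataset URN, term URN and actor are placeholders, and the layout is a simplified sketch rather than the exact file used in the demo.

```json
{
  "auditHeader": null,
  "proposedSnapshot": {
    "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot": {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,example.user_account,PROD)",
      "aspects": [
        {
          "com.linkedin.pegasus2avro.common.GlossaryTerms": {
            "terms": [
              { "urn": "urn:li:glossaryTerm:ClientsAndAccounts.Account" }
            ],
            "auditStamp": { "time": 0, "actor": "urn:li:corpuser:datahub" }
          }
        }
      ]
    }
  }
}
```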
The demo works: we see tags and terms attached. The Account term has been attached to the user account dataset, and the Email term has been attached to the email field. If I go into the Email term now, I'll see that there are related entities, datasets that are already attached to this term, and if I go to the Account term, I'll see that the same dataset is also attached to it.
A
It
also
you
can
also
search
for
these
things,
so
I
can
just
type
email
and
I'll
get
a
helpful
drop
down.
Where
autocomplete
shows
me
that
I
can
search
for
personal
information.email
and
if
I
search
for
it,
I
not
only
get
the
glossary
term,
but
I
also
get
the
data
sets
that
have
that
term
attached
to
it.
So
that,
in
a
nutshell,
is
the
business
glossary
demo?
On the Kafka source, we now support both keys as well as values, so if you use the new Kafka source and upgrade your DataHub libraries, you'll be able to see both keys and values showing up in a toggle at the very top right. In addition, we also went in and improved our representation of highly structured, nested schemas, so now you can finally ingest DataHub's own Kafka topics and actually explore the metadata schema itself. We've done a lot of work on representing structs as well as unions.
A
Well,
so
you
should
be
able
to
actually
browse
the
schema
pretty
nicely
and
understand
what
this
very
complicated
schema
looks
like
all
right.
So
that's
pretty
much
what
I
had
to
share
and
now
we're
going
to
go
over
to
john
who's,
going
to
give
us
a
quick
update
on
what
has
been
cooking
on
users
and
groups.
B: So we've had some recent developments on the ingesting-users-and-groups front. This is something we've actually gotten quite a few questions about recently, so we're putting some effort into making sure our guidance is clear around how to ingest your users, as well as your groups, into DataHub.
One of those recent developments is just-in-time provisioning. This means that when people log in, we will provision the users and their corresponding groups at login time if they do not already exist in DataHub. We've also made groups searchable via the UI, and we've added group members on the groups page itself, so you can easily understand who is in a particular group. Now I want to get into the details of just-in-time provisioning.
First, I just want to quickly give a high-level overview of user and group management in DataHub. There are basically two paths to seeding users and groups into DataHub. The first is what we call proactive, which is batch ingesting users and groups from some third-party, external system like Okta or Azure AD, and we now also provide the ability to validate at login time that a user has already been ingested.
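As a reference point for the proactive path, a batch-ingestion recipe could look like the sketch below; the Okta domain and API token are placeholders, and the option names are recalled from the Okta source docs, so confirm them for your version.

```yaml
# Hypothetical sketch: batch-ingest users and groups from Okta into DataHub.
source:
  type: okta
  config:
    okta_domain: dev-12345.okta.com      # placeholder Okta domain
    okta_api_token: "${OKTA_API_TOKEN}"  # placeholder API token
    ingest_users: true
    ingest_groups: true

sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```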
So, basically, you can go into Okta and maybe only ingest the 20 users that you want as your beta users, and when they log in they will either be allowed or denied based on whether they're already in the system. So that's the proactive approach, and then the reactive approach is what I just talked about: just-in-time ingestion at login time over OIDC. Actually, both of these today require OpenID Connect for the authentication piece. If you'll hit next, Shirshanka.
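For context, both paths hang off the frontend's OIDC configuration; a sketch of the relevant environment variables is below. The variable names reflect the docs from around that time and the identity-provider values are placeholders, so double-check them against the current authentication guide.

```yaml
# Hypothetical datahub-frontend environment for OIDC login with JIT provisioning.
AUTH_OIDC_ENABLED: "true"
AUTH_OIDC_CLIENT_ID: "<your-client-id>"            # placeholder, from your IdP
AUTH_OIDC_CLIENT_SECRET: "<your-client-secret>"    # placeholder, from your IdP
AUTH_OIDC_DISCOVERY_URI: "https://idp.example.com/.well-known/openid-configuration"
AUTH_OIDC_BASE_URL: "https://datahub.example.com"  # placeholder DataHub URL
# Reactive path: provision users (and extracted groups) on first login
AUTH_OIDC_JIT_PROVISIONING_ENABLED: "true"
AUTH_OIDC_EXTRACT_GROUPS_ENABLED: "true"
# Proactive path: require that users were already batch-ingested before login
AUTH_OIDC_PRE_PROVISIONING_REQUIRED: "false"
```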
So if this doesn't work for your organization, i.e. you're using SAML, or LDAP would work better, or something else, please do let us know. We're always trying to get feedback about this particular thing, because it's a domain that differs quite a bit between organizations. We do it one way, we use OIDC, but everyone does it slightly differently, so we definitely want feedback to understand whether there's a better way to pull your organization's users and groups into DataHub and seed them.
Okay, next slide. So what's on the horizon? We'd like to add an admin console in the UI that allows you to manage users and groups: doing things like creating new groups through the UI, removing groups that you may have ingested or that may have been provisioned, and managing group membership, so actually being able to add users to groups and remove users from groups. And then, finally, we'd like fine-grained user state management: the ability to activate and deactivate users that you may have ingested from a third-party source or who may have been provisioned at login time. So that's what we're working on now.
A: Awesome. So, as usual, we have a lot of integration improvements, pretty much across the board. A few call-outs would be Redash, which we'll talk about later; Kafka Connect, where we've added support for JDBC sources as well, not just the Debezium one that was there before; and for MongoDB we added some small tweaks to handle really large schemas coming out of the schema inference system. So now, you know, DataHub is not going to crash on you if you have 13,000 fields in your schema, like some people do.
All right, moving on to the developer track: we were going to talk about performance metrics, but I'm not going to discuss that too much here. We've added a lot of improved documentation for ingestion sources, so if you go check out our ingestion docs, our source docs are much improved; thanks to John and Kevin for doing that. As new sources come on board, we now have a pretty nice way of adding them to the documentation. All right, so with that, back to John to start off the first session of the day, which is fine-grained access control.
B: All right, thank you, Shirshanka. Let me just take over the screen here. Yeah, so I'm going to do a quick overview of where we are on fine-grained access control. This is something we started thinking about at the beginning of the summer, based on a lot of feedback from the community around wanting this capability to control who has access to what metadata on DataHub.
So I'm going to start by just talking about what access control is. The way we think about it is that access control is a way to declare who can perform what action against which resources, and we model this with three sub-concepts: an actor, which determines the "who" portion of the policy; a privilege, which is the action they can perform; and, finally, a resource, or object.
So I'm just going to talk through a few policies in English. On DataHub you may want to restrict who can do certain things. Number one: maybe dataset owners should be able to add documentation, but they shouldn't be able to add tags, because we want a controlled vocabulary of tags. Another example: maybe the data platform team should be able to edit anything about a dataset, because they manage the platform and they're sort of the admins of DataHub. Maybe Ted, our data steward, should be able to edit any dataset's tags, because maybe that's his job, but he shouldn't be able to edit the description or the ownership or anything else. And finally, maybe the administrative group should be able to manage the policies themselves, and so be able to dictate who can do what on the platform.
We wanted to apply these policies to resources at two levels. One is the resource-type level, so imagine dataset assets or dashboards or charts; the other is the resource-identity level, so being able to call out a particular dataset or a particular chart and apply fine-grained access control against that asset individually. And, finally, we wanted to model the concept of actors using the concepts of users and groups that already exist.
So we wanted to be able to say that John should be able to do something to a particular dataset, or that a group should be able to do something to a particular dashboard. We also wanted the ability to support a wildcard predicate and say that all users, or all groups, should be able to do something to a particular asset.
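To make the actor, privilege and resource shape concrete, a policy along the lines of the examples above could be written roughly like the sketch below. This is an illustrative JSON shape rather than the exact format DataHub stores, and the privilege name and dataset URN are placeholders.

```json
{
  "name": "Dataset owners documentation policy",
  "description": "Owners should be allowed to edit docs",
  "type": "METADATA",
  "state": "ACTIVE",
  "actors": {
    "users": [],
    "groups": [],
    "resourceOwners": true,
    "allUsers": false,
    "allGroups": false
  },
  "privileges": ["EDIT_ENTITY_DOCS"],
  "resources": {
    "type": "dataset",
    "resources": ["urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"],
    "allResources": false
  }
}
```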
So now I'm going to go into a demo of the milestone-one implementation of policies, based on what we just talked about. I'm going to go over here to DataHub; right off the bat, this is the default deployment of the new policies world. I'm going to go ahead and search; I've just got some of the basic sample metadata in here that you're probably all familiar with.
B
Probably
I'm
going
to
go
to
this
first
data
set
and
I'm
going
to
try
to
add
a
tag
right
so
let's
say
new
tag.
Okay,
I
already
have
one
my
new
tag
and
what
you'll
notice
right
away
is
that
we've
got
a
warning
here,
which
says
looks
like
you're
unauthorized
to
perform
that
action.
So
why
would
that
be?
Well?
That's
because
we
haven't
defined
any
policies
yet
so
I,
by
default.
I
am
not
able
to
do
anything
to
this
data
set
right.
B
You
know
actor
object
privilege,
so
I'm
going
to
start
by
giving
my
policy
a
name
and
I'm
going
to
actually
use
the
example
from
the
the
slides,
I'm
going
to
say,
data
sets
owner's
documentation
policy
right.
So
basically,
I
want
to
say
that
owners
can
update
documentation,
but
that's
it
about
a
particular
data
set.
So
next
I'm
going
to
choose
the
type
of
the
policy.
There
are
two
types
today.
B
Finally,
I'm
going
to
give
it
a
description
say
only
owners
should
be
allowed,
we're
sorry,
let's
actually
say
owners
should
be
allowed
to
edit
docs.
That's
it
I'm
going
to
hit
next
and
I'm
going
to
choose
the
asset
type
that
I
want
to
apply
the
policy
to
so
in
this
case,
it's
going
to
be
data.
Sets
I'm
going
to
choose
that
and
then
I'm
going
to
choose
the
asset
that
I
want
this
policy
to
apply
to.
So
I
can
either
search
for
a
particular
asset
right
or
I
can
just
say
all.
Finally, we get to the third and final screen, where we can say who can actually do this, and you'll see right away that there are three kinds of options here. We can call out users specifically, so I can say the datahub user, or John Doe, or whoever; we can call out groups; or we can say owners. Owners is that edge-based predicate.
B
Finally,
I'm
just
going
to
save
this,
and
now
you
see
I
have
a
new
policy
right.
You
can
see
it's
in
an
active
state,
which
means
it
should
apply.
So
I'm
going
to
go
ahead
and
go
back
to
the
data
sets
as
you'll
notice
like
this
actually
isn't
owned
by
me.
I'm
logged
in
as
data
hub,
so
I'm
going
to
go
to
the
second
data
set
which
is
owned
by
me
and
I'm
going
to
attempt
to
update
the
documentation.
B
And
you'll
see
I
was
able
to
update
it
great
awesome,
so,
let's
actually
back
out
here
and
let's
try
to
update
a
data,
sets
documentation
that
I
don't
own
right.
So
I
don't
own
this
one,
I'm
going
to
come
in
here
and
say:
hey!
I
want
to
update
oops
looks
like
I'm
unauthorized
to
perform
that
and
that's
because
the
policy
doesn't
allow
me
to
do
that.
B
So
I'm
going
to
go
back
and
I'm
going
to
open
up
this
policy
again,
I'm
going
to
take
a
look
at
what
it
says
and
I'm
actually
just
going
to
deactivate
it
because
you
know.
Actually
I
want
to
revoke
this
policy,
so
I'm
going
to
go
ahead
and
click
deactivate
and
you'll
see
that
this
policy
is
now
in
an
inactive
state.
B
B
B
Now, creating a second policy: I'm going to again choose datasets, and in this case I'm going to look up a particular dataset. I want to say that I should be able to update the HDFS dataset, and maybe the Kafka one as well, so I'll select two of them, and then finally I'll select a privilege, in this case editing tags.
And then I will just find myself, datahub, and I'll save it, and you can see we've got the new policy and it's in the active state. So now I should be able to update the tags for this HDFS dataset, the one I wasn't able to update in the initial case. Let's say "my new tag" again and see if I can add it... looks like I was able to add it. I can remove tags as well, of course, because I have full control over editing the tags.
I can probably add a tag here as well. Awesome, okay, so we've correctly created two policies, and now there's the final thing I want to demo, which is just cleaning up policies. There are cases in which you may have created a policy by mistake; what you can do there is just come in and delete the policy, and we're back to state zero. So this, in a nutshell, is what policy management and fine-grained access control will look like on DataHub.
This is the MVP. All of the privileges and assets you saw will be supported, both the metadata privileges and the platform privileges: basic platform privileges including managing policies, managing analytics, things like that. Eventually that'll be extended to include things like managing users and groups, so adding groups, deleting groups, things like that.
I'm pretty happy about how this turned out and looking for feedback from the community. We will have a global on/off switch here, which I'll talk about shortly when I get back into the slides. But let's go ahead and continue.
A: John, there's one question about who can even edit policies, like who has admin privileges on even the ability to add or create policies.
B: Yeah, so we model the ability to manage policies as a platform privilege, and by default DataHub will ship, or launch, with a set of immutable policies. Those immutable policies will grant the ability to manage policies and to manage analytics to the core super user, which is the datahub user today. So when you launch a fresh instance of DataHub, that datahub user will have all privileges on the platform, and that'll be the jump-off point from which you can create additional policies.
Let me quickly talk about the implementation, what's going on here. Recently we've moved our GraphQL API to the metadata service, so that's actually where a lot of this occurs. So what happens when a request comes in?
The policies are kept in a cache that gets refreshed in two ways. One is on a cadence: you can configure it to sync every two minutes, five minutes, ten minutes, whatever you'd like; by default it's two minutes. The other is when the cache becomes stale: if you add a policy or edit a policy's state, as you saw in this demo, we will actually go and refetch and rebuild the cache. And that gets us into the authorizer itself.
B
This
key
component,
which
basically
maintains
that
cache
always
keeps
kind
of
the
latest
view
of
the
policies
as
well
as
makes
a
determination
at
you
know,
request
time
whether
to
allow
or
deny
a
particular
action,
and
it
does
so
by
exposing
an
api
that
takes
those
three
pieces
of
the
policy
that
we
had
talked
about
prior.
So
at
request
time,
the
invoking
code
will
pass.
You
know
an
actor
which
is
basically
the
user
principle
behind
the
request.
It'll
pass
the
groups
that
that
user
is
associated
with,
as
well
as
a
privilege.
So it's pretty awesome. Policies in practice: we want policies to be able to be enabled or disabled globally at deploy time. What this means is that you can continue to use DataHub as you're using it today, where there are no policies and anyone on the platform can do anything.
B
We
wouldn't
recommend
that
we
recommend
you
actually
do
start
using
the
policies,
because
they,
I
think
they'll
be
very,
very
helpful
to
make
sure
that
metadata
stays
clean,
but
by
default
again,
datahub
will
be
that
super
user,
which
will
be
seated
with
irrevocable
kind
of
immutable
policies
that
say
that
it
can
do
anything
and
so
it'll
be
on
the
operator
to
go
and
spawn
off
additional
policies
on
a
per.
You
know,
policy
basis
from
that
core
admin
account.
B
Finally,
I'll
just
talk
about
a
little
bit
about
you
know.
What's
on
the
horizon,
for
policies,
so
after
we
get
this
kind
of
first
code
pass
done,
we
want
to
release
a
policies.
V1
usage
guide,
that'll
talk
about
how
you
create
policies,
how
you
manage
them,
hopefully
it's
self-explanatory,
but
I
think
it
will
still
be
pretty
helpful
to
have
something
accompanying
a
feature.
This
big
we'll
also
look
at
supporting
additional
predicate
types,
especially
on
the
resource
itself.
B
So,
as
you
saw,
there's
mainly
users
and
groups
which
are
able
to
do
different
things,
we
have
had
some
requests
from
a
few
folks
that
this
layer
of
indirection,
which
is
commonly
called
a
role,
would
be
perhaps
useful,
so
we're
actually
looking
for
feedback
from
the
community
and
direction
from
the
community
to
understand
whether
that's
a
requirement.
That
really
is
something
we
need
to
take
into
account
here
with
this
system.
So that's pretty much it. Thanks, guys. I will hand it back to Shirshanka.
A: Thank you. We are running a little bit late, but I'll stay with the policy of allowing everyone to speak. There are a bunch of questions on the chat, and we will take the rest in #general, because I don't think we can get to all of them. Really good questions; thanks for handling most of them, but I think there are a few others that are still open. All right.
So let's move on to our community speaker, Taufiq, who is coming to us from Indonesia. Thanks, Taufiq, for staying up so late and giving us your time. Take it over.
I will share the screen, and then, as we get into the demo, we can...
C: All right, okay, thank you, Shirshanka. Good morning, everyone. I'm Taufiq, from Indonesia. Right now I'm working at Bizzy, now part of Warung Pintar Group, and I'm going to share our case study with DataHub and how we developed the Redash source connector. Next.
We are now serving around 600 FMCG brands and around 230K retailers across Indonesia. We actually have two kinds of business here: one is the supply side, which works with the distributors and the FMCG brands, and the other is the retail part, working with grocery retailers. Next: this is the data ecosystem at Warung Pintar Group and Bizzy.
We have several legacy stacks coming from existing platforms at the corporate enterprises, like SAP, but we also have more modern architecture, like cloud-based applications, so we have a mix of technology stacks: you can see that we have Airflow, and we still have SSIS here. We break this stack into an operational part and an analytical part, and we also have the operational domain, which is the ERP and the application databases.
C
We
also
touch
the
production
database,
like
updating
data,
synchronized
data
from
multiple
sources,
and
we
also
captured
changed
the
capture
from
the
application
database,
using
kafka,
connect
and
sync
it
into
multiple
things
like
operational
reporting,
dbs
and
then
also
right
into
the
bigquery,
which
is.
C
Processed
by
airflow
to
be
served
by
several
bi
and
reporting
tools,
you
can
see
that
we
have
multiple
reporting
services
like
we
have
the
the
old
stack
like
legacy:
sql
server,
reporting
services.
We
have
metabase,
we
have
redash
and
also
we
have
jupiter
why
we
have
so
much
stack
here,
because
we
we've
been
through
multiple
merch
and
sales
and
we
need
to
maintain
most
of
it,
because
the
users
still
need
to
use
it.
That's
why
metadata
and
then
the
lineage
things
is
really
important
here.
so we can understand all the data more easily. Next: why do we need a data catalog at Bizzy? The first reason is that we have endless repeated questions from anyone: where the data is, how it is produced, who owns it. The questions are repeated almost every day by different people, and we keep answering them. It's also difficult to do lineage and impact analysis, because we have lots of data sources and a lot of reporting that uses the data; if we want to change or modify data, it's quite difficult to find what the impact is on the other applications and on the reporting, something like that.
So this is our journey with data catalogs. At the beginning of 2020 we just created a simple manual data lineage. Then we moved on to do a PoC with Apache Atlas, but we found that it was too complex and too Hadoop-centric at the time, so we stopped that PoC. We also did a PoC with Amundsen, but at that time it wasn't really answering what we needed. Then, at the end of 2020, we found DataHub, and we started doing a PoC and then development with DataHub. Next.
Here are some of the reasons why we chose DataHub: mostly because DataHub pretty much matches our data stack, like Kafka Connect, BigQuery and Kafka (DataHub itself uses Kafka a lot), so it really matched our requirements. The no-code ingestion with YAML recipes is really helpful for us, and so is the development experience for source and sink connectors; the documentation was really helpful. Another feature that we really love is that we can show the dashboard link from the app, click through to it, and be brought right into the dashboard itself. And now we have role-based access and we can limit what users can do, which is really awesome. Next.
This is our DataHub integration usage at our group. We have databases, mostly RDBMSes like MySQL, SQL Server and Postgres; we also have BigQuery and Kafka, and there are two source integrations that we contributed: Kafka Connect and Redash.
C
We
just
push
it
directly
to
the
data
hub
mcs
yeah
thanks,
so
why
we
developed
read
this
integration,
because,
after
the
merge
we
found
that
one
quinta
group
used
redact
a
lot
from
that
analyst
to
product
teams
to
hr
teams.
They
use
redact
a
lot,
they
practically
love
to
learn
sql
and
they
can
use
redash
quite
good
and
it's
actually
a
develop
based
on
the
superset
source
and
then
the
other
reason
is
actually
it
helped
the
plc
to
be
approved
internally
and
right
now.
Yeah, this is an example of the recipe for our Redash source; you can find it in the documentation on GitHub. Basically, what you need is the connection URL of the Redash server itself (this is not the hosted Redash, but the open-source one) and the API key. Then you can limit the page size, for example for testing purposes; by default it's not limited, so it will ingest all the dashboards and charts. We also have a skip-draft option, which defaults to true; if you want to ingest draft or unpublished dashboards and charts, you can set it to false.
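A recipe along the lines he describes might look like the sketch below; the URL and API key are placeholders, and the page-size and skip-draft option names are recalled from the Redash source docs, so confirm them against the current documentation.

```yaml
# Hypothetical sketch of a Redash ingestion recipe.
source:
  type: redash
  config:
    connect_uri: http://localhost:5000    # placeholder Redash server URL
    api_key: "${REDASH_API_KEY}"          # placeholder API key
    page_size: 25                         # limit page size, e.g. for testing
    skip_draft: true                      # skip draft/unpublished dashboards and charts

sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```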
This is our DataHub. For example, I will search for a dashboard called the personal tracker.
But currently we haven't done things like mapping charts to the actual tables, because that needs SQL parsing, like the LookML source does, and we haven't developed that yet.
C: If you want, yeah, I can do that right now; I already prepared it for you. So, one of the few things that we hope DataHub can address is lineage visualization. I think for now most data catalog tools actually have the same problem. If we look here, this is the lineage that came from our legacy lineage Google Sheets, which I just pushed into DataHub, and you can see that it is quite a large lineage graph. When you have a large lineage graph like this, it becomes quite difficult to read. This is one of the things we hope the DataHub team can find a solution for.
A: Yeah, I think we will ship people Oculus glasses so they can fly through these kinds of lineage graphs soon.
C: For example, like this one.
A: All right, but yeah, point completely taken. I think lineage graphs look beautiful until they become incomprehensible, and I think that's something we, as an entire industry, have to actually tackle.
C: On the DataHub development experience, coming from me and our team: the contribution of the Kafka Connect source was actually my first open-source contribution ever, and I found that the community is very welcoming and very supportive. I even got some private messages, and Shirshanka asked me whether I still want to contribute something; it's very supportive. And the documentation is very helpful, like how to add a new ingestion source; it's really helpful and laid out in a standard way.
We are still in an internal testing phase, but we will socialize it and get user feedback starting next week, and I hope this will have an impact for our organization, and hopefully for DataHub as well. We already talked about lineage for large graphs; we're also interested in operational data quality metrics, things like lag metrics and row counts, just to check for anomalies on a daily basis. That's all for me, thank you so much.
A: Cool, thanks a lot. What time is it right now?
C: It's around almost midnight, actually.
A: All right, all right.
C: No problem, I'm quite used to it; that's natural for me, actually.
A: Cool, cool. Best of luck for today, then. All right, thanks a lot, Taufiq, and we will move ahead. I'm thinking, in the interest of time, since we only have five more minutes, we're going to skip the performance metrics demo that Dexter had and go straight to the surprise session.
What we will do is set up a one-off developer session for people who are really interested in doing a deep dive into DataHub performance measurement; we can do an office-hours session with Dexter on that, and he can show us how he's doing load testing and measuring performance, or we can do it as a follow-on session at a future town hall.
B: Yeah, do you want me to just share my screen again? Sure. All right, guys, one more time you'll have to see me today. So I'm going to reveal a little surprise we've been working on, and that is Extreme Makeover: DataHub Edition.
So if you've been tuned into DataHub for a long time, since the beginning of the year actually, you'll know that our first pass on the React UI was really to get it to parity with the legacy Ember app, and what we've begun to do in the last month is actually start to improve the UI design and get it into the envisioned state we had when we began the migration to React.
So I'm going to come over here to DataHub and just go through the login experience, and immediately you'll notice a fresh new appearance on the home page. Now, this redesign is actually not complete, so I ask that you suspend disbelief here, but we're going to go ahead and search for some datasets, and I'm going to go into this first one, and you'll notice the new design. A couple of things to call out: we've greatly improved the schema visualization.
We now have this nested-schema expansion, we have colored tags, which is exciting, we have these fun owner bubbles, and we have the side panel on the right side, which gives an at-a-glance view of the entity: the documentation, the statistics about the dataset, the tags and the owners.
Obviously we have these little toolbars where you can switch between different things. This is an example of the documentation tab; one change we've made is that editing documentation is now an inline process, so I can make my update and just hit save. Okay, we saved that. Properties: nothing here. Some lineage: you can go ahead and see a redesigned visualization experience, a little bit softer on the eyes and nicer around the edges.
Queries got a little bit of a redesign here, where we're actually highlighting the SQL syntax a bit better than before (if you'll recall, we just had some gray text previously), so I think this will be super useful. And then we have the classic stats, both the latest view and the historical view; not too much has changed there.
So basically, this is to announce that we're going to be going through the app piece by piece and redesigning it with this fresher new look, which we're really excited about, and what I'll talk about next is what that journey will hopefully look like.
Okay, let me just present. So, yeah, just this final slide. The next thing we're going to be doing is expanding beyond the dataset page to the other entities: charts, dashboards, tasks and pipelines. Then we're going to move on from there to redesign and rethink the search and browse experience. There's a sneak preview on the left here, where we actually have a richer, faceted search experience, which I think will hopefully be a big improvement over what we have today.
Finally, we'll be enriching the home page, which will be the next thing, with richer recommendations as well as richer jump-off points to both search and the more classic hierarchical discovery experience. And, finally, we have dark mode coming as well, so we'll be transposing this slick new look into the dark-mode world too. Super excited; we'd love any feedback from the community. We'll be pushing out the first milestone here in the next week: the datasets page.
All the pages are generally going to have this branding, but some of the dashboard, chart and other pages will still mostly look like they did previously for the time being, until we get them all migrated. So that's pretty much it. I'm really excited about this one and looking forward to feedback and contributions from the community as we progress with this.
A: Awesome, and that kind of concludes our session for today. I'm really sorry we couldn't get to you, Dexter; we'll catch you up in either a deep-dive session with the developers or as a follow-on. But thanks, everyone, for all of the love and all of the, you know, star eyes. I was similarly excited when I first saw the demo.
A
It
actually
makes
such
a
big
difference
when
the
there's
an
amazing
platform
beneath,
but
you
can
finally
see
the
light
at
the
very
top
as
well,
so
really
excited
to
see
what
we
can
do
with
the
product
going
forward
thanks
to
everyone
for
all
the
contributions
and
for
staying
up,
and
let's
keep
the
momentum
going.
There
were
a
lot
of
questions
even
offline,
we'll
take
them
on
slack.
B: Bye, see you guys. And before I hop off, one last thing: I want to give a huge shout-out to my colleague Gabe, who drove the first part of this redesign. So thanks to Gabe; if you're looking for someone to thank, don't thank me, thank Gabe for the redesign.