From YouTube: Tech Deep Dive: DataHub Metadata Service Authentication
Description
John Joyce (Acryl Data) gives a deep-dive into the DataHub Metadata Service Authentication during the November 2021 Community Town Hall
Referenced Links:
https://datahubproject.io/docs/how/auth/jaas
https://datahubproject.io/docs/how/auth/sso/configure-oidc-react/
https://github.com/linkedin/datahub/blob/681ed91a0006a2d20535c0d5c30f0a68afcfab9f/docs/introducing-metadata-service-authentication.md
So I think, as many of you are aware, today authentication in DataHub is handled exclusively at the datahub-frontend proxy service. Currently that happens via JAAS, which is username/password login that you can tie to an LDAP server, for example, or to a file directly, or via SSO (OIDC), which many of you are using. What we do is set and verify session cookies from the datahub-frontend proxy.
What this means is that there are limited options for adding authentication to the metadata service layer itself, which sits behind the proxy. Why is that significant? Well, it's significant because when you're ingesting metadata, the recommendation is typically to ingest against the metadata service directly, and so all of that ingestion traffic goes unauthenticated.
So there are two suggestions we've had for folks in the open-source community around how to authenticate requests today. The first option is to set up a custom proxy in front of the metadata service and perform your own authentication there. The second is to extract the session cookie that's set by the datahub-frontend proxy and use that in programmatic requests, as long as they're routed through the frontend proxy (this is actually a very new recommendation). So, just at a high level.
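That second workaround, reusing the frontend proxy's session cookie in a programmatic request, can be sketched roughly like this. Everything concrete here is an assumption for illustration: the proxy port (9002 in a default quickstart), the GraphQL path on the frontend, and the cookie name (the frontend is a Play application, which typically sets `PLAY_SESSION`); copy the real cookie value from your browser's dev tools for your deployment.

```python
import urllib.request

# Assumed values for illustration only: the DataHub frontend proxy is
# commonly exposed on port 9002, and as a Play app sets a session cookie
# named PLAY_SESSION. Check your own deployment and browser.
FRONTEND_PROXY = "http://localhost:9002"
SESSION_COOKIE = "PLAY_SESSION=<value-copied-from-browser>"

def build_proxied_request(path: str) -> urllib.request.Request:
    """Build a request routed through the frontend proxy, carrying the
    session cookie so the proxy treats it as an authenticated session."""
    req = urllib.request.Request(FRONTEND_PROXY + path)
    req.add_header("Cookie", SESSION_COOKIE)
    return req

# Not sent anywhere in this sketch; urllib.request.urlopen(req) would send it.
req = build_proxied_request("/api/v2/graphql")
print(req.get_header("Cookie"))
```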
This is roughly what the old world looks like. We have, on the left here, the frontend proxy service, and then on the right, the metadata service. The frontend proxy is where the auth happens.
There's a session cookie that's passed from the UI, and the frontend proxy also handles all communication with external identity providers to perform the OIDC or SSO flows. But all communication between the frontend proxy and the metadata service, along with communication between your metadata ingestion framework and the metadata service, is currently unauthenticated. So, to summarize the problem we've been trying to solve over the last month: we really just don't have any formal support for making authenticated requests to DataHub APIs, which live in the metadata service.
So I'm going to come over here to DataHub. As you can see in the top right, we have a new tab called Settings, and if you go in there you'll see one option today, called Access Tokens. What this tab allows you to do is generate access tokens for use with DataHub APIs.
The first type of access token we're rolling out is what we're calling a personal access token. If you've used GitHub recently, you may be familiar with the concept. On DataHub, it's a token that you can generate with your own privileges. So maybe I'm John Joyce, and I have a certain set of policies that are assigned to me when I generate an access token.
Those policy privileges will be carried over to that token as well. What I'm going to demo is actually generating an access token for DataHub that expires in one day. I'm going to go ahead and click Generate Personal Access Token, and you're going to see this panel. We've got the token itself, and then we've got a little example of how to actually use it, which is pretty helpful.
It includes the domain which I'm hosted on. So I'm going to go ahead and copy this, and then I'm going to try to make an authenticated request to DataHub. I'm going to come over here to Postman, hit the GraphQL endpoint, and request my own username. That's the query I'm going to make.
First of all, I don't have any authentication specified at all. I'm going to make the request, and you're going to see a 401 Unauthorized, because I'm not allowed to do that. Then I'm going to go over and add the Authorization header with the new token that I just generated, try again, and there you go: I've made a GraphQL request that has my username coming back. And the cool thing is that this works against more than just this endpoint.
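The Postman call above boils down to a plain HTTP POST with an `Authorization: Bearer <token>` header. Here's a minimal sketch; the host, the `/api/graphql` path on the metadata service, and the `me { corpUser { username } }` query shape are assumptions based on a default quickstart deployment, so adjust for yours.

```python
import json
import urllib.request

GMS_HOST = "http://localhost:8080"  # assumed default metadata service address
TOKEN = "<personal-access-token>"   # paste the token copied from the UI

# GraphQL query asking DataHub for the calling user's own username.
query = {"query": "{ me { corpUser { username } } }"}

req = urllib.request.Request(
    GMS_HOST + "/api/graphql",
    data=json.dumps(query).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Without this header the request comes back 401 Unauthorized.
        "Authorization": f"Bearer {TOKEN}",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted here since there is
# no live server in this sketch.
print(req.get_header("Authorization"))
```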
So here's a summary, for folks that couldn't make the session, of what we built. I'm going to go over a couple of the high-level key concepts we've introduced into the metadata service to make this work. First, we have an actor, which represents a unique identity or principal that is accessing DataHub.
This can be a user or, in the future, a service identity for programmatic types of requests; currently we're only modeling users. Then there's the concept of an authenticator, which is really a pluggable component in the metadata service that is responsible for taking the incoming request context and resolving an actor from it: being able to say, is this actor authenticated or not?
There's also an authenticator chain, which is basically a group of authenticators that are executed in sequence in an attempt to authenticate an incoming request. What this means is that you can actually have multiple flavors of authenticator stacked, so a request can be authenticated in multiple ways. For example, one authenticator may pull out LDAP credentials and try to verify them against a third-party LDAP server; the next may verify an IdP-issued access token using a signature; and maybe there's a third authenticator in the stack that does something completely different.
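The chain idea is simple to sketch. The real DataHub authenticators are Java components inside the metadata service; the toy reduction below (with made-up authenticator names and credential checks) just shows the contract: each authenticator inspects the request context and returns an actor or nothing, and the chain returns the first success.

```python
from typing import Callable, Dict, List, Optional

# An "actor" is just an identity string here; DataHub models it as an URN.
Authenticator = Callable[[Dict[str, str]], Optional[str]]

def bearer_token_authenticator(ctx: Dict[str, str]) -> Optional[str]:
    """Resolve an actor from an Authorization: Bearer header (toy check)."""
    header = ctx.get("Authorization", "")
    if header.startswith("Bearer ") and header[len("Bearer "):]:
        return "urn:li:corpuser:token-user"
    return None

def basic_auth_authenticator(ctx: Dict[str, str]) -> Optional[str]:
    """Resolve an actor from username/password fields (toy check)."""
    if ctx.get("username") == "datahub" and ctx.get("password") == "datahub":
        return "urn:li:corpuser:datahub"
    return None

def authenticate(chain: List[Authenticator], ctx: Dict[str, str]) -> Optional[str]:
    """Run authenticators in sequence; the first one that resolves an
    actor wins, otherwise the request is unauthenticated (None)."""
    for authenticator in chain:
        actor = authenticator(ctx)
        if actor is not None:
            return actor
    return None

chain = [bearer_token_authenticator, basic_auth_authenticator]
print(authenticate(chain, {"Authorization": "Bearer abc"}))            # token path
print(authenticate(chain, {"username": "datahub", "password": "datahub"}))  # basic path
print(authenticate(chain, {}))                                         # None
```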
The next concept is a DataHub access token, which is what's powering the Generate Personal Access Token feature. Basically, we've introduced a new token service into the metadata service that allows you to grant or generate access tokens on behalf of a user, as well as validate those tokens when requests come inbound. We do that via an authenticator which basically just validates the token. I'm just going to talk a little bit about this at a high level.
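To make the grant/validate cycle concrete: DataHub's real token service issues signed tokens, but the exact format isn't covered in this talk, so the sketch below uses a bare HMAC-signed JSON payload as a stand-in. The signing key, claim names, and encoding are all invented for illustration; the point is only the two operations (generate on behalf of an actor with an expiry, then validate inbound tokens by signature and expiry).

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SIGNING_KEY = b"server-side-secret"  # assumption: a key held by the server

def generate_token(actor: str, ttl_seconds: int = 86400) -> str:
    """Grant a token on behalf of an actor, with an expiry (1 day here)."""
    payload = json.dumps({"actor": actor, "exp": time.time() + ttl_seconds})
    sig = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload.encode()).decode() + "." + sig

def validate_token(token: str) -> Optional[str]:
    """Validate an inbound token; return the actor, or None if invalid."""
    try:
        encoded, sig = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode()).decode()
    except (ValueError, UnicodeDecodeError):
        return None
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # signature mismatch: tampered or foreign token
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        return None  # expired
    return claims["actor"]

token = generate_token("urn:li:corpuser:jdoe")
print(validate_token(token))               # resolves the actor
print(validate_token(token + "tampered"))  # None: signature check fails
```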
I'm not going to go too deep into this, because it would take a little bit of time. Instead, I'm going to refer you to a document that has all of these details covered, if anyone's interested. But I'll wrap up by giving an overview of how you can actually start using this once it's deployed; currently it's in PR review. Metadata service authentication will be disabled by default for now, which means that the session-cookie-based authentication you're using right now will continue to work without interruption.
So it's kind of an opt-in system. You can enable it using a single environment variable called METADATA_SERVICE_AUTH_ENABLED: all you need to do is turn that on for datahub-frontend and datahub-gms, and you'll start enforcing authentication at the metadata service layer. There's nothing else you really should need to do, except noting that when you're ingesting new metadata, you'll need to have an access token provided.
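The opt-in behavior can be pictured as a simple gate in front of the token check. The flag name (METADATA_SERVICE_AUTH_ENABLED) comes from the talk; the filter logic below is a toy sketch, not the actual metadata service code.

```python
import os
from typing import Dict, Set

def is_request_allowed(headers: Dict[str, str], valid_tokens: Set[str]) -> bool:
    """Mimic the opt-in gate: when the flag is off, every request passes
    (today's behavior); when on, a valid bearer token is required."""
    if os.environ.get("METADATA_SERVICE_AUTH_ENABLED", "false").lower() != "true":
        return True  # auth disabled by default: nothing changes for existing users
    token = headers.get("Authorization", "").replace("Bearer ", "", 1)
    return token in valid_tokens

tokens = {"pat-123"}
os.environ["METADATA_SERVICE_AUTH_ENABLED"] = "false"
print(is_request_allowed({}, tokens))                                   # True
os.environ["METADATA_SERVICE_AUTH_ENABLED"] = "true"
print(is_request_allowed({}, tokens))                                   # False
print(is_request_allowed({"Authorization": "Bearer pat-123"}, tokens))  # True
```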
Finally, you'll need to grant privileges to actually generate personal access tokens. By default, not everyone on the platform will have that privilege; it'll actually be gated out of the box, so you won't be able to generate a token. But using the DataHub policy system, you're going to be able to assign a new platform privilege called Generate Personal Access Tokens. Of course, the DataHub root user will come with this by default, so you can spawn off privileges from there. And finally, I'll just talk about where we want to go from here.
What we want to do is really make the process of registering authenticators much more dynamic, similar to what Shirshanka showed earlier. The end goal is to have a plug-in location where you can simply copy an authenticator implementation, put it in your configuration, and start using it immediately.
We currently only support personal access tokens, but in the future you can see us supporting additional types of tokens as well. There's also Kafka ingestion authentication: we want to be able to authenticate write requests that come off of the Kafka stream via ingestion through the DataHub Kafka sink.
Currently, in the first pass, these will not be authenticated, but the DataHub REST sink requests will be. And then finally, we'd like to make access token management inside the UI a little bit more robust, with the ability to view previously created tokens, manage them, and actually revoke tokens on the fly. I think the summary is: this is in review right now.
There's a big PR, and I've got a huge document that talks about all these concepts in depth, along with an FAQ section about how to start using it, configuration examples, and more. So feel free to hit this link. I think Maggie, hopefully, will share out the slides at some point, so you can start reading more about it.