From YouTube: DataHub Basics — Users, Groups, & Authentication 101
Description
Pedro Silva (Acryl Data) provides an overview of Users, Groups, & Authentication in DataHub during the December 2021
Learn more about DataHub: https://datahubproject.io
Join us on Slack: http://slack.datahubproject.io
Follow us on Twitter: https://twitter.com/datahubproject
A
Back to basics. The idea of this presentation is to explain a little bit how users and groups, from a metadata entity perspective, relate to what happens when you, as a user, try to log in to DataHub through an authentication process. Okay, so with that in mind, assuming that this is working, let me try. No, it was so good yesterday. Oh wait, here we go. There's a lag here, sorry about this. So how do they relate?
A
There are essentially three steps here, and I've color-coded them so that you can relate them afterwards if needed. The initial step is how you ingest user and group information into DataHub from a metadata graph perspective, so that the notion of users can correlate to other entities that you might have in your deployment. The second step, which we'll show in yellow, is authentication: how can you actually log in to DataHub?
A
How can you access the UI using credentials, so that not everyone can access your particular DataHub instance, and so that DataHub can identify who you are as a person? And finally, the magic sauce here is how you can actually relate and match the user that has just been authenticated to existing users in the metadata graph. So with that in mind, let's first look into users and groups.
A
These are part of the metadata model, and there are two kinds of entity. One is a corporate user, which essentially models who a person is and what they are doing in relation to other entities in DataHub. So you're just modeling a person, and you want to do this to answer questions like: who is the owner of X? Who has worked on Y? An example here could be: who is the owner of dataset X?
A
In this case it would be John, but it could very well be Mary or Jane or someone else, and you want to have this sort of information in DataHub. The other notion is that of a corporate group: essentially, you're modeling a group of people within DataHub. Perhaps you do not have a single owner or a single person responsible for some asset. You have a set of people, and you need to define this in some way.
A
Essentially you're saying: okay, if there is an operations team that is handling the production databases, and it is composed of more than one person, you model this as a corporate group. It matches very closely what you would ingest, or what you would see, from an LDAP group, for example, and that's what we're trying to do here.
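As a rough illustration of the two entity types just described, here is a minimal Python sketch. It is not DataHub's actual metadata model: the class names, the `Dataset` type, and the `owners_of` helper are hypothetical, and the `urn:li:corpGroup:` prefix is an assumption by analogy with the `urn:li:corpuser:` prefix mentioned later in the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CorpUser:
    """Hypothetical stand-in for a corporate user entity (one person)."""
    username: str

    @property
    def urn(self) -> str:
        return f"urn:li:corpuser:{self.username}"


@dataclass
class CorpGroup:
    """Hypothetical stand-in for a corporate group (a set of people)."""
    name: str
    members: List[CorpUser] = field(default_factory=list)

    @property
    def urn(self) -> str:
        # Assumed prefix for groups; not stated in the talk.
        return f"urn:li:corpGroup:{self.name}"


@dataclass
class Dataset:
    """Hypothetical asset that can be owned by users or groups."""
    name: str
    owner_urns: List[str] = field(default_factory=list)


def owners_of(dataset: Dataset) -> List[str]:
    """Answer the question 'who is the owner of X?'."""
    return list(dataset.owner_urns)


john = CorpUser("john")
mary = CorpUser("mary")
operations = CorpGroup("operations", members=[john, mary])

dataset_x = Dataset("x", owner_urns=[john.urn])
production_db = Dataset("production_db", owner_urns=[operations.urn])
```

Here `owners_of(dataset_x)` points at a single person, while `owners_of(production_db)` points at the operations group, which is the case the corporate group is meant to cover.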
A
Another important note here: users and groups only represent people and accounts that have relationships to other datasets, or that may not have them currently but perhaps will in the future. They are only used at that level, for the entity graph. They are not used for authentication or authorization at all. That's because DataHub does not have a database or a service to manage users and credentials; what we do is defer to external services.
A
So this is your single sign-on, where you're connecting to an external service, or perhaps you have LDAP or JAAS, with some users and passwords in a file somewhere. Moving on from that, step two in this process is authentication. DataHub currently supports two ways of logging in and being authenticated, which are JAAS and SSO.
A
Oh sorry, yes: this is for OpenID Connect (OIDC) SSO, not other methods. That's the only one that we support. Thank you, John, for that. This authentication step serves only to limit and identify who gains access to the UI or to the DataHub components; it does not affect the metadata graph of information. So there are two separate and very distinct notions here.
A
One is related to the graph itself, the things that we store and relate in DataHub; the other is how you gain access to these services. So where does the magic happen, and how does it happen? When a user becomes authenticated, we try to translate that into a corporate user entity in DataHub. We do this by extracting information from JAAS or the OIDC connection: once we validate the credentials, we map them, and how we do that depends on which one you're using.
A
For the JAAS use case, we take the username that you used and prefix it with urn:li:corpuser:, and that's the URN of the user that has just authenticated. For OIDC, or SSO, it depends on the claim variable that you have configured for DataHub to extract information from. So in this case you would have urn:li:corpuser: followed by the extracted information. A simple use case could be that you defined an email as the claim.
A
So, in my case, let's say it's pedro@acryl.io: that entire string would be used as the suffix for my URN. But perhaps I want just the username part; I don't want the @acryl.io. In that case, for claims, I can customize the way I extract that information with a regex, and that is set using environment variables: AUTH_OIDC_USER_NAME_CLAIM_REGEX, and here is the example for the case I've just given you. That's how it would work.
A
But bear in mind, and this is a source of a significant amount of confusion that I've seen in the community: the URNs that we compute when you log in must match exactly the URNs of existing corporate users. That's how the match is done. It's essentially a lookup that we do on the metadata graph.
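The mapping from a login to a URN, and the exact-match lookup, can be sketched as follows. This is a minimal illustration and not DataHub's implementation: the function names are hypothetical, the regex `([^@]+)` is just one pattern that keeps the part before the `@`, and the sketch assumes the configured regex contains one capture group whose content becomes the URN suffix.

```python
import re
from typing import Optional, Set

CORP_USER_PREFIX = "urn:li:corpuser:"


def urn_from_jaas(username: str) -> str:
    """JAAS case: prefix the login username as-is."""
    return CORP_USER_PREFIX + username


def urn_from_oidc(claim_value: str, claim_regex: str = "(.*)") -> Optional[str]:
    """OIDC case: apply the regex to the claim value and use capture group 1.

    The default pattern keeps the whole claim (an assumption for this sketch).
    Returns None when the claim does not match the pattern.
    """
    match = re.match(claim_regex, claim_value)
    if match is None:
        return None
    return CORP_USER_PREFIX + match.group(1)


def find_existing_user(urn: str, known_urns: Set[str]) -> Optional[str]:
    """Exact-match lookup: the computed URN must equal an existing one."""
    return urn if urn in known_urns else None


# URNs already present in the (hypothetical) metadata graph.
ingested = {"urn:li:corpuser:pedro"}

full_email_urn = urn_from_oidc("pedro@acryl.io")
username_only_urn = urn_from_oidc("pedro@acryl.io", "([^@]+)")
```

With the email used unchanged, `find_existing_user(full_email_urn, ingested)` finds nothing, because `urn:li:corpuser:pedro@acryl.io` is a different string from `urn:li:corpuser:pedro`. With the regex applied, the computed URN matches the ingested user. This is the mismatch that the talk describes as a common source of confusion.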
Now, once you've matched it, if you've configured SSO, then by default SSO does automatic metadata provisioning. I mentioned before that you could do the ingestion via the framework, but you can also do it on the fly at login. This is enabled by default: we try to provision user and group profiles when users first log in.
A
This can be something that you might not want, for example in governance use cases where you want to allow access to DataHub, to its UI and its backend, only for people who have been previously ingested. These are the environment variables that you need to look at. Why is this important? Perhaps at first you just want the data team to access DataHub, even though you've enabled SSO and everyone in your company can use SSO. Perhaps you want just a subset of those users to access DataHub for, let's say, confidentiality purposes. This is how you can control that sort of thing in an SSO scenario.
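The access rule described here can be sketched as a small check. The transcript does not name the environment variables shown on the slide, so the `pre_provisioning_required` flag below is a hypothetical stand-in, and the function is an illustration of the rule rather than DataHub's code.

```python
from typing import Set


def may_log_in(
    user_urn: str,
    ingested_urns: Set[str],
    pre_provisioning_required: bool,
) -> bool:
    """Decide whether an SSO-authenticated user may enter.

    pre_provisioning_required is a hypothetical flag: when True, only
    users that were already ingested into the metadata graph get in.
    """
    if not pre_provisioning_required:
        return True
    return user_urn in ingested_urns


# Only the data team has been ingested so far (hypothetical data).
data_team = {"urn:li:corpuser:pedro", "urn:li:corpuser:mary"}
```

With the flag on, `may_log_in("urn:li:corpuser:john", data_team, True)` is `False` even though John can authenticate with the company's SSO; with the flag off, any authenticated user is let in.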
A
So you log in. Given the information that the identity provider gives us, we extract the email, first name and last name, and we create a corporate user from that; if it does not exist, we add it to the database. Then, as a second step, we try to extract the groups, if they exist and are well-formed. We create the corporate group objects and, again, add each one if it does not exist. The final step is to create a group membership aspect on the created user, to connect it to those groups. So this is how the first ingestion with SSO happens.
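The provisioning steps above can be sketched with an in-memory store. Everything here is hypothetical: `MetadataStore`, `provision_on_login`, the field names, and the `urn:li:corpGroup:` prefix are assumptions made for illustration, and "well-formed" is reduced to "a non-empty name".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MetadataStore:
    """Hypothetical in-memory stand-in for the metadata graph."""
    users: Dict[str, Dict[str, str]] = field(default_factory=dict)
    groups: Dict[str, Dict[str, str]] = field(default_factory=dict)
    memberships: Dict[str, Set[str]] = field(default_factory=dict)


def provision_on_login(
    store: MetadataStore,
    username: str,
    email: str,
    first_name: str,
    last_name: str,
    group_names: List[str],
) -> str:
    """Create the user, its groups, and the memberships if they are missing."""
    # Step 1: create the corporate user only if it does not exist yet.
    user_urn = f"urn:li:corpuser:{username}"
    if user_urn not in store.users:
        store.users[user_urn] = {
            "email": email,
            "first_name": first_name,
            "last_name": last_name,
        }

    for raw_name in group_names:
        name = raw_name.strip()
        if not name:
            continue  # skip groups that are not well-formed
        # Step 2: create the corporate group only if it does not exist yet.
        group_urn = f"urn:li:corpGroup:{name}"
        store.groups.setdefault(group_urn, {"name": name})
        # Step 3: record the membership linking the user to the group.
        store.memberships.setdefault(user_urn, set()).add(group_urn)

    return user_urn


store = MetadataStore()
pedro_urn = provision_on_login(
    store, "pedro", "pedro@acryl.io", "Pedro", "Silva", ["data-team", ""]
)
```

Because each step only adds what is missing, calling `provision_on_login` again for the same user leaves an existing profile untouched, which mirrors the "if it does not exist, we add it" wording in the talk.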
A
However, bear in mind that the reason I'm actually giving this presentation is perhaps that we found this to be a little bit confusing. I'll admit I myself was quite confused even when preparing this presentation, and I still am sometimes; I need to think a little bit to understand how things work.
A
So I completely understand if other people in the community feel the same, and I want you to know that. There is an ongoing conversation over whether this is the right way to go, or whether we should handle this relationship between metadata and authentication differently. So please, if you have any thoughts or opinions, come talk to us, because this is something that can clearly be an issue for people who are onboarding for the first time. And Maggie, that's it on my side.