Description
Iker Martinez de Apellaniz from Adevinta shares their metadata use cases and DataHub adoption journey.
Perfect. So I'll be showing the corporate slides, but since it's Friday afternoon for me, this is a bit of a fun moment, and after this I'll probably have a beer, not a coffee. Sorry for that. The rest of you on the west coast will need to wait a couple of hours for that to happen.
So first of all, my name is Iker, pronounced like the "e" in email, not like the "i" in iPhone. I'm a father of twins. I was a data engineer in the past, then I was an enabler, and now I'm a product owner at Adevinta. So now I don't know how to code or how to change the time in Docker anymore. Sorry, I've forgotten all of that.
But what is Adevinta anyway? Adevinta is a marketplace specialist. We have many marketplaces around the globe, different verticals, different tenants, and we try to create perfect matches on the world's most trusted marketplaces. That's the fancy tagline we use at Adevinta. That's what we try to do: if you need a car, go to one of Adevinta's marketplaces and you will find your car, your house, your job, a new pair of sneakers, whatever you want. Many, many brands around the world, many, many marketplaces, different teams, different offices. And since this summer, actually, we bought eBay Classifieds Group.
B
Usually
you
have
a
new
t-shirt,
but
it's
from
the
other,
even
the
classifieds
group,
which
means
we
have
even
more
and
more
marketplaces
now
more
and
more
teams,
and
you
will
see
later
how
this
is
a
challenging
scenario
for
a
data
catalogue.
If
you
are
not
the
exiting
already,
if
we
look
at
the
warm
up,
this
is
more
of
like
this.
I'm
I'm
as
as
you
mentioned,
I'm
in
barcelona
right
right
now,
so
just
in
the
middle.
This is more or less our product. We call it the Data Highway, and it's basically a composite of many, many managed Kafkas that we run for our clients, our tenants, which are all these marketplaces plus some central operations groups. We also have datasets in data lakes, and something we call FMDQ, which stands for filter, map, dispatch and quality. That is how we move data around: from one Kafka to another, from a Kafka to the data lake, or from a Kafka to anywhere you need it, actually.
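To make the filter-map-dispatch idea concrete, here is a minimal sketch of such a hop between two Kafkas. It is my own illustration with invented broker addresses, topic names and a toy filter, not Adevinta's actual FMDQ code:

```python
# Toy filter-map-dispatch hop between two Kafka clusters.
# Brokers, topics and the filter predicate are all placeholders.
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "tenant-kafka:9092",
    "group.id": "fmdq-hop",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "central-kafka:9092"})

consumer.subscribe(["marketplace.events"])
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    payload = msg.value()
    if b'"event_type": "ad_view"' not in payload:  # filter
        continue
    # a real map step would reshape or clean the record here
    producer.produce("central.ad_views", payload)  # dispatch
    producer.poll(0)  # serve delivery callbacks
```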
Then we have an inventory of assets that we need in order to keep all of this running, and then we have something that is called DataHub. But this is not your DataHub. This was before DataHub was public; we were already using that name. We decided on it because it was a cool one, and suddenly someone decided to copy it. So, because it's already confusing inside Adevinta, let's call it the governance UI for the purpose of this chat. Okay? All right.
With the governance UI we self-manage the authentication and the authorization of datasets. We cannot control the authorization for all this data owned by the different marketplaces ourselves; the marketplaces need to grant access to it, because we are like five or ten people in the team and there are many, many datasets, as you will see. So we have made this self-serve, custom-made, and for people to control this we need a list of datasets. We also need to comply with regulations: if anyone is in Europe, you will know GDPR.
The problem is that it has users, and these users are quite demanding. They want lineage, they want documentation, they want a glossary, they want dashboards, they want full-text search and all this fancy stuff. "I want to have scores on the dataset, and I want communities saying this is a good dataset, this is a bad dataset." Are you serious? That's like my kids coming here and asking me to play with them: I don't have time for this. Actually I do, it's part of my job, but it's a little bit overwhelming. So we said: cool, yes. But the thing is, we have a lot of data and a lot of data flows moving around, so an inventory alone is not good enough anymore. We need to change, and there is a tool for this, which is called a data catalog. So, which tools are out there?
We need to build a global data catalog and we need to do it data-mesh style. I don't know if you are doing data mesh, but no one knows how to do data mesh, yet everyone talks about it, right? So now suddenly we have Subito, the Spanish marketplaces, Belarus, Austria, all of them coming, and probably soon we will also have the eBay parts in Canada and South Africa coming to say: hey, take my data, I want in.
There are three parts to this. On one side we have the ingestion path, which is how we put the data, or rather the metadata and lineage, inside the tool. The second one is how we manage the infrastructure so it scales, so it grows, so it stays stable, and so it can be accessed from different places. And the third part is the API and integrations part, which is the acting part, so to speak.
We did three PoCs with different teams on different pieces, and in my team we did the one with LinkedIn DataHub for our product. We said: okay, let's start with some research on which alternatives are on the market. We looked at a couple of them. We already knew Atlas, and we already knew our own tool.
We had some people looking at third parties, and we were already quite interested, quite biased I have to say, towards LinkedIn DataHub. We already liked it from the media, from the blog posts and so on. So we said: okay, let's give it a try. And we found in June that it was really easy just to display data using the off-the-shelf connectors, getting Redshift data in, getting Athena data in.
We have the infrastructure production-ready, with a little bit of debt, but production-ready, and with an MVP of the UI. I will explain later what the problem with the UI is. So now, what we will do until the end of the year is try to get more data origins, as we call them, with external teams, that is, set up more connections from different marketplaces.
Some more BigQuery, more Redshifts, more Athenas, more Glues, maybe a Snowflake, maybe something else, and do some user research: do you like this? Do you find it useful? Is it easy to set up the connectors from the tenant perspective? The tenants don't own the infrastructure; they just need to send the data and take the value out on the other side of the pipeline.
On the ingestion path, we very quickly put in the off-the-shelf Redshift connector, thanks to the team which gave us the credentials for it, because we don't run Redshift ourselves. We could also add Athena; we use the Glue connector for Athena because it's more generic. But if anyone has a different opinion, please contact me on Slack and tell me your reasoning, because I think Glue is better, but maybe you think differently.
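As a flavour of how light these off-the-shelf connectors are, a Glue ingestion run with DataHub's Python pipeline API looks roughly like this. The region, the choice of the REST sink and its address are my placeholders, not Adevinta's setup:

```python
# Sketch of a programmatic DataHub ingestion run for AWS Glue.
# All config values are placeholders.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create({
    "source": {
        "type": "glue",
        "config": {"aws_region": "eu-west-1"},
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://datahub-gms:8080"},
    },
})
pipeline.run()
pipeline.raise_from_status()  # fail loudly if the run had errors
```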
For Kafka, there are some collisions that we found in the browse path. And what we have decided already is that we will replace the Atlas solution regardless of the result of the PoC, which is still running; we still need to validate with users. We will swap Atlas for LinkedIn DataHub, because it's, again, easier to maintain on the infrastructure side.
We tested Okta, because we need the list of users. And for what comes next, we are already talking with people about more origins, maybe Hive, maybe Snowflake, maybe BigQuery, to see who is in for testing. So we depend a little bit on our colleagues there, and things are not quite clicking yet, so yeah. And then the custom connector: again, as I said, we have an inventory of datasets that we need in order to maintain access and authentication, so we need to implement that and show it in the catalog.
This is key for us, but because it's a custom solution, we need a custom connector. And the same happens with the filter, map, dispatch and quality engine: because it's custom-built, it needs a custom connector, so we'll write that connector for it ourselves, no worries.
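For what it's worth, such a custom connector can stay very small if it leans on DataHub's Python emitter. Here is a minimal sketch; the inventory contents, the platform and every name are hypothetical:

```python
# Push entries from an internal inventory into DataHub via the REST
# emitter. The inventory list and all names here are made up.
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

emitter = DatahubRestEmitter(gms_server="http://datahub-gms:8080")

inventory = [  # in reality this would come from the inventory service
    {"name": "marketplace.ads.daily", "description": "Daily ads snapshot"},
]
for entry in inventory:
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityUrn=make_dataset_urn("s3", entry["name"], env="PROD"),
            aspect=DatasetPropertiesClass(description=entry["description"]),
        )
    )
```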
On the serving part: okay, we deployed this on Kubernetes, and again kudos to the common platform team. We have the connectors orchestrated with the Helm charts, if I'm not wrong, and we are finalizing the monitoring and the alerting of all these things.
And you are also improving this, as you mentioned. The good thing that we found, that we like, is the metadata ingestion on top of Kafka, because it gives us the possibility of going back to the past: okay, reset my consumer offsets and replay all this data. And if the ingestion part is down for a while, it can catch up later. So that's our architectural pattern.
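That replay property is plain Kafka consumer-group mechanics. A sketch of rewinding a consumer to the earliest offsets with confluent_kafka follows; the group id is a placeholder and I'm assuming DataHub's default MCE topic name:

```python
# Rewind a consumer group to the beginning so previously published
# metadata events get consumed again. Names are assumptions.
from confluent_kafka import Consumer, OFFSET_BEGINNING

def rewind(consumer, partitions):
    # on_assign callback: point every assigned partition at the start
    for p in partitions:
        p.offset = OFFSET_BEGINNING
    consumer.assign(partitions)

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "metadata-replay",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["MetadataChangeEvent_v4"], on_assign=rewind)
while True:
    msg = consumer.poll(1.0)
    if msg is not None and not msg.error():
        print(msg.topic(), msg.partition(), msg.offset())
```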
We had tried this in the past and it pleasantly surprised us. So now the challenge will be to define how to make this multi-tenant, so that one tenant doesn't break the other tenants' metadata. That's a little bit of an interesting challenge to have.
the
other
one
is
the
the
mvp
the
ui.
So
I
explained
with
an
mvp
of
the
ui,
which
means
like
the
research
that
we
have
done
internally
in
the
company.
It
says
we
already
have
too
many
ui's.
We have the data catalog in a different place from the governance tool, from the machine learning platform, from the experimentation platform. It's already a big mess, so we are trying to consolidate all these tools into a more centralized UI, and we shouldn't make the problem worse by now adding the DataHub one. And this is a challenge, and a pity, because the challenge is how to build something internally that is as good as the DataHub UI without rewriting all the components. So this is an interesting challenge.
In the governance UI we can edit the metadata and it is stored in the source of truth. And there are a couple of things there, like the custom dashboards with statistics on which tenants are sending data to each dataset, that we cannot replace, and that's why we are not using the DataHub UI for the moment, or at least not fully. But we might use parts of it.
So we will investigate how to do this with our own things: with our own Kafka, with our own S3 datasets, with the PII information we manage. For the rest of the tenants, that will need to wait at least until next year, probably a little bit more, because we are simply not there yet; we are lacking the governance and the agreements to do so. I'll go very quickly now, because I'm not keeping track of the time. Don't look at me.
Some findings so far, from these three or four months we have actually been working on this. Kudos to the community: there were a couple of names on that slide of contributors that you showed, and we are super happy to be able to contribute there. Isn't it 1,302 people in the Slack? They are probably more by now.
The atmosphere in the Slack is super good and the responsiveness is great. There are a couple of hours of delay, but that's normal, because, you know, you are sleeping while we are working. So that's okay. And the development speed: new features arrive super fast, and bugs are fixed very quickly. Super.
The other finding is the architecture, which matches what we want, so that's okay, it simply fits. And then the main problem here, which we already knew about, is independent from the tool. With our inventory, with LinkedIn DataHub, with a custom solution or a commercial solution, the problem is the same: how do you fix multi-tenancy? How do you do it the data mesh way? What do you do when there is no metadata and no governance, and what do you do about data quality? So I'd describe it like this: we can build the best tool ever...