Description
John Cragg describes the DataHub adoption journey at Depop, a UK-based peer-to-peer social shopping company.
A
Yep, here we are, and oh, another handoff. We have John, who's going to talk to us about the DataHub hackathon that the Depop folks did. It was pretty cool to see them pop, literally pop, into the community channel and say: "Hey, we're doing a hackathon," and within a few days they were contributing a Glue integration back to us, and the Klarna folks helped them out. So thanks for that collab; I think it was really nice to see that happen.
B
Can you see my screen? Yep, awesome. A bit of a tough one to follow; that one looked really good, so I hope I don't disappoint you here. Anyway, my name's John, I'm the lead data engineer here at Depop, and I'm here to talk about the hackathon that we did with DataHub.
B
Just a quick intro to Depop and who we are: we're a fashion marketplace for the next generation to buy, sell, and discover unique fashion. We're an app, basically; we provide the ability to sell predominantly second-hand and sustainable fashion. You can think of it as a bit like eBay mixed with Instagram.
B
That's how my mum described it when I joined, anyway. Lots of people in the UK are using Depop, and it's growing around the world, including the US. We're growing very fast, and our data needs are growing as well.
B
So why did we look at DataHub? Well, we need to enable the business to use data in a self-service fashion, and we need a single location for all of our data needs. Shout out to the design crew who made this slide; it certainly wasn't me. I'm going to walk us through some of the problems that we're trying to solve here, as seen in various Slack messages from around the company.
B
We've got issues with data discovery. Somebody new joins and wants to know about data for our CRM, and they don't really know where to find it, which is a bit of a shame.
B
People want data about recently viewed items, or data about banning people in the trust platform, and we don't have a single location for search, so DataHub would be pretty useful there. Then there's the data lineage aspect.
B
I
don't
really
need
to
speak
about
this,
as
have
you
seen
a
perfect,
a
demonstration
of
how
how
that
works,
but
generally
producers
and
consumers,
and
seeing
who,
where
data
starts
and
where
it
ends
up
all
the
way
through
to
our
looker
instance,
that
would
be
very
useful
for
our
business
users
and
depop's,
a
startup
or
a
scale-up,
and
and
we've
got
lots
of
knowledge
in
our
heads
when
we
have
one
in
the
the
sort
of
documentation
phase.
B
So
the
tribal
knowledge
is
pretty
rife,
and
this
this
table
has
a
a
a
column
called
active
status
which,
over
the
years
has
has
baffled.
Many
people
in
the
business,
including
this
guy,
who
said
active
status,
could
just
about
mean
anything.
So
documentation
is
pretty
important
for
our
for
our
users.
B
What did we try to do? Well, both DataHub and Amundsen have local setups that use Docker, and we tried to go from zero knowledge of these products to getting as much production data into them as we could, inside two days.
B
So I'll just change the screen; I'm only going to show the demo part for DataHub. Obviously this is what we managed to do, and then I'll slip back in afterwards. I'll stop now. I might need to reshare my screen, actually, two seconds.
B
Oh no, that's a nightmare. No worries. Could you refresh it?
C
DataHub didn't have any Glue support, so we spent the last two days figuring out how we could ingest data from Glue into DataHub, and we managed to do it, so that's good. If you look in Datasets, we have this broad view: we have Glue here, and this then goes down to a database level.
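To make the Glue-to-DataHub ingestion the team describes more concrete, here is a minimal sketch (not Depop's actual code) of listing tables from AWS Glue and turning each one into a DataHub dataset identifier. The URN layout follows DataHub's documented `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` convention; `fetch_glue_tables` stands in for a real boto3 Glue client call.

```python
def glue_table_to_urn(database: str, table: str, env: str = "PROD") -> str:
    # DataHub identifies a dataset by platform, qualified name, and environment.
    name = f"{database}.{table}"
    return f"urn:li:dataset:(urn:li:dataPlatform:glue,{name},{env})"


def fetch_glue_tables(glue_client, database: str):
    # With boto3 this would be glue_client.get_tables(DatabaseName=database),
    # paginating via the returned NextToken; stubbed here for illustration.
    response = glue_client.get_tables(DatabaseName=database)
    return [t["Name"] for t in response["TableList"]]


if __name__ == "__main__":
    print(glue_table_to_urn("daily_compacted", "product_create"))
```

Each Glue database then becomes a browsable container in DataHub's Datasets view, with the tables it holds nested underneath, which matches the database-level drill-down shown in the demo.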
C
So,
for
example,
if
we
just
click
in
random
one
like
daily
compacted
here's
all
of
the
tables
that
are
in
there,
if
you
search
for
product,
create
and
then
go
to
the
one
in
compacted
so
yeah.
This
has
the
search
as
well.
So,
for
example,
you
can
have
see
we
added
a
description
here.
Most
of
our
data
isn't
really
well
documented,
doesn't
have
descriptions,
but
in
the
schemas,
for
example,
in
glue
you
can
have
descriptions
for
each
field.
I
know
there's
often
confusion
about
like
what
a
user
id
actually
is.
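The per-field descriptions mentioned here come straight from Glue's column metadata. As a hedged sketch, this is roughly how a Glue column definition (the `Name`/`Type`/`Comment` shape returned by boto3's `get_table`) maps onto the name, type, and description shown in DataHub's schema view; the output dict shape is illustrative rather than DataHub's exact aspect model.

```python
def glue_columns_to_schema_fields(columns):
    fields = []
    for col in columns:
        fields.append({
            "fieldPath": col["Name"],
            "nativeDataType": col["Type"],
            # Glue's optional Comment becomes the field description, which is
            # where per-field docs (e.g. whether user_id means the seller)
            # would surface in the DataHub UI.
            "description": col.get("Comment", ""),
        })
    return fields


if __name__ == "__main__":
    glue_cols = [
        {"Name": "user_id", "Type": "bigint", "Comment": "id of the seller"},
        {"Name": "created_at", "Type": "timestamp"},
    ]
    for field in glue_columns_to_schema_fields(glue_cols):
        print(field)
```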
C
Is
it
the
seller?
So
all
that
can
be
documented
got
the
name
of
each
field,
the
type
on
the
left,
the
descriptions.
So
that's
the
schemas
there's
also
ownership
that
we
could
pull
out.
All
of
ours
are
apparently
owned
by
owner.
So
that's
not
that
helpful,
but
that
can
be
changed
and
then
properties
has
just
probably
about
the
table
that
get
pulled
out
so
just
extra
information
in
there.
And then you have the option to add documents, but we didn't have any for those. I think we're going to open a PR in the DataHub repo for the Glue support, and that's pretty much it. I don't know if I missed anything.
D
Yeah, sure. So basically our team, Rob, Abby, and myself, worked on integrating the Redshift tables and Looker. Similar to the Glue schemas, it basically pulled the tables from Redshift, and again it has the schema: the types and the names of the fields.
D
If I search for a keyword here, I'm able to see all the entities that have this particular tag, and I can even do that the opposite way, by going to the tag and then looking up everything that has this tag.
Similarly, if I look for another keyword here: in this case, for example, it appears in the table name, but in this other case there is a match for a column.
D
So
it's
really
interesting
to
see
that
the
search
is
very
inclusive
for
the
looker
implementation.
So
there's
another
area
here
that
we've
been
able
to
integrate
a
particular
dashboard
here
with
a
description
to
scroll
in
I'm
able
to
see
the
obviously
tags
and
owners
and
everything
can
be
added,
I'm
able
to
see
the
actually
the
actual
looks
that
are
part
of
this
dashboard.
D
In this case we just provided a few examples, but if I scroll to one of them, I'm able first of all to see tags, and then the actual tables, so the data source for this particular Look in Looker, and obviously I can scroll and see that information. I don't want to leave the app, but there's a direct link to the Look, which is really nice. I think I've covered most of it here.
D
If
you
check
confirm
sign
ups,
you
can
see
documentation
and
lineage
yeah.
Yes,
so
yeah.
This
is
a
ratchet
table
and
any
documentation.
Let's
say
it's
an
etl-based
store.
You
know
any
logic
that
is
part
of
that
creation
of
the
table.
We're
able
to
see
that
and
each
entity
here,
you're
able
to
see
the
upstream
dash
independency.
D
So
that's
very
useful
for
later
lineage.
B
Cool, so that's the majority of our demo. There are some FAQs afterwards, but we were presenting to the business, so I won't show you those. So, what we achieved during the hackathon: we ingested all of our production data into the local instances, so that was Redshift, Glue, and Kafka; they all came into our local instance of DataHub. We also linked the Looker charts in.
B
We used metadata change events to create lineage, tags, documentation, and owners, and we created a pull request that got merged, which was pretty nice.
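As a rough illustration of the metadata change events (MCEs) mentioned here, this sketch builds the JSON shape of an MCE carrying an upstream-lineage aspect, the mechanism the team describes using to create lineage. The fully qualified key names follow the MCE format DataHub's Kafka-based ingestion consumed at the time; treat the exact keys and example URNs as illustrative assumptions.

```python
import json
import time


def lineage_mce(downstream_urn: str, upstream_urns: list) -> dict:
    # Every upstream edge carries an audit stamp saying who asserted it and when.
    stamp = {"time": int(time.time() * 1000), "actor": "urn:li:corpuser:datahub"}
    upstreams = [
        {"auditStamp": stamp, "dataset": urn, "type": "TRANSFORMED"}
        for urn in upstream_urns
    ]
    # An MCE proposes a snapshot of one entity with a list of aspects;
    # here the single aspect is the UpstreamLineage of the downstream dataset.
    return {
        "proposedSnapshot": {
            "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot": {
                "urn": downstream_urn,
                "aspects": [
                    {
                        "com.linkedin.pegasus2avro.dataset.UpstreamLineage": {
                            "upstreams": upstreams
                        }
                    }
                ],
            }
        }
    }


if __name__ == "__main__":
    mce = lineage_mce(
        "urn:li:dataset:(urn:li:dataPlatform:redshift,analytics.confirm_sign_ups,PROD)",
        ["urn:li:dataset:(urn:li:dataPlatform:kafka,user.signed_up,PROD)"],
    )
    print(json.dumps(mce, indent=2))
```

Emitting one such event per table is enough for DataHub to render the upstream and downstream dependency view shown in the demo.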
So I think the most important thing for us, and probably the best advice I could give people here who haven't decided yet, is why we actually picked DataHub.
B
Most of the problems we had with Amundsen were around the lack of Kafka support, and when we tried to integrate that with DataHub, it just worked straight away. And as you can see, we added the Glue integration, which was really easy; the process for adding a new ingestion type was super easy and very straightforward. The docs were set up nicely, and, as I think Pedro said earlier, the support from the team was just immense.
B
It was amazing: we were messaging at all times and we were getting responses, and pushing that PR was really trivial; thanks to Klarna for helping us out there as well. The data lineage aspect is really important for us, because we have several layers of transformations at a business level, and Amundsen didn't really support that very well.
B
Looker
was
a
work
in
progress,
and
I
know
you
said
it's
in
the
in
the
contra
folder
at
the
minute,
but
we're
really
excited
to
see
that,
and
just
all
of
the
other
bits
that
you've
seen
already.
It
was
just
super
good
and
we
had
a
really
good
time
doing
it
and
contributing
back
and
we're
looking
forward
to
integrating
into
our
production
stack
in
in
the
next
couple
of
months,
so
yeah.
Thank
you
very
much
for
your
help.
B
I'm really pleased to be working with you; it's been an absolute pleasure. And shout out to the team. I think Maria is here. Hey Maria! Thank you very much.
A
Thanks, John. We really enjoyed all the energy that the Depop team brought into the project, so keep that coming. Awesome. So now that we have just a few minutes left, I wanted to do one of the things that we had promised we would do for the community.