From YouTube: Humans of DataHub - Harvey Li
Description
Elizabeth Cohen and Maggie Hays sat down with Harvey Li, Senior Data Engineer at Grab. Harvey shares how DataHub is the secret weapon that's driving Grab's adoption of Data Mesh principles, his love for the DataHub Community, and MORE!
Learn more about DataHub: https://datahubproject.io
Join us on Slack: http://slack.datahubproject.io
Follow us on Twitter: https://twitter.com/datahubproject
A
All right, well, welcome, folks, to another round of Humans of DataHub. Today we are joined by Harvey Li from Grab. Harvey, please introduce yourself: tell us where you work, what you do with your team, and just a little bit about who you are.
B
All right, thanks, Maggie. And by the way, I love this DataHub community; it's so vibrant, welcoming, and lively. My name is Harvey and I work at Grab. Grab is Southeast Asia's leading super app, and we use data and technology to improve everything from transportation to payments and logistics across a region of over 620 million people. We offer services like ride-hailing, food delivery, e-payments, last-mile logistics, and more. As for myself, I work in the data engineering team that builds data applications, from query platforms to governance tooling, to serve the entire data lake ecosystem at Grab.
A
B
You know that metadata management is a problem that any data-driven organization needs to tackle at some point in time, and fortunately for us, we actually started pretty early. We introduced a third-party proprietary data catalog over three years ago simply to reap the benefits of data discovery, that is, to break down data silos and make data easily discoverable by anyone who needs it.
B
However, with the incredible growth of our data scale, more and more use cases surfaced, and we saw an increasing need for a metadata management platform that not only provides off-the-shelf features but, more importantly, offers us the building blocks to continue to tailor it for our use cases. Because of that, last year we explored a few open source data catalog solutions, and we found that DataHub, with its extensible architecture, is best suited for our needs.
B
A
B
B
B
B
B
So, although data governance is not just a tooling problem, we see tooling as an enabler for us to implement it in an organization. For data governance, we actually take a tech-first mindset rather than introducing new processes or adding more overhead for the data users.
B
We want to develop tools and platforms that basically govern their assets and make sure that they access the data in a well-governed manner. For that, we use glossary terms a lot for better classification and to define data access rules, so that it enables some outbound integrations. DataHub actually offers a lot of possibilities for how systems can integrate with it: through the OpenAPI, through the GraphQL API, or, more recently, through the new Actions Framework, which is amazing, although we haven't tried it yet.
A
Absolutely. Oh, that is so exciting. John from our team is going to be over the moon to hear that you guys are looking into the Actions Framework. There's just an endless amount of potential with that, and we're so excited to see what the community ends up doing. So, thinking about... I know that at the beginning you said that you love the DataHub community. Obviously we're biased.
A
B
The community is very helpful. For me, when I started to use DataHub, to be frank, my first impression of DataHub was that it was so complex. Yeah.
C
B
Sometimes, when I was stuck, I knew that there was the community behind my back that was able to offer some advice. For that, I want to give a quick shout-out to Gabe, to Joan, and to [inaudible] for helping us, and helping my organization as well, to get DataHub to its current state. So that is exciting. Maybe let me also share a story about how we got the idea of developing this Hive plugin.
B
At that time we saw a performance issue when ingesting the data for our data lake, simply because there are so many tables. So we raised that concern in the DataHub Slack workspace, and we also shared some of our proposals on how we could optimize it. Then, fortunately, one community member saw that thread and said, "Hey, we also encountered a similar performance issue in another open source catalog, and we did this and that to improve it." We thought that was a great idea.
B
That's why we're actually developing a new plugin to support this and make it more performant than the original one. That's amazing. Yeah, so I believe there are many stories like that happening in this DataHub workspace every day. That is what makes me love this community so much. It is very intellectually stimulating, and there are data practitioners from across the globe in this community.
B
C
A
A
My gosh, it's such a joy. I likewise find that I'm constantly learning, and also, frankly, overwhelmed by the volume of meaningful and really intellectually stimulating conversations that happen within the community. There's part of me that wishes I could clone myself, so I could pay attention only to those and just soak up all of the information. It's so great.
A
B
A
Yeah, that's something we try. The volume of questions is growing, so we're trying to keep up with it. But we really want to make sure that, regardless of how big the community grows, everyone has that same experience when they come in: for the questions that they ask, they have a whole gang of support behind them, so that they're not spinning their wheels for too long.
A
B
Frankly speaking, in the past my favorite channel was actually troubleshoot, because it's been my savior every time I was stuck. Usually, when I got stuck, I asked the question there, and almost by the next day I got some advice in the thread. So it's truly helpful, and I am very grateful for that. Now, as I'm getting better at troubleshooting myself, I actually ask fewer questions there, and occasionally I also try to look through that channel and see...
B
There are a lot of juicy and exciting updates in this particular channel. Because I'm based in Singapore, the time difference means I cannot attend the town hall live. So usually I tune into this channel, get the recording the next morning, and watch it during my breakfast.
A
That's amazing. Oh my gosh, yeah, the announcements channel is a lot of fun. Every time I post in there, I'm genuinely excited to share information with people. It's so fun. So, thinking about what DataHub has already enabled within your organization, and, with announcements, you're getting up to speed with all of the new features that are coming out, and our roadmap announcements and stuff...
A
B
C
B
A product by itself. And similar to products on e-commerce websites, there are ratings and reviews with upvotes and downvotes. Totally. And ratings and reviews can also become an additional dimension of data quality. Ratings and reviews also establish a healthy feedback loop between the data producers and data users.
B
So, in short, I would imagine DataHub to be a one-stop marketplace for data. Everyone in the organization who needs data can go to DataHub and search for it, without relying on tribal knowledge, asking around in Slack, or checking some documentation somewhere. That was the old way.
B
A
I love that. We're actually in Q2, so right now we're starting some discovery around what collaboration looks like within DataHub. Is it as simple as upvoting or giving a rating? What do those workflows need to look like? So I will very likely be reaching out to you more, Harvey, for more of your feedback about what success would look like there.
A
That's awesome. Thinking about DataHub's current features and use cases, what is your favorite feature or use case that you've come across so far?
B
Okay, I'm going to name a lot of features, because all of these features are actually essential to getting us to where we are today: data containers, policy-based access control, lineage, etc. There are also some features that we have just started to try but where we see a lot of exciting possibilities, for example, the data profiling and validation feature with the Great Expectations integration.
B
So as a platform team at Grab, we provide tooling to cover every dimension of data quality, from data freshness to completeness to correctness, and this particular feature helped us enhance our existing observability for data quality as a whole for the data lake. So it's really helpful. Speaking from an engineering perspective, I also really like how DataHub manages to break things down into standard and simple abstractions.
B
B
The majority of them are actually contributed by the community. Yes. And the Actions Framework is also one example of such: it abstracts away all the complexities and allows developers to just develop new action plugins to support a myriad of new use cases, for example, to enable Slack notifications if the technical schema of some key dataset has changed. So a lot of use cases are also unlocked thanks to the Actions Framework.
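Editor's sketch (not from the interview): the schema-change notification use case mentioned above could look roughly like the following. This is plain Python and deliberately does not use the DataHub Actions Framework API; the function names, the dict-based schema representation, and the `key_datasets` parameter are all hypothetical, chosen only to illustrate the idea of diffing two schema versions and producing a message that a notifier (such as a Slack integration) could send.

```python
from dataclasses import dataclass


@dataclass
class SchemaChange:
    """Difference between two versions of a dataset's technical schema."""
    added: dict     # field name -> type, present only in the new schema
    removed: dict   # field name -> type, present only in the old schema
    retyped: dict   # field name -> (old type, new type)

    def is_empty(self) -> bool:
        return not (self.added or self.removed or self.retyped)


def diff_schema(old: dict, new: dict) -> SchemaChange:
    """Compare two {field name: type} mappings (a hypothetical schema format)."""
    added = {f: t for f, t in new.items() if f not in old}
    removed = {f: t for f, t in old.items() if f not in new}
    retyped = {f: (old[f], new[f]) for f in old if f in new and old[f] != new[f]}
    return SchemaChange(added, removed, retyped)


def build_notification(dataset: str, old: dict, new: dict, key_datasets: set):
    """Return a message if a key dataset's schema changed, otherwise None."""
    if dataset not in key_datasets:
        return None  # only key datasets trigger a notification
    change = diff_schema(old, new)
    if change.is_empty():
        return None  # nothing changed, nothing to send
    lines = [f"Schema changed for {dataset}:"]
    lines += [f"  + {f}: {t}" for f, t in sorted(change.added.items())]
    lines += [f"  - {f}: {t}" for f, t in sorted(change.removed.items())]
    lines += [f"  ~ {f}: {a} -> {b}" for f, (a, b) in sorted(change.retyped.items())]
    return "\n".join(lines)
```

In a real deployment, the returned message would be handed to whatever notification mechanism the organization uses; that delivery step is intentionally left out here.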
A
Oh, my goodness. Okay, one last question for you, Harvey. I know that the Slack community has been just a great resource for you, but I'm curious.
A
If you were to meet someone who had maybe just joined the DataHub community, or is interested in learning more about the implementation or deployment of it, what advice would you give them? I think...
B
Metadata management is an evergreen problem, but if you see it as a big problem, you'll never get started. Fortunately, with open source solutions like DataHub, we no longer need to start from zero to manage metadata. So my advice is to identify the key use cases in your organization first, start small, and try it out. The DataHub documentation is already great, but please also join the DataHub Slack workspace; there are tons of useful resources there. And don't be afraid.
A
A
Thank you so much. Well, Harvey, this has just been a total joy to chat with you today. Thank you so much for taking the time. We really appreciate you, and we'll see you on the internet.
Thank you.