From YouTube: DataHub adoption journey at Geotab : Feb 19, 2021
Description
John Yoon describes the adoption journey of DataHub at Geotab.
Recorded at: DataHub Community Meeting : Feb 19, 2021
B
So Geotab is a global leader in telematics, with about 2.1 million subscribed vehicles using our products and services. We're one of the very few telematics companies that make both hardware and software. For anyone who's not familiar with the term, telematics means that we use IoT devices and OEM software to collect data from vehicles and provide various products and services to help our customers. Below are some examples of how we help our customers improve their fleet productivity and optimization, enhance driver safety, and achieve stronger compliance with regulatory changes.
B
But it didn't really take too long for Geotab to realize that the commercial route wasn't for us. Although the outputs from these commercial products were fancy and shiny, the actual value-add from using them simply didn't outweigh the direct and indirect costs, like vendor lock-in, limited customizability, implementation and service costs, and licensing fees.
B
I think most community members, judging from the use cases they shared at previous town halls, had a very similar list of open-source products to evaluate. From those we shortlisted Atlas, Amundsen, and DataHub for our evaluation.
B
Functional and non-functional requirements were very important, but one of the key evaluation metrics (I wouldn't say we were unique; I'm sure someone else also looked into it) that made us select DataHub was the approachability and technical capability of the leading dev team.
B
The leading dev team, LinkedIn, as most of us know, has a solid portfolio of open-source projects that they designed and donated to the Apache Foundation, and the DataHub team has been very approachable, responsive, and open during our evaluation phase. For a very small team at Geotab trying to tackle the problem, that mattered to us.
B
So for our first crack at DataHub, we onboarded a small number of datasets, just over 250, and had 60 users from one department try out DataHub. The result was somewhat disappointing: the adoption rate was very poor, and the feedback was discouraging. In the users' eyes, DataHub wasn't any better than how they already searched for datasets in Google BigQuery.
B
For some it was useful, but there weren't enough occasions when they needed to find something on DataHub. So I asked myself: I was told that data discovery was a problem at Geotab, but it turns out the scope of the PoC was poorly established. I made a very naive decision to blindly accept what someone else had said, and took the scope from the Collibra PoC, which was also an unsuccessful PoC.
B
So for the past few months I took it on myself to learn what's really going on behind the scenes. Just to give you some overview of what the data journey was like from 50,000 feet: Geotab grew very fast, about 500% growth in revenue and size in five years. Within that time, Geotab acquired five different companies, which contributed not only to growth in revenue and size, but also to the complexity of our data architecture and our data management and governance structure.
B
Teams aren't so big; they work with relatively small sets of data and have strong tribal knowledge of what data to use, or who to reach out to with questions within their domain. This was one of the key reasons why users from the first PoC didn't have the need to search for the data they needed. And many teams don't have a data management or governance structure, while the ones that do are using different tools and processes to integrate, store, and derive data.
B
So over the past few months I spent most of my time talking to people from other departments to understand where we are in terms of data management, and then made a proposal on what we would need to change from an architectural, integration, security, compliance, operations, and metadata management perspective.
B
So in 2021, one of our goals is to productionize DataHub. We're currently working closely with Shirshanka's team, John and Gabe, to learn more about their React app and assisting them bit by bit in building the React application. Once we're comfortable with the app in the testing environment, we're planning on productionizing DataHub at Geotab internally.
B
DataHub's generalized metadata model allowed us to start conversations with other departments at Geotab to model the custom entities they want to catalog, while capturing meaningful relationships with other DataHub entities. So, basically, we are discussing this, and we'll be treating DataHub as an internal open-source project, so that other departments' dev teams can also contribute internal features and custom entities.
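To make the idea concrete, a custom cataloged entity can be thought of as a URN plus a set of metadata "aspects" and typed relationships to other entities. This is only a conceptual sketch, not DataHub's actual schema language (DataHub defines its model in Pegasus/PDL), and all the names below are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Relationship:
    kind: str          # e.g. "OwnedBy", "ProducesDataset" (illustrative)
    target_urn: str    # URN of the related entity

@dataclass
class CustomEntity:
    urn: str
    aspects: dict = field(default_factory=dict)       # aspect name -> payload
    relationships: list = field(default_factory=list)

    def add_aspect(self, name: str, payload: dict) -> None:
        self.aspects[name] = payload

    def relate(self, kind: str, target_urn: str) -> None:
        self.relationships.append(Relationship(kind, target_urn))

# A hypothetical "telemetry feed" entity linked to a dataset it produces.
feed = CustomEntity(urn="urn:example:telemetryFeed:gps")
feed.add_aspect("ownership", {"owners": ["data-platform@example.com"]})
feed.relate("ProducesDataset", "urn:example:dataset:bigquery.gps_raw")
```

The point of the generalized model is exactly this shape: departments define their own entity types and aspects, while the relationships tie them back into the existing graph of DataHub entities.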
B
We're still very new in our open-source journey, but our plan is to make meaningful contributions to the community as much as possible. We just started to contribute to the open-source React application; we made a couple of contributions over the past couple of weeks, and hopefully the numbers will grow over time. We're not adding too much value at this point, but we're slowly shifting toward an open-source-first mindset: generalizing our use case as much as possible to find opportunities to contribute back to the community while solving our internal problems.
B
So these are some of the wish-list items before I close. I think I mentioned in the Slack channel that hopefully we can have the roadmap timelines updated on the open-source repo. One of the pain points when we were having discussions internally with other departments was that there wasn't really an easy way for us to quickly understand what entities, aspects, and properties are currently available in DataHub.
B
That way we can minimize redundant effort when we create new custom entities. So a metadata-model graph with a graphical visualization, to help community members quickly see what entities, aspects, and properties are available and what the relationships among them are, would be very helpful in my opinion. And column-level lineage is something we've been tackling internally, asking ourselves what would be the most efficient and automated way to first capture the column-level relationships.
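Once captured, column-level relationships can be stored as a simple edge list and resolved transitively. This is a generic sketch of the idea, not Geotab's or DataHub's implementation, and the table and column names are made up:

```python
from collections import defaultdict

# Each edge maps a derived column to one source column it depends on.
edges = [
    ("reports.daily_trips.distance_km", "raw.gps_points.lat"),
    ("reports.daily_trips.distance_km", "raw.gps_points.lon"),
    ("reports.daily_trips.driver_id",   "raw.trips.driver_id"),
]

upstream = defaultdict(set)
for target, source in edges:
    upstream[target].add(source)

def upstream_columns(column: str) -> set:
    """Transitively resolve every source column feeding `column`."""
    seen, stack = set(), [column]
    while stack:
        for src in upstream.get(stack.pop(), ()):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen
```

The hard part in practice is producing those edges automatically (e.g. from SQL parsing); once they exist, surfacing them in a catalog is the easy half.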
B
So when the feature is available in DataHub, we can readily surface it. And a social feature has been one of the hot discussions internally; I know most of the commercial products have this feature. It's not the highest-priority item on our backlog, but I think it would be very valuable for the DataHub community as well. And that's about it.
A
Questions

Cool, that was great, John, thanks for sharing the journey. I can definitely relate to a lot of those challenges and concerns. The one thing that we've had quite a lot of debates about with a lot of teams, especially central teams, is exactly this:
A
Rationalizing whether we only put the clean data in DataHub, meaning the clean metadata, or whether we actually put everything in there, have the clean data rise to the top, and use that as a way to drive data governance. So that's something that's definitely on my mind; it's a big topic of debate in lots of communities as well.
C
If I can just quickly jump in: my team built the data portal at Airbnb, and we went through a similar decision-making process. There's something magical that happens when you have, you know, more than 200 weekly active users of your product: you'll find the right blend of trusted datasets and datasets that people want to be productive with. So I believe it's just about growing usage; the data-set quality questions will settle themselves once you get the experts using the tool.
D
Yeah, we encountered the same challenge here at Amazon with our clients, and what we found works better is to make it the responsibility of the publisher of the dataset to tell whether the data is reliable, et cetera. It also relates to what we see with a lot of customers, and even our internal teams, building a feature store, right? Is this dataset something you can rely on for your reporting or BI?
D
So we push it to the publishers and the subscribers, and we just create, you know, JSON that defines the contract between the publisher and the subscriber about the dataset. So we try to use technology to enforce it, but what I've seen is that you always need a man in the middle, like a data steward or someone from legal, to say: okay, can you actually publish this data, et cetera?
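Such a publisher/subscriber contract could look something like the following. This is a hedged sketch only; the field names are hypothetical, not Amazon's actual schema, and a real contract would carry far more detail (SLAs, PII flags, schema versions, approvers):

```python
import json

# Hypothetical JSON data contract between a publisher and its subscribers.
contract = json.loads("""
{
  "dataset": "sales.daily_orders",
  "publisher": "order-platform-team",
  "subscribers": ["bi-reporting"],
  "reliability": "production",
  "schema": {"order_id": "string", "amount": "float"},
  "steward_approved": true
}
""")

def can_consume(contract: dict, team: str) -> bool:
    """A subscriber may rely on the dataset only if the publisher marks it
    production-grade, a steward has approved it, and the team is listed."""
    return (
        contract["reliability"] == "production"
        and contract["steward_approved"]
        and team in contract["subscribers"]
    )
```

The `steward_approved` field is where the "man in the middle" shows up: technology checks the contract, but a human still has to sign off before the flag flips.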
D
So this is why, and I agree with you about the debate, we said: let's bring in the publishers. They are the owners of the data, so they have the responsibility. I'm happy to share, maybe at the next meetup, some of the architecture and how we solved it in several use cases. And again, like you mentioned Collibra and Alation, we looked at all these third parties with some customers, and we always run into this: there isn't a man-in-the-middle process that can be enforced.