From YouTube: DataHub at Viasat: DataHub Community Talk - Jan 15 2021
Description
Anna Kepler at Viasat describes why they chose DataHub over other open source and commercial technologies and their plans with it.
A: Anna will walk us through her journey with DataHub at Viasat. Anna, would you like to take over the screen share?
B: Great. Well, first of all, thank you so much for having me. It's been a real pleasure to pick this tool and work with the community to get it into production. We deployed DataHub in production just a couple of months ago, and it's been working really well, so we're excited to share how we're using it, what our future plans are, and how we arrived at the selection of this tool.
B: My role at Viasat is technical product manager for our analytics platform. I've been working with data for quite a while; it's always been a passion of mine, and metadata is definitely part of that. Viasat as a whole is a satellite communications company, an ISP.
B: We provide internet globally to communities: residential, commercial, aviation, and a variety of other services.

B: The state of our users and data at Viasat is definitely very complex. Our core data platform, which I'm a part of, has a lot of different data sources, microservices, data flows, and data analytics tools. As the core data platform was evolving, in parallel we ended up with a lot of mini data lakes, databases, and data sources all over the place, so it's definitely been a complicated landscape to work with.
B: In addition to that, we have users with a wide range of skills, from data suppliers and preparers to data consumers. Even within the data consumers themselves there are different capabilities and different skills in how they interact with data, from data analysts who work more with reporting tools to data scientists who are ready to really dig in and explore the data sets. So there are quite complex data personas as well, and the company has set a goal to be very data driven.

B: Part of that, of course, is just getting access to data: how do we find all these small mini data silos and expose them to users in an approachable way? The challenges we're trying to resolve aren't really new to many of you, since you're here and part of this community. The siloed knowledge about data is definitely one of the challenges we're trying to address.
B: By introducing DataHub as our data catalog, we're helping our users even find data, removing that tribal knowledge and removing bottlenecks where small analytics teams constantly have to work with different teams to explain what data is available, what the data contains, and how best to work with it.
B: That made for a slow analytics process. Many of our users, for example, complained about even finding what data was available to them: can they access that data, what lives where, and what teams do they work with to get to the data? In just the last couple of months since we introduced the data catalog, the community has already been very excited, and we see a lot of users coming to the catalog, looking around, asking questions, and providing a lot of feedback. So it's definitely been a pleasure to introduce it within the company.
B: In addition to that, one of the interesting challenges we're trying to solve by introducing the data catalog is finding all the siloed data infrastructure and potentially integrating it into the core data platform. That would help the company decrease operational costs, which today are a bit inflated, decrease operational complexity for many teams, and let teams concentrate on data processing and utilization rather than maintaining and securing a lot of separate data systems.
B: Our compliance team is very small, and the company is growing and working with global customers around the world, including in Europe, Brazil, and Africa. As you all know, a lot of countries have introduced different compliance laws and policies, so the company has definitely been looking for solutions.
B: We started by looking at commercial solutions for a data catalog but unfortunately couldn't find anything that would really fit our needs. I was just in a meeting this week with our compliance team about DataHub, sharing everything Shirshanka and I had talked about earlier in the week: the roadmap and all the features that will be introduced. They've been very, very excited to see it and to work with us to solve a lot of their use cases, automate a lot of the manual operations they do today, and really improve the compliance posture within Viasat itself.
B: Our technology evaluation started sometime in June of last year, I think, and we were very excited when DataHub became open source. We'd been following the journey of that product within LinkedIn. We've always been fans of LinkedIn products; we've operated Kafka and a few other systems, so it was a real pleasure to see LinkedIn open-source it as a product. DataHub definitely made the list for evaluation, along with some other systems: we looked at Apache Atlas, Amundsen, and Netflix's Metacat. The first thing we evaluated on was really just general functionality.
B
The
feature
richness
we
were
looking
for
lineage
was
really
important
for
us
ease
of
search
of
this
of
the
data
different
metadata
ingest
methods,
overall,
security
of
a
product
as
well
as
data
modeling
flexibility
was
important
to
us
because,
as
I
pointed
out
earlier,
there
are
definitely
a
variety
of
mini
data
silos
around
the
company
and
we
definitely
anticipated
the
challenge
of
trying
to
model
all
that
data
in
a
very
flexible
way
to
ensure
that
we
onboard
all
the
teams
and
not
limited
in
that
ability
to
onboard
them
right,
and
so
I
use
of
development.
B: What do I mean by that? We like open source products, and we like to contribute to open source products as well, so we evaluated the tech stack behind each of these products to make sure we were capable of submitting PRs, really understanding the code, and maybe even helping with some bug fixes alongside the community. That was one of our evaluation criteria. And then ease of operations.
B: Our team is very small and we operate a variety of different tools and systems, so the easier the process, the better. The stability of the product is really important for us: upgrades, deployment, promotion from development to production, and the ability to integrate and test the tool before promoting it to production. All of these components were evaluated, as well as scalability; we have a lot of data and a lot of different microservices.
B: Additionally, the roadmap. We knew we wouldn't be able to take advantage of all the heavyweight features immediately for our customers, so we wanted to do a slow rollout, adding the various features within Viasat over time. We looked at the DataHub open source roadmap, and the features there aligned really well with what we were trying to do: the introduction of lineage, the addition of ML models, and the introduction of some of the metrics functionality and data quality ratings.
B: The timeline looked great, and just the fact that we were seeing everything we needed on that roadmap was really exciting. Then there was the community rating for the product itself: the GitHub rating and how well the community supports the product. We took a look at that as well. I guess it's no surprise, since I'm here today, that DataHub was the product we selected, and so far, so good. It's been a really, really good journey.
B: However, we implemented our own UI. Not really implemented it per se, but we did have an existing interface for access requests, with some basic search functionality, that already held some of the metadata, so we reused that. Customers were already familiar with that UI, a lot of it was automated (all the access requests were automated), and we didn't want to take that away from our customers. We also didn't want to extend and fork the DataHub UI.
B
So
that's
sort
of
one
of
the
reasons
why
we
went
to
our
own
ui.
We
did
also
added
the
feedback
button
to
our
ui,
to
gather
as
much
of
the
information
for
our
users
as
possible.
B
We
introduced
some
of
the
product
metrics,
so
we
have
the
product
analytics,
that's
being
gathered
from
this
ui
to
really
understand
how
users
are
interacting
with
data,
what
type
of
features
they
want,
as
we
also
introduce
new
features,
to
make
the
experience
as
easy
as
possible
as
and
then
some
way,
flexibility
of
integrations
with
some
of
the
tools
that
we
have.
So
we
wanted
to.
B
Keep
that
option
so
as
an
example,
some
of
the
global
metrics
store
that
we
are
working
on
to
express
that
in
the
same
interface,
some
of
the
potential
visualizations
from
all
the
data
like
sampling
of
the
data
or
some
simple
graphs
within
the
within
our
ui
itself,
and
maybe
even
for
our
compliance
team
introducing
some
type
of
reporting
mechanism
within
the
ui
and
having
it
serve
like
all
integrated
experience.
B
So
so
we
went
with
that
approach
where
ui
is
ours,
but
back
end.
We
try
not
to
fork
it.
We
draw
a
contribute
to
open
source
community
and
we
have
been
doing
that
a
little
bit
already,
which
kind
of
leads
me
to
the
experience
so
the
operations.
I
chatted
with
the
team
this
week
to
really
understand
were
any
issues
and
deployments
as
we
were
doing
it,
and
there
were
minor
things
in
the
beginning
when
we
were
integrating
viral
kafka,
the
secure
kafka-
and,
I
believe
javier
sotelo
on
my
team.
B
B
I
think
some
of
the
other
issues
have
been
contributing
as
well.
That's
been
accepted.
I
think
he
helped,
even
with
some
of
the
code
reviews.
So
that's
been
really
good
to
see
just
how
how
well
the
community,
how
sort
of
responsive
community
has
been
and
how
accepting
so
welcoming.
So
it's
really
been
good
for
us.
B
So,
as
I
mentioned,
it's
been
operating
in
production
really
well
for
a
few
months
now
we
have
kind
of
bypassed
the
full
dev
set
up
and
went
straight
to
production
and
then
did
the
dev
set
up.
So
we
could
iterate
all
the
new
functionality
very
quickly.
So
today
we
operated
in
both
we
did.
B
I
think
we
didn't
do
much
of
a
complex
modeling
today,
but
so
far
we
also
had
great
experience
to
support
in
a
lot
of
the
data
that
we
maintained
in
some
of
our
small
mini
catalog,
and
so
it
was
very
simple
and
we
really
were
excited
to
see
the
rfc
for
the
business
glossary
we've
been
looking
somewhat.
B
Compliance
taxonomy
and
thinking-
maybe
we
could
contribute
to
a
conversation
around
that
as
well,
since
it's
on
our
roadmap
and
we're
excited
to
work
with
the
community
to
see
how
that
could
be
a
joint
conversation
and
so
just
even
presenting
here
today.
B
So
it's
definitely
a
pleasure
and
thank
you
for
having
me
it's
kind
of
what
we
have
today
and
what's
the
future
state
is
so
we
do
have
in
production,
we've
harvested
some
of
a
core
data
platform
and
we
integrated
our
ui
and
right
now
we're
starting
to
work
with
the
rest
of
the
viasat
teams,
and
we
already
got
a
lot
of
interest
from
all
these
teams,
which
was
very
good
to
hear
and
see
because
previous,
I
think
it's
been.
B
We
realized
that
we
are
not
the
only
one
who
up
here
sort
of
operate
the
data
infrastructure
but
needed
this
functionality,
and
the
team
has
been
really
delivering
very
good
products
within
the
company
and
good
service,
and
so
we
have
this
trust
of
our
teams.
But
we
introduced
a
good
tool
and
we
worked
with
them
to
explain
this
is
the
data
hub
features?
This
is
the
evaluation
process.
B
We
shared
the
data
hub
information
with
them
and
all
the
teams
been
really
supportive
of
our
selection
of
the
product.
So
it's
making
this
adoption
definitely
much
easier
for
us
and
so
we're
working
the
rest
of
the
teams
to
ingest
information
about
their
data,
working
with
a
compliance
team
to
see
how
we
could
introduce
features
necessary
for
them.
B
The
lineage
has
been
really
anticipated
within
the
companies
they're
waiting
on
that
and
will
be
probably
around
the
summer,
really
integrating
with
all
our
tools
to
introduce
lineage,
visualizations
and
then
dashboards
and
reports.
We're
really
excited
to
see
the
rfc
and
the
end
implementation
for
dashboards.
B
So
it's
definitely
we've
been
following
that
very
closely,
as
well
as
the
mel
models
we're
starting
to
sort
of
standardize,
a
lot
of
our
ml
approaches
around
the
company,
and
that
will
be
very
good
to
add
into
the
data
data
catalog
and
then
metrics
and
data
quality
integration
would
be
sort
of
following
closed
list.