From YouTube: Datahub Community Meeting (Full) : Jan 15 2021
Description
Full version of the DataHub Community Meeting on Jan 15th 2021
Agenda:
Announcements - 2 mins
Community Updates - 10 mins
Use-Case: DataHub at Viasat by Anna Kepler - 15 mins
Tech Deep Dive: GraphQL + React RFCs readout and discussion by John Joyce and Arun Vasudevan - 15 mins
General Q&A from sign up sheet, slack, and participants - 15 mins
Closing remarks - 3 mins
A: Awesome, all right! Welcome everyone to the first DataHub community meeting of the year. It's been a pretty impactful year for everyone, professionally and personally, but we're looking forward to amazing things. It should be 1/15/2021, but I put in 2020; this is where everyone's mind is at, right? We feel like 2020 has not yet gone, but hopefully this year will be better for everyone. The vaccines are around the corner.
A: So, let's get to the agenda. It's pretty packed: quick announcements, some community updates, and we have Anna from Viasat, who's going to be talking about how DataHub has been deployed at Viasat and their experiences.
A: Then John will do a quick readout of the RFCs that he's recently published and also lead a discussion with the community; then we'll go into some general Q&A based on the sign-up questions and questions that might come up during the presentations, and then we'll close out. All right, announcements: a very short announcement. As some of you might know, I have left LinkedIn recently and I'm starting my own venture.
A: On the community side, we've had a few interesting events happen in the metadata community. One thing that we were participating in and leading was an industry-wide Metadata Day event that happened on December 14th last year. I have put up a few slide decks from the conference. So what happened?
A: It was quite interesting, because we got a lot of projects together for the first time. You know, often you go and read an Amundsen blog post and of course Amundsen is the only system mentioned, or you read a DataHub blog post and DataHub is the only system mentioned, or you go look at Collibra and they're the only system mentioned. We wanted to bring all these people together to talk about what the real problems and the real issues are. So it was good to get everyone together.
A: I was actually going over the survey results, and, you know, it's been a busy time; hopefully I will get to publish them. The audience did fill out a quick survey, and it was nice to see that DataHub awareness was quite high. It was actually one of the highest-awareness metadata systems, at least among the audience that attended, so that was nice to see. In terms of quick Cliff's Notes (you should go watch the whole video; there were a lot of great conversations):
A: the Cliff's Notes would be that stream-first architectures are definitely important. People agreed that lineage was an important problem; there was a lot of chat around fine-grained lineage being important. And some of the big thinkers, who have actually lived through this journey many more times than we have across other industries,
A: repeatedly kept warning us about the dangers, or the challenges, in getting higher-level understanding from metadata, and this is definitely something that we are very aware of. So I highly recommend going and checking it out, and hopefully we'll have more of these events and spread the metadata word across the industry.
A: The next thing, from a DataHub-community-specific perspective: I dropped in a poll around Christmas time. I was actually quite surprised to see how many people were still around and voting, so thanks for all the votes; we got quite a few.
A: I did a quick scan over which ones got the most votes and tried to bucket them. As expected, product features are at the top of the list, top of mind for everyone: field-level lineage, showing pipelines and flows, being able to have social features, business glossary, dashboards, and, funnily enough, visualizing the metadata model.
A: So this is just what the polls say. That doesn't mean this is exactly what the roadmap will look like; stay tuned. I think we have to work with the community and make sure that we are able to build these features across all the different companies and projects that are working on DataHub, and make sure that we are able to deliver all of this for the community.
A: And now I would like to switch over to Na, because there's been an incredible amount of work done in the last quarter itself. Na and I were going over all of the RFCs and PRs, and it was just amazing. So I'd like her to go over what has been accomplished in the last quarter and give a brief intro to the work.
C: Sure. Hello, good morning everyone, and welcome again to our monthly town hall; very excited to see the participants. This is Na from LinkedIn; I'm working more closely on the GMA side, the generalized metadata architecture.

C: So I really wanted to take the opportunity here to show our recognition and appreciation for all the contributions to DataHub, and I'll quickly go over a few that I show here in the list. In the first category, you can see that we have this long-awaited feature, which is field-level lineage.
C: This is trying to provide read-after-write consistency in our system; the other two, the secondary index for search and the graph, are eventually consistent. We also see a lot more RFCs coming. One is from, I think his name is Madhu, I'm not exactly sure, but I put his ID there, for the business glossary RFC; that's great. And we also have a recent RFC from John Joyce for GraphQL.
C: Thank you very much for all the contributions; we are looking forward to more coming this year.
A: Absolutely. And one of the things, you know, having been at LinkedIn, I know how hard it is for some of the engineers to keep both the internal DataHub going and make sure that the open-source DataHub continues to stay vibrant, so a lot of appreciation for all of the work you've done.
A: We are also, and thanks Na for some of the meme suggestions that she gave me last night, starting to give some shout-outs to specific people in the community, and this is the first edition of what I would call the DataHub awards.
A: This is for someone who has provided us something that we weren't anticipating, or something that wasn't directly on our roadmap. You know, we are often busy with our blinders on, saying this is what we want to build, and we keep working toward it, and then suddenly something comes in and says, well, how about this? And so this time the award goes to, no surprise, Saxo Bank, for contributing not just the business glossary, which Na called out, but also Kubernetes support.
A: They have actually been amazing partners, driving a lot of these advanced things that we don't directly think about. There's a FIBO financial glossary that they contributed; it was not something LinkedIn was focusing on. So thanks for all of the contributions. I don't know if anyone from Saxo Bank is on the call and would like to say a couple of words.
D: Yeah, hi. Thank you so much for calling out Saxo Bank. I think it has been an amazing partnership. In the last eight, nine months we got a lot of support from LinkedIn, and, you know, definitely, this is the third-generation architecture.
D: We didn't realize what we were picking up when we picked it up, but I think the entire bank, including the governance committee and the business stakeholders, is very, very convinced of our selection of the tool, and it has been made easier by the partnership that we've got from LinkedIn. And our vision in selecting it is now proving true, right: we're evolving the culture with the tool by contributing to the community.
A: We have a newest deployment in the community, at least that we are aware of. This is the latest company that has been brave enough to push the red button and deploy DataHub to production, and the winner is Shivam Gupta at Grofers. I checked with him last night: hey, did you deploy DataHub to production? And he said yes. And I said, is it still up? He said yes. So hopefully he is around to say how that experiment went.
A: He did say that they are having a midnight sale today in India, kind of like the Prime Day sale, and he's on call, so he may not be able to attend. Well, next time we will catch him. Moving on, all right: this is the perseverance award. I don't think we want to give it too often, and I'll tell you why. This is basically for the community member who has spent the most amount of time pushing something over the line, and this time the award goes to Arun at Expedia,
A: for being so persistent in pushing the ML models PR. He started in July, and then Jyoti and Karam, and I think Mars also, were working with him very closely, helping him make it better and better, and it took all the way to September for us to get it to a point where we were able to check it in. We would like to shorten that cycle, but we would love more people like this, who are able to actually do the right thing with us.
E: Yeah, thanks Shirshanka. With the ML models PR, I learned quite a bit from the folks who reviewed it. There were quite a few changes from what we thought internally, so it's been a great process. We have been with DataHub for about eight, nine months, and it's been a great journey. Thank you.
A: Awesome, moving on. This is another award that I came up with: the tech excellence award. You know, at LinkedIn we have three vectors of excellence: we talk about leadership, we talk about execution, and we also talk about craftsmanship. And sometimes we talk about
A: two out of three, and sometimes it's only one out of three. But this time, as I was looking at the last couple of months and at the output from the team, I had to say that the tech excellence award for impact should go to John Joyce, for the amount of impact he has had in such a short amount of time: ramping up on DataHub, producing high-quality RFCs, and also backing them up with POCs on both GraphQL and React.
F: Yeah, thanks Shirshanka. I won't spend too much time here, because you'll hear from me in a little while, but I'm glad to be part of the community and glad to share some of the work I've been doing recently.
A: Awesome. Okay, so now we can switch over to the tech part of the presentations. First up, we'd like to have Anna walk us through her journey with DataHub at Viasat. Anna, would you like to take over the screen share?
B: Great. Well, first of all, thank you so much for having me. It's been a real pleasure to pick this tool and work with the community to get it into production. We deployed DataHub in production just a couple of months ago; it's been working really well, and so we're excited to share how we are using it, what our future plans are, and how we arrived at the selection of this tool.
B: My role at Viasat is technical product manager of our analytics platform. I've been working with data for quite a bit; it's always been a passion of mine, and metadata is definitely part of that as well. Viasat as a whole is a satellite communications company: we're an ISP, and we provide internet globally, to communities, residential customers, commercial aviation, and a variety of other services. And so at Viasat, the state of our users and data is definitely very complex.
B: Our core data platform, which I'm a part of, has a lot of different data sources and microservices, data flows, and data analytics tools. But as the core data platform was evolving, in parallel we ended up with a lot of mini data lakes, databases, and data sources all over the place, so it's definitely been a complicated landscape to work with.
B: In addition to that, we have a variety of differently skilled users, from data suppliers and preparers onward; and even within the data consumers themselves there are different capabilities, different skills that users have when they interact with data, from the data analysts who work more with reporting tools to the data scientists who are ready to really dig in and explore the data sets. So there are definitely quite complex data personas as well, and the company has set the goal to really be very data-driven. Part of that, of course, is just really even getting access to data: how do we find all these small mini data silos and expose them to the users in a very approachable way? And so, the challenges that we are trying to resolve:
B: I don't think they are really new to many of you, since you are here and part of this community. The siloed knowledge about data is definitely one of the challenges we're trying to address by introducing DataHub and the data catalog: helping our users to even find data, to remove this tribal knowledge, to remove a bottleneck.
B: Our small analytics team has to constantly work with different teams to explain to them all the data that's available, what the data contains, and how best to operate on the data. So the analytics process is slow; many of our users, for example, complain about even finding what data is available to them. Can they even access the data? What's where, and what teams do they work with to get into the data? And even in the last couple of months, as we introduced the data catalog,
B: the community has already been very excited, and we see a lot of users coming to the data catalog, looking around, asking questions, and providing a lot of feedback. So it's definitely been a pleasure to introduce that within the company.
B: In addition to that, one of the interesting challenges we're trying to solve by introducing the data catalog is finding all the siloed data infrastructure and potentially integrating it into a core data platform, to help the company decrease operational costs that today are a bit inflated, to decrease operational complexity for many teams, and to really concentrate on data processing and utilization of data rather than on maintaining and securing a lot of data systems.
B: Our compliance team is very small, and as the company grows and works with global customers around the world, including Europe, Brazil, and Africa, a lot of countries, as you all know, have introduced different compliance laws and policies. So the company has definitely been looking for solutions. We had started looking at commercial solutions for a data catalog but unfortunately couldn't find anything that would really fit our needs. I was just in a meeting this week with our compliance team about DataHub, and I was sharing everything that Shirshanka and I talked about earlier this week, all the roadmap and all the features that will be introduced, and they've been very, very excited to see it and to work with us to solve a lot of their use cases, and a lot of the manual operations that they do today, and really improve the compliance posture within Viasat itself.
B: Our technology evaluation started sometime in June last year, I think. We were very excited when DataHub became open source; we'd been following the journey of that product within LinkedIn. We've always been fans of LinkedIn products; we operated Kafka and a few other systems, so it's always been a pleasure to see what LinkedIn open-sources as a product. So DataHub definitely made the list for evaluation, and among the other systems we looked at was Apache Atlas.
B: The feature richness we were looking for: lineage was really important for us, along with ease of search over the data, different metadata ingest methods, overall security of the product, and data modeling flexibility. That last one was important to us because, as I pointed out earlier, there are definitely a variety of mini data silos around the company, and we anticipated the challenge of trying to model all the data in a very flexible way, to ensure that we could onboard all the teams and not be limited in our ability to onboard them. And then ease of development.
B: What do I mean by this? We like open-source products, and we like to contribute to open-source products as well, so we did evaluate what tech stack is behind each of these products, to ensure that we are capable of submitting PRs, really understanding the code, and maybe even helping with some bug fixes with the community. That was one of our evaluation criteria. And then ease of operations:
B: our team is very small and we operate a variety of different tools and systems, so the easier the process is, the better. The stability of the product is really important for us: the upgrades, deployment and promotion from development to production, and the ability to integrate and test the tools before promoting to production. All of these components were definitely evaluated, as well as scalability. We have a lot of data and a lot of different microservices.
B: So when we start talking about lineage and the ability to really capture a lot of the different events happening with the data, we wanted to make sure that scale would not be a problem for the new tool that we eventually selected. And the roadmap: we knew that we wouldn't be able to take advantage of all the heavyweight features immediately for our customers.
B: So we wanted to do a slow rollout, a gradual addition of the various features within Viasat, and so we looked at the DataHub open-source roadmap, and multiple features aligned really well with what we were trying to do: introducing the lineage, the ML models, and some of the metrics functionality and data quality ratings.
B: The timeline looked great, and just the fact that we were seeing everything we needed on that roadmap was really exciting. And then the community rating: for the product itself, the GitHub rating, and just how well the community supports the product; we took a look at that as well. And I guess it's no surprise, since I'm here today, that DataHub was the product we selected, and so far so good; it's been a really, really good journey.
B: However, we implemented our own UI. Not really implemented it per se, but we did have an existing interface for access requests, with some basic search functionality, that held some of the metadata already, and so we reused that, because customers were already familiar with that UI and all the access requests in it were automated, and we didn't want to remove that from our customers. We also didn't want to extend and fork the DataHub UI.
B: So that's one of the reasons why we went with our own UI. We also added a feedback button to our UI, to gather as much information from our users as possible.
B: We introduced some product metrics, so we have product analytics being gathered from this UI, to really understand how users are interacting with the data and what types of features they want as we introduce new features, to make the experience as easy as possible. And then there's the flexibility of integrating with some of the tools that we have; we wanted to keep that option.
B: As an example: the global metrics store that we are working on, to express that in the same interface; some potential visualizations of the data, like sampling of the data or some simple graphs, within our UI itself; and maybe even, for our compliance team, introducing some type of reporting mechanism within the UI, and having it serve as one integrated experience.
B: I chatted with the team this week to really understand whether there were any issues in the deployments as we were doing them, and there were minor things in the beginning when we were integrating with our Kafka, the secure Kafka. I believe Javier Sotelo on my team contributed a small code change exposing some of the Kafka parameters at the time, and we were really happy to see that it was accepted and merged, I think within days, so we were able to quickly proceed with our deployments.
B: I think some of the other fixes he has contributed have been accepted as well; I think he even helped with some of the code reviews. So it's been really good to see just how responsive and welcoming the community has been. It's really been good for us.
B: So, as I mentioned, it's been operating in production really well for a few months now. We kind of bypassed the full dev setup and went straight to production, and then did the dev setup afterward, so we could iterate on all the new functionality very quickly. So today we operate both.
B: I think we haven't done much complex modeling to date, but so far we've also had a great experience getting support for a lot of the data that we maintained in some of our small mini catalogs, and it was very simple. And we were really excited to see the RFC for the business glossary; we've been looking at a compliance taxonomy and thinking
B: maybe we could contribute to a conversation around that as well, since it's on our roadmap, and we're excited to work with the community to see how that could be a joint conversation. Even just presenting here today is definitely a pleasure; thank you for having me. So that's kind of what we have today, and the future state: we do have DataHub in production, we've harvested some of our core data platform, and we integrated our UI. Right now we're starting to work with the rest of the Viasat teams, and we've already got a lot of interest from all these teams, which was very good to hear and see, because we realized that we're not the only ones who operate data infrastructure and needed this functionality. And the team has been delivering very good products and good service within the company, and so we have the trust of our teams.
B: We shared the DataHub information with them, and all the teams have been really supportive of our selection of the product, so that's making this adoption much easier for us. We're working with the rest of the teams to ingest information about their data, and working with the compliance team to see how we could introduce the features necessary for them.
B: The lineage has been really anticipated within the company; they're waiting on that, and it will probably be around the summer that we're really integrating with all our tools to introduce lineage visualizations. And then dashboards and reports: we're really excited to see the RFC and the backend implementation for dashboards.
B: We've definitely been following that very closely, as well as the ML models. We're starting to standardize a lot of our ML approaches around the company, and those will be very good to add into the data catalog. And then metrics and data quality integration would follow close behind.
A: Awesome, thanks Anna. For people who might have questions for Anna, let's hold off and have them at the end, just because we have quite a lot to get through and only 20 more minutes left. So I'm going to switch over to our second talk for the day, in which John and Arun are going to talk about GraphQL and React.
F: Yeah, sure, thanks Shirshanka; thanks Anna, that was great to hear how you guys are using it at the company. So, as many of you may know, I'm kind of a recent community member; I joined about a month ago. Previously I was working at LinkedIn on a federated GraphQL layer, and more recently I have joined Shirshanka in his venture. As part of booting up with DataHub, I set myself a goal to try to do some work on the front end.
F: Specifically, I was looking to extend the front end to add something like the dashboards or charts that have been added in the back end, and quickly that initial goal morphed into a different goal, which was to see how quickly I could spin up a parallel stack on the front end, specifically choosing React and GraphQL. And what I want to share with you guys today is what I built and why I built it.
F: So there are kind of two parts to what I've been working on. The first is actually adding a GraphQL endpoint into the existing datahub-frontend server, which is a Play server. The reason I chose to add it there was, first of all, that it was just the clearest pathway: we already had the server, and we could just add a new endpoint there that supported the GraphQL spec.
F: I didn't have to spin anything else up, but I also figured that the existing Ember app may benefit from being able to talk to a GraphQL server in the future, so I decided to place it there. Basically, what I've done is just add a little module to datahub-frontend. The second part is introducing a new React client that talks to this GraphQL endpoint to create the views on the front end. So what we essentially have here is a parallel UI stack.
F: What I'm hoping to get out of this is to actually be able to incubate both of these pieces with the help of the community over time, and to iterate on them with the community collaboratively.
F: So first I'll talk about why I thought it was a good idea to add a new application.
F: Specifically, you know, I had come to the project without much experience with Ember, and I personally faced the steep learning curve of Ember trying to get up to speed on the existing DataHub client. So I think that, technology-wise, it would make sense to add a React app, mainly because it can extend the reach and accessibility of the DataHub front end; as you guys probably know, React is quite a bit more popular and much more familiar.
F: On these aspects specifically: making sure that the levers for customization and extension by individual organizations are built into the front end from the beginning; getting a chance to clean up some of the dead code that may not be used today; and then, finally, making sure that we have very clear and documented paths to doing things like extending the front end.
F: You know, in order to understand what the endpoints from the Play server were returning, which were just JSON globs, I'd have to probe those endpoints directly. Some of them had these view models that are specific to the client, like DatasetView, and some of them were just GMS models that were passed through, like the CorpUser PDL.
F: So I found that it really helped my own personal iteration speed to take the time at the beginning to establish an explicit contract, and I think GraphQL is great for this, because it provides these self-documenting, strongly typed, and validated contracts. It provides a dependency, this intermediate API layer that the client can depend on, which also makes it very easy to switch the implementing server technology in a way that's opaque to the end client.
F: The next thing is, I think GraphQL in itself reduces the API calls, as well as the noise, by virtue of being able to ask for exactly what you want, and being able to traverse related entities with fewer API calls. So there's just a reduction in this back and forth.
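As a hypothetical illustration of that reduction (the query fields and type names below are assumptions for the sketch, not DataHub's actual GraphQL schema), a single query document can name exactly the fields the client wants and traverse from a dataset to its owners in one round trip, where a REST-style client typically pays one call per related entity:

```typescript
// A hypothetical query: the client asks for exactly the fields it
// wants, and traverses from the dataset to its owners in one request.
const DATASET_WITH_OWNERS = `
  query DatasetWithOwners($urn: String!) {
    dataset(urn: $urn) {
      name
      description
      ownership {
        owners { username fullName }
      }
    }
  }
`;

// REST-style access needs multiple round trips: one for the dataset,
// then one per related entity the client wants to expand.
function restCallCount(relatedEntities: number): number {
  return 1 + relatedEntities; // dataset + each traversal
}

// With GraphQL the traversal lives in the query document itself,
// so the round-trip count stays constant.
function graphqlCallCount(_relatedEntities: number): number {
  return 1;
}

console.log(restCallCount(2));    // 3
console.log(graphqlCallCount(2)); // 1
```

The same mechanism is what removes the "noise": fields the client never asked for are simply not serialized into the response.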
F: So if you guys want to see how I did it, I have a few RFCs open: one on GraphQL queries and one on mutations, where there are still some discussions ongoing, which I'll get to in a bit. And then I have a new proposal to incubate this parallel React app, as well as a proof of concept for all of those things. With that, I'm going to just quickly jump into a demo of the React stuff.
F: This is what I've been working on. Okay, let me share my screen; let's see. Okay, yeah. So right now this is very mocked: we are working against GraphQL, but we're using a mock GraphQL server instead of datahub-frontend to actually populate the data. I'll just show a few screens as a proof of concept. Here you can see I directed us right to a search, because this is something I've already implemented, but you can see it's pretty similar to the existing search.
F: Yeah, okay. Now I'll just quickly talk about some of the open discussions we have on these topics. The first thing is how we should model GraphQL queries.
F: Our proposal is that we essentially take the public GMS models, not the entity and aspect models, but the models that are exposed at the GMS get and batch-get API layer, like Dataset.pdl for instance, and use those to auto-generate the GraphQL schema, such that we don't have to maintain multiple type systems or schemas across different layers of the stack. Right now we kind of have this divergence.
F: In some cases there are different view models on the front end, and what that means is that it's just more difficult to extend; there are more steps to making changes. However, I recognize that there may be cases where we need those front-end-specific fields, and I think we can do both with some sort of extension system, which we can talk more about offline.
F: The second thing is modeling mutations. There are kind of two schools of thought floating around right now. One is keeping both the mutations and the queries entity-oriented on the front end, which means we don't explicitly model the concepts of entity and aspect, specifically the differentiation between the two, on the front end; the front end simply treats all of these models as entities, just single documents, which it can do updates against. That's opposed to having different routes or different mutations for each aspect, say ownership, schema, and so on, that you need to change; instead, you can just have one top-level dataset that you can update with all of that information. I think the downside to the aspect-oriented approach is that you just have more coupling throughout the entire stack, where aspects as a concept bleed across everywhere, into datahub-frontend and then eventually into the client.
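To make the two schools of thought concrete, here is a minimal sketch. All the type, field, and input names below are made up for illustration and do not reflect a settled DataHub schema. In the entity-oriented style the client sends one partial document update per entity; in the aspect-oriented style each aspect leaks into the API as its own mutation:

```typescript
// Entity-oriented: one mutation per entity; the input is a partial
// document, and the server works out which aspects actually changed.
const ENTITY_ORIENTED_SDL = `
  type Mutation {
    updateDataset(urn: String!, input: DatasetUpdateInput!): Dataset
  }
`;

// Aspect-oriented: the aspect concept is exposed to the client, so
// every aspect needs its own mutation (and the client must know them all).
const ASPECT_ORIENTED_SDL = `
  type Mutation {
    updateDatasetOwnership(urn: String!, input: OwnershipInput!): Ownership
    updateDatasetSchema(urn: String!, input: SchemaInput!): SchemaMetadata
  }
`;

// Server-side, the entity-oriented update can still be a simple shallow
// merge of the patch into the stored document, aspect by aspect.
type Doc = Record<string, unknown>;

function applyEntityUpdate(existing: Doc, patch: Doc): Doc {
  return { ...existing, ...patch };
}

const before = { name: "fct_users", ownership: { owners: ["alice"] } };
const after = applyEntityUpdate(before, { ownership: { owners: ["bob"] } });
console.log(after.ownership); // { owners: ["bob"] }
```

The trade-off the talk describes shows up directly in the two SDL strings: the first keeps the aspect concept behind the API boundary, while the second couples every layer, front end included, to the aspect list.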
F: So this is still very much an open, ongoing discussion, and I'm interested to hear what the community members think, for certain. Then the next thing is the React/Ember drift problem, and I think we're very aware that this can happen. Our straw man is that, in the long term, the React app should become the default disposition of the community, and what that means is that, in the short term, there's definitely going to be a functional difference as React
F: gets up to speed to achieve parity with the Ember app; but once that happens, we'll have to talk about how we strategize around migration.
F: I think in our proposal we'd recommend a migration of clients from the Ember app to the React app at that point, and then eventually deprecate support for the Ember client in the long term. And with that, I'm going to talk about one other open discussion we have going right now, specifically a collaboration with Expedia, who is also interested in GraphQL, albeit in a different light. So I'm going to hand it off to Arun, who's going to talk about their experience with GraphQL and DataHub.
E: Thanks John, yeah, that was really good. So I'm Arun Vasudevan; I'm an engineer at Expedia Group. Similar to what Viasat was talking about, our internal approach is like this: we have the backend DataHub, and our UI is completely internal, something we stood up ourselves, so we are using the DataHub backends to pull information, right now directly from the data stores. I'll jump into some of the motivations behind this GraphQL approach internally and what we are doing along with John. Internally, we have a React and Node.js front-end application which talks directly to the data stores, the MySQL and Neo4j, in order to read some of this data.
E
So that's the main motivation. Where it fits in the architecture is mainly: from the front end, there would be a separate GraphQL service (we're now thinking of it as a Spring Boot service) that would call the common resolvers. The common resolvers are basically where John and I would work together on coming up with something, because John is calling them from his Play server as well. So we didn't want to duplicate these resolvers; we would try to come up with something common for both of us, and these would call the GMS DAOs directly through rest.li. The rest of the architecture would be familiar to you, because it's similar to what DataHub is; these green components are the only ones that are added. Moving on, these are some of the details on how I'm planning to implement it. So the metadata GraphQL API in itself would be a standalone deployable that would call the common resolvers to get all the resolved fields, and from there would be the rest.li call to the GMS DAOs, specific to the GMS clients, say the datasets or ownerships or any of the other things, like the ML model that gets added. So all of this would be called from the common resolvers. This way, we would be using the same code across both the front-end and GMS APIs. That's all from me.
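The shared-resolver idea described here, resolver logic written once and called both from DataHub's Play frontend and from a standalone service, might look roughly like the following. This is a hedged sketch in Python rather than the actual Java code; the class and function names are assumptions for illustration, not DataHub's real interfaces.

```python
# Sketch of "common resolvers": resolver logic written once against an
# abstract GMS client, so multiple deployables (e.g. DataHub's frontend
# GraphQL endpoint and a standalone GraphQL service) can reuse it.

class GmsClient:
    """Stand-in for a rest.li client that calls the GMS DAOs."""

    def get_dataset(self, urn: str) -> dict:
        raise NotImplementedError

def resolve_dataset(client: GmsClient, urn: str) -> dict:
    # Shared resolver: fetch the entity, shape it for the GraphQL layer.
    raw = client.get_dataset(urn)
    return {"urn": raw["urn"], "name": raw.get("name", "")}

class FakeGmsClient(GmsClient):
    """In-memory stand-in, used here only to show the wiring."""

    def get_dataset(self, urn):
        return {"urn": urn, "name": "fact_table"}

# Either service wires in its own client; the resolver code is identical.
result = resolve_dataset(FakeGmsClient(), "urn:li:dataset:fact_table")
```

The point of the pattern is that only the client wiring differs per deployable, which is what avoids the resolver duplication mentioned above.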
F
Thanks, Arun. Yeah, I'll just go ahead and summarize that discussion. Although the specifics are still in flux, I think what we're thinking is that, because Arun's use case requires sort of a standalone application that will not be communicated with from a front end like React or Ember, but instead from a Node server, we'll essentially have a common library that we can both work against. That library has essentially a shared graph derived from the GMS models, as well as a shared set of resolvers that can be pulled into both of those deployables.
F
And
this
is
just
the
the
overall
picture
here
where
you
have
the
expedia
node
server
and
the
ember
and
the
react
app
in
the
picture
as
well
here
yep,
so
that
that's
pretty
much
it
thanks.
Thanks
erin
appreciate
it.
Thank
you.
F
A
All right, thanks, John and Arun, looking forward to the collaboration here. In terms of the general Q&A, I was just taking a look at the sign-up sheet and the questions over there. One question that has been asked from the community a couple of times is: we had seen some sneak peeks of the new lineage UI; when is that getting rolled out to the open source version? So, Harsh or Nacho, if you guys are on the call, maybe you can share your plans.
G
Thanks, this is Harsh. I don't think Nacho could make it today, but we are actually working actively on rolling out our new UI. It's definitely a leap in terms of the user experience, so it will make it much easier to understand the flow of relationships. The other thing that we are also actively working on currently is to have jobs as nodes within the UI, to build the lineage end to end, which should also show different transformations or movement of data happening across the ecosystem. So stay tuned for that; we should be hoping to launch that this quarter. There are a few other things coming from the lineage perspective: as you saw earlier in the talk, we checked in the RFC for Azkaban jobs and flows, so we want to launch a reference implementation for onboarding those entities, and that should also serve as an example for the more popular Airflow integration that folks have asked about. So those are a couple of things that are in the pipeline, and hopefully we'll get that out from LinkedIn as well. Yeah.
A
Awesome. Is that going to include the Spark integration as well?

A
Awesome. Were there any other questions that people wanted to get to?
H
So, hi, I'm Ryan, and I work with Arun at Expedia. We've had some performance issues with Neo4j, especially on loading. I know that LinkedIn had mentioned they were also experiencing some performance issues and were looking into different alternatives and ways to fix it. One of the things I've been looking into recently is Dgraph; based on some of the benchmarks that they've posted, it looks pretty promising. I've done some preliminary tests with some small loads as well, and it looks like there's a pretty massive speed-up. I was just wondering if y'all had done any research into some other alternatives, including maybe any progress you all have made on the Kafka Streams connector for Neo4j.
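For context on the kind of preliminary loading test mentioned here: Dgraph's live and bulk loaders accept RDF N-Quads, so a first experiment often starts by serializing graph edges into that format. The predicate name below is hypothetical (not a Dgraph or DataHub built-in), and this snippet only builds the loader input text; it does not talk to a Dgraph server.

```python
def to_nquads(edges):
    """Serialize (src, dst) lineage edges as RDF N-Quads for Dgraph's loaders.

    The <downstream_of> predicate is a made-up name for illustration.
    """
    lines = []
    for src, dst in edges:
        # Blank nodes (_:name) let the loader assign UIDs on ingest.
        lines.append(f"_:{src} <downstream_of> _:{dst} .")
    return "\n".join(lines)

nquads = to_nquads([("tableB", "tableA"), ("tableC", "tableB")])
```

A file of such lines could then be fed to `dgraph live` to get rough load-time numbers against an equivalent Neo4j import.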
C
Sure, hey Ryan. For Dgraph, we actually haven't really done a deep dive into it, but we did read the article that talks about Dgraph, and we also like the fact that it is an RDF graph and has native GraphQL support. It also supports strong consistency and full-text search. However, I think it lacks Gremlin, which we are trying to make our query language in the plan internally. As you mentioned, on the Neo4j performance issue: yes, we realized that, and we are working on it; it's in progress. We don't have a new benchmark or any performance testing out yet, because we didn't get much bandwidth last quarter, but we'll try to get onto that this quarter. Also, internally, for the graph technology, we were actually also looking at integrations with LIquid, which is an in-memory graph that was designed and developed at LinkedIn, and LIquid is also looking at open sourcing, starting to seriously put it on the roadmap.

C
So there might be some updates in the future on this; stay tuned. For Dgraph, if we have more bandwidth, we'll also give it a test some time.
H
Okay, yeah, and I'm working on doing a quick implementation as well.
C
Let's actually have some offline sync-up on this as well. Maybe you can do a demo next time.
H
Yeah, we'll see. Hopefully I get something up and running that's worth demoing.
A
That'll be great, that's the power of the community, love it. All right, so I think we're just on time. So thank you, everyone, and see you on the 19th of February; we're basically standardizing on the third Friday of every month from now onwards. So stay safe and stay healthy, and we'll see you in a few weeks.