From YouTube: gRPC November Meetup/ gRPC integration and its applications in Hive Metastore by Zhou Fang
Description
gRPC is a modern open source high performance RPC framework providing advanced features such as authentication, service mesh support, and streaming. Dataproc Metastore (DPMS) on GCP is integrating Hive Metastore with gRPC as an access path in addition to Thrift. This talk introduces the design of the gRPC integration, and how it enables new DPMS features such as a Cloud Run endpoint, fine-grained IAM, and metastore federation.
How did we replace the old Apache Thrift RPC endpoint with gRPC, and how did we build new applications on top of that? First, since not everyone may be familiar with Hive Metastore, I would like to briefly introduce the product.
The Hive Metastore API is currently Apache Thrift, and that Thrift interface is what is used by all of the engines defined here: Hive, Presto, Spark. The architecture of Hive Metastore looks like this: as I mentioned, it has a Thrift server. The Metastore server is a stateless RPC server, and all of the metadata is stored in a relational database, which you can see on the right side. The RDBMS and the server together become the Hive Metastore, which Hive, Spark, and Presto read from.
We have a lot of pain points using Thrift. This product was built decades ago, and at that time Thrift was probably the most popular RPC framework, so it was a natural starting point. As a result, almost the entire Apache ecosystem uses Thrift rather than gRPC.
But today, when we look at how Thrift is used here, we can identify its limitations. For example, Thrift does not support streaming APIs. Decades ago, metadata was just relational database tables; it was not the real data. The data is very big, but the metadata was small. Today things have changed: the metadata itself has become very big, because the data keeps growing, so even a single read from the store can return a lot of results. For example, it can return thousands of partitions, which is a huge payload. A streaming API would be very helpful for improving performance here, but unfortunately it is not available in Thrift. Also, we cannot use Thrift together with a serverless platform, as in the previous diagram.
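To make the streaming point concrete, here is a minimal sketch of what a server-streaming partition-listing RPC could look like in proto3. Every name below (the service, messages, and fields) is my own illustrative assumption, not the actual Dataproc Metastore API:

```protobuf
syntax = "proto3";

package metastore.example;

// Hypothetical service: stream partitions back in chunks instead of
// returning one multi-thousand-entry response payload in a single message.
service MetastorePartitions {
  rpc ListPartitions(ListPartitionsRequest)
      returns (stream ListPartitionsResponse);
}

message ListPartitionsRequest {
  string database = 1;
  string table = 2;
  int32 chunk_size = 3;  // partitions per streamed message
}

message ListPartitionsResponse {
  repeated Partition partitions = 1;
}

message Partition {
  repeated string values = 1;
  string location = 2;
}
```

With a stream like this, the client can start processing the first chunk while the server is still reading later ones; Thrift's request/response model forces the entire partition list into one payload.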
Thrift also asks users to rely on self-managed security approaches, for example Kerberos or LDAP. Customers need to set up that authentication themselves: they need to deploy a Kerberos key distribution center or an LDAP server, which is very complex and not secure.
What is missing here is that the RPC layer does not have a fully managed authentication or authorization solution. In the future we also plan to have better identity and access control: for example, based on which RPC method you call, we verify your identity against different permissions. If you access a particular database, we check whether you have permission on that database. With Thrift, we do not have a good way to build in that kind of fine-grained authorization.
For those reasons, we decided to invest in migrating to gRPC, and here are a few of the benefits we identified when doing this migration. First is high performance: it uses HTTP/2 and proto3. It has built-in authentication support, including token-based authentication, and it supports SSL/TLS. Many of those approaches are compatible with cloud-native authentication frameworks. For example, on Google Cloud we have IAM, which is an integrated, managed authentication framework.
If we can let users use IAM, they will be very secure, and they don't need to deploy any authentication layer themselves. Also, gRPC makes it very easy to extend functionality through its interceptor API; I will show how we use that in our applications later.
gRPC is also better supported by service meshes, for example Istio. As a cloud service, we have the Hive Metastore endpoint exposed to customers, but that endpoint is not exposed directly: we have a layer in front of it, for which we use Istio. Thrift does not have good support in that infrastructure, so gRPC is clearly the better fit here. And, as I mentioned, gRPC is better integrated with Cloud Run, which is a serverless platform for deploying servers.
So here is how we did the gRPC integration. The first step was actually very hard for us, because in stock Hive Metastore the Thrift RPC interface has over 100 methods.
If we wanted to replace it with gRPC outright, we would need to rewrite the entire server, which is a huge amount of work. Another issue is that the whole ecosystem uses this Thrift interface: if we changed it to gRPC, existing clients could not use it, because they are Thrift clients embedded in Presto or in Spark. We also did not want to introduce a huge change to the upstream open source world by replacing everything with gRPC at once, so we wanted to do it step by step.
For example, as you can see in the first picture, the upper one: on the left I have a gRPC client, and on the right I have a Thrift server. How can this Thrift server talk with this gRPC client? The solution is that we deploy a proxy sitting together with the server. When the gRPC client sends a gRPC request, the proxy translates it into a Thrift request. The Thrift request is processed by the server, the server returns a Thrift response, and the proxy translates that back into a gRPC response and gives it back to the client. To the client, the proxy together with the server behaves like a single gRPC server. That is the proxy on the server side.
A
So,
on
the
client
side
the
same
actually,
today
we
only
have
select
client,
but
if
we
run
a
run,
a
proxy
sitting
together
with
the
client
and
they
do
strip
the
jrpc
translation,
then
a
swift
client
can
work
together
with
a
jpc
server.
So
we
need
two
proxies
and
then
you
can
imagine
the
local
communication
between
client,
proxy
and
the
local
communication
between
server
proxy
the
security
rift,
but
the
communication
between
the
two
proxies
on
the
public
internet
now
becomes
jrpc.
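The two-proxy idea can be sketched in a few lines. This is a toy model under loud assumptions: requests are plain dicts and the "wire" is a function call, whereas the real proxies translate between Thrift structs and protobuf messages over real network connections:

```python
# Toy model of the Thrift <-> gRPC translation proxy pair.
# Real code maps Thrift structs to protobuf messages; here both
# "wire formats" are just dicts with different field conventions.

def thrift_to_grpc(thrift_req):
    """Client-side proxy: translate a Thrift-style request to gRPC-style."""
    return {"method": thrift_req["name"], "payload": thrift_req["args"]}

def grpc_to_thrift(grpc_req):
    """Server-side proxy: translate the gRPC-style request back to Thrift."""
    return {"name": grpc_req["method"], "args": grpc_req["payload"]}

def thrift_server(thrift_req):
    """Stand-in for the unmodified Hive Metastore Thrift handler."""
    if thrift_req["name"] == "create_database":
        return {"status": "ok", "created": thrift_req["args"]["db"]}
    return {"status": "unknown_method"}

def call_via_proxies(thrift_req):
    """Thrift client -> local proxy -> (gRPC on the wire) -> server proxy -> server."""
    grpc_req = thrift_to_grpc(thrift_req)   # local client-side proxy
    server_side = grpc_to_thrift(grpc_req)  # server-side proxy
    return thrift_server(server_side)       # untouched Thrift server

print(call_via_proxies({"name": "create_database", "args": {"db": "demo"}}))
# → {'status': 'ok', 'created': 'demo'}
```

The key property the sketch illustrates: the server handler is untouched, and only the middle hop changes protocol.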
That is the idea of the proxy. Now you can see that we don't need to change any code in the server, yet we can make the communication gRPC. Here is how we implemented the proxy, in three steps. On the left side you can see the Thrift spec of a struct; on the right side is the equivalent gRPC (protobuf) struct. The first step is that we manually add an equivalent gRPC interface.
The second step is a mapper of Java objects, which we use to build the core of the proxy. Every time we receive a Thrift Java object, for example a request, we use MapStruct to map it to a gRPC request and send it out. Most of the time the mapping happens automatically, but there are corner cases: for example, in Thrift you can define a list of objects as a map key, but that is not possible in protobuf.
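One way to handle that corner case: since proto3 map keys must be scalars, a map keyed by a list of objects can be flattened into a repeated (key parts, value) entry list. A toy sketch, using tuples and dicts as stand-ins for the Thrift and protobuf types:

```python
def flatten_map_with_list_keys(thrift_map):
    """Thrift can key a map by a list of objects; proto3 map<K,V> cannot.
    Represent it instead as a repeated entry record: one
    (key_parts, value) pair per map entry."""
    return [{"key_parts": list(key), "value": value}
            for key, value in thrift_map.items()]

def rebuild_map(entries):
    """Inverse mapping, used when translating back to the Thrift object."""
    return {tuple(e["key_parts"]): e["value"] for e in entries}

# A Thrift-style map keyed by a (db, table) pair of objects:
m = {("sales", "orders"): "gs://bucket/orders"}
entries = flatten_map_with_list_keys(m)
assert rebuild_map(entries) == m  # round-trips losslessly
```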
A
The
third
step
is
that,
because
in
the
proxy
and
the
switzerland
jpg
proxy,
it's
lessening
on
thrive's
request
for
lessening
on
jrpc
requests,
so
we
need
to
strip
the
server
or
jrpc
server
and
this
server.
We,
we
don't
want
to
write
this
server
manually
because
it
has
hundreds
of
methods,
hundreds
of
handler
functions,
to
write.
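Rather than hand-writing hundreds of handler functions, the proxy's server side can be generated: each handler just translates the request and delegates. A hedged Python sketch of that pattern (the real implementation is in Java and driven by the interface definition; the method names and `backend` below are illustrative):

```python
# Sketch: generate one forwarding handler per RPC method instead of
# writing hundreds by hand. `backend` is whatever actually serves the
# translated request (in the real proxy, the Thrift Hive Metastore).

METHODS = ["create_database", "get_database", "get_table", "add_partition"]

def make_proxy(backend):
    class Proxy:
        pass
    for name in METHODS:
        def handler(request, _name=name):  # _name=name pins the loop variable
            # Request translation would go here; we just tag and forward.
            return backend(_name, request)
        setattr(Proxy, name, staticmethod(handler))
    return Proxy

def backend(method, request):
    return f"{method}({request})"

proxy = make_proxy(backend)
print(proxy.create_database("db=demo"))  # → create_database(db=demo)
```

One generic handler template covers every method, which is the same economy the generated proxy server achieves.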
In the final step, we plan to throw away the Thrift server and use only the gRPC server, or make the gRPC server the primary one and use the proxy when people need the Thrift server for backward compatibility. We have a Hive proposal for this, which is the work we are currently doing to make it open source.
Okay, let me briefly introduce the tools I use in this demo. I use Google Cloud Platform, and I use a service called Dataproc. Dataproc is a managed service by Google Cloud for running Spark, Hive, Hadoop, Presto, and all of those Apache open source data processing tools.
Here I create a database called grpc_demo_1, and we can use this to quickly test that we are using this Hive Metastore by checking whether the database shows up in the log here.
Let me refresh the log, and you can see that the create-database request has been printed here, which means the connection is successful and we are using this Hive Metastore; we can verify the name grpc_demo_1 here.
The Hive Metastore Thrift endpoint is specified here; you can see it is the same Thrift endpoint of the Hive Metastore we just used. So with this Cloud Run service in front of the Hive Metastore service, we effectively get a new gRPC Hive Metastore service, because it will be listening for gRPC Hive Metastore requests.
We will use Docker to run the client-side proxy, so this is the Docker image; it is the image of the proxy binary.
Now, let's run this proxy. The log for the command we just ran is for creating database grpc_demo_2, which means that when we create the database, Hive sends out a Thrift request, the local proxy receives it and sends it on as a gRPC request to the Cloud Run proxy, and that proxy forwards it to the server, which is finally the Hive Metastore. We can verify on the Hive Metastore side whether it received this request.
A
So
you
can
see
here
the
another
database,
jrpc
demo2,
create
database
request
is
received
here.
So
this
is
how
jpc
proxy
works.
So,
as
you
see
with
this
local
proxy
now,
this
high
hive
start
to
use
a
grpc
client
to
send
out
the
grpc
high
metastore
request
and
together
with
this
proxy
on
the
server
side,
this
high
metastore
becomes
a
new
hamburg
store
which
exposes
a
grpc
endpoint,
which
is
this
one.
So
the
network
connection
in
between
between
the
client
and
the
server
is
not
jrpc.
Instead
of
swift.
A
Okay,
let
me
jump
back
to
the
original
presentation,
so
in
the
previous
demo
I
showed
how
the
proxy
idea
I
described,
works
in
our
product
and
now
the
next
slide
is
now
we
have
jrpgc.
Now,
how
do
we
build
new
application
when
we
have
the
superpower
of
grpc,
so
the
important
features
that
we
build
is
I
give
an
example
here
called
the
fine,
green
iem
access
control
and
the
public
customer
endpoint,
so
the
before
we
have
jpc,
we
only
have
swift.
A
So
what
we
do
is
that
there's
no
enforced
access
control
in
high
metastore.
It
means
that
if
you
have
the
the
ip,
if
you
access
the
certain
point,
assuming
that
you
are
not
deploying,
for
example,
kerberos
or
ldap
server,
so
then
everyone
can
access
the
enterpoint.
A
So
in
order
to
protect
the
endpoint
from
others,
because
you
can
imagine
the
hive
methods
to
actually
store
all
the
metadata
of
our
organization,
so
the
data
security
is
very
important.
What
we
do
is
that
we
hang
this
ip
inside
a
private
network
which
means
that
outside
this
vpc
network,
no
one
can
access
it.
So
this
is
how
we
protect
it,
but
it's
there
a
lot
of
complexities
using
this
infrastructure.
A
The
issue
is
that
when
using
private
ip,
you
need
to
do
a
ipre,
because
all
of
our
resource
is
not
created
in
the
customer
visible
customer
project,
we
created
all
the
resources
in
a
hidden
tenant
project.
So
in
the
customer's
view
they
only
say
that
I
created
the
resource.
It
gave
us
a
served
endpoint,
it
don't
need
the
customers,
don't
need
to
worry
about
all
the
resources.
A
So,
but
in
this
way
we
need
a
vpc
theory
and
we
also
need
to
drag
your
education
in
the
customs
project
and
another
solution
without
be
disappearing
is
called
a
private
service
connect
but
easily.
We
need
to
do
a
lot
of
like
a
network
work,
but
now,
after
having
jpc
our
status
now
is
that
they
expose
the
public
endpoint.
So
this
endpoint
is
public.
Everyone
can
access
it,
there's
no
restriction
of
private
ip,
but
the
access
is
protected
by
a
iem.
Now we deploy the gRPC-Thrift proxy on Cloud Run, and Cloud Run gives the user a public endpoint. On the customer side, most of our customers are using Dataproc on Google Cloud, so on the virtual machines of Dataproc we pre-install and run the Thrift proxy, and customers don't need to do anything. For external customers who don't use a Dataproc cluster, we are working on releasing this publicly, so everyone can run the Thrift proxy themselves.
A
For
this
I
find
green
iem
access
control.
We
actually
reuse
the
jrpc
interceptor
and
it
will.
It
will
check
the
user's
token
and
forward
the
token
to
the
im
service
to
check
this
permission
and
it
will
deny
the
service
if
the
corner
don't
have
the
permission.
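The interceptor logic can be sketched as a pure decision function: map the RPC method to a required permission, resolve the caller's permissions from the token, and deny when the permission is missing. Everything below (the method names, permission strings, and the token lookup table standing in for the IAM service) is illustrative, not the actual IAM integration:

```python
# Toy model of the per-method IAM check done in a gRPC interceptor.
# Real code forwards the caller's token to the IAM service; here the
# "IAM service" is a dict from token to granted permissions.

REQUIRED_PERMISSION = {
    "CreateDatabase": "metastore.databases.create",
    "GetDatabase": "metastore.databases.get",
    "GetTable": "metastore.tables.get",
}

FAKE_IAM = {
    "token-analyst": {"metastore.databases.get", "metastore.tables.get"},
    "token-admin": {"metastore.databases.create",
                    "metastore.databases.get", "metastore.tables.get"},
}

def intercept(method, token):
    """Allow the call only if the token grants the method's permission."""
    needed = REQUIRED_PERMISSION.get(method)
    granted = FAKE_IAM.get(token, set())
    return "OK" if needed in granted else "PERMISSION_DENIED"

print(intercept("GetTable", "token-analyst"))        # → OK
print(intercept("CreateDatabase", "token-analyst"))  # → PERMISSION_DENIED
```

Because the check keys off the RPC method name, new methods get access control by adding one table entry, with no change to the handlers themselves.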
A
Okay,
this
is
all
I
have
today.
So
to
summarize,
in
this
talk,
they
like
enhance
high
microsoft
with
japanese
integration,
as
we
enable
new
features
on
it,
for
example,
bankrupt.
I
am
because
endpoint
and
the
methods
of
federation-
which
I
don't
give
a
detail
here
and
in
future
there
are
actually
still
a
long
way
to
go
because
seriously
is
still
used
in
the
metastar
core.