From YouTube: Challenges in building and operating a Multi-Cloud-Provider Platform - Jörg Schad, ArangoDB
Description
Building a cloud-agnostic platform used to be a challenging task as one had to deal with a large number of different cloud APIs and service offerings.
Website: https://www.arangodb.com/
Organized by @Microsoft @kubermatic7173 @SysEleven
Thanks to our sponsors @CapgeminiGlobal, @gardenio, @sysdig, @SUSE, @anynines, @redhat, nginx, serve-u
So yeah, as you can figure out, I tried to squeeze as many buzzwords into the title as possible. In short, we'll talk about how to manage containers, and especially stateful services, across multiple cloud providers. What we are doing: we are building a managed database service across multiple cloud providers, basically targeting the big ones, AWS, Google, and Azure, and Kubernetes is a great abstraction for that.
So what we decided is that we don't want to do all the work ourselves: we're actually using the managed Kubernetes solutions of the big cloud providers, because we don't have a team for that, and we also want to stay focused on operating database services. But Kubernetes is not exactly Kubernetes; it's challenging, because we still need to differentiate between the different cloud providers. We'll go into more detail about security, authentication, authorization, and networking. That last one is probably the team's favorite.
When we did a quick survey among the team members about their favorite differences, load balancers came out pretty much on top. We'll also talk about storage, handling different Kubernetes versions, container runtimes, logging, cluster management, etc. Let's see how much we can squeeze into half an hour.
So in short, why do we actually care? ArangoDB is an open-source graph database. We also support other data models, such as documents and full-text search, as well as graph analytics using Google's Pregel framework, etc. But I think the interesting part is that we are a distributed system ourselves. Interestingly, our architecture is similar to the Kubernetes architecture itself, and we invested very early on in integrating with it.
This is also how I got to know ArangoDB: in the Mesos ecosystem, when Apache Mesos was still around, and later in the Kubernetes ecosystem. Back then I wasn't at ArangoDB yet; I was still over at Mesosphere, on the partner side. But already back then, one question was a big topic:
How can we actually operate persistent services on top of all those container systems? This is where we have kube-arangodb, our open-sourced operator, and the managed service Oasis, which I'll talk about next, is itself built on Kubernetes and therefore also leverages kube-arangodb.
Why do we care? There's a big trend in the field. I've been writing database systems for the past, what, probably 10 to 12 years, and in the beginning we were really targeting a static setup: some servers in a basement, and those servers were pretty stable. There was a network switch in the middle, and we had uptimes of a year or so for some of the processes running there.
Nowadays it's a much more dynamic infrastructure: we have all those cloud providers, AWS telling us to please relocate somewhere, Kubernetes telling us to move a pod somewhere else. For me, this merging of cloud native and databases is one of the coolest things happening right now, because it really brings together all my passions. But it also means changes internally in the database: we are currently redesigning some core parts of the database to deal with what I would call the dynamic infrastructure aspect. That, however, is not so much what this talk is about; this talk is more about the other side:
How can I then operate it? If we look at the cloud native landscape, most big databases nowadays show up there as cloud native databases. And as I said, most of our customers nowadays, even if they don't use a managed service, are running in some kind of cloud environment: using Docker images, using the Kubernetes operator, or deploying it themselves.
For us, this is where our managed service comes in, for people who don't want to run it themselves. As already mentioned, it is built on the managed Kubernetes solutions of the big cloud providers, allowing us to really focus on what we do well: managing, scaling, and operating database systems. It has open APIs, and the Terraform provider was open sourced just last week, so there's a lot to play with. This talk is now basically about what's happening underneath:
What are our challenges in operating our infrastructure across those big cloud providers while keeping that mostly invisible to the user? Why should you care? I think all of us here know about Kubernetes. Containers already provide a great abstraction: we can move them anywhere, we can deploy them anywhere, and we don't have to care too much about what's underneath. Then we get container orchestration on top, allowing us to actually do that at scale.
That gives us scheduling, resource management, and then also service management on top. The next level of abstraction is how we run those containers across environments, and this is where Kubernetes really comes in, abstracting away the differences between the cloud providers.
So we already see a lot of abstraction happening. Maybe the last layer, and this is a follow-up talk we actually gave at the last KubeCon, is about different Kubernetes clusters. An ex-colleague of mine used to say that Kubernetes clusters are like Pringles: you can't have just one. We actually checked; I think the last time we counted it was 40 or so, including staging. So there are a lot of different clusters going on, and we are also working on the next question:
How can we actually migrate databases across clusters? But that's the follow-up talk. So, for us: we didn't want to manage all this Kubernetes ourselves.
As I said, we have 40-plus clusters, so it would just be ridiculous to manage that ourselves. We decided to use the managed versions from the different cloud providers, which really helps us abstract away a lot of the operational challenges. On the other hand, it also comes at a cost, of course: because it's managed by them, we give up a certain amount of control.
We cannot set certain parameters; we'll see examples of that during this talk. Maybe the last question before we really dive into the challenges: why should we actually do multi-cloud at all? Isn't one cloud provider enough? A quick question to the audience: how many of you are actually using different cloud providers for one product or one service?
Okay, some of you. For most, using one cloud provider is probably sufficient, and it will still simplify your life. For us it was a requirement, because it's a requirement of our customers. Many of our customers basically tell us: we want to run on a certain cloud provider, or we don't want to run on another cloud provider. Such a company policy can be inclusive or exclusive; Amazon might be a competitor for some of them. And then there's also the question of where you keep your data.
Some also don't want a dependency on a specific vendor, and then there is the flexibility aspect. But one last point: for a lot of people, I feel, it's a buzzword. So even though I'll talk about what it means and where the challenges are, always keep in mind: do you really need it? Because running across different cloud providers will add to your operational challenges.
Okay, finally, the actual content. What are the challenges we've seen? We briefly went over them in the beginning, so let's jump right in, starting with the TL;DR, the blowing-off-some-steam slide. On Amazon, on EKS, our biggest challenge, our biggest complaint if we were to write them a letter, is probably resource management: they create a lot of resources on the fly.
If you create a certain instance, they basically create a lot of resources behind it for you. That would be okay, but it gets harder when you have to remove everything, because you have to follow a certain order, and that order is not always intuitive, which makes removal challenging. I think we now have that pretty much under control, but in the beginning, encoding all of that was quite some effort.
Not all resources have tags, which also plays into this, and the error handling is a bit hard: some matching on error strings is needed for us to differentiate between the different error cases.
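As an illustration of what that string matching can look like, here is a minimal Python sketch; the error messages and categories below are hypothetical placeholders for illustration, not verbatim AWS responses.

```python
# Hedged sketch: classifying cloud-API deletion errors by message substrings.
# The substrings and category names are illustrative assumptions.

def classify_deletion_error(message: str) -> str:
    """Map a raw error string to a coarse category we can act on."""
    msg = message.lower()
    if "dependencyviolation" in msg or "has dependencies" in msg:
        return "retry-after-dependents"   # delete dependents first, then retry
    if "notfound" in msg or "does not exist" in msg:
        return "already-gone"             # treat as success (idempotent delete)
    if "throttl" in msg or "rate exceeded" in msg:
        return "retry-with-backoff"
    return "fatal"                        # surface to an operator
```

The point is not the exact substrings but that the caller has to pattern-match on free-form text instead of switching on a structured error code.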
Google is actually better in those aspects.
It also feels newer; I think for a lot of these aspects, if you look at the history of when those services were built, that makes a difference: Google and Azure feel much newer. What most annoys us about Google is probably the aggressive update policy for the Kubernetes clusters. I mean, we run a managed service ourselves, and from that perspective we would love to force people to upgrade from one version to the next as soon as possible, because it simplifies our operational life. But on the other hand, when Google forces us to upgrade a Kubernetes cluster, that doesn't exactly make Oscar's day; there can always be some challenges in there.
For Azure, I think the biggest point is that they've really, greatly improved. You still feel that it's a newer service, but especially over the last two years it has improved a lot, and many of the small issues we were still seeing two years ago are gone. Probably the biggest remaining complaints are the limited resource quotas: for example, I think it's now 30 VM scale sets we can have, and across different environments that is just too little for us. Then there is slow persistent volume attachment, and the cluster autoscaler, which by now is also not that bad anymore.
Okay, first point: resource creation. As already mentioned, we looked at what we are actually creating when we set up one of those data clusters plus an ArangoDB database cluster. On Amazon we set up a VPC, internet gateway, NAT gateway, subnets, routing tables, security groups, etc., and of course also an EKS cluster.
Probably our biggest challenge is that a lot of resources are created on the fly as dependencies, and removing them along with those dependencies wasn't easy in the beginning. We had to learn a lot of things the hard way, when resources weren't removed or when we got errors on removal, also because not everything is properly tagged. On Google and Azure this is a bit simpler: on Google we basically create the VPC, the GKE cluster, and the node pool.
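To make the ordering problem concrete, here is a small sketch that derives a safe deletion order by reversing a topological sort of the dependency graph. The graph below is a simplified illustration, not the full set of resources EKS creates behind the scenes.

```python
# Hedged sketch: AWS resources must be deleted in reverse dependency order.
from graphlib import TopologicalSorter

# node -> resources it depends on (which must therefore outlive it)
depends_on = {
    "eks_cluster":      ["subnets", "security_groups"],
    "nat_gateway":      ["subnets"],
    "subnets":          ["vpc"],
    "security_groups":  ["vpc"],
    "internet_gateway": ["vpc"],
    "vpc":              [],
}

# static_order() yields dependencies first (vpc, then subnets, ...),
# so the deletion order is simply the reverse of the creation order.
creation_order = list(TopologicalSorter(depends_on).static_order())
deletion_order = list(reversed(creation_order))
```

In practice the hard part is discovering the implicitly created resources and their edges at all, especially when they are not tagged; the ordering itself is then mechanical.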
On Azure, as I said, we still sometimes face the challenge that we can't create as many resources as we would like. The number of VM scale sets has already increased over the years; I think it used to be 10 about a year or a year and a half ago, so 30 is already much better, but still quite limited, and we see similar limits on some of the other resources.
Managed Kubernetes: this is probably not so much about the differences between cloud providers, but it's something to keep in mind if you want to use a managed Kubernetes solution. It's great, it actually takes a lot off your plate, but on the other hand it comes with its own challenges. There are forced upgrades.
This is always bad, because you want control over how you move your volumes and how you move your database servers; of course, you do have to stay on a supported version. Then there's the availability of different versions, and access to the Kubernetes API server options: there are command-line options which we would like to set but simply cannot, for example the authorization webhook,
because we're using a managed Kubernetes service and don't have that control. Okay, authentication and authorization: this varies quite a bit across the different cloud providers, and evaluating it is kind of a continuous thing for us. There are a number of open-source and proprietary solutions that claim to do it the same way everywhere; so far, when reviewing a number of those open-source projects, they actually turned out to be insecure
after all, and not really meeting our requirements. Keep in mind again that on managed Kubernetes you can't necessarily set all the options needed to use them. So currently our solution is service accounts plus the Oasis (ArangoDB Cloud) authentication system, interlocking with each other. Next: logging and audit logs.
Again, each cloud provider has their own sink, but with Grafana Loki I think we are now in a pretty good state to capture those things, abstracted away from the different cloud providers. There you just need to find a solution, and there are enough mature open-source solutions out there that can help you with that.
Okay, now we're getting to the more hardcore parts, for example storage. Storage, again, is probably more important for us than if you're running a stateless service with just some front-end containers; for us, storage performance is critical. So in the beginning we did quite extensive studies on the different performance characteristics and on how we can abstract them away for our users, basically telling them:
We have different performance tiers, and we try to keep those performance tiers roughly comparable, for example in IOPS, throughput, etc., across the different cloud providers.
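A sketch of what such a tier abstraction can look like in code. The tier names and the IOPS numbers here are made up for illustration; only the volume-type names (gp3, pd-balanced, pd-ssd, StandardSSD_LRS, Premium_LRS) are the providers' real ones.

```python
# Hedged sketch: resolve a user-facing performance tier to provider-specific
# volume settings. Tier names and numbers are illustrative assumptions.

VOLUME_TIERS = {
    "standard": {
        "aws":   {"type": "gp3", "iops": 3000},
        "gcp":   {"type": "pd-balanced"},
        "azure": {"type": "StandardSSD_LRS"},
    },
    "performance": {
        "aws":   {"type": "gp3", "iops": 8000},
        "gcp":   {"type": "pd-ssd"},
        "azure": {"type": "Premium_LRS"},
    },
}

def volume_config(tier: str, provider: str) -> dict:
    """Look up the provider-specific volume parameters for a tier."""
    return VOLUME_TIERS[tier][provider]
```

The design choice is to benchmark each provider's volume types once, then let users pick a tier rather than a provider-specific volume type.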
This took some playing around, and it really differs across cloud providers, or even within one cloud provider: for example, whether IOPS is configurable at all varies across volume types. At AWS you can configure IOPS for some volume types but not for others, and the same holds for other cloud providers. Also on AWS itself, this is one of those things where, yes:
You still need to be aware of what you're using. For example, gp3 volumes still require their own CSI controller, so we still run one, whereas gp2 volumes are handled by the in-tree driver. We'll see this pattern with other parts as well.
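For example, a gp3 StorageClass served by the out-of-tree EBS CSI driver might look like the following sketch; the IOPS and throughput values are illustrative, not our production settings.

```yaml
# Hedged example: gp3 needs the out-of-tree EBS CSI driver
# (provisioner ebs.csi.aws.com); gp2 also works with the legacy
# in-tree kubernetes.io/aws-ebs provisioner. Values are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"
  throughput: "250"
volumeBindingMode: WaitForFirstConsumer
```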
This slightly violates the idea of Kubernetes as an abstraction layer; it's a bit like installing special drivers for special hardware, and you have to figure out what you need on which cloud provider. From an operational perspective, it simply means we have to monitor and operate those things differently. External storage, again, differs between the cloud providers.
To state that explicitly: blob storage is just for our backups, which we also write across different regions. So there, again, we actually need a proprietary solution, or rather multiple implementations for the different cloud providers.
Networking: as mentioned, when I did a quick survey with the team, load balancers actually turned out to be the top-ranked, most annoying difference between the different cloud providers. Just in terms of setup: what is supported at all, what do we need for setup, DNS, etc.? What is the performance? What are the different timeouts?
These actually vary quite a lot, first of all between different cloud providers, but then also within one: AWS especially offers multiple choices. I think this is a pattern we've been seeing over and over: AWS is simply, let's say, the most mature one, to frame it like that, and they often offer multiple choices with different characteristics.
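As a small example of those differences: even the common request "give me an internal load balancer" needs a different Service annotation per provider. The sketch below uses commonly documented annotation keys, but the exact keys and values can vary with provider and Kubernetes version, so treat them as assumptions to verify.

```python
# Hedged sketch: per-provider Service annotations for an internal load balancer.
INTERNAL_LB_ANNOTATIONS = {
    "aws":   {"service.beta.kubernetes.io/aws-load-balancer-internal": "true"},
    "gcp":   {"networking.gke.io/load-balancer-type": "Internal"},
    "azure": {"service.beta.kubernetes.io/azure-load-balancer-internal": "true"},
}

def service_manifest(name: str, provider: str) -> dict:
    """Build a minimal Service of type LoadBalancer with the provider's annotation."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "annotations": INTERNAL_LB_ANNOTATIONS[provider]},
        "spec": {
            "type": "LoadBalancer",
            "ports": [{"port": 8529}],   # ArangoDB's default port
            "selector": {"app": name},
        },
    }
```

So the Service object is portable in shape, but the behavior-controlling annotations are not.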
So then you have to choose, again, what you need. For example, some of them actually require a special controller, whereas others are readily available in-tree. Internal networking is also different. By internal network I mean: what happens when we transfer data between different regions? What is supported really varies.
For example, AWS offers Transit Gateways and VPC peering, but that still means we have to be aware of how we transfer data between nodes, for example for multi-region backups, or now that we are working on cluster migration between different regions.
Private endpoints basically mean that if our customers run their applications on AWS as well, they can use a private endpoint, so the data traffic simply stays within the AWS (or Azure, or another provider's) network. There are differences here too: with AWS, those private endpoints only work within one region, whereas with Azure and Google they work across regions. That was also a learning when we initially started working with them.
Networking for multi-tenancy: the Oasis data clusters, where the databases are actually running, are multi-tenant.
On one Kubernetes cluster we might have multiple ArangoDB clusters, and we still need isolation between them, especially because a single VPC is used for the entire data cluster (one Kubernetes cluster only runs within one VPC). So we obviously have a strong need for network separation, and our solution for that is Cilium. I think we covered what we do with Cilium and how we use it in a bit more depth in our KubeCon talk, but basically we have been pretty happy with it.
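As a sketch of what that isolation can look like with Cilium: a CiliumNetworkPolicy that only lets pods of the same tenant talk to the tenant's database pods. The `tenant` label and the namespace name are hypothetical, chosen for illustration.

```yaml
# Hedged sketch: per-tenant isolation on a shared cluster with Cilium.
# Label and namespace names are illustrative assumptions.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: isolate-tenant-a
  namespace: tenant-a
spec:
  endpointSelector:
    matchLabels:
      tenant: tenant-a
  ingress:
    - fromEndpoints:
        - matchLabels:
            tenant: tenant-a
```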
A
Initially,
it
was
a
bit.
The
support
for
different
Cloud
providers
was
a
bit
varying.
So,
for
example,
especially
Azure
was
not
as
well
supported,
but
that
has
also
really
improved
over
the
last
year.
So
by
now
we
actually
feel
it
works
pretty
much
the
same
across
all
Cloud
providers.
This
is
just
I
think
the
mess
of
the
message
from
this
slide
should
be.
Okay, what about on-prem? I'll just briefly skip over that, but we looked into different solutions on-prem as well, and even with certified Kubernetes distributions you end up with differences between all those Kubernetes solutions out there. Will it get any better next year? I'm not so sure. One thing I think is interesting there is where we are going to end up with all the different container runtimes.
What does that mean for us, also in terms of security? As I said, for us it's really crucial to have isolation between different deployments and between different containers. So of course those container runtimes are interesting for us, and let's see whether there is going to be support for them on the managed Kubernetes offerings, and how we move forward there.
I was actually a bit quicker than I thought, but thanks for listening. I'm happy to jump back to any of the slides for questions or any feedback. But as we have five minutes, maybe a question for the people who just raised their hands: where do you see the biggest pitfalls between the different cloud providers?