All right, welcome to my session on building a managed database service using Kubernetes operators. Before getting into the weeds, I want to start off explaining who I am and why you should listen to me.

So my name is Jimmy Zelinskie. I'm the founder of a company called AuthZed. We build a database called SpiceDB, and what SpiceDB does is store your authorization data: when you build a permission system for your applications, you eventually hit a couple of different types of problems.

Despite my background being in product, I still write code every day and carry a pager for the services that I build. Prior to founding AuthZed, I worked at a company called CoreOS, which got acquired by Red Hat, and at CoreOS I actually co-created a CNCF project called the Operator Framework, alongside some folks who are now members of the etcd team. What the Operator Framework lets folks do is more easily build operators, so they can customize Kubernetes and extend it in ways that make sense for running their own domains.
All right, to level-set for the talk: before anything, I always like to level the playing field and make sure that everyone is using and understands the same terminology before diving right in. To even be able to discuss this topic, we have to cover two major subjects. The first is: what's a managed database service? And the second is: what are Kubernetes operators?

A managed database service is pretty much you outsourcing the operational side of a database to a particular provider. So instead of you spinning up a database and managing it on top of your own hardware, or even cloud hardware, this is someone else doing that for you. They purely give you the details you need for your application to connect to that database, and then they're basically out of the way. You don't have to maintain a pager or anything like that to make sure that the database is operational and able to serve traffic.

There are two different types of providers that you can outsource to. There are cloud providers, which obviously have the expertise in running software on top of cloud environments; examples of that would be Amazon RDS and Google Cloud Platform's Cloud SQL. These providers offer the typical relational databases, but they also have individual services for more specialized databases.
The other type of expert you can outsource this to is the database providers themselves: folks like Cockroach Labs, selling CockroachDB Dedicated, and my own company, AuthZed, selling SpiceDB Dedicated. There are plenty of other database providers in the space that do something similar; Elasticsearch and Redis also come to mind as examples of these database-provider experts that offer these types of services.
All right. So then, what are Kubernetes operators? Operators are custom controllers for Kubernetes that encode application-specific logic. That basically means extending the Kubernetes API and teaching it about effectively new concepts that are specific to your domain.
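To make that concrete, here's a minimal sketch of the pattern using controller-runtime, the library underneath most Go operators. The DatabaseCluster type, its API package, and its fields are hypothetical, purely to show the shape of a reconcile loop; this is not the SpiceDB operator's actual code.

```go
package controllers

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	examplev1 "example.com/db-operator/api/v1" // hypothetical API package
)

// DatabaseClusterReconciler drives the real state of the cluster toward
// whatever is declared in a DatabaseCluster custom resource.
type DatabaseClusterReconciler struct {
	client.Client
}

func (r *DatabaseClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Fetch the custom resource this event refers to.
	var cluster examplev1.DatabaseCluster
	if err := r.Get(ctx, req.NamespacedName, &cluster); err != nil {
		// Deleted while we were queued; nothing to do.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Compare desired state (spec) with observed state and converge,
	// e.g. ensure a Deployment exists with the requested replica count.
	var deploy appsv1.Deployment
	err := r.Get(ctx, req.NamespacedName, &deploy)
	switch {
	case apierrors.IsNotFound(err):
		// ... create the Deployment for this database cluster ...
	case err != nil:
		return ctrl.Result{}, err
	case deploy.Spec.Replicas == nil || *deploy.Spec.Replicas != cluster.Spec.Replicas:
		// ... scale the Deployment to cluster.Spec.Replicas ...
	}

	return ctrl.Result{}, nil
}
```

The shape is what matters: the controller watches for changes, reads the declared spec, and edits real resources until they match, which is what lets the Kubernetes control plane stay the source of truth described next.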
The point of all of this is to improve how Kubernetes is able to handle running that application. But the even bigger idea is encoding your domain into Kubernetes, so that the Kubernetes control plane becomes the central interface for everything. It becomes the source of truth, and you can always use the standard tools, like your dashboard or kubectl, to query it and understand what is running in production.
All right, so without further ado, I'm going to talk about my anecdotal experience building SpiceDB Dedicated.

The reason I'm going to use this is not only familiarity, but also that we built the service semi-recently. There are a lot of other managed database services that are probably built on top of Kubernetes, but because of the recency of this one, I think it's probably more applicable to someone looking to build a similar service today, whether they're doing that to build their own product or to build out a platform engineering team internally at their business.

So the rest of the talk is basically me describing the system we've built: the decision-making process we went through, the way we've divided things up, and how we think about the different problems we had to solve.
At a 10,000-foot view, we can break this problem down into three major phases: provisioning, runtime, and then day-two operations. The provisioning side is how we create the customer environments.

This is actually pretty subtle when you're trying to understand which things need to be updated along with the lifecycle of Kubernetes itself, versus things that can be iterated on with changes to the application; that's one of the subtle aspects. Another big one is how you're going to promote changes to these different customer environments.

How are you going to roll out Kubernetes updates, or any changes to the aforementioned cluster configuration? And how are you going to do that in a way that is progressive, so that your customers, whether they have maintenance windows or are very sensitive to updates, can get updates at the regular cadence they're expecting? Then we move on to the runtime phase. The runtime phase is about what has to be running live while customers are using these systems.
This is where a managed database service differentiates itself from a lot of other workloads you might be running on Kubernetes, because the customers are actually going to be modifying the cluster itself in real time. That means we need to be able to not only manage our own configuration, but also respond to end users deciding they want to take actions like scaling their database cluster up or down.

All of these things complicate the actual production runtime of the system. Running a service in a way where, when different events happen, like scaling up, scaling down, or losing a node in Kubernetes, you can handle it without losing any performance or dropping any requests is very tricky, and it's something every database-as-a-service is going to need to manage, because you can't necessarily make changes to the application code that is talking to your database.
Specifically, as I said, customers can modify these environments, so we not only have to be able to reproduce our clusters, but also reproduce the state that the customers changed. We also have to power our own operational workload: we need to be able to aggregate metrics across customers, understand the health and state of the customer environments, and page our engineers, and things like that, when something is going wrong in a customer environment.

So I'm going to dive deeper into provisioning now. I'm going to list out some of the technologies and some of the core concepts that we've chosen to go with. I would say a lot of these technology choices are personal preference; I'm not saying you should choose one over the other, but I'm going to include why we ended up with the ones that we have.
These reasons are kind of organization-specific. If you have a company, for example, that has a ton of Terraform expertise, go ahead, use Terraform; I think that's going to be the better choice for you if that's where your company's expertise is. But we, for example, picked Pulumi: we're very comfortable writing Go code, and we ultimately wanted to build one binary that is our infra program and can manage all kinds of different things. So that's why we ended up picking Pulumi.
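For a flavor of what that looks like, here's a minimal Pulumi-in-Go sketch; the resource (a single S3 bucket via the Pulumi AWS SDK) is illustrative, not our actual infra program:

```go
package main

import (
	"github.com/pulumi/pulumi-aws/sdk/v6/go/aws/s3"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		// Declare a cloud resource in plain Go; Pulumi diffs the declared
		// state against reality and converges, much like an operator does.
		bucket, err := s3.NewBucket(ctx, "backups", nil)
		if err != nil {
			return err
		}
		// Export an output so other tooling can consume it.
		ctx.Export("bucketName", bucket.ID())
		return nil
	})
}
```

Because the whole program is ordinary Go, one binary can branch per customer environment, which is what "one infra program that manages all kinds of different things" means in practice.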
For actually reconciling configuration onto a cluster, we use Argo CD. Flux is another example of a CNCF project that does this kind of continuous deployment, but we ultimately aligned on Argo, specifically because it has a nice web UI for checking the health of all the environments, and it also has nice functionality around actually applying the changes, like dry runs and pruning. You can also write Lua to extend Argo in some scenarios. Specifically for us: when you're creating operators, you're going to create custom definitions of healthiness in the status fields, and Argo can be extended with Lua to understand those, so it knows whether a custom resource you've created for your operator is healthy or not.
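On the operator side, that custom healthiness usually comes down to writing standard status conditions that a health check (Lua, in Argo's case) can read. Here's a minimal sketch of the Go half using the apimachinery condition helpers; the Ready condition name and the helper function are chosen for illustration:

```go
package controllers

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setReady records a Ready condition on a custom resource's status.
// A small Argo CD Lua health check can then map this condition onto
// Healthy/Progressing/Degraded in the UI.
func setReady(conditions *[]metav1.Condition, generation int64, ready bool, reason, msg string) {
	status := metav1.ConditionFalse
	if ready {
		status = metav1.ConditionTrue
	}
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:               "Ready", // condition name chosen for illustration
		Status:             status,
		ObservedGeneration: generation,
		Reason:             reason,
		Message:            msg,
	})
}
```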
So that's super useful functionality there. For the actual configuration we apply to the cluster itself, we use Kustomize. We previously used CUE a lot, but we ultimately migrated to Kustomize because it was really easy to structure and it integrates directly with kubectl, so our engineers don't have to install any additional tooling. It's way easier to onboard engineers, because if you understand Kubernetes, you probably understand Kubernetes YAML manifests at least, and you're going to understand using Kustomize to some degree. It also lets us reuse a lot of tools off the shelf, because you can point at any manifest in a git repository, use that as a reference, and extend it with Kustomize. So as we adopt more and more of the standard community tools, we can just point Kustomize at those tools and get them vendored almost for free, or with very little modification.

If you're using CUE, you have to do all the legwork of importing and transpiling the YAML into CUE, and you're kind of on your own for a lot of the tooling and structure. But I imagine some of that will change over time, so it's not necessarily cut and dried; if you're watching this video six months from now, maybe the state of the world for CUE has improved dramatically.
Finally, we also use GitHub Actions, mostly because we can automate a bunch of the GitHub APIs for opening and merging pull requests, and that ties very much into the concepts I want to talk about. The high-level concepts we have are largely around our promotion process, which we call the ring model. The ring model is about bucketing customers into groups by stability, so that we can slowly roll out changes one phase at a time, one bucket of customers at a time.

For example, what we actually do is have a staging instance, and the staging instance gets every change pushed to it as part of continuous deployment. Then, when things look good, we promote that to what we call ring zero, and ring zero is our other testing environments, whether that's performance testing or just staging environments at AuthZed. Then, once that passes QA, we promote it to ring one, which is our rapid release phase: customers that have opted into getting updates sooner, but potentially less stable releases. And so on and so forth: we promote to ring two, which is more stable, and then ring three, which is more stable still, etc.
We know this model scales, because it's being used by big companies like Microsoft. And finally, we have GitOps, but "GitOps by bots" is how I want to talk about it, because while GitOps is great, making changes in some of these repositories can be very verbose and error-prone, and it can take a really long time. So what we actually do is have automations all around it: you can manually pick from a dropdown to say, "I want to promote this ring to this ring," and then bots handle the rest. You get the benefits of having everything checked into git, and if you had to manually override anything, you could; but a lot of the error-prone side of copying and pasting specific versions into specific places is all automated away. In the general case, you pretty much don't have to open your editor to make the changes that you want to see propagated to the system.
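As a sketch of what one of those bots boils down to, here's the promotion pull request opened with the go-github client; the org, repo, branch names, and workflow are invented for illustration:

```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/google/go-github/v60/github"
)

func main() {
	ctx := context.Background()
	// In CI this token would come from the GitHub Actions secret store.
	client := github.NewClient(nil).WithAuthToken(os.Getenv("GITHUB_TOKEN"))

	// Open a pull request that bumps a ring's pinned monorepo commit.
	// The bot has already pushed the branch containing the one-line change.
	pr, _, err := client.PullRequests.Create(ctx, "example-org", "infra", &github.NewPullRequest{
		Title: github.String("promote: ring-1 -> newer monorepo snapshot"),
		Head:  github.String("bot/promote-ring-1"),
		Base:  github.String("main"),
		Body:  github.String("Automated promotion triggered from the dropdown workflow."),
	})
	if err != nil {
		panic(err)
	}
	// Humans can still review or override; the bot only removes the copy/paste.
	fmt.Println(pr.GetHTMLURL())
}
```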
So here is a drawing of our Kustomize configuration. We split it into three top-level folders: we have the bases, the features, and the overlays. If you're familiar with Kustomize, overlays are typically used for the end result; that's going to be the renderable thing that you can actually apply to a cluster. So we have a dev one, or actually variations of dev ones, and then we have customer-specific ones.
The customer-specific ones we keep in a separate repository: the infra repository that tracks all the customer environments, while the dev ones live in our monorepo, alongside the configuration itself. Overlays are composed of at least one or more bases plus a set of features, as sketched below. Examples of features are things like a Postgres database, or ECR for getting your images on that cloud provider, or GCR if you're using Google Cloud.
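As a rough sketch (directory names invented, not our exact repos), the layout is something like:

```
config/
├── bases/
│   ├── cluster/    # assumed-everywhere baseline, e.g. the monitoring stack
│   └── dev/        # extra bits that make kind/Docker Desktop look like prod
├── features/
│   ├── postgres/
│   ├── ecr/
│   └── gcr/
└── overlays/
    └── dev/        # kustomization.yaml composing bases + features
```

Each overlay's kustomization.yaml just lists the bases and features it composes, so something like `kubectl kustomize overlays/dev` renders the full manifest set for that environment.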
We break everything down into these different features that you can then compose together to build a working system. The bases, then, are the base layout for a cluster, installing the things that we want to assume are always going to be there. In the regular cluster base we have the monitoring stack that we deploy to absolutely every cluster, to make sure we have a baseline for understanding the health of every cluster that is not specific to any workload we deploy onto it. This gets used both on an infra cluster that we run centralized for our infra and operations team, and also on all the customer clusters as well. But then we also have this dev base, and the dev base is basically filling the gap between something like Docker Desktop Kubernetes or kind and what we get when we run Pulumi to generate a cluster on a cloud provider for an actual production environment.
That fills the gaps so the clusters look exactly the same: they have the same starting base, then we apply the base, and then whatever features are specific to that environment.

So here is the architecture of the GitOps pipeline. In our monorepo, as I said, the configuration lives alongside the code, which makes it so developers can iterate on the configuration and on the code for the different projects, spin that stack up locally on their machine, and test everything out. When that looks good, it gets committed to the monorepo. Then we have this other infra repo, which tracks the customer environments; customer environments are organized into rings, and those rings reference a specific commit SHA of the monorepo.
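In spirit, a ring is just a small record pinning a bucket of environments to a monorepo snapshot. A hypothetical sketch of that shape in Go (our infra repo actually stores this as configuration files, and these field names are invented):

```go
package rings

// Ring pins a bucket of customer environments to an exact snapshot of the
// monorepo's configuration. A promotion is then nothing more than a pull
// request that moves MonorepoSHA forward, one ring at a time.
type Ring struct {
	Name         string   // e.g. "ring-1" (rapid) or "ring-2" (stable)
	MonorepoSHA  string   // monorepo commit to render configuration from
	Environments []string // customer environments in this bucket
}
```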
That way you can point a ring at a particular snapshot of the monorepo's configuration at a point in time, and that's how we get all the version tracking and the ability to promote different versions of the configuration to different customer environments. Inside of that infra repo we also have the binary that manages Pulumi, which is what provisions the individual clusters.

We have configuration files for each customer environment in there as well, so that's the central source of what is represented in production. Every cluster is also deployed into its own cloud provider account: if you're running on Amazon, each customer runs in an AWS account that is individual to that particular customer. That's just the level of isolation we've chosen for the system, and it's not necessarily a hard requirement for every managed database service; we're just a security product,
so we take isolation a bit more seriously than a lot of other folks.

Then, finally, we have our centralized infra Kubernetes cluster. This is what runs Argo, and it runs Thanos, so that we can collect metrics, query them, and understand the runtime of our customer environments. What Argo is going to be doing is pulling the infra repo and asserting that each of the customer environments is synchronized to the proper state that environment is configured for.

It also makes sure that if anyone logs into a machine while debugging something and skews the configuration, it's going to be restored eventually by Argo. That way, even if a machine gets compromised, we have something that's going to reset the cluster and make sure that nothing is the way it shouldn't be. So that's the high level of our GitOps workflow. We have time, so let's move on to the runtime environment.
In the runtime, we have built two custom operators; this is the "with Kubernetes operators" portion of the talk, which is the meat and potatoes. We decided to split our system into two different operators.

One of them we make open source, because we want our customers, or any open source users, to be able to operationalize and run SpiceDB just as well as we can. This includes scaling SpiceDB, making sure that it doesn't drop traffic, and making sure SpiceDB knows how to self-cluster; it handles running migrations of the data changes across versions. It makes sure there's an update graph, so that you go from a supported version to a supported version, and it basically assures you zero downtime as you go through the upgrade process. This kind of logic all lives inside the SpiceDB operator.

Then we have the AuthZed operator, which is our proprietary operator, and this includes automations that largely rely on assumptions about how we've laid out our clusters.
So if a piece of functionality is tightly coupled to opinions and decisions about how to run a Kubernetes cluster, we keep it in the proprietary one, purely because it's not applicable to anyone else's deployment; it's only applicable to ours.

What we're actually doing is making it so that when a user logs into the dashboard for SpiceDB Dedicated, they're seeing a view of Kubernetes and the resources that live on the cluster. And when they, for example, choose to create a new SpiceDB cluster, they're actually talking to a JavaScript application that talks to the Kubernetes API and creates custom resources. That is how the core of everything functions.
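For flavor, what that dashboard request boils down to can be sketched with client-go's dynamic client. The SpiceDBCluster kind is from the open-source spicedb-operator; the group/version/resource and spec fields below follow its public examples, but treat the specifics (names, namespace, datastore) as illustrative:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig the way kubectl does (error handling kept minimal).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Create a SpiceDBCluster custom resource; the operator reacts to it.
	gvr := schema.GroupVersionResource{Group: "authzed.com", Version: "v1alpha1", Resource: "spicedbclusters"}
	cluster := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "authzed.com/v1alpha1",
		"kind":       "SpiceDBCluster",
		"metadata":   map[string]interface{}{"name": "dev"},
		"spec": map[string]interface{}{
			"config":     map[string]interface{}{"datastoreEngine": "memory"},
			"secretName": "dev-spicedb-config",
		},
	}}
	if _, err := dyn.Resource(gvr).Namespace("tenant").Create(context.TODO(), cluster, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```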
It's all using Kubernetes as the source of truth. Then, of course, we have all the additional tooling that composes our opinions for how to run Kubernetes: things like Contour, cert-manager, and the Prometheus operator. So at the core, the concepts of our runtime come down to centralizing everything into the Kubernetes control plane. You want to use that as your source of truth; it makes a convenient API for managing all these things.

For us, the control plane that our customers use to make changes is one and the same as the control plane that our operations team is managing. That gives us a convenient way to interact with the system: we don't have to build some kind of admin interface into the dashboards to get our operations team access to the customer control plane.
It's just one and the same control plane for us, and that's where a lot of the benefits come from. But the power of the operators is also that the customer-driven changes live in the cluster too. This is what enables a customer to log in, start making changes to the infrastructure, and have those apply immediately: all those automations live in an operator, not in a human that has to get paged, go to the cluster, and make a change to it.

Some of these namespaces get applied to absolutely every cluster, and some are exclusive to a particular cluster. The monitoring namespace, for example, gets deployed to absolutely all clusters that we run. This includes all the infrastructure we need for paging, alerting, metrics, and tracing for the applications running on the cluster base, and this goes all the way to non-SpiceDB customer clusters.
So that part is fully generic and can be reused across the company, but it gets specialized by the resources that get created in the other namespaces. Then we have the authzed-system and authzed-region namespaces. The difference between these two: the system namespace is what I would call the customer-facing control plane. Some customer environments actually run in multiple regions; say you have a Europe and a North America Kubernetes cluster deployment, so you have two individual clusters.

What ends up happening is you pick one as your control plane, and that's where the AuthZed operator runs, and that's where the dashboard runs; anything that's driving the information on the dashboard is going to live there. When you choose to provision something there, the AuthZed operator actually understands the configuration for the other regions that make up the customer environment, and it will create resources in the appropriate cluster.
The authzed-region namespace, then, is the thing that standardizes a cluster to be able to run SpiceDB. Primarily, it has the SpiceDB operator in it, which is going to sit there and watch for the requests to create clusters, or make changes to clusters, that the AuthZed operator creates as a reaction to a customer making a change in the dashboard. It creates those clusters inside of the tenant namespace, and the tenant namespace is where all the runtime customer data is.

This is where the systems they've provisioned live. It's the one that the operations team is mostly going to be inspecting, because these are the places where the customers are actually live, making changes. It's also what we typically focus on for backing up data, like customer-specific configuration: the things they have actually changed on the system.
Every other, smaller namespace in here is a cluster dependency. We use the Prometheus operator and kube-state-metrics just to make sure we've got the standard operational deployment for collecting metrics and observability from the cluster. I mentioned earlier that we use cert-manager and Contour as our ingress and PKI infrastructure, and we actually create two deployments of Contour, in the internal and external namespaces.

Those namespaces are for internal and external traffic. Because customer environments are often in VPCs (virtual networks), that traffic goes through a specific load balancer, and internet-facing traffic goes through the external load balancer. That's how we differentiate the two and do peering to internal networks at our customers' companies. Then, finally, we have Velero, which is going to do backups, and then all the kube-system-y namespaces that you get from the different cloud providers.
Cool. So, transitioning now to a final topic, the final phase: day-two operations. These technologies are the standard ones, and the reason you pick the standard ones is the high-level concept I want to mention here: the observability data isn't just for you, because you are building customer-facing infrastructure.

Some of this data you're going to pass on to your customers. They want to know what the latencies of the database are. They want to know how much CPU they're using; they want to know how much capacity they're using, whether they're going to have to scale up, and if they do scale up, whether that's going to affect their bill.

So it's not purely your decision which technologies you choose for these stacks, because they're potentially going to integrate with customer systems. Customers might want to ingest logs or traces or metrics from their database, as it's running, into their own systems, so that they can page their own engineers if something is going wrong inside of the managed database service. For that, we're using the standard Prometheus ecosystem for observability.
That's the Prometheus operator, kube-state-metrics, Thanos, Grafana, the works; and for traces, generically, just OpenTelemetry. And then, as I described before, backups need to cover more than just data. We're using the standard cloud provider datastore backups, the things that come with the datastores themselves, but we're also building APIs so that our customers can export data out of live systems, or stream that data to a replica that they run themselves, maybe on a completely different premise (on-prem) or in a backup environment.

So we're tackling this on both fronts. But the unique thing is actually not the backup of the data; it's the fact that you also have to back up the configuration, because if you restore the cluster and replay all of Pulumi and your configuration changes, that's not going to include any of the changes the customers have made to the control plane themselves.
That's where Velero comes in: we're continuously backing up the changes that customers are making to the clusters, so that if we have to restore a cluster, we can restore absolutely everything. The nice thing is that it's all decoupled in different ways: we can restore just the customer data if we need to, or restore to an older version of the cluster, or an older version of the configuration and all the namespaces that run in the cluster, because everything is broken into these three different categories.

So with that, I'd like to conclude. You can find me on social media in these three places: on Twitter, on Bluesky, and you can always email me. If you're interested in any of the projects I talked about: there's a link to SpiceDB Dedicated, and the open-source SpiceDB operator is available for exploring and learning how we went about automating the actual operational side of our database.
Examples of what's in there are custom informers; setting statuses according to properties of other resources you're managing; and things like being able to pause your operator, so that it stops reconciling and a human can come in and debug (see the sketch below). These higher-level patterns, the ones you always end up needing to implement but that aren't the core logic of the operator, we've abstracted in a way that you can import. And if you're more interested in SpiceDB itself, you can always join the SpiceDB Discord or look at our GitHub organization.
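That pause pattern, for instance, often amounts to one annotation check at the top of the reconcile loop. A minimal hedged sketch, reusing the hypothetical DatabaseClusterReconciler and imports from the earlier example (the annotation key is invented; check the spicedb-operator for the real mechanism it ships):

```go
// pausedKey is an illustrative annotation name, not the operator's real one.
const pausedKey = "example.com/paused"

func (r *DatabaseClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var cluster examplev1.DatabaseCluster
	if err := r.Get(ctx, req.NamespacedName, &cluster); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// While paused, leave the world exactly as it is so a human can debug.
	if cluster.GetAnnotations()[pausedKey] == "true" {
		return ctrl.Result{}, nil
	}
	// ... normal reconciliation continues here ...
	return ctrl.Result{}, nil
}
```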
We have plenty of other open-source projects all around the cloud-native ecosystem, covering basically all parts of the stack: operators, gRPC, the database itself, clients for the database, things like that. So, thanks for your time.