From YouTube: Webinar: K8s Audit Logging Deep Dive
Description
Many people know that Kubernetes can report API activity to logging back ends and that auditing is a powerful security tool, but what happens in the real world when you have:
- Multiple API servers
- Mutating Admission Controller Webhooks
- Aggregated APIs
- Webhook audit log backends
- Massive API throughput requirements
The short answer is, things get tricky. In this short seminar, we’ll take a brief look at the more complex and deeper issues faced by Kubernetes operators when seeking to implement comprehensive, efficient, and secure Kubernetes auditing.
Presenter:
Randy Abernethy, Managing Partner @RX-M
A: Okay, let's get started. I'd like to welcome everyone to today's CNCF webinar, K8s Auditing in Depth. My name is Jerry Fallon and I will be hosting today's webinar. We would like to welcome our presenter today, Randy Abernethy, managing partner at RX-M. We just have a few housekeeping items before we get started. During the webinar, you are not able to talk as an attendee. There is a Q&A box at the bottom of your screen, so please feel free to drop your questions in there and we'll get to as many as we can.

B: Hey, thanks a lot for that. Good morning, afternoon, or evening, and welcome. This is the K8s Auditing in Depth session; a bunch of interesting stuff to take a look at over the next half hour or so. So why don't we go ahead and just jump right in? All right, so my name is Randy Abernethy. I'm a cloud native geek of the first order, I'm a big fan of microservices and Apache Thrift and things in that area, and I work for RX-M; we're cloud native folks over here. That's just a quick note on me.
B: The session today is going to cover auditing. We're going to start from the start and go through some of the basics, but we're going to quickly move into more advanced concepts, and we're going to talk about some of the challenges and issues with multiple API servers and mutating admission controller webhooks, how to deal with different audit log backends in scenarios where you have massive throughput requirements, and what exactly you can expect from audit logging from a throughput and capacity standpoint.

B: So let's start off: what is audit logging, if we just start at the very start? Well, the definition of audit is "an official inspection of an individual's or organization's accounts, typically by an independent body," and this is kind of an interesting parallel to what Kubernetes auditing is. And the context sentence, I think, is actually even more telling: "audits can't be expected to detect every fraud."

B: This is exactly the spirit of auditing in Kubernetes. Logging happens in the services that you run in Kubernetes, including the control plane services, like your controller managers, the scheduler, the API server, kubelets, and so on. They all log to standard out and standard error, and if it's a systemd service, that's going to be manageable through journalctl and all that stuff.

B: If it's actually running in a pod, as it may be if you're running a kubeadm-style setup for your control plane, then kubectl logs would be able to show you the log output of these different services, and it can be managed with plugins that forward the logs off to backends: Loki, Elasticsearch.
B: What have you. And then you've also got events taking place inside the cluster, control plane events, and those events are going to be visible through kubectl get events, for example; or if you describe an object, you'll be able to see information about it. But auditing is a different beast, right? Auditing is designed to give you the ability to inspect an individual's or organization's accounts, and to be able to detect activity that might be fraudulent. For example, some of the other verb uses here: "companies must have their accounts audited."

B: "He made use of knowledge gleaned from the economics class he audited." So being able to watch and oversee something is sort of the idea of the audit log, and it's different in kind, because what it's designed to do is capture the who, what, and whys of activity going on in the cluster, and it's usually at a far more granular level than these other types of logs. Application-level logging that you get on standard out and standard error is going to be things like "I created this,"

B: "I did that," "it did this other thing," and some of the details may be obscured for security reasons or something like that. But an audit log is designed to capture all of the details. It is designed for no-holds-barred inspection of what's going on in the cluster, and so generally only privileged individuals should be looking at the audit log, because it can expose a lot of stuff.

B: You can look at the exact manifest posted by every user for all of the resources that they're creating, you can see the responses in detail from the cluster, and so on and so forth. So it's really a function like you would have with a security audit; that's really what the audit log is for, it's for facilitating those types of activities.
B: So let's start off with just some of the basics. This is the definition straight out of the kubernetes.io docs: Kubernetes auditing provides a security-relevant, chronological set of records documenting the sequence of activities by individual users, administrators, or other components affecting the system.

B: And, in fact, how frequently certain types of activities are taking place. You can always go and dig around in config files and things like that to sort of figure out how things are set up and what they're doing, but going to the horse's mouth is always the authoritative answer, because you might see a configuration file that says this thing is supposed to happen every five minutes, and then you look at the audit log and it's happening every three seconds.

B: Well, that's putting a lot of load on the control plane; maybe you want to look into why that is. Is it because the config file is mistyped? Is it because there's a default that is at play in some scenario where there isn't a config file?

B: There are all sorts of interesting things you can glean from digging through the audit log. It can be used by security professionals for forensics and things like that, but it can also be used for cluster debugging and performance tuning; it's just a really all-around powerful facility.
B: So if we were to look at the architecture picture of the audit log, it probably looks something like this: all roads lead to the API server. The API server is the state manager for the cluster. At the end of the day, it's the microservice that owns all of the metadata describing what the cluster is doing.

B: Now, the API servers are stateless themselves, but they have the logic that is there to handle authentication, authorization, admission control, and all of these types of things that decide whether something an end user would like to create as a specification is going to be accepted or not. Now, if that specification is accepted, it's going to be dropped into etcd. Etcd is a highly consistent key-value store sitting behind the API server, and this is a simplified model.

B: You generally have multiple API servers and a cluster of etcd nodes, but the communications channels are the same: everybody talks to the API server, and only the API server talks to etcd. So if you want to know the status of something, or if you want to create something, delete something, update something, modify something, you do it through the API server. Essentially the API server is the gatekeeper of all state.

B: So if we enable the API server to log all of this activity at the API level, and since the API server's API is the gateway to all state in the Kubernetes cluster, then we're really creating a place where we can see everything happening. Now, that's not completely true; it is a distributed system.

B: Kubelets keep a cache of the pods that they're supposed to be running, and there are little pockets of information throughout the system, but at the end of the day, if the API server is right and good, most things in your cluster are going to be right and good, and if there's something wrong, the API server is going to be able to see that in most cases. So it's the perfect place to be capturing this kind of detailed logging.
B: Now, when you run an API server, by default there is no audit log, and so we don't have this facility. It's therefore very good to know that the audit log exists and to start thinking ahead. You might say, "hey, today we don't need the audit log," but after something catastrophic happens, it's too late, right? You want to see what caused this big problem and you have no record of the activity. So there are a lot of different ways to think about this.

B: When you think about the cluster picture that you see here, you can quickly rationalize the amount of activity that's going to be coming into the API server. Imagine that you've got a maximal-sized cluster; currently, upstream Kubernetes supports 5,000 nodes. So if you've got a 5,000-node cluster, you've got 5,000 kubelets reporting status to the API server on a regular basis.

B: The reporting and the constant reconciliation, the self-healing behaviors that Kubernetes is performing all the time, can create massive flows of messages into the audit log, depending on how you've got it tuned. So the API server is really the centerpiece: all the requests to view or modify state go through that API server, and so it's the central place where we want to do auditing. So what's in the audit log?
B: Well, it tells you what happened. So if I, for example, create a pod or create a deployment or create a service, that's going to be recorded. It's going to record when it happened, so the API server will timestamp the audit log event, as they're called. It's going to specify who initiated it, so it'll capture my identity, and this is very detailed: if I am an administrator and I'm impersonating another user when I create this pod, it'll capture both of those identities, my identity and the party I'm impersonating.
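To make that concrete, here's a hypothetical fragment of the user portion of such an event; `user` and `impersonatedUser` are the real audit v1 field names, but the usernames and groups shown are invented for illustration:

```shell
# Hypothetical audit event fragment showing both identities captured when an
# administrator impersonates another user (usernames and groups are made up):
cat > /tmp/impersonation-fragment.json <<'EOF'
{
  "user": {
    "username": "admin@example.com",
    "groups": ["system:masters", "system:authenticated"]
  },
  "impersonatedUser": {
    "username": "jane",
    "groups": ["developers", "system:authenticated"]
  }
}
EOF

# Sanity-check that it is valid JSON and display it:
python3 -m json.tool /tmp/impersonation-fragment.json
```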
B: What object did this happen on? The pod, the deployment, whatever; you can get the exact identity of that object. And where was it observed? As information is being processed by the API server, we have different stages, and so on. And then: from where was it initiated?

B: Where did this request come in from, and where is it going? If there's any destination for this thing, that can also be identified in the audit log. So an example of an audit event might look something like this. As you can see here, we're just tailing the audit log file and grabbing one line, and the modern audit log is JSON based. The old audit logging, which is deprecated at this point, the legacy

B: audit logging, was a text-based format, and audit events were a single line of text. The current approach is JSON, which is a lot easier to parse and process and store and search and index, and all that kind of stuff. So if we just clean up the formatting, space it and indent it a little bit with jq, our JSON query tool, we get a nice dump like this, and you can see this looks a lot like a Kubernetes resource.
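As a sketch of what one such line might contain (all values invented, though the field names match the audit.k8s.io/v1 Event type), and how you could pretty-print it; the talk uses jq, but any JSON formatter works:

```shell
# One hypothetical audit event line, roughly the shape the API server emits:
cat > /tmp/audit-sample.json <<'EOF'
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"4a2f0000-0000-0000-0000-000000000000","stage":"ResponseComplete","requestURI":"/api/v1/nodes/node-1","verb":"get","user":{"username":"system:node:node-1","groups":["system:nodes"]},"sourceIPs":["10.0.0.5"],"responseStatus":{"code":200}}
EOF

# Pretty-print it; the presenter pipes through jq, for example:
#   tail -1 /var/log/audit/audit.log | jq .
# python3's json.tool gives a similar indented dump without jq installed:
python3 -m json.tool /tmp/audit-sample.json
```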
B: You know, a Kubernetes manifest or a Kubernetes spec. It follows the same exact principles as everything else in Kubernetes, where it is sort of a declarative approach with key-value pairs, and support for nesting and collections and things like that. Any Kubernetes object that you would try to create would have a kind, and so that's the type of thing it is; this is an Event.

B: However, the kind of a thing doesn't create a unique definition, and this is because parties can create custom resources, for example. So if my company, let's say RX-M, creates an Event type of resource, and then let's say another company creates an Event type of resource, how would you disambiguate? Well, there's an API group that you would organize those kinds under, and so, as you can see here, this is an audit.k8s.io-based Event, and it also has a version. So it takes three pieces to put together a complete

B: kind, actually: you need the group, you need the kind itself, which is subordinate to the group, and then you need a version. And so there are other types of events, and this can be confusing to people first getting into audit logging. The typical Kubernetes event that you deal with is a control plane event; that is not an audit event.

B: So there are different kinds of events; keep an eye on that API group to know which type of event you're dealing with. And then these guys have a number of other bits of information, which makes them a little bit different from a typical Kubernetes resource. Generally, Kubernetes resources would support metadata, and the object in question would have a name. These events have identity, per se, but they're not named; they're just events in a stream, and so it's not like a pod

B: that would have a name. Also, you'll note that these events aren't labeled. They're emitted only; we don't create them. They're an artifact of activity in the cluster, and so they're created by the API server in a stream as things happen. So they're a little bit different from your traditional resource, but the format is kind of similar. Now, you will note down at the bottom here that we have annotations, and these are just exactly like annotations in a typical Kubernetes resource.

B: They give us the ability to expand on the functionality of the audit log event without damaging the overall spec. So, for example, if you create a pod and you want to tell some CNI plugin something special, maybe the CNI plugin has some tricky dual networking functionality and you want to tell it to put you on the B network, you could use an annotation.

B: Kubernetes doesn't know anything about multiple networks, but by plugging an annotation in there you're creating a key-value pair that Kubernetes basically passes around everywhere but just ignores, and so all of the plug-in components and extension points in Kubernetes are often going to use these annotations to augment the functionality of a particular thing. In the case of audit logging, that's exactly true. So if we have, for example, an admission controller that we've added to our cluster through a webhook, well, Kubernetes can create audit events that say, "hey,

B: this thing got denied because the pod security policy denied it." But if the webhook, which is not part of Kubernetes, denies this, we need to maybe have some reasons why; or if it mutates the request, we might want to know what the mutation was. All those types of things can be represented in annotations. So annotations really give us a lot of flexibility here.
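For instance, an event's annotations block might look like the following; the two authorization.k8s.io keys are annotations the API server genuinely adds, while the webhook key is a made-up example of the kind of thing an admission webhook might attach:

```shell
# Hypothetical annotations fragment from an audit event:
cat > /tmp/annotations-fragment.json <<'EOF'
{
  "annotations": {
    "authorization.k8s.io/decision": "allow",
    "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding",
    "mutation.webhook.example.com/patched": "true"
  }
}
EOF
python3 -m json.tool /tmp/annotations-fragment.json
```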
B: Some other things that you'll note, which we're going to talk about in a second, are that there are stages of processing, and so you can record events at a given stage of processing, or multiple stages of processing if you like. We have the user, as we described, who was involved here; this is a particular node, so that's the hostname of the node that made this API get request, and you can also see the URL, so this was the request URI.

B: This particular node was getting the /api/v1/nodes information on itself, which it is allowed to do, and which it is going to do on a regular basis to get updated information about itself. This is an interesting thing about Kubernetes: you have to remember that when you submit a specification to Kubernetes, the API server basically verifies it from a security standpoint and then dumps it into etcd.

B: There is no guarantee that that means it's going to be okay, or work. Things asynchronously kick off after that: the scheduler assigns pods to nodes, and if there's no node available, your pod might be pending. If the pod does end up on a node, but the image that you've specified in the pod is no good,

B: that's going to cause the kubelet to not be able to pull the image, and it's not going to run. But in all cases, as far as the API server was concerned, the spec was good and it saved it to etcd. So you also have to have a fair amount of understanding of Kubernetes to be able to follow through with some of these events, because the ways that you would find out these other things would be after the fact.

B: The user posted that pod spec, sure, and there were no errors, but that doesn't mean it's okay. The scheduler might attempt to do something and report that it couldn't be scheduled. The kubelet might report a status of ImagePullBackOff and be failing to pull the image and continue retrying and reporting that. So you can find lots of different pieces of the puzzle, and wiring that all together is definitely a skill that you develop through practice.
B: So one would suggest, then, that if you find that audit logging is going to be an important part of your operational environment, working with audit logs and starting to craft some experience and dashboards and things like that through your backend log management systems, whatever they may be (Splunk, or Elasticsearch and Kibana, or Grafana on Loki, or whatever it is), is worth doing.

B: Getting prepared and developing some skills ahead of time can really pay dividends when you're in a scenario where there's a failure or some security event that you need to deal with. So what is the definition of the fields in the audit event, and how are they all organized? Well, that's a good question, and I'm just going to pull up something that you're probably familiar with here: go to the kubernetes.io docs and pull up the API reference.

B: So if you want to know how to specify resources, the API for Kubernetes is essentially these JSON documents; I mean, you get, post, put, and delete these things, but all of the activity that's taking place is in response to these documents. And so, as I mentioned, if I go ahead and search for "event" and we look down the left-hand side here, you can see that there's a metadata API section and there's an Event defined there.

B: These are all the API resources known to this particular API server, and so these are things you can post and put and delete through the API server's API. But you'll see that there's a bunch of these resource types, pods and the early guys that were there with version one, that don't have an API group; that's the core group, so those guys are always in the core group. And then you've got a bunch of these guys that have different groups, depending on what they do.

B: Look for events: you can see that we've got Events and Event, right, events.k8s.io. So this is a completely different group; it's not the same resource as the audit log Event resource. That one's not part of the API; it's just a format for audit information being emitted. So we're kind of stranded here, because you can search around and you're just going to find these legacy, old API versions.
B: So there's limited information here about the format of the audit events and things like that, and so, whenever you're in doubt, go to the sources, because Kubernetes being open source is a huge benefit. The quality of the code in Kubernetes is pretty dang high, because there's all of this governance involved in how changes are made, and reviews, and minimum requirements for documentation in the code.

B: So, as you can see here, all we really need to do is go to Kubernetes on GitHub, move down to the apiserver package, and then look in the audit v1 types.go file, and you're going to find definitions of all of the types of things that the audit subsystem uses.

B: So you'll find information here, for example, on the Event struct, and every single field is described, and you can of course even see the data types that are being used. If you can read Go, which isn't that hard to ingest if you know any programming language, you've really got a leg up. So you can find detailed definitions for events and all the different fields here if you're running into something that you're not familiar with. But let's cover a few of the key things. API servers process requests in stages: they authenticate users, they authorize users, they admit resources as a final stage in the security processing, and there are other things that happen as well. And so, from an audit standpoint, receiving the request is something that we can log.
B: If we want to. This is the first stage: RequestReceived is generated as soon as the audit handler receives the request, and so if you're interested in every single request that's made, that's your stage. Next, ResponseStarted: this is after the response headers are sent, but before the response body is sent, so this would apply to a long-running request like a watch request or something like that. And then ResponseComplete; again, this is the response body complete,

B: so after there are no bytes left to send. And then there's also Panic: in Go programming, a panic means something pretty catastrophic has happened, and while you might be able to recover from that, most of the time it means the API server would crash, so it's pretty serious. So those are the stages that you can see identified in the events, and you can also use these stages for filtering events, too.
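As a quick sketch of that kind of stage-based filtering, here's one way to count events per stage in a log file; the three sample lines are fabricated, but the stage names are the real ones:

```shell
# Fabricated three-line audit log for illustration:
cat > /tmp/audit-sample.log <<'EOF'
{"stage":"RequestReceived","verb":"get"}
{"stage":"ResponseComplete","verb":"get"}
{"stage":"ResponseComplete","verb":"create"}
EOF

# Count events per stage (python3 used so this works without jq installed):
python3 - <<'PY'
import json, collections
counts = collections.Counter()
with open("/tmp/audit-sample.log") as f:
    for line in f:
        counts[json.loads(line)["stage"]] += 1
for stage, n in sorted(counts.items()):
    print(stage, n)
PY
# prints:
#   RequestReceived 1
#   ResponseComplete 2
```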
B: As we'll see. So audit levels control the amount of data emitted for an event. None means you're not going to log anything; this is the default, so if there's no policy specifying what to do, nothing's going to happen. Then you've got Metadata: this is basically going to log all the high-level stuff, like the header type of information that you would have in an HTTP sort of exchange. So it's going to log the user, the timestamp, resource information, the verb used.

B: So you can basically see what's going on, but you won't be able to see the details. You'd see, for example, that somebody's creating a particular pod, but you wouldn't see what the pod spec is. Now, that's going to do two things: by sticking with Metadata, you're going to be able to see broad activity in your cluster, you're going to be able to know what kinds of things are happening and which objects they're happening to, but you won't have the details; you won't know specifically.

B: Then there's the Request level, which adds the request body on top of the metadata. If you're going to capture the request, which is often a big piece of the puzzle, then you might want to think about capturing the response as well, with RequestResponse, so that you can see the response body coming back. Though again, if there's a lot of activity on your cluster that's looking things up constantly, then the responses could be large, and that could not just incrementally but potentially multiplicatively increase the amount of logging. So each of these levels gives you progressively more log output, and that means you're going to have to have more capacity and throughput in your log backend.
B: So here's the audit policy. An audit policy is, again, a lot like a typical Kubernetes resource, and it is saved in a file and provided to the API server in order to allow it to decide what kind of auditing you'd like to do, and the audit policy file is incredibly powerful.

B: It allows you to get very, very specific about the types of things that you want to capture. For example, you've got high-level specifications that you can add, like omitting the RequestReceived stage; that's sort of a global policy. Then you've got individual rules, and these rules specify the level: so in this case RequestResponse, down here Metadata, down here None. And then you've got the resources, and so you've got the group. This is the core group; the empty string is the core group.
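Pulled together, a policy file along those lines might look like this; a minimal sketch, with the particular resource and level choices being illustrative rather than a recommendation:

```shell
cat > /tmp/audit-policy.yaml <<'EOF'
apiVersion: audit.k8s.io/v1
kind: Policy
# Global setting: skip the noisy RequestReceived stage entirely.
omitStages:
  - "RequestReceived"
rules:
  # Full request and response bodies for pod operations:
  - level: RequestResponse
    resources:
      - group: ""              # "" selects the core API group
        resources: ["pods"]
  # Header-level information only for configmaps:
  - level: Metadata
    resources:
      - group: ""
        resources: ["configmaps"]
  # Drop everything not matched above:
  - level: None
EOF
```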
B: By doing some forensics around what you actually need, you can sometimes carve out 10, 20, 30, 40 percent of the I/O just by carefully blocking off certain bits of logging using level None. So it's a really useful tool, very powerful, and it gives you lots of granular control. So again, where do we get the details? This is not a Kubernetes API server resource; it's an audit subsystem configuration file.

B: But the rules are kind of the interesting part. You've got the users component of the rule, for example, and user groups. If you're familiar with RBAC, if you've done any Kubernetes security (and I would say that's almost a prerequisite to working with the audit log; if you're in this space and you're doing this kind of stuff, you may be a security professional, or that's a hat you wear), audit logging involves similar types of constructs.

B: In RBAC you're going to give a particular principal, a user, a group, or a service account, some capability: some verb on some object, some resource type. And those resource types, again, can be scoped by the group, and then they can even be scoped down to a specific named resource of a kind, and that just kind of carries on here. So if you're familiar with RBAC, the audit rules are very similar.
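A couple of illustrative rules showing that principal-based scoping; the rule fields (level, userGroups, verbs, resources) are the real ones, while the particular choices are just examples:

```shell
cat > /tmp/audit-principal-rules.yaml <<'EOF'
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Don't log the constant read-only traffic from the node group:
  - level: None
    userGroups: ["system:nodes"]
    verbs: ["get", "list", "watch"]
  # Log secrets access at Metadata only, so request/response bodies
  # never put secret data into the audit log:
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
EOF
```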
B: So, as we mentioned, auditing's not cheap: it increases the memory consumption of the API server. Remember from the model that we saw, only the API server is involved in auditing, so you don't really have to worry about the activity from the controller managers or the other guys.

B: It's really the API server, and so running top on your system, getting some baselines of your server without auditing, and then maybe progressively ratcheting up the policy to increase the amount of auditing while watching your resource consumption on the API server side, is not a bad thing to do. That way you can sort of get a sense for where the diminishing returns are. If you basically log everything, it's going to be crazy:

B: every byte in is going to be magnified by two, because you're going to be writing it to the audit log with a bunch of extra metadata. So getting a sense for the throughput capabilities of your system, and the memory capabilities, is going to be important. Memory is a key piece of the puzzle, because the API server's audit subsystem is going to capture these events in memory.
B: So how do we set up the API server? Well, the API server has some 30 audit logging options, and the most important one is perhaps --audit-policy-file; that's the file to actually use to define your audit policy, and a lot of times people start their thinking there.

B: This file, you can put it in a bunch of different places, but at the end of the day, in a kubeadm scenario, you would probably put it in a protected directory and then you would hostPath mount it into the API server container. That would be a typical scenario.

B: So the next thing that we've got... and let me just maybe show you a quick example of that first.

B: There you go, and so you can see this guy. This guy's got a host path for /var/log/audit; that's where the log output is going. But the policy file, as it turns out, is also there; a lot of times the policy file would be in /etc/kubernetes or something like that. And there's the mount path inside the container, so it's the same as on the host, and then in the configuration of the API server, we run the kube-apiserver with our configuration.
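A sketch of the relevant pieces of such a kubeadm-style static pod manifest; the flags are real kube-apiserver audit switches, while the paths are typical choices rather than requirements:

```shell
cat > /tmp/kube-apiserver-audit-fragment.yaml <<'EOF'
spec:
  containers:
  - command:
    - kube-apiserver
    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
    - --audit-log-path=/var/log/audit/audit.log
    - --audit-log-maxage=30      # days of old audit files to retain
    - --audit-log-maxbackup=10   # rotated files to keep
    - --audit-log-maxsize=100    # megabytes per file before rotation
    volumeMounts:
    - mountPath: /etc/kubernetes/audit-policy.yaml
      name: audit-policy
      readOnly: true
    - mountPath: /var/log/audit
      name: audit-log
  volumes:
  - name: audit-policy
    hostPath:
      path: /etc/kubernetes/audit-policy.yaml
      type: File
  - name: audit-log
    hostPath:
      path: /var/log/audit
      type: DirectoryOrCreate
EOF
```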
B: So those are two key configurations, and you can run the Kubernetes API server with --help and that'll dump out all the switches, and there are a lot. You can also use the documentation for reference if you want to, but this is a complete list of the current (as of Kubernetes 1.19.3) audit log options.

B: Now, you have two possible backends for the API server's audit logging. One of them is a local file (it doesn't actually have to be local, but a file), and the other one would be a webhook; there's a posting protocol for the webhook to receive all the events, and in either scenario there are lots of settings.
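The webhook backend is configured with a kubeconfig-format file handed to the API server; a sketch, where the collector URL and names are placeholders:

```shell
cat > /tmp/audit-webhook.yaml <<'EOF'
apiVersion: v1
kind: Config
clusters:
- name: audit-collector
  cluster:
    server: https://audit-collector.example.com/events   # placeholder URL
contexts:
- name: default
  context:
    cluster: audit-collector
    user: ""
current-context: default
EOF

# The API server is then pointed at it with flags such as:
#   --audit-webhook-config-file=/etc/kubernetes/audit-webhook.yaml
#   --audit-webhook-mode=batch   # buffer events and POST them in batches
```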
B: You know, the basic configuration stuff. So now that we've kind of got an idea of audit logging, we've seen some events, we know how to configure servers, and we understand this policy thing, what are some of the concerns that you run into in practice with this? Well, one of the first things that you run into is having multiple API servers, because nobody in a production environment is going to have one API server; if the API server goes down, your cluster is dead.

B: Two is probably fine for most clusters; that way, if one of them goes down, you still have the other one, and it's pretty unlikely you're going to lose two. And when you add API servers, you get some scaling ability, because the logic being processed by the API servers is now distributed. But really, at the end of the day... maybe I'll just go back here to this previous model.

B: Etcd is already a bottleneck just keeping up with the configuration of the cluster and the events that are happening, because those control-plane-level events are actually being stored in etcd for a period of time. But the audit log data, that's massive, so we have to have a completely different channel, totally independent,

B: for the audit log. And when you think about this, etcd is, in a production system, often running on a different cluster. So in a rolling production setup, you'd probably have a five- or seven-node etcd cluster, and so the API server cardinality is independent of etcd, if you're not running them on the same boxes. A collapsed etcd/API server topology, where the etcd nodes are running on the same machines as the API servers, is an okay way to do things, but if you're doing things that way, then what defines the number of API servers you have is the etcd cluster size, not the API service itself.
B
Two
api
servers
is
fine
for
high
availability
for
most
clusters,
but
here's
a
problem
when
you
have
multiple
api
servers
and
let's
say
you
have
three
you're
going
to
need
a
load
balancer,
and
so
you
might
set
up
you
know
if
you're
using
google
cloud,
you
might
use
a
network
load
balancer
on
amazon
or
something
google
cloud
use,
their
load,
balance
or
azure
use
their
load
balancer
to
basically
front
end.
B
your API servers. All your kubeconfigs are going to refer to the TCP load balancer, so clients hit it and it just forwards the traffic on to one of the API servers, and you don't know the difference. But there's a health check, so that when one of those servers crashes, you only get sent to the ones that are alive.
B
Well, that's great, but the downside is you don't know which one of those servers is going to be dumping out the audit log information you're interested in. So you could do something like run a pod, as you can see down here, and then tail the audit log looking for some sort of pod activity and not see it, because you're looking at the wrong audit log: you came in and hit this server, this server logged the activity, and the other two logs don't have anything about it.
B
So the API servers are shared-nothing, right, they're microservices. They don't know what's going on in the other servers; they're really focused on being as independent as possible. So the downside is your audit log is now sharded, basically, and to unshard it you're going to run something like a Fluentd, a Logstash, a Beat, a Fluent Bit, or a Splunk forwarder,
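As an illustration of that unsharding step (the paths, host, and index here are made-up assumptions; the real values depend on your cluster), a minimal Fluent Bit sketch that tails one API server's audit log file and forwards it to a shared back end might look like this, with one such shipper running beside each API server:

```shell
# Hypothetical Fluent Bit config: tail this server's audit log (the file
# set by --audit-log-path) and ship it to a shared Elasticsearch back end.
# The host, port, path, and index name are illustrative assumptions.
cat > /tmp/fluent-bit-audit.conf <<'EOF'
[INPUT]
    Name    tail
    Path    /var/log/kubernetes/audit.log
    Parser  json
    Tag     audit.*

[OUTPUT]
    Name    es
    Match   audit.*
    Host    elasticsearch.logging.svc
    Port    9200
    Index   k8s-audit
EOF
```

With one of these next to every API server, a query against the shared index sees every server's events, so the "wrong audit log" problem above goes away.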
B
something to move that log data into a back end where you can get a complete picture of what's going on in the cluster. And that's important. Another thing about these distributed API servers is that on the upside you get scaling. So if you've got huge throughput going into your audit log, you just divided it by three by having three API servers. So as long as your network can handle it and you've got
B
the bandwidth on the actual wire, you're using three NICs, three sets of memory, three sets of disks, whatever the case may be. You've really got scale there. So this is one way in which having multiple API servers can in fact have a dramatic impact on your scaling challenges, because when you're not using audit logging,
B
I mean, one API server can handle a pretty honking big cluster; it's the etcd cluster that's always the bottleneck. So two API servers is good, three is even better, but if you had audit log challenges you might want to go to four or five and get your audit logs scaled out using more API servers. And remember, the cardinality can be completely different from the etcd cluster, because usually that's a separate cluster of servers.
B
Another thing people often stub their toe on for a day or two is using ConfigMaps. ConfigMaps are awesome for configuring things, and you might say, hey, I'm running the API server in a pod, so why don't I set up the audit policy as a ConfigMap? You could. But what happens when you start the cluster?
B
You need to make a request to an API server, which is going to hit etcd and give you the ConfigMap, but there's no API service yet. There's a chicken-and-egg problem. So most people skip that and use some other technique to standardize their policies. But this is another pitfall, right? What if API server one has one policy, and server two has a different policy, and server three has a different policy?
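Because a ConfigMap can't be fetched before the API server is up, the policy is normally a plain file on each control plane node's disk, passed to kube-apiserver with --audit-policy-file and kept byte-identical across servers. A minimal sketch of such a policy file (audit.k8s.io/v1 and the level names are the real API, but which resources get which level is an illustrative choice, not a recommendation):

```shell
# Example audit policy file; keep this identical on every control plane node.
cat > /tmp/audit-policy.yaml <<'EOF'
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata          # never record secret payloads in the log
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  - level: Request           # record request bodies for pod changes
    resources:
      - group: ""
        resources: ["pods"]
  - level: Metadata          # everything else: who, what, when only
EOF
```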
B
I
mean
there.
There
could
be
excuses
for
doing
that.
It
is
totally
possible,
but
it's
going
to
give
you
weird
asymmetric
log
output
right
from
the
different
servers,
so
in
most
cases
that
I've
run
across
you
probably
want
them
to
be
the
same,
and
you
might
want
to
have
some.
You
know
sort
of
immutable
infrastructure,
ansible
salt.
You
know
whatever
type
of
things
it's
you
know,
keeping
those
files
in
sync
or
have
them
from
a
shared
disk.
You
know
or
something
so
next
thing
to
talk
about.
We've
got
a
few
more
things
here.
B
I
know
I
think
we
have
to
wrap
up,
but
I'll
try
to
hit
a
couple
more
things
here
and
then
we'll
see.
If
we
can
get
some
time
for
questions
so
mutating
admission
controller
webhooks,
it
can
be
useful
to
know
which
mutating
web
hook
mutated
an
object
in
an
api
request
and
if
you've
got
a
you
know
if
you've
got
a
bunch
of
plugins
into
the
api
server
that
are
potentially
changing
the
nature
of
a
resource
that
somebody
created
by
default.
The
api
server
won't
know
anything
about
it
right.
B
It's
going
to
call
these
guys,
and
you
know
it's.
It's
they
do
what
they
do
and
then
the
the
api
server
just
you
know,
moves
on
to
the
next
unit
in
the
chain,
and
so
what
we
want
to
be
able
to
do
is
see
where
in
the
chain
you
know,
change
a
happened
and
where,
in
the
chain
change
b
happens.
So
a
popular
example
would
be
istio,
for
example,
and
the
istio
proxy
injector.
B
So
I
create
a
pod,
I'm
oblivious,
I
just
you,
know,
wrote
my
app
and
I
put
it
in
a
pod
and
I
go
to
deploy
it.
And
now
the
the
api
server
says.
Oh,
we
have
a.
We
have
a
you
know,
a
mutating
admission
controller
here
that
wants
to
mess
with
this
pot
and
what
is
he
going
to
do
he's
going
to
add
to
the
pod
of
an
init
container?
B
But
that
mutation
is
fairly
complex
and
it
could
interact
in
weird
ways
with
other
mutations
and
it
becomes
hard
to
sort
of
figure
out
what's
going
on
unless
you
have
some
way
to
introspect,
and
this
is
exactly
what
mutating
admission
controller
web
hooks
can
do
by
using
annotations.
So
you
can
see
in
the
example
here,
mutation
web
hook
and
mission
controller
case
io
round
one
index
two.
So
if
you're
familiar
with
admission
controllers,
we
have
you
know
different
phases
so
round
one
is,
is
the
first
pass,
but
then
we
once
everything's
mutated.
B
You
might
also
have
you
know
an
admission
controller,
that's
going
to
allow
or
disallow
only
and
so
that
that
machine
controller
you
know,
would
come
up
in
another
round,
and
so
we
have
these
different
rounds
and
then
we
have
the
indexes.
So
in
this
case
we're
the
the
second
of
the
of
the
mutating
controllers,
and
so
we
have
the
configuration
and
we
specify
some
configuration
data.
We
have
the
web
hook
information
and
then
we
have
the
status
of
whether
this
guy
mutated
it
or
not.
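As a sketch of what that looks like in an audit event (the configuration and webhook names below are invented for illustration), the annotation key encodes the phase and the webhook's position in the chain, and the value records whether this webhook actually changed anything:

```shell
# Invented sample of a mutating-webhook audit annotation. The key names the
# round and the index in the chain; the value reports mutated true/false.
cat > /tmp/audit-mutation.json <<'EOF'
{
  "annotations": {
    "mutation.webhook.admission.k8s.io/round_1_index_2":
      "{\"configuration\":\"example-injector.example.com\",\"webhook\":\"inject.example.com\",\"mutated\":false}"
  }
}
EOF
grep -o 'round_1_index_2' /tmp/audit-mutation.json   # prints: round_1_index_2
```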
B
So
in
this
case,
this
this
controller
did
not
mutate
the
resource,
and
so
that's
a
nice.
You
know
piece
of
documentation
that
you
can
now
get
from
your
system.
You
can,
you
can
see,
you
know
if
you're
mucking,
around
with
admission
controllers,
you
can
really
look
in
and
get
a
deep
dive
into.
What's
going
on,
another
thing
that
we
can
do
is
we
can
specify
the
actual
mutation,
and
so,
if
you
have
a
request
level,
you
know
audit
or
higher.
B
This
is
all
you'll
get
by
the
way,
if
you
are
just
a
metadata
level,
so
just
yes
or
no,
I
mutated
this
object,
but
if
you're
at
request
level
or
higher,
which
is
more
detailed,
then
as
you
can
see,
we
get
the
patch
right,
and
so
you
actually
have
the
you
know
the
the
information
about
how
things
were
changed,
which
is,
you
know,
really
really
can
be
useful
again
in
debugging
scenarios.
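At Request level and above, a companion annotation carries the patch itself. A made-up example of what that can look like (the webhook names and patch content here are invented; the key format follows the slide's round/index convention):

```shell
# Invented sample of the patch annotation available at Request level or
# higher: the JSONPatch the webhook applied, e.g. adding an init container.
cat > /tmp/audit-patch.json <<'EOF'
{
  "annotations": {
    "patch.webhook.admission.k8s.io/round_1_index_2":
      "{\"configuration\":\"example-injector.example.com\",\"webhook\":\"inject.example.com\",\"patchType\":\"JSONPatch\",\"patch\":[{\"op\":\"add\",\"path\":\"/spec/initContainers\",\"value\":[{\"name\":\"init-proxy\"}]}]}"
  }
}
EOF
```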
B
Okay,
so
I
think
a
couple
more
things
and
then
we'll
wrap
up
here
so
auto
log
monitoring
the
the
api
server.
It
has
two
open
metrics
style,
metrics,
endpoints
or
metrics
metrics
in
its
slash,
metrics
endpoint,
and
one
of
them
is
api
server,
audit
event
total.
So
that's
the
cumulative
total
of
audit
events
and
then
there's
api
server.
Audit
error
total.
So
that's
the
total
number
of
events
that
were
dropped
due
to
an
error
in
exporting,
so
I
hadn't
talked
about
this
just
yet.
B
But
if
you're,
if
you're,
you
know
dumping
huge
amounts
of
information
to
a
back
end
like
an
elastic
search
or
a
fluency
aggregator
or
something
and
you're
overwhelming
it
with
ios
one
way
to
fix
that
problem
is
to
batch
a
bunch
of
events
together
and
do
a
single
io
with
a
collection
of
events,
and
so
you
can
reduce
the
number
of
ios
by
a
factor
of
10
by
just
collecting
every
10
events
and
submitting
them
as
a
unit
and
that
often
solves
problems.
B
Another
thing
that
you
can
have
is
you
can
have
you
know,
sort
of
up
and
down.
You
know
performance
in
these
aggregators
because
they
might
be
servicing
lots
of
other.
You
know
log
streams,
and
so
you
might
need
to
buffer
your
output.
You
might
send
them
a
batch
and
then
a
whole
nother
batch
and
another
batch
another
batch.
You
might
have
five
or
six
batches
waiting
and
then,
as
soon
as
they
process
that
first
one
then
they
might
catch
up.
B
So
you
need
to
sort
of
look
at
what
the
lag
is
and
figure
out
a
buffer
size
that
also
works
for
them,
and
so,
if
you
end
up
running
out
of
buffer
you're
going
to
drop
events-
and
this
will
tell
you
if
you're
doing
that,
so
those
are
both
really
important,
because
the
first
one
event
total
is
going
to
give
you
an
ability
to
sort
of
estimate
and
discover
spikes.
So
if
you
have
a
prometheus
or
something
system,
monitoring
the
metrics
from
your
api
server,
you
can
plot
that
and
look
for
anomalies
or
trends.
B
So,
there's
just
an
example:
dump
running
a
cube,
kiddo
proxy
on
a
machine
because
you
know
to
avoid
all
the
tls
stuff
that
you
need
to
get
the
metrics
and
then
just
curling
the
proxy
to
get
through
to
the
api
server
on
the
metrics
endpoint,
and
you
can
see
that
we've
got
the
audit
event
in
this
case
is
what
we're
grabbing
for
the
total
and
you
could
pull
up
the
error
in
the
same
way.
So
that's
some
of
the
metrics
other
things
handling
massive
throughput.
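Roughly what that example dump does (the first two commands assume kubectl access to a live cluster, so they're shown as comments): proxy locally to sidestep client TLS, then scrape and filter /metrics. The same filter works against a saved sample of the output:

```shell
# Against a live cluster you would run something like:
#   kubectl proxy --port=8001 &
#   curl -s http://127.0.0.1:8001/metrics | grep apiserver_audit
# Here the identical grep is applied to a saved sample of that output:
cat > /tmp/metrics-sample.txt <<'EOF'
apiserver_audit_event_total 14991
apiserver_audit_error_total{plugin="log"} 0
EOF
grep apiserver_audit /tmp/metrics-sample.txt
```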
B
So
we
talked
about
batching,
blocking
and
and
strict
blocking
batching
is
where
you're
gonna
buffer
events
asynchronously
blocking
you're
gonna,
actually
block
the
api
server
responses
until
the
event
is
processed.
So
that's
you
know,
that's
pretty
draconian
and
will
impact
users
and
then
blocking
strict
is
the
same
as
blocking.
But
when
there's
a
failure
during
the
audit
logging,
the
whole
request
is
rejected
to
the
user.
So
that's
even
more
strict,
so
batches.
B
You
know
typically
what
what
people
would
set
to
for
their
the
buffering
strategy
and
then
there's
a
you
know:
buffer
sizes
wait
times
throttling
all
sorts
of
things
that
you
can
set
up
here
to
help
control
the
throughput.
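For reference, those knobs are kube-apiserver flags on the webhook back end; a sketch of a batched setup follows (the values are illustrative, not recommendations, and this is a config fragment rather than something to run as-is):

```shell
# kube-apiserver audit webhook flags (sketch; tune values to your back end):
#   --audit-webhook-mode               batch | blocking | blocking-strict
#   --audit-webhook-batch-max-size     events per POST, i.e. one I/O per N
#   --audit-webhook-batch-max-wait     flush a partial batch after this long
#   --audit-webhook-batch-buffer-size  events held while the back end lags;
#                                      overflow is dropped and counted in
#                                      apiserver_audit_error_total
kube-apiserver \
  --audit-webhook-config-file=/etc/kubernetes/audit-webhook.yaml \
  --audit-webhook-mode=batch \
  --audit-webhook-batch-max-size=10 \
  --audit-webhook-batch-max-wait=5s \
  --audit-webhook-batch-buffer-size=10000
```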
B
Other
considerations
remember
that
each
api
server
is
independent,
shared
nothing
and
so
scaling
them
can
can
give
you
some
scale.
So
if
your
back
end
web
hook
is
the
bottleneck,
you're
gonna
have
to
think
about
that
as
well,
but
you
can
scale
the
api
servers
to
scale
the
stuff
out.
Okay,
I
think
we're
getting
pretty
close
to
the
end
of
time
here,
so
I'll
wrap
up
thanks
a
bunch
really
appreciate
your
time
and
maybe
we'll
see
about
some
questions.
A
B
types.go is the source that defines the structure, so I'm not positive. I'll see if I can find out, though, and maybe I can post an answer with a follow-up in the
C
When we upload things.
A
B
Sure, okay, let me
B
find this picture. So, the etcd cluster.
C
Just
gonna
see
if
I
could
draw
something,
but
I
don't
think
I
oh
yeah.
I
can't
here
we
go.
B
I
used
so
many
different
darn
presentation
tools.
These
days
I
have
to
keep
track.
So
if
you
have
an
ntd
clutter,
let's
say:
let's
do
it
simple?
Let's
just
keep
an
example
of
three
that
ftd
cluster
with
three
nodes
is
going
to
use
the
raft
consensus
protocol
to
elect
a
leader,
and
so
let's
say
this
guy's
the
leader.
B
So
if
you've
got,
let's
say
three,
let's
say:
you've
got
five
these
guys
just
to
make
it
a
little
bit
more
interesting,
so
say:
you've
got
five
sed
nodes
and
you've
got
three
api
servers,
so
the
api
servers
are
all
going
to
write
to
the
leader
and
in
general
they're
going
to
read
from
the
leader
too,
and
you
might
say:
oh
my
god,
that's
terrible
because
the
more
api
servers
you
add,
the
more
load
you're
creating
on
that
leader
and
while
that's
true
at
the
end
of
the
day,
the
etsyd
cluster
is
highly
consistent.
B
When
you
write
to
the
leader,
it
has
to
write
that
data
to
all
of
the
other
nodes
in
the
cluster
and
furthermore,
because
it's
a
highly
consistent
key
value
store.
It
has
to
know
that
a
quorum
of
the
nodes
have
committed
the
data
and
so
etd
becomes
the
bottleneck
in
most
cases,
when
you're,
when
you're
you
know
experiencing
control,
plane,
challenges,
and
so
the
you
know,
the
api
servers
are,
are
they're
anonymous,
faceless,
identity-less,
microservices,
you
put
a
load
balancer
in
front
of
them.
B
You
hit
the
load
balancer,
you
don't
care
which
one
you
get
because
no
matter
which
one
you
talk
to.
It's
always
going
to
give
you
the
same
picture
of
the
world,
because
you
have
this
highly
consistent
key
value
store
that
stores
all
the
state.
No,
the
api
servers
have
some.
You
know
caching
and
things
like
that,
but
at
the
end
of
the
day,
everything
comes
from
fcd.
B
So
if
you
ask
number
one
two
or
three
of
those
masters,
what
pods
are
out
there
they're
all
going
to
give
you
the
same
answer,
and
so
the
the
state
in
etd
is
is,
is
the
the
real
you
know,
bottleneck
the
management
of
that
and,
unfortunately
adding
more
nodes
to
cd
slows
it
down
the
fastest
scd
cluster
is
a
single
node
because
he
doesn't
have
to
copy
the
data
to
anybody,
and
so
the
reason
you
need
to
have
multiple
nodes
and
that
production
systems
are
like,
usually
five
or
seven.
B
Is
that
if
you,
for
example,
want
you
know,
diversity
and
and
failure,
tolerance
resilience
which
you
want.
You
know.
In
most
cases
you
can
have
three
availability
zones,
for
example
in
the
cloud,
and
you
can
run
your
std
cluster
across
all
three
and
if
you
lose
any
az,
you
still
have
a
quorum
right.
Quorum
is
n
over
two
greater
than
so.
If,
in
this
case,
five
over
two
that's
two
and
a
half,
so
three
would
be
the
next
higher
integer.
So
any
three
of
these
guys
and
we're
good.
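That arithmetic (the quorum is the smallest majority, floor(n/2) + 1) is easy to sanity-check; shell integer division gives it directly:

```shell
# Quorum = floor(n/2) + 1; failures tolerated = n - quorum.
# Matches the talk's example: 5 / 2 -> 2, plus 1 -> quorum of 3.
for n in 1 3 5 7; do
  echo "members=$n quorum=$(( n / 2 + 1 )) tolerates=$(( n - (n / 2 + 1) ))"
done
```

So a five-member cluster tolerates two failures, and seven tolerates three, at the cost of replicating every write to more followers.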
B
So
we
could
lose
a
whole,
a
z
or
if
a
node
crashes
you
can-
and
let's
say
you
take
a
node
down
for
maintenance.
You
can
still
have
another
node
crash
and
be
okay.
Seven
is
a
little
bit
safer
than
that,
but
it's
a
little
bit
slower.
So
that's
sort
of
the
relationship
right,
stateless
microservices,
the
api
server,
usually
behind
a
load.
B
Balancer
like
this,
would
be
like
a
kubernetes
service
sort
of
right,
but
in
the
case
of
a
load
of
api
servers,
you'd
usually
use
not
always
but
usually
use
something
else,
because
you
want
people
to
be
able
to
access
the
cluster
who
are
not
in
it,
and
so
you
know,
services
a
load,
balancer
service,
you
know
you
know,
could
make
sense.
But
again
you
you
got
to
worry
about
chicken
and
egg
problems
when
you're
creating
resources
in
the
cluster
to
access
the
api
server.
B
The
api
server
is
the
thing
that
gives
you
access
to
those
resources,
so
usually
some
sort
of
external
load
balancer
in
front
of
the
api
servers
and
then
the
api
servers
communicating
with
the
leader
of
the
std
cluster
and
the
entity.
Cluster
leadership
is
dynamic,
so
all
the
api
servers
are
typically
going
to
know
about
all
the
fcd
servers.
C
A
Thank
you
very
much
randy,
and
I
want
to
thank
you
again
for
a
wonderful
presentation.
Unfortunately,
we
are
out
of
time.
I
would
like
to
thank
everybody
for
joining
us
today
and,
as
I
said
before,
today's
webinar
and
slides
from
today's
presentation
will
be
available
on
the
cncf
webinar
page
at
cncf,
dot,
io
webinars.
A
Thank
you.
Everyone
for
attending.
Thank
you
again,
randy
for
a
wonderful
presentation.
Everybody
take
care,
stay
safe
and
we
will
see
you
at
the
next
cncf
webinar.