Description
Is your idea of fun sitting in front of a camera, live streaming to the internet, debugging and fixing a broken Kubernetes cluster? Doubtful.
What if these Kubernetes clusters were intentionally broken by members of the Kubernetes community, tasked with making your chances of fixing said clusters as slim as possible?
Join us today to learn the key methods, tools, and takeaways David has learnt fixing over 50 Kubernetes clusters live on his series, Klustered.
A
Right, so our next speaker probably needs very little introduction. Is he... are we bringing him on to the stage?
A
Is it happening? It's happening. So, probably the most prolific live streamer in our cloud native ecosystem. His media empire includes Rawkode Live, Looks Good To Me (LGTM) on Cloud Native TV and, of course, Klustered. It's David Flanagan.
How are you, buddy? I'm doing really well, it's nice to be here. How are you? I'm good, I'm good! So you've recently... you've got a little one, a new little one.
A
And he's not here today, so he missed his opportunity. Oh.
A
Almost certainly. Right, I will pass you the floor, and have at it.
B
Now, a little bit about me. I'm a senior developer advocate focused on cloud native and Kubernetes, and I work for a company called Equinix Metal. I'm also a CNCF Ambassador and an InfluxAce. I am the host of the official Kubernetes Office Hours, co-chair of Cloud Native TV, and host of LGTM, as Matt said, and my YouTube channel, which I devote far, far too much time to, is available at rawkode.live.
B
Now, Klustered is really good fun. We are, I think, 25 to 28 episodes in right now, depending on whether you count the teams, solos, and newbie... the newcomer editions. The general idea is really, really simple. I reach out to friends within the Kubernetes community, Kubernetes contributors and end users, and I give them a freshly baked Kubernetes cluster and tell them to break it, whichever way they can.
B
They don't give me or the other members of the Klustered episode any information on what has been broken. We go live, we share our screen, and we try to work through all of the broken bits, identifying symptoms and looking for cause and effect to see if we can get that cluster back online. On paper, it's really simple: all you have to do is upgrade the Klustered pod from image v1 to image v2.
B
Hopefully it's going to play. Yeah. We still have no cluster DNS. kube-dns has endpoints, okay, that's after eight. Yeah, let's see if Guy wants...
B
But the number being so high worries me. But we're now at the stage where I have to go and pick up my daughter, so...
B
So I'm going to take you on my path of failure from Klustered: all the assumptions that I made, the things that I got wrong, and the things that I learned. The first thing, and probably the most prolific attack surface that people use on Klustered to break these clusters, is the Linux system itself. Now, this is just a really short video.
B
Okay, it's an interesting start. Yeah, yeah, do an... do an ls -la on /usr/bin/chmod, or probably just /bin/chmod.
B
All right, so take a look at the screenshot. This is the teams edition. This was Red Hat versus Talos Systems, and Jiffy on the Red Hat team and his colleagues removed the executable permission from chmod. Not only that, all the files and binaries that we see in white here also had the executable permission removed.
B
Let's see: chattr, chmod, kubeadm, kubectl, oc (which they were using as an alias for kubectl), scp so they couldn't pull in other files, and even Perl. What I love about that is that it is really, really simple: a mistake that we could all make easily, and not something that everyone is fully aware of how to fix. So kubectl and kubeadm, you know...
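Before fixing anything, it helps to know how far the damage goes. The talk shows no code for this, so what follows is a minimal Python sketch, assuming a Python interpreter is still executable on the node and that /usr/bin is the directory worth checking. It only reads permission bits and changes nothing.

```python
import os
import stat

# Any of the three execute bits (owner, group, other).
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def lacks_exec(mode):
    """True for a regular file whose mode has no execute bit at all."""
    return stat.S_ISREG(mode) and not mode & EXEC_BITS


def files_missing_exec(directory):
    """List regular files in `directory` that nobody can execute."""
    flagged = []
    for entry in os.scandir(directory):
        # Do not follow symlinks; a symlink's own mode fails S_ISREG, so it is skipped.
        mode = entry.stat(follow_symlinks=False).st_mode
        if lacks_exec(mode):
            flagged.append(entry.path)
    return sorted(flagged)


if __name__ == "__main__":
    # /usr/bin is an assumption; add /bin, /usr/local/bin, etc. as needed.
    if os.path.isdir("/usr/bin"):
        for path in files_missing_exec("/usr/bin"):
            print(path)
```

Running it directly prints any regular file in /usr/bin that nobody can execute; symlinks are skipped rather than followed.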
B
However, what we learned in that episode is that you can actually use the dynamic linker directly to execute binaries on the machine that don't have the executable bit set. In this case, we're using the dynamic linker to run the chmod binary and re-enable the executable bit on chmod itself. I think that's a wonderful tip and a great debugging trick.
something
I
learned
during
custer
is
that
learns
file
attributes.
You
know
I'm
familiar
with
the
the
chamod,
the
777,
the
604s
etc,
but
there
are
extended
attributes
on
all
files.
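The attributes in question are the ones chattr sets and lsattr shows, such as the immutable flag, which blocks changes even by root. As an illustrative Python sketch (the talk doesn't show one), this parses lsattr output and reports files carrying the immutable or append-only flags; treating those two flags as the suspicious ones is an assumption.

```python
# Flags treated as suspicious on a tampered node (see chattr(1)):
#   i = immutable (cannot be modified, deleted or renamed, even by root)
#   a = append-only
SUSPICIOUS_FLAGS = {"i": "immutable", "a": "append-only"}


def suspicious_attrs(lsattr_output):
    """Map each path in `lsattr` output to the suspicious flags it carries."""
    findings = {}
    for line in lsattr_output.splitlines():
        if not line.strip():
            continue
        # lsattr prints "<flag field> <path>"; split once so paths may contain spaces.
        flags, _, path = line.strip().partition(" ")
        hits = [name for letter, name in SUSPICIOUS_FLAGS.items() if letter in flags]
        if hits:
            findings[path] = hits
    return findings
```

Feed it the text printed by lsattr /usr/bin; a flagged file can then be cleared, as root, with chattr -i (or chattr -a) on that path.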
B
You may be familiar with ebtables and nftables. Something I wasn't aware of until very recently, on an episode of Klustered, is the concept of traffic control: the tc command, for manipulating the packets on a device through quality-of-service rules. And, of course, we've got eBPF and XDP, which are being leveraged heavily by the Cilium project.
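The talk doesn't say which tc rules were used in that episode. One plausible way to break a node with traffic control, assumed here purely for illustration, is a netem qdisc that injects delay or packet loss on an interface. The Python sketch below scans tc qdisc show output for netem entries; the line format in the comments reflects typical iproute2 output and may differ between versions.

```python
def _after(tokens, key):
    """Return the token that follows `key`, or None if `key` is absent."""
    if key in tokens:
        i = tokens.index(key)
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def netem_qdiscs(tc_output):
    """Find netem qdiscs in `tc qdisc show` output and pull out their impairments."""
    found = []
    for line in tc_output.splitlines():
        tokens = line.split()
        # Assumed line shape: "qdisc <kind> <handle>: dev <iface> root ... <options>"
        if len(tokens) < 2 or tokens[0] != "qdisc" or tokens[1] != "netem":
            continue
        info = {"dev": _after(tokens, "dev")}
        for key in ("delay", "loss", "duplicate", "corrupt", "reorder"):
            value = _after(tokens, key)
            if value is not None:
                info[key] = value
        found.append(info)
    return found
```

If netem turns out to be the root qdisc on, say, eth0, tc qdisc del dev eth0 root removes it and restores the default.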
B
Now, what I didn't know about iptables before Klustered is that you can actually apply drop rules using the statistic module, which randomizes which packets get dropped and causes what appear to be intermittent errors at the iptables level. Particularly sneaky.
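In iptables-save output such a rule looks something like -A INPUT -p udp -m udp --dport 53 -m statistic --mode random --probability 0.3 -j DROP (the port and probability are made up for illustration). The Python sketch below, an illustration rather than anything shown in the talk, flags DROP or REJECT rules that use the statistic module, which is one way to track down otherwise unexplained intermittent failures.

```python
import shlex


def _option(tokens, flag):
    """Return the token after the first occurrence of `flag`, or None."""
    if flag in tokens:
        i = tokens.index(flag)
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def random_drop_rules(save_output):
    """Return rules from `iptables-save` output that DROP/REJECT via -m statistic."""
    findings = []
    for line in save_output.splitlines():
        if not line.startswith("-A "):
            continue  # skip table headers, chain policies and COMMIT lines
        tokens = shlex.split(line)  # shlex keeps quoted --comment values intact
        # "-m" can appear several times, so check every adjacent pair.
        uses_statistic = any(
            a == "-m" and b == "statistic" for a, b in zip(tokens, tokens[1:])
        )
        target = _option(tokens, "-j")
        if uses_statistic and target in ("DROP", "REJECT"):
            findings.append({
                "chain": tokens[1],
                "target": target,
                "mode": _option(tokens, "--mode"),
                # Present for --mode random; --mode nth uses --every/--packet instead.
                "probability": _option(tokens, "--probability"),
                "rule": line,
            })
    return findings
```

The same function works on ip6tables-save output, since the rule syntax is shared.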
Another common attack surface we've seen is people changing the DNS policy on the pods.
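A pod's dnsPolicy defaults to ClusterFirst when omitted; setting it to Default or None (with a custom dnsConfig) quietly moves the pod off cluster DNS. This Python sketch, a hypothetical helper rather than anything from the episode, reads kubectl get pods -A -o json output and lists pods that aren't on a cluster-DNS policy. Expect some legitimate hits: CoreDNS itself normally runs with dnsPolicy: Default.

```python
import json

# Policies that normally point a pod at cluster DNS. ClusterFirst is the
# default when dnsPolicy is omitted from the pod spec.
CLUSTER_DNS_POLICIES = {"ClusterFirst", "ClusterFirstWithHostNet"}


def unusual_dns_policies(pod_list_json):
    """Return (namespace, name, dnsPolicy, nameservers) for pods off cluster DNS."""
    findings = []
    for pod in json.loads(pod_list_json).get("items", []):
        spec = pod.get("spec", {})
        policy = spec.get("dnsPolicy", "ClusterFirst")
        if policy in CLUSTER_DNS_POLICIES:
            continue
        meta = pod.get("metadata", {})
        # With dnsPolicy: None, any resolvers come from spec.dnsConfig.
        nameservers = spec.get("dnsConfig", {}).get("nameservers", [])
        findings.append((meta.get("namespace"), meta.get("name"), policy, nameservers))
    return findings
```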
B
Here are some tips for working with networking. I'm not going to stand here and say everyone should use Cilium. Of course, you've got your own needs that have to be met, but Cilium is a wonderful CNI implementation, and it ships with something called Hubble. Hubble gives you a visualization and user interface for all of the network policies and service communication across your cluster, showing you in real time the packets that are being dropped or accepted, and allowing you to trace requests through your system.
B
It is a superpower tool, and I encourage everyone to check it out. Even if you're not using Hubble and Cilium, Cilium provides an editor for creating and working with these network policies as well. You can go to editor.cilium.io and start to build policies through a visual interface, or drop your YAML into the box to visualize the network policies that you need in your system. This is a really quick way to bring in those network policies, and I've seen that time and time again.
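For readers who haven't written one, here is the shape of a plain Kubernetes NetworkPolicy, built as a Python dict and printed as JSON, which kubectl apply -f accepts as readily as YAML. The app labels, namespace and port are hypothetical, and enforcement depends on the CNI supporting NetworkPolicy, as Cilium does.

```python
import json


def allow_from(app, from_app, port, namespace="default"):
    """Build a networking.k8s.io/v1 NetworkPolicy that lets pods labelled
    app=<from_app> reach pods labelled app=<app> on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{app}-allow-{from_app}", "namespace": namespace},
        "spec": {
            # Pods this policy applies to. Once a pod is selected for Ingress,
            # any ingress not allowed by some policy is denied.
            "podSelector": {"matchLabels": {"app": app}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{"podSelector": {"matchLabels": {"app": from_app}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical labels and port, for illustration only.
    print(json.dumps(allow_from("backend", "frontend", 8080), indent=2))
```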
B
Next is etcd. This is the biggest, scariest one that we ever see. I don't think anyone who has ever appeared on Klustered has been happy when etcd is unhappy. Nobody is actually that familiar with debugging etcd itself, and nor should you be, right? But there are a few things you need to be aware of. Number one: we've seen many people on Klustered attack etcd by modifying the quota, filling disks, or writing arbitrary junk data into etcd itself.
B
All
of
these
end
up
with
an
entity
alarm
now,
even
when
you
fix
the
problem
by
cleaning
up
the
space
you're
moving
the
junk
keys,
increasing
the
size
of
the
resource
quota
that
alarm
stays
in
place,
even
if
you
restart
ltd
and
all
sorts
of
things,
and
it
was
only
through
trial
and
error
and
pain
that
we
realized.
You
actually
need
to
alarm
this
arm
for
the
lcd
to
become
healthy
and
happy
once
again,.
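The remedy from the talk is etcdctl alarm disarm. A minimal Python sketch of that check-and-disarm step follows; it assumes a kubeadm-style control plane, with etcd listening on 127.0.0.1:2379 and certificates under /etc/kubernetes/pki/etcd, and it only builds the commands rather than running them. For a NOSPACE alarm, the etcd maintenance documentation also describes compacting and defragmenting to reclaim space before disarming, which this sketch leaves out.

```python
import re

# Assumes a kubeadm-provisioned control plane; adjust endpoints and paths otherwise.
# etcdctl 3.4+ uses the v3 API by default; older builds need ETCDCTL_API=3 set.
ETCD_TLS = [
    "--endpoints=https://127.0.0.1:2379",
    "--cacert=/etc/kubernetes/pki/etcd/ca.crt",
    "--cert=/etc/kubernetes/pki/etcd/server.crt",
    "--key=/etc/kubernetes/pki/etcd/server.key",
]

# `etcdctl alarm list` prints one line per alarm, e.g. "memberID:<id> alarm:NOSPACE".
ALARM_RE = re.compile(r"memberID:(\S+)\s+alarm:(\S+)")


def parse_alarms(output):
    """Return (member_id, alarm_type) pairs from `etcdctl alarm list` output."""
    return [(m.group(1), m.group(2)) for m in ALARM_RE.finditer(output)]


def etcdctl(*args):
    """argv for an etcdctl call with the TLS flags prepended."""
    return ["etcdctl", *ETCD_TLS, *args]


def remediation_plan(alarm_output):
    """Disarm, then list again to confirm; nothing to do if no alarm is active."""
    if not parse_alarms(alarm_output):
        return []
    return [etcdctl("alarm", "disarm"), etcdctl("alarm", "list")]


if __name__ == "__main__":
    print(" ".join(etcdctl("alarm", "list")))
```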
B
Add in your key and then, literally, you get all the secrets in all namespaces as JSON and do a kubectl replace to re-encrypt the values as they go in. Encryption at rest for Kubernetes data in etcd is only applied when a value is created or modified, and the way to fix the partially encrypted problem is just to add both providers: add a key and an identity.
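Written out, the "both providers" fix is an EncryptionConfiguration listing a key-based provider followed by identity. Per the Kubernetes documentation, the first provider encrypts new writes and reads try each provider in turn, so identity keeps not-yet-re-encrypted plaintext Secrets readable until the kubectl replace pass rewrites them. The Python sketch below builds such a config; choosing aescbc, the key name key1 and a freshly generated key are assumptions (the docs also cover secretbox and KMS providers, which may suit production better).

```python
import base64
import json
import os

# The re-encryption pass described in the talk, as given in the Kubernetes docs.
REENCRYPT = "kubectl get secrets --all-namespaces -o json | kubectl replace -f -"


def encryption_config(key_name="key1", secret=None):
    """EncryptionConfiguration for Secrets: encrypt new writes, still read plaintext."""
    # aescbc expects a base64-encoded 32-byte key; generate one if none is given.
    secret = secret or base64.b64encode(os.urandom(32)).decode()
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [{
            "resources": ["secrets"],
            "providers": [
                # First provider: used to encrypt everything written from now on.
                {"aescbc": {"keys": [{"name": key_name, "secret": secret}]}},
                # Fallback for reads: lets the API server read unencrypted Secrets.
                {"identity": {}},
            ],
        }],
    }


if __name__ == "__main__":
    print(json.dumps(encryption_config(), indent=2))
    print(REENCRYPT)
```

The generated file is what the API server's --encryption-provider-config flag points at; once every Secret has been rewritten, the identity entry can be dropped.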
B
All right, so where next? That's a really good question, but there's something I want to address first. There are a lot of places where you can attack a Kubernetes cluster, and I've built this word map just to show roughly what's going on. We've seen people attack the CRI and the CSI. We've seen people attack the controller manager, even to the extent of recompiling the controller manager and publishing their own image, or recompiling the kubelet and publishing their own image, making things immutable, and applying policies via Kyverno, OPA, etc.
B
And this isn't really a talk about failures specifically. I don't know if you've been paying attention to the quotes as we move through, but on each of these slides, where I introduce the next area we want to address, there's a quote telling you that failure is just a part of the knowledge cycle. The way that we learn is by dealing with these problems, and I think that is crucial.
B
So what I'm going to encourage everyone to do is remove hero culture, right? We don't need hero developers. What we need is the confidence and the ability to say "I don't know" within our teams and within our organizations, and, if you're bold enough, to do it in public. I get in front of an audience every week with Klustered and spend most of my time going, "I have no idea what I'm doing," and I think I'm pretty good at this. We have a duty and an honor to set a precedent for new people entering this industry.
B
The hero developer is no longer a thing. So my journey next is taking me towards eBPF. I have noticed through all of these Klustered episodes that eBPF probably has the answer to every problem we've ever dealt with, and I'm going to use one slide just to try and show you what I mean by this. eBPF is really performant because its bytecode is verified and JIT-compiled to run inside the kernel, where it can attach to tracing probes, allowing us to understand what is actually happening at the kernel level.
B
There are some really great projects in this space for Kubernetes. Cilium we've already covered. Falco, by Sysdig, is amazing for getting into the audit log and the events happening within the Kubernetes system. We have Inspektor Gadget by the Kinvolk team, and Pixie by Pixie Labs, which provides completely autonomous, uninstrumented observability into your Kubernetes cluster. You should definitely check it out.
B
iosnoop and opensnoop tell you when files are opened or being written to or read from, and then there are all the TCP tools as well, which give you visibility into the TCP traffic within your system. eBPF is a superpower, and it's what I'm really keen and excited to be learning next. If you want to learn more yourself, there are some links here. I'll publish a link to the slides on the Slack channel momentarily, but check out the bcc examples at github.com.
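As a small example of putting opensnoop to work on a cluster node, the Python sketch below filters its output down to opens under one directory, say /etc/kubernetes, to see which process touched a kubeconfig or manifest. It assumes bcc's default opensnoop columns (PID COMM FD ERR PATH, no extra flags) and process names without spaces; on some distros the tool is installed as opensnoop-bpfcc.

```python
def opens_under(opensnoop_output, prefix):
    """Return (pid, comm, path) for every open of a path that starts with `prefix`.

    Assumes opensnoop's default columns: PID COMM FD ERR PATH (no -T/-U flags),
    and process names that contain no spaces.
    """
    hits = []
    for line in opensnoop_output.splitlines():
        # maxsplit=4 keeps paths that contain spaces in one piece.
        parts = line.split(None, 4)
        if len(parts) != 5 or not parts[0].isdigit():
            continue  # header row, blank lines, "Possibly lost N samples", ...
        pid, comm, _fd, _err, path = parts
        if path.startswith(prefix):
            hits.append((int(pid), comm, path))
    return hits
```

One way to use it: capture sudo opensnoop > opens.txt while reproducing the problem, then call opens_under(open("opens.txt").read(), "/etc/kubernetes").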
B
It is a whole lot of fun once you get over the awkward bit of, "Oh, I'm just going to be completely confused, I'm going to have no idea what I'm doing." But it's done through collaboration and pairing. You know, I'm not doing it alone. I've had guests alongside me, and we have teams. We talk about the symptoms, we try to understand the problems, and it's through those conversations that we are actually able to share so much knowledge with the broader Kubernetes community.
C
Fantastic. Well, we're running slightly over, so I will say thank you very much. If there are any questions and you're hanging out in the Slack for a bit, folks can hopefully post some questions into Slack and you can take a look. So we are taking a quick break: time for folks to grab coffees or water.