From YouTube: Stop Hitting Yourself! - Michael Russell, Elastic
Hey everybody, thank you for coming today. The title of my talk is Stop Hitting Yourself. I was just talking to a gentleman in the back on my way in, and he came here pretty much just for the title. So I promised a few people I'll hit myself at least once, just to make sure everybody gets what they came for. So let's get started; I've got a pretty packed slide deck today. Okay, two quotes about computers to start off with.
So this is me. My name is Michael Russell. I'm an Australian living in the Netherlands, in Utrecht. I work at Elastic as a software engineer on the infrastructure team. I currently respond to Michael, Mick, Mickey, Mike, Mikey, Russell, Rusty, crazy boss, and I'm looking for new nicknames; a Chinese one would also be great to add to the list.
So if anyone has anything, just find me around the conference or at the Elastic booth. I enjoy food, travelling, gaming and also, I forgot that I left this in, photoshopping myself into ridiculous situations like that.
Okay, so the title of the talk is Stop Hitting Yourself. For anyone not familiar with the reference, this is from a TV show called The Simpsons. In The Simpsons, the guy on the left here, his name is Nelson, is a school bully, and the guy on the right is a bit of a nerd, the kind of guy that writes software. What Nelson is doing is filming him; he's grabbed his arm and he's saying: stop hitting yourself, stop hitting yourself, stop hitting yourself. The connection is that this is how I feel the experience is a lot of the time with modern software, where you go to deploy something and you immediately get an error: oh, you need to set this configuration setting.
You fix that, you deploy something else, and then you run into the next problem. This is a pattern that I've seen throughout the years of my career, and it just keeps coming back even with newer technology. So after writing this speech and reading it back, I realized it was pretty negative, because I'm pretty much complaining about software the whole time and showing the negative side of things. But I just wanted to say that I really enjoy what I do. I really like Kubernetes. I didn't get paid to say that, and they didn't actually pay me at all, so you can believe what I say.
Okay, so it's going to be a sort of storytelling format. I've got a few different stories, working from the older age of technology right up until the modern day with Kubernetes. We're going to start where everyone was around five years ago: writing bash scripts.
So one day a developer came up to me and said: I need you to do a restore of our production database. I'd been working at the company for a few months; I didn't know that we had a backup. Apparently we did, which was good to know, and when I asked him about it, all he could tell me was: there's a bash script and a cron job somewhere.
So this was off to a good start; we at least had some form of backup method. The question then was: will it actually work? To go back even further in time, this is what the script looked like 20 years ago, the original version.
At this point it's not too bad. It's just a single mysqldump command, putting the output into a backup directory which is hopefully somewhere that's not going to get lost. This was okay. But this is how the changelog looked when I opened it up; I removed a few of the changelog entries, but over the years lots of advancements had been made, little optimizations and things. This script had history, I thought.
If this has been running for 20 years now, it must be a pretty good script, right? This has been doing production backups for 20 years; I can trust this. So this is what the script looked like, from memory; I just had to make it up a bit. At first glance this looks pretty good, particularly the top line up there: the important DO NOT REMOVE warning, and set -e. For anyone who hasn't done much bash scripting, set -e is really, really important.
So this is the first example of stop hitting yourself, and luckily someone else had already run into it. What set -e does is say that if any of the lines, any of the commands, fail, I want you to actually exit and report the failure. Otherwise, the only thing needed for a successful backup was this line. So before a guy added this five years ago, the script was always successful, because the echo "backup successful" always worked.
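To make that concrete, here's a minimal sketch of the kind of script being described; this is reconstructed from the talk, not the actual slide, and the paths and database name are made up:

```bash
#!/bin/bash
# IMPORTANT: DO NOT REMOVE
set -e  # exit as soon as any command fails

# Without set -e, the script's exit status is that of the *last* command,
# so the echo below made every run look "successful" even when the dump failed.
mysqldump jobs_db > /backup/jobs_db_$(date +%F).sql
echo "backup successful"
```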
They were doing the dump per table, so that was already a bit of a concern, and I would then need to do the restore in reverse order, which isn't too bad, definitely possible. The other thing is that a cron job which sends an email when it fails is pretty much the lowest level of monitoring you can imagine. If you don't get an email every day, you're just basically assuming: hey, my backup must have worked. You don't know if it actually ran.
If your email server is broken, you don't really know anything. There are actually only a couple of things that I think email is useful for. One of those is a human sending a message to another human, when they already know that human. The other is saving articles for yourself, or cat photos that you want to send to your girlfriend later. That's pretty much it.
Those are the two things I find email useful for. Something it's not so good for is any kind of monitoring alerts, computers and automated systems needing to contact humans, and finally, alerting the fire brigade. That's from the IT Crowd; it's a real scene, so if you haven't watched it, I highly recommend the show.
So the surprising thing was that this was a backup script with a lot of history, around for 20 years, and when I restored using this old script it actually worked; I was pretty surprised. The site was working, it was up; when I searched for a job (it was a job website) there seemed to be some stuff in the database. So I was pretty happy with that, and quite surprised.
So we now had a backup that was restorable, which was a pretty good upgrade from earlier in the morning, when I didn't even know we had backups. That's when you actually start to look into it and find out: oh, it may have restored, but is all the data there?
The interesting part here was that the search I was looking at was served by a search engine, not the MySQL database itself. The search engine was okay, but the actual database was more or less completely empty. When we had a look, there were only eight jobs online when there should have been around 25,000. So at this stage I thought: maybe it was just a bad backup.
Maybe something happened this morning; maybe yesterday's backup was actually fine, or the one from the week before. Let's see what we have. What I did was have a look at the size of the backups for the previous week (this was a daily backup, I think), and I saw this, which was pretty horrifying, because you've got to imagine:
this is a script that's been running for 20 years, and what you would expect to see, given that the jobs table was append-only and every job from the whole history of the company was in there, is an approximately five-megabyte backup, getting a tiny bit bigger every day. So as soon as you see no two backups that are even remotely similar to each other, you can pretty much conclude something is wrong. And this was just one table as well.
So that would make quite a good feature request. The interesting part is this next changelog entry: someone else came along and, once the backups were actually failing, fixed the reason they were failing, and this meant that all of a sudden there was a lot more data in the backups. So suddenly the disk was filling up.
The fix was to pipe the mysqldump command into gzip, and what's interesting here is that, as we said before, the default for a bash script is that as long as the last line works, that's okay, and everything else can fail; that's fixed by the set -e. But the new feature which is needed here is set -o pipefail. Without it, the exit status of a pipeline is just the exit status of its last command, so if mysqldump, the only important command in the whole script, fails but gzip succeeds, the line still counts as a success.
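A sketch of the fixed version, again reconstructed from the description rather than the real slide:

```bash
#!/bin/bash
set -e           # exit on any failing command
set -o pipefail  # a pipeline fails if *any* command in it fails

# Without pipefail, only gzip's exit status would count here, so a
# failing mysqldump still produced a "successful" (empty) backup.
mysqldump jobs_db | gzip > /backup/jobs_db_$(date +%F).sql.gz
echo "backup successful"
```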
Without that, we were back in the situation where every single backup was successful again. In total, looking at the dates, there were a good 12 days of valid backups, but we didn't have them anymore; they would have been nice. So, to get back to what I said earlier: software is easy to use but hard to run. Data is easy to back up but hard to restore, hard to restore and verify.
A
Okay.
Let's,
let's
talk
about
something
a
bit
more
modern.
Hopefully,
we've
improved
from
our
bash
scripting
days.
So,
let's
start
with
docker,
so
everyone
has
used.
Docker
is
probably
done
this
at
some
stage
or
probably
multiple
times
at
us
at
docker
is
working.
This
is
the
helloworld
tutorial
for
dhoka
dhoka,
one
hello
world,
and
not
only
do
you
get
a
hello
world
message,
but
you
also
get
a
nice
bit
of
output
about
what
actually
happened
to
make
all
of
that
magic
work.
Number two in particular is very interesting here: "the Docker daemon pulled the hello-world image from the Docker Hub". That happens to be a haiku, and it's been verified by our own haiku bot; you can ask me about that later. But what's interesting is that none of that sentence is actually true. Pretty much every part of it can be picked apart and shown to be something that's going to set people up to hurt themselves later on.
So the first part is that the purple writing should really be: the Docker daemon checked if the image hello-world with the tag latest was already downloaded and available locally; if it already existed, it did nothing. The next part is that it doesn't actually check if it's the right hello-world image from the Docker Hub. If you happen to have another image called hello-world with the tag latest, it would happily use that as well.
Someone came to us and said: you haven't patched production in around six months, which happened to be the amount of time we'd been running the application for, so it pretty much translated to: you have a never-patched production. When we looked into it, we found out exactly what I was explaining before: Docker is only going to connect and pull the latest tag if that tag doesn't exist locally. The build server already had the 6.6 tag, so it will never, ever update it again. So we looked this up.
There was an open issue in Docker, or Moby as it's now called, of course, and they said: you need to pull the image first. Okay, that's a pretty easy fix. We updated our deployment build scripts to include a docker pull of centos:6.6 first, and then we could go deploy our application, and that should work just fine, right? Nope, not yet.
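In build-script form the fix amounts to something like this (a sketch; the app image name here is illustrative):

```bash
# Force a registry check so a stale local centos:6.6 on the build
# server can't be silently reused forever.
docker pull centos:6.6
docker build -t our-app:latest .
```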
So the next problem that we ran into is that what a version tag means is completely up to the owner of the image and how they want to implement it.
This is the current documentation for the CentOS image, and the short version is basically: if you use that 6.6 version, they're going to match it to the ISO release of CentOS, from back in the days when you used to download a CD and burn it. That's an unpatched version of CentOS, which they've done on purpose, and their recommendation is to run yum update and yum clean all yourself. So this was good news: we found out, okay, here's the reason why, and it's clearly documented.
A
We
just
need
to
add
this
in
so
we
added
with
him
and
everybody
celebrated.
We
actually
did
a
deploy
and
confirmed
yep
it's
updated
now.
This
is
perfect.
So
that's
why
we're
very
surprised
when
one
month
later
I'll
hit
myself
again.
Why
are
you
hitting
yourself
why
having
a
batch
production?
So now we learn about the Docker build cache. For anyone who doesn't know: for any of those commands that you add inside the Dockerfile, like this one, the way Docker determines whether or not to run them again is based on the command text itself. So the first time you run this, it is actually going to do a yum update and update everything. If you then run it again directly after, Docker is going to go: you didn't change the update command, so there's no need for me to run that again, right? And to make it even worse, we had actually checked it twice, but those two times happened to hit different build servers, so each time we saw it updating, and we really thought we'd solved it. So, once again, the solution here is yet another override for yet another default.
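One common way to bust the cache for that step (a sketch of the general technique, not necessarily the exact fix from the talk) is a build argument whose value changes on every build:

```dockerfile
FROM centos:6.6

# Any RUN after this ARG implicitly includes its value in the cache key,
# so passing a fresh value forces the update step to re-run:
#   docker build --build-arg CACHE_DATE=$(date +%s) .
# (docker build --no-cache skips the cache entirely.)
ARG CACHE_DATE=unknown
RUN yum -y update && yum clean all
```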
So, let's get on to Kubernetes; that's what everyone came here for, not to talk about bash scripting. Is there a BashCon? That would be pretty nice; I could talk there. I was hoping that Kubernetes had taken a big step and actually tried to fix some of this tagging behavior that Docker has, because I'm going to guess at least some people here have felt the same pain that I've run through a few times. But unfortunately, Kubernetes tries to stay compatible with the Docker way of doing things.
So the imagePullPolicy is also set to Docker's default. But luckily you do have a nice option, imagePullPolicy: Always, so you can pull in the latest tag every time, depending on your tagging strategy.
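In a pod spec that looks something like this (the image name is just an example):

```yaml
spec:
  containers:
    - name: app
      image: our-app:latest
      # Check the registry on every pod start instead of trusting
      # whatever tag happens to be cached on the node:
      imagePullPolicy: Always
```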
So let's now look at a hello world example for Kubernetes. This is a very basic one that a lot of people will have done: just starting an nginx container and then exposing the deployment with a load balancer.
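The two commands being described are roughly these (in the Kubernetes versions current at the time of the talk, kubectl run created a Deployment):

```bash
kubectl run nginx --image=nginx
kubectl expose deployment nginx --type=LoadBalancer --port=80
```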
What's really nice about Kubernetes is that when you run a command like that, you can actually take a look and see what Kubernetes has generated for you. When you look through the YAML, you go: yeah, this looks good. It's figured out rolling updates, there's a setting for replicas that I can tweak so I can make it highly available, there are defaults for everything when you use the kubectl run command. And resources? No, I don't have any resources, so I guess that's empty.
So that's the default experience people get when deploying an application. They might take this and tweak it slightly to their needs, but that's also where this goes wrong again.
Here's a few questions that Kubernetes did not ask you and did not fill out in the template. As I go through them, think about how you would answer these yourself; don't just listen to my answers. When deploying your application, is it okay if all of the containers are running on the same host? No, you'd rather spread them out, right? That's why you're using Kubernetes. Is it okay if all the containers go down during a rolling upgrade of Kubernetes itself? No, I don't want that either; I'll turn that off, please.
Should these containers always run, or just sometimes? No? Okay. Should I check if the container is healthy? Hopefully that's an easy one. You asked for three replicas: do you really want that many, or do you just like that number a lot? How about two? Is two okay?
So let's get into those with a few practical examples. Firstly, the update strategy. Like I said before, at first glance this looks pretty good: you see it's a rolling update, maxSurge is one, maxUnavailable is one. So you think to yourself:
what will that actually do when I do an update? This is what I would expect to happen: I deploy a new version, that new version actually gets deployed, and the moment it's up, the old one is removed. That's the standard way of doing rolling updates; you don't want to take one down first and have any downtime. But here's what actually happens, once again.
I'll hit myself, just for the guy up the back there; you're welcome. What actually happens is you deploy the new version, and at the same time that the new version is being deployed, Kubernetes is going to say: I'm allowed a maxSurge of one, so it can run an extra container at once, and a maxUnavailable of one.
The fix here is that once you set maxUnavailable to zero, it's always going to keep three running during a rolling upgrade. With a maxUnavailable of one, what's going to happen is it starts a new one and at the same time removes an old one, so during an upgrade you go from three replicas down to two. You're just willingly giving up a third of your capacity during an update, and you're still open to node failures and everything else on top. So that's where the question comes from: do you really want three, or do you just like the number?
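As a sketch, the strategy block being recommended looks something like this:

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # may create one extra pod during the update
      maxUnavailable: 0  # never drop below the desired replica count
```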
The next one is a Kubernetes upgrade itself. Quite often this is not being done by the users of the application, which makes it even more surprising. Imagine we've got our nginx deployment running, we've got three replicas, we've got everything configured properly and tested nicely, and all of a sudden our application goes down. We log into the Kubernetes cluster, have a look, and see three containers not running, and you think: okay, that's not ideal. And the default, if you remember the question from before, is: is it okay if all the containers go down during an upgrade?
That's exactly what happens out of the box. If you do a rolling restart of each node and the container fails to start again somewhere else, the upgrade is actually just going to keep going. But luckily there is a solution; you do need to configure it.
That's where pod disruption budgets come in. With a pod disruption budget, you're able to tell Kubernetes, from a user's point of view: this application can only have one container unavailable at any one time.
So what's going to happen in this case is that Kubernetes tries to upgrade a node: it drains it and moves the container somewhere else. If the container fails to start up, Kubernetes is going to stop the whole cluster upgrade just for you, because you're special, and it's going to annoy everyone else until you fix it. But you know, you've at least prevented an outage caused by something you may not control. Next example.
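A minimal pod disruption budget for the nginx example might look like this (using the policy/v1beta1 API that was current around the time of this talk; newer clusters use policy/v1):

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  maxUnavailable: 1   # at most one pod may be voluntarily evicted at a time
  selector:
    matchLabels:
      app: nginx
```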
If you're running a three-node Kubernetes cluster, it's very possible that all of those containers will end up on a single Kubernetes node, a single virtual machine. If that machine does actually crash and is unable to connect to Kubernetes, the default waiting time is 5 minutes. After 5 minutes, Kubernetes goes: I think the node's not coming back now, let's start those containers somewhere else. So that's 5 minutes of downtime, just because you haven't set up your pod anti-affinity. These are very hard to say.
The way pod anti-affinity works is that it's a way to define where your containers can and can't run; there's both affinity and anti-affinity. In this example, the only really important line is this one, the topology key. With this, you're able to say: for this nginx deployment, I want to make sure all of the pods are running on different hosts. This is going to guarantee that they don't end up on the same node.
It's also possible to divide this up by zone, or by any other labels that you might have on your nodes. But just by adding this in, you can now simply say: make sure the containers are separated and running on different hosts, so that I can actually survive a failure. It feels like something that should be a default, but it's just not.
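A sketch of the anti-affinity block being described, for the nginx deployment:

```yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: nginx
          # The important line: never put two of these pods on the same
          # node. Use a zone label here to spread across zones instead.
          topologyKey: kubernetes.io/hostname
```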
So at this stage things are pretty battle-hardened, and you'd think there is no way something could take this down that we could fix by adding in some configuration; and you'd be pretty much right, we're nearly there. Because you're doing so well, some other people in your company say: hey, that Kubernetes thing actually seems to work, and you guys are actually getting work done now. Can we have some of that Kubernetes? And of course you want them to use it as well, so you help them out.
You show them how it works. But they've come into it later than you, they've seen the updated documentation, and they've also seen this resources thing. So they thought: let's configure resources, because someone actually read the docs and understands what it does. At this stage, they've deployed a few applications.
They've asked for some CPU and memory requests, and now your application has gone down and you can't figure out why, until you actually have a look at what the resources do. Out of the box in Kubernetes, if you use no resource definitions, your pod is considered to be best effort. In a world where everything on the cluster is best effort, that actually turns out pretty okay, because if you have two containers running on a host and they both want to use all the CPU, they'll get half each.
If you add a third one in, they're now going to get a third each, and this works great. But then what happens is someone comes along, does things properly, and actually says: hey, I want some CPU. Kubernetes is going to go: oh, that's okay, no one else wanted any, I'm just going to give all of that to you instead. It's going to starve the other containers of resources, and it will actually just completely delete them, because they're best effort. You didn't tell me they were important.
You said you wanted me to run them, but you didn't really say they were all that important. So this is what the resources section looks like; hopefully everyone has seen it before. The quick explanation of how it works, because a lot of people get confused between limits and requests: a limit is what it sounds like, a hard limit. If you set the CPU limit to 1000 milliCPU, it's just going to throttle your process to that speed.
With the memory limit at 500 megabytes, when you hit that, it's going to do an out-of-memory kill, a bit like Linux does, and actually restart the container. The request, on the other hand, is actually going to reserve these amounts. So if you only set limits and don't add any requests, you're going to run into the same problem as before; it's kind of like best effort with a limit, which is almost worse than not having anything at all.
But a request is going to guarantee that at any time your container is going to be on a node that has at least one CPU and 500 megabytes of memory for you. And then, of course, you're going to want to set these as well.
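The resources block being described, a sketch using the numbers from the talk:

```yaml
resources:
  requests:          # reserved; what the scheduler guarantees you
    cpu: "1"
    memory: 500Mi
  limits:            # hard caps; CPU is throttled, memory is OOM-killed
    cpu: "1"
    memory: 500Mi
```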
So if everyone starts setting how much memory they want, everyone's just going to say: I want 20 CPUs and all the memory. What you can also set up is resource quotas, which work at a namespace level.
You can say: this team, or this namespace, has access to 50 gigabytes of memory and 20 CPUs. And then you're going to find out it's very painful to keep telling every single developer, on every single pull request: don't forget to add in resources. So you also have default resource limits, where you can define a default setting for anyone who doesn't set it.
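A sketch of both, assuming a per-team namespace:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
spec:
  hard:
    requests.cpu: "20"     # the whole namespace shares 20 CPUs...
    requests.memory: 50Gi  # ...and 50 gigabytes of memory
---
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
spec:
  limits:
    - type: Container
      defaultRequest:      # applied when a container sets no requests
        cpu: 100m
        memory: 128Mi
      default:             # applied when a container sets no limits
        cpu: 200m
        memory: 256Mi
```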
So if you deploy an application and don't set any resources, you're going to get a certain amount by default, which is probably not going to be enough, but enough for them to realize that they need to actually set it. It's a good way to hit them back, I guess, is a better way of putting it.
So I think this is the final one, and all of these are based on the true story of constantly trying to get an application working, running into problem after problem.
In this case, your website is functioning, but someone has a look at the logs and you're losing about one in ten requests, and you have no idea which one it is or what's going on. After taking a good look, you find out that one container has got itself into a bad state and probably shouldn't be receiving any traffic. The nice answer to this is readiness probes and liveness probes.
Both of these support doing HTTP calls, but also just running a shell command, and once again it's another thing that people get very confused about. I would urge everyone to at least go for a readiness probe. The readiness probe is used for a whole bunch of different use cases in Kubernetes.
When you're doing an upgrade, Kubernetes waits for the readiness probe to pass; when you have a service with a load balancer, the readiness probe defines whether or not the pod is actually included in the service; and when you're doing a Kubernetes upgrade, the pod disruption budget uses this readiness probe to decide: yes, this container is actually healthy again. The liveness probe functions the same way, but this one is only used to restart the container if it's failing.
So this should normally be something a bit simpler: if the HTTP server is not responding at all, let's try restarting it and see if that helps.
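A sketch of the two probes on the nginx example (the health path is a made-up endpoint):

```yaml
containers:
  - name: nginx
    image: nginx
    readinessProbe:        # gates traffic, upgrades, and disruption budgets
      httpGet:
        path: /healthz
        port: 80
      periodSeconds: 5
    livenessProbe:         # only restarts the container when it keeps failing
      httpGet:
        path: /healthz
        port: 80
      initialDelaySeconds: 30
```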
So here's the thing you should take a photo of, to summarize all this if you don't want to remember it. These are just all of the things that I've gone through: if you're not setting these for production already, you really should be. So, just once again: resource requests.
And as we saw before with the email example and the cron job, more important than just having the monitoring in the first place is actually being aware: being able to trace back and see what happened.
finally,
once
again,
another
true
story:
a
way
to
restore
your
applications.
If
someone
accidentally
deletes
the
kubernetes
cluster.
Luckily
we
keep
all
of
our
configuration
under
version
control
and
our
secrets
have
persisted
outside
of
the
cluster
in
hushka
vault,
so
it
is
possible,
but
it's
something
you
need
to
think
about
in
case
you
ever
have
to
do
it.
It can be quite difficult when you have services dependent on each other as well: how to restore stuff in the right order.
So after writing this talk (how long have I got? okay), I had to actually start questioning: why is software like this? Why do we do this to ourselves? Why do we make ourselves hit ourselves? Part of the reason here is that it's really about those first impressions.
None of these features existed yet at first. If you suddenly upgraded to a new Kubernetes version and it started spreading your pods around when they used to all run on one host, that could be pretty unexpected, and that would be breaking backwards compatibility.
So here are a few ideas I have to try and improve things. One is actually talking about it, making sure that people are aware; and this doesn't just apply to Kubernetes, this is about all software. Reading the documentation. Actually testing it: how does failover happen? What happens if I do something silly like breaking the configuration?
How does it respond? Maybe also creating some documentation for these: here's an example of a battle-tested application using all of these features. Maybe it's even possible to start thinking about changing some of the defaults, and maybe some of these should become the default settings, a bit like the rolling upgrade strategy.
Another potential idea is to go even further and have a code-level implementation of this: a way to actually say this namespace, this cluster, is in production mode, and it's not possible to deploy to it without these things being configured.
And finally, it could also be something potentially added to a Helm chart template. So, another quick final thing from me: deploying to production is easy; running tests locally is hard.
This is a little plug for a tool that I've been working on lately, which focuses on the opposite end of this problem.
Right now, I'd say that it's actually too easy to deploy something into production. Currently I'm working on Helm charts for Elasticsearch, and the biggest concern that we have is that people can now deploy quite a complex application with a single command. That's going to lead to a lot of people making mistakes, people who don't know Elasticsearch, who don't know Kubernetes, who don't know how to run a cluster, and it's sort of setting them up for failure a bit, which is worrying.
So this tool is about approaching things from the opposite angle. While I find it so easy to install a persistent, highly available cluster on Kubernetes without understanding it, I find that the opposite end of the spectrum, actually running tests in a development environment, is way too hard and difficult.
A lot of my effort and time at my job goes into actually making sure that something I write will work on Windows, Mac and Linux, in the CI environment, on Jenkins, on Travis CI and everywhere else.
So it's also going to forward your SSH agent into the container, even on Windows and macOS, so that you can run Ansible from Windows, Mac or Linux, or in your CI environment, using the exact same versions and the exact same system libraries. It's also going to forward environment variables and attach the Docker socket. It also supports proxying any commands to the host for any specific tooling, like Keybase, where we have a use case where we need to decrypt something from Keybase to unseal Vault. So it's got a...
B: I have a couple of questions. The first one is about resource requests and resource limits: what is your recommendation on this point? Do we always need to set the resource requests the same as the resource limits, or should we set them as small as possible? And the second question is about the last point, a way to restore your application if the Kubernetes cluster gets deleted: is version control enough, or is that a problem?
A: So the first question was, with resource requests and limits, what do I recommend. For most of our applications we tend to set the request and the limit to the same amount, because there's nothing more frustrating than finding an application that's performing worse, and that's because it's no longer able to burst anymore. Bursting is a cool feature to have and it sounds great, but once you get used to it on a quiet cluster, you don't want someone else deploying a new application to take away performance from the first one.
So I would really recommend, depending on your use case, if it's something that's user-facing and needs to serve requests in real time, to really consider setting those the same. If you're running batch processing, where you really do want to use every last CPU and you're not worried about being slower or faster during some periods, then not setting limits could make sense. And the second question was how we actually manage our configuration at Elastic.
So right now we use Helm charts, and we then use the Terraform Helm provider to manage the Helm charts. The main reason for that is that we can guarantee the state of it, so you don't need to make sure that everyone is running helm upgrade and that they have the latest version deployed; there's nothing worse than going to deploy something and then finding out the current version is a couple of versions older than what's currently in git. The other advantage you get from using Terraform for that is that you can also define dependencies.
So, like I said before, when we had to restore our cluster from nothing, we weren't using Terraform at the time, so we had to figure out the correct order to install things: okay, Elasticsearch needs to be installed first, because Metricbeat talks to Elasticsearch, and then Vault needs to be installed...
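A minimal sketch of that dependency idea with the Terraform Helm provider (the resource and chart names here are illustrative, not Elastic's actual configuration):

```hcl
resource "helm_release" "elasticsearch" {
  name  = "elasticsearch"
  chart = "elastic/elasticsearch"
}

resource "helm_release" "metricbeat" {
  name  = "metricbeat"
  chart = "elastic/metricbeat"

  # Terraform's dependency graph records the install order that otherwise
  # has to be reconstructed by hand during a restore.
  depends_on = [helm_release.elasticsearch]
}
```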
C: A question about that: how can we make the test environment and the production environment consistent? I think it's a big headache, because whenever we have a production environment and a staging environment, something in the staging environment is not exactly the same as in the production environment. How do we build consistency between these two environments?
A: I don't work for HashiCorp, but I've been plugging their products; we also use Terraform for this. Having a staging Kubernetes cluster and a production one allows us to test Kubernetes changes on the staging environment first, and if you're not familiar with Terraform, you define the state that you want, so you have much better guarantees that staging is actually the same environment as you have in production. It's always a difficult problem, but at least with Kubernetes it's become a lot easier to get something that's very, very close.