From YouTube: 2017-02-09 Kubernetes SIG Scaling - Weekly Meeting
B
Found a super nasty bug, and the reason it's super nasty is that it escaped all radars until we deployed a cluster. It was an actual cluster which had separate monitoring, so we found it indirectly through Zabbix; even though our profiles showed it, we weren't really paying attention to it. So here's the summary of the issue that we found: with 1.4 and 1.5, we have a crazy high number of IOPS going through etcd.
B
If you have multi-master etcd, this will cause leader election failures on large clusters and bring down your whole cluster. So it failed on two fronts for us, and we found the root cause of it. The root cause is basically that in 1.4 a bunch of quorum-read additions occurred, and a quorum read will force a write.
B
So if you have a highly available cluster, this basically increases the writes that you had. Right now there's a ticket, or an issue, that I'll link to in the docs, and we're going to have to sort of triage it, go through the different locations that are doing quorum reads, and try to remove them where possible; some things may need to be rejected because of this.
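For context on the mechanism: in the etcd v2 client, a quorum read is just a flag on the Get, but it routes the read through raft rather than serving it from the local member's store, which is where the extra disk traffic comes from. A minimal sketch in Go, assuming the etcd v2 client (github.com/coreos/etcd/client) and a placeholder endpoint and key:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/coreos/etcd/client"
)

func main() {
	cfg := client.Config{
		Endpoints:               []string{"http://127.0.0.1:2379"}, // placeholder endpoint
		HeaderTimeoutPerRequest: 5 * time.Second,
	}
	c, err := client.New(cfg)
	if err != nil {
		log.Fatal(err)
	}
	kapi := client.NewKeysAPI(c)

	// Quorum: true forces the read through raft, giving linearizable
	// results but also turning every read into raft traffic; that is
	// the write amplification discussed above.
	resp, err := kapi.Get(context.Background(), "/registry/pods", &client.GetOptions{Quorum: true})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Node.Value)
}
```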
B
The reason it didn't show up before is that we don't have any upstream IOPS tracking, and that needs to be done. Marek and I were talking about it: we need to add a gate, because right now in a lot of performance tests we monitor CPU and memory profiles, but we need to add a gate for IOPS profiles on etcd too. That's one thing that we can do proactively.
B
The second thing is that we weren't paying as much attention on our side, even though we actually had the data, so it escaped us. In many cases we actually had machines and environments that were fast enough, right, but then all of a sudden somebody tried to deploy on Amazon with a limited-IOPS mount, and it blew up right away, right.
D
Right, so I can talk a little bit to the chain of events. Quorum reads were off at the storage level, and there were a bunch of discussions from various people who were starting to run etcd in this mode, and they were saying: we're seeing slightly weird things happen at big scale, where ordering wasn't quite what we expected when one of the members starts to lag; you'd see things like a member getting kicked out and then put back in a couple of seconds later. And so the change was made in core Kube.
D
Kube turned quorum reads on for everything by default. In OpenShift we had never had quorum reads on, from the very beginning, and there are flakes, small flakes that occur when one of the members leaves, but it was just never... almost all of the time it would at most cause a brief flutter on something like a service, which I think would impact...
D
...you know, people when they're running at extremes, but just not in the practical cases. And in OpenShift we're only using quorum reads, and not even really quorum reads, it's something even more sophisticated, for authorization. When we push down an authorization token to etcd, we would wait until all the members confirm they had it, to prevent the classic case where you grant access and then someone immediately uses that token and gets denied. So it just didn't affect services except in extreme cases, not the normal cases.
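This is not OpenShift's actual implementation, but the shape of that wait is easy to sketch against the etcd v2 client: poll each member's local store directly until every one of them can serve the token. The endpoints and key path here are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/coreos/etcd/client"
)

// waitAllMembers polls each etcd member directly until every one of them
// can serve the key locally, approximating "all members confirm they have it".
func waitAllMembers(key string, endpoints []string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for _, ep := range endpoints {
		c, err := client.New(client.Config{Endpoints: []string{ep}})
		if err != nil {
			return err
		}
		kapi := client.NewKeysAPI(c)
		for {
			// Quorum: false serves from this member's local state only.
			_, err := kapi.Get(context.Background(), key, &client.GetOptions{Quorum: false})
			if err == nil {
				break // this member has the key
			}
			if time.Now().After(deadline) {
				return fmt.Errorf("member %s never confirmed %s: %v", ep, key, err)
			}
			time.Sleep(50 * time.Millisecond)
		}
	}
	return nil
}

func main() {
	endpoints := []string{ // placeholder member URLs
		"http://etcd-0:2379", "http://etcd-1:2379", "http://etcd-2:2379",
	}
	if err := waitAllMembers("/auth/tokens/abc123", endpoints, 5*time.Second); err != nil {
		fmt.Println("token not yet replicated:", err)
	}
}
```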
D
That's something I worry about too, because I think we can relax it in some cases, but we're not instrumented on the client side, on the server side, or on the etcd side to do causal reads. The vast majority of our controllers are single-process, causal-read kinds of things, but we've not done the infrastructure work to do even the simplest, you know, session-style vector clocks or anything up and down the stack, and that's a significant investment too.
D
The simplest possible fix for someone, for the vast majority of clients, is a precondition on the etcd resource version: the list-watcher knows what resource version it's got for a particular resource, and it passes that down to the controller; the controller just automatically carries it through and passes it down to the API server, and the API server says, the resource version this request gave me is stale, so I'm just going to wait until I hit that resource version again. We've talked about plumbing it, but it's a pretty big chunk of work.
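A hedged sketch of the idea with client-go: the first list captures the resourceVersion the list-watcher observed, and a later request passes it back as a freshness floor. The kubeconfig path is a placeholder, and this only illustrates the precondition concept, not the actual plumbing being discussed:

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// First list: capture the resourceVersion the list-watcher observed.
	pods, err := cs.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	rv := pods.ResourceVersion

	// Later request: pass the observed resourceVersion back, asking the
	// server for results at least this fresh rather than arbitrarily stale.
	fresh, err := cs.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{ResourceVersion: rv})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("observed %d pods at resourceVersion %s\n", len(fresh.Items), fresh.ResourceVersion)
}
```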
D
That's one option, but it's not going to stop it if the masters are talking to a lagging API server, and that's the same thing, so it really does have to go all the way up the stack and back down. And threading: we're not using threading, so there is no scenario today under which someone who's sharing a client across multiple controllers is being impacted, because there's just no state that's shared. This would be, like, a high watermark, and the clients would probably get a...
D
We'd have to go through a couple of rounds of tests to be sure that that's nice too. So I don't know if anybody else has any other easy fixes. I mean, Tim, I think, like when we had the auth thing on the OpenShift side: doing it for create, and then making sure that at least the members are up to date, is a fairly easy thing to do.
B
I mean, on HAProxy we specifically specified session affinity, because we knew that when you hit the non-session one you could potentially run into these issues, right. That was a hard requirement, and Daniel and I even chatted about this, about, you know, starting to have session affinity for all clients, but I wouldn't go that far on that one. Well...
D
And we're not doing load distribution on the API servers, so there is the case that, today, the masters are spraying more or less... I think my understanding right now is that the masters are spraying to the underlying etcds fairly evenly. If we went and turned off quorum reads and were focusing on one particular master, or one particular etcd master, then to get the benefit we would also need to ensure that we didn't get hot-spotting; I would be a little worried about, you know...
D
...turning on session affinity and rigorously enforcing it from API server to etcd, and putting in scenarios like restart-hammering one of the etcds, because all three or all four of the masters would start talking to it. If the API servers all started talking to one particular etcd, that's no worse than it is today, where we're all talking to the leader. So...
C
One other point here is that it's worth talking to the Borg folks. I mean, Borg essentially is single-master at a time, where, you know, the backups are just there for backups, and they just scale the machine size of the Borgmaster.
B
That matches what they talked about, yes. I mean, we could get higher... we could get much higher throughput on a single API server, you know, if you really wanted to. We could also do things like windowed writes, you know, shorter write cycles to etcd, because right now it's a straight write-through, right. It's...
D
A discussion that we had around the same time this happened: we discussed in OpenShift whether we wanted to patch it out, and one of the challenges would be that we're actually giving up availability like this, because the ability to do stale reads, even though it causes some weird consistency issues in some edge cases, means we can still lose the master and satisfy at least reads; you just can't satisfy writes. And Kube does not do well when you can't satisfy reads, but it's fairly resilient in many cases to temporarily not being able to satisfy a write. So in the long run, even though I agree that it is easier just to turn on quorum reads or bump up IOPS, I do think having causal consistency at the client, as Joe is saying, with agreed resource versions... I would prefer to have causal consistency in the client and fix it that way.
B
I mean, the numbers that we have on the API server today... we haven't done it in a while, but the numbers we have today on the API server, in the load-balanced case, are actually really good. By comparison, we should probably do a single API server with backup standbys, just to see what the performance characteristics are, you know, do an A/B test for that model, because that's the traditional model. Honestly, it's only in Kubernetes that I've ever seen this model where you're...
D
I don't know. I mean, the masters have to proxy exec and port-forward, and so when we went through the first round of designs for this, we said, rather than creating two different scale-out layers, we're just going to create one. And, I mean, we can solve the causal consistency thing; we've already started to do some of it. So I don't know, I mean, I agree...
D
Yes, it is simpler, but we're going to be pushing a large amount of traffic through the masters, and I'd rather have one scale-out master layer and have the controllers be master-elected than have three different tiers, controllers, API server, and a scale-out proxy layer, because, I mean, effectively the masters are supposed to be a proxy.
D
And we have that. At least we're in a scenario where, as long as you're not using two clients... as far as I know, every single component in Kube uses a single client: all the controllers use a single client, the kubelet uses a single client, kube-proxy uses a single client. You have to work to use two clients at the same time, so I think that's fairly feasible to make happen. So I don't know; I lean a little bit toward not changing the API server model.
A
I think the bigger issue is just understanding, and I think this was Joe's point from earlier, that this is this sort of problem. So, to make sure I understand the implication: if you're running on Amazon, the recommendation would be, if you're running really big clusters, run your etcd instances on separate instances with high-IOPS EBS, and put a monitor on the IOPS, so that if you're ever hitting your IOPS limits you're ringing big alarm bells that you need to go...
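No specific tooling is named here; as one hedged illustration, such a monitor could be as simple as sampling /proc/diskstats on the etcd host and alerting when the ops/sec delta approaches the provisioned limit. A minimal Go sketch (the device name and IOPS limit are placeholders):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// readOps returns completed reads+writes for a block device from /proc/diskstats.
func readOps(device string) (uint64, error) {
	f, err := os.Open("/proc/diskstats")
	if err != nil {
		return 0, err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// fields: major minor name reads-completed ... writes-completed ...
		if len(fields) >= 8 && fields[2] == device {
			reads, _ := strconv.ParseUint(fields[3], 10, 64)
			writes, _ := strconv.ParseUint(fields[7], 10, 64)
			return reads + writes, nil
		}
	}
	return 0, fmt.Errorf("device %s not found", device)
}

func main() {
	const device = "xvda" // placeholder: the volume backing etcd's data dir
	const limit = 3000.0  // placeholder: provisioned IOPS
	prev, _ := readOps(device)
	for range time.Tick(10 * time.Second) {
		cur, err := readOps(device)
		if err != nil {
			continue
		}
		iops := float64(cur-prev) / 10.0
		prev = cur
		if iops > 0.8*limit { // ring the alarm well before the ceiling
			fmt.Printf("WARNING: %.0f IOPS, near the %.0f provisioned limit\n", iops, limit)
		}
	}
}
```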
D
...this definitely crops up. So we started seeing this in small clusters on standard two-CPU instances, because they were running out of... sorry, we are seeing high IOPS; it is not blowing up those clusters, but some of those clusters are very IOPS-constrained. Two-CPU instances are reasonable for small clusters, but I would expect bigger clusters to have much bigger IOPS provisioning.
D
So, I mean, I think we should bring this up with API Machinery. It's certainly been on Dan's and my radar for a little while, but we just haven't formally put it into anything. If in 1.6 we left quorum reads on and bumped IOPS for those things, for most people that's okay; for the specialized cases, for OpenShift, I do think we can probably survive it, but we'll probably make some different recommendations. And then, I mean, again, my preference is to just fix the causal consistency from the client.
B
I think that would probably work well, but I'm still curious about the numbers on a single API server. I know that it's an exit strategy that gets you out of this pretty fast, and it allows you to still fix the other ends in the long haul, right. Yes, I mean, we're not seeing terrible numbers on a single API server like we used to right now. Well...
B
If you wanted to, you could be even smarter, where you make sure you don't colocate your controller... you keep the way you have your HA set up, you just don't colocate the controller with the API server, and if you do that you're not going to hit the same... you won't commingle the profiles, right. That would be a different partitioning. Yeah.
D
And the controllers: right now the controller manager is fairly easy to chunk into smaller subgroups if somebody wanted that, even as we get the number of shared caches down. We've still got something like a 3x to 4x cache-duplication layer; most of Andy's work for that will go in in 1.6/1.7, and we should be down to, you know, a replication factor of one or two for the various informers and list-watchers. Once that's in place, splitting out the controllers causes more traffic to the API server, but less traffic...
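For reference, the shared-cache direction mentioned here is what client-go's shared informers provide: many controllers reading from one watch-backed cache instead of each keeping its own list-watcher. A hedged sketch with client-go (the kubeconfig path is a placeholder):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // placeholder
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// One factory means one set of list-watchers and caches, shared by every
	// controller that asks for the same resource; this is the "replication
	// factor" reduction being discussed.
	factory := informers.NewSharedInformerFactory(cs, 10*time.Minute)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { fmt.Println("pod added") },
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // block forever; real controllers would do their work here
}
```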
D
It feels like right now... I mean, Wojtek, I don't know what your opinion is on this, with the update-status changes, but say we get this performance improvement in updates on node condition status tracking: what's the next update? Is pod status update lurking behind node status update as a significant cause of cluster load in Kube?
E
So it depends on the load. We have reasonably, or maybe unreasonably, low churn in our tests; I think we are generating roughly ten to twenty, like, both deletions and/or creations or changes. The changes coming from node status, all the changes coming from node objects, are an order of magnitude more than that. Oh, I'll...
D
I feel like... I guess my gut feeling is that I still would like to preserve API server scale-out, just because it's the easy escape hatch if node updates, or pod status updates, are crushing the cluster and we want to go from a 500-node cluster to a 1000-node cluster, or if we want to go from a 500-node cluster with, you know, 10-pod density to 25-pod density. In both of those cases the number of updates from the nodes is going to roughly double, and being able to throw more API servers at...
C
I was playing around with ngrok. Have you looked at ngrok? It's essentially a forwarding thing that, you know, hooks off of a cloud server for getting access to stuff. And something like that is just incrementally distributed, right, where you're just running another job that exposes this in a different way through a different path. Well...
D
You'd get root access to all those processes everywhere on the cluster too... I'm confused. Well, I mean, we don't... so this is a pretty common thing we have, but we don't run anything that has node-level privilege on anything except a very restricted set of nodes, just because any kind of proxy running on the cluster...
C
I guess what I'm trying to say is that you don't have to run these things as a shared service. It can actually be per-namespace, per-job: when you want to port-forward something, launch a specific job for that port-forwarding thing, and then have it use some other resources that are just scalable network, you know, bastion-type stuff, right. So it's a different architecture. It's a red herring right now; let's not worry about it. Yeah.
D
That was kind of the original design decision. I think we actually talked about bringing up bastions before NodePorts even existed, and before we had the changes in CRI to offload exec and port-forward to something that's even resource-constrained, versus just the Docker daemon. When we went through that kind of stuff, it was: how many different complex pieces do we want to run as the edge layer? Trying to get that number down to, like, two or three things was the overriding concern, to keep the complexity of the topology low. And so I agree, like...
C
I think when you say that the API server is really a bastion, the thing that worries me about that is that you generally don't give your bastion permissions. In the AWS case, we're giving it essentially root on AWS via the IAM role, so it can do the cloud-provider calls. So getting the cloud provider separated out...
D
...that would be a good thing to bring up and talk about, yeah. We should definitely tell people that if they weren't looking at IOPS on their clusters, they should do that calculation: hey, are you okay with tripling or quadrupling the amount of IOPS that you push through your etcd instances, yes or no? I think...
B
Yes, but it's more than that too: if you're not careful, you can totally hose your cluster. So, on a side point, I do think that we should have gating upstream. We need to start adding more tests so that we can monitor this, because it blindsided us, and I think we should have some more tests that validate any drift in performance outside of, you know, just the standard CPU and memory, so that we can track it over time.
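As a hedged sketch of what such a gate might look like, alongside the existing CPU and memory gates: the threshold and the measurement helper below are hypothetical, and a real gate would hook into the existing performance-test machinery rather than a standalone test.

```go
package perf

import "testing"

// measureEtcdIOPS is a hypothetical helper: in a real gate it would sample
// disk counters on the etcd members over the course of a load test.
func measureEtcdIOPS(t *testing.T) float64 {
	t.Helper()
	// ... run the workload, sample /proc/diskstats deltas, return ops/sec ...
	return 0 // placeholder
}

// TestEtcdIOPSGate fails the run if etcd IOPS drift past an agreed budget,
// the same way CPU and memory profiles are gated today.
func TestEtcdIOPSGate(t *testing.T) {
	const iopsBudget = 1000.0 // placeholder budget per etcd member
	got := measureEtcdIOPS(t)
	if got > iopsBudget {
		t.Fatalf("etcd IOPS regression: measured %.0f ops/sec, budget %.0f", got, iopsBudget)
	}
}
```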