From YouTube: 2016-10-13 Kubernetes SIG Scaling - Weekly Meeting
B
Yes, I'll start. So, a reminder: I've been following what this group is doing since August, and I introduced myself in one of those SIG Scale meetings. Recently we, I mean our Scale R&D team, started some basic research around Kubernetes, and around OpenStack on top of Kubernetes, and we have published the results we have so far. Basically, what we did (I just posted the links to the chat) is run the e2e load.go tests, which as far as I know you guys use as well, on top of a bare-metal-installed Kubernetes. So where you would use Google Compute Engine or whatever other provider, we install Kubernetes on bare metal using the Kargo deployment tool, which is basically a set of Ansible scripts that installs Kubernetes, and we used Calico for the overlay networking.
B
Simply
because
we
had
such
requests
to
check
how
kalica
will
behave
as
a
ultra
networking
tool
and
we
simply
run
the
load
that
go
tests
from
ET
test,
sealed
against
up
to
three
hundred
fifty
five
nodes,
which
was
maximum.
We
had
at
the
lab
on
the
moment
and
just
shared
the
results.
1050
m
/
155
notes,
so
I
just
posting
to
the
chair.
So
most
probably
you
can
take
a
look
on
it.
If
it,
you
find
it
interesting,
that's
cool
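For reference, a minimal sketch of how the upstream performance e2e tests (density.go / load.go) were typically run against an already-built cluster in this era. This is not from the meeting; the provider setting and the focus regex are illustrative assumptions, and exact flags varied by release.

```sh
# Sketch: run the upstream performance-tagged e2e tests against an existing
# cluster instead of letting a cloud provider bring one up. The "skeleton"
# provider and the focus regex are assumptions, not from the meeting.
export KUBERNETES_PROVIDER=skeleton
go run hack/e2e.go -v --test \
  --test_args="--ginkgo.focus=\[Feature:Performance\]"
```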
And there's one more test that we have done; well, not really a test, just an experiment.
B
We created a small test suite that checks pod startup latency, meaning real container workability: when containers actually become available and are reported to the master. We checked what the startup latency would be if we pushed up to 100 pods per node.
Sadly, we forgot to configure the kubelet at the time to allow more pods to be scheduled, because as far as I know there's a limitation of about 100 pods per node if it's not configured otherwise. But we'll rerun it at higher density; I would like to see how it behaves when we have really high density. Basically, we had about 50 milliseconds per container startup at the 100-pods-per-node density, so pretty good results, I believe.
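The per-node limit mentioned here corresponds to the kubelet's --max-pods flag; a hedged sketch of raising it for a high-density rerun (the value and the kubeconfig path are illustrative):

```sh
# Sketch: lift the kubelet's pod cap for a high-density test run. The default
# was on the order of 100-110 pods per node in this era; 200 is arbitrary,
# and the kubeconfig path stands in for the deployment's real flags.
kubelet --max-pods=200 \
  --kubeconfig=/etc/kubernetes/kubelet.conf
```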
There was also a comment that Calico would greatly affect the results.
B
I believe yes, it could, but we don't have data on that yet, so if anybody has information about how much it influences things, that would help.
B
For instance, we frequently observe issues with freezes in Docker under load, or during the OpenStack installation, where we start lots of containers at once, like when we deploy OpenStack on top of Kubernetes. Docker freezes somewhere on the syscalls, without even writing anything to its logs that would help us debug it. So it's a strange situation, and we're currently trying to figure out what's going on. This happened with various Docker versions; we tried several of them, without luck.
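Not discussed in the meeting, but one standard way to debug a wedged Docker daemon of that era was to make it dump its goroutine stack traces, which usually shows the syscall it is blocked on:

```sh
# Sketch: the Docker daemon dumps all goroutine stacks to its log on SIGUSR1.
# The binary is `dockerd` from 1.12 on (`docker daemon` in 1.11), and the
# journalctl unit name assumes a systemd host.
kill -USR1 "$(pidof dockerd)"
journalctl -u docker --no-pager | tail -n 200
```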
B
In these tests we were using Docker 1.11.2, but we also ran 1.12.1, and right now we're checking 1.12.2. And this is not happening with equal frequency: sometimes it happens more often, sometimes less, so I'm not sure, in fact, what the influencing pattern is here. Basically, we ran this workload repeatedly, and every time we observe these issues, but sometimes we have around ten percent of nodes freezing, and sometimes one percent.
B
Scheme
we
used
that
is
basically
installed
by
cargo,
so
this
is
DNS
attack,
I,
just
posted
the
link
to
the
diagram
that
is
used
for
this
installation.
Sometimes
DNS
stopped
working
as
it
is
supposed
to
so
I
have
observed,
like
couldn't
resolve
names
of
OpenStack
services
and
the
huge
load,
but
most
public.
This
is
it
as
expected
and
we're
trying
to
find
the
bottleneck
right
now.
B
But
if
you
we
say
about
synthetic
zest,
slack
or
running
just
api
testing
or
against
can
burn
notice.
It
was
pretty
okay.
I
mean
very
enlightened
with
the
numbers
you
guys
in
court
already
on
virtual
environments,
so
basically
the
same
a
bit
faster,
but
it's
less
nose
and
and
bare
metal.
So
it
should
be
faster.
B
We are currently installing it in a small environment, yes, because we have plans for a huge one. Basically, the Scale R&D team has lots of tasks beyond Kubernetes itself; it's not the only thing we need to work through. Right now we'll install on 140 or 150 nodes, and we will continue debugging the Docker freeze issue on this small environment, just emulating high-density load. So yeah, we're on it right now, but not at that scale.
B
I
mean
in
fact
one
point:
4.0
was
installed
or
one
150
notes,
mera
metal
by
us,
and
we
ran
poor
little
sister
top
measurements
exactly
against
140
I
need
to
put
this
information
to
the
results.
I
forgot
to
mention
that
there
was
other
version
used
to
post
our
applicants
in
measurements.
Yes,
so
we
will
try
to
get
150
notes.
E
Awesome: a time-series graph of my single-master/minion cluster coming up over time. It seems like a good validation of an end-user workload on this cluster. Now, I think last week Jeremy Eder presented some concepts around this cluster loader, or workload-generator tool, that I think other folks within Red Hat are working on, and Marek, I'm sure you probably know more about this than I do. We're trying to get some of this work to go into these new perf tests.
B
It's really worth talking to each other, because this tool was written really quickly by us to check pod startup latency. But I think we need to coordinate here for sure, and I'm really eager to see the tool from Red Hat; I would look at it and just use it, or improve it somehow and use it, etc., because the tool that we used was really simple.
C
Because the performance at 5,000 nodes against your VMs is very optimistic, right? And most of the tests that I run are very pessimistic, because the optimistic tests don't have your other controllers firing and a whole bunch of services running. If you're measuring density by itself, there's a lot less busy work going on in the system.
C
Right now, I mean, I think maybe for next week I should probably run through how we do our analysis; I mentioned that last time. Like I said before, we're pretty pessimistic, because we want to make sure that our customers, even on the low side of things, are well within the bounds of a fully saturated type of environment.
C
Because
what
often
has
a
tendency
to
happen
is
we
find
the
large-scale,
abusive
customers
who
then
take
your
numbers,
ignore
them
and
go
beyond
them
and
in
those
environments,
then,
as
long
as
we
give
up
pessimistic
numbers,
were
we're
kind
of
fudging
it
a
little
bit,
but
it
it
works.
It's
working
better
I
should
say,
but
I
would
be
remiss
if
I
worded,
like
give
a
number
like
5,000
and
a
customer
gets
close
to
that
with
certain
profiles
that
we
have
and
the
cluster
explodes.
That's
that's
when
I'm
literally
being
shipped
off
somewhere
yeah.
If.
E
You
reminded
them
it's
the
phrase
at
the
most
5,000,
not
least
5,000
yeah,
but
my
other
question
I
guess
was
I
think
we
were
talking
at
one
point
in
time.
Maybe
America's
helping
lead
this
discussion
around,
like
sfo's
for
individual
controllers,
for
saying
that
controller
sort
of
thing
that
actually
implement
and
user
facing
functionality
and
the
cases
that
I've
heard
you
bring
up
most
frequently
Tim
are
hey,
guess
what
happens
when
you're
actually
running
a
bunch
of
controllers
trying
to
handle
real
user
workloads.
F
We also need to figure out which metrics we actually care about and quantify them. For example, network propagation, kube-proxy endpoint propagation, is an important one. But, for example, do we really care how quickly a deployment can update its replica sets, as long as it's reasonable? I really want to push the throughput higher. So we actually need to create a list of the things we think are important, because not everything is, but some things beyond what we are measuring now are.
H
Basically, throughput is an important thing because, obviously, at any scale there is some throughput that the cluster supports, and beyond some boundary it won't support that throughput. So it's pretty important to be able to say that we support a given throughput, but not anything higher.
E
I feel like we chatted about this a little while ago, and I'm not sure if it stalled out, or if we decided it wasn't worth the effort at that point in time, or what. I mean, this is a great idea, but I'm trying to understand: are we trying to land any of this in the 1.5 timeframe, or are we talking about loftier goals that we're in for the long term?
C
Think
it
was
a
priority
problem
right.
It
I
think
everybody
agrees
that
this
is
something
we
need
to
do,
but
it's
it
becomes
kind
of
difficult,
sometimes
because
we're
almost
like
an
overlay
group
right.
So
sometimes
we
have
these
conflicting
priorities
that
that
prevent
us
from
doing
that,
so
I
agree.
We
should
almost
set
aside
like
a
set
of
p0
priorities
and
P
ones,
because
if
that's
not
something,
we've
done
as
a
cig,
you
know
fundamentally,
like
other
saves
literally
have
the
devoted
resources
towards
the
p0
p1
items.
C
But
if
we
as
a
group,
decide
to
do
that,
I
think
we
would
happily,
you
know,
have
a
resource
go
along
with
defining
those
metrics
and
start
to
do
to
create
tests
to
measure
against
them,
but
I
think
we
need
to
start
to
maybe
clean
house
so
to
speak
so
that
we
have
deliverables
for
given
release
and
that
that
those
deliverables
have
fly
cover
from
p.m.
that's
another
problem
right
as
Google's
p.m.
and
RPM.
You
know
sometimes
that
they
choose
priorities.
D
Can we have a quick discussion about the proposal for configuration dumping? I know we've talked about this in the past; I just want to kind of get the ball rolling. Basically, one of the reviewers had a look at the proposal, which was built based on the discussion we had two weeks ago on config dumping, and his comments were along the lines that the proposal will really only work for the kubelet; it won't work for the other components in the control plane. He suggested that, instead, we just get configurations back from ConfigMaps, flags, etc. at the pod level, as opposed to getting component configs from the components themselves, which is kind of not what we agreed to, not really what we discussed. So I'd like some thought, some discussion, about where we should go, which approach we should take, what's the right direction, etc., just to show a proof of concept that we can actually get this.
D
No, no worries. It doesn't seem like it shouldn't work; it just seems like there's a lot of work that hasn't been done yet to get the component-config ConfigMaps for the components, and I think that's what's guiding that suggestion. I'm just speaking from what I know right now.
G
You know, we've been talking about making it easier to configure a cluster as you're bringing it up, as part of kubeadm and other tools of that sort, and we're looking at component config as one of the key pieces there. Finishing that is on the board; I'm looking at the OKRs, and it just hasn't bubbled to the top in terms of what's happening this quarter.
D
The
intentions
of
my
PR
is
actually
show
like
hey
look.
This
is
really
cool.
Complaining
things
is
the
way
to
go.
Maybe
we
should
do
this,
and
do
this
not
only
know
kind
of
motivate
people
to
go
forward.
Yeah
just
finish
it
off
exactly
anyway.
So
that's
two
PRS.
Please
have
a
look
I'll
post
them
on
the
chat
here
and
let's
even
get
this
rolling.
E
Yeah, I kind of want to see code actually happen and see if we can move forward here. My main question: as long as we don't think this is exposing any security concerns, then great, let's move forward. If it is raising security concerns, I think getting sig-auth involved was mentioned, to see what work had been done there.
D
Dude,
if
you
mention
that
my
PR
and
I'm
using
actually
the
same
framework
that
the
slash
metrics
and
point
is
using,
so
you
would
have
the
same
security
already
in
place,
but
you
know
I
need
some
wins
to
golf
to
just
go
ahead
and
get
that's
fine.
Let's
go
yeah,
that's
we
could
really
use
some
attention.
There.
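For context, the endpoint under discussion follows the same pattern as the kubelet's existing /configz; a sketch of reading it, assuming the kubelet's default read-only port of the era:

```sh
# Sketch: read a component's live configuration from a /configz-style
# endpoint. 10255 was the kubelet's default read-only port at the time; the
# secured port (10250) would go through the same auth as /metrics.
curl -s http://127.0.0.1:10255/configz
```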
A
All
right,
I,
don't
think
know
that
we
hit
the
etsy
d3
stuff
hard
enough
here
she
is
Jen.
Did
you
want
to
tee
up
the
specific.
C
Question
or
what's
the
current
issues
you're
seeing
why
tech
you
see
in
the
500s.
H
Now,
it's
ed
free
is
significantly
better
than
at
CD.
Do.
H
So, let's see. There is an in-flight PR that will enable etcd3 for the PR builder, but we need to, yes, there needs to be some tweaking done of resources, parameters, and things like that. Hopefully it will be done today, so hopefully today or tomorrow we start running etcd3 for the e2e tests and the PR builder.
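The switch being tested amounts to changing the apiserver's storage backend flag (etcd2 remained the default in this timeframe); a sketch with an illustrative endpoint:

```sh
# Sketch: point kube-apiserver at etcd v3 storage. --storage-backend selects
# the etcd API version; the endpoint URL is illustrative and all other
# apiserver flags stay as deployed.
kube-apiserver --storage-backend=etcd3 \
  --etcd-servers=http://127.0.0.1:2379
```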
C
That's where Wojtek had uncovered the watch performance issue, with 3.0.1... it was actually 3.0.10, I believe, 3.0.10. So, you know, the previous versions are not something you'd want to use at a larger scale, because there are known deficiencies and issues, and the fix was only backported in 3.0.12.
Can you hear me now? Better? Okay. So, we fixed a bunch of issues in the watch path in some places, and we also updated the upstream to 3.0.12, so the client will use 3.0.12 as well. It would be better to upgrade to 3.0.12.