From YouTube: Scalability Team Demo - 2022-12-15
A: So, Igor explained it to me already, but I was quite surprised to see a saturation issue, which I'll dig up later: we often saturate the HPA max replicas for Sidekiq urgent-cpu-bound. And then I went looking, and I saw that if we're at max replicas, this thing would not fit on the node pool. The CPU request was set to something like 600 or 800, and we've only got 40 vCPUs, but we allow a hundred-something max replicas.
A: So apparently we're scheduling that stuff everywhere now, on the generic node pools as well as the dedicated node pool, which I think is okay. But now I'm wondering: how do you decide how to size things? Do you just size them based on what you think the workload is going to look like and then see afterwards whether it fits the node pool, or do you need to keep summing everything? Does anybody have any experience with that?
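A back-of-the-envelope version of the mismatch being described, as a minimal sketch; the 800 request, 100 max replicas, and 40 vCPUs are the approximate figures mentioned above, and the millicore unit is an assumption:

```python
# Rough capacity check for the mismatch described above. The request
# and replica numbers are the approximate figures from the discussion,
# and the millicore unit is an assumption.
cpu_request_millicores = 800   # per-pod CPU request ("600 or 800")
max_replicas = 100             # HPA maxReplicas ("a hundred something")
node_pool_vcpus = 40           # dedicated node pool capacity

worst_case = cpu_request_millicores / 1000 * max_replicas
print(f"worst-case demand: {worst_case:.0f} vCPUs vs {node_pool_vcpus} available")
# -> worst-case demand: 80 vCPUs vs 40 available: at max replicas the
#    workload cannot fit on the dedicated node pool.
```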
B: I do. Whenever I had a workload, I'd do the math to make sure; this was when we had dedicated node pools for everything. I don't think everyone does that. I think a lot of folks are focused only on sizing the pods and the pod count for the HPA, and just assume that the node pool is an infinite resource.
A: I was doing the thing that you were doing, but based on the node pool whose size I don't know anymore, because, yeah, 40 vCPUs was not accurate before and we could still run the workload. It was never saturated, because that node pool isn't used; it's just always scheduling on the generic nodes.
A: Then the goal is to do what, like Matt said, others are already doing: just get the sizing right for the workload you're expecting, use the max replicas to have a limit, and consider the node pool infinite.
B: To be clear, I don't think this is a good pattern. I disagree with the trade-offs that were made in that consolidation into generic node pools.
B: I think bulkheading is a super useful practice, and we've given that up by moving to hybrid workloads in the shared node pools. It does make sizing and capacity planning more difficult; I would say impractical, because we don't have a good view into all of the workloads that are currently sharing those node pools. And just as a closing note, I'll comment that the two resources that Kubernetes uses for scheduling, CPU and memory, are not even close to the only resources that can be contended at a machine level. This is one of several reasons that I thought having dedicated node pools provided wonderful isolation, so that you don't have problems like what we observed with... sorry, I'm going off on a tangent. Having isolation boundaries is useful.
B: Yeah, I believe so. We've got two resources that we're scheduling for, CPU and memory, and so we've got a node pool for CPU-oriented workloads and a separate one for memory-oriented workloads.
B: I think what it really comes down to is that we've got kind of a limited buffet of machine types, in other words VMs, and the ratio of gigabytes of memory per vCPU is different for the two types of VMs that we're using in these two generic node pools.
B: So it's really kind of saying: if we've got pods that want a relatively large amount of memory per vCPU of expected consumption, then we'd prefer to schedule them onto the nodes that have a larger ratio of memory to CPU.
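A toy sketch of that placement rule; the machine families and gigabytes-per-vCPU ratios below are assumptions based on common GCP shapes, not the actual pools under discussion:

```python
# Toy pool picker: match a pod's requested memory:CPU ratio to the node
# pool whose machine shape has the closest GB-per-vCPU ratio. The
# families and ratios are illustrative assumptions, not the real pools.
POOLS = {
    "cpu-oriented":    {"machine": "n2-highcpu",  "gb_per_vcpu": 1.0},
    "balanced":        {"machine": "n2-standard", "gb_per_vcpu": 4.0},
    "memory-oriented": {"machine": "n2-highmem",  "gb_per_vcpu": 8.0},
}

def pick_pool(mem_gb: float, vcpus: float) -> str:
    """Prefer the pool whose GB-per-vCPU best matches the pod's ratio."""
    ratio = mem_gb / vcpus
    return min(POOLS, key=lambda name: abs(POOLS[name]["gb_per_vcpu"] - ratio))

print(pick_pool(mem_gb=7.0, vcpus=1.0))  # memory-oriented
print(pick_pool(mem_gb=0.5, vcpus=1.0))  # cpu-oriented
```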
C: This is a complete change of subject and I don't need anyone to answer it now, but is there a reason that we don't ever use custom node... er, custom machine types in GCP?
B: That's a great question. I think some of the machine families have a fixed ratio and some of them don't, if you're talking about adjusting the ratio of memory to CPU.
D: Correct, you can mix and match. Yeah, I can answer for Redis: in Redis we use C2 instance types, because those have the fastest single-threaded CPU performance, and for C2, custom instance types are not supported.

Okay, that...
B: Speaking just for myself, I really didn't have the stamina to follow it, so I can't answer your question.
B
I'm
I'm
sure
that
there
is
I
just
don't
know
what
it
is.
I
I've
got
some
strong
opinions
about
it
and
I
didn't
have
the
energy
to
to
push
those
opinions,
and
so
I'm
kind
of
checked
out,
apart
from
keeping
tabs
on
what
the
current
state
is.
B: Sorry, yeah, I'm a little groggy this early. Yes, so I'm partway through. I thought this was kind of an interesting show-and-tell, and it's pretty quick.
B
Here
we
go
okay,
so
so
we've
done
a
lot
of
so
we've
done
a
lot
of
work
on.
B
Also
more
specifically,
several
months
ago,
the
we,
our
our
team,
discovered
that
rdb
backups
are
are
capable
of
triggering
CPU
saturation
on
on
redis
instances
that
have
the
the
save
your
active,
enabled
or
more
more
generally
redis
instances
where
the
BG
save
command
gets
run,
which
is
what
creates
these
rdb
dump
files.
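For context, a minimal sketch of exercising that same code path by hand with the redis-py client; the host and port are assumptions, and this should only ever be pointed at a disposable instance:

```python
# Minimal sketch: trigger the BGSAVE path described above and wait for
# the forked child to finish. Host/port are assumptions; never run this
# against a production instance.
import time
import redis

r = redis.Redis(host="localhost", port=6379)
r.bgsave()  # Redis forks a child process that writes the RDB dump file

# While the child exists, writes on the parent incur copy-on-write
# page faults (the overhead examined later in this demo).
while r.info("persistence")["rdb_bgsave_in_progress"]:
    time.sleep(0.1)
print("BGSAVE finished; the child exited, so copy-on-write pressure ends")
```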
B: So this is an example of a CPU profile captured from the primary Redis instance in the redis-persistent target, that's redis-03 currently, and this is from, I think, about a day or two ago; it was captured this week, so it's very fresh.
B: So we're going to go through three levels of filtering here, and I kind of wanted to narrate... actually, I'll just tab through them so you can see what the picture looks like. This capture is for just the redis-server process itself, that is to say, the redis-server process that actually handles traffic and the child process that it forks when the BGSAVE kicks off.
B: So there's really two processes, and this view represents all of the threads for those two processes. The next tab over filters to just the main thread of the primary Redis process, the parent process, and... I don't know if you can see the band present here; the band is still present. Is that visible? Okay.
B
Some
for
some
reason
when
I
take
a
screenshot
of
this?
It's
just
completely
a
flat
red
color,
so
I
wasn't
sure
if
that
was
coming
across
on
on
Zoom
anyway.
So
this
is
the
same.
This
is
the
same
point
in
time
that
we
saw
on
on
this
representation.
It's
just
filtering
to
the
main
thread,
and
the
important
point
from
this
from
this
view
of
the
graph
is
oh
by
the
way
the
sampling
frequency
is,
is
a
about
500
samples
per
second
and
we've
got
50.
B: Each second is drawn vertically here in 50 buckets, so each of these buckets represents... excuse me. So I guess the point here is that when we mouse over these cells and you see a count right here, you can multiply that mentally by 10 and get the percentage of CPU time that we have represented there.
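Spelled out, that bucket arithmetic looks like this (500 Hz sampling and 50 buckets per second are the figures just stated):

```python
# FlameScope-style bucket math from the narration above.
samples_per_second = 500   # profiler sampling frequency
buckets_per_second = 50    # each second is drawn as 50 vertical cells
samples_per_bucket = samples_per_second // buckets_per_second  # 10

def bucket_cpu_percent(sample_count: int) -> float:
    """Convert one cell's on-CPU sample count into a CPU-time percentage."""
    return sample_count / samples_per_bucket * 100  # i.e. count * 10

print(bucket_cpu_percent(7))   # 70.0  -> busy but not saturated
print(bucket_cpu_percent(10))  # 100.0 -> on-CPU for the whole 20 ms bucket
```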
B: So if we just kind of browse here, we're seeing that we're burning about 70 percent of CPU time prior to this burst, and then during this burst we're sitting at 100 percent CPU. So this is saturation. The question, then, is: what in the world are we doing during this saturation? I'm going to grab a bit of it here and we'll find out.
B: So let me make the font a little bit bigger. Is this easier to read? So, just at a glance, you can see kind of typical-looking Redis stack traces, where we've got processCommand doing various commands. We don't really care what they are; what we care about is what we're spending CPU time on. And just at a glance... you all know this, but for anyone else that isn't familiar with it, the color scheme here is: this reddish-pink color is... these are stacked.
B: It's calling its event loop, which is aeMain, which calls aeProcessEvents, which is handling requests from clients, and it's making a system call. That system call is the read syscall, and it turns out that's reading from a TCP socket, and we can infer that this is a TCP socket where it's receiving client requests, because the call path was readQueryFromClient. So that's just kind of a super quick tour of what we're looking at, in general, for these stack traces.
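For anyone who has not read that code, a minimal sketch of the same shape, an event loop whose request handling bottoms out in a read() on a client TCP socket; this is illustrative Python, not Redis source:

```python
# Sketch of the aeMain -> aeProcessEvents -> read() shape in miniature:
# a single-threaded event loop that accepts clients and reads requests.
import selectors
import socket

sel = selectors.DefaultSelector()
server = socket.create_server(("localhost", 0))
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

def process_events() -> None:
    """One pass of the loop body (the aeProcessEvents analogue)."""
    for key, _ in sel.select(timeout=1.0):
        if key.fileobj is server:
            conn, _ = server.accept()        # new client connection
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = key.fileobj.recv(4096)    # the read() on a client socket
            if not data:                     # client hung up
                sel.unregister(key.fileobj)
                key.fileobj.close()

while True:  # the aeMain analogue: loop forever handling events
    process_events()
```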
B: But the piece I really want to focus on here is when we do page faults. A page fault means the process is trying to access a page that's not ready for it, that the kernel has flagged as not ready for user space to consume, and so the kernel has to do something to make that page ready. And in this case, this is just one of many examples I kind of wanted to show.
B: So, page fault... I'm going to narrate a few of these stack frames. A page fault is kind of the generic kernel interface for the CPU, where the executing thread tries to access a virtual memory address, the kernel does the virtual-to-physical memory address lookup, and the page table says: nope, there's not a physical page that's ready yet. That triggers a page fault event, which the kernel handles, and that's the portion of the stack that we're looking at here.
B: So in general, that's what a page fault is, just as a general piece of vocabulary, a little bit of background. The important part of this stack trace is this do_wp_page, which is a distinctive code path in the kernel for handling copy-on-write.
B: When you see people say COW, that stands for copy-on-write; a better name for it would be copy-on-first-write. The idea is, when you fork a child process and you use particular flags that say: give the child a virtual copy of the parent's memory image, or some portion of the parent's memory image...
B
The
the
all
of
the
pages
that
haven't
you
know.
Initially,
all
of
the
pages
in
The
that
are
that
are
cloned
for
the
for
the
child
are
just
referencing,
the
same
physical
page
that
the
parent
is
referencing,
but
they
need
to
represent
different.
You
know
the
the
promise
is
that
those
pages
will
remain
intact,
even
if
the
parent
later
comes
by
and
and
changes
the
contents
of
those
pages.
So
this
is
why
we
have
copy
on
right.
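A minimal sketch of that mechanism, assuming Linux; the allocation size is arbitrary, and Python's own activity adds noise to the fault counts:

```python
# Sketch of copy-on-write cost after fork() (Linux assumed). The parent
# pays a minor page fault the first time it writes to each page shared
# with the forked child, analogous to redis-server during a BGSAVE.
import os
import resource
import time

data = bytearray(200 * 1024 * 1024)  # ~200 MB of parent memory

pid = os.fork()
if pid == 0:
    time.sleep(5)    # child just holds references to the shared pages
    os._exit(0)

before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
for i in range(0, len(data), 4096):  # touch every page once
    data[i] = 1                      # first write triggers a COW fault
after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
print(f"minor faults while writing: {after - before}")
os.waitpid(pid, 0)
```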
B
The
important
Point
here
is,
the
parent
is
the
one
that
incurs
the
burden,
because
the
parent
is
the
one
that
will
continue
modifying
these
pages.
So
so
what
I'm?
What
I'm
going
to
do
here
is
tell
flame
flame
scope
to
search
for
all
occurrences
of
gwp,
page.
B: We'll zoom back out, and we'll see that roughly 30 percent of the CPU time that we saw here... and remember, we're highlighting the point when the Redis main thread is completely CPU-saturated... 30 percent of that CPU time is spent doing copy-on-writes. And there's a bunch of code paths that are doing that, because basically, once we fork the child process, anything that modifies a memory map that was inherited by the child has to be protected.
B: So any pages that are really volatile are really likely to incur this penalty, and that includes things like Redis's client buffers and hot keys.
B: So what we're going to do next is... this is the third tab I wanted to show. Before...
A: ...before you move on, yeah: how did you get here? Like, now we can see that this copy-on-write behavior that the main thread needs to take care of is, yes, 30 percent of CPU time, but I don't see how you discovered that. How did you go from... this looks to me like a pretty normal Redis flame graph, with processCommand and so on. So how did you pick that one out?
B: Right. So I've done some work in the past with copy-on-write overhead, so I recognize this kernel code path. And the kind of high-level pattern here is, when we have, say... I'm making these numbers up: say you've got a gigabyte of memory. This isn't about Redis; this is just about kernel memory management. So you've got process... process one... sorry, PID 1 means something special.
B: You've got process A that has a gigabyte of memory, and it creates a child process. I say fork, but this applies equally to the clone syscall. So you have a child process, and you fork it in a way that preserves... that gives the child access to a point-in-time view of the memory image at the point when it forks.
B
As
long
as
neither
of
those
processes
modifies
any
of
the
pages
that
were
that
are
shared
between
them,
then
both
of
their
virtual
memory
address
maps
can
continue
pointing
to
the
same
physical
pages,
and
you
won't
get
any
of
these
page
folds,
but
the
moment
either
of
them
starts
to
modify
those
pages.
That's
the
that!
That's
the
that's
the
point
in
time
when
you
start
to
incur
the
the
cow
overhead.
B: So in our case... switching back to the context of Redis: we know that Redis has a large memory footprint and that many of those pages contain relatively stable keys, but a subset of those pages are going to be used for really hot keys that get modified all the time, and some of those pages will contain client buffers for input and output for clients.
B
It
doesn't
matter
what
the
page
was
used
for,
because
redis
is
using
je
Malik
for
for
its
for,
for
for
its
memory,
management
and
Je
Malik
is
perfectly
happy
to
you
know,
use
any
it's
not
gonna.
It's
not
gonna
make
an
effort
to
separate
hot
pages
from
you
know.
B
Hot
allocations
from
from
cold
allocations-
it-
that's
not
it's
it's,
that's
not
a
design
goal
for
it,
so
you're,
basically
going
to
have
hotkeys
and
client
buffers
will
be
spread
around
across
a
random
subset
of
pages,
and
so,
if
you're,
spending
I'm
making
these
numbers
up.
If
you
spend,
say
two
percent
of
your
memory
on
on
really
hot
on
on,
say,
two
percent
of
of
your
your
memories
is
sorry.
I
shouldn't
make
up
numbers.
B
Let
me
let
me
just
say
that
kind
of
kind
of
conceptually
you
could
have
a
small
percentage
of
keys
and
buffers
that
get
mutated
on
you
know
on
during
during
the
first
few
seconds
and
whatever
Pages
they
happen
to
reside
on,
are
going
to
incur
the
that
copy
on
right
overhead,
but
once
you've
kind
of
touched
and
made
a
first
touch
after
the
fork
of
of
all
of
most
of
the
hot
Pages.
B: ...this is pulling out only the stack traces from the main thread that made that do_wp_page kernel call frame, and you can see that they're extremely front-loaded, and that once we've dealt with the hot pages, it becomes rarer and rarer to have to pay that penalty. And I'm going to scroll over to the left, and you can see, a few minutes later... by the way, the headers here are...
B
This
is
this
is
giving
us
a
relative
offset
since
the
first
capture
and
yeah,
so
you
can
kind
of
think
of
these
as
seconds
so
230
seconds
from
some
point
in
time
is
roughly
when,
when
we
started
to
see
this
activity-
and
it
looks
like
it-
took
us
just
a
couple
minutes-
this
is
this
is
effectively
when
the
backup
ended
and
the
child
process
exited,
and
therefore
we
don't
pay
copy
them
right
overhead
anymore,
because
there
isn't
a
process,
that's
kind
of
present
serving
references
to
those
original
pages
yeah.
B
So
that's
that's
kind
of
what
I
wanted
to
show
that
this
is
the
dense
period
that
lasts
a
few
seconds
when
we're
paying
a
really
heavy
cost
for
copy
on
right
and
and
once
once
all
of
those
highly
volatile
keys
and
buffers
have
touched,
the
pages
that
they
reside
on.
B: ...for the first time, the penalty is much lighter. But the fact that we're, for a few seconds, consuming on average about 30 percent of our CPU time in a saturated state is going to have implicitly the same effect, where Redis can't keep up with its incoming request rate: response rate falls below request arrival rate, clients perceive this as slowness, and because most of our clients are Puma...
B
I.
Think
that
that
implicitly
means
that
each
Puma
worker
process
is
going
to
kind
of
rapidly
approach
its
maximum
number
of
threads.
So
if
it
was
previously
running
two
two
Puma
threads
and
it's
Max
is
set
to
four.
It's
going
to
bump
that
up
to
four
and
that's
going
to
manifest
as
the
new
threads
needing
to
open,
meaning,
naturally
meaning
to
open
new
connections
to
redis
and-
and
we
see
that,
as
do
I
still
have
this
open.
Maybe.
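A toy model of that feedback; the worker count is invented, and the two-to-four thread range is just the example above:

```python
# Toy model: when Redis slows down, each Puma worker grows from its
# current thread count toward its configured max, and every new thread
# opens another Redis connection. The worker count here is invented.
def extra_connections(workers: int, threads_now: int, threads_max: int) -> int:
    """Connections opened if every worker scales up to its max threads."""
    return workers * (threads_max - threads_now)

# 100 workers each growing from 2 to 4 threads -> a burst of 200 new
# connections, arriving exactly when the server is already saturated.
print(extra_connections(workers=100, threads_now=2, threads_max=4))  # 200
```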
B: Yes, and that's going to result in new incoming connections, which we can see here during every one of these events. So each one of these is an RDB backup, and we can see a spike in the incoming connection rates and the same spike in the open connection count. And I'm mentioning this mainly because, beyond these few seconds when we have heavy copy-on-write overhead, the following few seconds...
B
A
little
bit
of
overlap
also
contain
a
lot
of
overhead
for
accepting
incoming
connections
from
from
clients.
So
that
kind
of
extends
the
the
CPU
saturation
by
a
couple
more
seconds
than
what
we're
seeing
in
in
this
flame
graph.
But
altogether,
that's
still
only
representing,
maybe,
like
you
know,
five
or
six
seconds
of
CPU
saturation
on
the
main
thread,
I'm
going
to
switch
back
to
the
the
actually
this.
It's
easy.
It's
easy
to
see
here
as
well,
even
though
this
isn't
just
the
main
thread
you
can
see.
B: ...what's going on, and kind of identify two components that are driving this five-or-six-second period of CPU saturation, but also kind of highlight that even after we're no longer saturated, there might be some follow-on effects. From the client perspective, the impact to Apdex is larger than this time scale, and I'm not entirely sure...
B
If
that's
kind
of
an
artifact
of
the
way
our
metrics
are,
are
kind
of
lagging
indicators
or,
if
there's
something
else
going
on,
but
regardless
we
know
that
we
know
that
we
get
those
those
Apex
dips
and
kind
of
the
corresponding
throughput
dips.
B: Oh, that's a great question. I don't think so. I think that probably the heavier the traffic, the higher the portion. This particular profile was captured at 17:42 UTC two days ago, just for reference in time. So it was not at peak, but it was, you know, during...
B: That's a great question. Yeah, I think that's true. I think it would affect latency, but it would affect it on a time scale of microseconds, so, yeah, I don't think clients would notice. This only matters because there's a lot of them happening, so yeah.
A
I
think
the
the
one
thing
that
you
you
mentioned:
five
percent
of
fully
CPU
saturated,
but
the
effect
goes
on
longer.
I
think
that's
probably
because
it's
five
seconds
like
fully
saturated,
but
it's
still
pretty
busy
right
after
until.
B: Yes, yeah, I think that's plausible too. Yeah, I just... I try to be clear about things that I'm really confident about versus speculation, but I mean...