From YouTube: 2023-07-20 Scalability Demo
B
So we have switched over the read traffic, and the write traffic has been going on for one week, because we had to be careful around all the S1s and S2s. We also ran the migration script twice: the first time to migrate the bulk of it, the second time to validate the migration. So the write traffic was cut over yesterday and this morning, which is about 10 hours ago.
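A minimal sketch of what a two-pass migration like the one described (first pass copies the bulk of the keys, second pass validates) could look like, using redis-py; the endpoints, key pattern and DUMP-based comparison are assumptions for illustration, not the actual migration script.

```python
# Hypothetical two-pass migration: pass 1 copies keys, pass 2 validates them.
# Host names and the key pattern are placeholders.
import redis

src = redis.Redis(host="redis-cache-old.example.internal", port=6379)
dst = redis.Redis(host="redis-cluster-cache.example.internal", port=6379)

def migrate_bulk(pattern="*", batch=1000):
    """Pass 1: copy the bulk of the keys with DUMP/RESTORE, keeping TTLs."""
    for key in src.scan_iter(match=pattern, count=batch):
        payload = src.dump(key)
        if payload is None:              # key expired or was deleted mid-scan
            continue
        ttl_ms = src.pttl(key)
        dst.restore(key, ttl_ms if ttl_ms > 0 else 0, payload, replace=True)

def validate(pattern="*", batch=1000):
    """Pass 2: re-scan and report keys whose serialized values differ."""
    mismatched = []
    for key in src.scan_iter(match=pattern, count=batch):
        if src.dump(key) != dst.dump(key):
            mismatched.append(key)
    return mismatched

migrate_bulk()
print("keys still differing after validation pass:", len(validate()))
```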
B
It is no longer doing any work, at least any meaningful work. So this is the three days. Let me just refresh it. I'll just do two days. So this time was when the write traffic got cut over. You can see there's a big dip on Redis cache, and then here was when I cut over: I terminated the double-write and Redis cache just went to a very small number.
B
Yeah, so I won't be surprised to see a small number, but in general we have five shards running, and it has been very stable. I was keeping an eye on all the metrics for the Redis cluster cache, for Redis cache, and also web, API and websockets, exactly. They were fine yesterday and today; I'm just wrapping up some of the details.
B
The one thing that's interesting is that, right, so this is Redis network out. You can see that four or five of them are fairly similar, and then one of them is standing out, particularly more than the rest. But this changes with time, so I suspect this is probably just a hot key, or a key with a larger payload that is accessed more. I don't think it's particularly worrying right now; we have quite a bit of headroom on whatever it is, right.
B
Yeah, it's like 20 on the biggest one. Wait, no, no, this is where, this is cache, yeah. If we look at the bottom, there's this little cluster; they are around 20 at max, so we have time to do a deep dive on why there is that imbalance. We probably want a slightly longer data collection, so we can observe if there are any patterns; like maybe some of the keys are for some users in a particular time zone and they hit it a little bit more.
B
Yeah, well, that's about it right now. We're just going to wait and let it sit for a week and just observe before we do any sort of teardown. And given the availability concerns, it's very likely that I will wait till August, because it's already almost the end of July. We'll just wait a week or two and then we'll do the teardown later.
B
But right now it's working as expected, and I think Igor has merged it, so I've increased the severity for this from S4, which was because it wasn't really running, to S2, which is the standard for all of them. So this...
B
Some interesting things: well, I mean, cache is still quite easy per se. The next one is shared state, which is going to be fairly tricky, and I'll probably bring it up another time.
B
Like for shared state, it has pub/sub, and pub/sub works with Workhorse, so we've got Workhorse, like a Go library dependency, things like that to work with. And what else? There are exclusive leases, which is like a locking kind of thing. So if you do a SETNX, you can't do a double-write for SETNX, because the second write will step on the first and the values will be set differently. So there are these little intricacies that we have to iron out, but yeah.
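As a toy illustration of the SETNX problem described above (plain Python dicts standing in for the old and new Redis, with a made-up lease key), an unlucky interleaving of two double-writing clients leaves the two sides disagreeing about who won:

```python
# Toy model: two "Redis" instances as dicts; setnx returns True only if the key
# was absent. With double-writes, two clients can each win on a different side.
old_redis, new_redis = {}, {}

def setnx(store, key, value):
    if key not in store:
        store[key] = value
        return True
    return False

# Client A and client B both try to take the same exclusive lease, and each
# double-writes (old Redis first, then new). An unlucky interleaving:
setnx(old_redis, "lease:project:42", "client-A")   # A wins on the old Redis
setnx(old_redis, "lease:project:42", "client-B")   # B loses on the old Redis...
setnx(new_redis, "lease:project:42", "client-B")   # ...but B wins on the new one
setnx(new_redis, "lease:project:42", "client-A")   # A loses on the new one

print(old_redis["lease:project:42"], new_redis["lease:project:42"])
# -> client-A client-B : the two sides now disagree about who holds the lease.
```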
E
See, one thing I do want to mention on the Redis shared state that I found kind of interesting: I ran a tcpdump analysis a few days ago. Let me see if I can pull it up. So this is capturing traffic on the shared state secondary, and the reason for that is, when we looked at a CPU flame graph, we saw that a lot of the CPU time is actually spent on I/O, on writes in particular, and when we compared the ingress on the replicas...
E
Basically, that's an indicator of how much incoming replication traffic is happening, and you take that times two, and then you have an approximate share of how much of the writes happening on the primary are actually targeting replicas, as opposed to targeting clients, and it was like 50%. So a large chunk of the work that the shared state primary is doing is actually replication traffic.
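A back-of-the-envelope sketch of that estimate with made-up numbers, assuming the "times two" means two replicas receiving roughly the same replication stream:

```python
# Made-up figures to illustrate the estimate: if each replica ingests roughly
# the same stream, primary replication egress ≈ replica ingress × replica count.
replica_ingress_mb_s = 5.0      # measured incoming traffic on one secondary
replica_count = 2               # the "times two" above
primary_egress_mb_s = 20.0      # total network-out on the primary

replication_egress = replica_ingress_mb_s * replica_count
share = replication_egress / primary_egress_mb_s
print(f"~{share:.0%} of primary egress is replication, not client traffic")
# -> ~50% with these numbers, matching the figure mentioned in the meeting
```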
E
I wanted to get a better idea of what that looks like, and so this is capturing incoming traffic on a replica and then aggregating by the size of that incoming traffic, which gives us a sense of what the main contributors are. And what was kind of surprising to me was...
E
But what Redis does is, when it publishes, it always also publishes to the replication stream, and so all of the publish traffic must also go to the replicas. And that kind of means that it can be an optimization target: if we reduce publish payload sizes, or if we shift publish traffic to a different Redis, that can also reduce load, which was kind of a surprising result to me, and I just wanted to show that, Igor.
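For reference, a rough sketch of the kind of size aggregation described above: parse a plain-text tcpdump capture of traffic arriving on the replica and sum TCP payload bytes per source. The capture command, file name and per-source grouping are assumptions; attributing bytes to specific commands such as PUBLISH would require looking inside the RESP stream itself.

```python
# Aggregate a text tcpdump capture by source address, summing TCP payload sizes.
# Capture assumed to be produced with something like:
#   tcpdump -nn -i any 'port 6379' > replica-ingress.txt
import re
from collections import Counter

LINE = re.compile(r"IP (\S+?)\.\d+ > \S+: .*length (\d+)")

bytes_by_source = Counter()
with open("replica-ingress.txt") as capture:
    for line in capture:
        match = LINE.search(line)
        if match:
            source, length = match.group(1), int(match.group(2))
            bytes_by_source[source] += length

for source, total in bytes_by_source.most_common(10):
    print(f"{source:20s} {total / 1_000_000:8.1f} MB")
```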
C
Because it's effectively, it's like caching, right? Or, you know, it's saying we've had this response before, like, you know, keep your existing one. That's a good point.
C
Probably. I mean, I don't know how big that is; I mean, it's pretty big compared to the next one. That might be a little piece of work to actually...
C
Yeah, that's a big chunk of change. It might actually be worth migrating that to the...
C
It might have been that there wasn't a cache instance when we built that, or maybe that Workhorse only spoke to the primary, to the shared state Redis. It might have been that as well, and it was just an easy thing to do.
B
I'm not sure about that, because I was searching for what Workhorse does with Redis operations; it's mostly key watching, which is the runner queues, the pub/sub for runner queues, and also it does a GET on one key, which is also runner-related.
C
The first iteration was that Rails would do the, would be the, but the whole idea was that Workhorse would, and maybe it has never been done, which is kind of sad, because you could short-circuit that at the Workhorse level, right? Like, if you look, I think there was...
C
There was something in the original design doc about just short-circuiting that and doing the conditional response at Workhorse and not even bothering with Rails, and it might have been a "we'll do that in the next iteration", and six years later we haven't done it, which is kind of funny. But yeah, so maybe that is the case.
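A toy sketch of the short-circuit idea being discussed, not GitLab Workhorse's actual implementation (Workhorse is Go and uses pub/sub; the key name, polling loop and handler shape here are hypothetical): the proxy watches the runner queue key in Redis and only forwards the long-poll request to Rails when the value has changed.

```python
# Toy long-poll short-circuit: if the watched queue key is unchanged, answer
# 204 at the proxy layer instead of forwarding the request to Rails.
import time
import redis

r = redis.Redis(host="redis-shared-state.example.internal", port=6379)

def handle_runner_poll(runner_id, last_seen_value, timeout=50, forward=None):
    """Return ('no_change', None) or ('forward', rails_response)."""
    key = f"runner:build_queue:{runner_id}"      # hypothetical key name
    deadline = time.time() + timeout
    while time.time() < deadline:
        current = r.get(key)
        if current != last_seen_value:
            # Something changed: only now pay the cost of a Rails request.
            return "forward", forward(runner_id) if forward else None
        time.sleep(1)                            # real code would use pub/sub
    return "no_change", None                     # proxy replies 204 itself
```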
F
All right, so for a little bit of context: in our Sidekiq service, we originally have these per-shard SLIs, which we kind of hacked in because we want to monitor each shard. We originally only had things like the shard catchall SLI and the shard database-throttled SLI, so these originally track both the queuing and the execution side of the performance.
F
So what we have done here is we actually now already have a separated execution SLI apdex and a queuing SLI apdex, and the way we did this differently is that previously, for our per-shard SLIs, we calculate from the histogram that we scrape from the Rails application, and then the apdex calculation, whether it's a zero or a one, happens in the recording rules aggregations, whereas for the new execution and queuing ones, the calculation happens in the Rails app itself.
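To illustrate the difference: with the histogram approach the zero-or-one decision is derived later, in the recording rules, from scraped bucket counts, whereas the new SLIs can emit plain success/total counters from the app. A minimal sketch of the in-app variant using prometheus_client, with made-up metric names, labels and threshold:

```python
# In-app apdex: the application itself decides 0 or 1 per job and increments
# plain counters, so no histogram buckets are needed in the recording rules.
from prometheus_client import Counter

EXECUTION_TOTAL = Counter(
    "sidekiq_execution_apdex_total", "Jobs measured", ["shard"])
EXECUTION_SUCCESS = Counter(
    "sidekiq_execution_apdex_success_total", "Jobs within threshold", ["shard"])

def record_execution(shard: str, duration_s: float, threshold_s: float = 10.0):
    EXECUTION_TOTAL.labels(shard=shard).inc()
    if duration_s <= threshold_s:            # the 0-or-1 decision happens here
        EXECUTION_SUCCESS.labels(shard=shard).inc()

# The apdex ratio is then just success_total / total in the aggregation layer.
record_execution("catchall", duration_s=3.2)
```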
F
Yep, so we have all these different shards within one component name, and then I've made the changes in the dashboards themselves. So instead, we have this shard template, so you can just select from the dropdown, and then it will apply the shard template to these panels. So now, instead of seeing all the shards, you can just see the related ones that we want to see, and then the saturation panels also respect this shard. So previously, this saturation panel was a global view of every shard.
F
So now we can see it here at a finer grain. Yeah, so I guess the next step is I will completely replace all these per-shard SLIs, so we will just have like two rows of SLIs here, and then you just filter by the shard template on top, yeah.
C
So the metric, if I understand correctly: the application is emitting like a zero or a one, potentially also like a half, depending on the threshold.
F
For Rails requests, I guess, and Sidekiq workers, they can have different urgencies, so I kind of made it a one-off for Sidekiq, yeah. It's just a simple class, yeah.
C
It's just, there's been some discussion in the past, I think particularly with Bob, around kind of abstracting that out, where you basically have a yield, and you can run something in there, and then you say what the satisfactory and the tolerated thresholds are. This is unrelated to this work; it's kind of semi-related, it's not, you know...
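A rough sketch of the kind of abstraction being referred to, assuming nothing about any eventual implementation: a helper that yields to a block, times it, and records a satisfactory/tolerated/unsatisfactory score against caller-supplied thresholds (the 0.5 case matching the "potentially also a half" mentioned earlier).

```python
# Hypothetical "wrap a block and record apdex" helper: the caller supplies the
# satisfactory and tolerated thresholds; the helper times the block and scores it.
import time
from contextlib import contextmanager

results = []   # stand-in for a real metrics backend

@contextmanager
def measure_apdex(operation, satisfactory_s, tolerated_s):
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        if elapsed <= satisfactory_s:
            score = 1.0
        elif elapsed <= tolerated_s:
            score = 0.5          # the "half" case between the two thresholds
        else:
            score = 0.0
        results.append((operation, score))

with measure_apdex("project_export", satisfactory_s=1.0, tolerated_s=5.0):
    time.sleep(0.2)              # the work being measured

print(results)   # [('project_export', 1.0)]
```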
C
I think this is really great, and then we can kind of reuse that in places where we want to record apdex. But just doing that will make a huge difference to the number of metrics. And are we getting rid of the latency metrics altogether, or are they still there but just not being consumed?
F
Yeah, it's interesting. I'm trying to audit whether we can replace them with this zero-or-one apdex or not, whether we still have important places where we actually want to see the duration, or whether the logs are fine for us. Yeah.
C
The durations themselves, like, especially, I know a lot of software engineers in the stage groups kind of rely on those histograms, but really the most important thing there is to educate them on how inaccurate those histograms are. You know, they can be out by many dozens of seconds, right, but people take them as, like, absolute truth. It's like, "oh look, we saved a second", and it's like, you really don't know that. And so, like...
C
The thing is, if you encounter pushback from stage group teams who are saying, "oh well, how are we going to know", what you've got to do is teach them about using Elasticsearch and the logs instead, right? That's kind of the way that we have to help those teams. You know, for actual latency metrics it's much better just to use Elasticsearch.
F
Yep, I guess I will; it's part of the epic. We'll also make a write-up, documentation about this SLI, how it works, and then, yeah, what to do.
A
Yeah, it's going to be really nice being able to split it clearly between the queuing and the execution. It just provides so much more detail as to what's actually going on in there, rather than "the apdex is fine". Like, being able to split the two pieces out is great.
C
There are the shard detail dashboards as well; have you updated those? So if...
C
No, no.
C
To the top, sorry, and then just, yeah, to the top of the page, and then you'll see there's a, on the right-hand side, so you can go this way as well. Yeah, there's a shard detail one; it's the second to last one.
C
How many people use this? This is one of the places where the histograms are used, but we can get rid of those. I don't even know if anyone uses this dashboard, to be honest; it's very slow.
C
Yeah, I mean, there's some world where we could make that query into Elasticsearch and plot it in here in Grafana, it does support Elasticsearch as a data source, but it's probably easier just to push people through to Elasticsearch and give them a graph in there.
E
So I had another question as well. You showed the, or you touched on, the Sidekiq saturation panel, and...
E
That's now kind of broken out by components, and it's kind of a lot of information. So I guess for context: when I'm dealing with an incident, I usually try to scan the top row of a service to be able to tell at a glance whether there's anything obvious that looks out of place, and with this number of lines it's kind of hard to gauge. So I'm just wondering whether there's some sensible way to make the UX more friendly for that.
F
Yep, and another thing: I also discussed this with Bob last week. Because now we have two SLIs, if we were to aggregate both execution and queuing into the service aggregation, we would then have like double the RPS and double the error ratio compared to just the one that we have per shard.
F
So we currently don't have a way to indicate whether this SLI is only for RPS, or for apdex, or for the error ratio, whereas for queuing we only have the apdex based on duration, not an error rate, so yeah. This will be the next piece of work, also, to kind of support that.
C
Yeah, I'm just sort of, I mean, a lot of the reason why we had that second dashboard was to kind of avoid the compute, you know, because obviously a lot of those graphs are also really complicated in the way that they're constructed at the moment. So I'm just kind of worried that if we are adding in all that extra complexity about selecting on labels and stuff, whether it would, you know. And then obviously also, if you scroll to the top, we've got the range.
C
You know, the normal range, and then obviously, if you've got two, you can't really do, well, I don't know how you would do that with two values in there, for example. Or I don't know if I'm kind of missing what you were saying.
F
What they were saying is, so we have this in our metrics catalog, yeah, so we would have this aggregation flag. What this flag does is it will aggregate all of the apdex, error, and RPS.
C
All right, yeah, so we can't do the service aggregation on a per-attribute level. We can't do...
C
Apdex, but not the RPS. That's interesting, yeah; that shouldn't be too hard to change. Yeah, I mean, a lot of the RPS numbers are kind of a little bit out there. Like, if you think about web: when you think of the RPS of web, it's like one request comes from a user, but the way that we count it on there is really like one request into Workhorse, one request into Rails, and so it's really a relative measure rather than a, like, yeah.
E
Yeah, I mean, quite often if I want to see how much traffic a service is actually getting, I'll go to the dashboard, I'll skip the RPS line, and I'll go down to the SLIs, and then we have load balancer requests per second, Workhorse requests per second, Rails requests per second. We usually have those three, right, and then that gives a more realistic...
C
I mean, the other way we could do it is we could say that for a service, instead of aggregating all the RPS to get the service RPS, we say the RPS comes from this component, and so RPS is kind of different from errors and apdex. So that, you know, normally it would be the load balancer component that will give you the RPS for the service, because it would be much more realistic than what we have at the moment.
A
All right, well, we're at time. Thank you, everyone, for the demos, much appreciated. Hope you have a good rest of your day.