From YouTube: Scalability Team Demo - 2022-02-24
B
Thanks. Yeah, I'm sort of stealing the show from the other people who work on the Redis upgrade, but what I wanted to talk about is not the Redis upgrade itself. Sorry: we just upgraded Redis Cache to 6.2, and "we" means Alejandro and Igor, not me. But I was interested in this because of latency spikes we were seeing on the Git service. I was curious whether those got better, and they still happened, I think, which is disappointing, but it's also interesting.
B
So I just wanted to quickly talk through the graphs I collected for that.
B
So now I should share my screen. I wrote a comment about this. Can you see my browser? Yeah? Okay, I wrote a comment about this on the issue where we talked about upgrading, and the problem really is this: it's really straightforward to see in the Rails request logs. If you filter for type git and you look for requests with a duration longer than one second, then you see these bursts, and this is a problem; this shouldn't be there like this.
B
The Git service is where the Git HTTP traffic goes, and all these requests need to do is look up a project and a user in the database and say: this user exists, the project exists, here's the Gitaly server, now go away. So that should take less than 100 milliseconds; it's slow enough as it is, but it shouldn't take more than a second. And the weird pattern we have is that when it takes longer than a second, it's usually because of Redis, because of Redis Cache.
B
I think, if I take this filter out, then you sort of see the same thing. Oh, they move a bit. Why did they move?
B
Yeah, so, yeah, now these are actually... if I exclude these, then we have way fewer, though there are still some left.
B
So that's funny, but there's this correlation with Redis Cache, and what we realized, what we were even aware of for a while, is that Redis Cache has eviction bursts. It's supposed to evict, because it's configured with a maxmemory limit, so whenever it hits the limit it needs to evict stuff. What we were hoping was that Redis 6.2 would push the bursts down and just evict at a more constant rate, because it contains code changes that should make it do that differently, and you can see the changes if you look at the eviction rate.
B
So, okay, let me also refresh this, because it's not from the last hour. What is interesting here: this is the eviction rate, a counter in Redis that we can track via Prometheus, and what's interesting is these bumps down here. I think these bumps are the effect of the new eviction code in 6.2, because if I go back in time 24 hours, the bumps don't exist; you only see the peaks. So something did change, but it's not working the way we want it to, because we want... yeah.
B
So it has to be that thing, and it...
D
...still has these bursts, so that's, yeah. But any memory pressure can induce an eviction burst; even the client connection buffers count against that budget. But that's not...
B
So this means that it does 500 microseconds of evictions, and then something else, and then 500 microseconds of evictions, and then something else. So why isn't it... why does it behave like this sometimes, and then sometimes it still gets to do this much work?
B
So then it's easy to understand that you will see bursts if you go over the limit by a lot, because it's only doing that. Right: the main thread is a single thread, it's in a loop, and it's only evicting until it reaches the goal. But now, every time it evicts a key, it looks at the current time, and if it's been in the eviction loop for 500 microseconds, it stops.
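To make that mechanism concrete, here is a toy C model of the time-bounded loop as described. This is a sketch under the assumptions above, not the actual Redis source; the memory state is simulated and all names are invented.

```c
/* Toy model (not the Redis source) of the time-bounded eviction loop:
 * evict until back under the memory limit, but stop after a 500 us
 * budget and reschedule, so the single main thread can do other work. */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define EVICTION_TIME_LIMIT_US 500

static uint64_t monotonic_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Simulated memory state: pretend we are 100000 keys over the limit. */
static long keys_over_limit = 100000;

static void evict_one_key(void) { keys_over_limit--; }

/* Returns 1 if the eviction pass finished, 0 if it ran out of budget
 * and must be rescheduled ("0 seconds in the future" on the event loop). */
static int eviction_pass(void) {
    uint64_t start = monotonic_us();
    while (keys_over_limit > 0) {
        evict_one_key();
        /* The clock is checked after every key, as described above. */
        if (monotonic_us() - start >= EVICTION_TIME_LIMIT_US)
            return 0; /* budget spent: yield back to the event loop */
    }
    return 1;
}

int main(void) {
    int passes = 1;
    while (!eviction_pass())
        passes++; /* a real server would serve requests between passes */
    printf("finished after %d passes\n", passes);
    return 0;
}
```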
D
Well, I mean, I don't know what to say; I haven't looked at this new code. But it sounds to me like the next step would be to do some profiling during, or surrounding, one of these bursts. Particularly like the... oh gosh, Zoom, the Zoom widgets are in the way, I'm waiting for them to disappear. Oh gosh, Zoom. Anyway, the 10-minute window that includes one, like from, say, 15:27 to 15:36, would be sufficient.
D
I think just looking at the middle of the graph that you've got on the screen, yeah, that would teach us a lot. And it looks like, because of the time scale that we're looking at, we could afford to do, you know, like 500 samples per second; the sampling rate could be low, not super aggressive, yeah.
D
So I wouldn't feel bad about running a profile for multiple minutes at kind of a modest rate, and it can tell us more precisely the duration of the bursts. Then, if we want to, once we get our first kind of medium, kind of low sampling rate pass, we get a panorama of where the activity is centered. This is obviously going to be CPU-centric, so we'll see it in the profile.
D
We can predict roughly when the bursts are going to happen and then use a higher sampling rate closer to when a burst is going to happen, if we deem that we need to, which we may not.
B
Yeah, no, that would be... that would be interesting, because it would corroborate what the code looks to be doing, especially if you use this visualization that I don't know how to make, where you get the dots over time. Yes.
B
Yeah, because that should show the cluster of the...
B
I almost wonder if... either it has nothing else to do, so there are no requests coming in for some strange reason... I've wondered that too, but it doesn't sound plausible, does it? No. But how, how else can it chew through so many keys in 500...
B
...bursts, yeah. Or, and this is something I don't understand so well: if the 500-microsecond burst is up, the new eviction code will schedule a proc on the event loop in Redis.
I don't know exactly what that means, but it says: zero seconds in the future, do this, do another chunk, another 500 microseconds. So I think the idea is that that next chunk of 500 microseconds joins the back of a queue and other work happens first, but maybe there is something pathological where it's at the head of the queue, and even though stuff is supposed to happen in between these little 500-microsecond blocks, they're all back-to-back.
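A toy illustration of that queueing concern follows. This is not Redis's actual ae event loop; the queue, task names, and the two scheduling policies are invented for the sketch. A continuation that re-enters at the back of the queue lets other work interleave, while one that re-enters at the front runs its chunks back-to-back.

```c
/* Toy event loop comparing where a rescheduled eviction chunk lands. */
#include <stdio.h>
#include <string.h>

#define QCAP 64
typedef struct { const char *name; } task;

static task q[QCAP];
static int qlen = 0;

static void push_back(task t)  { q[qlen++] = t; }
static void push_front(task t) {
    memmove(q + 1, q, (size_t)qlen * sizeof(task));
    q[0] = t;
    qlen++;
}
static task pop_front(void) {
    task t = q[0];
    qlen--;
    memmove(q, q + 1, (size_t)qlen * sizeof(task));
    return t;
}

static void run(int evict_chunks, int front) {
    push_back((task){"evict (500 us chunk)"});
    push_back((task){"serve request A"});
    push_back((task){"serve request B"});
    while (qlen > 0) {
        task t = pop_front();
        printf("%s\n", t.name);
        if (strncmp(t.name, "evict", 5) == 0 && --evict_chunks > 0) {
            /* "0 seconds in the future": reschedule the next chunk. */
            if (front) push_front((task){"evict (500 us chunk)"});
            else       push_back((task){"evict (500 us chunk)"});
        }
    }
}

int main(void) {
    printf("-- continuation at back of queue --\n");
    run(3, 0);
    printf("-- continuation at front of queue (pathological) --\n");
    run(3, 1);
    return 0;
}
```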
D
Maybe, yeah. Well, the profile should show us, should answer some of these questions. I think you're asking exactly the right questions, by the way. Those are clear, great next steps, thanks.
B
We should figure out how to do this profile in a safe way, and whether we need high resolution or not. So this is happening right now, right? This happens continuously? Yes.
D
The end time is now, yeah. Yeah, that's great. So probably after this call I'll do it.
D
Yeah, so I want to catch this before the daily workload cycle starts to wane. But apart from that, I think we can gather more information today. Oh, would you share the link to the Thanos graph you were just on? Yeah.
B
I just went through the... well. We had an incident a while back, which was sort of a borderline incident, but our SLIs decided it was an incident on the Git service, and then we realized it was because of these evictions causing latency. So that was the first one, the log graph.
B
So we knew about that, and that's how we came up with, we rediscovered, that we wanted to upgrade to 6.2. And I saw Alejandro posting that he was working on that, so the other day I was just curious, like: hey, refresh, is it on 6.2? I actually had a Thanos graph with the versions open, so I just kept refreshing that until it reached 6.2 in production, and then I looked, and it didn't get better. So then I just looked at the log graph again.
B
Another thing: I mean, we have one tunable thing, which is these 500-microsecond windows; we could make them shorter or longer. But I don't have a good reason to think that would help.
B
It may be an outcome that we decide we need a longer or shorter window for these things, but I think we first need to gather data, like Matt was saying.
A
I had a question around the agenda item. We happened to notice it because you were looking at the graphs for other reasons, and it seemed that the spikes didn't get bad enough that anything actually alerted. I'm hesitant to put in excessive process around things like upgrades, but would it be helpful to... was there something that we weren't monitoring that we should have monitored? Should we have kept this open for longer to see it? Like you're...
B
...talking about a different problem now, because something went wrong during the upgrade, where we started doing background saves on Redis Cache.
B
Well, we do have a latency Apdex, and it wasn't complaining, so maybe it wasn't a problem.
B
Only now it's not loading; maybe I need to make it shorter.
B
Okay, well, but...
E
Let me check. So it was about the Thanos store SLI, and I think that, if you go to the production channel, the alerts are still active. I think.
B
Well, we know what this looks like. These are meant to be the four golden signals, and this is the Apdex we have. Oh, no data. This is measuring latency from the Rails client side, and there should be a line in there for the SLO.
B
So, to get back to Rachel's question: apparently we were not crossing that line.
D
Yeah, there's a... I just added it into the Zoom chat, there's a link. Since you're screen sharing, can you click it? It's a...
B
Yeah, so this is the upper... this is the first panel, literally, on the page, but zoomed in. And what are the lines again?
B
So I suppose one of these must be the one-hour window and the other the six-hour one.
C
...taken a long time. Like, over time, you need to have spent five percent of what you're allowed to spend, and then it needs to have been especially bad for half an hour for the alert to trigger.
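A minimal sketch of that two-condition rule in C, assuming an availability SLO. The numbers and the burn-rate threshold are made up for illustration; the real rules live in the alerting system, not in code like this.

```c
/* Toy check (illustrative numbers only) of the alert logic described
 * above: fire only if a meaningful share of the error budget is already
 * spent AND the recent half-hour window is still burning it fast. */
#include <stdio.h>
#include <stdbool.h>

int main(void) {
    double slo = 0.995;                  /* assumed availability target */
    double budget = 1.0 - slo;           /* allowed failure ratio */

    double spent_fraction = 0.07;        /* share of the period's budget used */
    double half_hour_error_ratio = 0.02; /* error ratio over the last 30 min */

    bool enough_spent = spent_fraction >= 0.05;                   /* "spent 5%" */
    bool especially_bad = half_hour_error_ratio / budget >= 2.0;  /* assumed burn-rate bar */

    printf("alert: %s\n", (enough_spent && especially_bad) ? "yes" : "no");
    return 0;
}
```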
A
It's possible, yeah, because it would have alerted at some point, and then we would have dealt with it because it had alerted. It's just that we got early warning because you were interested in something else.
C
But you do bring up something interesting, Rachel: how do we tighten thresholds? Like, our goal is to always keep tightening them if we can, and, like, now I'm looking at metrics for specific SLIs that I'm going to tighten after I'm done. But then for these ones that we haven't looked at in a while, how do we decide whether to tighten them or not?
B
Oh, this one's difficult, because we still have those latency... these eviction spikes. It's too bad that the page doesn't load, because I was actually looking at it, and this latency graph does look better after the upgrade, from what I remember. I don't know if other people saw that. So, after...
D
A few people have stepped up to deal with critical issues as they emerge. I know Michal has done some intermittent work on this, for example, but I don't think it has a new owner; at least, to the best of my knowledge, we don't have a new owner for the service, and I think there's general recognition that there's...
D
So, through an uninteresting sequence of coincidences, I ended up on a call yesterday with some of the DBREs to chat about a couple of points of concern related to us building our own Postgres packages. Currently in production we're running 10 Patroni nodes. Two of those Patroni nodes are running a custom Postgres build that we built, and there are some problems with that build that I'm going to gloss over for now. The other eight Patroni nodes are running the community build.
D
These hosts are still running Ubuntu 16.04, which is way past end of life, and upstream package providers are no longer building for it; the Postgres community, included, is no longer building packages for point releases. So we're stuck running Postgres 12.7 in production currently, and we want to get to 12.9 because it's got some improvements that we'd like to have, but there are no packages for 12.9 for Ubuntu 16.04. That's the backdrop.
D
Yes, yes, exactly. So we don't run our Omnibus builds on these nodes, and, I'm not privy to all the decisions that went into this, but apparently we decided that we were going to try to use the Debian package recipes for building the postgresql-12 package and its friends. Here's the catch.
D
There are a few problems with doing that, but the one that concerns me most right now is this. Most of the code that we run in Postgres is part of the Postgres core; that is to say, we run about 10 or so extensions to Postgres in production, and all but one of them are part of the Postgres core source code, so for those there's absolutely no risk of version incompatibility.
D
But there's one called pg_repack, whose job is literally to rewrite data files in the database in a more lock-friendly manner than the native behavior. That was built against the Postgres 12.4 headers. We are not running Postgres 12.4.
D
Now, that's a great question, and yes, that's a major version boundary concern. Even minor versions, though, can potentially change things. For example, if a struct added or lost a field, we'll now be looking at the wrong offset into that struct, because we used the wrong header file when building pg_repack.
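A toy illustration of that offset problem: these are invented struct layouts, not real Postgres headers. An "extension" compiled against the old layout reads the wrong field from memory laid out by the new one; in real life this is undefined behavior, which is exactly the danger.

```c
#include <stdio.h>

/* Layout the extension was compiled against (think: the 12.4 headers). */
struct tuple_v1 { int id; int len; };

/* Layout the running server actually uses (think: newer headers),
 * where a field was inserted in the middle. */
struct tuple_v2 { int id; int flags; int len; };

int main(void) {
    struct tuple_v2 server_tuple = { .id = 7, .flags = 1, .len = 4096 };

    /* The extension dereferences the server's memory using its stale
     * layout, so its "len" actually lands on the server's "flags". */
    struct tuple_v1 *ext_view = (struct tuple_v1 *)&server_tuple;
    printf("server len = %d, extension sees len = %d\n",
           server_tuple.len, ext_view->len);
    return 0;
}
```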
D
This is kind of freaking me out, because there are so many bad things that can happen when you have, effectively, you know, incorrect dereferences, when extensions have...
D
Extensions run in the same context as the server, at least the extensions that I've worked with. Just to be clear, I've only worked with in-core extensions, not these third-party extensions, which use a different framework, but they still end up compiling using headers from whatever Postgres server was, you know, used as part of the build process. I just checked this last night to be sure. So that's the thing that's kind of bothering me right now, to the best of my knowledge.
D
I think we do use pg_repack, irregularly. I don't think we've run it very often, but I'm not trying to stay in the loop on our database maintenance. So anyway, that was something I came across yesterday, and I thought...
D
I was hopeful that someone had already decided to address this, but it turned out, when I chatted in that meeting, that it wasn't on their radar, so I raised it as a concern.
D
I wasn't sure yesterday morning whether I was overreacting to this or not, but I'm pretty confident now that I'm not. I think this is a serious problem. I think this is a serious risk, and I think we're super duper lucky that there is only one extension in production that falls into this category of crossing version boundaries.
D
Well, I raised it first in a Slack conversation and then in that meeting I mentioned, yesterday morning. The meeting went long, and I think the folks on the call were pretty receptive to the concern, and when folks had to start dropping off the meeting, we agreed that we would switch to async. They suggested using the existing change issue for the async conversation to continue, and I followed that request. In hindsight, I feel like maybe a separate issue would have been better; I've got about two or three pages' worth of notes that I've added in the long comment thread on that issue.
A
I think it's one of the hard things about not being responsible for the database ourselves: all we can really do is make sure that the concern is raised and heard, and then leave it on their prioritization list to take care of. Which is frustrating at times, because we can't just do what we need to do to resolve it, but at the same time we need to leave it with them.
D
Yeah, I agree. Maybe I'll do that and link it to the change issue, just so there's a cleaner place to talk about that particular point.
D
Yeah, that sounds good. So I'll do that, and then I'll step away from it; there's obviously a bunch of other things I need to focus on as well. But this seemed like a big deal, and I shouldn't sugarcoat it: this is a huge problem. This can literally crash our servers or corrupt our data files, and the nature of the problem could be...
D
It really is kind of up in the air how big the impact can be, and each time we do a minor version bump the risk is reintroduced, because it's not about the one-line change that we've made to the Postgres source code. It's about all of the differences that have accumulated in the Postgres source code between 12.4, when the extension was built, and whatever version we're running now.
F
Yeah, that's all I had on that topic.
E
Yeah, sure. I think it went relatively well, except for what you found, which is that we had to disable the background saves. But we had a script that Igor developed, and that worked out pretty well. We just had to make an adjustment for the Sentinel nodes, because Redis Cache has external Sentinel hosts, so we had to modify the script to take that into account.
E
As a result, it's not an optimal script now. The way we do it now in the script is that you reconfigure the first Redis node and then you reconfigure the first Sentinel node, but there's no reason why it has to be that way; they're not correlated, the Redis node and the Sentinel node, but...
E
No, no, just a general process thing, just thinking of what could improve. But with regard to the background save problem: the thing that was a bit of a surprise is that this seems to have been a change in Redis 6.2, because we had the same settings as we had with 6.0.
E
It's just that it seems that now, if you don't have save settings, it puts the default settings in your configuration. So to keep the old behavior, you have to have a save setting with an empty string. Instead of having no setting, you have to have a setting with an empty string.
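In redis.conf terms, the workaround being described is the stock `save` directive with an empty string; a minimal sketch, assuming the goal is to keep RDB background saves disabled on 6.2:

```
# redis.conf: on Redis 6.2, omitting `save` entirely now falls back to the
# default periodic RDB snapshots, so disabling must be explicit:
save ""
```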
E
They changed it so that there's no automatic host-to-IP translation, so we had to adjust for that as well. So, yeah, those were two changes where we had to ping the Distribution team, so that they are aware that there might be external customers that are also affected by this. It's probably unlikely. It's probably...
B
I just wonder... Redis was sort of created by one person, and I forget his name, but his nickname is antirez or something, and he was the maintainer of Redis for a very long time. But he stepped down, I think, a year or so ago now. I wonder whether this is related to the maintainer stepping down, whether they have a different attitude to changes now, or not. But I'm speculating now.
E
Yeah, I mean, if you look at the changelog, for example, the change in behavior for background saves is just not mentioned as a breaking change there. You have to go to the bug fixes, and it says: oh, changed a behavior where no setting will give you the defaults. So if you were just reading the breaking changes section, you wouldn't have caught either of these changes, which is weird.
B
It's a huge change. I mean, Redis is part of this, I guess it's sort of part of the NoSQL hype, where databases could look fast because they don't store your data. I mean, the other famous example is MongoDB not flushing writes right to disk. But if you boot Redis, it doesn't save your data to disk, and I remember the Redis creator having a blog post where he says: by the way, my entire blog is on Redis, and saving is off.
B
So: I almost lost my blog because I restarted the Redis process that, it turned out, had been running somewhere. So it's just, I don't know, this very NoSQL thing, that it doesn't save by default. So that's quite a big change to call a bug fix. I mean, it's probably for the better for most people, but...
A
Well, thanks for the good conversation. I hope you all enjoy the rest of your days, and I'm looking forward to seeing you all again soon.