From YouTube: Scalability Team Demo - 2021-03-25
B
Yeah, thank you. So I wanted to... well, I'm very unprepared again, and I found a bug in recording rules while I was preparing, or at least trying to, but anyway. So I want to show where I'm at with recording error budgets for stage groups. Let me share my screen.
B
This is the issue that I was working on. As you see, we started by making sure we've got all the source metrics in place. Those metrics are error rates, total operation rates, apdex success rates, and total apdex measurement rates, which we're calling "scores" for now, and then a mapping of feature categories to stage groups.
B
So we've got all those in place now, and next we're going to aggregate all of these metrics that we have for feature categories up into stage groups.
B
The meat of the changes here lives in the aggregation sets. These are some rates that we forgot, but this is the main thing: we're defining the stage groups aggregation set, and then later we're going to take the feature category aggregation set, use it as a source for this one, and then add the mapping. On top of that, I added this bit, and this is the thing that I would like some input on from all of you.
B
If anybody has a thought: so, this is the mapping. It contains the feature category label, the stage group label, and the product stage label that we want to have on the metrics that we'll be recording here, and the way I'm adding that is by joining it in a string like this.
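The join being described, attaching stage group and product stage labels from a feature-category mapping while aggregating, can be sketched in Python (the real rules are PromQL/jsonnet; the category and label names below are invented for illustration):

```python
# Hypothetical mapping of feature category -> (stage_group, product_stage).
# In the real recording rules this mapping is generated, not hand-written.
MAPPING = {
    "source_code_management": ("source_code", "create"),
    "code_review": ("code_review", "create"),
    "continuous_integration": ("pipeline_execution", "verify"),
}

def aggregate_to_stage_groups(per_category_rates):
    """Sum per-feature-category rates up into per-stage-group rates,
    attaching the stage_group and product_stage labels from the mapping."""
    out = {}
    for category, rate in per_category_rates.items():
        stage_group, product_stage = MAPPING[category]
        key = (stage_group, product_stage)
        out[key] = out.get(key, 0.0) + rate
    return out

# Invented per-category error rates.
rates = {
    "source_code_management": 0.2,
    "code_review": 0.1,
    "continuous_integration": 0.5,
}
print(aggregate_to_stage_groups(rates))
```

In PromQL terms this corresponds to a `group_left` join against the mapping metric before summing; the sketch only shows the shape of the relabel-and-aggregate step.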
C
Yeah, that's where I'm at, and I think I really need to go through this and mess around with it on my computer, because, yeah, I think that's the best way for me to give input into this. If you...
B
Okay, because you asked nicely: one thing that I did want to point out, that is a bit weird, that I just noticed before the call (which made me say that I'm not prepared), is that we want to record a success rate here, and for some reason we're summing the weight. So I need to see what's going on there, and whether that's a problem with other recordings as well. Yeah, that's where I'm at for now.
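The bug being described, a success-rate recording that sums a weight instead, is presumably a variant of the following distinction; a sketch with invented numbers of why a rate must aggregate numerator and denominator separately:

```python
# Per-component success and total operation counts (invented numbers).
components = [
    {"success": 90, "total": 100},
    {"success": 45, "total": 50},
]

# Correct: aggregate the numerator and the denominator separately,
# then divide, giving a true success rate for the whole group.
success_rate = (sum(c["success"] for c in components)
                / sum(c["total"] for c in components))
print(success_rate)  # 0.9

# Incorrect: summing per-component ratios (or weights) produces a number
# that is not a rate at all once there is more than one component.
summed_ratios = sum(c["success"] / c["total"] for c in components)
print(summed_ratios)  # 1.8
```

The same pitfall exists in PromQL: `sum` of a ratio-valued series is meaningless, whereas a ratio of two `sum`s is not.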
B
Stop sharing. I can't stop sharing.
B
The next one is Jacob, I think.
A
Okay, thanks, Bob. Now I'm going to start sharing. I have two items. One is about an experiment we did where we bypassed the CI pre-clone script.
A
Actually we don't, but this was the change issue for it. So, the pack objects cache we're building: we're not calling it "pack file cache" anymore, because pack files also are...
A
I don't even want to go into why it's confusing, unless somebody wants me to, but I think it's confusing to call it "pack file cache", so I'm calling it "pack objects cache" now.
A
Well,
the
thing
that
prompted
this
project
was
project
was
the
realization,
on
my
part,
how
important
the
ci
pre-clone
script
is,
which
is
a
custom
thing
we
have
for
gitlab
or
gitlab,
without
which
the
server
that
the
gitly
server
that
github
or
gitlab
sits
on
melts
down
because
of
all
the
ci
clones
and
we've
been
experimenting
with
the
cache.
The
back
object
back
objects
cache
in
the
past
couple
weeks
and
sort
of
as
the
final
experiment.
A
It
was
a
bit
tricky
to
enter
is
excited.
It
was
a
bit
tricky
to
figure
out
how
to
bypass
this,
because
at
first
I
tried
to
tweak
the
script
so
that
it
would
exit
without
doing
stuff
percentage
of
the
time.
But
I
ran
into
all
sorts
of
bugs
in
the
script
and
I
realized
after
a
while.
I
just
need
to
not
touch
this
script
because
it
works
and
the
moment
you
change
anything
it
might
stop
working
and
then
just
don't
do
that.
A
So
this
is
for
the
window
where
it
happened.
You
can
see
when
we
made
the
switch.
So
here
it's
around
50
megabytes
per
second-
and
here
we
are
in
the
200
to
250
megabytes
per
second
on
the
network
network
egress
rate,
and
the
reason
for
that
is
that
the
pre-clone
script
works
by
making
the
ci
runners
fetch
less
data
and
it
when
I
do
spot
checks,
it
looks
like
they
fetch
between
10
and
100
kilobytes
of
data
each,
and
that
is
great.
A
But if you don't do that and you do a partial clone, then you fetch 120 megabytes of data.
A
So each of these CI runners was fetching maybe a thousand times as many bytes, and that doesn't really add up, because 250 is not a thousand times 50; you'd expect more. But it does make sense that the network egress rate goes up a lot, because all these builds are now much bigger.
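The arithmetic here can be made concrete (all numbers are the rough figures from the discussion): per-clone bytes went up by a factor of roughly a thousand, but total egress only went up about fivefold, which is consistent with CI clone traffic having been a small slice of total egress before the change.

```python
# Rough figures from the discussion.
before_per_clone_kb = 100   # ~10-100 KB per CI fetch with the pre-clone script
after_per_clone_mb = 120    # ~120 MB per partial clone without it
before_egress_mb_s = 50     # total network egress before
after_egress_mb_s = 250     # total network egress after

per_clone_ratio = (after_per_clone_mb * 1024) / before_per_clone_kb
egress_ratio = after_egress_mb_s / before_egress_mb_s

print(per_clone_ratio)  # ~1229x more bytes per clone
print(egress_ratio)     # only 5x more total egress

# If CI clones alone had dominated egress before, a ~1000x per-clone
# increase would have blown the total up far more than 5x, so most of
# the 50 MB/s baseline must have been non-CI traffic.
assert egress_ratio < per_clone_ratio
```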
A
Well, if you scroll down, you see something interesting here. This is where we were bypassing the pre-clone script, and there are more open file descriptors, there are also more threads, and you can see here in this one that there are more upload-pack processes now.
My guess, slash hedge, at the explanation for this is that, because we're downloading more data, each upload-pack process runs for a longer amount of time, just because sending 120 megabytes takes longer than sending 100 kilobytes. That means all these processes that would normally finish one after the other will now overlap, so that's why there were more processes. But if you look here, the CPU does not look much worse, although there is that interesting spike there, and I'm not sure what that's about. But yeah, in the grand scheme of things, we would have had 100% CPU and incidents and all sorts of stuff going wrong if we had done this in December.
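The effect described, more concurrent upload-pack processes simply because each one now runs longer, is what Little's law predicts: average concurrency is arrival rate times average duration. A sketch with invented numbers:

```python
# Little's law: average number of in-flight processes L = lambda * W,
# where lambda is the arrival rate and W the average time each runs.
def concurrent_processes(arrivals_per_second, seconds_per_process):
    return arrivals_per_second * seconds_per_process

arrival_rate = 10.0  # invented: upload-pack invocations per second

# Sending ~100 KB finishes quickly; sending ~120 MB takes much longer,
# so at the same arrival rate far more processes overlap.
small_sends = concurrent_processes(arrival_rate, 0.05)  # 0.5 in flight
large_sends = concurrent_processes(arrival_rate, 5.0)   # 50 in flight
print(small_sends, large_sends)
```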
A
So the fact that this just held up is nice. And one more thing before I hand over the... stop talking. So that's the network transmit rate; it is very clear that it went up. This metric shows the number of bytes served by the cache, and I need to change that to a 10-minute window; then it looks the same. So that goes up pretty much as much as that.
A
There is a little bit of a difference. And then the last graph I wanted to show was the disk size of the cache. There you can see that here it dips to almost zero more often, while when we were doing the bigger clones it never dips to zero. So you can tell from that that there are more bytes in there, but I was bracing myself for a bigger impact on the number of bytes that go into the cache, and that was just, sort of, barely... well.
B
Could you, like... I'm trying to think about the first graph you showed, the one you didn't expect. What was it, the...
B
Could it be that the reason there isn't such a difference there is that, in the grand scheme of things, lots of other things, not CI, are also generating bytes from this project?
A
Yeah, but still, there would be... yeah, there is other traffic, clearly, and whatever CI is doing in terms of bytes apparently is not as much as the other traffic, because...
A
Yeah, yeah, I mean, that's the amazing thing about this cache. The cache is relatively complex, but that's because it tries very hard to do the work exactly once. If you naively build a cache, you can have two things that produce the same cache entry, and then one of them gets thrown away; it's built the way it is so that you never throw something away.
A
You only do the work once. But still, we're going from putting things into the cache that are 100 kilobytes to putting things into the cache that are 120 megabytes.
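The "do the work exactly once, never throw an entry away" property being described is the single-flight pattern. Gitaly itself is written in Go; this is a minimal thread-based sketch of the idea in Python, with hypothetical names, not the actual Gitaly implementation:

```python
import threading

class SingleFlightCache:
    """Deduplicate concurrent requests for the same key: the first caller
    computes the entry, later callers for the same key wait and reuse it,
    so the expensive work runs exactly once and nothing is thrown away."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # key -> (done event, result holder)

    def get(self, key, compute):
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = (threading.Event(), [])
                self._entries[key] = entry
                leader = True
            else:
                leader = False
        event, holder = entry
        if leader:
            holder.append(compute())  # only the first caller does the work
            event.set()
        else:
            event.wait()              # everyone else waits for that result
        return holder[0]

calls = []
def expensive():
    calls.append(1)
    return "packed-objects"

cache = SingleFlightCache()
threads = [threading.Thread(target=cache.get, args=("repo.git", expensive))
           for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()
print(len(calls))  # the expensive work ran once
```

A naive cache would let several of those five threads compute the entry and discard all but one result; the per-key event is what guarantees exactly one computation.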
A
No, I am, I am happy. I think I am a little bit nervous about the egress rate, because...
A
Yes, exactly, and there is going to be a limit at some point on how many bytes we can pump out of that one Gitaly server. So maybe at some point we'll need rescaling just for the network pipe, but there's always going to be a bottleneck somewhere, and I guess, let's go ahead until Google tells us that the network is...
A
But there is a network card on that, on that Gitaly server... magic.
A
Yeah, I have no idea what will happen, but yeah. Yes, Bob, I am happy; it's just kind of amazing that...
C
How does this tie in with Praefect distributing reads?
A
Yeah, if it was sitting behind Praefect, then on each Gitaly replica... yeah, yes, that'd be a separate cache.
A
Yeah, and if there are projects where the number of concurrent clones is lower than the number of replicas, then you're not going to see a benefit from the cache. But with... yeah.
A
No, exactly, the hit rate goes down, but you still get the benefit of only one process doing that job: the deduplication of the processes for each individual replica, yeah.
A
I noticed something when I started writing the documentation for the feature, which is that, and this is probably not going to be true in the general case, we've been doing the experiments on the Praefect cluster and on canary, which both have very few repos on them. Most normal Gitaly servers have lots of different repos on them, but these are sort of, I don't know, test beds where we don't have a lot of different repos. The reason I'm saying this is that because there are not a lot of different repos, there's not a lot of Git data on disk in the grand scheme of things, and it all fits in memory, so we're not going to stuff the page cache full of Git repository files. Because the interesting thing that's happening...
A
...at the time, it's 100 kilobytes per second here, peaking, on disk reads, and in this graph we're serving over 200 megabytes per second. And just to be clear, this is a counter that correlates directly to read system calls, so we're asking the Linux kernel: please read these bytes from a file. So this is from the point of view of the read system call.
A
Praefect is still pumping out a lot of data here, right; I'm looking at the same servers. And the cache: in principle it's on disk, but in practice it's in RAM, because the Linux page cache keeps it in RAM. If we had put the cache in object storage, we wouldn't have had this effect, because there is no page cache for object storage; but because we're reading from disk, as long as there's enough RAM available, Linux puts the cache in RAM for us.
A
Yeah, it's quite... and I've been thinking about it more, because even if we were memory constrained, there's the effect that we have the producer writing into the file and all the consumers must read from the file. So all the pack objects data must go through the file in a way, but because of the page cache, if you have a reader that's right behind the writer...
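The producer-and-trailing-consumers-through-a-file pattern being described can be sketched like this (a simplification: in the real cache the readers are separate processes, and the point is that a reader right behind the writer is served from page-cache-hot pages rather than disk):

```python
import os, tempfile

# One producer appends pack data to a file; a consumer reads from its own
# offset, trailing just behind the writer. Because the writer's pages are
# still hot in the page cache, a reader this close behind rarely hits disk.
path = os.path.join(tempfile.mkdtemp(), "cache-entry")

writer = open(path, "ab")
reader = open(path, "rb")

received = b""
for chunk in (b"pack-header ", b"pack-data ", b"pack-trailer"):
    writer.write(chunk)
    writer.flush()             # make the bytes visible to the reader
    received += reader.read()  # reader catches up to the writer's offset

writer.close()
reader.close()
print(received)  # b'pack-header pack-data pack-trailer'
```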
A
Well, more data will be paged in from disk, but Linux will try to balance it somehow. The one thing that can happen that is less than ideal is that repository data gets paged out that then needs to get paged back in, but...
A
A page fault isn't about, like... the kind of excess memory, if you want, is all used for buffers, and that tends to shrink down, and that's not page faults, is it?
A
I think a page fault is when you want to read something and it's not in the cache, and you have to get it from disk, irrespective of what it's for. Okay, but, okay, Matt will help.
A
One thing I want to say here is that I'm not sure how much control we have. There are some tunables in the Linux kernel over how it chooses what to do with these buffers, but...
C
The main tunable I was thinking of was actually the amount of memory that we have, you know. Would it make sense in some cases to have extra memory on the machines, because, you know, they would maybe...
A
So, what's the next step? The next step is that we make the cache configurable via Chef, because right now it uses hard-coded config values and you can only turn it on and off with a feature flag. It should have a config value that says on or off per server, so I need to do a couple of merge requests for that, and write some documentation. But I think the moment it is configurable via Chef, I'm going to raise an issue to have it rolled out.
F
So yeah, sorry, I might be missing something, but is there a reason why we couldn't use the feature flag to just roll it out to a percentage of users now, in parallel?
A
If we wanted to roll it out now with the feature flag, we could; it's just that we don't have very fine-grained control with the feature flag.
A
One thing that I'm slightly concerned about is the file-hdd servers, because they have slower disks, and we don't have a way to say "this is on everywhere except on file-hdd". Right now we can only say it is on for projects, or for a percentage of projects, where the set of projects gets picked randomly by the feature flag mechanism, or it's on for a percentage of the time; but we cannot turn it on and off for individual servers.
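For reference, percentage-of-actors feature flags typically work by hashing a stable actor identifier into buckets, which is why the enabled set of projects is effectively random and there is no per-server dimension to target. A simplified sketch of that mechanism (not GitLab's actual implementation; the function name and hashing choice are invented):

```python
import zlib

def flag_enabled_for(project_id: int, percentage: int) -> bool:
    """Deterministically enable a flag for roughly `percentage` percent of
    projects by hashing the project id into one of 100 buckets."""
    bucket = zlib.crc32(str(project_id).encode()) % 100
    return bucket < percentage

# The same project always gets the same answer, so the rollout is stable,
# but which projects land in the enabled set is effectively arbitrary:
# there is no way to express "every project except those on file-hdd".
enabled = [p for p in range(1000) if flag_enabled_for(p, 25)]
print(len(enabled))  # roughly a quarter of the projects
```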
F
Right, so maybe, if you feel comfortable doing this, I would rather want to see this running in parallel.
F
If we can do a percentage of traffic, right: if you run this and we kind of observe it while the configuration is being written, we're going to get the best of both worlds. We might catch edge cases, and if we see edge cases, we can then stop the configuration work, because it's more important to actually fix the actual operations.
A
Right, yeah. So the idea of the experiments so far has been to look for edge cases, and at this point I think we're going to be fine with edge cases. As for the configuration work: the Gitaly merge request is already approved by one reviewer, and the omnibus merge request, I need to... I want to check it manually. But what I'm trying to say is that the configuration work is almost done, so I'm not sure how much time we gain by starting a percentage rollout.
F
The gem problem we're in, like, we're digging ourselves out of it, but apparently there is also a big OpenSSL vulnerability announcement today, so that's going to be fun. So, my point being: there are delays with everything else that is happening, and given that you actually have a tool to start rolling this out and see what kind of pressure it creates in the infrastructure, right, like, if we, in the situation where we are at now, can enable this and it still works and the platform works...
A
So I'm not blocked yet, but the moment I get blocked, I can raise an issue, because I've already started a bunch of things; it doesn't make sense to drop them if I can keep moving them along, right?
A
This is great. I think that... wait, now... that was it, unless... that was my part of the agenda. My part.