From YouTube: Scalability Team Demo - 2021-10-28
A
Right, I've got the first item. I wanted to talk a bit more about, I think, what I just talked about last time, but I've been away for a week, so I've not been working on this for that long. When we were doing the test to see what one queue per shard would look like, in terms of whether that would be a valuable project to do, Craig created a pair of instances (I think he actually created three) to simulate what we do with Sidekiq on our Redis instance, and then say: okay, if we have one queue per worker, and we have this many jobs for this many workers, and they take approximately this long per worker, what happens? And if we have one queue per shard, what happens? So I'm just grabbing that and updating it to see what happens with different scheduled set permutations.
A
B
A
So that's what's been taking me the time so far: figuring out whether my results are actually valid or not. I think I'm getting there, but I still need to tweak the actual total amount of traffic to get the right load in the first place. I'm gonna share... well, I'm actually running one now, so that's a good example.
A
So if I just spin this back a bit further... so yeah, one of the issues is that I run an experiment and I want to leave a gap between experiments, to, you know, make it clearer where the demarcation is, so we can ignore all of this to the left.
A
These three here are with the new Sidekiq 6 scheduler that we had issues with. So this is with no scheduled jobs, and you can see I've set the base load probably too high here, because this is one... this is with half of all jobs being scheduled, and this is with 100 percent of jobs being scheduled, and you can see there's no difference between those two, again probably because the underlying base load was too high.
A
What is interesting (I made this a stacked chart to make it clearer; let me make it unstacked) is that the user time was pretty much the same in both of those. It was the system time that went up, and if we look at the Redis exporter for those, we can see that it was similar to what we saw in production, where we have a huge amount of ZREM commands because of the way the Sidekiq 6 scheduler works. And so a lot of that... I haven't taken any profiles,
A
so I don't know this for sure, but a lot of that is probably due just to the additional network overhead of running.
A
A
This is, I think... this one was ramping up slower, so this was going: no jobs are scheduled, 30 percent of jobs are scheduled, 50 percent of jobs are scheduled, 70 percent of jobs are scheduled, and I was like: wait, I don't have enough headroom here to see the difference, so I'm gonna have to back out and reconsider. And the distinction here is that it's user time that's going up, not system time, so the blue peaks here. And down here we can see the rate of commands
A
we're sending to Redis is much lower. This is EVALSHA, this is the... so that's it. We should have as many ZADDs as ZREMs, basically, because ZADD means scheduling a job and ZREM means de-scheduling it, running it. So I'm just running one now as well.
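(As a rough illustration of the traffic pattern being described, here is a minimal Python sketch, assuming redis-py and invented key names, of a Sidekiq-6-style scheduled set: scheduling is one ZADD, and the poller pulls due jobs with ZRANGEBYSCORE and claims them with ZREM. This is not the actual test harness.)

```python
# Minimal sketch (not the actual test harness) of the scheduled-set pattern
# described above, using redis-py; key and queue names are invented.
import json
import time
import redis

r = redis.Redis(host="localhost", port=6379)

def schedule_job(job: dict, run_at: float) -> None:
    # Scheduling a job is one ZADD into the scheduled sorted set.
    r.zadd("schedule", {json.dumps(job): run_at})

def poll_schedule_once() -> None:
    # A Sidekiq-6-style poller fetches one due job and then tries to ZREM it.
    # Only the process whose ZREM succeeds enqueues the job, so a healthy run
    # shows roughly one ZREM per ZADD, and every extra polling process adds
    # more ZRANGEBYSCORE/ZREM round trips (the command-rate growth on the charts).
    due = r.zrangebyscore("schedule", "-inf", time.time(), start=0, num=1)
    for payload in due:
        if r.zrem("schedule", payload):        # de-schedule the job
            r.lpush("queue:default", payload)  # hand it to a worker queue

if __name__ == "__main__":
    schedule_job({"class": "ExampleWorker", "args": []}, run_at=time.time())
    poll_schedule_once()
```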
A
That's why this is changing when I hit execute, instinctively, because I want to see what's going on. So all I'm doing with that is prepping to see what happens when we change to only scheduling from certain processes, so not scheduling from as many processes as we have now. And I can also, which I haven't done yet, look at the... I've got the Sidekiq server logs available to me.
A
So I can look at those and see what that does to scheduling latency as well. But, like I said, most of my time on this this week has been spent finding that what I thought was valid actually wasn't valid, then thinking: right, I've totally fixed that, and then finding another problem. I think now I'm at a point where I don't have any problems, but I thought that on Tuesday and on Wednesday too, so who knows.
D
Is that including the change that Heinrich... I think Heinrich did the... well, I think you merged it, or did I merge it? I don't know. But the idempotent dropping of the item when scheduling, or what was it? The Lua script.
A
D
A
So that's EVALSHA here, so that's these.
A
So yes, that does include that. I was just showing the difference between the two schedulers initially, to show the initial difference, and now I'm looking at: using the scheduler we have, if we do that in fewer places, what happens? But I want to make sure my results are actually correct first, so...
A
If there's no questions for that, then I think it's Bob.
D
I wanted to show how we want to introduce the new customizable request duration Apdex SLI into error budgets. It's already in the services, so the services have that SLI now and we will use it for alerting and so on. That's already running, but we wanted to include those in the error budgets for stage groups, and we wanted to give people time to adjust their thresholds on the endpoints that they have before feeding that into the error budget, and that's mostly done.
D
How that'll work is: I've added one here... let's see, where's a stage group... package is a stage group, and I've added the key ignored_components. Components refers to the component label that we have on our metrics, and when you set that, that's going to result in these kinds of rules.
D
D
Yeah, so here we see the package group that has the rails request component ignored, as we see here, and we do that with a separate recording for each group, and we do that by selecting here and then negating here. I don't know if there's any thoughts about that, because the annoying thing is that now we have this separate recording for each group.
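(For illustration only, a small Python sketch of the idea just described: an ignored-components list per stage group turns into a recording-rule expression that negates the component label. The metric name and rule shape here are placeholders, not the real metrics catalog output.)

```python
# Hypothetical sketch of the "ignored components -> negated selector" idea.
# The metric name and rule layout are placeholders.
def stage_group_error_rate_expr(stage_group: str, ignored_components: list[str]) -> str:
    selector = f'stage_group="{stage_group}"'
    if ignored_components:
        # Negate the component label so ignored components don't feed the
        # group's error budget (service-level monitoring keeps using them).
        selector += f', component!~"{"|".join(ignored_components)}"'
    return f"sum(rate(sli_errors_total{{{selector}}}[5m]))"

# One recording rule per group is what makes this a bit unwieldy:
print(stage_group_error_rate_expr("package", ["rails_requests"]))
# sum(rate(sli_errors_total{stage_group="package", component!~"rails_requests"}[5m]))
```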
C
C
D
We want people... do we want to... So these are already included for service monitoring and so on. Those...
D
just stay the same; they are not included in error budgets, because there we wanted to have the, you know, the meta from... yeah.
C
C
C
D
Because some teams... well, right now we're waiting for teams to start setting thresholds. I merged the first merge request related to that yesterday, I think, and we want to give them time before feeding it into the budget, because otherwise, if we just turn it on, then everything's going to be red and people... right.
C
D
The reason I did it with these two, with this separation, so that feature category metrics stay the same and stage group metrics don't, is that I want to at some point build like a group overview dashboard, like we have the service overview dashboard that shows the metrics on the left, and then you can basically click components on and off using a template.
D
I'm hoping that's possible; in my mind it is. So then you could basically see what things would look like if you enabled or disabled certain components from your error budget, and then you can remove the exclusion in the teams yaml when you think that it's close enough. That's the idea.
C
No, it's not directly related to this, but something that I've been trying to do in the engineering allocation meeting, and I want to bring this up because you might have a better way of doing it, is that I've been trying to kind of connect the error budget to the user experience, every week.
C
What I do is I say, you know, these are the five violations, and it's pretty hard to violate now with the apdex as it is, but we still get a few. And what I'm trying to do is kind of build up trust that these numbers are real, and so I'll say: here are the violations, and this is the user experience that people have. You know, this is...
C
C
C
So we've got, you know, this report, which has got all the error budgets goodness in it, and then you'll get something like this Threat Insights one over here, right, which is a violation, and then kind of what I've been trying to do is actually work back from that to: hey, this is what the users are experiencing.
C
You know, and so in that case I could see immediately that the problem was pages, and then it helped me kind of go into pages. But obviously I've got a lot of tribal knowledge that lets me put together this query and get to here and then kind of go from there. The teams don't have that. So I was wondering: there's nothing on the stage group dashboards that has this breakdown, is there?
D
C
D
D
A
D
A
D
D
C
D
C
B
As people start using the error budgets, but also as we start using the error budgets and the data that's been collected, I think we're seeing that there are certain additional bits of tooling that would be helpful to have, and there's a couple of issues that need to be raised about that additional tooling and how it could be helpful.
C
Yeah, that attribution approach, right, where it's like a ratio on the number of requests, is also really helpful, because, the way that I was doing it there, you might get a feature category that's like 50 percent or something really poor, but it's only got like 20 requests a month. Whereas the attribution approach takes the total error budget and what percentage each subsection contributes.
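(A toy illustration of the difference being described, with invented numbers: ranking feature categories by their own failure ratio lets a near-zero-traffic category look worst, while the attribution approach weights each category by its share of the total error-budget spend.)

```python
# Toy illustration of the attribution idea; all numbers are invented.
categories = {
    # name: (requests_in_period, failing_requests_in_period)
    "threat_insights": (2_000_000, 8_000),
    "pages":           (  500_000, 1_500),
    "tiny_feature":    (       20,    10),   # 50% failure rate, negligible traffic
}

total_requests = sum(req for req, _ in categories.values())
total_failures = sum(fail for _, fail in categories.values())

for name, (req, fail) in sorted(categories.items(),
                                key=lambda kv: kv[1][1], reverse=True):
    own_ratio = fail / req                # looks scary for tiny_feature
    attribution = fail / total_failures   # share of the overall budget spend
    traffic_share = req / total_requests
    print(f"{name:15s} ratio={own_ratio:6.1%} "
          f"budget_share={attribution:6.1%} traffic={traffic_share:6.1%}")
```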
C
Yeah, it's like a really ugly query, yeah.
C
C
Every week I find like one or two new infradev issues, because I'm basically starting with the error budget and then going backwards from that to what's the cause of this, and there's always new things. And one of the PMs said to me: you just seem to be finding all these bugs that we didn't know about. And I thought that was such a nice comment, because we seem to be front-running our bugs, and it's purely down to those error budgets, right.
D
A
D
F
C
D
C
F
C
Run this query and it'll tell you. But it'll be much better when I...
C
D
When I review things that are related to stuff like this, I tend to drop the Thanos link in, like: when this is deployed, click here to see the effect.
C
G
Okay, so it's my turn. I'll share my screen. So Rob and I have been working on an interesting issue for the last two weeks, and this is about evaluating the different approaches to scale up our Redis instances. We are trying to capture the production traffic and then replay it as a load test, to test different approaches, different systems, to serve our production scale. And it is a little bit different from what we have been doing before, because we are trying to evaluate serving the data using Redis Cluster.
G
The key distribution is really important. So that's why we really want to capture the production level of load, and then we try to capture the key names, the data size, and the access pattern of each key, so that we can generate a realistic profile. And as we said, it's not only about the traffic or the request rate; it's all about the key distribution and the commands we use to access each key.
G
So one of the things we are trying to do is to sniff into the Redis instance and try to capture the Redis traffic, and there is already a really excellent guideline on how to do this. So basically, after we sniff into the Redis instance, we generate a pcap file with tcpdump, and then we use a simple script in the runbook to analyze the traffic, and then we can get a full list of keys by frequency and commands.
G
We are trying to run that against Redis, and Rob and I are working on this. On top of that, we got some progress: we refactored the script to capture not only the request but the response as well. The original script doesn't capture the response, so we have to match the request and response to analyze the data from the pcap file.
G
We split the file into a folder of smaller files, and each file contains some kind of raw data from Redis. This data is in the Redis protocol, and for each request we have two files, one file for the request and another file for the response, and for each we have an index file. And we have to parse the request file to get the...
G
...the response file. So we try to parse the request file here, with the RESP protocol, and then we try to look back into the response file, using the timestamp here to match the response. And after that we can generate something like this one. So basically, we try to reassemble all the pieces from this raw data back into the original data, and then we try to generate the key pattern files from that. So after this we get a file that looks like this.
G
So this file contains an array, and each item contains a hash of the key pattern. For each key pattern we have a value size, which is the size of the data we set to Redis, and a response size, which is the size of the response we receive from Redis, and some other data. An important one is the unique key frequency, which is the distribution of the keys within the pattern, and the total uses of the key.
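(A small Python sketch of how such a key-pattern file could be consumed; the field names follow the description above, value size, response size, unique key frequency and total uses, but are assumptions, as is the file name.)

```python
# Sketch of consuming the key-pattern file described above. Field names are
# assumptions based on the description, not the exact schema.
import json
import random

with open("key_patterns.json") as f:
    patterns = json.load(f)   # assumed: a list of {pattern: {...stats...}} entries

def weighted_patterns(patterns):
    """Yield (pattern, stats) pairs weighted by how often the pattern was used."""
    flat = []
    for entry in patterns:
        for pattern, stats in entry.items():
            flat.append((pattern, stats))
    weights = [stats["total_uses"] for _, stats in flat]
    while True:
        yield random.choices(flat, weights=weights, k=1)[0]

sampler = weighted_patterns(patterns)
pattern, stats = next(sampler)
print(pattern, stats["value_size"], stats["response_size"], stats["unique_key_frequency"])
```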
G
So from that data we try to analyze and get a big picture of what we are trying to do. I did some analytics on this: when we capture about 30 seconds of data of production traffic, we get about 600,000 requests, equivalent to about 20,000 requests per second, and then we get an overall picture of the Redis load profile and a profile for the Redis data.
G
We are trying to feed the file into a load-testing system, and we're using k6 (k6.io) right now to generate the load test. So basically it allows us to write the load test based on a set-up scenario. It has some... okay, with k6 the scenario looks really simple, like this: it allows us to declare the scenarios in JavaScript, and then we run the system against our Redis instances.
G
However, it doesn't support Redis out of the box, so we have to write a thin adapter on top of it in Golang, to plug into this JavaScript scenario file, and then we try to run that, and the result is interesting. So let's see the test... the script is really simple (well, it's not that simple), but we pass the key pattern file in there.
C
G
We try to translate the key patterns into different scenarios. Each scenario is, in essence, a combination of the command and the key pattern, and then we repopulate the data before we run the load test, and after that we apply the pattern to issue the raw command against Redis here. So we implemented a thin adapter in Golang; right here is the Redis adapter that takes our command and issues a real Redis command, and after that we just need to run the test file and it will perform everything automatically.
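(The actual setup uses k6 scenarios in JavaScript plus a thin Go adapter; as a rough stand-in in Python, this is the shape of the flow: repopulate data, then replay weighted command/key-pattern scenarios against Redis. All names and weights are illustrative.)

```python
# Very rough Python stand-in for the k6 + Go adapter flow described above.
import random
import redis

r = redis.Redis()

scenarios = [
    # (command, key template, weight) - in the real setup each scenario also
    # carries its own iteration rate.
    ("GET",  "cache:user:{id}", 70),
    ("SET",  "cache:user:{id}", 20),
    ("ZADD", "schedule",        10),
]

def populate(n: int = 1000) -> None:
    # Repopulate the dataset before the load test so GETs hit real values.
    for i in range(n):
        r.set(f"cache:user:{i}", "x" * 128)

def run_one_iteration() -> None:
    cmd, template, _ = random.choices(scenarios, weights=[w for *_, w in scenarios])[0]
    key = template.format(id=random.randrange(1000))
    if cmd == "GET":
        r.get(key)
    elif cmd == "SET":
        r.set(key, "x" * 128)
    elif cmd == "ZADD":
        r.zadd(key, {f"member:{random.random()}": random.random()})

populate()
for _ in range(10_000):
    run_one_iteration()
```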
G
So before we do anything, we populate the data and generate random keys based on the key frequency, and after that k6 runs with about a hundred clients, and then it will issue the commands at different iteration speeds. For example, we generate about 300 scenarios, and each scenario is a combination of a command and a key pattern here, and each scenario has a different speed.
G
G
G
Okay, so after running the load test, we can see that I'm issuing about 45,000 requests per second against my local Redis instance, and then I have some metrics for each command issued, and the distribution matches the one we captured in production. And for each command I have a set of metrics, a histogram metric here, to see whether our histograms are satisfactory compared to what we want when we scale the Redis instances up.
G
So we are moving into the last step of this issue: after that, we bring up different Redis instances and Redis Cluster, and run the load against them with different settings. So yeah, that's it.
E
That's really cool. One question I had: what kind of environment are we running these tests in? Do we provision dedicated machines, and have a separate machine for the client versus the server?
G
Yes, we provision some client nodes and run the load test against that. One good benefit of this approach is that, because all the load test is in code, we just need to define the scenarios and clone them again on different nodes and different instances; we don't need to do much manual work.
C
Just one environment question: I've used k6 before and I think it's a super cool tool. Is the JavaScript execution environment Node, or is it just like vanilla JavaScript?
G
C
E
So I've been working on a little script that basically runs through the install of the Helm chart and then does an incremental upgrade, and I've tried to match the configuration to what we have in production as well. So this is basically the script: it runs helm install and then helm upgrade and...
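(A minimal sketch of that install-then-upgrade flow driven from Python; the release name, chart, values file and tag are placeholders, not the actual script.)

```python
# Minimal sketch of the install-then-upgrade flow described above.
# Release/chart names and values files are placeholders.
import subprocess

RELEASE = "redis-ha"
CHART = "bitnami/redis"                 # assumption: whichever chart is actually used
VALUES = "values-production-like.yaml"  # configuration matched to production

def helm(*args: str) -> None:
    subprocess.run(["helm", *args], check=True)

# Initial install with production-like configuration.
helm("install", RELEASE, CHART, "-f", VALUES)

# Later: incremental upgrade, e.g. bumping only the Redis image tag.
helm("upgrade", RELEASE, CHART, "-f", VALUES, "--set", "image.tag=6.2.6")
```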
E
E
I've got stern for tailing logs in Kubernetes here, and we can see the pod is starting to come up, but it's waiting for its own IP to appear in the Kubernetes service. So there's some logic there for it to kind of wait for that to propagate. So now it came up.
E
E
E
E
So let's see where we're at. Okay, so we're now up and running, and the scenario that I want to simulate is upgrading the Redis version. So we're gonna simulate an upgrade, and it's gonna set the image.tag to the plus-one Redis version. So we'll do a helm diff first, and indeed we can see it's proposing to update the image, and the other important change that it's doing here is setting the partition on the update strategy on the StatefulSet, and this is the Kubernetes mechanism for isolating a change that you're making to a StatefulSet.
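(The partition field on a StatefulSet's rolling update strategy is the standard Kubernetes mechanism being referred to; a sketch of using it from Python via kubectl, with a placeholder StatefulSet name.)

```python
# Sketch of the StatefulSet partition mechanism mentioned above, driven via
# kubectl from Python. The StatefulSet name is a placeholder.
import json
import subprocess

STS = "redis-node"   # assumption: name of the Redis StatefulSet

def set_partition(partition: int) -> None:
    # With 3 replicas, partition=2 means only the highest-ordinal pod gets the
    # new pod template; pods 0 and 1 keep running the old one.
    patch = {"spec": {"updateStrategy": {"rollingUpdate": {"partition": partition}}}}
    subprocess.run(
        ["kubectl", "patch", "statefulset", STS, "--type", "merge",
         "-p", json.dumps(patch)],
        check=True,
    )

set_partition(2)   # isolate the change to one pod first
# ...verify the upgraded pod, fail over if it was the primary...
set_partition(0)   # then let the rolling update reach all pods
```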
E
A
How would we... can we manually fail over to a different one, make a different one the primary, with this?
E
Yes. There's some caveats with that that I'm still trying to figure out, so I can actually maybe try and demonstrate that.
E
E
E
Yeah, and... yeah, looks like that worked fine in this case. I've had some tests where this puts it in a weird state and it takes a while to recover. Okay, so I can try and do another failover and see the opposite.
E
E
Yeah, we've also got "next failover delay", so it's kind of deciding not to do a failover yet, because it doesn't want to do too many failovers. So node 0 is still trying to connect to itself, but I think we promoted a different one.
E
E
Related to that... but you can see here now, after, you know, like 40... I don't know, like half a minute or so, Sentinel noticed that something is off and it does this fix-slave-config thing and sort of fixes itself, and now we're in a good state again. So at least it recovers eventually, but it's still not really nice to have this weird behavior. So I want to dig into this and...
A
A
The other question I wanted to ask was... first of all, this is really cool, thank you. The other question I wanted to ask was, what was it... oh, resizing. So presumably this makes things like the resizing we had to do a while ago simpler as well, where we needed to make the persistent Redis have more, well, disk. But I guess that's handled by the other things in Kubernetes, and then also memory.
E
E
So the partition... I can maybe show that as well. We're using a StatefulSet, and we're using the partition mechanism on the StatefulSet, and the example that I showed was using that to update the image on only one of the three pods. But we can apply that to any change, so that also applies to updating requests and limits.
F
Yeah, on that point: this specifically allows us to have heterogeneous Sentinel deployments, because that's what you need to do these sorts of upgrades, right? You need, yeah, the...
F
...it needs to understand that it's not just... because that's one of the things I find difficult about understanding our Gitaly server Terraform config: it just says the Gitaly servers look like this, period, and it's not very clear to me how you make changes to that, where you account for the fact that not everything can change at once, or how the change works.
C
We'll see, yeah. But what I was gonna say is: is it possible to run Quang-Minh's load testing during your upgrade, as another pod, as another thing running in the cluster?
E
F
E
Yeah, reads should remain available due to the way that the Ruby client handles failovers. So on failover we'll get some stale reads from the primary before it's stepped down; once it steps down, it closes all connections to all clients, and so the clients will reconnect.
E
In our current configuration, there is a new primary already present before the step-down. So the Sentinels agree on who the new primary is, and then the step-down message goes to the old primary, and so there's a window of stale reads in between, and potentially lost writes as well, because we're writing stuff to the old primary that then gets thrown away. But the new primary is known at that time.
E
So writes remain available.
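(A sketch, using redis-py's Sentinel support rather than the Ruby client discussed, of the client-side behavior just described: when the primary steps down it drops connections, the client re-resolves the primary through Sentinel and retries, with a short window of stale reads and possibly lost writes in between. Host and master names are placeholders.)

```python
# Sketch of failover handling with redis-py Sentinel (names are placeholders).
import time
import redis
from redis.sentinel import Sentinel

sentinel = Sentinel([("sentinel-0", 26379), ("sentinel-1", 26379)], socket_timeout=0.5)
master = sentinel.master_for("mymaster", socket_timeout=0.5)

for _ in range(60):
    try:
        # During a failover the old primary closes connections when it steps
        # down; the client reconnects and Sentinel hands out the new primary,
        # so writes resume after a short window (with possible lost writes
        # and stale reads in between).
        master.set("heartbeat", time.time())
    except (redis.ConnectionError, redis.TimeoutError):
        time.sleep(0.1)   # retry; master_for() re-resolves the primary
    time.sleep(1)
```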
E
Either that, or we could, you know, shut down... like if node two happens to be the primary, we can shut down that node and Sentinel will elect a new one, but...
C
E
F
F
And I think sometimes in discussions about HA, people have expectations that everything has to be available 100 percent of the time, and then you can work yourself into a corner. Whereas if you accept that a controlled amount of lost writes or lost reads or whatever is part of doing business, then you can, yeah, design better systems.
F
C
F
Just the speed improvement was pretty surprising, because it was end-to-end six times faster and it used nine times less resources on the server. And then, after a while, we decided to just take the proof of concept and submit it as proper code, and that met resistance from the Gitaly team, because they felt it was making too big of a change to their architecture, and that resulted in a bit of a stalemate where we didn't know how to move that merge request forwards.
F
But yesterday the Gitaly product manager, Mark Wood, got involved, and I impulsively thought: let's just have a call and see where this goes, and that turned out to be a very helpful conversation. So what I'm going to do now... I've been approaching this as fixing one little thing, but it's really a pattern: there's a lot of RPCs that follow this pattern, and it is inefficient, and I guess that's also where the friction comes from, because I'm breaking that pattern.
F
So what I'm going to do now is write a sort of short document where I explain what I think the pattern should be, and see if we can get buy-in on that, or whether that makes it easier to sell the idea. And Mark offered to help me write that, or give feedback on how I write that and what needs to be in there, so it's not like I have to write a document without knowing what the requirements are. That's... yeah.
F
That's not the approach I chose, obviously, because I thought, just from a technical perspective: let's just go in and do the least amount of work and fix the thing. But I'm happy to paint this picture and see if people like it.
F
F
F
But it seems to be a lot of the time, and that new RPC got enabled like one or two days ago, and I think it's been a bit quieter since. So, if we're lucky, that one particular problem got solved in a different way, and I still think we should do something about FindAllTags, and there's other RPCs that have the same problem.
F
F
F
And just as a quick sketch of the bigger picture that I'm trying to sell: in the current model we have RPCs that have a certain design which, we now know, is inefficient from a computational perspective. It puts a maintenance burden on the Gitaly team, and when teams like Create:Source Code want to build new Git features, they need to build new RPCs all the time, every time something is not part of the big abstract interface.
F
The interface needs to be expanded, and I think a better approach would be to say that most Gitaly RPCs are very thin wrappers around Git commands, and Gitaly streams the output as-is to the client. Because then, if the client wants to parse another piece of the data, they don't have to ask Gitaly to send another piece of the data, because they already get all the data there is. And another thing that is not strictly about performance, but is about lower maintenance...
F
F
F
So: let's update the protocol and update the RPC and ship a new Gitaly version so the client can sort on another field... when instead, if they can just pass the sort argument that Git already supports, then that saves... yeah, that removes an unnecessary friction for the clients, meaning Create:Source Code, to develop new features.
E
It does sound a bit risky, in the sense that we have less control, or... well, it depends how you say "we", but with a broad interface there is less control on the allowed set of combinations.
D
F
I don't think it's necessarily about having less control. It's more about whether you want to define everything that is possible in the protobuf definitions. The protobuf definitions sort of act as a type system, and you can say: I want to lock everything down, and this is everything that's possible. But then what you end up with is a protobuf definition that looks like the Git manual page of a command, where all the possible values of all the possible flags are protobuf fields and constants and...
F
Yeah, I think it makes more sense to say you're allowed to pass flags, and then there's an allow list in Gitaly where we say... it becomes a runtime error, right? If you encode the information in the type system, then you cannot make certain calls, because it's more like a static thing, right. But yeah, you're doing this across repositories, and you need to have merge requests in multiple repositories, and multiple repositories need to wait for a release to be integrated back in, and...
F
F
E
I guess the analogy that I have in mind is kind of like GraphQL, where you really have a lot of flexibility in the types of queries that you send, and maybe that's a bit of an extreme example, but it generally makes the RPC performance much less predictable.
E
D
D
That's what we're doing with GraphQL now as well: measuring performance differently, so it is kind of predictable. Like, we don't measure request duration anymore, because anything can be in there. But if you make a distinction between what's happening, then you can measure it again, because...
F
Yeah, I think GraphQL is a more extreme example, because you can make very wild combinations of: I want to get all the X and all the Y, and I want to have this a hundred deep and a thousand of that, and you can make an arbitrarily complex query. Whereas if I say I want to run git for-each-ref, I can maybe add some sort flags, or I could say I only want the refs that contain this commit.
C
Yeah, another fairly successful API... a service with an API like that is SQL. You know, you can put anything in, it's pretty open-ended, and, you know, in a way it's the same sort of thing that you're talking about, right?
F
Yes and no. I think SQL is, in a way, more like GraphQL, because you can do anything. It's... yeah.
F
And, as we know, that is great, but it's also a problem.
C
F
Yeah, because you can have SQL queries that are absolutely horrible. But this would be more like you have an API that allows you to do a select on a certain table, so it's never going to be worse than whatever you can do with selects on the given table. And it's not a perfect analogy.
F
F
But in practice we have... I haven't counted yet, but we probably have four or five, if not more, different commands that are all variations on git for-each-ref: different RPCs that call git for-each-ref with slightly different flags. And if you just have one RPC that gives you the output of git for-each-ref, then you can deprecate a bunch of old ones and you don't need to keep adding new ones, because the RPC that got added to address this problem with FindAllTags was yet another variation on git for-each-ref.
F
So if we had had a generic git for-each-ref RPC, we wouldn't have needed a new RPC. At worst, we would have had to tweak the allow list of the generic git for-each-ref RPC to say the client is allowed to use this flag. And I think, if that's all that the RPC does, then the code is shorter, the tests are shorter, there's less to test; all you really need to prove is that the flags can be applied to the command.
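(A sketch of that "thin wrapper plus allow list" idea, in Python with invented names rather than Gitaly's actual Go code: the RPC only checks flags against an allow list, runs git for-each-ref, and streams the output untouched.)

```python
# Illustrative sketch (invented names; Gitaly itself is Go) of the
# "thin wrapper + allow list" idea for a generic for-each-ref RPC.
import subprocess

ALLOWED_FLAGS = {"--sort", "--format", "--count", "--contains", "--merged"}

def for_each_ref(repo_path: str, flags: list[str]):
    for flag in flags:
        name = flag.split("=", 1)[0]
        if name not in ALLOWED_FLAGS:
            # The only per-RPC decision is whether a flag is allowed;
            # what the flag does stays git's job.
            raise ValueError(f"flag not allowed: {name}")
    proc = subprocess.Popen(
        ["git", "-C", repo_path, "for-each-ref", *flags],
        stdout=subprocess.PIPE,
    )
    assert proc.stdout is not None
    for line in proc.stdout:          # stream output without parsing it
        yield line
    proc.wait()

# e.g. the client decides how to sort, no new RPC needed:
# for line in for_each_ref("/tmp/repo.git", ["--sort=-creatordate", "--count=10"]):
#     print(line.decode().rstrip())
```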
F
You don't have to prove what the flags do, because that is Git's job, and, yeah, people on the client side can iterate and build features with fewer RPCs in between. I mean, there's also these ridiculous things, like we have these RPCs to find all new LFS pointers. So what we do is we run git rev-list and we look up all the blobs that it enumerates of size less than 200 bytes.
F
So you could just have a thing that says: git rev-list, return blobs up to a size. But instead we have special RPCs that tweak the arguments to git rev-list in particular ways, and Gitaly tries to filter the blobs to see if they look like LFS pointers. But then, on the client side, we parse the blobs again to make sure they're really LFS pointers. Whereas you could just have an RPC that sends all the blobs that are less than 200 bytes.
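(A sketch of that last idea using plain git plumbing, not Gitaly's actual implementation: enumerate blobs, keep the ones under 200 bytes, and let the client itself decide whether they are LFS pointers.)

```python
# Sketch of the "just send the small blobs" idea for LFS pointer discovery,
# using plain git plumbing commands.
import subprocess

def small_blobs(repo: str, max_size: int = 200):
    """Yield (oid, size) for every blob under max_size bytes reachable from any ref."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    oids = [line.split()[0] for line in objects.splitlines() if line]
    info = subprocess.run(
        ["git", "-C", repo, "cat-file", "--batch-check"],
        input="\n".join(oids), capture_output=True, text=True, check=True,
    ).stdout
    for line in info.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] == "blob" and int(parts[2]) < max_size:
            yield parts[0], int(parts[2])

def looks_like_lfs_pointer(repo: str, oid: str) -> bool:
    # The client-side check: a real LFS pointer starts with this version line.
    content = subprocess.run(
        ["git", "-C", repo, "cat-file", "blob", oid],
        capture_output=True, text=True, check=True,
    ).stdout
    return content.startswith("version https://git-lfs.github.com/spec/v1")
```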
F
So there's a bigger picture here, and it's not just about performance and resilience but also about developer convenience. But I think there's also more advantages for us, because if you think about something like git-upload-pack, the reason that doesn't completely explode on us all the time is that the clients exert back pressure on the server processes.
F
But if you do clumsy parsing on the server, if you do extra work, then you can make a cheap Gitaly call and the Gitaly server goes bonkers trying to parse all the tags and fetch them from Git in an inefficient way.
C
F
Well, I still want to do that too, but I think... I want to give them a break, and I don't think it would go down well.
F
I also don't think it's the most important problem to solve right now. I think the badness of gRPC was amplified by the volume of traffic of git fetch, and I think that really is better now, so the impact isn't as high. But if you want, I can tell you how I would rip out gRPC without anybody noticing.