From YouTube: Scalability Team Demo 2020-11-04
B
Thanks. I just wanted to give a quick update that it's in review, and to show everybody what it looks like right now, because it's quasi-demo-ish and I think it's interesting. So, it's in review, and we reduced the scope a little bit, because I had a conversation with Andrew where I realized we were making the configuration part more complicated than it had to be.
B
So
what
it
looks
like
now
is
the
configuration
is
an
environment
variable,
it's
just
one
environment
variable
and
the
advantage
of
that
is
that
it
makes
the
merge
request
smaller,
and
this
is
already
supported
in
both
omnibus
and
cloud
native
gitlab.
So
we
don't
have
to
do
any
follow-up
there
to
make
it
possible
to
set
the
environment
variable.
This
should
just
be.
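For illustration, a single-variable bypass could be as small as something like this. A hedged sketch: the variable name GITLAB_THROTTLE_BYPASS_HEADER and the helper are assumptions for illustration, not taken from the actual merge request.

```ruby
# Hypothetical sketch: a single environment variable names the header that
# marks a request as bypassing the rate limiter.
module Gitlab
  module Throttle
    BYPASS_HEADER_ENV = 'GITLAB_THROTTLE_BYPASS_HEADER' # assumed variable name

    # True when the configured bypass header is present and set to "1".
    def self.bypass?(request)
      header = ENV[BYPASS_HEADER_ENV]
      return false if header.nil? || header.empty?

      # Rack exposes an incoming "X-Foo-Bar" header as HTTP_X_FOO_BAR.
      request.get_header('HTTP_' + header.upcase.tr('-', '_')) == '1'
    end
  end
end
```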
B
This should be pure configuration: the configuration management that's already in place should be able to handle it. I actually have no idea how the SREs manage this, because we have... it's nice...
B
...that John is here, because we have deployments on both VMs and Kubernetes, and I don't know where the config is stored, whether it's the same repo, whether it's the same mechanism, and what that looks like. But I'm assuming we can set an environment variable in both of those without it being a big deal. And the environment variable is just for the bypass, which is not the interesting part here.
B
Well, it's essential, but there's not much excitement in deploying the bypass. The more interesting part is turning on the rate limiting and correctly setting up HAProxy to label the things that need to be bypassed. One of the things we now have in the merge request is that every request matching the bypass will be marked in the JSON log, so it should be very easy to see both that we are not labeling all requests with the bypass and that we are labeling the ones we intend to.
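The marking could look roughly like this. A minimal sketch: Rack::Attack's safelist block is the real API, but the header and env-key names here are assumptions, not the actual MR.

```ruby
# Hypothetical sketch: safelist bypassed requests and leave a marker that the
# JSON log formatter can include, so mislabeled traffic is easy to spot.
Rack::Attack.safelist('throttle_bypass') do |req|
  bypassed = req.get_header('HTTP_GITLAB_BYPASS_RATE_LIMITING') == '1' # assumed header

  # Stash the result in the Rack env; the request logger can then emit it
  # as a field on every request line in the JSON log.
  req.set_header('gitlab.rack_attack.bypassed', bypassed)
  bypassed
end
```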
B
So what I'm trying to say here is that I think the observability of this part should be good the way it is now. And... go ahead.
B
Finish off what you're saying... Yes, so I think the observability of that part is good, the mechanism for making the config change is simple, and it has a low cost for us to build and ship. It ended up a bit simpler than I thought it would be at the start, so that is good.
B
We're now having conversations on the issues about how to estimate the limits. That is still a bit nebulous to me, but Craig thinks he can do it just from seeing the state from Rack::Attack, and all we need is to dump that state.
B
So I want to see if I can write a little script to do that, and I want to double-check that it is reasonable to dump the state, because the only way I see to do it is a full scan of the cache instance. When I wrote that, I thought we probably don't want this; I was going to say we could do it, but we probably don't want to. But then Craig said, oh, I did it in two minutes, and that sounds too good to be true.
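A sketch of the kind of dump script being described, assuming Rack::Attack's default key prefix and the redis-rb client; the connection URL is a placeholder:

```ruby
# Hypothetical sketch: dump Rack::Attack throttle counters with a
# non-blocking SCAN instead of KEYS, fetching each batch's values right
# away since the keys expire at the end of their period.
require 'redis'

redis = Redis.new(url: ENV.fetch('REDIS_URL', 'redis://localhost:6379')) # placeholder

cursor = '0'
loop do
  cursor, keys = redis.scan(cursor, match: 'rack::attack:*', count: 1000)
  unless keys.empty?
    values = redis.mget(*keys) # grab the batch immediately, before keys expire
    keys.zip(values) { |key, count| puts "#{key}\t#{count}" }
  end
  break if cursor == '0'
end
```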
B
Also, just to support Craig. It's not that I don't trust him; I just want to double-check that. And then the work sort of shifts to Craig figuring out the rollout plan, and to supporting him on that. Awesome, yeah. That was actually what my question was: is that sort of scheduled to happen? It's part of the epic, yeah?
B
The idea of the epic is that we stick to checkboxes in the admin UI, three checkboxes actually: unauthenticated rate limiting, web authenticated, and API authenticated. We can only do that if there are sane numbers below the checkboxes, because otherwise this will reject everybody's traffic, and we can only do that if we can allow-list the people we need to allow-list. So that goal of the checkboxes sort of has all this work hanging under it.
C
I think there's a way in Elasticsearch to make a query where you don't ask for the rate at which a certain request was coming in from a certain user. You don't aggregate on the full bucket; you can almost break a bucket up into sub-buckets and then ask for the max of those. The reason that's useful is that, say you're looking over the last seven days, you don't want to ask for minute-by-minute usage.
C
Well, we wouldn't even use the context metadata; we could use the IPs, because we could use the allow lists that we've already got, that are known. So we could filter out those users, and then for what's left we could use this technique. I'll go and give it a try, but basically what you can probably do is ask who hit Rails the most in any one-minute period...
C
...over the last seven days. I'm not sure if the one-minute periods are aligned with clock minutes or if they're rolling one-minute windows, but either way you can get a pretty rough idea.
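The shape being described sounds like a per-user date histogram with a max_bucket pipeline on top. A rough sketch assuming the elasticsearch-ruby client; the index pattern and field names are placeholders:

```ruby
# Hypothetical sketch: per-user peak requests-per-minute over the last 7 days.
require 'elasticsearch'

client = Elasticsearch::Client.new(url: ENV['ES_URL']) # placeholder connection

response = client.search(index: 'rails-logs-*', body: { # assumed index pattern
  size: 0,
  query: { range: { '@timestamp' => { gte: 'now-7d' } } },
  aggs: {
    by_user: {
      terms: { field: 'username.keyword', size: 100 }, # assumed field name
      aggs: {
        # Break each user's traffic into one-minute sub-buckets...
        per_minute: { date_histogram: { field: '@timestamp', fixed_interval: '1m' } },
        # ...then take the max of those sub-bucket counts.
        peak_minute: { max_bucket: { buckets_path: 'per_minute._count' } }
      }
    }
  }
})

response.dig('aggregations', 'by_user', 'buckets').each do |bucket|
  puts "#{bucket['key']}: peak #{bucket.dig('peak_minute', 'value')} req/min"
end
```

Note the caveat Andrew raises next: the terms aggregation orders by total traffic, not by peak, so the final sort still happens client-side.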
B
To go into the Rack::Attack data... that's a good point, because there shouldn't be any Rack::Attack data right now, although...
C
I suspect that what's going to happen is that, however high we have those defaults, there's going to be a bunch of people that get rate limited, sadly, just because of the state of things. But that doesn't matter; I'm not saying we should work around those people.
B
A
hundred
thousand
per
minute-
I
I'm
sure,
we'd,
be
fine
with
a
hundred
thousand
per
minute
and
then
we
could
dump
the
state
out
of
rack
attack
and
the
way
rek
attack
works
is
that
it
creates
a
redis
key
for
each
each
user
that
it's
counting,
but
also
for
each
time
period.
So
it's
it's
like.
It
looks
at
the
clock
and
it
says
for
each
minute
I'm
going
to
make
a
separate
key.
B
No, no. The period is configurable, and the timeout is whatever is left of the period plus one second, so it is a very finely matched timeout. These keys should disappear almost immediately when they're not needed anymore.
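The key-per-period scheme works roughly like this; a sketch of the idea, not Rack::Attack's exact internals:

```ruby
# Hypothetical sketch of a Rack::Attack-style counter: one Redis key per
# (rule, discriminator, period window), expiring just after the window ends.
def increment_counter(redis, rule_name, discriminator, period)
  window = Time.now.to_i / period # e.g. a new integer every 60s for a 60s period
  key = "rack::attack:#{window}:#{rule_name}:#{discriminator}"

  count = redis.incr(key)
  # TTL = remainder of the current window plus one second of slack.
  redis.expire(key, period - (Time.now.to_i % period) + 1)
  count
end
```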
C
One of the things we should just take note of is that you're changing the workload in Redis quite substantially, and you start expiring a lot more keys. Because this is on the persistent Redis as well, isn't it? No, it's kind of... it's the cache one.
C
Sorry, it's the cache one. But if we have a much higher volume of expiries, there are certain things in Redis that slow down a little bit with that. I can't remember what they are offhand, but we've seen them in the past, and we should just keep an eye on those metrics.
C
Redis has background processes that run... I think "frequency" is the name of the config, something like that. We should just make sure that's one of the things we look at as we roll it out.
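One cheap way to keep an eye on that during rollout; a sketch using redis-rb's INFO command, where the counters shown are standard Redis INFO "stats" fields:

```ruby
# Hypothetical sketch: sample Redis expiry-related counters before and
# during the rollout to spot a big change in expiry workload.
require 'redis'

redis = Redis.new(url: ENV['REDIS_CACHE_URL']) # placeholder connection
stats = redis.info('stats')

puts "expired_keys: #{stats['expired_keys']}"  # cumulative keys expired
puts "evicted_keys: #{stats['evicted_keys']}"  # evictions under memory pressure

uptime = redis.info('server')['uptime_in_seconds'].to_i
puts "avg expiries/sec since boot: #{stats['expired_keys'].to_f / uptime}"
```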
B
That's a good point, yeah. I sort of understand your idea of the log analysis, but I'm not sure I could explain it well, so if you could write a comment about that, Andrew, that would be helpful.
B
So you really need to query them as fast as possible and hope that not too many have disappeared, and that works well with SCAN, because you get a batch and you can immediately query that batch. But I want to script that up and see what it looks like. So then we have two approaches for Craig to choose from: either the 100,000 requests per minute, or the log analysis, or both.
B
I think I've shared as much as I can think of here. Are there any other questions on this topic?
A
I'd just like to add: please can you put those comments onto the issue that I've added in the agenda? At least then all of this is collected together, rather than being put on the agenda itself.
B
I'll see if it looks like we can do that, because I'm of the opinion that we probably don't want it, so I think it's up to Craig, because I'm trying to...
A
Yeah, what we can do is open another ticket to talk about the log analysis to find that number.
D
Okay, well, just a couple of things to add. I think the dry run is going to be really important, and I'm hoping that we don't turn this on and inadvertently start limiting something like internal API requests, or, you know, an important customer. So if that's possible, that'll be helpful.
D
Another thing is that there are, or were, a lot of customers interested in the whitelisting feature. I know this isn't what we're going to be delivering or advertising to customers, but maybe there's the potential here that we could, say, inject nginx config to allow some self-managed customers to utilize this. Is that worth pursuing or not?
B
Well, yeah, of course, we could maybe also do it with nginx. We can try that. The end result would be more documentation on how to do it, and somebody sitting down to try out the procedure and make sure those nginx config snippets do the right things.
B
Not even that, because you can inject arbitrary...
D
...text into nginx. The facility is there to inject nginx config, so that could be the first thing.
C
On the back of Craig Miskell's point about doing this in the product: I was thinking about it a little bit afterwards, and you could quickly get yourself into a quagmire with that. If you don't do things in a smart way, if you've got, say, 50 rules and you're just iterating over each of those rules for every request coming in to check your whitelist in the application, you'll slow the application down pretty badly. Obviously, the way nginx and HAProxy do it...
C
...is that they build a kind of tree, where they route the request down to the right part of the tree based on the CIDR blocks. If we built it in the product, we'd need to do it that way, because otherwise we'd just slow everything down really badly. So it's kind of another reason to say maybe we shouldn't do that; maybe we should leave it to the experts.
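To illustrate the difference: a linear scan costs one check per rule per request, while a prefix tree costs at most one step per address bit. A toy IPv4 CIDR trie might look like this; illustrative only, not how nginx or HAProxy implement it:

```ruby
# Hypothetical sketch: match an IPv4 address against many CIDR blocks by
# walking a binary trie of prefix bits, O(32) per lookup instead of O(rules).
require 'ipaddr'

class CidrTrie
  Node = Struct.new(:zero, :one, :terminal)

  def initialize
    @root = Node.new
  end

  def add(cidr)
    ip = IPAddr.new(cidr)
    node = @root
    ip.prefix.times do |i|
      branch = ((ip.to_i >> (31 - i)) & 1).zero? ? :zero : :one
      node = (node[branch] ||= Node.new)
    end
    node.terminal = true
  end

  def include?(addr)
    bits = IPAddr.new(addr).to_i
    node = @root
    32.times do |i|
      return true if node.terminal # a covering block ends here
      node = ((bits >> (31 - i)) & 1).zero? ? node.zero : node.one
      return false if node.nil?
    end
    !!node.terminal
  end
end

trie = CidrTrie.new
trie.add('10.0.0.0/8')
trie.add('192.168.1.0/24')
p trie.include?('10.1.2.3')    # => true
p trie.include?('192.168.2.9') # => false
```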
A
Well, the way that I've been trying to approach the project in general is that we need to make this work for what the SREs need right now, to stop...
A
...having so many incidents about the bad automated traffic that's coming in, while trying to make the change to the product as minimal as possible. But I have gone through and found a whole bunch of issues that seem related either to whitelisting or to rate limiting; there are quite a number that have been raised over the past couple of months. When we finish the work, we can go back, comment on all of those issues, and say: this is the functionality that exists now.
A
This is what's there, and either that unlocks the ability for the stage groups to pick it up and continue, or they ask us to help them with putting it back into the product in a different way. But I was just trying to isolate this project down to what we needed to do, then tell the rest of the stage groups what we've done as a result, and leave the choices to them for how to take this forward a bit more.
B
Yeah, thanks, I agree. I wanted to ask something else about what was said: I wanted to talk about the dry run idea a little bit, because if you don't qualify it, it sounds like an obvious thing...
B
You
want,
but
it's
one
of
those
things
where,
if
you
think
about
how
it
should
work,
it
becomes
complicated
and
that's
why
I'm
now
pushing
against
not
having
it
because
it
looks
like
the
one
we
want
is
complicated
and
I'm
not
sure
if
we're
going
to
use
it.
B
So let me try to quickly explain why I think that. The first dry-run idea I came up with was this: we have these rule definitions in the initializer, so they run at startup, and they call methods on the Rack::Attack class, like on a singleton. So they're basically stuffing rules into Rack::Attack in a global variable, and that's there for the rest of the life of the process. And I thought, well...
B
We
can
make
the
code
that
stuffs
those
rules
in
dynamic
and
check
an
environment
variable
and
if
it's
the
environment
variable
is
set.
We
don't
check
push,
put
a
block
rule
in
or
a
throttle
rule,
but
we
just
put
something
in
that
tracks
it
and
only
logs.
If
something
matches
the
thing.
So
then
you
log
your
your
violations,
but
you
take
no
action,
and
this
is
something
that
requitec
can
natively
do
like.
There
is
a
type
of
rule
called
track
and
it
will
follow
the
same
logic
as
a
throttle.
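A minimal sketch of that startup switch. Rack::Attack.throttle and Rack::Attack.track are the real API; the environment variable name, the rule shown, and the notification payload details are assumptions based on rack-attack 6's instrumentation:

```ruby
# Hypothetical sketch: register every rule as a real throttle or as a
# log-only track, depending on one environment variable read at boot.
dry_run = ENV['GITLAB_THROTTLE_DRY_RUN'] == '1' # assumed variable name

limiter = ->(req) { req.ip unless req.path.start_with?('/-/health') } # example rule

if dry_run
  # Same matching logic as the throttle, but never blocks the request.
  Rack::Attack.track('requests_by_ip', limit: 300, period: 60, &limiter)
else
  Rack::Attack.throttle('requests_by_ip', limit: 300, period: 60, &limiter)
end

# rack-attack publishes matches as ActiveSupport notifications; log the
# would-be violations from there.
ActiveSupport::Notifications.subscribe('track.rack_attack') do |_name, _start, _finish, _id, payload|
  req = payload[:request]
  Rails.logger.info(message: 'rate limit dry-run match', ip: req.ip, path: req.path)
end
```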
B
The problem with designing it like this, with one environment variable that's read at startup, is that either you put everything into dry run or everything is active and kicking in. That would still be useful the first time we roll it out, when we're trying to get all the limits right for what we have now: we could say everything is just a tracking rule, make sure that looks good, and then restart all the fleets with the environment variable removed, so those become real rules.
B
But then what happens the next time you want to turn something on or off? Do you put everything into tracking mode, or do you put things one by one into tracking mode? It gets a little more complicated to have a good interface for turning individual rules into tracking mode or not, and that's why I thought maybe we shouldn't be doing this.
D
You know, log the violations with the whitelisting taken into account, because this will allow us to at least not be surprised by something we didn't think about. For example, maybe there's an internal IP address that we thought we were setting the header for, but we're not; that should be very clear if we're logging every violation, every rate limit that's kicking in. And I really only see this being used the first time we turn it on. So I think that would be helpful.
B
That was my other question. We have to adjust these limits anyway, and if we do the thing where we do a log analysis, or we set it to an insane number per minute and push it down, then we might be able to get the same effect. Say we allow a million requests per minute: if we can effectively dump the state out of Rack::Attack and see which counters are going up, then we get sort of the same kind of capability, and the difference is that you're not building something...
B
That's
you
can
only
use
by
turning
the
whole
system
into
sure
in
driving
mode
yeah.
I
I
I
it's
a
bit
tricky.
I
I
also
I
mean
I
wanna.
I
want
this
to
be
safe.
I
don't
want
this
to
be
something
where
we
knock
ourselves
out,
because
it
has
a
great
potential
for
doing
that,
but
so
okay,
so
I
guess
what
I'm
saying
is
that
we're
still
not
sure
yet
if
the
dry
run
thing
is
what
what
shape
it
should
have
like.
B
If we can do good enough analysis with the million-requests-per-minute approach, good enough introspection, then maybe we don't need it. But we haven't quite established that yet.
B
Yeah, in a way, of course, turning these rules from throttles into tracking is the best way to know whether you're going to violate a limit or not, because Rack::Attack is doing all its own application logic and making its own decisions; it's just that at the end of the decision it still lets the request through. That's better than trying to reconstruct from the outside what you think Rack::Attack is going to do if the numbers are different, by inspecting its internal counters.
B
So another thing I thought about, and I don't know if this is a good idea: we could say that each rule has a name, and we could have an environment variable that is a list of names. Then we need some sort of scheme to serialize the list into a string, but that's something we can figure out. And then, when the app boots, it checks, for each throttle...
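Sketched out, that per-rule variant might look like this; the variable name is hypothetical, and comma separation is just the simplest serialization:

```ruby
# Hypothetical sketch: a comma-separated env variable lists the rules that
# should boot in log-only tracking mode instead of throttling.
DRY_RUN_RULES = ENV.fetch('GITLAB_THROTTLE_DRY_RUN_RULES', '').split(',').map(&:strip)

def register_limit(name, limit:, period:, &discriminator)
  if DRY_RUN_RULES.include?(name)
    Rack::Attack.track(name, limit: limit, period: period, &discriminator)
  else
    Rack::Attack.throttle(name, limit: limit, period: period, &discriminator)
  end
end

# e.g. GITLAB_THROTTLE_DRY_RUN_RULES="throttle_unauthenticated" puts only
# that rule into tracking mode while the others keep enforcing.
register_limit('throttle_unauthenticated', limit: 3600, period: 3600) { |req| req.ip }
```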
B
Not
simple
in
terms
of
what
the
last
thing
I
was
describing,
I
think,
would
be
relatively
simple
to
make
in
the
code,
but
it
would
not
be
simple
to
use.
A
It just doesn't feel like it's a simple thing to use; whether or not it's easy to create is a separate thing. But I think it's about how we make a decision as to whether we're going to build this dry-run mechanism or not, and I think I'd already asked on the issue how much work it is we're talking about.
C
I did see one user that's more shocking, but I won't mention their name; I took that out and just included web. So this morning the best result I got was that in a one-minute period there was one user who made a thousand requests. Then I went and looked up that user in that period, and sure enough, the results match what we're seeing. So I think we can use that analysis.
C
The only problem with it is that you can't sort by the worst offenders, so you have to sort by something like the average amount of traffic over the period. So basically you need lots of results and then kind of sort them on the client side. I'll send you... I'll give...
B
...so the upshot is that, based on the log analysis, you think we can say with confidence that certain numbers are not going to cause a major problem for us.
C
Well, yeah: we can sort of construct that, if we know which IPs we're going to whitelist, and we can filter those out because we've got them in the logs. How many IPs are in those lists? Do you know roughly?
D
Yeah, I don't know right off...
B
I think there's half a dozen, or up to a dozen, different categories of things, and I don't know how many things are in each category, but...
D
I guess my biggest concern is just the initial configuration complexity, and rolling this out across virtual machines and Kubernetes for the different ingresses, like web, API, and git-over-HTTPS, without having a dry-run mode. A dry run would make me feel a lot better, because then we'll have everything turned on the way that we want to turn it on, with the whitelist in place, and then we can look at logs to see, like, okay...
D
...are we actually rate limiting anything that we didn't expect to rate limit? And it will validate that the HAProxy configuration is correct and the environment variable is set correctly everywhere, I think.
B
I'm trying to decide here. We have this issue about having a dry-run mode in the epic, and from my point of view, and I think also Rachel's point of view, we want to decide what the scope of that issue is and whether we're going to do it in this app or not. And I think the problem we want to solve is that the SREs rolling this out need to be able to do it with confidence, sufficient confidence to match the severity of the change.
A
Yeah, and I just wanted to add: it's about making the rollout as safe and as easy and straightforward as possible, and I think taking the input from the SREs is really important, because they're the ones who are going to have to do it.
E
Thinking about this from a sort of product perspective: we've talked about the dry run as the thing we'll be doing once, and this is much more complicated to build, so I don't think we should do it as part of this epic, but it feels like what you really want is for changes to be able to go into a dry-run mode. Like: I have my setup, I make some changes, they go into dry-run mode, then I roll them out.
B
Because you can individually flip environment variables on hosts.
B
The trouble with dry-run mode is that Rack::Attack increments its counters, and once you're inside a throttle rule, you can't say "do increment the counter, but don't block". Then you'd have...
B
...to basically rewrite Rack::Attack. There's another thing that occurred to me, and it has to do a bit with the product perspective. I think until now, or earlier in this project, I've been thinking a bit too much from the point of view of: I don't want to ruin GitLab with the complexity of weird stuff. And now I've said, okay, we just put the config in an environment variable, and I want to get this out the door.
B
So from that point of view, I wonder if maybe we should just have the global dry-run mode, because that is a simple change, and it adds value for this broader project. And yes, that thing will then sit there for who knows how long in the rate limiting code, but maybe I shouldn't worry so much about the long-term cost of having that thing exist in the rate limiting code.
B
That fits with what was said about testing changes: you could put one host into dry-run mode and... wait, no, you can't deploy changes that way, because the settings are in the database, so those are global.
E
How would that work... But yeah, the other thing about the code living around: if we don't document it, we could always remove it once we're done with it on GitLab.com, if we think that's an issue. So we could add it undocumented; if we want to keep it, we document it, and if we don't, we remove it.
C
Just a little bit more background that we can keep in mind: when Wikimedia moved across to GitLab...
C
I don't normally read Hacker News threads too much, but for some reason I got onto a whole thread on there, and one of the big themes on that thread was how big self-managed instances of GitLab have all sorts of the same problems that we have when it comes to abuse and spammers and comment spam, all of these same things. So that kind of made me realize that maybe there would be other people needing things like that.
B
Yeah, the counter-argument to me saying we are only going to use this once is, of course, that we are not the only people who need this. A bunch of other people are also going to use it once, and be very glad that they have it that one time, just like we are going to be very glad to have a dry-run mode the one time we turn everything on.
A
Sorry, just to interrupt there with a question: when this was enabled the first time, and it was quickly turned off, would it have been helpful if there had been a dry-run mode?
D
Yeah, I think for sure it would have been helpful. I don't recall what we set the limits to; I think we set the limits to be fairly generous and we still had issues, but I don't recall the details. And I think having this in the product would also be nice, not just as an environment variable.
C
Yeah, and just to put some context there: the people that are doing the most requests, I'm seeing about 22,000 as a maximum in the last six hours, twenty-two thousand in one minute. That kind of gives you an idea. I'm assuming you'd put the rate limit below that, so they would have hit it pretty quickly.
B
Right, yeah, okay, this was a useful discussion. I'm now leaning towards saying: let's build the simplest possible dry-run mode, because it's not that expensive to build and it is very useful. It's maybe not the ideal thing from a product perspective, but it is still also useful as part of the product, and I think I'm going to worry less about how this fits into a vision of the product.
E
I added a thing to the agenda; it's actually completely not a demo. It's just, while I've got Andrew and Jarv here particularly: I was trying to get some historical Sidekiq stats, and when I say historical, I mean last month, by shard. And I can't, or I can, but only for the catch-all and catch-all-NFS shards; I just get too many "exceeded chunks limit" errors with whatever I'm trying to do, and I can't find a recording rule.
C
Yeah, so that thing that you're looking at is already pretty aggregated, right? So there are a few things that you can do.
E
Oh, sorry, there is one thing I noticed about that. The recording rule that I'm looking at is a "sum without (fqdn, instance, shard)", but that won't include pod and the other Kubernetes labels we get, which is where I think we're getting the chunks from.
C
No, it's not... There are plenty of metrics that don't... If you go into the observability channels, you see people asking similar questions pretty much all the time at the moment. Sorry, a quick little rant: there's nothing that you can get more than two weeks' worth of metrics for at the moment, so it's very rare to be able to find data for things, and I think it's a major problem right now. So Ben has...
C
Well, if you look at that query, there's a big blurb. You know the Prometheus errors are always wonderful, but there's a thing that says "limit 8333", and Ben has an open MR, which he opened on Monday, to increase that number. That limit is about 10 megabytes of data: if you divide 10 megabytes by the chunk size, it comes out at that 8333 number. I think he's going to extend it to 100 megabytes, and hopefully that'll make a difference.
C
Yeah, it's got some limits in place. Let me just see... but there are things that you can do to help. The first, and this is total technical debt that needs to be fixed: you're using the label called "environment", and there's another label that's the same, called "env". The difference between the two is that Thanos will route "env" to specific nodes, whereas "environment" will fan out to all the nodes.
C
So the first thing you can do is change it to "env". The second thing you can do is, in the resolution field, choose something like 3600 to get hourly data, that is, one sample per hour; but that's obviously pretty dangerous.
That's.
C
Yeah,
I
I
you
you
probably
right,
but
it
seems
to.
I
think
I
I
don't
know
the
technical
details,
but
I
do
have
better
luck,
but
obviously
the
thing
there
is
that
you're
using
a
one-minute
rate
and
then
you're
getting
one
like
one
minute
sample
every
hour.
So
your
data
is
going
to
be
super
sketchy
right
because
you
know
you're
taking
one
minute
out
of
an
hour
and
assuming
that
that's
what
the
whole
hour
looks
like,
and
I
mean
it's
a
very
rough,
but
I'm
still
getting
those
errors.
E
Yeah, I tried just using the underlying metric with a rate, but it didn't really help with...
C
Sorry, I'll find this; there's an issue about this. Oh yeah, in fact, I think...
C
So I was talking to Ben about this on Monday, and he created a... because Ben actually wanted to decrease the retention. What I often do is just go to a Prometheus instance instead of Thanos, and I actually get better results than I do from Thanos. But he wanted to reduce the retention on Prometheus down to something like three days, which would mean we'd lose that and we'd have nothing. So if you look on the thread, it's at 11789. I thought you'd actually made the merge request already... yep.
C
Yeah, you can see the problem still, right? Yeah. I think we should just ping him on there and ask if we can get that sorted out. That's good.
C
On that thing: Matthias was trying to look at Ruby memory stats, and you don't know what recording rules you need until you're looking at the data, so you can't retrospectively apply them. It just so happens there that the cardinality is too high and we can't do this, and it's really kind of slowing us down, and I don't really know what the solution is.
B
Yeah. Is part of the problem here that Thanos receives automated or untrusted traffic and needs to have defensive limits per query, and we're exceeding those? Because then maybe we could have two Thanos instances: one where humans can run crazy queries that use a gigabyte of RAM, but they get their query answered, and one that gets the untrusted traffic and has to work within constraints.
C
And possibly with longer than a two-minute timeout as well, where you just kind of bear with it, right? You can't have asynchronous queries, but you could...
E
Yeah, I would be happy waiting. I would be happy with a batch query thing where it comes back in an hour; I'm looking at historical data anyway, I just want the data. Sorry, another question about the recording rule, Andrew. I think for this one it would help... I don't think that's the problem, but I think it would help, because at the moment we have... these are recording rules that we need to tidy up anyway...
E
I
think,
but
they
use
some
without
fqdn
instance,
which
for
prometheus
we've
got
other
labels.
Sorry
for
kubernetes
we've
got
other
labels
like
pod,
which
are
basically
the
same
thing.
C
But
yeah
so
so,
what's
happening
with
that,
there's
been
like
a
lot
of
backwards
and
forwards
on
there's
a
there's,
an
issue
and
I've
been
a
bit
snarky
on
it,
but
basically
what
it
is
is
I
like,
I
think
what
we've
agreed
now
is
that
we're
going
to
get
rid
of
fqd
well,
if
qdn
will
just
kind
of
be
relegated
to
like
legacy
and
on
the
on
the
instance
label
ben
is
supposed
to
be
working
on
this
at
the
moment.
C
Add
node
label
for
kubernetes
discovery,
11504
yeah,
so
so
here's
the
original
issue
on
on
that
sorry
I'll
put
it
in
here
and
then
that
kind
of
morphed
into
another
issue
which
has
got
less
stuff
on
it.
But
I
thought
ben
was
actually
working
on
this
at
the
moment,
but
there's
no
one
assigned,
but
basically
what
I
think
we've
agreed
on
is
that
the
instance
label
will
become
it
won't
be
an
ip
anymore,
it'll
be
like
actual
in
in
the
vms
land.
C
It'll
be
like
the
actual
name
of
the
vm
like
fqdns,
it's
not
a
ip,
because
I
I
just
find
like
if
you're
looking
at
ips,
I
just
glaze
over.
Like
you
know,
I
don't
like
using
that
as
a
way
of
just
distinguishing
things
and
then
in
the
in
the
in
the
kubernetes
world,
it'll
be
the
pod
identifier
and
there's
like
a
little
bit
of
risk
when
it
gets
recycled
that
you
know,
you'll
get
two
pods
that
have
got
the
same
name,
but
it's
better
than
you
know.
C
Any
other
solution
and
ben
asked
around
and
other
people
are
doing
that
as
well,
and
then
that
way,
it's
quite
nice
because
everywhere
in
our
graphs,
we
we
use
fqdn
at
the
moment,
we'll
just
replace
that
with
instance,
and
that's
much
better,
because
instance
is
like
a
standard
for
prometheus,
where
fqdn
is
kind
of
like
our
own
label
and
it'll
have
the
ports
on
the
end.
But
I'm
I'm
not
that
bothered
about
the
port.
But
I
I
don't
want
to
ask
you.
You
know
I
don't
want
10
dot.
C
You
know
that
I'm
not
a
fan
of
that.
So
I
I
the
reason.
I'm
getting
a
bit
like
like
banging
on
about
it
is
like
I
see
I've
been
seeing,
merge
requests
of
people
adding
like
fqdn,
comma
pod,
name
comma-
this
you
know
on
like
piecemeal
on
on
graphs
because
they
they're
struggling
with
this,
and
if
we
just
kind
of
did
it,
you
know,
with
strategically
with
the
with
the
instance
label.
That
would
be
much
better
than
fixing
individual
graphs
one
at
a
time.
C
Yeah
yeah
yeah
yeah,
I
I
I
really
personally,
I
really
dislike
without
unless
it's
outside
of
a
width,
you
know
you
know
exactly
what
you're
removing
exactly
because
of
this,
because
you
added
a
new
label
and
then
suddenly
the
cardinality
of
your
recording
rule
explodes
and
you're,
not
controlling
that.
So
I
I
tend
to
think
it's.
It's
like
a
bad
practice.
E
Okay
well
I'll,
ask
ben
why
I'm
still
getting
a
limit
of
eight
three
three
three,
unless
that
I'll
double
check
the
numbers,
because
maybe
it's
like
there's
an
extra
three
in
there
that
there
wasn't
before,
but
otherwise
I'll
ask
then
like
what's
up
with
that
and
we'll
go
from
there.
Sorry,
like
I
said.
B
I'm not sure if I'd be on it much, and I also don't know where...
C
Do you want to do... well, actually, if you do it with Jupyter notebooks and pandas, it's very easy to do. So, okay...
E
Cool, thanks for that; that lets me know it's not me. But yeah, I see that... I feel the pain, yeah.
A
I do have one quick question, but I'm going to stop the recording, because it's about a customer and I wouldn't want it to be recorded. So I'm just going to stop the recording.