From YouTube: Scalability Team Demo 2022-10-13
A
So yeah, what we're talking about in the longer thread there is moving some stuff around. The only thing that's still left from the pipeline on Ops is loading data from Thanos. In the discussion we ended up keeping the entire page generation there, because that also pushes to the Pushgateway, which triggers the issues in the capacity planning project on gitlab.com. So, in the end, we'd have a scheduled pipeline that starts on gitlab.com and triggers the timeline pages generation on Ops. That job pushes to the Pushgateway, and then it calls back into a new pipeline on gitlab.com with the information to publish the pages on gitlab.com and to update the images on the capacity planning issues that have been created from its run.
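For reference, the cross-instance hand-off described above can be done with GitLab's pipeline trigger API. A minimal sketch in Python, where the Ops instance URL, project ID, trigger token and variable name are hypothetical placeholders rather than the team's actual configuration:

```python
# Sketch: trigger a downstream pipeline on another GitLab instance and pass it
# context about the current run. URL, project ID and variable names are
# illustrative placeholders, not the real setup.
import os
import requests

OPS_API = "https://ops.example.com/api/v4"       # assumed Ops instance URL
PROJECT_ID = 1234                                 # hypothetical downstream project
TRIGGER_TOKEN = os.environ["OPS_TRIGGER_TOKEN"]   # pipeline trigger token

resp = requests.post(
    f"{OPS_API}/projects/{PROJECT_ID}/trigger/pipeline",
    data={
        "token": TRIGGER_TOKEN,
        "ref": "main",
        # variables are forwarded to the downstream pipeline
        "variables[UPSTREAM_PIPELINE_ID]": os.environ.get("CI_PIPELINE_ID", ""),
    },
    timeout=30,
)
resp.raise_for_status()
print("Triggered downstream pipeline:", resp.json()["web_url"])
```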
B
Yeah, so it's... it's kind of... yeah, it's confusing how it works. But how big of a problem is it that it's confusing how it works? I guess that's our question.
A
We're going to make it more robust, because now we've got a scheduled pipeline on Ops and a scheduled pipeline on gitlab.com that counts on... wait, what have we got. We've got a scheduled pipeline on Ops, and that triggers a pipeline on gitlab.com that used to require the last pipeline on the main branch to be... well, not anymore; I think I fixed that, but it finds the last artifact. So it doesn't have pipeline information from the pipeline on Ops; it just finds the latest artifact for the generate-pages job, gets that, and publishes it. At the same time we also have a job on gitlab.com, in the scalability repository, that counts on the previous job on gitlab.com in the timeline project having finished and having published the images. That's right, isn't it, Sean?
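The "find the latest artifact" step maps to GitLab's job artifacts API, which serves the artifacts from the latest successful pipeline on a ref for a named job. A rough sketch; the project path and job name below are placeholders:

```python
# Sketch: fetch the artifacts archive from the latest successful pipeline on a
# ref for a named job, instead of relying on upstream pipeline information.
# Project path and job name are illustrative only.
import os
import requests
from urllib.parse import quote

API = "https://gitlab.com/api/v4"
project = quote("group/generate-pages-project", safe="")   # hypothetical project path
job_name = "generate-pages"                                 # hypothetical job name

resp = requests.get(
    f"{API}/projects/{project}/jobs/artifacts/main/download",
    params={"job": job_name},
    headers={"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]},
    timeout=60,
)
resp.raise_for_status()
with open("artifacts.zip", "wb") as f:
    f.write(resp.content)
print("Downloaded", len(resp.content), "bytes of artifacts")
```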
B
Yeah, so, well, I mean, that doesn't really matter too much, because it just says, you know, if I can find images, I will put them in the issues; that job is just going through the issues. So let me take a step back. We submit an MR to gitlab.com that triggers a pipeline on gitlab.com, but then we'll also trigger a pipeline on Ops, because...
B
Yeah, so that's because jobs that require Thanos data can only run on Ops, so we run that on Ops. Then, when we merge the MR, we run a pipeline on gitlab.com that doesn't really do a great deal, and we run a pipeline on Ops that will again use Thanos.
B
Then on Ops we have the scheduled pipelines that populate data and build the pages, and when the pages build is finished, that triggers a pipeline on gitlab.com but also pushes metrics to the Pushgateway. When the metrics are pushed to the Pushgateway, that can trigger alerts through Alertmanager. Those alerts then go into the GitLab issue management, sorry, alert management feature, which then auto-creates an incident, which goes on the board in the capacity planning project.
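The "pushes metrics to the Pushgateway" step looks roughly like this with the prometheus_client library. The gateway address, metric name and labels below are made up for illustration, not the team's actual series:

```python
# Sketch: push a forecast metric to a Prometheus Pushgateway so that alerting
# rules on the Prometheus side can pick it up. Gateway address, metric name
# and labels are hypothetical.
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
g = Gauge(
    "capacity_forecast_violation_days",          # made-up metric name
    "Days until the forecast crosses the saturation threshold",
    ["service", "component"],
    registry=registry,
)
g.labels(service="patroni-registry", component="pg_btree_bloat").set(42)

# One push per pipeline run; Alertmanager then routes any resulting alerts
# into GitLab's alert management, which auto-creates incidents.
push_to_gateway("pushgateway.example.internal:9091",
                job="capacity-planning", registry=registry)
```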
B
So that's the part where we're dogfooding something, and I think it's valuable to dogfood it; you know, we had some good discussions with the Monitor team around this. But I'm not sure that alerts, just Prometheus alerts like that, are the right way to manage creating these issues. That's the part I'm mainly concerned with, because we did have issues with the alerts going away when we limited the number of resources we reported on. So if something was in the top 50 but wasn't in the top 15...
B
We also get a new alert any time we change the underlying queries that we're using. So recently we changed from an outer average to an outer max for Gitaly. For Gitaly total disk space we were doing an average, not across Gitaly nodes but, as it turned out, across canary and main; and canary's Gitaly disk usage is much, much lower than main's, so the effect of that was to say our disk usage for Gitaly is way below the target.
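To illustrate the avg-versus-max point: with an outer avg, a mostly idle canary node drags the aggregate down, while an outer max reports the busiest node. A sketch against the Prometheus/Thanos HTTP API; the metrics are standard node_exporter series, but the endpoint and the label selectors (type, mountpoint) are assumptions about how this environment is labelled:

```python
# Sketch: compare the outer aggregation (avg vs max) for Gitaly disk
# utilisation. Label selectors and the Thanos URL are assumptions.
import requests

THANOS = "https://thanos.example.internal"   # assumed query endpoint

utilisation = (
    '1 - node_filesystem_avail_bytes{type="gitaly", mountpoint="/var/opt/gitlab"}'
    ' / node_filesystem_size_bytes{type="gitaly", mountpoint="/var/opt/gitlab"}'
)

for outer in ("avg", "max"):
    query = f"{outer}({utilisation})"
    r = requests.get(f"{THANOS}/api/v1/query", params={"query": query}, timeout=30)
    r.raise_for_status()
    result = r.json()["data"]["result"]
    value = float(result[0]["value"][1]) if result else float("nan")
    print(f"outer {outer}: {value:.1%} disk used")
```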
B
...the alert, because that doesn't get pushed into the gateway; but if we change the things that we're aggregating on in the alert itself, that will create a new alert. And what was the other thing that would create a new alert? Oh, if we changed... no, we fixed that one; before, we had one where it would come up each time there was a new page. But basically what I'm saying is that we can't fix these retroactively; every time you find something, that will create a new set of alerts.
C
So just keep Prometheus alerting entirely and just have... okay.
B
So if we don't push to Prometheus, we can do everything except populating the cache on Ops. Is that right, Bob? If we don't... sorry.
B
As long as we keep remembering how it works. What was the other thing I was going to say there, though? Oh yeah, the duplicate issues, I think, are an issue for everybody who works on this, because it's really confusing, it does take a reasonable amount of work, and it also kind of spams the channel.
B
You know, you get a new issue created... Oh, this was one other thing I meant to mention about the Alertmanager.
B
Yesterday an issue got created for... well, yesterday some issues got created for different components. So, if you go to capacity planning, there was an issue created for patroni-registry pg_btree bloat. I don't know why that triggered yesterday; it has a start time of the 14th of September, but we only got the alert through yesterday.
B
So this issue was created... oh, in fact, this was created 13 minutes ago, but the alert start time was a month ago. So what happened?
B
So let's go to the alert activity feed; so we logged the alert 14 minutes ago, yeah. So apparently we got this alert from Prometheus 14 minutes ago, and then, if I do...
B
Yeah, okay, threshold: so there's confidence type 80 and confidence type mean, and threshold hard and threshold 100. So if we go back here, did one of these series show up on the 14th of September? Not as far as I can see; they were already there, yeah. So I've got no idea why this was created at this point, and I'm pretty sure it's a duplicate. No, I'm sure; if we go back... yeah. So what was it? pg_btree bloat...
A
Duplicate there.
B
There we go; so yeah, it's a duplicate, and we've got no idea why it was just created. Debugging that... okay, it's actually not a GitLab application bug; is it a problem with alerting? Yeah, there are too many things in the middle, is basically what I'm saying, so trying to figure out even which part was the problem is quite tricky. In the past we have been able to debug these, but then we just come across more confusing ones, because we've solved the ones that we know how to debug.
E
That's why I think there's a lot of value in making it simpler. I think we really want to bring the rest of the team in to help us with this, but we're going to end up explaining to everyone how to debug these problems repeatedly, for each person that joins; and then, if they're not part of the rotation for a quarter, you have to sort of re-explain it, or write it down again, how to debug these things.
A
Okay, so we need to add issue creation; that's one step that we've excluded for now and would add to the plan: create issues from...
B
Well, yeah, because I think I mentioned in a comment yesterday, the other thing we could do is have the issue creator look up existing issues, right, so it doesn't create duplicates. So if it finds an existing issue, it can just update it: if the forecast violation date has come forward... no, you can't use forward and backwards for time, can you... if the forecast violation date is closer than before, we can bring the due date in, or post a comment or something saying, hey.
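A rough sketch of what such an issue creator could do with the GitLab issues API: look for an existing open issue for the component, and either create one or pull the due date in and leave a comment. The project ID, label and title convention are placeholders:

```python
# Sketch: avoid duplicate capacity planning issues by looking up an existing
# open issue first, then updating its due date / commenting instead of
# creating a new one. Project ID, title format and label are illustrative.
import os
import requests

API = "https://gitlab.com/api/v4"
PROJECT_ID = 5678                                  # hypothetical project
HEADERS = {"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]}

def upsert_issue(component: str, forecast_violation_date: str) -> None:
    # Look for an open issue that already tracks this component.
    existing = requests.get(
        f"{API}/projects/{PROJECT_ID}/issues",
        headers=HEADERS,
        params={"state": "opened", "search": component, "labels": "capacity-planning"},
        timeout=30,
    ).json()

    if not existing:
        requests.post(
            f"{API}/projects/{PROJECT_ID}/issues",
            headers=HEADERS,
            data={
                "title": f"Capacity warning: {component}",
                "due_date": forecast_violation_date,
                "labels": "capacity-planning",
            },
            timeout=30,
        ).raise_for_status()
        return

    issue = existing[0]
    if forecast_violation_date < (issue.get("due_date") or "9999-12-31"):
        # Forecast violation date moved closer: bring the due date in and say so.
        requests.put(
            f"{API}/projects/{PROJECT_ID}/issues/{issue['iid']}",
            headers=HEADERS,
            data={"due_date": forecast_violation_date},
            timeout=30,
        ).raise_for_status()
        requests.post(
            f"{API}/projects/{PROJECT_ID}/issues/{issue['iid']}/notes",
            headers=HEADERS,
            data={"body": f"Forecast violation date moved up to {forecast_violation_date}."},
            timeout=30,
        ).raise_for_status()
```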
A
And then that's part of the manual job that we're doing now: looking at the graph, hey, there's no more violation. Or the thing that you're solving right now, where something disappeared from the page, meaning it's not in the top 15 anymore, so just close it, because it'll come back when it is in the top 15 again. So all of that would be resolved. Could be.
B
The other day, Bob, it just took me a while to realize that you'd convinced me, so yeah.
E
It feels like this takes up people's time at the moment, so I think it's important to make these changes and get started with it, because every week that we delay is another week of having to go through this process of trying to figure out why we're getting alerts or not getting alerts. So I'm not... is this big enough to be a project?
E
I think let's create the epic and raise the issues, but if either of you has space to start on it, let's start as soon as we can. I think it's just going to save time and effort in the long run, and if we delay and wait, we're just going to get frustrated with the fact that we know how to fix this and we aren't scheduling it for work.
B
Okay, thank you. So I guess... okay.
A
I can also create the epic. Yeah, yeah, I'll do that. Should I create the issues in the timeline project, like the other ones? They can still be there, since that other one is on the gl-infra side, so...
D
No, I can share the initial exploration for the Redis Cluster PoCs. I'm quite curious... I took a short stab at it with Tanka, to do a Kubernetes deployment, partly because you can actually do a full deployment on minikube locally. Compared to that, I think VM-based is a lot harder, at least for the backend engineers, because we don't have that much access, and I think there's a lot more provisioning required to actually do a local run. But yeah.

You could run a full six- to seven-node cluster in a Multipass VM and then connect to it using a Rails console, and then you could just keep writing to it via the GitLab Redis cache module, and you can actually go into each master and see where the keys go. And yes, that's just the local setup, but I think I'll probably need help, or to pair with an SRE, to be able to take this further down the line.
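For the "see where the keys go" part, the cluster itself can report which hash slot (and therefore which shard) a key maps to. A small sketch using redis-py against one node of a local test cluster; the address and key names are placeholders:

```python
# Sketch: ask a Redis Cluster node which hash slot some keys map to, and list
# which master owns which slot range. Host/port and key names are placeholders
# for a local minikube / Multipass test cluster.
import redis

node = redis.Redis(host="127.0.0.1", port=7000, decode_responses=True)

for key in ("cache:gitlab:project:1", "cache:gitlab:project:2", "session:abc"):
    slot = node.execute_command("CLUSTER", "KEYSLOT", key)
    print(f"{key!r} -> slot {slot}")

# CLUSTER NODES shows which master serves which slot range.
print(node.execute_command("CLUSTER", "NODES"))
```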
A
Related to prioritization: redis-cache has just shown up on the capacity planning reports for primary CPU, for when, December? Somewhere around 80% confidence. So that's, like, not...
E
Well, in terms of prioritization, for the Redis Cluster work we need to get the Redis rate-limiting and Redis registry ones into production. I think they're both on staging now, and I think the readiness review epic has already had a round of feedback for the registry one, and some of those items might be relevant for the rate-limiting instance as well. And then the plan was to crack on with Redis Cluster.
E
The question is just about proceeding, as I wrote on the issue, on VMs, because I think it was less risky and there's more visibility into what's going on there while we're still figuring out what Redis Cluster is and does. And Matt's got availability to help with that as well.
A
Didn't you try out the chart, the Bitnami chart, for Redis Cluster?
D
Yeah, I tried the Bitnami chart. It comes with a post-deployment hook job that runs to help add nodes, but it's not that clean when you do scaling down. Then again, I don't think we would scale down much; I would guess we mostly scale up and then keep it, and we rotate to do sort of maintenance. Scaling down, I'm not sure, practically, whether we do that much.
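One way to sanity-check the chart's scale-up and scale-down hooks is to confirm afterwards that the cluster still reports a healthy state and full slot coverage. A minimal sketch; the node address is a placeholder, and the handling covers redis-py versions that return CLUSTER INFO either raw or pre-parsed:

```python
# Sketch: after scaling a Redis Cluster (e.g. via the Bitnami chart's hooks),
# verify that all 16384 hash slots are assigned and the cluster state is ok.
import redis

node = redis.Redis(host="127.0.0.1", port=7000, decode_responses=True)
raw = node.execute_command("CLUSTER", "INFO")

# Normalise to a dict of strings whether redis-py parsed the reply or not.
if isinstance(raw, dict):
    info = {k: str(v) for k, v in raw.items()}
else:
    info = dict(line.split(":", 1) for line in raw.splitlines() if ":" in line)

assert info["cluster_state"].strip() == "ok", info["cluster_state"]
assert int(info["cluster_slots_assigned"]) == 16384, info["cluster_slots_assigned"]
print("cluster healthy:", info["cluster_known_nodes"], "known nodes")
```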
D
But if we do, then I think VMs might be better, for more control. I've discussed this before in our coffee chat: VMs give us way more control and more confidence in doing something that we're doing for the first time, because the team in general doesn't have that much experience with Redis Cluster slots compared to Sentinel, so I think we'll probably roll out more confidently on VMs.
D
Yeah, something was mentioned about how we have to prepare for CPU spikes when we do rebalancing, when we add in a new node, or even when restarting or rebalancing. That might be something that is hard to project, at least without a proper load test. I contemplated doing a load test in a VM, but I don't think that's actually going to be at all accurate.
A
We need to... I don't know what the rebalancing process looks like, and if it's a CPU-heavy thing and we try to do that on an already busy cluster, then yeah, that's something we should figure out. And, let's see, I thought it was easier to do on VMs.
E
What can you see when you have one... like, there were so many questions that we had at the start of all this, and I think that getting these two instances in answers some of those questions, and it might answer enough of them that we do Redis Cluster on Kubernetes straight away. But I think the consensus when we last talked about it was to at least start with VMs; having the visibility would be a massive help for all of the pieces that we don't know.
A
One extra thing there is that we'll probably need to do some kind of migration, turning the existing instances we have, so persistent and the cache, into instances in Redis Cluster, capital C. That's going to be easier to do if it's not a mixed deployment as well.
A
Sean built that because we wanted to do Redis Cluster at some point, but then we said it's not that urgent; let's just make sure that nobody else adds new things that aren't compatible. Okay.
C
I mean, if that were the case, then that would give us even more confidence, but I'm guessing the answer is probably no.
D
Well, yeah, I sort of got into Omnibus to hack it, like, hack in and amend the config so that the application can use Redis Cluster. So I think for customers to do that, they'd have to modify Omnibus, yeah.
C
Yeah, so that makes it even less likely. So the application itself is capable of working with Redis Cluster, but Omnibus isn't; you can configure it, but not through Omnibus. Is that the current status?
D
Yeah, so that's all from me in terms of Redis Cluster. I think the next step is to create more issues about the criteria for deciding whether we go Kubernetes or VMs, and to prioritize it, and then wait on the two existing Kubernetes ones, the rate-limiting and the registry, yeah.
A
Sylvester, for all of your tests, have you looked at GitLab Sandbox, where you can have your own GCP project to do stuff on VMs, do some stuff on a GKE cluster, and so on? I've played with some of the Tanka stuff there, and that was easier than using minikube locally.
D
Oh, okay! I might try that out, to have something there so I can run practical, a bit more realistic tests; like, we could check out how bad it will be going from a three-shard cluster to a four-shard cluster, things like that, at least to get some estimates so that we don't get caught off guard.
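For the "how bad is going from three shards to four" question, one cheap estimate is to run a small write/read probe against the test cluster while the reshard is in progress and watch latency and transient errors. A sketch using redis-py's cluster client; the startup node address is a placeholder:

```python
# Sketch: a crude load probe to run while adding a shard / rebalancing a test
# cluster, to get a feel for latency spikes and transient errors.
import time
import statistics
from redis.cluster import RedisCluster

rc = RedisCluster(host="127.0.0.1", port=7000, decode_responses=True)

latencies, errors = [], 0
deadline = time.time() + 120          # probe for two minutes while resharding
i = 0
while time.time() < deadline:
    start = time.perf_counter()
    try:
        rc.set(f"probe:{i}", "x", ex=300)
        rc.get(f"probe:{i}")
    except Exception:
        errors += 1
    latencies.append((time.perf_counter() - start) * 1000)
    i += 1
    time.sleep(0.01)

latencies.sort()
p99 = latencies[int(len(latencies) * 0.99) - 1]
print(f"{len(latencies)} ops, {errors} errors, "
      f"median {statistics.median(latencies):.2f} ms, p99 {p99:.2f} ms")
```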
E
Cool. Well, thank you so much for taking us through those things; hope you enjoy the rest of your day.