From YouTube: 2020-11-12 GitLab.com k8s migration EMEA
A
Techno party, it's... yeah.
D
Going through the highlights. First issue, we already have it on the agenda, so I'm going to skip that one. Unified structured logging. This is... oh, okay.
D
No problem. I think we decided, skarbek, that the unified logging is not a blocker anymore for git ssh. Do you agree with that?
D
And
possibly
not
a
blocker
for
the
front
end
either.
I
think
the
the
main
thing
we
would
like
to
do
here
is
to
split
out
some
of
the
different
log
files
into
different
indexes,
for
example,
right
now
for
virtual
machines,
we
have
puma
split
out
from
rails
logs
right
now,
they're
all
mixed
together,
but
I'm
not
sure
if
this
is
really
a
blocker.
I
think
I'm
gonna
take
it
or
marin.
If
you
want
to
since
you're
already
doing
it,
we
can.
I
think
we
can
take
it
off
the
highlights
yeah.
E
We can take it off the highlights, but it might actually be good to just leave it be, because I see jason actually answering that things are scheduled for 13.7.
D
That sounds good. Next is deciding on environment, tier, type, stage, shard, service, etc., deployment: all these labels for kubernetes infrastructure. andrew has been, you know, working on this actively and...
A
What we said on the call yesterday was that on pushes, which are really quite slow on gitlab.com (you know, we're talking about 200 milliseconds here), is that worth us losing the data that we get?
D
Yeah, yeah. I don't know... like, for example, I assume we're seeing the same issue with git https that we're seeing here with git ssh.
A
Yeah, quite possibly, but sort of my point is that it's like... git operations, yeah, people have more tolerance for them, if you want. Yeah.
E
We talked about maybe lowering it to a value that is a percentage of traffic that is not going to make a huge impact on everyone else, but will still provide us enough data. If that is the 33 percent that we have right now, we can leave it; if that is 10, we can go 10. But it's up to you.
D
Yeah, yeah, yeah, I guess it doesn't matter. I guess you're right: in the grand scheme of things, it's not a big impact to people. It just really bothers me that we don't understand why.
A
Well, yeah, I guess so. I actually just took your point as mostly being about not writing out more, but actually, when I read the rest of it, it is true. But, skarbek, it looked like... maybe the problem is because we are not routing to the internal api and we're getting queuing, because we're routing to the git service, and, you know, stan pointed out above that the problem is queuing on the request, right? So that's where we're spending all that time.
A
It's not like the execution of the request is taking less time; it's the queuing before it executes. And so that means that we've got too few sidekiq workers... or not sidekiq, sorry, puma workers, wherever those requests are getting routed to. And we spoke about two things: the first one was that the puma workers, whatever pool that is, obviously need to be bigger. But the other thing is that it might just be better to route that to the internal api.
E
Okay, so I wrote it wrongly then, andrew, so it might be worth correcting it in the issue, because I said...
B
Back... I don't know.
A
It's the back end call from there, jason. So it's from gitlab shell to... to authorize, you know, the internal... yeah.
B
Which is customizable. So maybe, since we're on pause with the migration, jarv and I had discussed spinning up the canary variety of this, because I failed to do that. So what we could do is configure the canary endpoint as necessary, with the customization to the internal api endpoint, and see if we see a performance differentiation between those.
A
Another thing to note is that, if that queuing's happening for the internal authorize requests, then it's probably also happening for the other ones, and it just sounds like the pool of puma workers isn't big enough, yeah. Is that the same size?
A
What you're running is just workhorse, and, you know, you're no longer actually running puma for the git http authentications, right? You're just running... so you almost don't need to deploy puma in that case.
D
Yeah, okay, hot off the press: I just submitted this issue now. This is something I noticed, I think yesterday, you know, or the day before yesterday, which is that we don't currently support zero downtime deployments for the nginx ingress controller. What this means is that, if there's any reason why we have to cycle those pods, it's quite disruptive.
D
The connections are terminated immediately and, interestingly, I just did a quick check with strace: when you delete an nginx controller pod, it gets a SIGKILL, which is a bit surprising, because I thought the default was a SIGTERM, not SIGKILL. But in any case, there are a bunch of blog posts about this behavior, and basically you need to add a pre-stop script to the nginx controller, to let it do a graceful shutdown, and we'll have to also add the, like...
D
You
know
the
termination,
when
the
the
termination
window
for
kubernetes
kind
of
say,
like
wait
longer
for
the
connections
or
wait
wait
longer
before
killing
the
pod
wait
longer
for
the
process
to
to
quit.
So
this
is
pretty
high
priority.
I
think
for
us,
I'm
gonna
say
it's
a
blocker.
D
For
now,
because
we
definitely
don't
want
to
move
the
front
end
before
doing
this,
it's
not
so
great
for
git
https,
though
it's
very
it's
pretty
rare
that
we're
cycling,
nginx
controller
pods,
it
does
happen
occasionally
like
we
did
an
nginx
controller.
It
was
we
took
a
chart
upgrade
today
and
that
cycled
the
controller
pods,
which
you
know
caused
a
little
bit
of
errors,
forget
https.
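The fix D describes is the one those blog posts recommend: a preStop hook so nginx can drain connections, plus a longer termination grace period so kubernetes doesn't kill the process early. A minimal sketch, with illustrative values rather than production settings:

```yaml
# nginx ingress controller pod spec fragment (sketch).
spec:
  # give long-lived connections time to drain before the pod is killed
  terminationGracePeriodSeconds: 300
  containers:
    - name: controller
      lifecycle:
        preStop:
          exec:
            # stop taking new connections, then shut nginx down gracefully
            command: ["/bin/sh", "-c", "sleep 15 && nginx -s quit"]
```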
D
We also took the chart upgrade today which added the pages config, so that was nice. So we have the pages bucket now officially configured, even though we don't have pages turned on, so I think we're going to be ready for this when it lands.
D
Okay, on to the demo. I just want to... I invited jason just to answer any questions that we have about the MR, the split traffic. jason, could you just give us kind of a brief introduction and kind of an overview of where we went with this?
F
Sure. So there's two factors that come into play, but the first one is having a method to actually split traffic at the ingress. To do that, we have to be able to deploy, effectively, multiple deployments that provide us with named fleets, and the way that we're doing that is: out of the box, you have no declaration of deployments, and it will basically create one default out of the normal chart properties.
F
If I look at the kubernetes setup that we have, I've got a small number of pods for api, a small number of pods for internal api, and two for default. And if we look at the ingresses, I have one ingress for slash api and one ingress for slash, but I do not have one that directs any traffic to internal api.
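A sketch of the deployments map jason is describing, paraphrased from the chart's webservice deployments feature; the exact keys should be checked against the chart docs:

```yaml
# values.yaml fragment (sketch): three named webservice fleets.
gitlab:
  webservice:
    deployments:
      default:
        ingress:
          path: /        # catch-all ingress
      api:
        ingress:
          path: /api     # /api traffic gets its own fleet
      internal-api:
        ingress:
          path: ~        # no ingress; reachable only via its Service
```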
D
Looks great, really excited about this. I have a few questions... one... well, actually, I think, skarbek, you started, you had some questions on the agenda, so go ahead first.
F
Effectively, the way this is set up, if you do not supply an actual deployments map, it will just deploy what you already had. The only difference is that you'll now have a deployment and pods with "name dash default", so it's an automatic change for anybody who already has one in play; it's just the full pod names change. And let me... shoot.
B
Now, my second question was being able to name deployments, just to provide some extra customization in case we need some sort of contextual information quickly. But it looks like you're adding the necessary data we need, at least to the pod names; I would imagine we'll see the same on the deployment end as well, probably.
F
One thing that I don't have yet implemented, but I'm going to sneak into this MR: for a long time we've basically been stuck on, specifically, targetAverageValue for the metrics.
F
That entire yaml array section of the metrics definition would be replaceable.
A
But if we... I know, like, this is jumping way ahead, so forgive me, but if we wanted to use prometheus metrics in there, there's still extra steps, right? There's other things that you need before you can actually be driving that off, like, prometheus metrics, right?
F
Right. So basing it off of prometheus metrics, such as queue depth: that's something that actually requires further setup within the cluster. It's not something that's specific to this chart; it's just that you need to have this additional functionality available to your kubernetes environment, and that's the use of custom metrics.
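A sketch of where a replaceable metrics block could go: an HPA driven by a Prometheus-derived metric such as queue depth instead of targetAverageValue on CPU. The metric and object names are hypothetical, and it assumes an adapter (for example prometheus-adapter) is installed in the cluster to serve the custom metrics API:

```yaml
# HPA sketch (autoscaling/v2beta2) scaling on a per-pod custom metric.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: gitlab-webservice-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gitlab-webservice-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: puma_queued_requests   # hypothetical, served by prometheus-adapter
        target:
          type: AverageValue
          averageValue: "5"            # scale up past ~5 queued requests per pod
```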
A
Yeah, and it's going to be a long time until we're doing that, but I think it'll be...
D
Yeah, I have a few. So one is nodeSelector, extra pod labels, and nginx annotations. nodeSelector... was that on the original list or not? I think it might not have been, but I... it may not have.
D
You're awesome, thank you, that's great. The next two kind of tie together. You know, right now we have HAProxy, which is doing this routing; we're turning this around a bit, right, because now we're going to have a single ingress which is going to route traffic, instead of having multiple ingresses. Right now we route by not only path but by header and by cookie. Is this now... a cookie doesn't really matter.
D
This
is
for
canary,
which
we
run
in
a
separate
namespace
anyway,
but
but
what
about
header-
and
this
is
important
for
websocket
traffic.
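Header-based routing is something ingress-nginx supports natively through its canary annotations; a sketch, with illustrative host and service names:

```yaml
# Ingress fragment (sketch): send requests carrying a marker header to canary.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: gitlab-webservice-canary
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "X-GitLab-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "true"
spec:
  rules:
    - host: gitlab.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: gitlab-webservice-cny
              servicePort: 8181
```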
D
Okay, I guess the one issue, andrew, that is relevant for scalability work: right now we do rate limiting in HAProxy for only api traffic. If we start routing at the nginx ingress, we're going to lose that rate limiting capability. That's probably okay, right, because we're on the cusp of using application rate limiting, right?
D
No, that's not the problem; we're going to have HAProxy. The problem is that this is replacing the api and web split: it's now happening down at nginx. So what it means is that traffic will first come into HAProxy, and then all traffic will just go... there'll just be one back end, which will be nginx ingress, and then from there it will split between web and api.
D
We
can
still
add
the
header
at
aha
proxy
it'll,
just
be
set
for
all
types
of
requests,
web
api,
etc.
A
So
I
think
it's
okay
craig's,
if
craig
miskel's
doing
the
the
the
work
at
the
moment
on
hi
proxy.
As
I
expect
he
is
then
he's
adding
those
white
lists
to
the
web
and
to
api,
and
to
I
mean
I
don't
know
about
git,
but
certainly
web
and
api,
and
and
at
the
moment
we
don't
have
anything
on
the
web,
which
is
which
is
pretty
weird.
So
he's
adding
he's
adding
that.
D
Yeah, I guess.
A
Yeah, what we don't want... we don't want rate limiting per se, jason. We want an ACL. We want, like, a list of IPs, you know, basically a bunch of CIDRs that say, you know, this...
A
This
network,
this
network,
when
you
see
one
of
these
networks,
add
this
header,
so
it's
not
actually
doing
rate
limiting
it's
actually
just
including
a
header
which
is
a
like
a
bypass
that
will
get
passed
on
to
the
application,
and
it
just
says
to
the
application:
don't
do
any
rate
limiting
this
is
you
know
special
customer
x,
or
this
is
the
gitlab
runner
and
we
don't
allow.
D
For now we're okay, because we're not getting rid of HAProxy, but we're removing the logic, a lot of logic, out of HAProxy, which I think is good for us long term, assuming we keep with the nginx controller, which maybe we will. So I don't think that's a problem, but we will need to make sure that we can route based on header.
D
Request paths: I was just taking a look at our HAProxy config. I think what we have... like, there's nothing...
C
Obviously I haven't prepared a thing, but what I was doing last week was...
A
We've got the saturation framework that we use to measure, like, all different types of utilization and saturation, and we added two new things into that: the container memory, which is basically how close a container is to reaching its limit for memory, and then the same thing for cpu. And what's really nice about these metrics is that they have our existing service and shard and stage labels on them.
A
So
you
know
if
we
deploy
something
into
staging
and
it
goes
awry,
then
you
know
we'll
we'll
be
able
to
see
it
and
and
isolate
it
very
quickly.
Obviously,
for
the
people
on
this
call,
I
think
it'll
be
more
obvious
that
you
know
a
pod.
You
know
at
the
moment
we're
kind
of
going
on
pod
names.
A
You know, "this is a canary git shell... gitlab shell container". And so I've been trying to get it so that, you know... we've already got these existing labels, type, shard and stage, which we use for lots of things, and I've been trying to massage the metrics that we get from kubelets and from cAdvisor and from kube-state-metrics into the taxonomy that we use for our labels.
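A sketch of the kind of massaging being described: a recording rule that divides the cAdvisor working-set metric by the kube-state-metrics limit, then joins on kube_pod_labels to pull type/stage/shard onto the result. It assumes the pods actually carry those kubernetes labels; the rule name and exact metric names would need checking against the Prometheus setup:

```yaml
# Prometheus recording rule (sketch): container memory saturation,
# re-labelled from pod identity onto the service taxonomy.
groups:
  - name: kube-container-saturation
    rules:
      - record: gitlab:container_memory_utilization:ratio
        expr: >
          (
            container_memory_working_set_bytes{container!="", container!="POD"}
            / on (namespace, pod, container)
            kube_pod_container_resource_limits_memory_bytes
          )
          * on (namespace, pod) group_left (label_type, label_stage, label_shard)
          kube_pod_labels
```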
A
You
know
the
way
we
divide
things
up,
so
we've
got
to
get
service,
we've
got
a
web
service,
we've
got
an
api
service
and
we
want
those
metrics
to
to
to
be
the
same
way.
So
if
I
go
just
take
a
look
at
this
and.
A
Yeah, okay, that's... that's the... yeah, I'm actually trying to find... yeah. Well, we can look at it here as well, because it's repeated in... sure, yeah, perfect, yeah, exactly, thank you. So there you can kind of see it's really, really busy at the moment. Probably too busy. One of the things I was thinking of doing is actually not having this level of detail, because it's too much, but what we could possibly do is just put, like, quantiles in here. So we say, like, this is the...
A
You
know
1999
quantile.
This
is
the
50
year
the
median
and
we
can
have
like
a
sort
of
cloud
of
of
where
the
spread
of
these
values
is.
I
don't
know
how
people
feel
about
that
or
if
they
actually
want
to
see
these
individual
parts,
but
it
is
also
worth
pointing
out
like
that
that
bug
in
sidekick,
in
the
background
migrations,
there
was
something
spinning
that
had
been
spinning
for
three
weeks
and
you
know
as
soon
as
we
saw
this
on
the
on
the
sidekick
dashboard.
A
You know, I was drawn to it, and then we discovered the bug, and we could dig it up, where at the moment, you know, no one's really, as far as I know, going into the kubernetes metrics and looking... whereas this kind of drew that out a little bit, I think. So that's kind of the first thing. And then the second thing that I'm doing is, obviously, at the moment: we've got these dashboards that we generate, and we've got lots of useful stuff on here, and for vms...
A
You know, you can just open this up and you can go to node metrics, and you can immediately see, like, a whole bunch of information about the machines that are running this piece of the fleet, right? And what I really want is: I want exactly the same things for kubernetes, without people having to kind of navigate through namespaces and nodes and, you know, all the other stuff. And so I just want it to be, like, the same kind of straightforward sort of thing.
A
We've got the metrics catalog, and I've just got a small amount of configuration in there, which is saying that for the git service we're running two deployments, one called git, one called shell, and it's got one container, called gitlab shell; and then we have a second service called web service, and it's got these two containers that are running inside that. And with that, what you can do is then...
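The metrics catalog itself lives as jsonnet in the runbooks repo, so the following is only a paraphrase of the shape of that configuration, with hypothetical field names:

```yaml
# Hypothetical paraphrase of the metrics-catalog entries described above.
services:
  git:
    kubeDeployments:
      git:   { containers: [gitlab-shell] }
      shell: { containers: [gitlab-shell] }
  web:
    kubeDeployments:
      webservice: { containers: [webservice, gitlab-workhorse] }
```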
A
It'll generate some dashboards, and these are obviously very much a work in progress, and I have some questions to ask of you guys. But this gives you, you know, broken down by... you know, the taxonomy that we use. So you can go here and you can change this to canary, and it will give you the canary nodes, and there's no sort of navigating around with namespaces or anything like that. So that's quite different with canary.
C
Oops.
A
You know, we just have the same dashboards. And this was kind of interesting: I don't know if mailroom's got, like, a leak in it or something like that, but it seems to have these big drop-offs every now and again. Sorry, I didn't show... I was showing the container overview, but we also have this deployment overview, which is just a kind of aggregated view of the deployment.
A
So
you
can
see
here
it's
broken
down
by
the
three
clusters,
and
I
don't
know
if
that's
something
that
we
want
or
if
we
want
to
just
aggregate
all
together
or
you
know,
let
me
know
whatever's
whatever's
best
for
people,
but
you
you
can
kind
of
see
the
registry.
There
was
some
kind
of
you
know
again.
A
You
know
so
we
can.
We
can
build
these
dashboards
with
a
very
small
amount
of
configuration
and
we
kind
of
get
all
the
other
stuff,
so
yeah
registry,
during
that
deploy,
there
was
like
a
massive
drop-off,
so
I'm
guessing
that
the
registry
also
has
leaks
of
some
sort
that
that's
that
we
need
to
take
a
look
at
perhaps
but
anyway.
A
Obviously
this
is
very
basic
at
the
moment,
like
I
really
like
to
have
things
on
here
like
evictions
and
you
know
a
whole
bunch
of
different
failures,
but
all
tied
into
the
way
that
we
think
of
the
application
you
know
with
the
stages
and
the
shards
and
the
and
the
services
that
we've
got
so
I've
kind
of,
I
think,
I've
kind
of
cracked
the
the
way
to
do
this,
and
you
know
we've
just
got
the
deployments
going
as
well.
A
So that's great. So the first question I have is whether this needs to be its own dashboard. Obviously there'll be more things on here, but we can either have this as its own dashboard, or we can have it built into, you know, the main...
A
The only thing that's sort of pushing me towards a separate dashboard is that these dashboards are getting a bit big and a bit slow. So I was thinking we could do some nice linking where, you know, right up at the top it's like "go to the kubernetes..." you know, and have, like, some good navigation between the different dashboards. But I'm keen to get other people's opinions on that.
E
Can I ask just one or two questions, andrew? And completely ignore it if you want. But, like, I'm looking... like, you scroll down on this dashboard? Actually, I don't know about others, but, like, I spend most of the time looking at that and not going into detail there for every one of these.
A
So, yeah, yeah, exactly, because the service level stuff is what you get on this page, and, you know, that's like: is this affecting users? And, yeah, you're quite right, like, that's why we don't have, like, cpu and memory and stuff on these, because it's about, like, user experience rather than, you know, the machine. So that kind of sounds like you want to go with a separate page as well. I...
E
I mean, I'm just seeing how it looks to me; I'm not saying...
A
No, no, I think that's reasonable, I think that's reasonable. The other thing that I was wondering about was how would people feel about having the deployments have the same name for canary and main stage, because it would make my life so much easier. Obviously, they're in different namespaces.
A
But obviously, you know, we've got stuff like this over here... where is it... this, over here. This is actually wrong for canary, so I could figure that out. You know, in canary I think it's gitlab-cny... shell, and, you know, obviously the other one is not gitlab-main, so it's not like gitlab, dash, stage, dash, something; they're different things, and it makes it kind of difficult. And also, if they have the same name, those two deployments, it's easier to do...
A
You know, comparisons between them, you know, with computers. And so you can say, just like: look at all the things between the gitlab-cny namespace and the gitlab namespace, and kind of do like-for-like comparisons if you want, and use that for some sort of health checks. So I don't know if people feel super strongly about that, but it would make my life a whole lot easier if they had the same name.
A
Okay, so are those... I need to take a look at what those are, but have they got hard-coded into them, like, gitlab-cny? Like, are we doing alerting at...
A
Yes... okay, I'll review that, and if I can nudge it, then would you be okay with us kind of moving over to matching names, matching deployment names?
F
What we could do is go through and ensure we have fully functioning name overrides in place, so that you could customize them a little more. But as of right now, it's...
F
There would be a longer discussion, and that would definitely be a breaking change, which would be a major version of the chart. Yeah.
A
Yeah, like, what I would ultimately like... obviously I've got my head in the prometheus metrics for this. What I'd really ultimately like to end up with is that we're not using, like, regular expression matches to kind of select metrics anywhere. It's like, you know, the deployment is this, and, you know, we're not doing, like, "question mark cny" and all sorts of hacky things like that, because, you know, it just kind of helps build more structure around the metrics.
A
We could... so, I mean, that's a very good... we could label... So I actually brought this up with jarv, yeah. We could totally do that. So I asked for it; I asked for the... our type label, which is effectively a badly named service label, on the deployments, and, you know, that would be super helpful already.
A
We could just use our own label and then match on that, but we'd need to add it to the deployments. That might be a much better option.
A
So we could just use our own label for that. Maybe that's a better way to do it, and then we just leave the deployments as they are.
A
Yeah, I mean... okay, so here we've got... see, what I notice is we don't have, like... I would quite like type on here, specifically, you know, as its own label, so...
A
So, you know, it would be super helpful, for me at least, if we could have a... so this... there's what you were talking about. I mean, even that is kind of tricky, because it doesn't quite match. You know, we've got a specific label called stage, and it's main... main and cny, and then we need to map that, with, you know, not very intelligent tools like grafana and grafonnet, into main stages.
A
Presumably label release is gitlab, and then cny is gitlab-cny, so that, you know, it's very difficult to kind of map if we don't use exactly the same labels. And so, yeah, I think, you know, if we could get the type label on here, like a label_type, in the same way that we've got these ones, that would be super helpful as a starting point.
F
We do have a global deployment annotations... let me double check whether we have labels...
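If the chart's global settings do cover labels as well as annotations, attaching the taxonomy could look something like this sketch; the pod labels key is the part being double-checked, so treat it as an assumption:

```yaml
# values.yaml fragment (sketch): surface taxonomy labels on pods so they
# show up in kube_pod_labels. global.deployment.annotations is the knob
# mentioned above; the pod labels key is assumed and needs verifying.
global:
  deployment:
    annotations: {}
  pod:
    labels:
      type: git
      stage: main
      shard: default
```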
F
Right. So if this is indeed a production blocker, throw a production blocker label on it, and that will automatically give it severity 2 or severity 1, based on whether it's needed, like, now or next week. But those are the labels that you're going to want, to be able to get this set all the way to the top.
A
Yeah, like, my take is that when we've had, like, issues with anything that's running in kubernetes, people are like "it's kubernetes", and then they don't have, like, a clear thing that's like: look, everything in kubernetes is fine; this is a problem with the application. And I think, you know, before we carry on rolling it out too much, we need to have, like, all of those things in place, so...
E
Well, yeah: let's slap that label on, then. We would want to do things faster if possible.
D
I added a couple comments here, andrew. One is that I think this kind of belongs maybe in what you have as a service overview dashboard, which is the number of replicas and the number of nodes, because we've had incidents where, like, okay, saturation is increasing, but it was due to a scale-down event, which is something new in kubernetes that we don't have with vms.
D
Yeah, I mean, there isn't... the HPA is pretty dumb right now: it's just looking at average cpu utilization across all the pods, and it's hardcoded to... well, it depends on the service. But yeah, we would have to... but I think for now, I mean, just seeing the number of replicas and the number of nodes would be helpful. The number of nodes, I guess, would be at the service... and you...
D
I
don't
know
like
the
number
of
replicas
I
almost
kind
of
want
to
see
that
on
the
on
like
the
service
dashboard,
that's
that's
fine,
but
the
number
of
nodes
I
mean
that
maybe
is
maybe
on
a
different
dashboard
than
I
don't
know
when
you
were
talking
about
aggregating
the
metrics,
because
it's
so
no
noisy,
I
think
like
having
them
broken
down
by
zone,
is
going
to
be
extremely
helpful.
A
So, zone... so I've been using... the first version that I did was cluster, so cluster kind of... is it the same? It is, yeah, there's a one-to-one mapping between the two. Yeah, cool. Okay, so for each of those clusters... sorry, for each of those zones, we just want to show the median, the aggregate, the average...
D
So yeah, I think that would be a good start. And so you're not thinking there'll be a selector for zone at the top, but rather just the panels will have...
A
You know, the ones that ship with kubernetes, those grafana dashboards, and people can always go to those if they want, like, a seriously deep dive, you know, whatever those dashboards have on them. But, like... so it's never going to kind of... because the one thing about the way that I'm doing it is that, obviously, I have to build these recording rules for every metric that I use, to get the type labels on, and it's fairly expensive. So we don't want to have, like, everything in there.
A
Right
and
so
do
we
want
to
have
yeah.
I
was
going
to
say:
do
we
want
to
have
like
a
like
a
dash,
a
general
like
node
pool
health
dashboard
and
then
kind
of
from
the
service
overview?
You
can
say
like
show
me
like
the
node
pool
for
this
service.
Sorry,
it's
not
many
too
many.
It's
it's
one
service
is
in
a
node
pool,
but
it's
only
ever
in
one
node
pool
right.
A
It's not... that's not strictly true in every case at the moment, is it? Yeah, it's not strictly true in everything, yeah. So that's fine, as long as we can have, you know... so it won't be, like, directly attached; it'll more be, like, you know, "this is the node pool that this service is running in". And what about... will all the components for a service, like what we call git at the moment, which is gitlab shell and webservice...