From YouTube: 2020-07-31 GitLab.com K8s migration EMEA
B: Yeah, sure thing. So, no update on enabling live traces.
B: Just checking to see — yeah, nothing new here. Support for the dependency proxy: I think this is done. I think this was...
B: You have an MR for it — okay. NFS dependency on Pages: Jakub is working on this, so hopefully that will happen soon. And then I don't think there's been any significant update for mapping services to cloud native, but we're not completely blocked — we're just going to be blocked very soon, as soon as we finish the Git HTTPS migration. And "state-of-the-art logging", which is a funny title, but that's just what it happened to be called when it was submitted. This is — it sounds like we...
B: We may use the sidecar approach, and I think I owe Mare an issue in the GitLab tracker for trying to see if we can fix this at the application level by annotating logs with the source. I think this is going to be problematic, though, just because we also have the unstructured logs, like production.log and application.log, all that stuff. I'm not sure there's much we can do within the app. But — any comments on those blockers?
B: My plan for next week is to have canary running by the end of next week — to have canary running in Kubernetes, as well as some of the production traffic. What I'll do is use the Kubernetes cluster as, like, a single server in the list of servers, so that we take maybe a single-digit percentage of traffic to the cluster. So we'll start with canary and then move to that; we'll probably let it sit like that for a while, and then maybe the following week we'll do the full migration and finish staging. So far, I mean, there weren't really any issues.
B: I am a little bit concerned about log volume, because a bit different from what we have now is that we get all of the logs going to the Rails index — all of the logs.
B: Well, just the Workhorse logs for the Workhorse index, but for the Rails index we're getting, like, production.log, application.log — all of these are going to it, yeah. I guess there isn't a whole lot to see.
B: Maybe this — well, so maybe this sidecar should be a blocker for us going to production, if we're worried about it. I think we're going to increase the volume by quite a bit and also have a lot of junk in the Rails index that you don't want to look at — yeah, and there's no point in sending it.
E: Yeah, I think we talked about it a bit yesterday. I just don't remember what I said, because it was yesterday. So I mean, I think it should be a blocker, and actually I think I said that in the end, after you explained how it is — or not, right? I don't know.
B: Yes, yes — yeah, I guess. Okay, so we're going to make that a blocker, which means we may be delayed on getting production migrated because of it. But I'll update that issue to let the Distribution team know that it's a blocker, and we'll probably have to tackle it next week.
B: Well, we're not running blind, but we're kind of overdosing on stimulus, you know — it's too much, because we're getting all these unstructured logs as well. It's just — it's really good, yeah.
D: I mean, if we could just turn off the — like, if it was urgent that we got this into production and we wanted to avoid the blocker, could we not just turn off the production Rails log, and then everything else would kind of be okay? But then — it's only that one, and no one uses that, so, like, we don't want it.
B: Okay, let's — yeah, let's table it and we can discuss, but obviously it's a high-priority thing if it's going to block us.
B: This is the dashboard — the pod info dashboard — which is just a copy of the existing dashboard we have for Sidekiq and Mailroom. I guess the thing to note here is that for staging I'm using the default requests and limits, so we have a limit of one and a half CPU and...
B: So you can see that we're spiking up over a core on some of these, so I'm not sure — maybe we're going to have to adjust the limits a bit. For memory, it looks like we're also kind of spiking up, going up to three gigabytes — let me just check, it's set to two. So I think we need to do some investigation here to see whether maybe our limits are too low.
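For illustration only — these are not the GitLab chart's actual values — here is roughly what the default requests and limits being discussed could look like, expressed with the Kubernetes Go API types. Only the 1.5 CPU limit and the roughly 3 GB memory spikes come from the discussion above; the request figures are hypothetical placeholders.

```go
// Minimal sketch of pod resource requests/limits, assuming values like the
// ones mentioned above; not the actual gitlab-org/charts defaults.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	res := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("1"),   // hypothetical request
			corev1.ResourceMemory: resource.MustParse("2Gi"), // hypothetical request ("set to two")
		},
		Limits: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("1500m"), // the 1.5-core limit mentioned above
			corev1.ResourceMemory: resource.MustParse("3Gi"),   // pods were seen spiking toward ~3 GB
		},
	}
	fmt.Printf("limits: cpu=%s memory=%s\n", res.Limits.Cpu(), res.Limits.Memory())
}
```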
B: So, next: there's a new issue for kind of summarizing where we are in the migration after a year, and one thing I put together today is this dashboard called the Kubernetes migration overview/efficiency dashboard.
B: What I did is I totaled up the number of cores and the amount of memory that we were running on VMs prior to the migration. This has remained fairly stable — I don't think we've added... well, I'll have to double-check, but since we started the migration I don't think we've added many nodes in the web or API or git fleet.
B: So we had about 1500 cores when we were running on VMs, and about 5000 gigabytes of memory. So I have two panels here. This is actually a little bit negative — I'm not sure why this single stat... I'll have to figure out — or no, it's actually not negative, never mind. It's very...
B: So maybe that's why it's showing zero, but it's kind of funny if we add up the numbers. So I have the total number of cores for virtual machines right now, and the total number of cores in the Kubernetes cluster right now, and if you add these up, as well as the amount of memory, we're sitting at approximately the same amount we were running under VMs — which kind of makes sense; we're about even. So what I'm hoping is that right now we're a bit over-provisioned in the Kubernetes cluster.
B: So what we can do is hopefully drive this memory savings and CPU savings up a little bit, so that the total amount of cores and the total amount of memory is less than we were running prior to the migration.
B: I'm not done with this dashboard yet; I plan to add some more things. What I want to do is break it out by service and show the total number of cores and memory and utilization per service, as well as maybe showing both node autoscaling and HPA scaling, to see how it's working in Kubernetes. Is there anything anyone would like to see here, or — first of all, I guess...
E: It could be just my day-to-day and the warmth here, but I'm really having a bit of trouble understanding the numbers I'm seeing. Sure — is this the current state, as in there is no historical comparison, right? Like, we're not going to compare, say, registry from August 2019 with registry from August 2020 — this is just what's happening right now. Is that correct?
B: These are the number of cores we were running without Kubernetes, and this is the amount of memory we were running without Kubernetes. What it's taking is the total number of cores between VMs and Kubernetes and dividing it by the number of cores we were running prior to the migration. So as we shift cores over from the VMs to Kubernetes, this number might remain the same, but hopefully we're utilizing cores more efficiently in Kubernetes, and that number — the savings — will increase.
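As a quick worked example of that single stat (assuming it really is just the current VM-plus-Kubernetes core count divided into the pre-migration baseline): with the 1514-core baseline mentioned later in the meeting and a current total of roughly 1500 cores, the savings come out to about 1%, i.e. roughly even. The VM/Kubernetes split below is a hypothetical placeholder; only the total was stated.

```go
// Illustrative calculation of the "savings" stat described above.
package main

import "fmt"

func main() {
	coresBeforeMigration := 1514.0           // VM fleet prior to the migration
	vmCoresNow, k8sCoresNow := 1100.0, 400.0 // hypothetical split summing to ~1500
	savings := 1 - (vmCoresNow+k8sCoresNow)/coresBeforeMigration
	fmt.Printf("core savings vs. pre-migration baseline: %.1f%%\n", savings*100) // ~0.9%, i.e. roughly even
}
```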
E: Yeah, I guess what I'm missing here is — well, first of all, the per-service breakdown you said you're working on: that's going to be a good thing, because right now we're mangling too many things at the same time, so it's hard for me to understand what's happening specifically for, say, registry. What would actually also be interesting to know is whether there is a way for us to say what number of requests we were serving at a certain point.
E: So, for example, if we were serving a thousand requests in August and we are serving 15,000 requests now, are we doing better or worse?
B: ...and memory — you want to know how many requests we were serving, because obviously it could be that we're using more cores and memory just because the number of requests has increased.
D: It's very difficult over time to — I understand, and I think the problem is that the workload changes. So if the workload was fixed, then it would be very easy to do that calculation, but obviously, you know, endpoints get more efficient and endpoints get less efficient, and so you're trying to juggle lots of different variables.
D: Like, if you could get it over a longer period — if you took a month-long period, not a short period like an hour or a minute, and you said, right, in this month the growth was three percent and we used 20 percent less cores, or whatever, then that would be valid. But I think if you were looking over a short period, it would be...
B: Yeah, cool. But yeah, I mean, I was kind of surprised to see that we're about even right now — but I guess not too surprised, like...
B: Yeah, so first of all, this includes everything except Pages, Patroni, Redis, Gitaly and Praefect — so, all the stuff that we're going to migrate over. When I say "even", what I'm saying is that before, when we were on VMs, we had 1514 cores. If you sum up the total number of cores between the virtual machines and the Kubernetes cluster now, we're at about 1500.
B: Well, keep in mind that git — well, git, web and API — are the services that we are scaling the most, and these are snapshots of today, not of a year ago. Registry is a snapshot of a year ago or so, but we typically don't have scaling issues for registry — at least on VMs we didn't; we were running...
B: Yeah, so I think it's difficult, but I like just being able to see the number — keeping a tally of the number of cores and the amount of memory we have in the cluster and making sure that's under control, because...
E: ...and improve it the way you want, but just do it, like, you know, in a Google Sheet: this is the amount of cores we had before the migration; this is the amount of requests we had; this is the amount of cores we have right now; this is the amount of requests we have right now. It can be as simple as that, and then we can just plot that — very low-level.
A: Cool. Will this end up being a way of us actually seeing some of the progress of the migration as well? I'm guessing we'll see, like, the number of cores coming down on the VMs, yeah. I guess it's not a hard progress tracker, but it's a little bit of a visual of progress, right?
D: Okay, cool. I don't know how deep to go into this, but I can give you a quick demo of what I did. I think it was last week or the week before that we were talking about how to figure out which Sidekiq jobs are talking to NFS, and the two approaches. Maybe I should close the door — give me one second.
D: The kids decided to start playing outside my room as I start talking. So basically, the two approaches are: one, we send a message to Sentry every time an access happens, or something like that — which is the approach that we used on Gitaly when we were doing a similar sort of migration to move away from NFS, near the end of the Gitaly project, and that was pretty successful.
D: But this one was a bit more of a risk. The idea was we could use kernel instrumentation to trap the NFS calls and then figure out what was going on at that time — figure out what's making NFS calls — and so it's a little bit more risky. So I thought I'd give it a try and just spike it, and if that didn't work, then we could go the other route, which is sort of less risky.
D: Also, if you have any questions, let me know. So basically, the way it works is: I used a library called gobpf. BPF is a set of kernel tools where you can write C code — good old-fashioned C — and then that C gets compiled and injected into the kernel. The reason this is phenomenal and incredible and amazing is that in the past, if you wanted to run things in the kernel, you'd have to be a hardcore kernel developer and spend years writing device drivers, and even longer before you put anything into a production environment. The thing with BPF is that it's got a thing called the verifier, which takes a look at your code and guarantees that it's safe, right?
D: So it is impossible to write BPF code and inject it into the kernel that is not safe, and so it's got a whole bunch of limitations — like, you can't have a for loop, which is pretty standard; the reason you can't have a for loop is that then it doesn't know the program will ever end, so it could just go around and run forever. And because you can't have a for loop, comparing strings becomes quite difficult, because if you wanted to compare two strings you'd have while loops and iterate through them. So there's a lot of stuff...
D: ...that's kind of weird, but it's actually also surprisingly powerful, and people are doing incredible stuff with it. But this is a very, very simple program, and all it does is — it's got these entry points, and when something happens, a little bit of code gets called. In this case, what's happening is: whenever an NFS operation — a write operation, a read operation or a file open operation, and I can't remember if I did getattr — actually yeah, and get-attributes...
D: ...if any of those calls get called in the kernel, this code gets executed, and all it does is get the process ID of the process that made the call originally, stick it into a table, and push that table up to userland. And then in userland there's a Go program that I wrote that basically reads all of those events — all the processes that are doing NFS access — and then does something with it.
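A heavily simplified sketch of that kernel-side piece — not the actual program from the demo — assuming the iovisor gobpf BCC bindings; the probed NFS functions and the map layout are illustrative only.

```go
// Sketch: count NFS operations per PID with a small BPF program injected via BCC.
package main

import (
	"encoding/binary"
	"fmt"
	"time"

	bpf "github.com/iovisor/gobpf/bcc"
)

// Embedded BPF C: a hash map keyed by PID, incremented whenever one of the
// probed NFS functions is entered.
const source = `
#include <uapi/linux/ptrace.h>

BPF_HASH(nfs_calls_by_pid, u32, u64);

int trace_nfs(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 zero = 0, *count;
    count = nfs_calls_by_pid.lookup_or_init(&pid, &zero);
    (*count)++;
    return 0;
}
`

func main() {
	m := bpf.NewModule(source, []string{})
	defer m.Close()

	probe, err := m.LoadKprobe("trace_nfs")
	if err != nil {
		panic(err)
	}
	// Attach the same handler to the NFS entry points of interest
	// (write, read, open, getattr) — illustrative function names.
	for _, fn := range []string{"nfs_file_write", "nfs_file_read", "nfs_file_open", "nfs_getattr"} {
		if err := m.AttachKprobe(fn, probe, -1); err != nil {
			panic(err)
		}
	}

	// Periodically pull the PID -> call-count table up to userland.
	table := bpf.NewTable(m.TableId("nfs_calls_by_pid"), m)
	for range time.Tick(10 * time.Second) {
		for it := table.Iter(); it.Next(); {
			pid := binary.LittleEndian.Uint32(it.Key())
			calls := binary.LittleEndian.Uint64(it.Leaf())
			fmt.Printf("pid=%d nfs_calls=%d\n", pid, calls)
		}
	}
}
```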
D: So in this case, what I did was: when the program runs, it monitors the Sidekiq log — you know, the good old-fashioned JSON-formatted Sidekiq log that we've got — and it keeps track of all of these "start job" events.
D: So when a job starts, an event gets emitted with the process ID of the job and, obviously, the name — the class — of the job that was running. And so what I do is I tail that, and it says, you know — sorry, let's say a WebHookWorker started and it was on PID 53.
D: So that's running along on the one side, and on the other side I've installed the BPF program into the kernel, and it's telling me process ID 5 did NFS, process ID 10 did NFS. And then, if it matches up — if it says process ID 53, say, did an NFS write — then process 53 is a WebHookWorker, and so using those two things I can say: well, it's probable that a WebHookWorker was doing NFS work.
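A rough sketch of that userland matching, under the assumption that the Sidekiq JSON log exposes job_status, class and pid fields (the exact GitLab log schema isn't shown in the meeting):

```go
// Sketch: map Sidekiq "start" events to PIDs so BPF-reported NFS accesses can
// be attributed to a worker class.
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

type sidekiqLogLine struct {
	JobStatus string `json:"job_status"`
	Class     string `json:"class"`
	PID       int    `json:"pid"`
}

func main() {
	// pidToClass records the most recent worker class seen on each PID.
	// With a Sidekiq concurrency of one, a PID runs a single job at a time.
	pidToClass := map[int]string{}

	f, err := os.Open("/var/log/gitlab/sidekiq/current") // assumed log path
	if err != nil {
		panic(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f) // a real implementation would tail the file
	for scanner.Scan() {
		var line sidekiqLogLine
		if err := json.Unmarshal(scanner.Bytes(), &line); err != nil {
			continue // skip unstructured lines
		}
		if line.JobStatus == "start" {
			pidToClass[line.PID] = line.Class
		}
	}

	// Each PID reported by the BPF table is then looked up here, e.g.:
	if class, ok := pidToClass[53]; ok {
		fmt.Printf("NFS access on pid 53 probably belongs to %s\n", class)
	}
}
```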
D: The only sort of requirement is that we don't have more than a concurrency of one in Sidekiq, and the reason is that if you have 10 different things running inside a single Sidekiq process, we don't know which one of those 10 things did the NFS, because we can't go to the thread level.
D: We can only go to the process level — although, now that I'm thinking about it, I think we could do that if we just add a small thing to our logs, but let's see how well it goes. So basically what it does is: it's tailing the log, picking up all the start events, and then it's tailing all the NFS — well, monitoring all NFS activity — and then what it does with that is: every time a job starts...
D: ...it increments one Prometheus counter, and every time an NFS access happens for that job, it increments another one. So theoretically, what you could do is run it for, you know, 12 hours, and at the end of it you could look at the percentages and anything that's above a few percent. I suspect there will be a few things where the timing between what we get from BPF and what we get from the logs isn't synced...
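A minimal sketch of those two counters using the Prometheus Go client; the metric names are made up for illustration, and the listen port is the one mentioned later in the demo.

```go
// Sketch: two counters (jobs started, NFS accesses) labelled by worker class,
// exposed on a /metrics endpoint.
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	jobsStarted = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "sidekiq_nfs_monitor_jobs_started_total", Help: "Sidekiq jobs started, by worker class."},
		[]string{"class"},
	)
	nfsAccesses = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "sidekiq_nfs_monitor_nfs_accesses_total", Help: "NFS accesses attributed to a worker class."},
		[]string{"class"},
	)
)

func main() {
	prometheus.MustRegister(jobsStarted, nfsAccesses)

	// Called from the log tailer for every "start" event.
	jobsStarted.WithLabelValues("WebHookWorker").Inc()
	// Called whenever a BPF PID event matches a running job.
	nfsAccesses.WithLabelValues("WebHookWorker").Inc()

	// After letting it run for a while (say 12 hours), the per-class ratio
	// nfs_accesses / jobs_started gives the percentages discussed above.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":10282", nil)
}
```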
D: That was probably something else, so it's kind of a sketchy idea, but it didn't take very long to write this. It basically tails the log, parses it as JSON, looks to see if it's a start-job event, associates that job with the process that's running it, and then polls NFS — sorry, reads the BPF data from the kernel. And it actually works pretty well. So — am I showing my whole screen? I think I am, yeah.
D: So Skarbek has set up a machine that's running Sidekiq with concurrency one — but, Skarbek, it doesn't look like it's actually running any jobs.
D: So basically, most of the time for actually putting this together was spent getting a Linux environment that matches our Linux environment, with the same version of BCC — we're actually running a very old version of BCC — so that was the most difficult part, getting that all working. Basically, I set up a Vagrant box — good old old-school Vagrant — got it onto the same version, and then I could compile it on there, copy it around, and run it.
D: So basically — let me just remind myself which port — when this starts up, it will listen on 10282 and it will have metrics there.
D: So if I go like that, it should fail — and the reason it fails is that it's tried to inject BPF code into the kernel and the kernel's not happy about it, because it's not root. So either you can give the program privileges or you can run it as root, but it's pretty safe. And then, if we start letting that run... basically, now — what was the name of the... was it...
D: If I do curl — sorry about the noise, I don't know if you can hear the kids having a nice fight next door. So basically, what we've got here are just the Sidekiq Prometheus metrics. Let's open this up a bit, and you can see the Sidekiq NFS monitor "jobs started" metric: it's got the name of the class and the number of jobs of that class that have started, so you can see it's picked up six of those.
D: You know, however many of those — and then down here it's picked up the number of NFS monitor accesses. Now, the problem is that on this catchall fleet it's actually going to be slightly wrong, because it doesn't know how to attribute them. Basically, at the moment it's made an assumption that there's a concurrency of one, and there's not — the concurrency is eight or whatever — so it's pretty much wrong and you can't rely on this. But once we have the concurrency-of-one node working, we can...
D: ...we can test this properly, and it should work and the numbers should be correct. But you can see it's working — it's picking up and associating the NFS accesses with the different Sidekiq jobs — so it seems to work. I'll just kill that off. It uses about 10% of one core in terms of CPU, so it's not free, but that machine that's got, you know, concurrency one would probably have a bit of extra...
D: ...server capacity, I mean. The other thing is, if we know there are jobs that absolutely don't do NFS access, it'll kind of validate the script, because hopefully what we'll see is that no NFS accesses are being attributed to those jobs. And if we do get, like, high NFS access for them, then maybe this isn't a great approach — but at least it'll help with that.
D: Yeah, I mean, however you want to — obviously, because we have lower concurrency, the way this works is statistics, and so what you want is a high probability that when these jobs run they are doing NFS, and in order to get that we need to kind of focus and run as many of those jobs as possible.
D: So obviously, if we are spreading out and running a lot of jobs on that fleet that we know don't do any NFS, we're kind of taking away sample size from the ones that we do want to run there. So I think initially it's good to test it with that, but then eventually just leave it on the unknowns, yeah. Obviously the known yeses and the known nos — sorry, the known NFS accesses and the known non-NFS accesses — we can kind of ignore, and then it's just whatever's left.
D: Yeah, we could do that. The only other thing is that it's running as root, so I don't know if we want to have it running for too long. The other thing we could do is just give it the permission that it needs, which is, I don't know, some BPF thingamabob.
D: You know, we could give it that one permission — and I presume that's what we did, because we've got an eBPF exporter running on all the nodes already, and presumably that's not running as root. So we're already doing that on our fleet, and that might be a safe way of doing it.
D: Like I think I mentioned before, I would trust the results after there have been, like, a thousand invocations of a job, right? And if we're struggling to get that many, then maybe we need more nodes. But, like you just saw, I ran that thing now and there were a lot of jobs, right? Yeah, I mean, it was a lot higher than I actually expected, so hopefully we'll get something like that on that new node.
C: ...when we actually start the migration to Kubernetes, we could start looking at Sentry for those errors, and we could also look at our existing dashboards to determine — well, you built it into the script, but we still have our existing dashboards as well for determining how often those jobs were called. So we have a few things we could look at — that's what I'm trying to say.
A: Cool. On your final one, Andrew — I mean, it's a great question. I don't know if it's directly related to the Kubernetes migration, but it does raise a good question of whether we should be discussing those sorts of things, the deployment stuff, as well. But I'm guessing it's up to you, Skarbek — are you keen to see another...
D: I was thinking Alessia was going to be here so that we could kind of run through it — he's kind of the main target. But if you're interested: we had a really good call this morning to figure out what we're going to do there, and basically what it is, is we're going to have a separate set of thresholds for monitoring for release purposes, and they're going to be much stricter than...
D: Basically, what happened was we decided to use the monitoring thresholds that we use already, and then this morning there was a bit of a hiccup, but it wasn't enough to wake someone up. So the thinking was: we've kind of optimized the alerting around not waking people up, but for the thresholds that we want for starting a deploy...
D: ...it'd actually be ideal if they were decoupled from that, so we could move them up or down, and if we make them tighter we don't have to worry about whether people are going to get paged on the weekend. So we've basically split it into two different sets of thresholds: same metrics, but with two thresholds.