From YouTube: 2021-06-16 GitLab.com k8s migration EMEA
B
Luckily, so far we haven't created any degradation, but I'm actually kind of at a loss as to how to proceed, because I feel like I've made improvements but it's harder to actually see them visually. So I'll first go through what I changed so far, and then I'll show some charts as well.
So, firstly, this was cluster B that we're modifying, and it was already configured slightly differently than the rest of our clusters. The number one item is our HPA tuning, which was at a value of 2600 versus the...

B
I'm not showing anything yet, sorry, I just wanted to give a quick highlight. We also have a worker count difference: in this case it's configured to six versus the standard four on the rest of our clusters. So the first change I did was to modify...
B
We changed the HPA average CPU value. We tuned it closer to the number of workers that are running, so instead of 2600 (or was it 2300? I believe it was 2600) we bumped it up to 3000, with the intention that we would end up removing some pods from rotation, which did occur. We also saw Ruby thread contention bump up a little bit, which was expected, but the Apdex didn't change.
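As a rough illustration of the kind of change being described (resource names, API version, and replica bounds here are hypothetical, not the actual GitLab.com configuration), an HPA targeting average CPU per pod might look like this:

```yaml
# Illustrative only: raising the HPA's average CPU target from 2600m to 3000m
# lets each pod absorb more CPU before the autoscaler adds replicas, so some
# pods drop out of rotation.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: webservice            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webservice          # hypothetical name
  minReplicas: 2              # example bounds
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: AverageValue
          averageValue: 3000m # was 2600m before this change
```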
B
We do have these awkward periodic drops that happen quite frequently, but we can see that cluster B (the red line; I should have put the legend here) is actually higher than the rest of the clusters, which is a good thing, so we're actually doing pretty well. Memory saturation was still high, though, and that's exactly what I addressed next: take our existing limit of seven gigabytes and bump it up to ten gigabytes, which should put us at around seventy percent usage.
B
In this case we went from around eighty-five percent and dropped all the way down to fifty-five percent RAM usage. So I thought that was great, we're making progress, but we haven't made any progress yet towards lowering the number of pods and nodes being utilized. Pods a little bit, but technically not enough.
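A minimal sketch of the memory limit bump, assuming a plain container resources block; the request value shown is a placeholder, and only the 7Gi to 10Gi limit change comes from the discussion:

```yaml
# Illustrative container resources: with usage that previously sat around 85%
# of a 7Gi limit, a 10Gi limit puts the same usage nearer 55-70% of the limit.
resources:
  requests:
    memory: 6Gi     # placeholder request value
  limits:
    memory: 10Gi    # was 7Gi before this change
```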
B
So the next thing I did was modify the HPA again. In this case I'm trying to squeeze a bit more workload onto the existing number of pods, so we changed the target average utilization from 3000 and bumped it up to 4000. This is currently equal to the amount of CPU that we request, which is still lower than the number of workers that we run. And with that, I noticed the CPU saturation for the nodes that were running more than, say, one pod...
B
So a lot of these nodes that end up running one pod will eventually get cleaned up, but it appears the cluster autoscaler is kind of sluggish at that point, so I'm kind of disappointed by that. The one thing I did want to highlight is that the Ruby thread contention jumped right back up to just shy of 80 percent, so I don't really want to modify that any further. But our pod counts are a lot lower.
B
So here, excuse me, pod counts: yeah, we were running around 37 pods and we're down to around 30 as of right now. But our nodes jumped when we had a deploy, and they've just been slowly trickling down since. The time between these two events is about 15 minutes, so I don't think we'll actually see the usefulness of this experiment, because it takes so long for the nodes to spool down. In an effort to see...
B
...if I can jump-start that process: I was watching our memory saturation and it had jumped up to 80 percent again, so I thought, let's try to get that down again. Because that's going to cause a rotation of all the pods due to the configuration change, maybe that'll re-compact things; that way we potentially end up with no pods running on some of our nodes, and then maybe the cluster autoscaler will get rid of them pretty quickly. That was the last thing I did, about 20 minutes ago, and that's where I'm at now.
C
I don't like this, really. Having the HPA target and the requests at the same value means that we will constantly have pods going over and below it, and especially going over isn't really nice. On one hand, because I plan to have saturation metrics on this, every time we're oversaturated it doesn't look nice in the graphs, right? And that value is an average.
C
It's not a maximum, right? So I think we should at least have the trigger slightly higher than the average, so that we fit most of it below it. Maybe, I think that would be better. I mean, if we sometimes have pods going over, we can't do much about it, I think, because there's always a lot of variance in CPU usage. But I think we should not have it equal. And to the other point, about forcing nodes to go down:
C
I think the way we can move forward here is to see if we can fit pods onto our nodes as well as we can, right, leaving just a little bit of allocatable resources left over. If we fit, let's say, two or three pods on each node, then I think more efficiency can only be gained by maybe having bigger pods that exactly match the nodes. Maybe.
C
But with the 4000m requested CPU, I think we can fit three pods on each node, right?
B
That's a perfectly legitimate item that I was considering, because right now we're not seeing any more than two pods per node anyway. At 4800m requests, which puts the HPA target average value right around 80 percent of the request, that would give us 2.6 pods per node, which, you know, we're never going to see. So I think that would be a fine change that I might do next.
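Sketching the proposed follow-up (only the quoted 4800m and the 4000m HPA target come from the meeting), the idea is to raise the CPU request so the HPA's average target sits comfortably below it:

```yaml
# Illustrative: with a 4800m CPU request and a 4000m HPA average target,
# the target is roughly 80-85% of the request, so pods scale out before
# their average usage reaches what they have reserved on the node.
resources:
  requests:
    cpu: 4800m      # proposed bump, up from 4000m
```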
C
Yeah, but if you say we don't see more than two pods on our nodes anyway, maybe that's just how Kubernetes behaves: when we scale up we fill nodes with pods, and then later traffic drops and we remove pods from nodes, but we don't remove nodes very fast, right? It takes a long time, so we'll run for a long time with nodes that aren't really filled up. And if...
D
I think they're also considering the fact that spinning up a node is an expensive operation, because you cannot utilize it until it's completely provisioned. So maybe it's a kind of debouncing of spikes: you may have a spike of traffic, you increase the number of machines, then it goes down, maybe only because you're not able to serve the load, and then it goes up again, and you've just destroyed the machine and need to recreate it from scratch.
B
Yeah, so the 15 minutes that I see is just from our metrics. I'd have to go and look at our logs, which I haven't done yet, to see if maybe it's evaluating more often and I just haven't seen the results of those changes. I'll be looking into that during the rest of this experiment as well. So yeah, that's where I stand currently with this interesting experiment. I'm just glad I got the change request approved, because it's not one that's exactly standard policy, so I'm happy about that.
D
Can I make a very naive observation based on my understanding of what's happening? It sounds to me that, basically, we have these boxes and we want to fit footballs inside them, and there's a lot of empty, unused space, and the problem is that we have just one size of ball instead of having...
D
So if you look at all the marketing around Docker and Kubernetes, it's very much designed for having different sizes and types of workloads, so that you can just put everything together and they kind of naturally squeeze together, filling the empty spots, right? So you get better optimization. But we have just one size, because it's all just one single application, so all the pods have more or less the same amount of resources, same size, same CPU request, and so kind of...
B
Everything is filled with air, so they're compressible, but in terms of being able to squeeze another football in, you can't, because you just can't shove another item into that box; you know it's full. The limits are where we would end up seeing the compression of those footballs.
B
I just need to go back through the history to figure out why we switched to C2s, because I know that was a project in and of itself.
A
There was something we previously migrated that was also really under-provisioned, and we had trouble tuning that down. I'm wondering what we ended up with on that one, like how close we managed to get it in the end, because it wouldn't have been like-for-like, right, because we knew we were under-provisioned.
B
Henry is doing the tuning currently, so I'm not actually sure. But also keep in mind that when we first migrated all these services over, we didn't really spend a lot of time afterwards; this is the first migration where we're spending a lot of time afterwards tuning these values after the fact. We haven't done this for any other workload, and that's what Henry is currently doing.
C
I can talk about what I found, at least for the tuning, not really demoing much there. The basic thing is that I was looking into observability and which metrics are missing in Kubernetes, and I noticed that for a lot of the saturation metrics we had before in the VM fleet we now need to do things differently. We can measure saturation against set limits in Kubernetes, like CPU limits and memory limits, which is what we've done until now.
C
But for certain deployments, like api for instance, we don't have CPU limits set, so we can't measure anything there. The other option is measuring saturation against the requests we use to reserve space on the nodes, which is also what determines how Kubernetes schedules pods and tries to find a place for them. And by building a saturation metric based on CPU requests, I found that for a lot of our deployments...
C
...we are totally oversaturated, and that's especially true for Sidekiq and GitLab Shell, and also a little bit for registry. So we need to adjust there, and I guess it's a lot of work, because for each single shard we need to adjust the settings, then see how it reacts, and play with that. I mean, it's all working fine right now; we didn't run into any big problems.
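A rough sketch of what a request-based CPU saturation recording rule could look like, assuming cAdvisor and kube-state-metrics are scraped; the rule name is made up and the exact series names vary by kube-state-metrics version, so treat this as the shape of the query rather than the real metrics-catalog definition:

```yaml
# Illustrative Prometheus recording rule: CPU usage as a fraction of the CPU
# *requests* (not limits), per pod, so deployments without CPU limits can
# still get a saturation signal.
groups:
  - name: request-based-saturation
    rules:
      - record: pod:cpu_request_saturation:ratio
        expr: |
          sum by (namespace, pod) (
            rate(container_cpu_usage_seconds_total{container!=""}[5m])
          )
          /
          sum by (namespace, pod) (
            kube_pod_container_resource_requests{resource="cpu"}
          )
```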
C
So by adjusting those, we will maybe see more usage of the nodes for those deployments, but we should be safe from running into a situation where we're out of resources just because we planned wrong up front and set the wrong requests in Kubernetes.
C
That's what I'm working on currently. Well, I'm not working on it directly, I'm creating issues for it. I also already made two MRs for registry and GitLab Shell, but for Sidekiq that would be, I don't know, ten MRs to work on.
C
So that's just an issue I created with some suggestions on how to work on it. They're all put into this epic I created for this, which is linked here. And this is currently blocking my saturation metrics work a little bit, because if we don't fix this first, then the saturation metric, as soon as we enable it, will cause alerts, and we would need to silence all of those and fix it later. So yeah, we need to see how we handle this.
C
If you find time to fix this, that would be great, as there's still enough work on other observability issues for Kubernetes; I can work on those and this can wait a little bit.
A
Does
this
overlap
with
the
taints
work,
the
other
epic?
We
have.
C
I don't think so. Having better observability on where we're saturated would certainly help, I guess, but it shouldn't have a lot to do with tainting. That's just about determining which nodes certain pods run on, right, which we don't do for some of the kinds of pods we have, like most of the logging and monitoring things, I think.
B
The overlap comes in how we schedule our workloads. For the most part we've been trying to segregate our workloads onto their own node pools, and we've got some workloads that shouldn't be running where they do; primarily, the monitoring stack is running on certain node pools where we don't want it. I'm pretty sure that's the overall goal of that epic, but I'd have to re-review it.
C
I mean, one thing that would be helpful for that is having a saturation metric for how many requests we are already using on a node, right? I'm not sure if we have those anywhere; I wanted to look into that next. If you can see which node already has how many requests allocated, then it's much easier to see whether we could fit in more pods, or where we're using nodes inefficiently. I think that would be a helpful metric, also for looking into how we taint things, but they're not closely related.
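In the same spirit, a sketch of the per-node allocation metric being described, again with illustrative rule and series names that depend on the kube-state-metrics version in use:

```yaml
# Illustrative rule: how much of each node's allocatable CPU is already
# promised away as pod requests. High values mean no room to schedule more
# pods; persistently low values across many nodes point at poor bin-packing.
groups:
  - name: node-request-allocation
    rules:
      - record: node:cpu_requests_allocated:ratio
        expr: |
          sum by (node) (kube_pod_container_resource_requests{resource="cpu"})
          /
          sum by (node) (kube_node_status_allocatable{resource="cpu"})
```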
A
And right now, are you making these changes to registry and GitLab Shell and Sidekiq? Is that... or are you talking...
C
About the saturation: I created those MRs because they were easy and fast, but I didn't execute them, because skymac was working on Kubernetes right now. They can be executed any time we feel it fits.
C
No, because for GitLab Shell and registry that can be done, it's not too much work, but for Sidekiq it's a lot of work, because for each single shard you need to find the right values, and most of them are tuned the wrong way for memory and CPU. It's just work to be done, but it would take a while to get it all finished.
B
If we could configure the alert disablement inside of our configuration, I think that would be a wise choice, just until we get the work completed. It's not worth our time to create an alert if it's not going to be useful, and it's just more work for the SRE team to create the silence and re-establish that silence until we complete that work.
C
Look at them, and if you can, disable the alerting just for the three deployments, registry, GitLab Shell, and Sidekiq; then we can still see how we're doing on api and the other services, right? That was the idea.
A
Okay, yeah, let's make sure something's moving forward: either we push on and get the saturation metrics so they can be used, or, if we aren't in a place where we can do that, let's be clear about it, park this stuff, and move on to some of the other observability work. Graham is making good progress on the web migration, so let's focus on doing what we need to improve observability for the web migration, so that it goes a bit easier.
A
Cool, okay. Sorry, Scott, I also skipped over your point. So let's talk about your theory on the Apdex drops.
B
I was pairing with Henry earlier this morning and we watched the production deploy go out, and I've got a theory. We're using the default rolling strategy for deployment, and the default strategy gives you a maxSurge of 25 percent. What that means is that if you have 100 pods, it'll add at most 25 pods during the replacement period. We also have maxUnavailable set to 25 percent.
B
That way we could keep the number of pods taking traffic as close as possible to the same before and after a deploy, but, last time I checked, this requires a Helm chart update to add that capability.
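For reference, these are the rolling-update knobs being discussed; the first block is the Kubernetes default, and the commented override is one possible direction (surge only, never dip below the desired count), not necessarily what the chart change would expose:

```yaml
# Default RollingUpdate behaviour for a Deployment: up to 25% extra pods may
# be created and up to 25% may be unavailable at once, which can briefly
# shrink the set of pods taking traffic during a deploy.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%
    maxUnavailable: 25%

# One possible override (illustrative): always surge first and keep the full
# replica count serving traffic throughout the rollout.
#   rollingUpdate:
#     maxSurge: 25%
#     maxUnavailable: 0
```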
B
This is just a theory at the moment, obviously. I think the only way to really prove it is to dig into our logs, which will be a little time-consuming, because we have to cross-check the Kubernetes events against the events of when a pod stops and starts taking traffic throughout a deployment cycle, which is not an easy task. I plan on creating an issue for this; I just haven't yet because I've been working on this change request.
B
Oh, this is one of those things where I think anyone could benefit, so if we put it in the Helm chart, leave it there. This should not be a difficult addition to our Helm chart; we would follow the same commonality.
A
Cool, yeah, it sounds like a really good one to test out. That sounds like it'd be a really good thing to try.
C
That's good. I mean, that already becomes an issue if we tune api first in canary, right; usually it then shows very big Apdex drops, whereas in production we don't see it. So we can't use the same values for tuning, or we need to choose different values for the maximum surge in canary.
B
The only last thing I wanted to comment on, and this will probably impact Henry the most, is that yesterday we introduced a new shard. It's simply called imports, and the only worker on it is the repository import worker.
B
The reason for this new shard is that it's been identified that we sometimes kill a pod while an import is happening. For imports that take a lengthy period of time, the import might get killed, get picked up by a new Sidekiq worker, and then some other scaling event occurs and kills that pod too, so the job ends up having to be requeued again and again and again.
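Very roughly, a dedicated shard like this is just an extra Sidekiq pod definition with its own queue selection. The fragment below follows the general shape of the GitLab Helm chart's Sidekiq values, but the shard list and the queue name are placeholders, not the real worker-to-queue mapping:

```yaml
# Illustrative values-file fragment: an `imports` shard that only processes
# the repository import worker's queue, so long-running imports are no longer
# churned together with the busier general-purpose shards.
gitlab:
  sidekiq:
    pods:
      - name: catchall               # existing general-purpose shard (example)
      - name: imports
        queues: repository_import    # placeholder queue name
```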
C
But during deployments it's the same with the pods being replaced, right? Oh...
A
Nice, thanks for dealing with that one, Scott. Do you know, sort of, how long do you think it'll be before we know if we've got enough resources for that, or whether we'll have to...
B
We're going to try to complete that evaluation today. It's not entirely easy, just due to the nature of the way that work works, and release management is kind of busy these days. But aside from that, I'm hoping to complete it today.
B
No, this one's going pretty smoothly; it's just a matter of getting the work done.
A
Cool, okay, sounds good. And Henry, I should ask: do you need any help with any of the stuff you're working on?
A
Cool, sounds good, great. Was there anything else anyone wanted to discuss or demo?
A
Nope? Okay, awesome. Just a quick update: Graham is working on putting together a setup for nginx that will allow api and web to both run. He's also going to check that the setup he comes up with will support Pages as well, so all of the things will just plug into it.
A
He's going to work on that over the next few days, and then hopefully right after 14.0 he'll have something he can put on pre, and we can actually see what it looks like, and he'll do a side-by-side comparison with the existing setup. So hopefully the web stuff will start to become visible in the next week or so.
A
Super. All right, thank you very much, everyone; it was really great to see everything. I hope you all have a good rest of your day. Take care.