From YouTube: 2021-09-01 GitLab.com k8s migration EMEA
A: So I guess I'll go ahead and get started with the agenda. I wanted to showcase... actually, let me back up before I start showcasing anything. Kubernetes, in a recent version, decided to get rid of the Docker runtime; they've switched to running containerd.
A: If that shell was not enough, say because of a requirement to install a tool that needed root access to certain portions of the system, there was another method using Docker: you could exec into a new container, attach it to the process namespace and the network namespace and such of the container you wanted to interrogate, and you'd have all the necessary access available to you. containerd does not have this same capability, but runc does; you just have to know how to leverage it. So I thought I would take our demo time and show how we would do that.
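(For reference, the Docker-era approach looked roughly like the following sketch; the target container ID abc123 and the nicolaka/netshoot troubleshooting image are stand-ins, not values from this demo.)

    # Start a throwaway container that joins the target container's PID and
    # network namespaces, so its debugging tools can see the target's
    # processes and sockets.
    docker run -it --rm \
      --pid=container:abc123 \
      --net=container:abc123 \
      --cap-add SYS_PTRACE \
      nicolaka/netshoot bash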
A: But let's say, for example, we want to troubleshoot a container. The easiest method for me: I'm going to leverage our canary stage in production, because not all node pools are running containerd versus the Docker runtime. This is just due to the upgrade progression and things happening through incidents and the mitigations that we've done over the course of time. Production has suffered more incidents; therefore production has more containerd-running node pools, so it's easier for us to use canary as a testbed.
A: Okay, so let's pick a pod. This is staging, so I'm not worried about breaking things. I should have started with this anyway instead of production, but so be it. I'm just going to pick a websockets pod, just because. So, k get pods, and what we want out of this is where it's running. We can get that information by just passing in the wide flag, which gives us which node the container is running on. So I know that we are now running on this node here.
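(Roughly the command being run here; k stands for kubectl, and the namespace is a placeholder.)

    # List pods along with the node each one is scheduled on;
    # the NODE column tells us where to go digging next.
    k get pods -n gitlab -o wide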
A: The other piece of information that I need to know... because this is containerd, it's a little more complicated. Normally what you would do is a docker ps and then, poof, you have all your information. But because Docker is not the runtime, you don't have that. crictl is the containerd equivalent that provides us this information, and, if I remember correctly, the command is not "list containers", it is ps.
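(A sketch of the equivalent, run on the node itself; the pod name is a placeholder.)

    # crictl speaks CRI to containerd, standing in for "docker ps".
    crictl ps
    # It can also be narrowed to the containers of a single pod:
    crictl ps --pod "$(crictl pods --name gitlab-websockets-abc123 -q)"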
A: I only see one of those, so I picked a poor example where this particular node is only running one pod of this type. If I were to pick a node, say, in production where there's four pods running on it, you're going to see four gitlab-workhorses, but you don't know which pod or container that resolves to. So I can't tell you, via the crictl command alone, that this pod that I want to interrogate is this container that is running here.
A: That's kind of the downside. So in order to get that information, what you would do is... let's see... jq, .status.containerStatuses... there, fine! Oh, why didn't anyone tell me I did that wrong?
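(The working form of that query looks roughly like this; the pod name is a placeholder.)

    # Resolve a pod to its container IDs as known to the container runtime.
    kubectl get pod gitlab-websockets-abc123 -o json \
      | jq -r '.status.containerStatuses[] | .name + " " + .containerID'
    # Output is of the form: webservice containerd://<64-hex-char ID>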
A: So now that we have all that necessary information, we have the ability to do fun things. Well, firstly, I guess I should go back a little bit. Previously we would do a docker exec and just pass in the container ID. crictl does allow us to do that. So if I do an exec... I think I still need the tty... and let's see if this works... then bash.
A: You get a shell, you can run id, you see that you're the git user, but you're still limited: you can't switch to root, because that requires authentication which we don't have. So your tooling capabilities are still limited. You can still do a curl on localhost and do fun things, assuming curl is available, so stuff like that is open to you. But what if you want to do something more invasive, like run a ptrace or an strace, or you want to do a TCP capture, stuff like that?
A: You cannot do that inside of the container as-is. You have to become root in some way, shape or form; you have to do other things in order to install the tooling that you may require. So having this really lengthy container ID is very important, because we need to interface with runc; crictl doesn't provide us the native capabilities. If I do exec --help, you'll notice I can't specify the user, and I can't specify various kernel capabilities that I want to be able to execute with. Previously, with Docker, we could specify the user that we wanted to log in as, etc. So that's what we need, but there's a catch with this.
A: If I do runc... and I forget the exact commands, because they're all slightly different... ps inside of runc gives you the list of processes running inside the container, so it's more like the process-list command. list is the listing of the containers that are running under the runc container runtime. So if I do a list, you don't get any information, which is highly unfortunate.
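(A sketch of the trick, assuming containerd's usual runc state directory for the Kubernetes namespace; the path may differ per installation.)

    # runc's default --root appears empty here because containerd keeps
    # its runc state under its own directory, in the k8s.io namespace.
    runc --root /run/containerd/runc/k8s.io list
    # With the full container ID, we can now exec as root and even add
    # capabilities, which crictl exec cannot do:
    runc --root /run/containerd/runc/k8s.io exec -t -u 0 \
      --cap CAP_SYS_PTRACE <container-id> bash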
A: We don't want it to be removed from the cluster, but we want to separate it out so we can continue interrogating it. That way we're not blocking auto-deploys, but we could still interrogate this container; or we could segregate this container so it's no longer receiving customer traffic, and we could still look at it in some way, shape or form. I want to figure that out next, if that's possible.
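(One possible approach, assuming the Service and ReplicaSet select pods on an "app" label; the label key and pod name are assumptions, not the chart's actual values.)

    # Overwrite the selector label: the ReplicaSet no longer counts this
    # pod (so deploys proceed and a replacement is spawned) and the
    # Service stops routing customer traffic to it, but the pod keeps
    # running for interrogation.
    kubectl label pod gitlab-websockets-abc123 app=quarantine --overwrite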
Discussion items

A: I wanted to talk about the two ongoing migrations that we're accomplishing. We all know that the web migration is now completed at this point; what's left is cleanup and documentation updates. Grain did get started on this: we've removed the virtual machines in our pre and staging environments, and we removed the deployer and patch mechanisms, so we're no longer deploying to the nodes that still exist in production.
A: And then, after that... I think we still have tuning left to accomplish. Cluster B is still in our testing phase, so it's slightly different than the other two clusters. Grain is currently managing that, and I haven't looked into the state of that issue yet. So I think what's left is to either push that change out to the rest of our clusters or continue tuning.
A: So I need to follow up on that one. And Henry, regarding your question about Apdex drops during deployments, I need to create an issue for that.
D: We run into this all the time also, since we moved the API, because we see the same thing there, right? But somehow we never really came to a good conclusion. Like, we couldn't really prove that it's related to pods being slow when they are new, or on new nodes, right? We couldn't really prove that, and you had the suspicion that maybe it's just the readiness checks, the health probes, being accounted wrongly against the Apdex when we terminate pods.
D: So that's really strange, because I'm not sure if we really have an issue: the error rate is not going up, just the Apdex dropping down, yeah. And I'm not sure if we really have an issue there or not. The thing is just, every time we deploy and we see the Apdex drop, I think, oh, something's going wrong, yeah, okay. But I also have no good idea of what it could be. It's just very...
A: Okay, so the Pages migration. This is something that just recently got kicked off. I'll share my screen really quickly, because I do have some things I could at least showcase a little bit. I've pre-populated our epic with a variety of issues I'm currently working on, trying to get Pages working on Kubernetes.
A: I did notice that there's a lot of missing stuff in our Helm chart preventing me from getting that started. I expressed this in my last meeting, but just to reiterate: my goal is to start with pre-prod as a method of learning how Pages needs to operate inside of Kubernetes.
A: It's a new service to me entirely, so I need to learn about it in the first place and then get it working in pre-prod, and then leverage staging as a way to test how we're going to perform the migration, because I know we're going to have to make some fun changes to our HAProxy cookbook in order to enable a smooth migration and avoid downtime and such. So staging's primary focus is just going to be: let's figure out how we do this in a non-outage-driven methodology.
A: At some point we need to sit down, and this issue right here is to determine what the configuration needs to look like for resources, as well as tuning it so that it runs well and our HPA responds accordingly. Because we do have the occasional customer who will perform a release, and then Pages suffers, because everyone clicks on the link in their blog post, ends up landing on Pages, and kind of drives traffic up the wall.
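(For that tuning work, inspecting the autoscaler's live view is one reasonable starting point; the resource names here are placeholders.)

    # Show current vs. target utilization and replica counts for the HPA.
    kubectl get hpa -n gitlab
    kubectl describe hpa gitlab-pages -n gitlab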
B: Maybe we need it in the beginning, if we're going to do a mixed deployment with VMs.
B
I'm
thinking
it
can
be
just
one
entry
for
the
kubernetes
cluster
and
then
let
kubernetes
deal
with
low.
I
mean,
then
you
have
false
load
balancing
there,
because
then
from
hi
proxy
perspective,
yeah
with
just
one
end
point,
you
know
what
is
the
name
in
the
back
end?
I
bet
one
backhand
in
proxy
and
then
actually
there
are
more
it's
more
capable
than
the
others,
but
yeah.
We
should
definitely
figure
out
if
we
can
remove
it
from
from.
A: I know already that we are losing logs from the Pages service when people attempt to make a connection and there's something wrong with SSL, whether it be our fault, the client's fault, what have you. It doesn't matter to us as operators of the service, but to the people that are using the service it will matter. So, since I know Pages doesn't log every single connection that comes into it, HAProxy at that point might be beneficial; and then there's ACLs.
A: So I already know that there are reasons to keep it in place, but I do think it's worth exploring, because from a technical standpoint it's just going to do the TLS termination. In the future there would be caching in front of it; right now there's not. HAProxy is just kind of a dumb proxy at this point, not really doing much. So I'll have to, we'll have to, look at any existing rule sets and such to figure out what we want to do in the future. But I do think that is a post-migration task.
A: Let's see, as far as migration blockers, I've identified at least three items that we're just missing inside of our Helm chart that would enable certain options. This one: it looks like gitlab-logger is potentially not starting properly, and it's preventing us from capturing the first few seconds in which GitLab Pages starts up, which is kind of crucial, because when GitLab Pages starts up we get some clear information as to whether or not the service is healthy. So knowing that information when the pod starts is quite crucial.
A: So in this proposal I ended up taking all three of these issues, because, you know, Distribution is busy with things. My proposal is just to remove gitlab-logger, because Pages already outputs in a structured log format. We're also completely missing the NetworkPolicy, so we lack a bit of security that we need to maintain.
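(A minimal sketch of the kind of NetworkPolicy that's missing; the selector and port are hypothetical, not the chart's actual values.)

    kubectl apply -n gitlab -f - <<'EOF'
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: gitlab-pages
    spec:
      podSelector:
        matchLabels:
          app: gitlab-pages
      ingress:
        - ports:
            - port: 8090    # hypothetical Pages listener port
    EOF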
A: So I'm trying to get that pushed into place. And then there are various options that are missing inside of Pages. Items I can think of quickly... let's see, I had two merge requests for this. One was adding proxy v2 support: we currently leverage this in production, and it currently exists in Pages, but does not exist inside of our Helm chart, so it's simply something we cannot yet activate.
A: So I need this in place before I can push this into pre-prod... and, ah, this has been merged. Okay, so we've got at least one merged item for this, so that's kind of cool.
A: And this last one, I marked it as a blocker just so that I could discuss it very quickly.
A: So in a different thread I'm like, hey, does it actually work? Because it is configured correctly as far as I can tell, and I know that, inside of our service, it's configured to talk to Sentry.
A: So Jamie was nice enough to spin up an issue to figure out whether we simply aren't throwing any errors, which I know is kind of false, because I've seen errors fly through our logs and I've seen errors fly through our metrics.
C: I wanted to hijack this and ask some questions. So Jarek was asking the other day what happened to a background-migration Sidekiq job he had, and we know that long-running Sidekiq jobs aren't a great fit for Kubernetes, because, well, your container can go away anytime, basically, and yeah. So we know that's a risk, but here I was just trying to figure out why the container went away. We can see that we get the context-deadline-exceeded for the health... the readiness or the...
C: I think it's the readiness one. And in some cases Sidekiq seems to be shut down gracefully, and other times it just gets killed. Jarv pointed out that... actually, let me just share my screen.
C: How do I go further here? Like, you know, I can see... so there's one here that was from yesterday that just got killed; this one from the other day was...
C: Oh, I think this time range will be wrong, but I think this one got terminated more gracefully, so it got to do its own shutdown steps. But how do I do anything useful here, basically? Sorry, let me just... oh wait, this is the wrong one.
B: That has a timeout for receiving an answer, and this timeout is not met. So this means that Sidekiq is not giving back the readiness probe answer in a timely manner.
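(To check how those probes are actually configured, something like the following works; the deployment and namespace names are placeholders.)

    # Dump the readiness and liveness probe config for each container.
    kubectl get deployment gitlab-sidekiq -n gitlab -o json \
      | jq '.spec.template.spec.containers[]
            | {name, readinessProbe, livenessProbe}'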
B: Yeah, what I'm thinking here is just random ideas: that something got in between, maybe memory limits or something like that. So it received a signal, and maybe the handler for the signal, as it is implemented in Prometheus, is that it stops accepting new connections. So while it was terminating the job, even though it didn't have enough time to terminate it, it was reported as not ready, because, obviously, while terminating it takes no new incoming connections. That's why you see this.
C: It could be, if it's in the same process... is the metrics web server in the same process as the Sidekiq threads? I can't remember.
C: Yeah, so if it is... I mean, it's Ruby, right? So if it is in the same process and one of the Sidekiq threads is doing something CPU-heavy, then the readiness probe will fail, because the request can't be handled; it won't obtain the global VM lock. But if that was the case, I'd expect this to happen a lot on the CPU-bound pods. I don't actually know if it doesn't happen on the CPU-bound ones, to be fair.
A: I guess... thank you, yeah. So I've got two thoughts here. One: the fact that we don't see the liveness probe failing is a little bizarre, so either the liveness probe is not configured, or it might be configured a little differently.
C: Okay, but it doesn't matter... one step back: it's not...
A: ...serving traffic, so it's not going to be inside of a Service per se. You know, Sidekiq itself is what pulls work; nothing is distributing work to Sidekiq in this particular case. So I think there are two avenues we need to explore. One: let's look at the liveness probe and see how it works, because I've forgotten at this point how it's configured; let's determine if the liveness probe is causing these failures, and whether that is leading to the pod being restarted.
A: If I recall, Jarv had started a Slack thread and discovered that these pods were not showing a restart count being incremented. Oh...
A: I think I might be making this up, because I dream a lot. The other avenue that might be worth exploring: because I did see the message that the process wrapper was killed, that would lead me to believe that it's not Kubernetes that's interfering here, but instead the node's OOM killer might be being invoked.
C: Yeah, he said it's expected that the pod will be terminated after reaching the limit, yeah. And he said that we should be able to tell whether it's hitting the limit from the logs, but I don't know if he means the node logs there or the pod logs.
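(Both places are checkable; a sketch, with placeholder names.)

    # Pod side: an OOM-killed container shows up in its last state.
    kubectl describe pod gitlab-sidekiq-catchall-abc123 | grep -A3 'Last State'
    # Node side: the kernel log records the OOM kill itself.
    dmesg -T | grep -i -e oom -e 'killed process'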
C: The database-throttled one is the one we're interested in. Oh, sorry, which node, or which chart?
D: Yeah, look for the nodes that you're running on, like the catch-all.
A: I wonder if it's default. Sean, you might still have the tab open that tells us which node pool.
D: You mean the dashboard?
C: Did you say yes?
A: The events index; that should have this information.
A: Okay, thanks. And if you need additional help, let's try to prioritize this so that we're not blocking Jarek's work here.
C: Yeah, that might be the issue here. A gig seems quite...
C: No, it's the same as what we're using for memory-bound. I was just trying to think in terms of VMs, like how much memory they had. I guess they were doing more things, right? It's not just the memory for the Sidekiq process, so it's difficult to compare. Yeah, but yeah, okay, yeah.
A: All right, so it sounds like we've got enough information to move forward. Is there anything else we want to talk about on that front?
A: Cool. So, our last item: our goals. I think our goal should be to knock out the remainder of this web migration, which we're currently doing, and I'm going to create a new issue to slow it down. So that's perfect. Cool, all right. That's all I have to discuss; anyone have any further questions? Otherwise we could... well, I'm going to go to lunch. Cool. Well, thank you all, have a lovely day, see you later.