From YouTube: 2020-12-03 GitLab.com k8s migration EMEA
A: I will upload the video from this morning as soon as I can. There's nothing too current in it; we spent quite a bit of time reviewing Andrew's dashboards. He gives a demo of the dashboards he's been working on, so it's an interesting video to watch to see those. And then we had a bit of a chat about the 502 errors, and I expect the latest on that one will end up on the issue. We also had a brief chat about Helm, which we can wrap up in this call. So there's nothing missed, but it is a worthwhile video.
A: Also, Graham is finishing up; today is officially his last day. He's back just after Christmas (he's working the week between Christmas and New Year), but otherwise he's out for three weeks. He said he'd check in on the Helm issue before he finishes, but if there's anything else we need from him, get it over to him sooner rather than later.
A: Interesting one today, the incident, because one of the first indicators was the deployment Apdex. Not an alert, but an indicator, a warning.
A: Yeah, but there was nothing when we first looked into it. I was like, no, it looks mostly fine, there's just a little bit of latency, so it's kind of...
A: They come periodically through the day, but this was the first one where I was like, yeah, interesting.
A: Yeah, oh okay! So let's get started, let's go through the blockers. First up is: implement a pattern to allow traffic splitting.
B: I have a long, long thread about the testing work that I've completed, but for the most part it's working okay. I think there are just a few environment variables that can be plugged in, and then making sure that the images change appropriately when auto-deploy runs. I think those are the last few items on my list of things that need to be checked up on.
D: Maybe we should start writing up what the migration plan is going to look like. I think we discussed that we wanted to do WebSockets/Action Cable first, but I think we'll probably have to migrate both, right, because we're going to have a path for Action Cable and then everything else.
D: So perhaps we just wrap this up with: we're now going to be creating a new service for WebSockets and Action Cable, and we'll create a readiness review for that which includes the traffic split.
D: I think we'll do it together. I know that Marin really wants them to drive it, so I'll talk to them first and get their input, but I did talk to Marin earlier this week and he said he's fine if we have one readiness review covering both the infrastructure work to do the split and Action Cable.
A: Cool, okay, sounds good. So the next one we've got is the structured logging. Geoff, do you want to give us a quick update?
D: Yeah, evidently it's done and merged, and then it broke production, so we reverted it. Even with a bulletproof canary, there's a lot of special consideration we're going to need to make here, because we're now wrapping all of the log messages in JSON. This caused all sorts of problems with Elasticsearch, which suddenly sees these wrapped messages and wasn't happy about it because of the field mappings.
D: I also had a comment: I noticed that we're pulling this thing from GitHub, from the community; this is a community contribution. This is item number seven, or B7.
D: Okay, so are we sure we're okay with depending on GitHub? What happens, for example, if this thing isn't able to build, or GitHub has an outage? Will our CNG pipeline fail? Probably, right?
D: It's so specific to us. Yeah, I'll bring it up with distribution to see what they say; maybe they already planned to have this person maintain it for us, which is free work. I'm just worried about what we'll have to do. For one, when we wrap these messages we may want to differentiate between structured logs and unstructured logs, maybe even having two keys in the JSON wrapper to say: this is JSON, this is not JSON, or unstructured.
C: It would just be flattened? I mean, if they've done it, then we'll just have to deal with it in fluentd, right? Yeah.
D: It still goes through fluentd, and we can do whatever massaging we want in fluentd; it's just expensive, and I would prefer it if we didn't have to. Actually, I think you're right: if it's unstructured, maybe we should just have a bare unstructured key, or...
D: This part? No, this is the actual log file, yeah. So it looks like that, okay, but this extra bit, for example this top-level log key, seems unnecessary. I completely agree.
C: What about... I don't see a pod ID or container identifier in there.
D: Let's talk to distribution to see what they say. Ideally, what I'd like is... well, I think this component field is useful, so maybe we keep that. Why do we need an extra date? Maybe that's unnecessary, I don't know; or maybe it would be handy to have. But what is this date anyway? Is it when the wrapper consumed the log message and wrote it out? I don't know if that's...
D: And they're very close, but obviously a little bit delayed; the date is a little bit after the time, yeah. So I don't think that's useful. So: component, keep. Sub-component, I don't know. Level, probably not. File, maybe not. And then...
D: Well, I...
D: We do it in fluentd and have to look, that's...
D: Okay, I'm not sure exactly where this is done, but I'll follow up with distribution. I think we want to make sure that we don't have to unwrap.
C: The two things that are deal-breakers to me: I think that it should just be JSON, and not a JSON-encoded string, because... someone on Twitter was laughing the other day about what percentage of the world's compute capacity is spent encoding and decoding JSON.
C: And this is a classic case of that; I'm sure we don't need to do that. The other one, which is just about reducing bugs: for raw unstructured logs we have msg as the message, and for structured logs... we don't want to check whether message is an object or a string, because that'll just lead to more bugs, right? People going, oh, I didn't realize it could sometimes be a string.
C: You know, I looked at a thousand of them and they're all objects. So having that, to me, seems wrong. It would be much better if log.msg was the unstructured log and log-dot-all-the-things was structured. Should I write this down? Because I know I'm not being very clear.
D: This log wrapping happens right now; this is not something that was added, so I think we'll have to look at what exactly the logger is doing. But I think it's actually just taking the raw message and stuffing it into this message field, and sometimes this message field is a raw string and other times it's a JSON object. So I think you can ignore the first level of escaped JSON; that's not the problem. It's the second level here, under message, that you would have to unwrap again in fluentd.
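
A minimal sketch of the split being proposed here, assuming hypothetical key names; the real schema still has to be agreed with distribution. Structured payloads stay as real JSON under `log`, and raw text gets its own `log.msg` key, so consumers never have to type-check a `message` field:

```python
import json

def wrap(raw_line: str) -> dict:
    """Wrap one log line so structured and unstructured payloads land
    under different keys instead of overloading a single `message` field."""
    try:
        payload = json.loads(raw_line)
        if isinstance(payload, dict):
            # Structured: keep it as a JSON object, never a JSON-encoded string.
            return {"log": payload}
    except json.JSONDecodeError:
        pass
    # Unstructured: raw text under a dedicated key.
    return {"log": {"msg": raw_line}}
```

With this shape, fluentd never has to unwrap escaped JSON a second time, which is the expensive massaging mentioned above.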
A: Sure, let's skip down to D, no-downtime deploys. Uh-huh.
D: Yes, for the NGINX ingress controller I think I'm just waiting on a review for this. Scarbeck, I saw your comments; did you see my response?
D: Okay, I do think we want this change. Your comment was: why does the grace period matter, because the pod gets terminated immediately, right? But the change that we merged a couple of days ago is this pre-stop hook that runs in a loop and basically ensures that NGINX doesn't terminate until all active connections are finished. That's why we need to extend the grace period: I think this may even take longer than 60 seconds if we have someone cloning a very large repository.
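
The merged change is a shell pre-stop hook in the ingress controller; here is a rough Python rendering of the same loop, assuming a hypothetical local stub_status endpoint, to show why the grace period has to outlast the longest in-flight request:

```python
import re
import time
import urllib.request

# Hypothetical local NGINX stub_status endpoint; the real controller
# exposes its status on an internal port.
STATUS_URL = "http://127.0.0.1:10246/nginx_status"

def wait_for_drain(poll_seconds: int = 1) -> None:
    """Loop until NGINX reports no active connections besides this probe.

    The pod must not terminate while requests (e.g. a clone of a very
    large repository) are in flight, so terminationGracePeriodSeconds
    has to exceed however long this loop can run.
    """
    while True:
        with urllib.request.urlopen(STATUS_URL) as resp:
            body = resp.read().decode()
        # stub_status's first line looks like: "Active connections: 3"
        active = int(re.search(r"Active connections:\s*(\d+)", body).group(1))
        if active <= 1:  # only the status request itself remains
            return
        time.sleep(poll_seconds)
```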
A: Cool, sounds good. And then, just to loop back, Andrew, on the dashboards and things: is there anything you need? We've got the recording from earlier where we got the walkthrough. Is there any action you need us to take for you?
C: I haven't been paying close attention, but the labels on the deployments are the next thing that I need. I'm aware that there's been a lot of discussion on that with Jason, and I haven't been paying super close attention, but once we can get the same sort of labels onto the HPA (sorry, onto the deployment; sorry, onto the HPA is what I meant to say), then we can start adding a whole lot more information with that. So that's kind of what I'm blocked on there.
C: When I looked at the charts, the HPA does have a custom-labels thing, but I don't know how easy it is to put things on there.
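
For illustration, a sketch of propagating the deployment's labels onto the HPA with the official Kubernetes Python client; the names and label values here are placeholders, and in practice this would likely go through the chart's custom-labels value rather than a manual patch:

```python
from kubernetes import client, config

def label_hpa(name: str, namespace: str, labels: dict) -> None:
    """Patch the deployment's identifying labels onto its HPA so
    dashboards can join metrics across both objects."""
    config.load_kube_config()
    autoscaling = client.AutoscalingV1Api()
    autoscaling.patch_namespaced_horizontal_pod_autoscaler(
        name, namespace, {"metadata": {"labels": labels}}
    )

# Placeholder names and labels; the real label set is the one under
# discussion with Jason.
label_hpa("gitlab-webservice", "gitlab", {"stage": "main", "type": "webservice"})
```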
D: I was hoping to get to these already, but yesterday was a complete waste because of all the investigation we did into this performance issue, these errors on Git HTTPS.
A: As for Pages, I'll see if I can find out; there's nothing on that epic, so I'll go off and get us an update for that one. But it's progressing along, hopefully.
A: So one thing I would like to cover, to add to the agenda, is Helm 3. It would be great to be able to upgrade before we do the next service.
B: So the last time we tried to perform an upgrade, we ran into a blocker with immutable fields, where Kubernetes would not let us perform the upgrade, because there are certain things that you cannot change in deployments, and services for that matter. It doesn't look like we attempted to address this.
B: It looks like we just documented the situation and informed our users that it's easier to delete the object, perform the upgrade, and recreate the object, which in itself will create a disruption in traffic. So what are our options if this continues to be an issue? I don't know if there have been changes to the way Helm operates in newer versions that make this any easier.
B
But
if
this
does
continue
to
be
in
a
situation,
testing
will
tell
us-
and
we
could
you
know,
as
gray
mentioned,
we
could,
you
know
completely
write
up
a
cluster
and
replace
it.
B: Either of those options will require us to do some tooling upgrades, so that we can try to keep from blocking auto-deploys and patches and such. We'll need to make sure our tooling is able to handle both Helm 3 and Helm 2 at the same time, and we'll also need to thoroughly test the upgrade path to determine the best method to go about upgrading things.
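
A sketch of what "handle both at the same time" could look like in deploy tooling, assuming helm2 and helm3 binaries installed side by side under distinct names; this is illustrative, not the team's actual tooling:

```python
import subprocess

# Releases that have already moved to Helm 3; everything else still goes
# through Helm 2. Both the set and the release name are placeholders.
MIGRATED = {"gitlab-websockets"}

def upgrade(release: str, chart: str) -> None:
    binary = "helm3" if release in MIGRATED else "helm2"
    # Immutable-field errors (e.g. a Deployment selector) will still fail
    # here; those objects have to be deleted and recreated, which is the
    # traffic disruption described above, so schedule such upgrades
    # alongside taking traffic off the cluster.
    subprocess.run([binary, "upgrade", release, chart], check=True)
```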
A: So how do we want to go about it? What do we need to do to get to a plan where we know the steps we'd want to take and how we want to handle those things? Then we can work out scheduling.
B: It's been since April-ish that I did this, so it's been a long time and I don't remember everything. If we thoroughly vet the testing, we'll have a better idea of what we need to accomplish, as far as what changes need to be made to our tooling and whether we want to destroy clusters or just take traffic off the clusters.
A: Cool, okay. Let's find a time to do that quickly, so that we can actually see where we are with this one. Okay.
D: I think we should probably catch up, now that Scarbeck is here, on what the next steps are for Git HTTPS and the cluster problems we're having.
D: So this is funny, because this was my recollection as well, but then I went to our docs, and it's totally the opposite: it does not verify the database or other services. So I was like, okay, health must be the most lightweight check that we have, right? But I guess it doesn't perform under load, which makes me very suspicious.
C: Oh, I mean... no, I suppose you have to go to Puma for the health check. You can't just terminate it at Workhorse, because if Puma is really sick, then we want to take the cluster out, so we always want to... Although, actually, are we absolutely sure that the health checks are proxied to the backend in Workhorse? Because...
C: I don't know, because I know that they've got their own routes in Workhorse, which...
D: I don't know whether health goes to Rails, but given that we saw Puma crashing when we switched to this health endpoint, I assume it does. Okay, well, I'm a bit lost, because I have the exact same memory that you have, which is that we got off of this health check because it was too heavy, and it...
D: Well, I think we have to start... I did some siege load tests on staging against these, and it does seem like liveness is the lightest of the three.
C: So one of the things you could do is use those chaos endpoints. You could put a little script together that sends a request to the chaos sleep endpoint, say sleep for 30 seconds or however long, then send a SIGKILL to Workhorse, and then the next thing that comes along should be a 502, right?
C: And the one after that again should terminate at Workhorse and not go all the way to the backend, right? Because after those seconds it pulls up the drawbridge: nothing else is coming through. But if we could kind of...
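
A sketch of the test being described, using GitLab's chaos endpoints; the host, token, and process lookup are placeholders, and the expected status codes are the hypothesis under test rather than known behavior:

```python
import subprocess
import threading
import urllib.error
import urllib.request

BASE = "https://staging.example.com"  # placeholder host
# GitLab's chaos sleep endpoint; the token value is a placeholder.
CHAOS = BASE + "/-/chaos/sleep?duration_s=30&token=CHANGEME"

# Hold a slow request open so a connection is mid-flight when Workhorse dies.
threading.Thread(target=lambda: urllib.request.urlopen(CHAOS), daemon=True).start()

# SIGKILL Workhorse on the node under test (placeholder process lookup;
# this part has to run on the node itself).
pid = subprocess.check_output(["pgrep", "-o", "gitlab-workhorse"]).decode().strip()
subprocess.run(["kill", "-KILL", pid], check=True)

# Hypothesis: the next request 502s, and the one after that is cut off
# at the edge instead of reaching the backend.
for _ in range(2):
    try:
        urllib.request.urlopen(BASE)
        print("200")
    except urllib.error.HTTPError as e:
        print(e.code)
```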
D: But the requests that go through Workhorse will return a 503 for the readiness check, while everything else will be successful.
D: The problem is that we're using the readiness endpoint as the health check for HAProxy, which means that the whole cluster is being dropped out of the backend as soon as we terminate a pod, because some of these requests are actually going to terminated pods and we're seeing 503s.
D: I don't want to repeat... you know, I don't want to create another incident. We need to somehow put this thing under load and test it.
C: So what you're kind of saying, or maybe the most resilient way, is if the health check from HAProxy to the cluster is totally...
D: Yeah, but the thing is that right now we have our training wheels, which is the Git fleet set up as backup servers. I need something that tells me that the cluster is unhealthy, at least until we take those Git servers out of the HAProxy backend, because I want to be able to use them as a fallback. So I need something that tells...
D: Yeah, but okay, so we know that when you send the signal to terminate a pod, there will be a small window of time when requests will still go to that pod; it's just eventually consistent, right? So the readiness check protects us against that, because the readiness check returns a 503 while we're still able to process requests, and we have this configurable blackout window, which is set to two minutes.
D: So I think it's okay that some requests are leaking into these terminating pods, because they're mostly successful. The problem is that the health check returns a 503 during this time. So we need to come up with another check, yeah, one that's always going to return...
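
One possible shape for that other check, sketched as a hypothetical sidecar endpoint for HAProxy that treats a draining pod as still up and only fails when the pod genuinely cannot serve; this is an assumption about what such a check could look like, not an existing endpoint:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

DRAINING = False  # set when SIGTERM arrives; the pod can still serve
BROKEN = False    # set when Puma genuinely cannot take requests

class LBCheck(BaseHTTPRequestHandler):
    """Hypothetical HAProxy-facing check, separate from the k8s readiness probe.

    Readiness returns 503 for the whole two-minute blackout window while a
    pod drains; if HAProxy watches that, one terminating pod can drop the
    entire cluster out of the backend. This check only fails when broken.
    """

    def do_GET(self):
        if self.path == "/lb-check":
            self.send_response(503 if BROKEN else 200)  # draining still counts as up
        else:
            self.send_response(404)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8090), LBCheck).serve_forever()
```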
C
So,
like
I'm,
just
like
here's
a
failure
scenario
which
so
say
you
have
one
pod
which
is
broken
for
some
reason,
and
it
is,
you
know,
pumas
in
a
broken
state
and
the
request
is
getting
workhorse
is
accepting
but
puma's
not
accepting
it's
dead.
The
sockets
died
and
you
know
after
30
seconds,
it's
timing
out
with
a
503
or,
however,
you
know
after
a
higher
amount
of
time,
and
a
series
of
requests
come
in
from
h.a
proxy
through
nginx
and
they
hit
this
and
they
all
come
back
with
a
503.
C: At that point, because of one pod out of n, HAProxy removes that entire cluster, one of the three clusters, from the backend.
D: No, I don't think you're misunderstanding. I think our assumption was that this window was very short, like by the time... and...
D: Yeah, so if you have a single pod, I guess our assumption was that HAProxy has to fail three checks, so given that you're going to hit random pods for those three checks, that wouldn't happen. But you're right, maybe that's not a chance we should be taking.
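
Back-of-the-envelope numbers for that assumption: if each of HAProxy's three consecutive checks lands on a uniformly random pod, the chance that all three hit the one broken pod out of n is (1/n)^3. Keepalive pinning checks to the same connection breaks the uniform-randomness assumption, which is the risk raised next:

```python
# P(three consecutive health checks all hit the single broken pod),
# assuming each check lands on a uniformly random pod out of n.
for n in (3, 10, 30):
    print(f"n={n:2d} pods -> (1/n)^3 = {(1 / n) ** 3:.6f}")
# n=3 -> 0.037037, n=10 -> 0.001000, n=30 -> 0.000037
```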
D: Yeah, I see your point; I think you're right. So let's see what our options are there, and if we don't have an option then maybe we should just remove the health check altogether.
C: Yeah, but also, that situation could occur during a rollout, like a deployment, where there might be more than one machine shutting down, and then, just because there's a high prevalence of 503s coming back, you might get a situation where you get three in a row.
D: Yeah, but in the sunny-day scenario I was hoping that we'd never see any errors, because we have this blackout window period, right, which should be long enough that Kubernetes doesn't route any traffic to a terminating pod after two minutes. But apparently it does. And this was the problem we found with the NGINX controller yesterday, where it appears that NGINX keeps these connections open to pods that are in the terminating state, and the only way around it is to reload NGINX.
D: It has two options; we use the option where it just sends traffic to the service endpoint, which is one IP address and one port, and the service endpoint does the routing to the pods.
D: To the service IP. The problem that we think we discovered yesterday was that, because of keepalive, even though a node was removed from the service, the existing connections are still active, and we would still get requests going to these terminating pods. So we adjusted the keepalive, and from what I've seen so far it doesn't help as much as we thought it would. We have to play with that more, but...
D
I
don't
know
man
it
feels
like.
Maybe
we
need
to
get
rid
of
engine
x.
B: Jarv, I don't know what you have tested today, but I think maybe we should concentrate more on this keepalive stuff. I also found another option that we could set in our ingress controller to...
B: Okay, during a short conversation with Matt Smiley yesterday in our DNA meeting, I was talking about how we adjust the number of keepalive connections, or the requests that go through them.
D: Yeah, I think it's really easy to see this on staging, because we have enough traffic from the traffic generator that we can kind of see what's happening. What we see right now is not a lot of 502s, but we do see a lot of 200s after the termination signal is sent and the readiness check is failing.
C
So
my
guess
about
it
being
an
nginx
plus
feature
seems
to
be
correct.
Are
you
serious?
I
I
I
might
be.
I
mean
it's.
The
first
thing
I
found,
but
it's
definitely
a
plus
feature
what
I
found
the
active
health
checks.
They
call
it
we'll
see.
Maybe
it's
a
different
thing.
B
So,
let's
keep
working
on
the
keep
alive
stuff,
let's
tune
that
more.
Maybe
I
could
do
some
math
to
figure
out
better
appropriate
values
for
keep
live
settings
and
maybe
also
try
out
that
other
configuration
where
it'll
retry
a
502.
D: What do you think about being as severe as possible with the keepalive on staging? Let's just remove keepalive altogether and see if that actually makes this window either go away entirely or get really short, and then we can work up instead of working down.
B: I like that idea. Yeah, let's try.
D: I don't know, but it's such low traffic right now that it's not our biggest problem, and maybe once we have the WebSocket traffic segmented off into this other service, we can worry about it. Yeah.
C: No social-distance holiday.