2021-05-06 GitLab.com k8s migration EMEA
B
Hello, so let us start. Is there anything that we would like to demo this week?
E
We've played with a couple of incantations which allowed us to change the behavior of how service discovery communicates with the endpoint that's necessary. So we went from using the default method of just specifying "hey, go reach this service", where Kubernetes routes you appropriately and you get sent to a pod. We then changed that to a headless service, where the attempt was allowing Kubernetes to decide which pods you go to, depending on their state.
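A minimal sketch of the shape of that change, assuming the service in question is Consul's DNS endpoint; the names, namespace, and selector here are illustrative, not taken from the meeting. Setting clusterIP to None makes the Service headless, so DNS resolution returns the pod IPs directly instead of a single load-balanced virtual IP:

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: consul-dns        # hypothetical name
  namespace: consul
spec:
  clusterIP: None         # headless: no kube-proxy virtual IP
  selector:
    app: consul
  ports:
    - name: dns-udp
      port: 8600
      protocol: UDP
    - name: dns-tcp       # the same port on two protocols is the pattern
      port: 8600          # that tripped up the older Kubernetes version
      protocol: TCP       # mentioned later in this meeting
EOF
```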
E
The first time we rolled this out, we ended up barfing a lot of errors in Kubernetes, because we are using an older version of Kubernetes and there's just an incompatibility with this version. Which is fine, and we fixed that, but later yesterday Graham discovered that this was not actually doing what we intended. So these error messages we see here are the same stuff we've been seeing, that we've been trying to eliminate; in this case it's just pointing to the IP address of that node and going to that port.
E
So now we are talking to port 8600 on that host, versus going out to a Kubernetes service to make this traffic exchange happen. As you can see from the logs, the last time this happened was at 3:10 AM UTC, so this change has been rolled out for quite a while, and we haven't seen this in staging. We haven't seen the error since, so this is fantastic.
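For concreteness: port 8600 is Consul's conventional DNS port, so "talking to port 8600 on that host" amounts to a direct DNS query against the node. A hedged sketch (the node IP and record name are placeholders, not values from the meeting):

```shell
# Query Consul DNS directly on the node, bypassing the Kubernetes Service.
dig +short @10.224.0.15 -p 8600 db-replica.service.consul
```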
E
If you hear construction noise behind me, my apologies. But you can see that there is a drop-off when Graham rolled this change out, and right now would be an opportune time where we should be seeing a lot of these failures come through. So I think the work that Graham has resolved here is promising. Related to our discussion items, I think what I'm going to work on today is creating the necessary change request to roll this out into production.
B
Yeah, I think we really found a solution to a very long-standing issue with that. I mean, that was really great and hard to research. I think we learned a lot by this, right? I mean, we learned about this bug in our old Kubernetes version, where specifying this port for two different protocols isn't working, and then we learned a lot about this way that we can use Helm to patch our Kubernetes installations if we don't have support for that in the charts. Which is not very nice, but still a workaround to get us forward in a lot of cases, right? And also learning a lot about how node ports, host ports and all of this works. So I think that's cool.
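The exact patching mechanism isn't specified in the meeting, so this is an assumption: one common way to adjust a Helm-managed installation when the chart exposes no knob for a setting is Helm 3's --post-renderer flag, which pipes the fully rendered manifests through a local program (often kustomize) before they reach the cluster:

```shell
# Hypothetical post-renderer wrapper: read rendered manifests on stdin,
# apply local kustomize patches, write the patched manifests to stdout.
# Assumes a kustomization.yaml in the current directory that uses
# base/all.yaml as its resource.
cat > post-render.sh <<'EOF'
#!/usr/bin/env bash
cat > base/all.yaml        # capture the chart output as the kustomize base
exec kustomize build .     # emit the patched result
EOF
chmod +x post-render.sh

# Upgrade as usual, with the workaround applied on the way through.
helm upgrade gitlab gitlab/gitlab -f values.yaml --post-renderer ./post-render.sh
```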
E
Yeah, I think there's a documentation situation here, because I could have sworn that this latest change was going to be the solution, but then it didn't change the behavior at all. So I was concerned about that, and then discovering that we used the wrong port number confused me, though it makes sense as you read it. It's just mind-boggling what we had to go through to get to this point. So yesterday I opened a bunch of issues.
C
So it would still be interesting, I think, to know... well, we won't know, but it might be interesting to know how problematic it is. Once we do get this fixed, I'll open a retro issue, because yeah, I think we learned a lot, but also I think there were some things that we maybe want to do differently next time, or learn from, or avoid, things like that.
C
But overall it's an interesting one, because hopefully this solution is the fix. And I think one of the reasons Graham was pushing on this was partly because he hoped it was a fix, but also because it was just a simpler architecture, so it was kind of a win, and hopefully a win-win. But that's a good one for thinking about in the future: how do we get logging, and how do we simplify things when we see odd stuff?
C
I'm
wondering
today
henry
how
much
time
have
you
got?
Maybe
you
can
actually
help
could
be
great
to
get
this
stuff
into
production
today,
if
we
can,
because
graham
will
be
coming
online
tonight,
well
daytime
for
him,
but
it's
his
friday,
so
we're
rapidly
getting
to
the
end
of
the
week
so
be
great.
If
we
could
get
this
stuff
through
to
production
today,
and
then
that
gives
graham
kind
of
all
of
his
data
keep
an
eye
on
it
as
well.
B
Yeah, I just reviewed that and approved it for this meeting, so I can just go and also work on the CR, Skarbek, to get this into production. I mean, it looks like we can just do it, right? This is a one-line change and it should be deployed without interference to anything, so that should be fine to do.
F
So maybe it will take less than five hours, because the one that failed, the one that took five hours, was two indexes on the same table, and this one is only one. So we don't know; maybe I need three hours, yeah. Yeah, I'm completely blind on this; we have no idea what is happening, and wow.
F
I have a question on those things. I didn't follow the problem closely, but we were trying to troubleshoot this last week, and I don't understand why we were able to get the DNS information from Consul when we were testing it in the shell, in the Rails console. So where was the problem? Because we were able to reach Consul and get the information back.
E
Recreating
this
issue
is
horribly
difficult.
Grain
ended
up
spinning
up
a
a
simple
test
using
a
while
loop
and
just
sat
there
doing
the
same
request
over
and
over
again
hoping
to
find
the
issue,
and
eventually
he
did
so.
It
was
a
very
sporadic
problem,
which
is
what
made
this
troubleshooting
so
horribly
difficult.
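A minimal sketch of that kind of reproduction loop; the target record and Consul DNS address are assumptions, not values from the meeting:

```shell
# Repeat the same lookup until it fails, and log when it does.
while true; do
  if ! dig +short +time=1 +tries=1 @consul-dns.example.internal -p 8600 \
       db-replica.service.consul > /dev/null; then
    echo "$(date -u +%FT%TZ) lookup failed"
  fi
  sleep 0.2
done
```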
C
Did we test with the pods coming in and out? Because that was the thing Graham was talking to me about yesterday. The problem is that the pods start talking to Consul, and due to scaling, Consul gets scaled away onto a different pod, basically, or somewhere else, so it loses touch with it. So it's just the disconnect; by keeping them together, they scale together.
E
The effort that we were troubleshooting in last week's demo was simply trying to recreate the problem in general. We weren't trying to test targeting a specific pod in a specific scale event, because testing that would involve having to put everything inside of another while loop and just hoping that it would happen, because during a scale event you don't know which pods are going to get removed, and you don't know which nodes are going to get removed. So we may be on the wrong pod.
E
Yeah, and Graham mentioned in this issue somewhere that Sidekiq sees this a lot more often than our other services, either the API or the Git services. That's partially due to load, but we also have more Sidekiq pods running than we do Git service pods, so there's also just the volume of requests that are occurring that impacts this as well.
C
Nice. Well, hopefully... I'm really hoping this is the fix. It's a good sign that staging looks stable, because this would probably be visible during deployments, given it's to do with scaling. So that's hopefully a good sign.
C
How would you feel, Skarbek, about putting a small amount of traffic into canary?
E
Yeah, so I'll be creating the necessary change requests to do that work, and going through my merge request just to make sure it's up to snuff.
C
Do
you
want
to
push
on
with
that
and
assume
like,
so
I'm
I'm
kind
of
at
the
stage
with
this
service
discovery
that
this
fix
looks
pretty
good
on
staging
I'm
really
hoping
this
is
the
fix
for
production,
but
if
it's
not,
I
think
we're
going
to
have
to
push
force
that
logging
and
then
really
work
out
like
how
big
an
issue
this
is
and
whether
it's
actually
like
really
has
to
block
canary
or
not
because
it
sounded
when
I
spoke
to.
A
There are two things I want to highlight. The first one would be: it would be great if we can actually put some traffic now on the API on canary, because that will give us the before. It would be great to see the before on canary, because once Henry is able to then roll out the fix, we'll know the after, whether this was actually the thing that was fixing it, right? Because if we are going in right now on canary and rolling out the change Graham did...
A
Well, what's the amount of traffic? Okay, can we confirm that? As in, I don't want to cause an outage, but I want to put some traffic on this so we can see how much worse it makes it. Because if this has been a blocker... sorry, if this has been a problem, then I get to ask the question of why this was a blocker for the API migration, right? And what if the answer is, well, because the volume might actually push it over the line?
E
That sounds good. I'll create a change request. Well, I'll finish up my change request, and I'll add some extra details to capture the before and after of these charts and such. Awesome.
A
And then the second thing is: even if we enable the API today and tomorrow and everything is fine, can we make sure that on Friday afternoon we pull this back? Because I do not want a single thing possibly derailing the PG upgrade on Saturday; not a single thing from our side, at least, right? Cool, thanks.
C
Cool. And did the readiness review get approved? Was that last week, Scott?
C
One other thing that we can wrap up next week: Jarv mentioned as well that we should just test that the hot patcher is still working for the API. He is going to do the fire drill next Wednesday for hot patch anyway, so we can just wrap it up into that, I think, and we can keep an eye on that.
C
Nice. Andrew, is there anything you want to demo or discuss?
G
So I started taking a look at the ingress stuff yesterday, and then, just since yesterday to now, it's been CI problems, so I haven't looked into that as much as I'd hoped. But I posted a problem yesterday, before all the troubles began, and I haven't even really reviewed the answer yet. I just wanted to understand whether there's some difference between the way we monitor ingress in staging and in production, or if it's just the lack of traffic that's causing that difference.
G
I don't know. Apologies, Skarbek, for not actually reviewing your response, but it's still on the back burner from incidents.
G
Okay, yeah. So basically what we were seeing there is... I think last week I said that nginx has got really horrible metrics, but, I don't know why, the nginx ingress controller for Kubernetes on staging has got really nice metrics. So when we have nginx running in a VM, there's an endpoint called status, and it's got very, very terse, pretty useless metrics really, and I assumed that that's all we would get from the nginx ingress in production.
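For contrast, a sketch of the two kinds of metrics being compared; the paths and port are the conventional defaults for nginx's stub_status module and the ingress-nginx controller, not values confirmed in the meeting:

```shell
# The VM-style status (stub_status) endpoint: a handful of bare counters.
curl -s http://localhost/nginx_status
# Active connections: 291
# server accepts handled requests
#  16630948 16630948 31070465
# Reading: 6 Writing: 179 Waiting: 106

# The ingress-nginx controller's Prometheus endpoint: request counts,
# durations, and status codes per ingress.
curl -s http://localhost:10254/metrics | grep -E \
  'nginx_ingress_controller_(requests|request_duration_seconds)'
```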
G
It's kind of like that on gprod, but in staging we've got durations, we've got counts, we've got status codes, the whole lot. And if we can get that going in production, then we'll have some really nice at-the-edge monitoring there, which is exciting, but we'll need some more labels, so I will be sending out requests for more labels. Now, that's all I really have to update this week, unfortunately.
E
We need to figure out how to add those additional labels, because we forked the nginx ingress controller. We...
G
So
what
we
did,
I
I
showed
you,
the
the
saturation
for
the
node
pools
right.
So
I
think
we
did
that
last
week
or
may
maybe
it
was
after
last
week,
but
with
that
we
actually
worked
around.
So
what
I
realized
was
the
labels
that
we
would
need
on
the
node
pool
there.
They're
very
easy
for
us
to
add,
but
we'd
need
to
rebuild
all
the
pools,
and
so
I
just
felt
like.
G
Let's
rather
not
do
that,
and
so
we
we
had
a
little
workaround,
where
we
maintain
our
own
recording
rules
for
the
labels
for
those
node
pools,
and
when
we
rebuild
those
node
pools,
we
can
we
can
put
them
on
like
as
and
when
we
rebuild
them
and
get
rid
of
it
and
it
just
unblocked
us
without
having
to
you
know,
re,
rebuild
and
taint
and
rebuild
all
the
node
pools
that
we
have
already,
because
it
just
felt
like
that
was
too
much
work.
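A hedged sketch of such a recording rule; the pool name, node-name pattern, and rule name are assumptions, but the shape (re-deriving a node_pool label with label_replace instead of rebuilding the pools with real Kubernetes labels) matches the workaround described:

```shell
cat <<'EOF' > node-pool-labels.rules.yml
groups:
  - name: node-pool-labels
    rules:
      # Derive a node_pool label from the node name, so dashboards can
      # aggregate by pool without the pools being rebuilt.
      - record: node:node_pool:info
        expr: |
          label_replace(
            kube_node_info{node=~"gke-gstg-api-.*"},
            "node_pool", "api", "node", ".*"
          )
EOF
```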
G
So we could do something like that as well if we were blocked. Now that I'm working on this stuff on a daily basis, if we found that it was going to be, like, a two-month turnaround to get something into Cloud Native GitLab or whatever, then I would probably just do the same thing there and then leave it as technical debt for later.
G
Cool. But yeah, the ingress is cool. I don't know whether it's really worth it, but I noticed that a whole bunch of the Thanos stuff has got persistent volume claims, and it would be quite nice...
G
We
do
have,
we
do
have
some,
oh
I
I
know
something
that
I
can
show
as
well
sorry
this
this
this
isn't
very
exciting,
but
it
is
oh
I'm
having
a
terrible
time
sharing
on
my
computer,
but
there's
a
cube
that
we
have
a
cube
service.
Maybe
I'll
just
send
a
link.
We
have
a
cube
service
dashboard
in.
G
In this dashboard (I'm not going to share my screen because it just doesn't work anymore) you'll see, if you open up the saturation section, there's this kube persistent volume claim disk space, and the same for inodes, and those are pretty much the only things in there, as far as I can tell, that are Thanos. So we do have the alerting, and it should probably be fine if that alert goes off.
G
It'll
say
it's
the
cube
service,
we'll
go
anyone
see
it's
actually
thanos,
but
if
we
had
lots
of
different
persistent
volume
claims,
then
I
would
say
it
would
be
better
for
to
say
you
know
the
get
lee
service
or
the
redis
service,
rdb
or
whatever
has
got
a
problem.
But
if
it's
just
the
one,
then
you
know
it's
probably
not
worth
the
effort.
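A hedged sketch of the kind of saturation alert behind that panel; kubelet_volume_stats_* are the standard kubelet PVC metrics, while the rule name, threshold, and label are assumptions:

```shell
cat <<'EOF' > pvc-saturation.rules.yml
groups:
  - name: pvc-saturation
    rules:
      - alert: PersistentVolumeClaimSpaceLow
        expr: |
          kubelet_volume_stats_available_bytes
            / kubelet_volume_stats_capacity_bytes < 0.10
        for: 15m
        labels:
          type: kube   # routed to the kube service, as discussed above
EOF
```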
A
I would not assume that we'll have just one; maybe more will have it. But yeah...
G
If Google have an outage on their GKE clusters in some way, it might prove to be quite difficult for us to guess that that's happening at first, and so this will be quite nice, especially once we start putting SLOs on it, because we'll actually get an alert that, you know, the kube service API server is acting strangely. So that's quite a nice thing as well; that's on there too.
G
That's only for the logs. So if you scroll down to the API service and then you click on "Kubernetes cluster warning logs", it's actually quite a useful thing for people. So, what I was trying to do there...
G
I was looking for everything that's not debugging information, so that's where I'm searching for critical, error, and warning, and that's how I spotted that problem with the node port. It's gone away, which is quite nice, because when I looked at it there was lots of stuff in there, and now there's a small amount of stuff. Although, you know, there are still some strange things: invalid metrics from the GitLab webservice, websockets, like, what's that?
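A hedged sketch of that kind of severity filter; the index pattern and field name are assumptions about the logging setup, not taken from the meeting:

```shell
# Pull recent cluster log entries at critical/error/warning severity.
curl -s "$ELASTICSEARCH_URL/kubernetes-logs-*/_search" \
  -H 'Content-Type: application/json' \
  -d '{
        "size": 50,
        "query": {
          "terms": { "severity": ["critical", "error", "warning"] }
        }
      }'
```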
G
For the HPA? Oh, it's just... yeah, okay. It's quite nice having it; it's quite a useful link, I think. Especially if we have an incident, you can dive in, go through that, and maybe see something useful there. So there's that.
A
Quick question: why is that link that you shared, the kube overview, in the kube folder and not in a Kubernetes folder?
G
We can rename it. Everything we've had already, up to now: the folder is type equals kube, and that has been set for several months. We can change it to kubernetes, but then it's got to match the type label in the saturation metrics and everything like that. So it's just that kube is what's being used in the saturation metrics and in several places.
A
Just to ask a question, because if I do this, I have, whatever, right? All of this is great, we have per-service vision, but then which one do...
G
Can we not just put all the... So we'll break all of the metrics that we've got, and any of the history that we've got in there, if we change it to kubernetes. So what about if we do it the other way around? Because then we've got... The other reason is that kubernetes, being as long as it is, and Grafana having a 40-character limit on dashboard names, there's not much left after that.
A
Yeah, cool. Skarbek, I would say for now let's definitely leave those, because in an incident situation you might still want those dashboards. I don't find them particularly helpful, because, you know, they're so difficult to get to the right workloads with, but there could be a situation where it's important.
E
Yeah, I think, along with whatever that rename is going to be, can we add a label to the chart, or to the dashboard, with just the word "kubernetes"? That way, if someone types the word kubernetes, they can quickly find it, because once they start typing the entire word, "kube" will disappear from the results.
E
I have another question. I have a desire to increase the observability of nginx, especially with the controllers, in terms of logging. Historically, we've never ingested our nginx logs into Elasticsearch, because they're so voluminous; same reason for HAProxy.
E
Looking at just the API service Rails log, we generate 327 million events in one day. What's the harm in having an additional 327 million events for the nginx service in Elasticsearch? I don't know how to evaluate whether that's going to blow up.
B
It's about cost, and I think we would need to discuss that with finance and Andrew and so on.
G
I also think, just for a little bit of extra history, that we did actually have those nginx logs at one stage in ELK. And we found, with the cost, and with the fact that nobody ever used them for anything, because they've only got a limited number of fields that you can't really configure that well (at least in Omnibus it's kind of difficult to configure them), there was never much in there that we used, so we stopped using it. What do you want to get out of it?
G
For me specifically: as opposed to the access logs (there are two separate streams in nginx), the error logs. The only thing that's a bit rubbish about the error logs, the last time I looked at them, is that they don't have a URL or anything; they're just kind of a stream of consciousness, like "didn't buffer this request", and then you've got to figure out which request it's talking about.
E
Okay. We've got a few issues related to logging. So, aside from trying to grab just the error stream, what if (because currently we don't do this) we send the data to BigQuery?
G
What about if, at a first level, we just put in the errors? Because we should be doing something with the error logs, right? I mean, it's actually a bit bad that we are just sending them to /dev/null at the moment, I agree. So what about if we focus on the error logs and not the access logs, nginx_error_log, and then put those in ELK? I don't think they're structured in any way, but, you know, that's fine.
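A minimal sketch of shipping just the error stream; the paths, tag, and the choice of fluentd are assumptions about the pipeline, but it shows the idea of tailing the error log while leaving the voluminous access log untouched:

```shell
cat <<'EOF' > nginx-error-log.conf
<source>
  @type tail
  path /var/log/nginx/error.log          # error stream only
  pos_file /var/lib/fluentd/nginx-error.pos
  tag nginx.error
  <parse>
    @type none                           # unstructured lines, ship as-is
  </parse>
</source>
EOF
```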
E
It's a good start. Okay, I'll find the issue and modify the acceptance criteria to account for just that, because that would be pretty good, I think.
E
I guess... my wife has been dealing with that primarily, and it's a little interesting. So I don't know the names of these cats yet; I think Robin is still trying to figure that out, but this is one of the Siamese-looking ones. You can tell it's got a little crusty-eye thing going on, but it's cute. It's adorable; it fits in your hand.