From YouTube: 2021-04-15 GitLab.com k8s migration EMEA
B: I'm going to assume there's not a demo, so talk us through how the API service is going.
D: It's great: it's deployed in canary, but it is not currently taking any traffic. We have a configuration item that is missing. This was an object that was producing a lot of errors in staging, so we had to prevent staging from taking traffic until we had some fixes in place. Unfortunately, the fix that I created was kind of misguided in how it took into account the use of Geo, so this created a problem in production when we tried to roll it out.
D: It created a situation last week where this was, at first, making a lot of requests to the API service and they were getting blocked by Rack Attack. So due to that, we rolled the change back.
D: We were also saturating our Cloud NAT pool for just the registry nodes, because we were making way too many connections back to our API fleet. This is because the registry was making connections to the API for notifications for the Geo service, which we don't use on GitLab.com. So I worked with Jason a little bit at the very end of yesterday on a fix. That's the issue, and there's a merge request associated with it.
D: It's currently ready for review. Once that gets into place, it would remove the workaround that we are using to get this configuration in place, and then we could start taking traffic in canary.
D: So this isn't something that customers would run into; it's just down to the way that we decided to implement this particular feature of getting metric data into Snowplow. So we're almost there. In the meantime, I finished up the readiness review yesterday and sent it off to three groups of people: a couple of SREs, Stan from development, and a couple of members of security, to get an overview of the readiness review and see what holes we have in our documentation and such. So I'll be following up on that information.
D: As the reviews come in from people. I've already got some information from Stan, as well as the security team so far, so that's helpful.
B: And did you add a priority label onto this review? You have? Great.
D: It's immediate, but I'll be waiting until this gets merged and our Helm chart gets upgraded and pushed out through canary. The only thing I'm going to be checking for is the configuration item existing inside of the gitlab.yml file. That's how I'll know that we're good to go, and as soon as that's done I'll start working on the necessary items to start sending traffic, which in this case is just modifying the weight values of the canary environment in HAProxy.
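(As a rough illustration of that check, not the team's actual procedure: one way to confirm the configuration item has landed is to look for the key in the rendered gitlab.yml inside a pod. The file path and key path below are placeholders, not the real setting being waited on.)

```python
# Rough illustration only: check that a configuration key is present in the
# rendered gitlab.yml. The file path and key path are placeholders.
import yaml


def config_item_present(path, key_path):
    """Return True if the nested key exists in the YAML file at `path`."""
    with open(path) as f:
        node = yaml.safe_load(f)
    for key in key_path:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


# Hypothetical example: has a registry notifications block been rendered?
print(config_item_present("/srv/gitlab/config/gitlab.yml",
                          ["production", "registry", "notifications"]))
```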
B: Super, great stuff, nice. Then in terms of canary: I checked in with Graham yesterday and he is still working through the "service discovery sometimes fails inside Kubernetes" blocker at the top; I should move it down.
B: So he's hoping to have some experiments that we can run to try and progress this one further before we become blocked; that's basically the goal, to get this done ahead of us wanting to move ahead with canary. But at the moment we are still a bit in the dark about what's going on with this one.
D: About this: I would like to try to figure out if there's a way to determine how many requests we are making versus how many requests are failing, for whatever reason. But I don't know how to create the necessary charts to do that, so I was wondering, Andrew, if you might be able to help me out with this, because we know how service discovery works: we ping Consul every 60 seconds.
D: Every pod is going to do that every 60 seconds. I did leave some open questions that should relate to this, but I would like to be able to figure out how many requests we are making, because we don't log the successes, we only log the times that we fail.
D: So if we can figure out how often we are successfully making those requests, we could derive the ratio of how many times we fail to make the service discovery request, and then we can determine how often this is an actual problem for us. But I also have some open questions related to how service discovery works, because I don't know much about it. I don't know.
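(A back-of-the-envelope sketch of the ratio idea described above, assuming the once-per-pod-every-60-seconds model and that only failures are logged; the numbers are illustrative, not measurements.)

```python
# Back-of-the-envelope sketch of the failure-ratio idea. Assumptions, not
# confirmed in the meeting: each pod attempts service discovery once every
# 60 seconds, and only failures are logged, so total attempts are estimated
# from the pod count and the polling interval.


def service_discovery_failure_ratio(pod_count, logged_failures,
                                    window_seconds=3600, interval_seconds=60):
    """Estimate the fraction of service discovery attempts that failed."""
    estimated_attempts = pod_count * (window_seconds / interval_seconds)
    if estimated_attempts == 0:
        return 0.0
    return logged_failures / estimated_attempts


# Illustrative numbers: 100 pods over one hour with 12 logged failures
# -> 12 / 6000, so about 0.2% of attempts failing.
print(service_discovery_failure_ratio(pod_count=100, logged_failures=12))
```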
E: So, I'd need to confirm this, but I'm pretty sure that if it doesn't, in fact, I know that if it can't discover, it goes to the primary, because there's that other bug about Sidekiq starting off with traffic only going to the primary and then sort of failing over to using the replicas. So that's definitely how it works, but the way that service discovery works is not by pinging. It's by... yeah.
E: Yeah, I mean, I haven't done a lot of this, but my initial reaction is that that would be better, because just because that node, that agent, can get to that secondary doesn't mean that you can. So you're going somewhere else on the cluster and asking, you know, what can you see as available right now, and then you're using that, where you should rather be doing that locally, I would imagine. And it's less surprising as well.
D: Well, that was the problem. We've been trying to figure out whether it's due to a pod being rotated, or maybe a node being taken out of rotation at just the right moment when a request is coming in, but so far we've been unable to determine that. One of the things that Graham did find is that the Calico network service might be misbehaving on some of our nodes, and that would impact the ability to send traffic to the appropriate pod.
B: Okay, so number four. Andrew, lucky you, and thank you for helping us out with the observability stuff. I thought it might be useful to start off by just getting some thoughts around where we are with this and what we'd ideally like to be seeing. We had some sort of initial work on the epic, but those pieces were very much tied, I think, to where we were in the migration, rather than being about overall Kubernetes observability.
B: So you're completely welcome to scrap this and put in a completely fresh load of stuff. But whilst we have everyone on the call: what do we want to do with observability?
E: So I haven't had a chance to look at the issue, or the epic, yet, but in my mind the thing that I was thinking is really important is having first-class monitoring for node pools as their own thing. They're going to be a little bit different, because everything else that we have in our world is in the service hierarchy.
E: So, you know, we have services and stages and shards, and everything rolls up to a service, and that was true in the VM world as well: each service was on a different type of node. Now the node pools are mixed a little bit, so some services have two node pools, some node pools are shared, and we have the bug where some random jobs are just going anywhere in the Kubernetes cluster.
E: It's a many-to-many between services and node pools, so I think we should just monitor those as their own thing, with their own set of dashboards: how is the health of my node pools, so there's a kind of high-level overview, and then you can go down into a single node pool and see the health of that. So that's the node pools, and then, building on the work that we've already done for monitoring:
E: I think we should just make sure that it's working properly everywhere. Like the other day we got plant email fixed; we should make sure that that's there. We need to get the autoscalers into our monitoring so that we can see autoscaler saturation, and then, sorry, back on node pools, we also have node pool saturation, and then all of the graphs that we've got on those Kubernetes detail dashboards at the moment.
E: There's a lot of places where we're just seeing error, error, error, CrashLoopBackOff; a lot of pods fail for strange reasons and we're not reporting on that at all at the moment. So we need much better alerting around that.
E: I don't know what you think about this, but I was thinking that we almost treat it like an SLO: the number of pods that fail divided by the total number of pods that were created, or containers or whatever, because you don't want to alert on a fixed threshold, or on, you know, oh, a single pod failed.
D: Right, you don't want to report that a single pod has failed, because there could be any number of reasons a pod fails. It could be running out of memory, or we could have killed it because we configured our project exporter, or importer, no, it is the exporter, to make sure it's not using up too much disk; if it picks up a job and tries to write 30 gigs of data, we're going to kill that pod. So it would be unreasonable for us to fire an alert or page on something like that.
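(A minimal sketch of the SLO-style ratio being proposed, failed pods divided by pods created over a window. The counts would come from a metrics source such as kube-state-metrics, and the 5% threshold is an arbitrary example, not an agreed alerting rule.)

```python
# Illustrative sketch of the proposed check: alert on the ratio of failed
# pods to pods created over a window, not on any single failure.


def pod_failure_ratio(pods_failed, pods_created):
    """Fraction of pods created in the window that ended up failing."""
    if pods_created == 0:
        return 0.0
    return pods_failed / pods_created


def should_alert(pods_failed, pods_created, slo=0.05):
    """Fire only when the failure ratio exceeds the example SLO, so a single
    OOM-killed or deliberately killed pod does not page anyone."""
    return pod_failure_ratio(pods_failed, pods_created) > slo


# Example: 3 failures out of 400 pods created -> 0.75%, under a 5% threshold.
print(should_alert(pods_failed=3, pods_created=400))
```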
E: And at the node pool level, I've seen quite a few OOMs and kind of weird stuff that we're also not tracking at the moment. On those node pools we should probably make sure that we are, firstly, monitoring load, and that issue that I linked there was the CPU scheduling one, which is outrageously high and we still don't understand why, so that stuff as well. But what about CrashLoopBackOff?
B: What do you need for this, Andrew? What's your kind of plan around taking this epic and getting it to, I guess, amazing monitoring?
E: So I'd appreciate reviewers and feedback, and I will start on grooming that epic and going through it. Mostly reviewers, but then also, yeah, not just reviews, but feedback on how we're doing it, and then we can get that going, and get the alerts going properly for that as well. And then also, I think, it'd probably be quicker.
E: Yeah, okay, cool. So then mostly, I guess, it's just figuring out if there are more labels that we need on anything and getting all that done, and then trying to figure out a plan for, I guess, oh yeah, a plan for: are we just going to set up this Kubernetes service that we've spoken about a few times? Is that going to be what we attribute, say, a node pool being saturated to, because we try to do everything at the level of a service?
E: So when an alert comes in, nearly all of our alerts say which service it is, and then, you know, technically you could say, well, it's the data stores team's alert. But because the node pool doesn't have a service, we either need to create a new service called something like kubernetes, and then I guess the ownership of that would be the delivery team, or we do some sort of inference where we say, well...
A: ...this per service, where we are in danger of reaching some limit, we could still maybe generate something so that we can include it in the service dashboard.
E: Yeah, so for the HPAs, for example, those will definitely be attributed to a service. So if an HPA is maxed out, that won't be paging the Kubernetes service, that will be paging, say, Git, and saying, you know, we've got no more Git pods, or Sidekiq, or whatever. For a node pool it will probably be different: if we kind of maxed out the node pool, then the alert will go to kubernetes.
E: I think, from my point of view, I would imagine over time that we're going to get fewer and fewer node pools. Instead, the node pools will just be kind of different types of machines, and as we go towards that, having this very rigid node pool per service is going to become more difficult.
D: We're probably going to try to figure out how to do a cost analysis and resource analysis to figure out how we could lower that bill at some point in time in the future, so that makes sense to me.
E: There's a related thing that I've noticed: the reason we had to go away from the six-hour burn rate on Thanos was not because of the number of pods, but because of the number of pods that started and stopped through the day. And we don't know, I don't have any data on this yet, but between, like, 9am in Europe and, say, 5pm in Europe, I don't imagine that you're going to need to recycle your entire fleet, because the traffic is just kind of getting higher and higher.
D: At the moment we definitely churn very heavily, and I think there are two ways we could go about adjusting this. One is modifying the way that we scale, just using a different metric, so that we slow the rate at which we change. The problem that introduces is that if, for whatever reason, we suffer, say, an outage, or there's just lower traffic for whatever reason, we will be slower to scale upward during high demand.
D: So I think it would be great if we had the ability to scale on a custom metric. That way we leverage the ability to scale up and down based on some metric that we define, so that there's always pod availability and we're not resource starved, and vice versa, we're not over-utilizing anything, and that would reduce our churn rate. Because then we'd have a smooth curve of pods starting and stopping throughout the day, versus this jaggedness of starting and stopping pods constantly every three minutes.
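(A minimal sketch of what scaling on a custom metric could look like, written here as a plain autoscaling/v2beta2 HPA manifest built as a Python dict. The deployment name, metric name, target value, and replica bounds are placeholders, not anything the team has decided on.)

```python
# Illustrative only: an HPA that scales on a custom per-pod metric rather
# than raw CPU, one way to smooth the pod churn described above.
import yaml

hpa_manifest = {
    "apiVersion": "autoscaling/v2beta2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "gitlab-webservice", "namespace": "gitlab"},
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "gitlab-webservice",
        },
        "minReplicas": 10,
        "maxReplicas": 80,
        "metrics": [
            {
                "type": "Pods",
                "pods": {
                    "metric": {"name": "requests_per_second"},
                    # Aim for roughly 100 requests per second per pod.
                    "target": {"type": "AverageValue", "averageValue": "100"},
                },
            }
        ],
    },
}

print(yaml.safe_dump(hpa_manifest, sort_keys=False))
```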
D: Amy, back to your original question about this particular epic, something that I find missing, maybe it's somewhere else, but I know you had a conversation with Graeme about our observability into Helm and deployments.
B: Upgrades, yes. Let me just find this. So yeah, he's thinking about this, and at the moment I think it's quite likely that once we've done the API service migration, in the weeks of tidy-up after that, he'll take a look and see if this is a big job. So, okay, that is the issue that it'll all go on to, so yeah, he is thinking about it.
B: The starting point was the Helm 3 logs, the fact that we didn't have the logs; now we have Helm 3. But actually, I think he's thinking about whether this is slightly bigger, and whether the order of, or the way we do things in, the pipelines related to this issue, 721, is actually the right approach. So I don't know how big I think that is.
B: I don't quite know what changes he wants to make in there, but yeah, certainly we need to get back that observability that we lost with Helm 3.
B: Perfect, cool. So, Andrew, feel free to just give us a shout, either in Slack or on the issues or in these demos, if you need input or need help with any of this stuff.
B: So the final thing, I was just curious about this one. It can be a quick one, because I know we have it written down as well, but I'm just really curious about the pre-deployment issue we had following the registry changes.
B: I thought it might be of interest and useful for other people to hear, maybe not specifically why this one was a problem, but more generally, what sort of problems did we see here, and do we need to do anything special to make sure we avoid these sorts of deployment problems?
D: But I didn't update our checking mechanism to say it's okay that there's a change to the API deployment.
A: Yeah, I mean, the ugly thing here is that for our DB migrations and for registry we spin up a Job instead of spinning up an init container, and the Job will always be a new object, so it's always seen in the diff. With each deploy we will see a new diff just for creating a new Job, which is expected, but it's not nice, because, you know, it's nicer to just see a diff with no change.
A: If you just, you know, run the usual DB migrations. So if it were implemented as an init container in the application instead of a Job, I guess we would not see any diff, and then you wouldn't have this issue, but I'm not sure of the complexities for other use cases, for self-managed customers and things.
D: The problem with doing it in an init container is that it's going to get run for every single pod. So if we're running 80 pods in production, for example, and I have no clue what we do run today, you're going to run that migration script 80 times, which is not reasonable. So it makes sense that it's configured as a Job, and that's exactly how our migrations would work if we were using them inside of our Helm chart for our Rails application, but we have that disabled.
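(To illustrate the distinction being described: a one-off migrations Job versus an init container that runs in every replica. Object names, images, and the migration command are placeholders, not the actual chart templates.)

```python
# Illustrative contrast between the two approaches discussed above; names,
# images, and commands are placeholders, not the real chart objects.
import yaml

# A Job is a separate object created once per deploy. Helm shows it as a new
# object in the diff every time, but the migration itself runs only once.
migrations_job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "registry-migrations-1"},
    "spec": {
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "migrations",
                    "image": "registry:placeholder",
                    "command": ["/bin/registry-migrate"],
                }],
            }
        }
    },
}

# An init container runs before the main container in every pod replica, so
# with 80 replicas the same migration command would execute 80 times.
deployment_with_init_container = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "registry"},
    "spec": {
        "replicas": 80,
        "selector": {"matchLabels": {"app": "registry"}},
        "template": {
            "metadata": {"labels": {"app": "registry"}},
            "spec": {
                "initContainers": [{
                    "name": "migrations",
                    "image": "registry:placeholder",
                    "command": ["/bin/registry-migrate"],
                }],
                "containers": [{
                    "name": "registry",
                    "image": "registry:placeholder",
                }],
            },
        },
    },
}

print(yaml.safe_dump(migrations_job, sort_keys=False))
print(yaml.safe_dump(deployment_with_init_container, sort_keys=False))
```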
B: I think it does, yes. Is there anything we need to do in the future? Are there any, I guess, things we need to watch for, anything we can make easier in the future, so that we can avoid these things?
A: I think we need to remember, when we add new services to Kubernetes, that we also need to adjust this filter that we have. So each time we add something new, like if we add canary to the API, or API in gprd later, we need to adjust this filter to include that stuff to be filtered too. We shouldn't forget about this, so this is still manual, yeah.
B: Do we already have a checklist or something that reminds us that when we add a new service to Kubernetes we need to do these various things?
A: Yeah, I mean, it's some kind of tribal knowledge, I think. I know that Scarborough explained this to me at the beginning, and I forgot about it again. Then I wondered where the second one was coming from and why it was complaining, and then Jarv reminded me again: oh, we have this filter, and right, there was this one thing. I mean, we don't do this very often, right, adding new services, so for several months nothing will happen, and then you do this and then it pops up again.
A: So it's not hard to fix, but it's still something that you need to think about here.
B: Yeah, okay, well, maybe we should... I'll open an issue. I think we should probably start thinking about what the things are that, when we add a new service to Kubernetes, we want to make sure we've always updated or checked, so that we don't have to try and remember all of these individual pieces.
B: Awesome, thanks for going through that. Is there anything else that anyone would like to talk about today?
B: No? Okay. You said this would be super quick and it wasn't, but it was excellent, it was a great discussion. So thanks everyone for coming along today, and good luck. Let's get on with the next step of canary.