From YouTube: 2020-10-08 GitLab.com k8s migration EMEA
Description
Demoing the new multi-cluster configuration
A
See if I can find the right tab to…
A
The enabling of build logs is still progressing nicely, so it's been enabled on… We've got the change issue up for GitLab.com as well, so it's still hopefully on track for 13.5.
A
Mapping pods to Kubernetes for cloud native — I've seen there are still some comments going on on this one. I think it's going to be for 13.5, but I don't think it's specifically prioritized for that. Jeff, have you had any involvement on this one?
C
I mean, right now we definitely have enough work to do, but pretty soon we're going to run out of work, which is funny to say. Skarbek is working on the git ssh stuff in parallel to me working on the multi-cluster stuff, and as soon as both of these wrap up we're going to plow forward and want to migrate probably api next, or even web — and without the service mapping we can't do that.
A
Okay, good to know: we've got a new blocker.
B
I find that statement a bit hard to believe, looking at the epic, the number of issues in the epic, and all the epics actually related to the work you're doing. Do you mean that you're gonna run out of the next item to migrate?
B
Right — which is, I think, the next item that we are looking at. But looking at the kubernetes epics, there is a lot of work there.
A
Awesome. And we've got a new blocker, Prometheus metrics — this one is still being discussed; there's no plan for this yet.
D
I think we should have something in place. Whether we decide to implement a stopgap solution, which would be adding mtail as a sidecar container, or we wait for the maintainers of gitlab-shell to build us something — we should probably decide that soon, to determine whether or not this is a blocker or something we want to tackle ourselves.
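A minimal sketch of what the mtail-sidecar stopgap could look like — the image reference, log paths, scrape annotations, and program directory are illustrative assumptions, not our actual configuration:

```yaml
# Hypothetical gitlab-shell pod with an mtail sidecar: both containers share
# a log volume; mtail tails the log, applies its metric programs, and exposes
# the results on its default Prometheus port (3903).
apiVersion: v1
kind: Pod
metadata:
  name: gitlab-shell
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "3903"
spec:
  volumes:
    - name: shell-logs
      emptyDir: {}
  containers:
    - name: gitlab-shell
      image: registry.gitlab.com/gitlab-org/build/cng/gitlab-shell:v13.5.0  # illustrative tag
      volumeMounts:
        - name: shell-logs
          mountPath: /var/log/gitlab-shell
    - name: mtail
      image: mtail:3.0.0  # hypothetical image reference
      args:
        - --progs=/etc/mtail                             # directory of .mtail metric programs
        - --logs=/var/log/gitlab-shell/gitlab-shell.log  # log shared with the main container
      ports:
        - name: metrics
          containerPort: 3903
      volumeMounts:
        - name: shell-logs
          mountPath: /var/log/gitlab-shell
          readOnly: true
```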
C
Yeah — do you know yet, Skarbek, how much our current general service metrics depend on these mtail metrics? I haven't looked myself.
A
Cool. And then the cross-AZ stuff we'll chat about in a second, I guess, unless you want to talk about any bits before the demo.
A
Awesome. And then Pages, I think, continues on.
B
Yeah, there is currently a rollout happening in production. I think we rolled out — 20 percent of projects are now being served from the Zip implementation, so there is quite a lot of progress, and the teams are very much on top of it.
A
Great, that's good. I was wondering, with the Prometheus metrics issue on gitlab-shell: do we know if there are any other issues like that that we've opened in the past, that we'll get caught out by when we start migrating other things?
D
The only other one that we opened up so far was logging, and that's been merged into our Helm chart — it's just a matter of reconfiguring that. Outside of that, I need to do an investigation into our log details, because we're going to start mixing structured and non-structured logs inside of Elasticsearch, so I need to do an analysis on that. And then, lastly, there's the performance of gitlab-shell, which I've created an issue for, and a subsequent merge request, which subsequently got merged.
A
Okay, sounds good. So, Jarv, over to you.
C
So I don't think we're going to do any kind of live demo, because everything is set up and we're deploying to staging now. What I can show you is what it looks like. First I'll show you the auto-deploy pipeline: we're auto-deploying to the zonal clusters now in staging, and soon we'll be doing this in production — maybe even today, because the configuration is almost ready for production. So this is what it looks like. We now have four deployments.
C
We have zonal clusters for pre-prod as well. I'm thinking that after we get this all configured through production, we'll probably remove the pre-prod zonal clusters. It seems like a waste of money — we can always spin them up later if we want to, but I think staging gives us what we need, so we'll probably get rid of these pre-prod ones. But you can see we have all the production dry runs, and then we still have the apply for configuration; we manually apply production.
C
So now you have four jobs that you need to apply, which is not as nice, but I don't know — I think, Skarbek, we should probably converge towards having these production jobs run automatically, if we can set up some gating jobs to just make sure everything is working okay. I think we'll get there.
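A rough sketch of how those four manual applies could converge on an automated flow with a gating job in front — job names, the k-ctl entry point, and the check script are assumptions for illustration:

```yaml
# .gitlab-ci.yml sketch: one dry-run/gate/apply flow; the per-zone applies
# only run once the gating checks pass, instead of four separate manual plays.
stages: [dry-run, gate, apply]

.apply:
  stage: apply
  script:
    - ./bin/k-ctl -e "$CLUSTER" apply   # hypothetical wrapper around the config apply

gate-production-checks:
  stage: gate
  script:
    - ./bin/production-check            # hypothetical: fail if checks are red

apply-gprd-us-east1-b:
  extends: .apply
  variables: {CLUSTER: gprd-us-east1-b}

apply-gprd-us-east1-c:
  extends: .apply
  variables: {CLUSTER: gprd-us-east1-c}

apply-gprd-us-east1-d:
  extends: .apply
  variables: {CLUSTER: gprd-us-east1-d}

apply-gprd-regional:
  extends: .apply
  variables: {CLUSTER: gprd}
```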
C
Logs — well, Mary, do you just want to say… you're gonna…
C
Sure. So, logs: for now we don't really have a label in the log that tells you what zone or what region you're in. We do have the cluster id, which you can see here as soon as this loads. So you have these built in — basically this is extracted from the log that was parsed, and it shows you the kubernetes host, which does give you the actual cluster, and from there you can infer what the zone is. So you can see here.
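Roughly the shape of what's being shown — the field names and values are approximations of the indexed log document, not the exact schema:

```yaml
# An indexed log entry carries no explicit zone/region label, but the
# kubernetes host (node name) identifies the cluster, and the cluster
# name embeds the zone, so the zone can be inferred:
json:
  cluster_id: gstg-us-east1-b            # hypothetical value
kubernetes:
  host: gke-gstg-us-east1-b-node-abc123  # node -> cluster -> zone us-east1-b
```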
C
I think that's pretty much it, so I'll just go through the questions. Marin, do you want to start?
B
So what happens when one of those jobs fails?
B
Right, right — so what happens when four of them deploy and one fails for whatever reason? What kind of tools will we have at our disposal to take action? How do we take the cluster out?
C
Sure, yeah — "retry" is a pro tip for the people watching the recording. I think it's similar to what we have now: most of the reasons for failures, outside of intermittent failures that go away with a retry, have been the image tag not being available, and for those we typically have to dig in to understand why. It could be an upstream CNG failure that we didn't catch, and then we often have to redeploy.
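For the intermittent class of failures, a hedged sketch of letting GitLab CI retry automatically (the job body is illustrative); a genuinely missing image tag would still fail after the retries and need digging in:

```yaml
deploy-zonal:
  script:
    - ./bin/k-ctl -e "$CLUSTER" apply   # hypothetical deploy step
  retry:
    max: 2
    when:
      - runner_system_failure       # transient runner problems
      - stuck_or_timeout_failure    # jobs that hang and time out
```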
C
I don't think, to date, there has been any failure that we've had to manually go in and fix and then continue — but I guess that's possible. I guess your point here is that the chance of failure is multiplied by four now, and it's probably going to happen that we have some weird thing in one zone and not in another. I guess the good part is that the configuration for the zones is absolutely identical — there's…
C
No — in fact it's not even possible right now to make per-zone config differences, because we just don't need to. So I don't anticipate any config drift between the zones, but there's always the possibility that there'll be problems. Maybe there is a use case for continuing when there is a zone failure, right? Like, I…
E
I don't know — if one zone is offline you just lose one fourth of the capacity, but you should be able to go ahead; otherwise it makes no sense. You add a fourth zone because you want to divide the chance of having an outage by four, but if just one of them failing is enough to block everything, you've just got four points of failure, and you're getting the opposite of what you want.
B
Right, but those are the extremes, right? You have success and failure: all of them working — that is fine; failure of one — okay, one is down, you have the others. I'm more concerned about the in-betweens. Those are the more dangerous ones, where something is being reported as healthy but it's actually not working, spiking errors left and right. So maybe the pipeline passed, everything passed, but we have a failure in one region.
C
I think what we need to have here is production checks targeted towards the zone. We don't really have that capability right now — these aren't separate stages, they're zones, and the only way to differentiate metrics is by the cluster name. So maybe we could have a check per zone.
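A sketch of what a per-zone check could look like as a Prometheus rule, assuming metrics carry a `cluster` label as described — the metric name and threshold are illustrative:

```yaml
groups:
  - name: zonal-deploy-checks
    rules:
      # Fires per cluster, so a single bad zone shows up even when the
      # fleet-wide aggregate still looks healthy.
      - alert: ZonalErrorRateElevated
        expr: |
          sum by (cluster) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (cluster) (rate(http_requests_total[5m])) > 0.01
        for: 5m
        annotations:
          summary: "Error ratio above 1% in cluster {{ $labels.cluster }}"
```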
C
That would be nice, but this isn't really — I mean, I don't consider this being done for availability as much as for cost right now, at least. And I think there's an argument to be made: if there is a zonal outage, wouldn't we probably not want to do deploys anyway? I could see cases where we might need to, but in that situation I think we would probably wait until the zonal outage was cleared before we continue.
B
Your remark about waiting something out — well, again, this is a very complex system, and it doesn't have to be a zonal outage to cause issues. It can be…
B
…Kubernetes upgrades gone wrong for some reason; it could be an application deployment failing mid-deploy. And I know that Helm will roll back the release, but say it rolls back halfway — which I've seen myself — it leaves this undefined state.
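One way to surface that undefined state rather than sail past it — a hedged sketch of a post-deploy guard that checks the Helm release status (release and namespace names are illustrative):

```yaml
verify-release:
  stage: gate
  script:
    # Helm 3 reports the release state; anything other than "deployed"
    # (e.g. "failed" or a stuck "pending-rollback") fails the pipeline.
    - status=$(helm status gitlab --namespace gitlab --output json | jq -r '.info.status')
    - test "$status" = "deployed" || { echo "release status: $status"; exit 1; }
```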
C
What we could do, or what we could consider doing: the first thing you would do is drain the zone and put the zonal endpoint in maintenance on HAProxy. Then, as a pre-step to the deploy, we could check whether the zone is in maintenance, and if it is — similar to what we do for canary — we could skip it. I'll open up an issue for that.
C
I think that's a pretty easy thing for us to do, and in that way it's very similar to canary.
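A sketch of that canary-style pre-step — the helper for reading HAProxy server state is hypothetical:

```yaml
deploy-gprd-us-east1-b:
  stage: apply
  script:
    # Skip the zonal deploy when its endpoint has been drained into
    # maintenance on HAProxy, mirroring how canary deploys are skipped.
    - |
      if ./bin/haproxy-server-state --backend us-east1-b | grep -q MAINT; then
        echo "Zone is in maintenance on HAProxy; skipping deploy."
        exit 0
      fi
    - ./bin/k-ctl -e gprd-us-east1-b apply
```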
E
Yeah, so my question here: when we started the Kubernetes thing it was just one or two satellite services, and now we are migrating more and more stuff. What happens here is that basically we upgrade with Helm — no, we don't change configuration, actually; we just change the image version and let Helm roll everything out. So what…
E
What will this look like when everything has been migrated to Kubernetes? Because we do a lot of work in terms of understanding whether something is wrong mid-deploy and trying to roll back — how does that tie into this? Because we are really thinking about the old deployer, where we have Gitaly first and then all the other things, and we do the rollout in batches, which is…
C
Yeah, okay — outside of writing an operator, what's happening here with at least the current configuration is that this bottom job contains everything: Sidekiq, registry, mailroom, and then other stuff that's unrelated to gitlab. And then for these three clusters we only have git https; eventually we'll have git https, git ssh, web, and api, because those are the high-bandwidth services. We might also have registry — we have to decide to do that still.
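A hedged sketch of that split expressed as per-cluster Helm values, using the GitLab chart's enabled-flag convention (exact keys and our real values may differ):

```yaml
# --- regional cluster: everything except the high-bandwidth front ends ---
gitlab:
  sidekiq: {enabled: true}
  mailroom: {enabled: true}
  webservice: {enabled: false}
  gitlab-shell: {enabled: false}
registry:
  enabled: true
---
# --- zonal clusters: git-over-https only for now ---
gitlab:
  sidekiq: {enabled: false}
  mailroom: {enabled: false}
  webservice: {enabled: true}     # serves git https via workhorse
  gitlab-shell: {enabled: false}  # git ssh comes later
registry:
  enabled: false
```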
C
We could reorder these so that, you know — maybe we would upgrade one of these zones first: just do git, api, web, registry first, do some checks, and then we would upgrade the rest. Is that what you're thinking about, or not? I mean, it is sort of like…
E
So this is kind of what I was thinking about, but there are also other things. For instance, we deploy Gitaly first — so when we have everything on Kubernetes, how will we do this? Will the chart take care of it? Also, in the current work with Ansible we have the granularity of deploying each service ten percent at a time, and then we move on, and we move on again, and we are already working on making a check in between — right, so ten percent, check status…
E
At least, I'm not aware of how to do this, because with a Deployment you can set the rollout strategy — you can say, yeah, just take down 25 percent and do the magic — but doing something in parallel is kind of tricky, right, because you have to understand where you are in the process. Maybe you can just inspect Kubernetes and eventually roll it back. So that's my question.
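For reference, the declarative knob being referred to — Kubernetes drives the batches itself, unlike the Ansible ten-percent-then-check loop (names, replica count, and image tag are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webservice
spec:
  replicas: 16
  selector:
    matchLabels: {app: webservice}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%   # never take down more than a quarter of the pods
      maxSurge: 25%         # allow a quarter extra while the new version rolls in
  template:
    metadata:
      labels: {app: webservice}
    spec:
      containers:
        - name: webservice
          image: registry.gitlab.com/gitlab-org/build/cng/gitlab-webservice:v13.5.0  # illustrative
```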
C
Well, I think it's going to be maybe not as nice as what we're thinking of with Ansible, with the ten batches, but we'll be able to at least do checks after these zonal deploys, which we'll call waves or whatever — we'll have wave one, wave two, wave three, and you'll have three checkpoints. Maybe we'll add another regional…
C
What I imagine is that we'll have another regional cluster in a different region at some point, and it could be that we do this one region at a time. Okay, so…
C
But until then, I think that's our best option, and the way I think it'll work with deployer is that we'll have a release-tools pipeline that will trigger an Ansible pipeline for Gitaly. I have a sneaking suspicion that we're probably always going to have Omnibus for the purpose of validation — maybe we'll have a very tiny Omnibus fleet for gitlab.com, just because we can dogfood, maybe not — and then, following that, we'll trigger the Kubernetes deployment from the release-tools pipeline.
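A sketch of that sequencing, using GitLab CI's cross-project triggers — the project paths and job names are illustrative assumptions:

```yaml
stages: [gitaly, kubernetes]

deploy-gitaly-via-ansible:
  stage: gitaly
  trigger:
    project: gitlab-com/gl-infra/deployer   # hypothetical path
    strategy: depend   # block until the downstream Ansible pipeline finishes

deploy-kubernetes:
  stage: kubernetes
  trigger:
    project: gitlab-com/gl-infra/k8s-workloads/gitlab-com
    strategy: depend
```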
E
Database migrations and things like that — but now I'm going to change the question, though it's still based on this. How can a customer do a zero-downtime deployment with the Helm chart if we ourselves would not do it with the Helm chart? I'm asking because we support zero-downtime deployments with Omnibus, and I kind of expected that if someone wants to run things on Kubernetes we'd have the same. For now, this…
B
That is not our concern at this moment in time. I'm more concerned about ensuring that we are in a state where we have control, where we have gitlab.com figured out. Zero-downtime deployment is something Distribution will have to do, and they'll have to come to us for expertise and help. The reason why I'm saying this is that I am concerned about the enormous level of complexity of what we have here right now — to introduce yet another variable…
B
And you see how much work we did as an application to actually get to this stage, right? If we had waited for an operator, we would have been automating something that we don't know what we're automating.
B
Although this conversation between you and Jarv actually makes me think that we have to write this down — we have to write these thoughts down, these options down. And I think after — Jarv, Skarbek, I guess — after you finish this zonal and regional cluster rollout and you wrap up the git traffic migration, prior to doing the web and api fleet…
B
We
have
to
sit
down
and
write
out
a
a
blueprint
of
what
we
actually
currently
have
an
architecture
overview
that
is
going
to
be
a
mess
and
and
then
options
of
how
we
can
address
some
of
these
items
that
you
raised.
B
So
I
like
this
fact
that
we
now
with
zonal
clusters,
don't
have
to
think
too
much
about
helm
level
right
like
if
it's
only
helm
level
that
would
be
mass.
We
have
options
of
just
introducing
new
clusters
right,
like
add
three
more
clusters
deploy
to
them,
get
them
out.
That
sounds
a
bit
old-fashioned,
but
it
is
what
it
is
so,
but
I
think
we
need
to
have
this
written
down.
I
I'm
struggling
myself
to
follow
complexity.
I
don't
really
know
how
all
of
you
are
following
it.
B
Yeah, my next item is the configuration window that you showed — something there kind of stuck with me: there is no "play all" button for some reason. Is it because it's a mix between manual and — I…
C
Yeah, I think we'll just get rid of the manual play. I think we'll just adopt the same production check that we have for the other pipeline and use it here.
C
We already have QA on this pipeline as well. So wait — did we have QA on this pipeline, Skarbek? I thought we did.
C
Oh no, we have it, yeah — we have it for gitlab.com configuration changes, but not for the GitLab Helm files. I don't know whether QA really makes sense for this — I mean, it's… there.
C
Except doing something that interferes with, you know, other namespaces, I guess. I…
C
Yeah, I still kind of like doing the production check, because what you don't want to do is make a configuration change when there's an ongoing incident — even if the incident has nothing to do with what you're changing.
B
Yes — I was looking at the logs. So we now have this kubernetes host, and people need to figure out which region things are coming from. I know we discussed all this, but…
B
Any plans on writing something smaller, or going through training sessions with SREs — with, well, actually everyone? It's not only SREs, because the more we move, the more people will have to depend on this, and now that we have four different clusters it'll become really important that we can explain where you go to look for things, especially as a result…
C
Yeah, I think the really nicest thing about this is that we now have this container image field, so for every log line you get to see what version was running when the log was emitted. That's nice. There's nothing unique here for zonal clusters — these fields have remained the same — but I think, as we move more services into Kubernetes, we probably could use some better logging documentation and tutorials.
B
If they can't figure this thing out themselves, then we are in a big problem — but it's not only about them, it's about everyone else. If it means that we need to pause the migration for a month to actually get some hands-on knowledge to everyone else, we have to do that. This is again becoming really, really complex, and we haven't — knock on wood — had any real big problems, but with four clusters now the chances are much higher.
C
Cool — Amy?
C
Cool, okay — that's it for my part. Skarbek, did you have something you'd like to demo?
B
Let us know what's not working, and we can look together and see if we can debug it.
D
So what's happened is that we've pulled this image from a different release that has the pull configuration that provides the credentials to the dev registry. For some reason this is missing on the zonal cluster, which means it's actually missing inside of our gitlab configuration — inside of our Helm configuration somewhere. I thought I'd fixed this, but maybe something got reverted, because there was a massive rebase I did and I screwed things up a little bit. But anyways.
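The missing piece being described, roughly — a registry credential wired in as an image pull secret; the names and image path are assumptions for illustration:

```yaml
# A docker-registry secret holding dev registry credentials...
apiVersion: v1
kind: Secret
metadata:
  name: dev-registry-credentials
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: "<base64-encoded docker config for the dev registry>"  # placeholder
---
# ...referenced by the pods pulling the image; without it, pulls from the
# dev registry fail exactly as seen on the zonal cluster.
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  imagePullSecrets:
    - name: dev-registry-credentials
  containers:
    - name: gitlab-shell
      image: dev.gitlab.org/gitlab/charts/components/images/gitlab-shell:latest  # illustrative
```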
D
Yeah, so this is going to be the next thing after this meeting. But anyways — the item I wanted to talk about was the plan for catch NFS. I had a discussion with Amy yesterday about this, and I think, moving forward, we have around 16-ish queues that are blocked — some of them we know are waiting on build traces.
D
So
when
we
know
that
we're
waiting
on
pages
and
there's
three
of
them-
that
we
don't
really
know
why
they're
blocked
on
us-
and
it
looks
like
the
dev
teams
that
are
work
are
pulling.
These
issues
are
trying
to
figure
that
out.
So
I
think
my
plan
of
action
here
is
take
those
to
be
determined
items
and
push
them
into
catch
nfs
to
help
the
investigation
as
necessary.
D
Now, Marin — in case anyone had any opinions or thoughts about proceeding forward with this task, just to help engineers with the work.
A
Cool. But yeah, since we're not going to be using it — catch NFS is empty now, right, because we've migrated? Correct?
D
Yeah — so it's definitely got visibility. I think they just need a little bit of help identifying what our end goal here is, because I may not have made it clear that this is shared storage that we're eyeing, that we're concentrating our efforts on. And it sounds like there's a little bit of confusion as to whether or not it's dependent on shared storage, or whether it's dependent on the fact that it just needs to write data to disk temporarily.