From YouTube: 2020-07-10 GitLab.com k8s migration EMEA
Description
Delivery team weekly demo of the GitLab.com Kubernetes migration
A
Job, she won't give us a demo.
C
So the problem we have right now in staging and production is that we've lost the certificate authority key. The certificate authority key was generated years ago by a former employee, and that former employee does not know where it is. So what we did is turn off TLS verification in staging and production, and that's the current state of things.
C
So what I think we probably want to do is regenerate these keys and re-enable TLS, and this will allow us to get Consul running in the Kubernetes cluster. I think we could maybe even do it without doing those steps first, but it's probably something we should fix anyway. I put some instructions here, because I just ran through this for pre-prod, on how to generate the keys: generate the server keys.
C
The client keys, too. I put notes here about how you can choose how long the expiration is: by default it's one year, which is not very long, so maybe we should increase that. We probably also want to come up with the domain, and Consul also has this concept of a data center, which is currently set to us-east1 for pre-prod and staging, and to something different for production; it seems like it's just a random string. I don't know if we want to change it.
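Presumably the instructions boil down to Consul's built-in TLS tooling; a minimal sketch, assuming the us-east1 datacenter name mentioned above, with the longer expirations as illustrative placeholders for raising the one-year default:

```shell
# Recreate the certificate authority (and keep consul-agent-ca-key.pem safe this time)
consul tls ca create -days=3650

# Server certificates for the us-east1 datacenter, with a longer-than-default expiry
consul tls cert create -server -dc=us-east1 -days=1825

# Client certificates for the agents
consul tls cert create -client -dc=us-east1 -days=1825
```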
C
So we're using the HashiCorp Consul Helm chart. It's actually pretty simple, because we're just using what they provide out of the box. Once you have it correctly configured, you can see that these are the pods for Consul. You might be thinking, wow, we have a lot of replicas here, and why is that? Well, if I use the wide option here, you can see that it just so happens we have a lot of nodes, because we have a lot of node pools.
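What's being shown is presumably the standard kubectl view; a sketch, where the consul namespace is an assumption:

```shell
# "The wide option": list the Consul pods along with the node each one landed on
kubectl get pods -n consul -o wide

# One client agent per node, so the replica count tracks the node count
kubectl get nodes
```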
C
We don't care so much about pre-prod, because we don't really rely on Consul for anything in pre-prod at the moment, since we don't have Patroni in pre-prod. What we'll see, I can pull up the UI for Consul, and actually I think it'd be really nice to get the Consul UI, like, accessible from the outside.
C
I don't know, yeah, I was thinking we would just put it behind IAP, but that's also, that's a good point, the danger. I don't think we want everyone to log in and have access to Consul. So we have to figure that out. You could have, like, a group for SREs for it, or, I don't know, yeah, but anyway.
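In the meantime, reaching the UI presumably means a tunnel rather than a public ingress; a sketch, where the namespace and service name are assumptions about what the chart creates:

```shell
# Reach the Consul UI without exposing it publicly
kubectl port-forward -n consul svc/consul-ui 8500:80
# then browse to http://localhost:8500/ui
```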
C
So here's what it looks like after this; this is what you see. We can connect to Consul. We're actually, like, a minor version behind on Consul.
C
So we may want to consider upgrading at some point, but I don't know if we want to do that now. These are the services that are running. I don't know why we have these services defined, honestly; I guess we maybe have a service that is looking at, like, this check for Unicorn. These are VMs, by the way; this isn't for Kubernetes. Where you can see Kubernetes is if I go to nodes: now you see all of these GKE nodes here that are reporting in to the Consul server.
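The same membership view is available from the CLI; a sketch, assuming you can exec into one of the agents (the pod name is a placeholder):

```shell
# Every GKE node runs a client agent, so each one shows up as a member
kubectl exec -n consul consul-server-0 -- consul members
```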
C
I'll show this again, so yeah. So here are the replicas, and you can see, yeah, we have one for every node in every node pool. The reason why there are so many is because we have all of these node pools, which consist of a lot of nodes. I would say we're a bit over-provisioned because of that, yeah.
C
I don't know, like, maybe... I can't really say. I would think, like, maybe internal API and public API, maybe you'd want those to be isolated into separate node pools, but maybe not, I don't know. I think one thing I'd like to discuss is moving registry into its own node pool, so it isn't on the default node pool, because it's a memory hog, and I think it would be nice to isolate it.
B
You know, the first time we actually have an incident where we don't automatically point towards the Kubernetes cluster as the point of failure, then we can start talking about us feeling comfortable about running this. We know it a bit better now, but it was way too easy for people to just point to the cluster when things were actually clearly broken somewhere else. That means we are not in a comfort zone, right? We are not at the point of just being able to claim that, you know, and that's...
D
I have a couple of questions that come from the grand vision of using it. As I understood it, we are using this only in production, really, as service discovery for Patroni. So that is the only reason why we have Consul installed, which covers maybe less than 10% of Consul's features. It's...
D
Maybe the reasoning is that, in a production deployment, Consul, when it's replicated and is a distributed system, is a good place for storing this kind of information, which is transient but kind of defines what the status is, and which is external to the main GitLab product, because it's something that we need to operate the infrastructure. So this basically was my question: whether we are already using it for something like that, or it's just...
F
We don't use it for that, but, like, the one thing about that is, obviously, we've also got etcd in a certain part of the application, and we could kind of use the Kubernetes abstraction for this, and I think the examples you have kind of show that that's not a great fit. Well, maybe it doesn't work because, obviously, deployments wouldn't work like that.
D
We have Consul, which is outside of Kubernetes, so it's a tool that we may use regardless of whether we are in the VM world or in the Kubernetes world. One of the best practices that I know about etcd is that if you need it for your application's purposes, it's better not to use the one provided with Kubernetes; you should have your own cluster, because the etcd in Kubernetes is for Kubernetes, right? You should not mess with that one. So this was just thinking out loud, because we have Consul.
C
Just to keep in mind, we're not doing a Consul agent per pod; we're doing a Consul agent per node. So for things like storing key-value data, I mean, it would be at the whole environment level; it wouldn't be per pod. But yeah, I mean, we could run the Consul agent differently, we could run it as a sidecar or something, but...
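The per-node layout is the chart's client mode; a minimal sketch of that setting, assuming the hashicorp/consul chart discussed above, with the release and namespace names as placeholders:

```shell
# Clients run as a DaemonSet, i.e. one agent per node rather than one per pod
helm upgrade consul hashicorp/consul \
  --namespace consul \
  --set global.name=consul \
  --set client.enabled=true
```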
F
Can I just go back one second to the earlier point? One of the other really interesting things, right, is that if we did use Consul for, like, global state, we could also use the consul exporter to take a subset of the keys and put them into Prometheus, and that would be very useful for things like not alerting on low operation rates when we drain canary, because that state, you know, the canary drain, could be kept in Consul, and then we could automatically include that in our alerting.
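A sketch of that wiring, assuming the prometheus/consul_exporter; the canary/drained key is purely hypothetical:

```shell
# Hypothetical: record the drain state in Consul's KV store
consul kv put canary/drained true

# Expose a subset of the KV store as Prometheus metrics
consul_exporter --consul.server=localhost:8500 --kv.prefix=canary/
```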
C
Yeah, so that's easy; it doesn't seem like it's a blocker anymore. It shouldn't be a blocker now. I've already unmounted the cache mount across the entire production fleet. We were looking at 90th percentiles of downloads, and it didn't seem to be affected at all. I mean, of course, it's a very spiky metric, but it looks good: no additional load on the underlying servers. So I think we could just chalk this up as a win, and you don't have to worry about it for GitLab.com there.
C
That issue isn't closed yet, because, you know, there's some discussion about what we should do for self-managed cloud native, and I was proposing that we turn this feature flag on by default, so that nobody uses the disk cache. yacht-club wasn't too happy about that and responded that we probably don't want to mess up people, which I agree with, but this would only affect new installations. You could just turn it off for cloud native, so that no one uses the disk cache for cloud native.
C
For cloud native, the charts are completely outside of everything right now, so, yeah, I don't know. I think I would prefer to have a solution where the default just works for cloud native, because currently, in my opinion, it doesn't work for cloud native, or it's not great, because you have these files on disk that are going to build up over time.
B
The investigative work is being wrapped up, and there are a couple of proposals out there for how to tackle the problem. I can't share any details right now about what's going to happen next, but I can tell you that next week there is going to be some movement on who is going to drive this further and how, and that will give us a clearer horizon on when we're going to be unblocked on web and API. This is not to say that Pages is going to instantly become cloud native.
C
For next steps: it sounds like we're going to finish the urgent-cpu-bound work, and I guess we need to decide what comes next, like, do we try to take more queues off of the catch-all and move them, or do we move on? As soon as the Consul work is done, all the other blockers are finished, so we could start working on Git HTTPS and Git SSH, or the WebSockets work. I think those are our options.
C
Yeah, I was thinking maybe we use the tags, because I think having a bunch of names is going to be a bit overwhelming to manage. So maybe we just use the tag syntax we have in the queue selector and just start tagging things. But I think what we need to do first is probably create a VM; we need to move them to VMs first, so that we can monitor them, right? We need to have a staging area, where we move them from the catch-all to the staging area.
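A heavily hypothetical sketch of the tag idea; the needs_migration tag is invented for illustration, and the flag reflects the experimental queue selector sidekiq-cluster shipped around this time:

```shell
# Illustrative only: start a Sidekiq shard for queues carrying a given tag
sidekiq-cluster --experimental-queue-selector "tags=needs_migration"
```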
B
Was yours that one, Dan? Perfect, yeah. Geoff, is it possible then, because what I'm hearing right now is that if we are going down that route, where we are going to be isolating them in the end and doing all of that, it's basically a sifting-through-months type of work, sorry skarbek, I know it doesn't sound exciting. Is it?
C
Here, also, it would be really nice to... this is the first time we're going to have an ingress for a Kubernetes service other than registry. It would be nice to be able to have canary, but you can't really do canary for a WebSocket. We can't really do... I mean, we can do canary for HTTP GET, but...
C
Yeah, we just don't have that capability now, that's all I'm saying. Maybe we can, so...
D
My point is that when canary was one week ahead of production, this made sense, but now we're talking about hours. So shifting a percentage of production traffic to canary is suddenly something we should do regardless of the cookie, because it helps us understand the status of something that we want to deploy in one hour, and...
F
I think we should do that, but you have to have, like... the lesson we've been learning over the last few weeks is: if we do that with web, then the WebSockets have to be sticky to the same version, and when you talk to GraphQL, it also has to go back to canary, right? And we don't have the tooling to do that at the moment, so it's basically scattergun, and that's leading to lots of problems.
D
That's because we are routing on path and not sticking to user sessions or things like that. If we were saying something like: 10 percent of users, sticky on sessions, go to canary regardless of the path they are asking for, this would allow us to make sure that if you get the canary front-end, then your GraphQL queries, for instance, would also be routed to canary, because it's your connection, it's your session, right? Yeah.
F
It's complicated, but as long as everything's going to the same stage, that's what I want, right? And, you know, obviously, stickiness to a stage is fine, like, if you put a cookie in; but then, obviously, if you drain canary, then everyone goes to the main stage. The main thing for me is that we can't have something where we kind of have half going to one place and half to another, yeah.
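For reference, a per-request version of that stickiness already exists at the edge; GitLab.com routes requests that carry the gitlab_canary cookie to the canary stage, which a sketch like this exercises:

```shell
# Opt a single request into the canary stage via the routing cookie
curl --cookie "gitlab_canary=true" https://gitlab.com/api/v4/version

# Without the cookie, the request is served by the main stage
curl https://gitlab.com/api/v4/version
```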
C
Alessio, I think the point of canary in this context was infrastructure, and in this case we want to, like, deploy the new infrastructure in the canary stage, and it would sit there for, like, a week or two, right? So I think that's the main benefit we'll see from having something we can put into canary first: just to ensure that it's working with production traffic for a long period of time before we promote it to production.