From YouTube: 2020-07-30 GitLab.com k8s migration APAC
Description
Demo of the webservice pod on pre/staging. Discussion of Vault and the 1.16 Kubernetes upgrade.
B: So, we don't have [inaudible] online at the moment, I don't think, but we'll just kick off and run through the blockers. So we are still waiting; the first one is nearly ready, on the live tracers issue.
B: That's the second one; support for the Dependency Proxy is still around on that one.
B: This is the one at the moment where we're working on the catch-all shards: we're going to separate out the queues on catch-all and do as much of that as we can. That's all in progress at the moment, and it will unblock the next stage. And then Jakob's working on removing NFS, well, removing enough of the NFS dependency on Pages for us to progress there.
B: Actually, that's moving along nicely. Any update on that from your side, Marin?
B: I think it was progressing as expected, so no issues there. And then we've got a new one. Is it definitely a blocker at the moment, the logging work, or is it about to be, right?
E: Yeah, so for logging we're not blocked right now. I'm a little bit worried that when we start taking production traffic for Git we're just going to be flooded with crappy logs, and we don't have a good filtering mechanism, so I would say possibly a blocker. I mean, it sounds like, well, I don't know whether Jason is speaking for the Distribution team, but he seems willing to incorporate this.
E: It's a contribution that has a sidecar that wraps the logs in JSON and then indicates which log file each log line comes from. That will give us the flexibility to do the filtering we need to do. So I would say it's still a blocker for now, but it's definitely not preventing us from getting started with the git https and websockets stuff in production; it may prevent us from finishing it.
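(A minimal sketch of the wrapping described here; the field names and payload are assumptions, not the actual sidecar's schema:)

    {
      "file": "/var/log/gitlab/workhorse.log",
      "message": "{\"level\":\"info\",\"method\":\"GET\",\"uri\":\"/users/sign_in\"}"
    }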
B: Cool, okay. So, give us a demo.
E: Stuff is working in staging, well, it's working now, I guess. Let me just share my screen and we'll go ahead and see if logging is working. I don't know, I just merged the logging changes, so I'm not even sure if it's working, but we can look.
E
So,
just
a
little
bit
of
background,
we
enabled
git
https
in
the
well
actually
both
websockets
and
did
https
in
the
kubernetes
cluster
for
pre-product
staging.
So
when
you
do
a
git
clone
or
any
git
operations
on
staging
using
https,
you
should
be.
Those
workloads
are
being
serviced
by
the
web
service.
Part.
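(For example, a clone over HTTPS against staging now lands on the Kubernetes webservice pods; the project path here is hypothetical:)

    git clone https://staging.gitlab.com/gitlab-org/gitlab.git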
E: Yeah, I mean, we have this new feature called Action Cable, which we discussed; it hasn't been turned on yet, and it uses websockets. We have an existing feature, the interactive terminal, which uses websockets, and maybe you've seen that before. You can actually pull that up on our production cluster if you go to the k8s-workloads/gitlab-com project on ops.
E: It works; you can actually use it. I was showing Amy this in our one-on-one. When you click that terminal icon on ops, you can pull up a pod and have an interactive terminal.
D: No, I know that part works, but, yes, I don't know how it works in staging and production. Right? Like, ops is very much simpler.
E: Sure, yeah. I don't think anyone has, well, maybe someone does; I don't know. It's very low traffic, let's put it that way, like very, very low. So I'm looking at the non-production Elasticsearch cluster, and so far I don't see any Kubernetes logs.
E: Let me go ahead and refresh the index mappings.
E: I feel like this interface changed a bit since the last time I used it. I need to go to...
E: I imagine what we'll do is start off with canary, and then we'll probably just add the GKE load balancer as a single backend in HAProxy, so that it gets a small percentage of the production traffic. We'll observe it from there, and then eventually we'll shift over all traffic. We can take a look at, like...
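(A minimal sketch of the HAProxy shape being described; the backend name, server names and addresses are made up for illustration:)

    backend https_git
        # existing VM fleet keeps most of the traffic
        server git-01 10.0.0.11:443 check weight 100
        server git-02 10.0.0.12:443 check weight 100
        # GKE load balancer added as one low-weight server,
        # so it receives a small share of production traffic
        server gke-lb 10.0.1.100:443 check weight 5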
E: Oops. So you see this is a little bit different, because there are multiple containers running: we have both the webservice and gitlab-workhorse.
E: So if we select gitlab-workhorse, you can see that, wow, we get a lot of logs. Well, not that much, because it's staging, but these are the Workhorse logs, and we're seeing https GET requests here, along with the readiness probes.
E: You can see that we're tailing the logs in the pod, which go to standard out, so we're seeing log lines from multiple log files. This is one of the things that I hope this issue will resolve: that we'll be able to determine which log line comes from which log file. We're most interested in the production JSON log, but there are still unstructured logs in here, we have the auth log, there's just a lot of stuff to sort out.
E: Yeah, I think this is the reason why we have this blocker. What's going to happen is that each of these log lines will be wrapped in JSON, so we'll be able to say this log line came from auth.log, and then we can either drop it or, right, not drop it, but not send it to Elasticsearch, or send it to a special index.
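(A sketch of the routing this would enable, assuming a Fluentd-style shipper and a "file" field added by the wrapper; the tag and field name are both assumptions:)

    <filter kubernetes.workhorse>
      @type grep
      <exclude>
        key file
        pattern /auth\.log$/   # e.g. keep auth.log lines out of the main index
      </exclude>
    </filter>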
E: Yeah, so I think there is an issue; I'll take an action to find it. I think possibly we could add to structured logging a field that indicates where the log comes from. That would be the best option, possibly.
D: Find it and ping me on that.
E: Sure. I'm not sure why the logs aren't coming through, but let's just take a look at Elasticsearch. So if we do k...
E: Where is the... here it is. So it's looking for this path for Workhorse logs, so what I should be able to do is just see if that path exists. It does. This is the pod that it's looking at, so you can see that the logs are here, and then this should be forwarded over to Elasticsearch. I'm not sure why I can't see them in Elasticsearch yet; I'll take a look at that this morning, but I would say logging should be working.
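(The check described can be done along these lines; the namespace, pod name and log path are placeholders for illustration:)

    kubectl -n gitlab exec <webservice-pod> -c gitlab-workhorse -- ls /var/log/gitlab
    kubectl -n gitlab logs <webservice-pod> -c gitlab-workhorse --tail=20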
E: Yeah, no, I'm sure it's fine. I mean, this is brand new, so I'm sure it's something else, but I'll take a look. Cool, that's it for me. Graham, do you have anything? Or let me look at the agenda.
A
You
want
to
yeah
so
give
a
five
minute.
Vault
update
the
short
answer.
Is
we
have
a
production,
a
quote?
Unquote
a
production
and
a
non-production
vault
instance
both
up
and
running
they're
vpc,
peered
they're
in
their
own
special
gitlab
approach.
Sorry,
google
projects
in
their
own
isolated
zones
they're
both
peered
with
their
respective
networks,
so
non-productions
paired
with
things
like
pre
and
staging
production,
is
only
paired
with
production.
A
I've
done
all
the
ci
jobs,
all
the
the
boiler
plate
and
all
the
bolting
together
so
that
merge
requests
to
start
basically
using
vault
could
be
added,
so
my
kind
of
in
terms
of
the
project
management
side.
The
things
I
have
left
to
do
before
my
definition
of
done
for
this
stage
of
vault
is:
I
need
to
hook
it
up
to
prometheus
and
I'm
about
90
of
the
way
there.
A
I
just
need
to
have
a
quick
chat
with
some
some
people
from
the
monitoring
team
just
to
wrap
my
head
around
how
some
of
the
monitoring
stuff
works,
so
I'll
get
prometheus
running
there
I'll
run
up
some
some.
You
know
monitoring
rules,
I
guess
making
sure
vlog
goes
down.
We
know
about
it
and
then
I
will
do
run
books
so
how
you
know
if
you
get
an
alert
about
volts
or
whatever,
how?
A
What
do
you
do
and
then
more
or
less
I'm
considering
the
first
part
of
vault
done,
and
I
will
do
a
readiness
review,
which
is
purely
just
to
sign
off
of
the
architecture
of
vault
and
the
setup
I
have,
and
you
know
how
it
works
and
the
run
books
and
the
monitoring
get
consensus
from
the
team
that
you
know
we're
happy
with
that.
There's
nothing
outstanding,
there's
nothing
else.
A: It's dead simple to use, but integrating it with all of our services, CI jobs, helmfile, Chef, is still an unknown quantity at this stage, and that kind of work will be parceled off into different epics, like how do we use this with Chef, or how do we use this with Kubernetes. So more or less, once I get the readiness review done, I'm pretty much happy for people to start using it.
A
The
you
know,
there's
a
git
labs,
configs
bucket,
where
we
stick
some
random
configuration
data
for
like
cloudflare
exporter
and
some
other
kubernetes
services.
We.
A: They're not used by Chef, they're just all by themselves, and there are about 10 files in there with a few lines each, and they're only used at CI-job time to create those secrets. I think that's a prime candidate for just switching over, because in the helmfile example you simply replace that whole GKMS step, those really long, awful lines we have at the moment to grab the bucket contents and decrypt them.
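(A sketch of the swap being described; the bucket, keyring and secret paths are hypothetical:)

    # before: fetch the encrypted file from GCS and decrypt it with GKMS
    gsutil cat gs://gitlab-configs/cloudflare-exporter.yaml.enc |
      gcloud kms decrypt --location=global --keyring=gitlab-secrets \
        --key=gitlab-configs --ciphertext-file=- --plaintext-file=-

    # after: a single read from Vault
    vault kv get -field=values k8s/cloudflare-exporter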
D: Just thinking out loud, Graham, before I disconnect, so just humor me here: if we think about maybe putting our staging cluster in there, just to, well, test it in this hybrid, weird environment, would it work, given the Chef dependency?
D: Okay, oh, that sucks, because I was hoping that before we start putting some of this really public-facing traffic on it, right, like the majority of the traffic we have, we could do something with Vault as well, because I'm afraid this is going to be a big change as well.
A
Yeah,
I
think
the
key
is
is
as
soon
as
I
kind
of
as
I
said
done
that
first
part
and
that
that's
ready
to
go
is
we
tackle
the
chef
problem?
But
if
I've
looked
at
the
work
you
did
job
with
gkms
and
the
current
setup.
I
think
we
just
create
another.
We
extend
that
ruby
code
because
you,
you
wrote
it
in
a
pretty
good
fashion,
where
it's
like
extensible
right,
so
it
can't
be
just
extend
another
class
there,
which
basically
just
calls
vault
get
key
or
whatever
it
is
instead
of
gkms,
but.
E: It's just another backend for the gitlab-secrets cookbook; right now it has chef-vault and GKMS, so it's another shim that you could just add. I would say, I don't know if it's worth spending too much time on this, because when we move all of the front end over, the only things left using Chef secrets will be Postgres and Redis and Gitaly, and, yeah, maybe.
E
Maybe
it's
worth
transitioning
that
to
vault
as
well,
but
yeah
I
mean,
maybe
maybe
it
would
be
worth
just
time
boxing
to
see
how
tricky
this
will
be.
I
I
guess
like
I.
I
also
don't
want
to
give
each
right
now
we
have
the
problem
where
every
single
node
in
an
environment
has
the
keys
to
the
kingdom
for
that
environment
and
I'd
like
to
go
about
it.
The
right
way.
E
This
time,
where
we
limit
access
to
secrets,
it's
a
bit
tricky
with
omnibus,
because
omnibus
is
like
grouped
all
together
and
there
aren't
a
lot
of
secrets
that
can
be
separated,
but
like,
for
example,
I
think
giddily
is
a
good
example.
Getaly
doesn't
need
access
to
the
postgres
database,
but
it
still
has
access
to
the
postgres
secrets,
and
this
is
something
that
I'd
like
to
try
to
segment.
If
we
can.
A: One thing to mention: there was a ticket floating around, and I'll see if I can find it again, about making Omnibus, so the gitlab.rb file, have native Vault integration. In the gitlab.rb you would write something like vault:/ and a path, and then, if you run a Vault agent on the machine, the Vault agent talks to the Google metadata server to get the machine role and gets the policy and everything, and therefore we don't solve it in Chef.
D: I'll tell you right now: it ain't gonna be as simple as extending the cookbook, so it's gonna take way more time.
E: Cool. What's the latest on TLS for Consul? Is that blocked?
A: ...for US and Europe, because that's the lowest point of the week, and I'm extremely nervous; I'm really scared about causing an outage. So I want to try and at least do it at a time when it's not going to impact people. I'm still going to do a little bit more testing and stuff over the next few days.
E: Okay, so I guess the point where we might run into problems is the point where we turn on TLS verification, right? I think up until that point it doesn't matter that the certificate is completely wrong; without TLS verification set to true, I think things will work. So the tricky part, then, is turning that flag on on the master Consul, yeah.
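(The Consul agent settings in question look roughly like this; the file paths are placeholders, and the exact set of options used here is not confirmed by the discussion:)

    {
      "verify_incoming": true,
      "verify_outgoing": true,
      "verify_server_hostname": true,
      "ca_file": "/etc/consul/ca.pem",
      "cert_file": "/etc/consul/consul.pem",
      "key_file": "/etc/consul/consul-key.pem"
    }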
E: It's just that it was kind of, you know, we use Consul so it can dynamically change when we have failovers. I guess that's an option if you're feeling super nervous, or at least maybe we should keep that in our pocket as a way to quickly recover if...
A: Maybe. I'll look into that, and I think definitely as a rollback; I might put it in as just the first rollback step, to stop the bleeding.
A: As long as I turn the nodes over very slowly, one at a time, and I'm really making sure every single node that's alive is connecting again, I think we're probably okay. The good news is it should either work or not; it's not like it'll work for 30 seconds and then stop. It shouldn't.
E: And how will it work? When you turn on TLS verification on the server, do you do it not on the master Consul but on the other Consul servers first? Because as soon as you turn on TLS verification, that's going to restart Consul, and then clients will try to re-initiate a connection and it won't work, right?
A: Which will essentially initiate a failover, so I will be able to at least determine if all the other nodes can connect. I think that's a good sign. Before I do the leader: as long as all the other non-leaders still work, then I think that's okay, and then I can just target it and roll it out to the different nodes.
E: Yeah, what I'm unsure about is whether turning verify on and doing a reload causes connections to drop and reconnect; I don't think it does, it might not. So if it doesn't, then those old connections will be fine, and when they reconnect they'll use TLS. But also be careful, because I think our Chef cookbook might do a hard restart on all config changes; I'm not sure if it does a reload.
E: It does a reload. Okay, that's good, so maybe this is fine, then. If the reload doesn't force connections to reconnect, then it should really just be a matter of turning on TLS verification everywhere and then forcing a reconnect, or just waiting, yeah. But I guess before we close out the change request we should probably do a hard restart of Consul, just to make sure. Yeah, cool.
E: If you need another look at the change request, just ping me.
A: Yeah, I appreciate it. I'll definitely go back and have a closer look and see what options we have, just better things to do in case of failure, and then I might update it to make sure we're covering what we talked about.
A: One is that all of our monitoring broke. Basically, all the dashboards stopped working, because right down at the bottom of the release notes, not even under the "please look at this" section, just one of the lines down the bottom, was that they dropped the Prometheus labels that we were using: container_name becomes container, pod_name becomes pod. So everything we had just kind of broke. And the thing is, this has been running in staging for weeks, and the dashboards were broken for weeks in staging, and...
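(The rename means queries and dashboards built on the old cAdvisor labels return nothing; the query itself is just an illustration:)

    # before Kubernetes 1.16:
    sum(rate(container_cpu_usage_seconds_total{container_name!="POD"}[5m])) by (pod_name)

    # from 1.16 on, the same query needs the renamed labels:
    sum(rate(container_cpu_usage_seconds_total{container!="POD"}[5m])) by (pod)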
E: Right, and no one noticed. These are the mixin dashboards, right, the kubernetes-mixin dashboards, which are always in a half state of brokenness regardless? I think we're the only ones looking at them. I'd like to just deprecate those dashboards altogether and write our own; it's not very...
A: I've already fixed them and put them in already, but yeah, I'm just surprised I didn't notice it sooner. The only other thing is there was a pod in the kube-system namespace, basically a GKE pod that they provide us that just sends some events to Stackdriver, that was crash-looping.
A
The
only
thing
I
could
find
on
it
is
people
complaining
to
google
about
it
and
google's
saying
that
it's
fixed
in
a
newer
version
of
six
1.16
of
kubernetes
that
came
out
three
days
ago,
so
that
pod
is
still
crash
looping.
Once
again
it
was
in
staging
and
crash
slipping
for
weeks.
It
doesn't
serve
any
real
purpose
for
us
because
we
use
a
different
login
structure,
but
I
am
going
to
look
at
like
upgrading
and
fixing
that
problem
some
point
in
the
future
and
it's.
E: Yeah, we discussed it early on and we weren't really sure, so we kind of went with the safer approach, but maybe you're right. I think auto-upgrading the nodes... and when you upgrade a node, that will actually bring down the node, right, and bring it back...
A: Up again, yeah; it does the whole thing, spins up a new one.
E: So for Sidekiq it's definitely safe, and I think it should be safe for everything. I'm always a little bit more worried about git ssh and git https, because you have these very long-lived connections; sometimes these clones take a lot longer than your typical web request.
B: Is there anything we need to do around that stuff then, Graham? Like, should we just check things are still working, or is there any work we need to schedule?
A: Yeah, so I'm pretty confident that besides the niggly bits I found, everything else should be fine. We've got pretty good monitoring coverage, we're getting no user alerts and no customer issues, so I'm pretty happy with that. Okay, I've already fixed the monitoring, so the monitoring is fixed. As I said, there's this other pod problem, but that's not a major issue in any way, and in fact it looked like Google dragged their feet on it, so I'm not really too worried about it.
C
E
Are
the
are
the
the
kubernetes
mixing
dashboards,
which
is
like
the
grab
bag
of
all
the
miscellaneous
stuff?
Those
are
all
broken,
then,
as
well
or
or
most
of
them
are.
I
haven't.
I
use.
I
use
a
couple
of
them
sometimes
like
like
when
I'm
looking
at
nodes.
For
example,
let's
take
a
look.