From YouTube: 2021-08-19 GitLab.com k8s migration APAC
C
How are you doing? I'm good, how are you? I'm stuck deep, deep in Kubernetes land for once, so I've probably got lots of questions for Graham, but I'll let you guys have your demo first.
A
Yeah, this is going to be an interesting demo. I'm hoping we'll get our view of the web fleet on canary. We will, but Graham's also been having a think about future stuff, so it should be interesting.
C
Oh, I haven't... I didn't realize. I will look at them, thanks.
A
I know you're super busy, so I just don't want to... I think it was kind of sent to you and we sort of sat back.
C
Yeah, okay, I might do that, just because... yeah.
D
Cool, so everyone can see the dashboards. I'll quickly change this to the last six hours. So let's look at canary in production and see how canary's doing. Is that loaded already? I should move this window. Yep, so that's loaded. Overall, looking at the last six hours (canary's been running for about 18 hours now, I'd say, maybe a bit less), we can see that there are no huge abnormalities. I won't blow this out too much, just because the longer the range, the longer all the dashboards take to load.
D
I've updated the ticket to point all this out as well, but we have not seen any real change in terms of, say, 5xx responses from HAProxy. Let's have a quick look at web itself.
D
So it's about 13:00, about there, that the changeover actually happened, I believe. Double-checking the date... no, it was about there.
C
Okay, so that sustained period of drop-off, that sort of flat section: there are two spikes and then the drop-off was after the...
D
Yeah, it's not unreasonable for us to get dips that low, and we've seen dips that low before. I'm definitely interested in giving it a few more days, let me qualify it by that, but I'm not seeing this as a huge difference.
D
If that makes sense: I think there might be a slight difference, I'm not entirely sure, but I'm pretty confident it's similar. We'll need to run it for a few more days and probably look at it overall; with all that time falling over the weekend, that's probably not making things any clearer either. During APAC hours it definitely looks fine as well.
D
So you can see here there's kind of an interesting story with saturation for the memory component especially. We had a false start where we set the memory limit on the containers too small, six gig instead of eight gig, it turns out. I think we were modelling it off API originally, and it turns out the memory usage is slightly different from API, so we were actually hitting out-of-memory errors here and pods were getting killed.
D
So I'm not entirely sure. I think that might be an application issue, but it's interesting nonetheless.
C
I'm certain it's almost certainly the application, right. The problem is tracking it down, because we're probably not logging it, because it's crashing, so you've actually got to look from the outside and try to find that trend, which is pretty difficult.
D
Yeah, and the thing is too, we cycle pods during deploys; we're definitely recycling pods pretty consistently throughout a 24-hour period as well. That's not a big factor, but it also means we rarely get pods that live for a long amount of time.
D
Believe it or not, I can actually look at how many of these were actual OOMs. Say out of these three, I think only one of them actually was. That being said, I still think that's not great, right; we probably don't want that even so. But yeah, can we...
C
Yeah, if you want that resolution you really ought to go into Prometheus, because I think we've got a minimum interval of one minute there. But sure, okay.
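As a rough illustration of pulling that finer resolution straight from Prometheus, the sketch below queries the standard Prometheus HTTP API for OOM-killed containers. The Prometheus URL, the namespace label, and the kube-state-metrics metric name are assumptions for a typical setup, not details confirmed in this call.

```python
# Hedged sketch: query Prometheus directly at a finer step than the
# dashboard allows. Assumes kube-state-metrics is installed and that the
# URL and labels below exist; adjust both for the real environment.
import time
import requests

PROM_URL = "https://prometheus.example.com"  # placeholder

query = (
    'kube_pod_container_status_last_terminated_reason'
    '{reason="OOMKilled", namespace="gitlab"}'
)

end = time.time()
start = end - 6 * 3600  # the same six-hour window as the dashboard

resp = requests.get(
    f"{PROM_URL}/api/v1/query_range",
    params={"query": query, "start": start, "end": end, "step": "30s"},
    timeout=30,
)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    pod = series["metric"].get("pod", "unknown")
    flagged = sum(1 for _, value in series["values"] if value == "1")
    print(f"{pod}: {flagged} samples flagged OOMKilled")
```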
D
Yeah, no, I might have a look at that, it's good to know. I might look at that because it's definitely interesting. I'm not sure; personally I'm of the opinion it's probably not enough to hold this back from going to production, unless anyone has any thoughts. But it's certainly interesting.
C
Graham, the other thing that I would look at if I were you: go and look in the Rails logs, find that pod and the rough time period, and then basically order requests by the memory stat we log. That might be a really good clue. The numbers there, some of them are shocking; it doesn't surprise me at all that, if we're getting stricter on memory, one of those ones will tip you over.
C
And then sort of narrow it down to where you're seeing the OOMs. Well, actually, sorry, I said earlier we're probably not going to log the OOM itself, but you might get some idea by looking at the near misses. We'll obviously have logged the near misses, so that might be something to look at.
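A minimal sketch of that suggestion, ranking requests from the pod's structured Rails logs by a memory field to surface the near misses. The log file name and the field name used here (mem_total_bytes) are placeholders assumed for illustration, not names taken from the call.

```python
# Hedged sketch: rank requests in a Rails JSON log by a memory field to
# find the heavy requests ("near misses") around the OOM kill. The file
# path and field names are assumptions; swap in the real ones.
import json

LOG_FILE = "production_json.log"    # hypothetical export of the pod's logs
MEMORY_FIELD = "mem_total_bytes"    # assumed memory instrumentation field

heavy = []
with open(LOG_FILE) as fh:
    for line in fh:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON lines
        mem = entry.get(MEMORY_FIELD)
        if mem is not None:
            heavy.append((mem, entry.get("controller"), entry.get("path")))

# Print the twenty heaviest requests in the window.
for mem, controller, path in sorted(heavy, reverse=True)[:20]:
    print(f"{mem:>14}  {controller}  {path}")
```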
D
We do see spikes in disk space usage as well, but I'm wondering if that's related to the open issue we have where we leak files on large file uploads or something; I think we occasionally leak them. But once again, it's not enough to cause anything to run out of disk space.
D
If we look at, I mean, if we look at the apdex per pod... oh, whoops.
D
Cool, so once again, this is over 24 hours. If I change it to two days, there definitely was a change around here, but I'm hoping that's just within the normal parameters for a week; I guess we'll see over the next few days. The other metrics I looked at seemed, once again, relatively okay. That's a quick look at Puma and Workhorse. Some of these metrics now obviously are not broken, but their usefulness is diminished.
D
Things like per fully qualified domain name: obviously now everything just falls under one value, so you lose all of that granularity, which makes sense because we only have one backend. In fact, these will change over to a different set I think we have specifically for Kubernetes services, but the reason I haven't done that yet is that if someone has a problem in production, which is running on VMs, they need to be able to use these now.
C
There's a whole bunch of stuff in the dashboards where it says provisioning and then it's got kubernetes true and VMs true, or something along those lines, and when the thing's finished, when you set VMs to false, there's a whole bunch of panels that will disappear automatically. Not that per-FQDN one in particular; that won't, because that's one of the significant-label things, so you'll find that elsewhere. But definitely there's a whole bunch of node-level stats, and also I don't see the Kubernetes panel at the moment. Yeah, I beg your pardon, we do have it there. Okay, and effectively those node metrics will disappear, the row above. Gotcha.
E
Graham, yesterday I discovered the HAProxy dashboard.
E
But if you specifically select the right backends and so on, and I will place the link in chat, then it's much faster and more selective.
D
I will actually do a bit more analysis on the logs. I did a quick analysis on just the 500 errors, but I might dig through like that; you've definitely inspired me with that memory field. I might try and look through all the fields we have and see if I can find any differences on some of those other things.
D
So if we look at the last two days, we seem to have a weird multi-hour gap here, or a one-hour gap, in metrics, which is a little bit interesting. But besides that, the change came in around 19:00, somewhere around here... even further back. Oh sorry, it should be somewhere around here, shouldn't it.
D
Around there. So this is probably not enough resolution, but I'll try pulling this back to seven days.
E
I think yesterday, when I looked at the seven-day resolution, it looked like we have a little bit higher latency, especially on the p95, but overall and on average it doesn't seem to be a big difference.
E
Because we are not using Unix sockets anymore, right? Maybe that's the expected latency slowdown, and also maybe we just have differences in how pods are reacting to load.
D
Sure, yeah, that is a good point. Workhorse and Puma used to talk via a Unix socket, and NGINX, they would all talk to each other via a Unix socket on the same node, I'm pretty sure, which was obviously quite quick, but now they could potentially be bouncing all the way over the network. We don't have NGINX, fortunately, but at least I know Puma and Rails will be talking over the loopback interface, which could conceivably be slightly slower.
C
But those latencies you're not going to spot in Prometheus, because you're not talking about many, many seconds; we're talking five milliseconds max, right. So with the resolution you've got, you're not going to spot that.
E
I think you could maybe check the readiness checks here, the frontend check_http checks. I'm wondering if that one maybe would be the very fast request where we would see a difference.
E
In the top values there's the frontend drop-down, and instead of "all" you could select, I think, check_http or check_https, I'm not sure, but maybe that would give us the very fast checks.
E
Yeah, I'm not entirely sure what gets checked there, but at least they look like they should be the fastest responding, right. So if there are latency differences, then we should see them there, but I'm not sure if that is really true.
D
I guess the question as well is, and it's definitely possible we've introduced a little bit more latency, what is the threshold at which we have to consider it a problem? I guess that's the next question. Obviously it would be great if it were the same, but it might be slightly higher.
A
If we're still within SLOs, that's fine, right; we can tune this stuff later. As long as we're planning a gradual rollout through production, I think that's totally fine.
C
Graham, the one thing that I do think is important is that you also compare queuing time. That's something else you can get from the logs, because there's no reason at all that queuing time should be bigger. If it is, then we've got our Kubernetes config wrong.
C
So I would look at queuing time, but I'd also look at execution time, so separate those two components out, because the reasons for each of them are different. Matt Smiley pointed out the other day that some of the concurrency settings on Sidekiq were very high, and that's going to lead to things slowing down as well.
C
I don't know if that's the same with this Puma work, but it was like 15 or something like that, which is too high, because it's all effectively contending on a single core.
D
Yeah, no, good to know. That's definitely...
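As a sketch of that comparison, the snippet below reads structured web logs from the two fleets and reports the p95 of queuing time and execution time separately. The file names and the field names (queue_duration_s, duration_s) are assumptions for illustration; the point is only that the two components get measured and compared independently.

```python
# Hedged sketch: compare queuing time and execution time between the VM
# and Kubernetes canary fleets as separate components. Field and file
# names are placeholders.
import json
import statistics

def load_components(path):
    queue, execution = [], []
    with open(path) as fh:
        for line in fh:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue
            if "queue_duration_s" in entry and "duration_s" in entry:
                queue.append(entry["queue_duration_s"])
                execution.append(entry["duration_s"])
    return queue, execution

def p95(values):
    # Need at least two samples for quantiles().
    return statistics.quantiles(values, n=100)[94] if len(values) >= 2 else float("nan")

for fleet, path in [("vm", "web_vm.log"), ("k8s-canary", "web_canary.log")]:
    queue, execution = load_components(path)
    print(f"{fleet}: queue p95 = {p95(queue):.4f}s, execution p95 = {p95(execution):.4f}s")
```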
A
I think for these things, Graham, if you just do a side-by-side. Let's try and get some traffic shifted to production next week after the release, but as long as we've got a kind of "we know this thing slowed down a bit" or "we're seeing a bit more of this or a bit less of that", then we can monitor that through the rollout.
D
Sure, so what I'm hearing is: we should definitely do a bit more investigation, which I'll do first thing tomorrow morning, but we're not seeing any obvious signals, at least at the moment, that would stop us from at least trying to go to production sometime next week.
D
Okay, so I think we're basically at the point now where I'm happy with the state of the infrastructure readiness review. We've had Cameron review it, and someone else reviewed it; Stan reviewed the one from development. The security one's now been done as well, so we should be... there's a couple of suggestions there, and I'll double check, but they're probably more like documentation notes than actual issues, so I think that's probably done, or at least I can make sure it's wrapped up tomorrow. I feel like we've got critical mass around that now, so I think all of the readiness reviews should be done by the end of the week, kind of thing. Awesome.
A
Good stuff, good stuff. And then what else have we got that we need to do? When are we thinking we'll be ready to start moving over to production, assuming all the investigation things go smoothly?
D
Yep. So assuming we don't find a reason that we shouldn't go to production, by close of business tomorrow I will probably have the change request for production done, and then the next question will really be how slow or fast we want to go. I've got about 90% of the change request done; I just need to tweak some numbers in terms of how quickly we go. Do we want to do 10% every hour? Do we want to do 10% every day?
A
Oh yeah, we'll work with Henry, I guess, to work out those times. Probably something in between; it's probably whatever you want it to be, to be helpful.
D
I mean, for API I think it was around 24 hours; by the time the 24-hour period had passed, we'd gone from zero to 100%, I think. So maybe we try and work off that. Obviously we need to be aware that if we do that, then deploys and stuff are going on during that time.
D
So I do need to do a slight tweak to the change request to make sure we're not blocking deploys. And I might even split it: I had a chunk of the change request that I might split up into a different change request that I can action tomorrow, and that's just adding the backends into HAProxy with no weight. So it's getting everything ready.
D
It's spinning up the Kubernetes side, all of the prep; everything's there, but with no weight. Then the actual change request that we do, say, over 24 hours will just be changing weights: slowly changing the weights, step by step. I think that would be the best solution, so fingers crossed I can try and action that tomorrow, depending on splitting that out.
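For illustration, here is a sketch of what the weight-only change request boils down to: a schedule of HAProxy runtime "set weight" commands ramping the Kubernetes backend up over 24 hours. The backend and server names, the step size, and the start time are placeholders, not values from the actual change request.

```python
# Hedged sketch: print a 24-hour ramp of HAProxy weight changes from the
# VM fleet to the Kubernetes fleet. Names and timings are placeholders;
# the commands are only printed here, never executed.
from datetime import datetime, timedelta

BACKEND = "web"                       # hypothetical backend name
K8S_SERVER = "web-k8s"                # hypothetical k8s server entry
VM_SERVER = "web-vm"                  # hypothetical VM server entry
STEPS = 10                            # 10% per step
START = datetime(2021, 8, 23, 9, 0)   # placeholder start time
INTERVAL = timedelta(hours=24) / STEPS

for step in range(1, STEPS + 1):
    k8s_weight = step * 100 // STEPS
    when = START + (step - 1) * INTERVAL
    print(f"{when:%a %H:%M}  set weight {BACKEND}/{K8S_SERVER} {k8s_weight}%")
    print(f"{when:%a %H:%M}  set weight {BACKEND}/{VM_SERVER} {100 - k8s_weight}%")
```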
D
So if all things go well, possibly we could do it next Monday. If not, and I'm going to be very realistic here...
D
Maybe I get some feedback on the CR and I need to redo it, so maybe my Tuesday, unless someone else in a different time zone wants to do it, although once again I think we're probably going to be doing it over a longer, 24-hour cycle anyway.
E
I think the plan was to get this out next week before the Friday, right? Because Friday is the 27th and I think normally we would release security releases on the 28th, so we wanted to try to get this in before that, to see if you can make this happen. If you can get the change request down to just changing weights, that would be great, because then it would be very comfortable to fiddle around with it or to roll...
D
Yeah, that was the feedback I got from John as well, and it makes total sense. It wasn't until I sat down and rethought it all out that I was like, okay, yeah, we can do this. We're lucky we did the work on the cookbook; we wouldn't have been able to do this without that.
D
So we can just leverage that now, and as I said, I'll try and split it out and actually get that prep work done tomorrow. I should be able to grab whoever's on call in APAC, it's a quiet time of day, and just get that done, and then it literally will just be changing weights. That's all we need to do; everything else is in place.
D
And I guess it is worth noting, with our canary setup as well: technically speaking, I don't know what it is exactly, I'm trying to think, it's a weight of three versus... let's say it's one percent. Probably a fraction of one percent of GitLab.com's real traffic is going to canary, and it's now going to Kubernetes as well.
D
So it's not just, I mean, we're all familiar with this with canary, but we have got real user traffic on there, which is good, because that makes me more confident already as well.
A
Great, yeah, that's good, excellent, great stuff, cool. Well, please, let's keep things moving; Henry's around this morning and Skarbek will be around later, so if there are things we could be continuing through today to get this stuff ready for next week, that would be good.
D
I've created an issue in the GitLab chart to talk about getting the upstream chart to use v2beta2, basically the new version of the horizontal pod autoscaling API, on the objects that we ship with the chart. The short answer is that it'll give us greater flexibility and control over the scaling. In particular, we can do things like scale up very quickly but scale down slowly, to stop that kind of thrashing, going up and down in pods, that we have happening all the time. We can say: you can look at metrics over five minutes and scale up quickly, but you can't scale down until you've seen, say, an hour of lower traffic, or things like that. We'll see if Distribution can pick that up, or we can pick it up for them, so I just want people to be aware of that.
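To make the scale-up-fast, scale-down-slow idea concrete, here is a hedged sketch of an autoscaling/v2beta2 HorizontalPodAutoscaler with a behavior block, created via the Kubernetes Python client. The deployment name, namespace, thresholds, and windows are placeholders, not what would actually ship in the chart.

```python
# Hedged sketch of an autoscaling/v2beta2 HPA: react quickly to load but
# wait out an hour of lower traffic before scaling down. All names and
# numbers are placeholders.
from kubernetes import client, config

hpa = {
    "apiVersion": "autoscaling/v2beta2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "webservice", "namespace": "gitlab"},
    "spec": {
        "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "webservice"},
        "minReplicas": 10,
        "maxReplicas": 100,
        "metrics": [{
            "type": "Resource",
            "resource": {"name": "cpu", "target": {"type": "Utilization", "averageUtilization": 60}},
        }],
        "behavior": {
            # Scale up quickly: allow doubling every minute, no stabilization delay.
            "scaleUp": {
                "stabilizationWindowSeconds": 0,
                "policies": [{"type": "Percent", "value": 100, "periodSeconds": 60}],
            },
            # Scale down slowly: only after an hour of lower demand, 10% per 5 minutes.
            "scaleDown": {
                "stabilizationWindowSeconds": 3600,
                "policies": [{"type": "Percent", "value": 10, "periodSeconds": 300}],
            },
        },
    },
}

config.load_kube_config()
client.AutoscalingV2beta2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="gitlab", body=hpa
)
```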
D
So, one of the topics we want to try and cover after the web migration, alongside the Pages migration I guess, is addressing some of the technical debt and bits and pieces we have. I've chosen to focus at this point on the gitlab-com repo. The biggest reason for doing that is that with the migration of web, we now have all of Reliability looking at that repo a lot more closely and trying to utilize it a lot more, with people outside Delivery basically contributing to it.
D
We also have a lot more developers within GitLab who are trying to roll stuff out and contributing to it as well. It has just grown organically over time to have some ugliness and some bits and pieces which we've known for a while need work, so I don't need to talk too much more about that. In terms of actual steps, I've started to nail down what I would like to do as the first real, concrete steps to fix some of these things.
D
First thing, and I've got issues, as you can see, for all of those: we've got getting rid of helm-git, which is basically blocking us. It's a Helm plugin that's loosely maintained. It doesn't do much for us; well, it does a little bit, but what it does is very simple. The problem is it's blocking us from upgrading Helm and from upgrading the plugin.
D
So I've outlined that issue there, and people can have a look and discuss how we get rid of it or replace it with something. It sounds like, Andrew, you had success with git subtrees. The thing is, we have another git repo, the chart, and we just want it kind of locally in that repo somehow.
C
It's nothing like git submodules, which were an awful thing; it's much better. What I really like is that all your changes are on one merge request, right, so you're not trying to vendor in something and work in two repos. You do your changes locally, and then when you're ready you push your merge request back and it's just got the changes.
C
The commit list looks a bit weird on the other project, because it's got all the commits, even ones that don't have anything to do with it, but you can squash them down. I think it's going to be a big productivity boost, because you're not trying to work in two places, push that change there, test it here; you're just doing it locally.
D
For a while there, and I think Jarv can speak more to this, we were running off a branch when we had some intermediate work, which we try not to do as much as possible, but a solution where we could conceivably work off a branch if we needed to, especially in an emergency, is useful.
C
Yeah, and what we're doing with this is we've got all these different merge requests that we're pushing back, but because it's all git, we can look at what the total difference is and then work on them individually. The other thing is that we're not trying to wait for every change to get merged upstream before we can move ahead with our stuff, otherwise it's just going to be...
D
Yeah, fair enough. So there's a topic about that and how we can provide some safety for it as well, because if we start locally vendoring, we need to make sure people don't change the local copy, although it sounds like git subtree might solve that. So once we action that small, discrete item, we can then do the upgrade of all the software components we haven't upgraded in months, if not years. The next step from there will be:
D
I want to remove as much of the Chef syncing as we can. I think it's just more confusion now, because you can't see the values, you just see that they're being pulled from Chef, and it's really difficult to debug, and honestly we just don't have stuff running in Chef as much as we used to. Then the final piece in this first wave will be getting visibility back into our Helm deployments.
D
At the moment, all we get is Helm saying that the upgrade has started and then, 30 minutes later, saying it completed, with no visibility into what's going on. So there's an issue there to track that and how we can approach getting some visibility back.
D
I do obviously have larger issues I'd like to do in terms of tackling that repo as well, but I feel like these ones are really good to start with to just get the ball rolling, and then I'll continue to try and break off some of the technical debt work into these discrete issues and get them onto the board, so we can action them or others can action them.
A
Oh yeah, thanks for that, Graham. Just to catch the rest of you up: Graham and I chatted a little bit about this yesterday. We've got a load of tech debt there, and we obviously don't want to have to stop a whole project and spend six months paying it all down, so we're going to start having small pieces like this that we can just trickle in alongside our other work. For now, Graham, I've not got any of these on the board.
A
I think some of them we want to see if we can break down a bit smaller and fit in around the web stuff, but for all of this, please keep opening issues, anyone who's got suggestions for how we can break off small pieces of tech debt and actually get value without needing to overhaul the entire thing, basically.
A
Awesome. Do you want to go through number three, Graham?
D
Yeah, I'm a little bit wary just because I'm keeping an eye on time, and it is a big topic, but I will try and briefly summarize the important points. With our current four-cluster setup, zonal clusters do not have a high-availability Kubernetes master, and we have upgrade windows for our Kubernetes clusters.
D
Basically during APAC work hours. It's bitten me twice now, and it probably will bite me more: when the masters go down for upgrade, which can be half an hour at a time, if I'm running a deploy during that time it will fail, because it can't talk to the master. It can't change the number of pods; basically it just can't deploy. It's not a big issue, and it's semi-annoying, because we have zonal clusters for a very important reason, which is cost.
D
If we have regional clusters, then pods can talk to any pod in any zone and we get charged for that cross-zone traffic. So zonal clusters make sense, because everything talks to itself in the one zone and we don't get charged for that traffic. Google gives us two options: the regional cluster, where the traffic is free to go across zones as much as it wants and you get highly available masters, which they say they recommend for production use; or zonal clusters, with the downside that your masters can be unavailable during upgrades and obviously during real outages.
D
I don't think this is a burning issue that needs to be tackled straight away, it's just something to note. The other interesting point, once again going back to people outside, Reliability and other people looking at our architecture, is that it is confusing; it's different, right. The upstream Helm chart, the GitLab Helm chart, they do a great amount of testing in a single Kubernetes cluster and then they're like, okay, so it should all work for you, and then I'm like: no.
D
I need to go into Terraform and create load balancers and do internal DNS and wire this load balancer up, and we've got to do one for each zonal cluster, and it can't just talk to itself by the cluster IP because it's not the same cluster, and then one component lives in this cluster and one component lives here. So I'm kind of like, I'm okay to keep zonal clusters.
D
It'd be great if Google could just give us highly available masters on zonal clusters, so part of this issue is that I would like to squeeze Google a little bit on giving us a solution. But I'm wondering if we should just stay on zonal clusters and make them identical, a small copy of everything, so everything talks to itself inside the zonal cluster, and we just duplicate that three times and then have HAProxy arbitrating traffic through the front.
D
The huge downside to that is, say, the web pod talking to the API pod in one zone can talk via the cluster IP, no external load balancer needed, but if API in one zone goes down, we've got no failover to another zone. So there are definitely pros and cons with this approach. It's really a problem that the Kubernetes layer, the platform layer, is supposed to solve, and I'm actually really disappointed Google don't have a good answer for it.
D
Maybe this problem is just too big for now, and in fact Kubernetes upstream have added features, in alpha state in Kubernetes 1.22, to solve this problem for everyone. So maybe this problem will just go away naturally in a year. But it is worth pointing out, and it's just me in the APAC time zone, in the same window that the cluster auto-upgrades happen, so I get dinged on that. It would help people's understanding and align our architecture better with what upstream expects if we could move to everything running in the same cluster, even if we duplicate that across multiple clusters: canary and everything just goes in there, everything goes, and they're just cut-and-paste identical.
C
Just with canary, right, are there never going to be global things that we're going to want to test and roll out, CoreDNS or service discovery or something like that, where we want to test the change in a canary state? I guess we could have most things partitioned, but there might be some things where it's easier just to have a separate cluster for canary. Do you follow?
D
Configuration, yeah, possibly. No, that's certainly another option as well.
C
But generally I agree that having a single cluster and using a native Kubernetes feature to keep everything within the zones would be much better.
D
I think they call it Anthos Service Mesh; they've got some stuff that can kind of do it. I think GKE does talk about...
D
Oh yeah, right, so that's topology keys on services.
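For readers following along, this is roughly what "topology keys on services" means: a Service that prefers endpoints in the caller's own zone before falling back to anything else. It is a hedged sketch with placeholder names; the topologyKeys field was an alpha/beta feature of this Kubernetes era, since deprecated in favour of the 1.22 topology-aware hints mentioned a moment ago, so treat it as illustrative only.

```python
# Hedged sketch: a Service using topologyKeys to prefer same-zone
# endpoints, falling back to same-region and then to any endpoint.
# Names and ports are placeholders; the field was alpha/beta at the time
# and has since been replaced by topology-aware hints.
from kubernetes import client, config

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "gitlab-webservice", "namespace": "gitlab"},
    "spec": {
        "selector": {"app": "webservice"},
        "ports": [{"port": 8181, "targetPort": 8181}],
        "topologyKeys": [
            "topology.kubernetes.io/zone",
            "topology.kubernetes.io/region",
            "*",
        ],
    },
}

config.load_kube_config()
client.CoreV1Api().create_namespaced_service(namespace="gitlab", body=service)
```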
D
Jarv did raise a good point that obviously, if we somehow solved this problem and went back to a single cluster, there is an all-our-eggs-in-one-basket risk if the cluster crashes or fails an upgrade. But that being said, we could move to more of a blue/green cluster kind of setup, where we still have two clusters, they're both regional, and they're there for reliability, not just because we want to split them across different zones.
D
Yeah, I'm still on the fence about a lot of this. I'm just trying to think of a better way to clean this up, so that people can just contribute to the upstream chart, it's all wired through the Kubernetes internal services, they all know how it works, and then we just duplicate that, without the "oh, now I need to figure out what Terraform I have to do to glue this all together".
A
Yeah, and I appreciate you bringing this up, Graham. It sounds big enough that it's the sort of thing that would need to be an OKR, and I think it would need to be tied to us having to unlock something, versus just it being hard to use, because it's going to be such a huge change, super risky and stuff. So let's keep it in mind, but I don't think we're going to prioritize this; we've got enough scaling-related things that will come in. So I think it'll be a good one to see what things look like a little bit further down the road.
B
I had some comments on this I just wanted to raise. Some of them, I think, Graham already covered. I have my reasons for preferring multi-cluster, just for the blast-radius isolation, but on the other hand the bandwidth issue is really not much of an issue except between HAProxy and the cluster. The main driver was cost: we had to do multi-cluster because of cost, because of bandwidth. Now, without NGINX, the only cross-zone traffic is between HAProxy and the cluster.
B
Maybe, if Terraform confusion is the main driving factor, could we just simplify this with jsonnet? Have some guard rails: we could have one jsonnet definition that generates the Terraform for the different clusters, where you define your node pools in one place, and that would maybe make it clearer. But personally, maybe I'm too close to this, I'm just not feeling the pain with Terraform. I don't know.
C
Jeff, we're generating variables, but we're not generating Terraform. Are you talking about generating the Terraform itself?
B
No, I mean in the project itself we just generate the Kubernetes cluster Terraform config with the jsonnet definition. It says, okay, these are the workloads, and then from that we could generate the cluster config in some way. That would make it more straightforward; maybe it's less straightforward, I don't know. But I don't know where the pain points are with Kubernetes, because I haven't felt them.
D
Yeah, you're right, that is a really good point; I didn't even consider that, but that is a really good point. We can do it one better: all of our GKE clusters come with the Config Connector, so you can actually just write, they've got CRDs and components for a lot of the GCE resources we need, and if we're writing JSON it could just be stuff that, yeah, you're right, we could probably make tooling that makes this better.
D
I didn't even think of that, and that actually would make it so simple. You could basically define a service, say we need this exposed, and then it would create the DNS record, or we already have that, yeah.
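As a very rough sketch of that Config Connector idea, the snippet below declares a DNS record as a Kubernetes custom resource instead of wiring it up in Terraform. The group, kind, and spec fields are my best guess at the Config Connector DNSRecordSet schema, and every name is a placeholder; check the Config Connector reference before relying on it.

```python
# Hedged sketch: declare a DNS record through the Config Connector rather
# than Terraform. Schema details and names are assumptions; verify against
# the Config Connector documentation.
from kubernetes import client, config

record = {
    "apiVersion": "dns.cnrm.cloud.google.com/v1beta1",
    "kind": "DNSRecordSet",
    "metadata": {"name": "gitlab-internal-web", "namespace": "gitlab"},
    "spec": {
        "name": "web.int.example.com.",            # placeholder FQDN
        "type": "A",
        "ttl": 300,
        "rrdatas": ["10.0.0.10"],                  # placeholder internal address
        "managedZoneRef": {"name": "internal-zone"},
    },
}

config.load_kube_config()
client.CustomObjectsApi().create_namespaced_custom_object(
    group="dns.cnrm.cloud.google.com",
    version="v1beta1",
    namespace="gitlab",
    plural="dnsrecordsets",
    body=record,
)
```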
D
To wrap this up: it sounds like it's not an immediate problem. We can circle back to it in six months or a year and see if it's still a problem then, or if things are better or worse. Maybe it's also, I mean, the big component I've had recently has been KAS, and I put that in the regional cluster.
D
So maybe the different way to look at this problem is that we should be moving everything to zonal where we can. Maybe the reason Sidekiq's in the regional cluster is because it doesn't need to, it just talks to Redis, right, and maybe this was more just a bad call I made on that component. So I see that pain point more there, but for the other components it's not as big a pain point. But you're right, when we replace HAProxy there's...
A
Awesome, great. And then I just want to give a little bit of an overview of what I expect we'll be looking at in the next few months. So Pages is going to be the next one, which gets us to the end of the stateless service migration. Skarbek has already started putting together issues for that, so that will kick off pretty much as soon as we're done hands-on with web, so possibly even next week. And then alongside that we have two coming up, and I'm not sure which one will land first, but it'll either be Redis or Praefect.
A
So Praefect, from what we hear, should just migrate. We can certainly attempt that, and we'll need to test it and see how it looks, but there's nothing actually blocking that work at the moment. And then the other one is Redis. We have a shared OKR with platform, it's right across platform, so we're sharing it with Scalability, which is to work out a longer-term Redis scaling plan.
A
We were expecting that this quarter would be the investigation, working out which cluster makes sense and how we would actually go about doing this. But there are some new things coming in as of yesterday that may mean we need to push this forward, we need to scale Redis, so it's quite possible that, if we're thinking Kubernetes is the answer to how we scale Redis, we may want to adjust the order and perhaps do Redis before we try Praefect.
A
So those are certainly the next three that I would expect to be coming in.
B
Thanks, Amy. I had just a couple of comments. One is about secrets refactoring. I don't know, this could belong in Reliability and not Delivery, maybe.
A
Yeah, it might be better there, but I would like... right, yeah, that's a super good point. So yes, secrets is almost certainly going to drop out of one of these other projects, but it may end up being a Reliability project. By all means, if you want to create an epic, then go for it now, but I think secrets will almost become a required piece for at least one of these three.
B
Yeah, and then the second point is just our migration, post-migration, story. This might need to come before secrets, because I'm just really looking forward to getting Puma secrets out of GKMS and the deployer machine. That will be like the last one; actually, deployer and console are the last two VMs.
B
Once we migrate Pages, that will be the last thing running Puma and requiring things like the database password and the Redis password. As soon as we decommission those hosts, then two of our major secrets, the Redis password and the database password, can go into Google Secrets Manager or whatever solution, and we won't need to use GKMS for them anymore. So that would be great. But do you think that won't fall on Delivery, like how we do the migrations, or is that also...
A
I think it will. What we know at the moment is that at some point Registry are going to need these; we've kicked it down the road a little bit because at the moment they don't need them, so we don't know exactly what that looks like. But once we've gone through web there's no real reason not to sort these out. One thing actually related to post-deployment migrations: our other big challenge, our other big focus through Q3, is around deployments and rollbacks.
A
Now, the single biggest problem with rollbacks is that post-deployment migrations block pretty much most packages from being rolled back. So there's a separate angle there as well, where we're actually going to review, and Myra has started thinking about it already, what we actually do with post-deployment migrations: how do they fit into deployments and therefore rollbacks? So yeah, I think post-deployment migrations will certainly be a Delivery thing.
D
Some quick points: I already have an epic for the GSM migration. It's probably a bit unloved, as in it needs a bit of an update. I'm more than happy to, not necessarily do the work, but help: I know exactly all of the options we have available, what needs to be done, which VMs need to change, the whole scope. I know it all, so I'm happy to help scaffold that out so it could be actioned by anyone. And in terms of the console migration:
D
we were going to try and do a console readiness review as part of the web migration, and I think we think it might not be ready. Turning the console servers over to running in Kubernetes is like a two-line code change, because the Helm chart already supports it. We've got all of the bits there to basically just spin up console pods running in Kubernetes extremely quickly, and we could even run them alongside the VMs.
B
The console boxes are used for developer Rails access, so they have all the secrets to connect to the database; that's what I want to focus on. The other boxes I don't think have as many secrets, like they don't need the Redis password or the database password, so although it'd be easy, I'd rank it kind of lower priority. I'd like to...
A
Awesome, great stuff. Thanks, everyone. Is there anything else we need to cover today?
A
Nope? Super, all right. Well, good luck, Graham, with the next steps. Let us know how we can help, and I hope everyone has a great day. We'll see you soon.