From YouTube: 2021-04-01 GitLab.com k8s migration EMEA
Description
No description was provided for this meeting.
A
Starbuck's got some; I don't know if he's going to demo it, but he's going to give us a run-through. Hopefully we can use this recording to help Henry catch back up again next week. So he's going to give a bit of an overview of where we are and what comes next.
B
So what happened after the incident, after I left?
A
Yeah, so Europe managed to pick and has kicked off the packaging in parallel, so fingers crossed we should have everything ready in an hour.
D
Otherwise, yeah, I didn't really have anything to demo, so I thought I'd share a little bit of where we are and what we came across this week, just to show how difficult it is to do a migration of a service.
D
Rollout-wise I was not prepared for this meeting, I'm sorry.
D
So the first time we attempted to enable staging, we appeared successful initially, but we were getting a lot of QA failures.
D
With traffic to staging we ended up getting a lot of error messages, so it was immediately pulled. Where's that thread... yeah, this is where Jarv helped me out.
D
We discovered that we were missing a configuration item inside of our GitLab configuration. This was something that I missed despite the fact that I had done an audit to go through the configuration. I missed an item, unfortunately, that required an update to our Helm charts, and then, as I made the necessary updates to our Helm charts, I introduced two bugs into them, so I had to fix both of those in order to get us past this point.
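For reference, a minimal sketch of the kind of pre-rollout check that can catch this class of miss; the release name, chart path, and values file below are placeholders, not the actual ones used here:

    # Render the chart locally with the target values to inspect what would actually ship.
    helm template gitlab ./gitlab -f values-staging.yaml > rendered.yaml

    # With the helm-diff plugin installed, compare the pending upgrade against the live release.
    helm diff upgrade gitlab ./gitlab -f values-staging.yaml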
D
Rack
attack
was
showing
up
a
lot,
but
after
I
dig
further
into
this-
and
I
don't
know
if
I
documented
this
properly-
I
did
so.
I
discovered,
after
a
larger
situation
that
we
are
hitting
rack
attack
quite
frequently
we're
on
a
constant
basis,
but
we
just
had
a
spike
when
the
api
was
first
enabled
in
kubernetes.
D
So we could see the drop-off from our virtual machines and then the same amount picked up in Kubernetes. The difference, though, was that I noticed traffic was coming from all over the place when we were on our virtual machines, but our API was only seeing traffic from the load balancers when it started taking traffic in Kubernetes.
D
So Graeme helped me with this. Here's another proof where I was sending a curl request: you can see the requests coming in, and we only see our load balancers.
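For reference, a minimal sketch of that kind of check; the hostname and Omnibus-style log path are illustrative, not necessarily the exact ones shown in the meeting:

    # Send a request through the edge, then check which client IP Rails recorded.
    curl -sI https://staging.gitlab.com/api/v4/projects > /dev/null

    # If only load-balancer addresses show up here, the real client IP is not being
    # forwarded or trusted on the new ingress path.
    grep -o '"remote_ip":"[^"]*"' /var/log/gitlab/gitlab-rails/production_json.log | tail -n 5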
D
We ran into an issue where Geo is pounding Sentry with errors; it was determined that this is just due to a data situation. I reached out to the Geo team for some assistance in evaluating this particular error, and we just have bad data on staging for Geo. I also found a fun error where we're trying to take an exclusive lease, but apparently there's one already in execution.
D
Apparently we have some sort of a code situation. Robert helped me out with this: we are reusing a set of code that's specific to Sidekiq jobs and that also handles specific user requests to the API. If you were to make a change to your user in some way, shape, or form, it touches the same code.
D
So this is an error message that we see quite often, but we can safely ignore it, which I'm not really too thrilled about, so I'm opening an issue to hopefully get that looked into and remediated. I'll talk about this one in a second. What was this new error? I forget what this was... go away.
D
There's a foreign key problem; this is probably another situation, I deemed, where you have bad data again in staging, so I'm ignoring that. And the metrics, as you can see from this chart: we just lost our metrics when traffic was fully shifted over. This was also fixed by Graeme. So Graeme is, you know, you're...
D
But,
as
you
can
see,
the
traffic
just
kind
of
our
metrics
just
dropped
off
when
we
shifted
traffic
over
to
kubernetes,
so
graeme
again
fixed
this.
Oh,
I
spun
up
an
issue
and
grain
fixed
that
as
well.
So
now
we
are
seeing
trash
or
metrics
in
our
api
now,
which
is
wonderful.
So
if
I
go
to
the
last
say
three
hours,
we
have
our
aptx
data
in
our
low
bouncer
abdex
and
error
ratios,
which
we
were
missing
when
we
first
transitioned
over
so
so
now.
D
We don't have metrics in our Grafana at all, yet we're capturing metrics somehow, even though metrics isn't enabled in our deployment. So I need to look into that. I don't know if this is a blocker for us going into production, but it's something that I want to investigate a little bit. Graeme has already helped me out greatly with this, but I want to make sure that if Consul falls over, we don't start pounding the primary database unnecessarily, because that's our fallback option. Hopefully with the Postgres 12 migration...
D
You know, things will be a lot faster, or, you know, not as saturated, but until then we run the risk of sending 144 pods directly to only the primary database instead of sending read queries to any of the secondaries. Since we don't know how bad of an impact this is, I want to look into this a little bit today, or at least figure out some of the issues that I'm discovering with Consul, see if we can remedy some of those, and gather a list of things that we need to work on, Jarv.
F
There are quite a number of Consul agents, but Kubernetes isn't adding an agent per pod. We have an agent per node; it's a DaemonSet, so we have Consul running per node, and then we make DNS queries to the service endpoint from each pod. So compared to our VM configuration, where we have an agent per VM, I'm sure we've increased the number of agents, but right now I don't believe the problem is between the agent and the server.
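For illustration, a rough sketch of the lookup path being described; the Consul DNS endpoint and service name are placeholders, not the actual setup:

    # From a pod, resolve a service through the Consul DNS interface provided by the
    # per-node agents (8600 is Consul's default DNS port; names below are illustrative).
    dig @"$CONSUL_DNS_ADDR" -p 8600 +short db-replica.service.consul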
E
Yeah, okay, I understood. So the problem that I had... I mean, it was a long time ago, when Consul was very young, but it was exactly the same. We had a Docker installation, so no Kubernetes, and we had something like 30 machines, so not that many, and we started with, I think, a three-master installation. So basically there were no agents; they were just reaching out to one of the three masters, so it was something like 1 to 10.
E
So this was the ratio between machines, so clients, and the number of servers, and basically it kept hanging. The DNS just kept not responding, and so all the applications were failing, because they were not able to figure out where to reach out for things. Then we moved to one agent per machine (but again, this is not Kubernetes) and it was able to handle the load.
F
Okay, yeah, that's something worth exploring. I think, you know, we don't think it's between the client and the server, because we would see the same problem on virtual machines if it was; the virtual machines and Kubernetes are using the same Consul servers. But if we're just simply overloading the DNS interface on the client, then maybe what we can do is... they're not...
F
Yeah, yeah, yeah. That's why I think it's the connection between the application and the agent that's probably getting overloaded, which would make sense, right, that it's happening on Kubernetes. Maybe what we could do here is reproduce this on staging by just blowing out the number of replicas temporarily, just as an experiment, and see if we see the same problem.
F
You know, I mean, that's a really simple thing for us to do without any traffic: if we create the same number of pods on staging as we have on production, even just for a little bit, and we start seeing these errors, that would be a good clue.
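A minimal sketch of that experiment; the namespace, Deployment name, and replica counts are placeholders, and if an HPA manages the Deployment it would need to be paused first:

    # Temporarily scale the staging API pods up to a production-like count.
    kubectl -n gitlab scale deployment gitlab-webservice-api --replicas=144

    # Watch for Consul/DNS lookup errors while scaled up, then scale back down.
    kubectl -n gitlab scale deployment gitlab-webservice-api --replicas=2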
D
I think that's a good idea. We could probably demo that right now, if we could, potentially.
F
And Starbuck, did you look at the resources? You said that, you know, we are doing a good job at monitoring the resource utilization for Consul. Have you looked at that recently, and does it look like we are at the limit? Because maybe this could be as simple as: we don't have enough CPU or memory.
D
I haven't looked into it yet. Graeme also posed that same question, so I...
D
So I started looking into that this morning. I just happened to come across the fact that we're missing a lot of stuff for Consul. It's like we deployed it, but then we didn't care about it; it didn't go through a readiness-review type of situation.
F
We have logging for the Consul server, but for the Consul clients we don't have logging on the virtual machines or Kubernetes; for Kubernetes it's just in Stackdriver, I guess, and...
D
I'm slowly gathering a list of all the things I'm finding that we're missing. Jarv, you had the idea of doing something to the deployment while we do queries. What would be the best way to go with this? Because I could do an nslookup, for example, shove that behind a while loop, and maybe play with the deployment or something. Is that what we want to try to test out?
F
Yeah, I can give you the example of just using dig to do the DNS query on the shell. I was thinking of just doing this in a loop over the course of a day to see if we see the same issue. Okay, yeah, just to see, because I don't know how long it's going to take for us to reproduce this problem on a given pod, across, like, on the application. I was thinking of even doing this on a real pod.
F
In the background, I mean, not in a super tight loop, but something that would at least tell us if this is happening outside the application, like maybe one request every couple of seconds. Yeah, yeah. But I like this idea of expanding staging first. If we could just see this on staging, then you wouldn't have to mess with prod at all.
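A rough sketch of that kind of low-rate background loop, assuming a placeholder Consul DNS endpoint and service name:

    # Log any lookups that fail or come back empty, roughly one query every couple of
    # seconds, so we can tell whether failures happen outside the application as well.
    while true; do
      if ! out=$(dig +short +time=2 +tries=1 db-replica.service.consul @"$CONSUL_DNS_ADDR" -p 8600) || [ -z "$out" ]; then
        echo "$(date -u +%FT%TZ) consul lookup failed or empty" >> /tmp/consul-dns-check.log
      fi
      sleep 2
    done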
F
Yeah, you could do it for the git service. I think we're seeing it there pretty frequently, and if it's something that we see even when we're not taking any traffic, we'll see it on staging. That would at least clue us in to whether this is a problem of overloading the client, which I think is a pretty good theory.
E
So I don't know if we have logging about when this happened, if, I don't know, the primary was changed. I don't think... I mean, a database failover probably would be something that we are aware of, but if, for instance, we had some change in the content of the Consul services around when it got...
D
Yeah, George, let's just do a merge request into gitlab.com and bump up the replica count. Like, just shoving this behind a while loop, I'm not seeing any failures, but...
F
This is... you're doing this on staging? Or... yeah, staging, staging.
D
Easy. And I guess what we would look for at that point is just a log message inside of Rails saying it can't get a response from Consul appropriately.
F
I think, yeah, if you start on this today, just drop me a message. I can follow up on it Friday and Monday evening.
F
I mean, I think, granted, we didn't have rate limiting enabled back then, I think, before we took out nginx, but I'm pretty sure that this was working. We were servicing with nginx for a while, and I was definitely looking at IP addresses. So this thing kind of confuses me as to why we didn't see it there.
F
My preference, I think, is to keep nginx, just to kind of keep as much the same as possible, but it's also a pain to remove it later. So I don't...
A
Which one's safer: removing it now as part of the move, or changing several things, which will give us a...
F
Part of the motivation for removing nginx from git https was that we had this ancient version of the nginx controller, and we were like, you know, we probably don't need nginx anyway, so let's just remove it; there was no proxy request buffering or anything, everything was zeroed out in the config. The API, like, I don't know, it's a little bit different.
A
I
I
think
we
should
keep
it
just
for
a
like
for,
like,
like,
I
think,
there's
a
lot
of
moving
pieces
already.
Unless
we
have
like
it's
going
to
make
something
considerably
more
difficult,
then
I
I
think
it
is
a
good
contender
for
the
post
migration
de
epic.
B
Many hops between all the different services. A question for me would be: do we have a replacement we could do right now that would be easier to do after we migrate, so removing that layer... oh my god, I'm not making any sense. Give me one sec to organize my thoughts. So we have nginx right now on our fleet, and we are talking about possibly removing nginx as a layer when we migrate to Kubernetes, right?
B
But the question I want to ask here is: is there a way for us to remove nginx now, or replace it with something else, that would make it easier for us to remove it later, after we've migrated to Kubernetes?
B
Well, the question here is: if we remove nginx, what do we expect to take that over? Like, what will take over nginx's role? We expect HAProxy to be that, right?
B
But again, I think Amy has a point here, which is the one-to-one, and that is what we should keep in mind, yep. But what I'm trying to add is that what the one-to-one is is up to us to decide.
F
Yeah, it sounds like you're saying, you know, if we're going to remove nginx, we should feel confident about doing it on VMs, which would allow us to change one thing at a time. Right, I think that's sort of... I think that's reasonable, and we could even do this on just canary; we could do it on a subset of VMs.
F
So we would need to look at that again as well, but I think it's a good idea: if we're going to do this, let's disable it on VMs first, instead of after we move.
D
What I'll do is try to bump up the priority of our existing nginx issue, add some notes from this meeting into that one, and then create a new issue to evaluate potentially removing nginx, maybe starting with staging first, obviously, but seeing what changes we need, because I know it's going to involve a few Chef modifications in order to accomplish that work, and we'll see what happens on that front.
B
But keep in mind, this is also because Consul is delaying us anyway, right? So let's try, if we somehow magically can, to parallelize this and do the testing in parallel with the Consul work, to make sure that we can get those things converging to the same point, so that once we come to that same point we can continue with the API migration uninterrupted.
E
I don't know if you mentioned this, I just got a bit distracted by my son, but if I remember correctly, on Omnibus packages nginx has special configuration regarding buffering of requests on the APIs for artifact uploads and things like that.
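Roughly the kind of directive being referred to; this is a sketch of the pattern, not the exact Omnibus template, and the location regex and upstream name are illustrative:

    # Disable request buffering so large artifact uploads stream straight through
    # to Workhorse instead of being spooled to disk by nginx first.
    location ~ /api/v4/jobs/[0-9]+/artifacts {
        proxy_request_buffering off;
        proxy_pass http://gitlab-workhorse;
    }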
B
Jarv mentioned that, but the point there was we want to see whether we can move that layer to an existing service that we already have and remove that addition, right, if possible, which would make it much easier for us to just do the API migration without that additional layer on top, where we know the VMs would work in the same way. That's the general idea behind it. I'm not saying it's easy, but it's the general idea.
D
Okay, aside from me needing to backfill some issues with details and create some new issues, is there anything else that we want to discuss in this meeting?
D
What I would like to do is go ahead and proceed to get something running in canary. That way I can do a final analysis to make sure our metrics, logging, and configuration look okay in production, but I would prioritize what we just discussed over moving into production. So maybe we take traffic in canary, but that's before we go to our main stage, is how I'm thinking right now.
A
Thank you very much, everyone. Speak to you soon.