From YouTube: 2020-09-04 GitLab.com k8s migration EMEA
A: We don't really have much to demo, but I guess we wanted to see each other. So let's chat.
B: Maybe we can just do highlights and let…
B: So let's start with that. The only blocker we have right now is the cross-AZ traffic. Other than that, there was a new blocker that we just discovered this morning, which is… I'll put that on the issue; it's just another charts issue.
A: I probably, I…
D: My opinion, but if you look at the… like, the Rails ones are obviously critical. The Sentry ones, I was looking at them the other day, and there was a lot of random stuff in there, you know, and none of it particularly helpful.
B: Okay, well, I have an MR for the fix. We just need to get it merged, so maybe it's like a minor blocker, or just something we noted, yeah, randomly. Let's…
A: Let's, let's lower the severity of this, sure, to a 4, and the priority.
B: You already… so I guess, other than those, well, one and a quarter of an issue, other than those issues, we just have the performance problems that we saw yesterday when we started to take traffic, like… before the discussion, your…
B: Yeah, it sounds like that. We're gonna start taking a percentage of traffic to canary, to canary Git, before we do the next migration.
A: My only concern with that was: are we exposing our users to more outages? That's why I was asking how many of these incidents we had with Git HTTPS in the recent past, and I guess it's really hard to find that, because it's not really easy to track those things at the moment.
D: That one… was there one today? No, no, there was one, sorry, just trying to find it. Cameron basically pinged on that issue, scalability#470; Cameron placed it right at the bottom of that. Here, I'll place it in, give me a second: it's production, production#2461.
D: No, no, that was just… sorry, I thought you were asking how many Gitaly, sorry, Git canary issues there were in general, or are you talking about this one specifically?
A: For this one: is this caused by the work that Jarv was doing, or not?
D: No, it wasn't. It's just that the problem is… there are several things, right. The first thing is that the majority of the traffic that goes to canary at the moment is effectively going to Praefect, and so anything that happens to Praefect happens to that. And then the Gitaly node behind Praefect is so overloaded, just because of the pathological nature of the www-gitlab-com repo.
D: It also gets really slow, and so, you know, basically we've got that silenced on the alerts now pretty much non-stop, because it doesn't adhere to any of our alerts. You know, a lot of the time we just kind of deal with it, but it fires a lot, right.
D: Yes, it's difficult to… yeah, but if we didn't have it all pointing at the… you know, if we didn't have those particularly bad repos going there, it would be a fairer comparison, because obviously what we're doing at the moment is taking the worst repo and sending it to canary, which is good in a way. But it's also really not great if you're trying to compare the Kubernetes…
D: Yeah, but it's still going to the same back end, so it's still gonna… you know, the problem is, it's going through Praefect and then it's hitting a node that's very, very overloaded and trying to do hundreds of clones constantly. So if you compare it to the main stage, because it's got that directed traffic, it's going to be worse, like a lot worse, I mean.
A: I think it's a bit unfair to also say, well, let's take out www-gitlab-com, right; like, that's regular request routing from…
D: And then it's much fairer. At the moment we're kind of saying we're putting the hardest repository to clone onto canary, and then every time something goes wrong, people are gonna… you know what it's been like with the registry, right? It's like, oh, it's Kubernetes, right, and actually it's often not. And this alerts all the time as it is, and so it's going to be, like, oh, quickly turn off Kubernetes.
D: I think, whereas if the traffic loads are roughly the same, you know, to some degree, then you can kind of compare like with like, and the latencies will be more similar, and you can compare latencies. At the moment it's like, well, you know, the latencies on this are like five times as much, but actually every repo that's going there is really huge, so, yeah.
D: So the other thing that's nice about it is that, once we make it randomized, we can take the silences off the alerts. Because that's the other thing that I've been feeling quite uncomfortable about: we're rolling this out, but we've actually silenced alerts for the stage that we're rolling it out on, which is not cool. And so with this we can, theoretically, you know, first randomize the traffic, then take the silences off the alerts, and then roll it out without any silences on them.
A: Oh, Jarv might have connection issues.
D: Jarv, Marin was asking if you wanted to do it where we randomize the traffic first and then do the rollouts, or if you wanted… if you had a different plan.
B: Oh yeah, that's the plan, of course. Like, we're gonna do… I'm almost done with taking a percentage of traffic to VMs. We even increased the capacity on the VMs, so that now we have three, because I'm a little bit worried, because we do the rolling upgrades. So I only want to take out one node at a time: instead of going from, like, you know, 100 to 50, we go from 100 to 66 percent, so yeah. So that's going to be the first thing we do, so I don't anticipate us to do…
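A minimal sketch of the capacity arithmetic behind keeping three VMs, assuming one node is drained at a time during a rolling upgrade:

```python
def remaining_capacity(total_nodes: int, drained: int = 1) -> float:
    """Fraction of VM capacity still serving traffic while `drained`
    nodes are out for a rolling upgrade."""
    return (total_nodes - drained) / total_nodes

# With two VMs, draining one leaves 50% of capacity;
# with three VMs, the same drain leaves roughly 66%.
print(remaining_capacity(2))  # 0.5
print(remaining_capacity(3))  # 0.666...
```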
A: …enough, though, Jarv, then, because what we want to do is then have a bit of a longer sitting time, soaking time, so that we can actually see whether there are any alerts going to canary depending on the time of the day, right, so that we have a good baseline for comparison.
A: If you do it today, if you roll this out today, we can't go on Monday and change that around again, right. We have to pause this a bit, the migration.
B: I'm not actually working so good, so…
B: Fine, and when we do this migration, we're also splitting Kubernetes and VMs first, so we'll do 50 on VMs, 50 on Kubernetes, and then we'll go 100. So, but yeah, it sounds like Tuesday is our time frame.
D: So, are we going to turn those silences off today?
A: Okay, let's, let's make sure that we inform them what we are trying to do. I know there is going to be some pushback on doing this on Friday, but, last I checked, Friday is a working day in general. So as long as there are tools to revert back, meaning drain the canary, right, we should be good, right? Am I mistaken here?
A: Okay, let's give that tool into their hands and deal with it. Yeah, my connection's…
B: …awful, my connection is awful, sorry. Yeah, well, we'll be able to drain canary, or set it to maintenance, if we have to, no problem.
A: Cool, and let's leave it like that, and then come Monday I would like to be able to enable canary if things are okay, if something happened over the weekend… Yeah, I work Monday, so I'm happy to enable it on Monday. Okay, but yes, sure, like, one day when we are feeling confident. Okay, let's do it that way. Jarv, this means, basically, I don't think we'll be able to roll this out on Kubernetes before Wednesday, and that is fine, because we want to have a more stable base.
C: It was later determined that we weren't necessarily blocked, so we were able to push the first batch of the queues over to Kubernetes. The only problem we ran into was that there was a network policy that was preventing the container registry from being accessed by Sidekiq. This was resolved quickly because it created an incident.
C: I think, Jarv, you were going to spin up an issue and look into that further later. I did not do any further investigation. I think it's just the way our Helm chart works: it's looking at the internal endpoint for the service to reach the container registry, instead of going out and then circling back through HAProxy into the same cluster that made the request.
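A minimal sketch of how one could inspect which NetworkPolicies restrict Sidekiq egress in a cluster, using the Kubernetes Python client; the namespace and the `app: sidekiq` label are assumptions for illustration, not the values from the incident:

```python
from kubernetes import client, config

# Assumes local kubeconfig access; namespace and pod label are illustrative guesses.
config.load_kube_config()
net = client.NetworkingV1Api()

for policy in net.list_namespaced_network_policy(namespace="gitlab").items:
    spec = policy.spec
    selector = (spec.pod_selector.match_labels or {}) if spec.pod_selector else {}
    # Look for policies that select Sidekiq pods and constrain their egress.
    if "Egress" in (spec.policy_types or []) and selector.get("app") == "sidekiq":
        print(policy.metadata.name, "egress rules:", spec.egress)
```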
C: So now I'm working on batch two. I have a select number of queues; there's a hundred and change in this workload. How many was it, 107 queues I selected yesterday? I did a quick check, and we are doing a lot of NFS reads and opens, accessing data inside of NFS, which is not great; it's not what we want. So, the rest of yesterday and this morning, I was trying to get the tool that Andrew wrote that hooks into eBPF and the kernel to grab some NFS metric data.
C: So now we should be able to get the classes that Sidekiq is executing that are requesting NFS access of some form. So, for the rest of today, along with other work, I'm going to drum up the appropriate queries to make sure I get the right class names, figure out which queues they're part of, and then remove those that touch NFS from the batch.
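A minimal sketch of that filtering step, assuming we already have the set of worker classes the eBPF tool saw touching NFS and a queue-to-worker mapping; both inputs below are made-up examples, not real data:

```python
# Worker classes observed doing NFS reads/opens (hypothetical eBPF tool output).
nfs_workers = {"RepositoryImportWorker", "ProjectExportWorker"}

# Hypothetical mapping of Sidekiq queue name to the worker classes it serves.
queue_workers = {
    "repository_import": {"RepositoryImportWorker"},
    "project_export": {"ProjectExportWorker"},
    "mailers": {"ActionMailer::MailDeliveryJob"},
}

# Keep only queues whose workers were never seen touching NFS;
# those are the ones safe to leave in batch two.
batch_two = [queue for queue, workers in queue_workers.items()
             if not workers & nfs_workers]
print(batch_two)  # ['mailers']
```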
A: Better to just reduce it to only the ones that we are certain about, so that we can… because we want to group the maybes and the no's together, or, you know, because we cannot expect to have this. But up to you; it doesn't really matter, as long as we get to the bottom of them. Yep. Can you now show me how I look in Grafana to see any differences between what is in Kubernetes, when it comes to the queues, and what is not in this chart, so not in Kubernetes?
C: Well, logs are going to be the best place. That's how we discovered the problem with the container registry when batch one was completed.
C: But you are correct: we don't have any visibility inside of Grafana at this moment to determine where a queue lives.
A: If we did that… because I don't want to demonize what is in Kubernetes versus what is in VMs, but it would help investigations quite a lot if we at least knew.
C: We do at least have a method where we could search for the kubernetes field, that something exists there, and then we'll have all the workers and logs associated with Kubernetes that are running inside of the Kubernetes workload. And then, you know, there's the… I don't know why catchall is showing… oh yeah, catchall, because that's the name of it. Sorry, I'm mixing up catchall and the other one in my mind, but this will at least help us determine from our logs where things are running.
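A minimal sketch of that kind of search with the Elasticsearch Python client, splitting Sidekiq log entries on whether a Kubernetes field exists; the endpoint, index pattern, and field names are assumptions for illustration:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://logs.example.com:9200")  # placeholder endpoint

# Entries carrying Kubernetes metadata came from pods; a must_not on the same
# field would give the VM side. Index and field names are guesses.
in_kubernetes = es.search(
    index="pubsub-sidekiq-inf-gprd-*",
    query={"exists": {"field": "json.kubernetes.pod.name"}},
    size=0,
    aggs={"workers": {"terms": {"field": "json.class.keyword", "size": 50}}},
)
print(in_kubernetes["aggregations"]["workers"]["buckets"])
```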
C: I would agree that we probably should add some more visibility inside of Grafana, so that we could easily determine where the failures are, whether they're inside of Kubernetes or inside of our VM infrastructure. I'll create an issue to address that. Thanks. And, Andrew, maybe I could work with you to figure out…
D: Yeah, this actually ties in. There's this thing that we really need, and it's in the observability backlog; I'll just highlight it. At the moment I would say we just use the label fqdn, but obviously now, with all of the stuff moving over, fqdn is null for all of the Kubernetes stuff. And so we need… we can't use instance, because it's just an IP and no one knows what the hell the IP is.
D: But we need a name that works in Kubernetes and in, like, VM land, and so what we can do is just create one that's like a concatenation of them, and so some will have the first part and some will have the second. But really what we need to do is fix that issue and rename fqdn or something; there's a whole epic on it now, I'll find it.
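A minimal sketch of the concatenation idea, just to make the fallback concrete; the label names fqdn and pod are assumptions about what the series carry, and in practice this would more likely live in a Prometheus recording rule than in application code:

```python
def node_name(labels: dict) -> str:
    """One identifier that works for both VM and Kubernetes series:
    VM series carry fqdn, Kubernetes series carry a pod name, so
    concatenating the two (one is always empty) yields a usable name."""
    return f"{labels.get('fqdn', '')}{labels.get('pod', '')}"

print(node_name({"fqdn": "web-01.example.com", "pod": ""}))       # VM series
print(node_name({"fqdn": "", "pod": "sidekiq-catchall-abc12"}))   # Kubernetes series
```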
D: Yeah, the most important one is that one. Like, the rest of them… you know, like, Ben really wants to rename environment to env, which I'd like as well, because of the shortened things, but the one that's important is the naming of the…
A: Great, but I'm not willing to take on another epic, just no! That's an observability…
D: Anyway, I just wanted to say I was looking at those metrics that are coming back, and, like, this is all on me because it's my script, but when I look at the results I'm not super filled with optimism that it's working as I hoped, because I just put in the query that I was using, and, you know, maybe with time it'll get better. But what I'm seeing there is lots of JobWrapper, NoteWorker… I mean, do you think that does NFS?
C: Also, keep in mind that I didn't have Sidekiq configured correctly, because I had concurrency set to, like, 15 or something.
D: 9:30, and what time is it now? How long ago was that?
D: Diff files is definitely going to be… yeah, yeah. The problem is that if there are very short jobs that run, and the eBPF thing goes from the kernel to userland kind of afterwards, you know, the logs take a long time. You know, because I'm kind of relying on synchronicity between the logs getting written, me reading the logs, and then me getting the information from the kernel out to userland, and so it might be that everything's just off kilter and I'm reading…
A: Yeah, I think we don't have anything else to discuss, I think. Oh, actually, let me ask Jarv, Skarbek: anything else that you would like to discuss before we wrap up?
A: Perfect, awesome. Well, thanks so much, and see you next week, Tuesday, I think, some of you.