From YouTube: 2021-04-26 Multi Large Working Group Weekly
A
Okay, good morning, good afternoon, good evening, everyone. Today is April 26th. This is the Multi Large working group weekly sync, so let's get started with the agenda. The first item is from me: service discovery sometimes fails inside of Kubernetes. That issue is pretty close to being resolved. The solution was identified, verification in staging was promising, and it will be rolled out to production. Amy, do you want to verbalize your comment?
B
Yeah, so just to repeat that: we're aiming to roll this out to production this week. As it stands right now, I think we need it fully in production to see the errors disappear. Hopefully this will solve them, but if it doesn't, it's still a good change because it simplifies the architecture. So we're very confident that we want to roll this out, and hopefully it will also fix this problem.
B
We believe so. We've had a couple of incidents over the last few months where it's difficult to say for certain, but yeah, we suspect this issue caused them. That's really why we don't want to push the API service out to production until we fix this: given the unknowns and the increased amount of traffic we'll be pushing through, it feels like if it is causing a small number of incidents now, that would become a larger number of incidents.
B
I believe it's always DNS. What we suspect it's related to is that, with pods coming in and out, things can't talk to each other. So we're going to try to lock those together.
B
I think it's always a DNS failure, but I think it's simply related to things starting to talk to each other and then a pod rotating out, which is normal in Kubernetes but not ideal in this case. So the simple fix we're going for is to keep things within their node pools, which hopefully means that if one thing is up and it starts talking to a second thing, they're either both still online or they both rotate out together.
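The fix being described here, pinning cooperating workloads to the same node pool so they rotate out together, might look roughly like the following. This is a minimal sketch assuming GKE-style node pool labels; the names and image are placeholders, not the actual gitlab.com manifests.

```yaml
# Pin a Deployment to one node pool so its pods come and go together
# with the peers they talk to. The label key assumes GKE
# ("cloud.google.com/gke-nodepool"); adjust for other providers.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a                # hypothetical service name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: pool-1   # same pool as its peers
      containers:
        - name: service-a
          image: registry.example.com/service-a:latest   # placeholder
```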
F
Think of the problem kind of like a shift change, right? If your shift changes don't align with when your managers change shift, somebody just loses a message. So it is DNS, but it's basically because somebody went to talk to somebody who left five minutes ago for lunch, and the new manager doesn't come in for an hour.
B
Yes, thank you. So, thanks to Distribution, the last known blocker that we had, the notification secret only being populated during configuration, is now resolved, and we are now working towards getting some API service traffic running into Kubernetes on canary.
B
Now, unfortunately, we found a difference that hasn't been trivial to replicate. We think we have a fix that we're working on, but we've gotten dug into request buffering here. The short version at the moment is: we are working on it, we think we know what we need to do, and we don't need any extra help.
B
We don't believe so, and I hope it'll stay that way, but if it does come up, there's potentially a question around the investigation. Stan has pointed out there's possibly an inconsistency in omnibus request buffering, so that's probably the only question mark at the moment. If that does turn out to be the case, we may need some development help just to confirm whether that's expected and why we have it like that. What we're focusing on right now is just trying to replicate the same setup.
B
Presumably
it
seems
to
work
so
we're
going
to
try
and
get
the
same
setup
running
on
kubernetes,
so
we
can
push
forward.
So
I
will
I'll,
probably
just
give
an
async
update
on
this
in
the
site
channel
once
we
hear
more
on
this,
but
for
now
that's
kind
of
where
we're
focusing
our
time.
A
Cool, so if you need development help, it looks like that would be the Distribution team. Is that correct, Stan and Jason?
F
It largely comes down to the idiosyncrasies between how nginx functions on a VM versus how it functions inside of the nginx ingress controller. The omnibus can fully orchestrate the content on a VM; that's not the case in the nginx ingress controller. So it's going to come down to how we configure the ingresses in the right way, get the right configurations in place, and test them.
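For context, this kind of per-service tuning is normally expressed through ingress-nginx annotations rather than edits to the rendered nginx.conf. A hedged sketch follows; the annotations shown are standard ingress-nginx ones, but the host, names, and port are placeholders, and the actual settings in question would come out of the investigation.

```yaml
# Request buffering tuned per Ingress via ingress-nginx annotations,
# instead of direct nginx.conf edits that bypass config validation.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webservice                     # hypothetical name
  annotations:
    nginx.ingress.kubernetes.io/proxy-request-buffering: "off"  # stream bodies
    nginx.ingress.kubernetes.io/proxy-body-size: "0"            # no body-size cap
spec:
  rules:
    - host: gitlab.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: webservice
                port:
                  number: 8080
```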
F
Some
of
those
some
of
the
workarounds
to
to
make
direct
to
edits
also
mean
zero
testing,
which,
if
you
bork
it,
then
it
is
instantaneously
breaking
all
of
nginx,
because
the
configuration
is
invalid.
So
we
have
to
figure
out
if
there's
a
good
way
to
do
it,
what
we
can
improve
if
we
can
do
anything
upstream.
A
Okay, let's move on to number four.
G
Yeah, I guess I can voice this. Pages currently doesn't depend on the NFS in production: we're not serving from it and we are not updating content on it. There are some issues, or rather production changes, to actually get rid of NFS, first on staging and then on production, and first on the Pages servers and then on the Sidekiq servers, plus one change about flipping the config flag in production.
A
Yeah, thank you, Vlad. I will work with Branch to see if we can get some SRE help here to hook up with you, so stay tuned. Right now I know that the SRE team is largely helping with the PG12 upgrade, but I will try to find someone to engage with you so that we can keep this moving.
A
Okay then, what's happening next? Over to you, Amy.
B
Thank you. So yeah, just to reiterate, really: our plan is that once we've got the API service up and running, we'll be shifting over to working out how to migrate the web nodes into Kubernetes.
A
Yeah, discussion topics. Jason, you're typing, but do you want to just verbalize?
F
Yeah, so I've seen a couple of issues recently regarding disabling the Redis key watcher for some of the Workhorse instances.
F
Specifically, they seem to be related to production and the load on Redis, because there are some Workhorse instances that don't need to be doing this functionality. Do we know exactly what the priorities on this are, so that I can make sure my team gets involved? There's some funniness with how we do it in omnibus versus how we would do it in the charts, and I just want to make sure we end up on the same page.
F
Yes, we do have issues; I'm literally trying to find them. All of a sudden my brain went, oh, I should ask about that.
E
I'm asking because, yes, we can definitely disable that, but it also seems like we're going to be running different Workhorses in production. So the question is whether we'll break other functions at some point because we disabled it now, when it's too late. Can we somehow optimize that so it's simply not a problem?
F
Right, and the proposed method right now is literally: if you say not to use the key watcher, that disables all of the Redis configuration.
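For illustration, the kind of split being discussed might look like the values sketch below: separate Workhorse-fronted deployments for web and API traffic, with the key watcher left on only where CI long polling needs it. The key names here are illustrative of the proposal, not confirmed chart settings.

```yaml
# Hypothetical chart values: per-deployment control of the Workhorse
# Redis key watcher, instead of one switch that drops all Redis config.
gitlab:
  webservice:
    deployments:
      web:
        workhorse:
          keywatcher: true     # web traffic serves CI long polling
      api:
        workhorse:
          keywatcher: false    # API-only pods skip watching Redis keys
```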
E
I mean, this is one of the ways, but first of all it feels pretty risky, to be honest, since this is a requirement of Workhorse right now. So maybe I can help with ideas for how we could optimize that, but I think from my perspective I would rather aim at fixing Workhorse than removing or disabling Redis.
A
Okay, let's follow up on the issues to see which team can actually fix those Workhorse issues. I know there is one nominal team owning Workhorse right now, but they probably cannot fix everything about it, so we'll find the owners, the team who can work on each individual issue. So, Jason, please put the issues here, and I will look into them to see where they fit.
C
Yeah, there's an open question of when we collectively think we should start working on the next service to migrate over, which I believe, after the API, is Gitaly. We've had some anecdotal feedback from customers that it hasn't necessarily performed well or been optimized to run in Kubernetes, so I'm kind of wondering.
C
Given
that
knowledge
and
given
that
we
think
it's
the
next
service
to
sort
of
shift,
when's
the
right
time
to
start
spending
some
more
time
here
with
the
giddily
road
map
and
distribution
and
and.
F
We have customers with, you know, 500 users that have no problem, because the way their workflow patterns hit the Gitaly instance doesn't overload it. We have customers who have a hundred users and 1,500 CI jobs, which just destroys Gitaly, and it doesn't matter whether it's on a VM or in a container in Kubernetes: it just smashes the poor thing to bits.
F
So
when
we
go
to
move
us
into
kubernetes
for
italy,
we
have
to
know
what
those
workflows
look
like
how
to
control
the
load
in
gita
lee.
How
to
make
sure
we
understand
what
patterns
say.
I
need
to
have
gillies
that
have
more
room
vertically
or
horizontally
and
how
to
handle
those
things
right
now
we
have
no
definition
of
any
of.
B
Are there any kind of deadlines that we're working towards with this? Obviously I know we want to do it and we'd like to do it as soon as possible, but is there any kind of business date that we're also aiming to hit?
C
I don't think so, specifically. The reason I'm asking is so we don't get blocked on the migration, right? This could take some time to resolve, and it'll probably require Gitaly, infra, and Distribution all working together here in some way, shape, or form. The more teams you get involved, the longer it tends to take to fix things, so I'm trying to think ahead here so we don't get blocked for too long. And then also...
C
It's the last stateful service of GitLab, I would say, that runs in VMs, so from a customer standpoint and an architecture standpoint it would be nice to have it all running in Kubernetes. Then you could have your Redis and your database in managed services like RDS and ElastiCache running in AWS, and everything else runs happily in Kubernetes. That would be a great state to get into.
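As a rough illustration of that hybrid end state, the Helm chart can already point at external stateful services. A minimal values sketch, with placeholder hostnames and secret names:

```yaml
# Hybrid layout: stateless components in Kubernetes, state handed off
# to managed services. Hostnames and secret names are placeholders.
postgresql:
  install: false                 # use RDS instead of in-cluster PostgreSQL
redis:
  install: false                 # use ElastiCache instead of in-cluster Redis
global:
  psql:
    host: gitlab-db.example.us-east-1.rds.amazonaws.com
    password:
      secret: gitlab-postgres    # pre-created Kubernetes Secret
      key: password
  redis:
    host: gitlab-redis.example.cache.amazonaws.com
```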
C
I guess the indirect business impact would be moving the cloud-native installation maturity. Josh, is that kind of what you're thinking too? Yeah, I think the maturity of the category is one, but also just the customer impact: we're considering recommending the hybrid architecture for folks over the pure Helm chart install, and one reason is that we don't recommend running Gitaly in Kubernetes right now. So that's what we would recommend customers do instead of the pure Kubernetes install.
B
So, from a roadmap point of view, assuming everyone's slightly busy, what's the first sensible time for this? I'm going to assume there are some bits at the beginning where people need to sit down and bash their heads against it for a bit, so we can work out what we might actually need to solve. Is that the sort of thing we can schedule in?
C
Yeah,
I
think,
to
that
end,
we'll
have
to
likely
do
some
fact-finding
and
investigation
spikes
to
really
get
a
handle
on
what's
actually
happening
here.
We
could
wait
for
infer
to
sort
of
be
ready
to
pick
it
up,
and
then
we
could
run
some
like
a
small
shard
in
kubernetes
and
that's
one
way
of
doing
it
or
we
can
try
and
staging,
but
we,
I
think
you
know
jason's
comment.
We
kind
of
see
this
when
we
get
more
load,
so
we
might,
I
don't
know
if
I'm
trying
to
use
the
processing
environment.
C
Let
me
just
open
up
the
questionnaire,
because
it's
I
think,
the
next
next
stage,
and
so
I
think
we
likely
would
involve
the
giddily
team
mark,
I
think
you're
on
maybe
infra,
although
if
we
can
figure
out
a
way
to
generate
load
and
generate
the
failure
scenarios
without
infrared,
we
could
potentially
do
it
without
a
euro
map
being
impacted,
yeah.
D
You know, we don't have this slotted in currently, so we'd probably be looking at July at the earliest. Amy, correct me if I'm wrong there. But I was wondering if maybe Gitaly could pick up some of the testing in staging, where they can help discover some of the gaps.
H
I mean, yeah, I think it's something we should look into. I'm currently trying to figure out prioritization over the next few months. I know that we're doing some performance work right now and trying to get that scheduled in, so I'm very happy to talk about what we might be able to help with ahead of time.
H
That
would
smooth
the
transition.
If
that's
something
we're
looking
to
do
right
now,
we're
unclear
what
won't
work.
So
it's
sort
of
one
of
those
things
where
we're
gonna
have
to
try
it
and
then
sort
of
you
know
peel
the
onion.
We're
gonna
have
to
figure
out
what's
going
on,
so
I
don't
know
if
we
even
have
proven
it
doesn't
work.
I
think
it's
one
of
those
things.
That's
right
now,
somewhat
hypothetical
and
we're
in
the
process
of
trying
to
investigate
you
know
what
will
happen
when
we
actually
load
it
up.
C
Yeah, I think the next step is getting that test environment set up and then actually driving load to it and seeing what happens.
C
With, you know, resource limits set and things like that, and then just seeing what happens.
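A sketch of what that experiment might look like in chart values: give Gitaly explicit requests and limits so the load test shows how it behaves at the ceiling instead of freely consuming a whole VM. The numbers are illustrative only, not a recommendation.

```yaml
# Gitaly with explicit resource bounds for a load-testing experiment.
# Figures are placeholders; the point is to observe behavior at the limit.
gitlab:
  gitaly:
    resources:
      requests:
        cpu: "2"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 16Gi           # watch for OOM kills under CI-heavy load
```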
C
What do we think as far as who can pick this one up? That might define the backlog.
A
Is that the building of a test harness? That's what I'm thinking of.
C
I think you are. We have a test harness for GitLab where we can drive traffic as part of our performance testing, the framework that Grant has done. I'm not totally sure, but I think we should already be able to drive Gitaly traffic with it, I would think.
H
I mean, I can start asking questions; I just want to make sure I ask the right ones. So if there's anything specific people want to get answers to, I'll happily contact Grant. I know the reference architectures are working with Gitaly Cluster right now, so if that's what we need, then we have that; if we need something in addition, or there are specific features we need to make sure function correctly in that reference architecture...
C
So yeah, we haven't tried this yet, I think, with Grant's performance tool, to see what happens. So the easiest next step might be to see if we can get someone from Quality, maybe, to run it and just see whether it blows up or not.
C
So that's my next step. I think the decision point here is whether someone from Quality or an engineer runs it; this is the GPT, the GitLab Performance Tool, and I think in general Claudio makes the most sense, being the most familiar with it. Then the question is whether Gitaly or Distribution does it.
C
There's the Distribution point here, and we can take this async if we want to, but I think that's the next step: figure out who wants to run this test case in the test scenario and see what happens.
A
Okay, we are almost at time, and that's all the topics we have.