From YouTube: kubernetes sig-aws 20190531
A
Hello everyone, it is Friday, May 31st. This is SIG AWS. We have a very light agenda today. Please do add items to the agenda if you would like to discuss them, and feel free to put your name on there as well. I put a link in the chat; I'll paste it again now, since I just joined and don't have the history. While you're doing that: I am your moderator and facilitator, Justin Santa Barbara, and I work at Google. A reminder that this meeting is being recorded and will be put on the internet, so please be mindful of our code of conduct, be a good person, and don't say anything mean about eventual consistency, which seems to be the first item on the agenda. And with that, I guess we can go right over to you; I think you added that, right?
B
Yeah, okay. So the problem is that we are seeing a bunch of test failures, and likely they fail in production too, where we run DescribeInstances or DescribeVolumes and the call returns stale attachment state. Say a volume is attached to node X. We perform a detach, then we poll whether it is detached, and it says, okay, I'm detached now. So we try to attach it somewhere else, and then the same DescribeVolumes call, if you call it again, now says the volume is still attached to the node we just detached it from. We have observed a similar problem in the DescribeInstances code path: we detach from a node, we poll until it reports detached, but then someone else tries to attach, checks again, and it says, oh, this new instance still has the volume. So that's the problem we are seeing, and I don't know if it is recent or what happened, but we haven't seen this kind of issue before, where the same API call will tell you X and then tell you Y. We're thinking we'll have to audit the whole codebase and see.
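
For concreteness, here is a minimal sketch of the detach-then-poll pattern being described, in Go using aws-sdk-go; the helper name and volume ID are illustrative, not the actual cloud-provider code. The bug is that even after this poll observes "available", a later DescribeVolumes call can still return the old, stale attachment.

```go
package main

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// waitForDetach polls DescribeVolumes until the volume reports
// "available" (i.e. detached). With eventual consistency, a read made
// *after* this returns can still show the old attachment, which is the
// inconsistency described above.
func waitForDetach(client *ec2.EC2, volumeID string) error {
	for i := 0; i < 30; i++ {
		out, err := client.DescribeVolumes(&ec2.DescribeVolumesInput{
			VolumeIds: []*string{aws.String(volumeID)},
		})
		if err != nil {
			return err
		}
		if len(out.Volumes) == 1 && aws.StringValue(out.Volumes[0].State) == ec2.VolumeStateAvailable {
			return nil // detach confirmed... by this one read
		}
		time.Sleep(time.Second)
	}
	return fmt.Errorf("volume %s not detached after polling", volumeID)
}

func main() {
	client := ec2.New(session.Must(session.NewSession()))
	if err := waitForDetach(client, "vol-0123456789abcdef0"); err != nil {
		fmt.Println(err)
	}
}
```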
A
Sounds
like
fun
now
can
I
ask
a
clarifying
question:
do
we
think
that
this
is
because
we're
using
two
different
read
methods,
or
do
we
think
that
read
methods
in
general
can
like
sort
of
use,
cache
data
or
whatever,
and
then,
when
we
go
to
do
a
what
I
would
call
a
write
operation
like
an
attached,
but
it
actually
does
like
a
real
check.
What
do
we
think
the
like
yeah.
B
It's the read operation that is giving us cached data. For instance, in one case, what is happening is: when you are trying to attach a volume to a node X, before we attach, to save on the number of mutable API calls we make, we perform a DescribeVolumes on the volume to check whether the volume is available. If the volume is not available, we conclude that it cannot be attached.

And the AWS documentation says that the correct way to handle this is to poll with exponential back-off, but we already do that. It's just that we don't know when we can rely on the value of a read API call. It sounds like we always have to do mutable API calls to get the correct, accurate state of a volume or instance, right?
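
As a point of reference, the poll-with-exponential-backoff pattern the AWS documentation recommends looks roughly like this in Kubernetes code; a sketch using wait.ExponentialBackoff from k8s.io/apimachinery, with a hypothetical isVolumeAvailable read and the 1.2 factor mentioned later in the discussion. Note that backing off does nothing about staleness: every iteration can still read stale data.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// isVolumeAvailable is a hypothetical stand-in for a DescribeVolumes
// read; backoff only spaces the reads out, it cannot make them accurate.
func isVolumeAvailable(volumeID string) (bool, error) {
	// ... call DescribeVolumes and inspect the volume state ...
	return false, nil
}

func main() {
	backoff := wait.Backoff{
		Duration: 1 * time.Second, // initial delay
		Factor:   1.2,             // growth factor per retry
		Steps:    10,              // give up after 10 attempts
	}
	err := wait.ExponentialBackoff(backoff, func() (bool, error) {
		return isVolumeAvailable("vol-0123456789abcdef0")
	})
	if err != nil {
		fmt.Println("volume never became available:", err)
	}
}
```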
A
Which
would
mean
basically
that,
like
I
guess
what
we
can
do
is
we
basically
have
to
be
able
to
recover?
We
assume
that
all
our
reads
are
bad,
and
then
we
just
do
a
sort
of
optimistic
right
and
we
just
have
to
recover
from
the
idea
that
are
optimistic.
Right
may
have
been
bad,
I,
guess,
yeah
or
may
even
based
on
based
on
other
stale
data.
B
And
the
reconciler,
in
both
the
the
tackiest
controller
and
and
in
the
cubelet
site,
is
not
very
well.
You
know
like
return
to
recover
from
something
like
that
in
in
one
click
case,
at
least
a
volume
is
permanently
stuck
actually
like.
It
will
never
attach,
like
the
water
part,
will
never
mount
so
and.
A
We
do
have
CSI
coming.
I
do
wonder
whether,
whether
it's
just
that
you're
noticing
this
more
or
whether
something
has
changed
underneath
us,
in
other
words
like
if
it's,
if
nothing,
has
changed
and
you're
just
better
observing
it,
then
maybe
we
should
sprint
towards
CSI
and
make
sure
the
fixing
CSI,
because
it
feels
like
a
lot
of
the
time.
We've
been
constrained
by
the
interface
that
we've
been.
We
have
exposed
to
us
the
attach
detach
controller,
and
maybe
we
could
make
sure
that
CSI
has
a
more
appropriate
interface.
I
guess
or
a
more
powerful
interface.
B
Csi
has
same
problems
that
entry
travel
has
at
least
right
now,
because
we
were
just
noticing
and
see.
I
said
never
also
has
written,
with
the
assumption
that
you
know
like
that.
The
read
calls
returned
accurate
data
so
and
CSI
itself
will
not
fix
the
problems
that
we
have
because
still
uses
it
a
little
controller.
It's
just.
This
calls
are
going
to
external
driver
now.
D
B
A
And is the error that we're observing mostly on detach or mostly on attach? Because on attach we do have evidence: we can go and look at when the volume actually shows up on the node, right? Whereas on detach, I feel like we don't have any evidence after the volume disappears; there's a second step, right: the device disappears, and then the volume can still be in a detaching state.
B
So what happens is: we try to attach it, and before we attach it we do a DescribeVolumes, and DescribeVolumes says the volume is not available, so we return an error that this volume cannot be attached. When the attach/detach controller sees that the volume cannot be attached, it raises an error called a dangling error, and that adds the volume to the actual state of the world. But in truth the volume is not actually attached to that node, because DescribeVolumes just returned stale data. So in that case the volume is reported as attached in the node status, and the kubelet waits for the volume to appear on whatever path DescribeVolumes returned, and the volume never appears there. So the pod is blocked forever.
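
A sketch of the failure mode just described, in plain Go with illustrative names (the real code paths live in the legacy AWS cloud provider and the attach/detach controller): a stale read at the pre-attach check turns into a dangling error, the controller records the volume as attached to the old node, and the kubelet waits for a device path that never materializes.

```go
package main

import "fmt"

// danglingError mirrors the "dangling" mechanism described above; the
// type and the describeVolume helper are illustrative, not the actual
// Kubernetes types.
type danglingError struct {
	volumeID, attachedNode string
}

func (e *danglingError) Error() string {
	return fmt.Sprintf("volume %s is still attached to node %s", e.volumeID, e.attachedNode)
}

// describeVolume stands in for DescribeVolumes and may return stale data.
func describeVolume(volumeID string) (state, attachedNode string) {
	return "in-use", "node-x" // stale: the real detach already completed
}

// attachDisk is the decision point: on a stale read it raises the
// dangling error, the controller then marks the volume attached to the
// *old* node, and the pod blocks forever waiting for the device.
func attachDisk(volumeID, targetNode string) error {
	if state, node := describeVolume(volumeID); state != "available" {
		return &danglingError{volumeID: volumeID, attachedNode: node}
	}
	// ... issue the real AttachVolume call ...
	return nil
}

func main() {
	fmt.Println(attachDisk("vol-0123456789abcdef0", "node-y"))
}
```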
B
We can try to work around some of these limitations, fixing them with core Kubernetes constructs to work around some of these API limitations, but I don't have a good answer other than to fix some of the way we handle this, like surfacing this one level up so that the coordinators can deal with it correctly, yeah.
A
I also think that we should be more willing to expend quota to save ourselves from serious bugs, particularly if the quota is spent on user-initiated operations like an attach or a detach, rather than on background polling. Background polling is what really kills us: adding any more calls there is really bad. But I feel that adding calls to operations is less bad, because if we get throttled, we just basically don't continue. We may even slow down the speed at which we can launch pods or whatever, which is not great, but it's not as bad as just getting stuck, having no quota at all, which is what happens if you do an overly aggressive, continuous background poll, for example.
A
Think
we
have
a
wind
coming
anyway
on
some
calls
because
of
the
I,
don't
know
what
it
was
introduced,
but
there
was
something
about
the
polling
of
volumes
on
an
attached
right
where
I
think
there's
a
PR
I
think
you
coming
tonight
amount,
whereas,
like
we're
gonna
introduce
like
exponential,
back-off
polling
and
right
now,
it's
a
pretty
aggressive,
like
is
every
second
polling
for
30
seconds.
So
we
should
get
some
polls
there.
Yeah.
B
So, on exponential back-off: whatever we are currently implementing, we just back off by a certain amount and retry. But the AWS SDK says there should be a Retry-After header present in the responses, and that's never actually present. Does Amazon know about it? Can this be fixed?
B
There's no issue filed for it; sorry, I'll log something. It's just that we observed this while trying to fix the exponential back-off. Right now we are basically backing off blindly; I don't know how AWS's quota works internally. Let's say we are doing a two-minute operation and 1.2 is our exponential backoff factor; then we just back off like that. But the AWS SDK says you should back off until the Retry-After header; what if we retry in between with another request? Is it still counted against the quota? So I'm just trying to say: it might be best if our exponential back-off, when we throttle API requests, could use the Retry-After header, but currently we cannot. Right now we throttle when we get a RequestLimitExceeded error from Amazon, but we are throttling by our own heuristic; there's no logic, or not much logic, to it. If we could get a Retry-After header from Amazon, then we could.
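
A sketch of what honoring such a header could look like, in Go; the header name is the standard HTTP Retry-After, and the 1.2 fallback factor echoes the one mentioned earlier, but EC2 does not actually send the header today, which is exactly the complaint.

```go
package main

import (
	"fmt"
	"math"
	"net/http"
	"strconv"
	"time"
)

// nextDelay prefers a server-provided Retry-After header and falls
// back to a heuristic exponential backoff when it is absent, which for
// EC2 today is always.
func nextDelay(resp *http.Response, attempt int) time.Duration {
	if resp != nil {
		if ra := resp.Header.Get("Retry-After"); ra != "" {
			if secs, err := strconv.Atoi(ra); err == nil {
				return time.Duration(secs) * time.Second
			}
		}
	}
	// Heuristic fallback: one second grown by a 1.2 factor per attempt.
	return time.Duration(float64(time.Second) * math.Pow(1.2, float64(attempt)))
}

func main() {
	// With no response (or no header), attempt 3 waits about 1.7s.
	fmt.Println(nextDelay(nil, 3))
}
```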
A
Okay, yes, yeah. I mean, I am guilty of writing the terrible global retry handler, which is very hand-wavy and heuristic, but it is designed exactly as you say, because there's basically no other way to know when you are at the limits; they aren't accessible. Yeah, and so we basically have to just back off, and we do it sort of globally across the, well, across the process.
A
I have approved it, thank you. Cool, thanks. Sorry about, like, leaving you work you shouldn't have had to do; I apologize for that, but thank you for it, Mike. Cool, and then, yeah, that'd be great. And then we have another item on the agenda, Ritesh, is that correct? Multiple pods on the same node. Do you want to talk us through the issue?
A
So
there's
an
issue
linked:
eight
of
us
EB
SCSI
driver
number,
295
Chris.
That
looks
like
you
see
seeds
who
I
presume
cryptid
Chang,
who
I
presume
is
the
correct
person
to
look
into
this
yeah
I,
don't
know
if
I
don't
know.
If
there's
anything
in
particular,
we
should
discuss
further
on
that
Ritesh.
Essentially,
it's
not
possible
to
share
the
PV
back
with
the
EBS
volumes
to
share
between
pods.
On
the
same
note,
if
the
pods
are
deployment
objects,
whereas
it
does
work
with
stateful
sets,
that
sounds
odd.
A
Okay, well, I will certainly read this issue. Did someone say something? Sorry. I will certainly read this issue. This is surprising, because Deployments and StatefulSets should both end up as pods, and the volume logic shouldn't really change: the mount happens at the pod level, so it shouldn't really matter whether they used a Deployment or a StatefulSet. But just because something is surprising does not mean that it is not true. So this is certainly interesting, and yeah, I think, yeah.