From YouTube: SIG - Storage 2023-02-27
Description
Meeting Notes:
https://docs.google.com/document/d/1mqJMjzT1biCpImEvi76DCMZxv-DwxGYLiPRLcR6CWpE/edit#
A
Hello everybody, my name is Chandler Wilkerson. I'll be guest hosting today for Adam, and I just wanted to get a couple of questions out. Do you usually wait until about three minutes after the hour to start the meeting, and is this agenda good?
B
Yeah, we usually wait a bit to give people some time to put in topics, maybe.
E
Usually, there's more stuff in the agenda.
E
That might make sense to talk about. I've been working on this for the last few weeks, and it's not CDI-related, it's KubeVirt-related. Essentially, KubeVirt has some supporting containers that have a very high request-to-limit ratio, the biggest one being the hotplug attachment pod. Or actually, it's a container in the pod, but essentially the container does nothing.
E
And
if
you
have
a
you
know:
lower
ratio,
then
the
attachment
power
won't
start
because
the
ratio
is
too
high
and
I
I'll
probably
bring
this
whole
thing
up
in
the
actual
Cube
vert,
meaning
on
Wednesday
I.
Just
wanted
to
see.
If
anybody
had
any
thoughts
on
this,
the
same
issue
happens
for
container
disks,
because
the
container
disks
have
a
you
know
in
a
container,
with
a
very
high
request
to
limit
ratio.
E
virtiofs also creates a container to get virtiofs into the virtual machine, and then there are any sidecars.
E
So if people created a sidecar to modify the domain XML, again, it does almost nothing except during startup. All of those have a very high request-to-limit ratio, and they all fail when I put a LimitRange with a ratio in my namespace. Having sort of explained the issue: I have a PR out right now that, specifically for the hotplug pod, sets the request to be the limit on both CPU and memory, and that will fix the ratio check in the namespace.
E
It's just a little bit wasteful, because I am essentially reserving CPU and memory for a container that doesn't do anything. But it only happens if you actually hotplug a disk; if you don't hotplug, it's not going to affect you.
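The interaction being described (a LimitRange with a maximum limit-to-request ratio rejecting pods whose ratio is too high) can be sketched as a small check; the numbers and the maximum ratio below are hypothetical, not taken from the PR:

```python
def violates_ratio(request: float, limit: float, max_ratio: float) -> bool:
    """Mimic a LimitRange maxLimitRequestRatio check: the pod is rejected
    when limit / request exceeds the configured maximum."""
    return limit / request > max_ratio

# A do-nothing helper container: tiny request, much larger limit.
assert violates_ratio(request=10, limit=100, max_ratio=2)      # ratio 10, rejected
# Setting request == limit makes the ratio exactly 1, so it always passes.
assert not violates_ratio(request=100, limit=100, max_ratio=2)
```

Setting the request equal to the limit drives the ratio to exactly 1, which is what the PR under discussion does for the hotplug pod.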
E
So my question is: how much of an issue would that be for people? What I am actually reserving now is essentially only about 80 megabytes of memory; it's not a huge amount, but is it an issue for people if we do that more? Am I going to get pushback on my particular PR?
B
So
I
think
I
think
this
way
of
tackling.
It
is
not
really
much
different
from
from
what
we
do
to
VMS
today,
which
is
just
add
at
some
overhead
to
the
user
resource
request,
but
so
I
I
think
it's
not
the
end
of
the
world.
It's
it's
you're,
basically
doing
the
same
thing,
but
for
or
a
sidecar
pod
and
regarding
the
way
forward
in
general,
I.
Think
the
kind
of
API
that
we
have
in
CDI
is
basically
the
way
to
go.
B
I can't see another scenario where something can take the place of a full-blown API that gives you an option to just put in your defaults for these kinds of pods and be happy with it.
E
And just so everybody else knows: in CDI, in the CDI CR, you can set a configuration where you tell it to use these requests and limits on the worker pods, and you can use that to get the correct ratio you want.
E
If
you
have
a
limit
range,
but
a
ratio
in
it
or
you
know,
if,
if
you
have
a
a
particular
image
that,
for
whatever
reason
uses
more
memory
than
the
default,
you
can
increase
the
the
amount
of
memory
that's
available
for
the
product,
so
I
I
I,
don't
know
how
many
people
we
have
here
that
that
are
running
this
on
like
large
clusters
or
anything
like
that.
It
doesn't
look
like
we
have
too
many
people
that
would
do
that.
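For reference, the CDI CR knob being described looks roughly like the fragment below, expressed as a Python dict for illustration; the `podResourceRequirements` field name and the quantities are from memory and should be verified against the CDI API reference:

```python
# Fragment of the CDI custom resource, as a dict for illustration only.
cdi_spec = {
    "spec": {
        "config": {
            "podResourceRequirements": {
                "requests": {"cpu": "100m", "memory": "60M"},
                "limits": {"cpu": "750m", "memory": "600M"},
            }
        }
    }
}

def implied_ratio(reqs: dict, lims: dict, key: str) -> float:
    """Limit-to-request ratio implied by the configured quantities."""
    scale = {"m": 0.001, "M": 1.0}  # milli-cores and megabytes only, for the sketch
    parse = lambda v: float(v[:-1]) * scale[v[-1]]
    return parse(lims[key]) / parse(reqs[key])

cfg = cdi_spec["spec"]["config"]["podResourceRequirements"]
# Raising the requests (or lowering the limits) lowers this ratio, which
# is how you satisfy a LimitRange ratio constraint in the namespace.
assert implied_ratio(cfg["requests"], cfg["limits"], "memory") == 10.0
```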
E
Like I said, I'll bring this up on Wednesday in the KubeVirt meeting itself. Hopefully we'll have some people there that run this on large clusters, because I suspect that for people with small clusters it's not going to matter that much. But if you have large clusters, all the little reserved pieces of memory and CPU might add up.
B
And I think we can just pick up issues from the 36 we have on the issues page; 1203 is the last one we stopped on, yeah.
E
So essentially, what happens to get progress updates on data volumes is that the controller actually directly connects to the pod, to one of the metrics endpoints that has the progress update. For the percentage here, we're actually just using qemu-img with -p, where it's printing the percentage, if we're doing a qcow conversion. And if we're directly writing, you know, if we're actually downloading it through a standard HTTP connection...
E
Most of the time we can't actually calculate the percentage, because we don't know the total, and that's when we show N/A. This particular issue seems to be about not being able to connect because of some sort of network policy. We're actually directly connecting to the pod, and if there's a network policy that prevents that, then we can't get the information and we can't compute the percentage.
E
I don't know exactly what they would like us to do about this... oh, actually, they tell us in the first line: they want a service associated with the pod, and then we connect to the service, so that we're not connecting to the IP address of the pod.
E
That actually shouldn't be terribly hard to do; we just need to create services on the fly for each pod.
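A service-per-pod approach could be sketched like this; the label key, name suffix, and port below are made up for illustration and are not CDI's actual conventions:

```python
def service_for_pod(pod_name: str, label_value: str, port: int) -> dict:
    """Build a Service manifest that selects one worker pod by label, so
    the controller talks to a stable service name instead of the pod IP.
    The label key, name suffix, and port are illustrative only."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{pod_name}-progress"},
        "spec": {
            "selector": {"cdi.example/progress-pod": label_value},
            "ports": [{"port": port, "targetPort": port}],
        },
    }

svc = service_for_pod("importer-dv1", "dv1", 8443)
assert svc["metadata"]["name"] == "importer-dv1-progress"
assert svc["spec"]["selector"]["cdi.example/progress-pod"] == "dv1"
```

The pod would carry the matching label, so the service's endpoints always resolve to that single pod.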
E
And I might actually make some of the logic in the controller simpler, because in the controller we're looking at the IP address and then, what is it, the endpoints on the pod. If we have a service, we just connect to the service; it might actually be simpler from a controller perspective.
G
I just wanted to interject. I don't really have any comments about this particular bug, but we do have a similar problem elsewhere, where we want to sort of communicate progress of pods that are doing things that take a long time, and I don't know of a good way to do it, to be honest, so I'm really interested.
E
So, what we did for the progress: we have some way in our application to figure out the progress, and essentially what we did is create a Prometheus endpoint, and creating one is really easy.
E
You just call the library function and set up the type of gauge you want; essentially it's just a zero-to-one-hundred gauge. Then our controller directly connects to the pod at that endpoint, and it basically just gets the Prometheus output, finds the correct field, and displays that.
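A minimal sketch of such a progress gauge, written directly in the Prometheus text exposition format rather than with an actual Prometheus client library; the metric name here is made up, not CDI's real metric:

```python
progress = {"percent": 0.0}  # updated by the application as work proceeds

def render_metrics() -> str:
    """Render the gauge in the Prometheus text exposition format. The
    controller scrapes this output on the pod, finds the gauge line,
    and reports the value as the data volume's progress."""
    return (
        "# HELP import_progress Import progress in percent (0-100).\n"
        "# TYPE import_progress gauge\n"
        f"import_progress {progress['percent']}\n"
    )

progress["percent"] = 42.5
gauge_line = next(l for l in render_metrics().splitlines()
                  if l.startswith("import_progress"))
assert float(gauge_line.split()[1]) == 42.5
```

Serving this string over a plain HTTP handler is all the "endpoint" amounts to, which is why the controller can also scrape it without Prometheus itself being involved.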
E
There was an issue about the update not being right because it was trying to connect to the wrong IP address; I think maybe doing a service might actually solve that problem too.
B
I think I had an issue about... yeah, you're right that there was something similar, but I think the issue is that we just don't play nice when the networking is bad in the cluster. I'll have to dig it out.
E
So, do we think this issue should actually just be fixed? I don't think it's going to be very hard to fix, because creating a service that points to the pod is relatively straightforward, right? We just put a label on it and bind the service to the label, and we haven't done anything with this. I actually didn't know this existed, otherwise I probably would have already fixed it some time ago.
G
I mean, thanks for that explanation, Alex. I know you say it's quite easy, but to my mind it sounds pretty complicated: you've got a service, and you've got something in the pod which is answering requests. Compare that to log files, which pods will just collect.
G
What I'm saying here is that when I look at this, it just seems to me that there should be a way, in the same way that logs are collected, and this has nothing to do with KubeVirt, this is entirely a Kubernetes thing: there should just be a way to signal some small amount of data like that from inside the pod to the metadata.
E
We can essentially just write some code where the pod itself updates the resource, and whoever wants the information can just read it from there. The thing is, we don't really want to let the pod, or the application that is running in the pod, know that it is running in Kubernetes, because then it's linked to Kubernetes.
E
So
if
you
don't
want
to
do
that
now,
then
you
need
to
somehow
get
this
information
from
the
pop,
and
you
know
the
Prometheus
endpoint
every
day
is
one
way
you
could
do
an
HTTP
endpoint,
where
you
just
connect
with
the
HTTP
I'm
going
to
get
some
information.
That's
another
way,
that's
essentially
what
the
Prometheus
endpoint
is.
It's
just
an
ATP
endpoint,
but
Prometheus
has
a
bunch
of
libraries
that
you
can
use
to
to
get
this
information
relatively
easily.
E
So instead of pushing the information from the pod itself, now you're pulling it, right? You're just connecting to it: has something changed, has your progress updated, etc.? It's just sort of a philosophy thing. If you don't care that your application knows it's running in Kubernetes, it's probably simpler to pass it a kubeconfig that it can use to update the resource itself, and let the application do that. But if you do care, then you have to go through some gymnastics to get the information.
E
It's
it's
actually
not
that
bad,
because
if
you
start
a
pod
as
a.
E
Account
there's
a
secret,
that's
automatically
injected.
That
is
the
cubeconfig
for
that
account
and
it's
it's
in
a
stable
place.
So
you
can
just
read
the
queue
config
and
build
your
client
from
there
and
then
use
the
client
to
connect
to
the
kubernetes
cluster
and
do
the
updates.
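A sketch of reading those injected credentials; strictly speaking, what Kubernetes mounts at the well-known path is a token and CA bundle rather than a literal kubeconfig file, and the helper below is illustrative, not a real client API:

```python
import os

# Well-known in-cluster mount point; Kubernetes injects these files into
# every pod that runs under a service account.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"

def in_cluster_credentials(sa_dir: str = SA_DIR) -> dict:
    """Read the injected service account credentials. Sketch only; real
    client libraries (client-go, kubernetes-python) do this for you."""
    with open(os.path.join(sa_dir, "token")) as f:
        token = f.read().strip()
    return {
        "host": "https://kubernetes.default.svc",  # in-cluster API server name
        "token": token,
        "ca_cert": os.path.join(sa_dir, "ca.crt"),
    }
```

A client built from these values can then PATCH status fields on the resource, which is the "let the application update it" option described above.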
E
Well, essentially, if you mount a secret in a pod, it shows up as a file somewhere. So if your user can connect to the pod and snoop on files, yes, they can read it; but if a user can connect to the pod, then they can essentially do anything already.
E
But
yeah
there's
a
couple
of
secrets
that
automatically
injected
into
every
pod
and
one
of
them
is
is,
is
the
Q
config
of
the
service
account
that's
running
the
also.
E
Well,
let's,
let's
I
I
think
we
should
let's
get
back
to
this.
This
issue.
I
think
we
should
actually
do
should
fix
it.
E
And
now,
if,
if
they
respond
at
least
I'll
get
the
email
here,.
E
We never manage to get through all the different ones before we stop, so if you could just put a note saying that this is the one we stopped at...
E
You know, we just started these meetings.
E
We're still trying to figure out what a good format is because, as you can see, we started off with quite a few things to discuss in the beginning, but we're sort of running out. So it'll probably end up being mostly a bug-triage thing.
B
You know, like maybe those parameterized data volume tests, the ones that do import, clone, and upload in the same describe block.
E
Or doesn't that require that we upgrade to Ginkgo 2? Yeah.
E
Ginkgo 2 knows how to put an actual label on a test that you can then query on. I've seen those in KubeVirt, so I know KubeVirt is on 2.0 already.
B
Yeah, but you could do it in Ginkgo 1 as well. We do it for the destructive tests.
E
Well, we put a label on there and then pass a regex to find a particular label, but it's not really a separate label field; we put a particular magic string in our test name and then use a regex to find it. It'll probably work, but I think an actual label, which is a separate field, would be nicer.
B
For this one... probably not just this issue on our repository. Okay, so there's a link there to a CSI PR; maybe that has more information, maybe they already ended up implementing this somehow. If you scroll down... yeah, it got mentioned in the CSI workload tests.
B
Okay,
now
it's
just
basically
the
same
description
as
the
CIA
issue.
Right.
E
I added this a long time ago, and the main issue was for manually created NFS persistent volumes.
E
In the persistent volume, you have to say both ReadWriteMany and ReadWriteOnce, and then when you create a PVC, you can either specify ReadWriteOnce or ReadWriteMany and it will bind; a PV will also allow both ReadWriteOnce and ReadWriteMany in the PVC spec, and it will still bind. Right now, though, a data volume will reject that.
E
You
know
the
PVC
part
of
the
data
volume
if
you
put
in
more
than
one
access
mode,
it's
rejected,
even
though
it's
an
array
and
you
can
specify
more
than
ones,
and
it
should
accept
it.
So
this
was
just
me
saying:
hey.
We
need
to
fix
this
where
we
allow.
You
know
both
it's
I
I
since
then,
I,
don't
think
I've
actually
seen
anybody
create
a
or
or
create
a
PVC
with
both,
but
it's
technically
possible.
So
I
don't
think
we
should
reject
it.
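The fix being proposed amounts to validating the access modes as a non-empty set of valid values rather than insisting on exactly one entry; a minimal sketch:

```python
# Valid PVC access modes (ReadWriteOncePod omitted for brevity).
VALID_MODES = {"ReadWriteOnce", "ReadWriteMany", "ReadOnlyMany"}

def access_modes_ok(modes: list) -> bool:
    """Accept any non-empty list of valid modes instead of exactly one,
    matching what the PVC API itself allows."""
    return len(modes) >= 1 and all(m in VALID_MODES for m in modes)

assert access_modes_ok(["ReadWriteOnce"])
assert access_modes_ok(["ReadWriteOnce", "ReadWriteMany"])  # should bind, not be rejected
assert not access_modes_ok([])
```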
E
Actually, we can probably close this, since I've never actually seen anybody do that. It's just one of those things that's theoretically possible, so we should allow it, but nobody's ever actually done it.
A
I have seen a use case where you create a DV and you don't necessarily... like, if you don't specify one of those modes, does it just kind of accept whatever the default storage class provides?
E
No. Well, since then we've added a storage specification. As part of the data volume, you can either provide the PVC, which I'm going to call a template, the PVC template, and that's the template that the controller will use to create the PVC. But then we've added a storage section, and in the storage section you can omit certain fields that are required in the PVC section, like access mode and volume mode.
E
What happens is, we also created something called a storage profile. So if you omit those in the storage section, it will go and look in the storage profile and say: oh okay, we should use Block and ReadWriteMany. Then, when it creates the actual PVC, it basically fills in the blanks from the storage profile and creates a PVC that, in theory, should be optimal for the storage you're using.
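The fill-in-the-blanks behavior can be sketched as a simple merge, with the storage profile supplying defaults for anything the user omitted; the field names here are simplified for illustration:

```python
def fill_pvc_blanks(storage_section: dict, profile_defaults: dict) -> dict:
    """Anything omitted (None or absent) in the DV's storage section is
    filled in from the storage profile before the PVC is created."""
    pvc = dict(profile_defaults)
    pvc.update({k: v for k, v in storage_section.items() if v is not None})
    return pvc

profile = {"accessModes": ["ReadWriteMany"], "volumeMode": "Block"}
user = {"resources": {"requests": {"storage": "10Gi"}}}  # modes omitted
pvc = fill_pvc_blanks(user, profile)
assert pvc["volumeMode"] == "Block"               # from the profile
assert pvc["accessModes"] == ["ReadWriteMany"]    # from the profile
assert pvc["resources"]["requests"]["storage"] == "10Gi"
```

Anything the user does set takes precedence over the profile, so the profile only ever fills gaps.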
E
This particular thing is very much an edge case, and actually I am just going to close it, because I don't think it's very interesting; it's more me being a little pedantic that we should allow it.
B
Yeah, it is pretty interesting: we've gone back and forth on changing the default value for this. I think the first year of CDI it had unsafe, which was the default, or is the default.
B
Then
at
some
point
we
decided
to
pull
closer
to
rev
and
we
made
the
cash
option,
be
none
and
then
recently,
with
some
help.
B
We
concluded
that
to
write
back
was
the
way
to
go
for
us,
so
we're
currently
sitting
at
setup
on
right,
back
cache
mode,
but
I
think
it
makes
sense
to
make
this
configurable
it's
just.
If
you
scroll
down,
it's
there's
a
whole
Matrix
that
we
need
to
implement
for
this,
and
we
should
decide
on
it.
B
The
thing
is
Okay,
so,
okay,
Alexander
summarizes
it
pretty
well
on
this
comment
here:
it's
a
global
level,
there's
a
storage
class
level
and
a
per
data
volume
level,
so
I
think
that's
pretty
much
it.
We
have
to
give
people
the
Global
knob,
so
they
could
just
always
go
with
the
certain
cash
mode
and
then
a
per
storage
one
and
then
sometimes
somebody
wants
to
use
none
on
their
data
volumes
and
sometimes
they
want
other
things
so
just
give
them
a
data
volume
or
not.
B
I
think
that's
pretty
much
the
best
way
to
go
what
what
I
am
missing
is
that
for
some
reason
nobody
was
pushing
on
this
too
much.
This
is
a
pretty
old
issue,
but
it
totally
makes
sense.
G
Ah, there's an interesting problem that you might run into here. We ran into this in virt-builder, where you mix O_DIRECT writes with reads that come from the page cache, and you can actually get stale data in the page cache that doesn't reflect what's actually being written to disk. To be very specific about this: it occurs when you run qemu-img convert and then you run QEMU very quickly afterwards on that disk image; you can have QEMU seeing stale data. I can't quite remember exactly which combinations cause problems and which don't; you're probably best to ask Kevin about this.
G
So be a little bit careful here with this. You may run into problems like that, and it may even be a kind of security issue as well.
E
That's the reason why we went with cache=none at some point. In particular, we saw this with Gluster or Ceph, where we had the pod that was writing the image on node A, and then, immediately once it was done, on node B we started a VM that was trying to use it, and that one didn't get all the data yet, due to some caching on node A.
B
Yeah, it's just that for this specific use case, I think we were just kind of ruining this person's flow with cache=none; it was just making things slower. I think it's NFS-related, NFS 4.1. I'm not sure exactly what happens, but from the discussion in this issue it seems that cache=none didn't make sense, so they would have benefited a lot from making this configurable. But I don't know what they would go for instead of none; I don't think the issue has that information.
E
To me, the question is more: at what level do we want to implement this, right? I gave three levels: a cluster level, a storage class level, and a data volume level. I don't know; each one has pluses and minuses, right? If you set it at a global level, you set it once and it applies to everything, but if you have two different storage classes that want different modes, then you can't express that.
E
If you do it at a storage class level, you have to set it on multiple storage classes, so there's more configuration. And if you set it at a data volume level, then every time you create a data volume you have to set it, if you need a different value than the default.
B
For now, and I see somebody already did this, just ping the person that opened it and check whether the performance issue resolved itself, because we did change it: we went to writeback instead of cache=none.
A
So I think the last communication on this issue was the original poster saying: I still want it user-configurable. Okay.
D
In the original problem that inspired this, it was a real URL, but instead of an image it gave you some HTML and a success page, so it looked like a successful import, except it was definitely not an image you could boot.
E
We seem to be heading toward just downloading the file to scratch space and then doing the conversion, because doing it inline seems to have problems, especially if we have a gzip or xz type of compression in there.
E
If
we
do
that,
then
the
check
sum
should
be
relatively
straightforward
to
compute.
By
once
we
have
the
data
in
the
scratch
space
during
the
checksum
is
simple.
So.
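Once the file sits in scratch space, the checksum really is simple; a sketch that streams the file in chunks so large images never need to fit in memory:

```python
import hashlib

def checksum_file(path: str, algo: str = "sha256") -> str:
    """Stream the scratch-space file in 1 MiB chunks and return the
    hex digest, so even very large images stay memory-friendly."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```

The digest can then be compared against a user-supplied expected checksum before the conversion step runs.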
G
The file you provide... I was thinking about that, actually. So there isn't a checksum, but it might be possible to add one. The fundamental problem is that you have to read the whole file, which is what we were sort of trying to avoid; but obviously, if you want to compute a checksum, you've got to read the whole file, you can't get around that.
G
Yeah, yeah, I know, obviously. But you can skip the holes if you have an image with holes, right?
G
That's the immediate benefit.
E
So I think we should just leave this one; that's the last one we looked at. Next time we'll look at it again and see what we can get through.
D
I guess one other thing we could do regarding the original scenario is possibly detect very suspicious-looking images. Like, it gave out HTML, so that's detectable, and that's going to be a common scenario; maybe we can make that into an alert. Or maybe we look for no partitioning, but that's... I don't know, maybe someone really wants to import an image that isn't partitioned in any way. There's actually no reason it can't be that way; it just needs to have a bootloader.
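The HTML case is cheap to detect from the first bytes of the download; a sketch (the set of prefixes checked is illustrative, not exhaustive):

```python
def looks_like_html(head: bytes) -> bool:
    """Cheap sanity check on the first bytes of a downloaded 'image':
    a captive portal or error page typically starts with an HTML tag,
    while real disk images start with format magic or a boot sector."""
    stripped = head.lstrip().lower()
    return stripped.startswith((b"<!doctype html", b"<html", b"<head", b"<body"))

assert looks_like_html(b"<!DOCTYPE html><html><body>Success!</body></html>")
assert looks_like_html(b"  <html lang='en'>")
assert not looks_like_html(b"QFI\xfb")  # qcow2 magic bytes
```

A check like this could feed the alert mentioned above without ever rejecting unusual but legitimate images outright.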