From YouTube: CNCF SIG Storage 2020-07-08
A
So this is what the rest of the slides is about, and just to remind you of the context: all this Kubernetes work is about getting the LINSTOR storage system connected to Kubernetes. LINSTOR itself relies, for storage replication, on a component called DRBD, which is a kernel part, but let's focus on this area.
A
The controller is installed and configured by the operator. Then there's a satellite that needs to run on all the nodes of the cluster, also installed and configured by the operator. Then there's the DRBD kernel module, which can be brought in by the Piraeus operator, but it is in a way optional; no configuration is needed. Then there is an etcd instance as well.
A
Let me put it that way: the LINSTOR controller can use an etcd instance, and that makes a lot of sense, because then all your metadata is stored within the Kubernetes cluster, so I would say this is the recommended way. If you insist on using an external SQL database, that is possible as well. Then the storage devices.
E
So this is basically just kind of an optional step. You can configure init containers for the satellites; the satellites are a DaemonSet, and optionally there are init containers which can pull in kernel modules. Depending on how you configured it, these are either modules that are already available on your system, or ones that are brought in from our side, so pre-compiled, but that is only supported for a limited set of distributions.
Oh.
A
Let me jump in here: the Piraeus operator also has the option to compile the kernel module on your box. That only works if the worker nodes have the kernel headers locally; then it compiles it within that init container, and then it loads it into the kernel from the init container, and that must run privileged.
A
If you choose that for your storage architecture, if it's based on that, then there's the LINSTOR configuration data; I think we already touched on that. It goes either to etcd within the cluster, or to an external etcd, or to an external SQL database, and thinking in that area we support Postgres databases and MariaDB.
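As a rough illustration of the database options just described, here is a sketch of a LINSTOR controller configuration fragment. The file name and keys follow the usual `linstor.toml` layout, but the endpoint and credentials are placeholders, not values from the talk:

```toml
# linstor.toml -- hypothetical controller database configuration.
# The connection_url scheme selects the backend: etcd within the
# cluster, or a JDBC-style URL for an external SQL database such
# as PostgreSQL or MariaDB.
[db]
  connection_url = "etcd://linstor-etcd.storage.svc:2379"
  # or, for an external SQL database (placeholder host and name):
  # connection_url = "jdbc:postgresql://db.example.com/linstor"
```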
A
And yeah, the snapshot controller might be provided with your Kubernetes distribution. And Stork, we already touched on that: Stork is very helpful if your LINSTOR setup is going to use DRBD, because then it can hint to Kubernetes where to place pods that require persistent volumes in an optimized way. Then the CSI driver is also part of the Piraeus Datastore project, and currently it has these capabilities: provisioning, attaching, snapshotting, resizing. It is read-write-once.
A
File I/O mode: the storage stack, so DRBD, does block devices, so by default it gives you access via a file system on top, but if you wish you can have another file system, XFS for example. If you wish you can modify the mount flags or the mkfs flags, et cetera. It can also give you persistent volumes in block I/O mode, and that's especially interesting with KubeVirt, of course. And although our stuff is read-write-once, we allow...
A
Well, in the case of live migration with KVM, what happens is that KVM just opens the device on the target node but doesn't access it. Then it stops accessing it on the source node, which still has it open. Then it migrates all the memory pages, and at some point it stops and closes it on the source node. So KVM actually doesn't access the block device from both nodes concurrently.
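During that brief handover window, DRBD still has to permit the resource to be opened on two nodes at once. A minimal sketch of the relevant DRBD resource option (the resource name and surrounding structure are assumptions for illustration):

```
resource vm-disk {           # hypothetical resource name
    net {
        # Allow the volume to be promoted on two nodes at the same
        # time, as needed for the KVM live-migration handover.
        allow-two-primaries yes;
    }
    ...
}
```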
A
Whether the LINSTOR GUI can also be part of that, I would need to bring in that group; I cannot answer that off the top of my head right now. And then the third item I have here on the roadmap is read-write-many, leveraging the Linux kernel's NFS server and client, but that is, you know, a far-out goal, so maybe something we will work on in 2020.
A
Maybe it happens, maybe never, I don't know, yeah. So that is what we plan here for the future. Then, what is the current community, and how did all that come together? It literally came together with a little bit of out-loud talking about the idea to bring LINSTOR also to Kubernetes, and what we created in the process so far...
C
And that sounds really good. So, Philipp, I'm not sure if you're aware, but the CNCF has recently changed the sandbox application process. So if you want to proceed with this, there is a relatively simple online form you can fill in to provide details, and the TOC votes on that; I don't know if it's monthly or bimonthly that they vote to approve those sandbox projects. So I guess that would, that should be your next step now.
B
So, okay, we are trying to make it somewhat easier for the data scientists and the data engineers to have access to remote data sources. Currently we have implemented connectors to S3 and NFS, but we're expanding. And from the side of the data provider, we're looking to bring, let's say, an easier way to expose datasets and provide access to their end users. Also, another bit that we are looking at is how to enforce, how to make sure they have governed access to these remote data sources.
B
So now, on the more technical objectives that we have: we are introducing the concept of a Dataset. It's a new custom resource definition, of course, that is actually a pointer to a remote S3 or NFS data source. We have also added the ability to look up datasets from remote catalogs, like Hive Metastore, and at the same time we are looking to introduce minimal changes to the end-user workflow.
B
So the users shouldn't be forced to modify their workflows in order to leverage the datasets. And the bit that we just finished is transparent data caching, which I will give some details on in the next slide. So basically, we want to bring a pluggable interface for caching frameworks to implement in order to be supported in the framework. What this means is that the framework itself would work without those plugins, but we have created a first plugin based on Ceph.
B
That one leverages Rook for its deployment, but we give instructions on implementing your own caching plug-in. And this would be an on-the-fly deployment of the cache, with the user remaining completely oblivious to the fact that this dataset is provided by a cache, from a caching plug-in. Also imagine that if you are caching data on the local cluster, we know where it resides, and we're looking to inform the workload scheduler about that.
B
That brings us to workload scheduling. So imagine knowing where your datasets are cached, on which nodes of the Kubernetes cluster; then it would be pretty straightforward to give hints to the scheduler to bring the pods closer to the cached data. And of course, we're looking to integrate with Spark, Kubeflow and all the ML and deep-learning frameworks. So this is the overall approach that we follow. On the one side there is a user, or a data provider, that creates the Dataset CRD.
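As a sketch of what such a Dataset object could look like for an S3 source (the API group, field names, endpoint and credentials below are placeholder assumptions for illustration, not taken verbatim from the talk):

```yaml
# Hypothetical Dataset pointing at a remote S3 bucket.
apiVersion: com.ie.ibm.hpsys/v1alpha1   # assumed API group of the operator
kind: Dataset
metadata:
  name: example-dataset
spec:
  local:
    type: "COS"                       # S3-compatible object storage
    endpoint: "https://s3.example.com"
    bucket: "my-bucket"
    accessKeyID: "ACCESS_KEY"         # placeholder credentials
    secretAccessKey: "SECRET_KEY"
```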
B
So they say: this is my dataset, with this name. And then the operator takes care of this definition and provides the PVCs, or the ConfigMaps and the Secrets, and so on and so forth. And for a pod to use this dataset, it just needs to do this: add a label which says dataset, then a number starting at zero, then the ID, which is the name of the dataset they created, plus how to use it; they can use it as a mount point, or they can use it via environment variables.
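A minimal sketch of that pod labeling, following the dataset.&lt;N&gt;.id / dataset.&lt;N&gt;.useas pattern just described (treat the exact keys and values as assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: consumer-pod
  labels:
    dataset.0.id: "example-dataset"   # name of the Dataset to consume
    dataset.0.useas: "mount"          # mount point; alternatively env-style access
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      # With "mount", the operator is expected to inject the volume,
      # e.g. under a default path such as /mnt/datasets/example-dataset.
```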
B
You know, if they're using the S3 API, for instance, they would get the credentials there, and the connection lives there. And now the other bit is, as I said, the caching plugin. So imagine that we can install, in parallel, some caching plugins which provide that functionality of caching datasets, and we have implemented a solution that works for S3 buckets. And in the end, as I said, there is the co-scheduling of the pods to the cached dataset.
B
So basically, it leverages the corresponding CSI plugins. The dataset operator looks for datasets; it reacts to the creation of Dataset objects. So if this one is of type S3, the dataset operator realizes that it's S3-based, so it creates a CSI PVC out of S3. So the dataset operator creates, let's say, the native Kubernetes components from this Dataset; basically it requires those CSI plugins to be installed as part of the framework, you understand.
B
This is the core framework, so this is our core framework, and I'm going to show how the transparent caching works. So the user goes and declares the Dataset, and within the dataset operator there is the dataset controller. So the controller checks: is there any plug-in available in the cluster?
B
No? So basically, I'm going to create an object that we call DatasetInternal, which has the same credentials and the same endpoint as the original Dataset, and then it goes to the DatasetInternal controller, which receives this definition and creates the native components. So it creates the PVCs, the ConfigMaps, the Secrets, based on the type of the dataset. Right, now.
B
If there is, sorry, I don't know how to move this bit, but in the case that there is a plug-in available, the dataset controller delegates the creation of the DatasetInternal to the plug-in. It passes the details and says: you know, you should take care of that, the definition of that Dataset, but in the end just give me the DatasetInternal that I should create. So then this plug-in...
B
So in our case, Ceph: it provisions Ceph via Rook, and it creates a DatasetInternal at the end, and then it gets handled by the core framework again. So it goes back to the DatasetInternal controller, and that creates again the corresponding PVCs from the CSI S3 and NFS plugins. So yeah, we're integrating with various...
I
Yeah, my question is that, you know, I've seen a lot of use cases, but this one's new to me. So I think there was some, maybe, for me, I don't know, maybe for others, it would be nice to see more of the whys. You know, there's maybe a lot of assumptions about why you need that information and that flow; I am not familiar enough with this use case to understand that flow.
B
David, so basically that's very much on point. So imagine that there is a provider that just gives you nodes, disks and a Kubernetes cluster, and there are users who want to use, as David said, remote data sources; but at the same time we want to optimize that by trying to bring the data as close as possible to the pod, by caching it from the remote. As of now we're mainly working with S3.
B
But you know, we have support for NFS as well. So: trying to load as much of the data onto the local disks where the pods are actually running. And this is very common in, you know, deep-learning workloads, where they keep on reusing the same datasets. So we want to make it completely transparent for the end user, so they don't have to deal with, you know, configuring, optimizing, mounting the datasets; it's all done for them.
C
Exactly, so yeah, nice. If I could just summarize, just to make sure I'm understanding this, because this is actually pretty interesting: so effectively you use the CRs as a catalog of datasets, and if a data scientist or somebody wants to run a workload in a cluster that utilizes that dataset, then you either sort of orchestrate it so that the file system is available within that cluster, or you implement caching to make sure it's available in that cluster, because the dataset is remote, presumably.
B
That is spot-on, so my answer is both, because we try to tackle two issues. One is the discoverability. So, as you said, a data scientist would go and say kubectl get datasets in their namespace, and they will see the available ones; they will just get a name. Noting that there could be a case where there is another persona creating the datasets for them.
B
We bring the hooks for caching frameworks to transparently, with minimal effort, support caching in this pipeline without the user realizing it's happening at all. So the API for a caching plugin is that you would be passed a Dataset, and you are responsible for creating a DatasetInternal; so provision, you know, your services, provision your pods, provision whatever you think needs to be provisioned, but in the end give us the DatasetInternal, and all the S3 mounting and all the other orchestration stuff happens on our side.
B
So when implementing a custom plugin, you won't have to do the mount or implement, you know, utilizing s3fs on your own; it's already part of the core framework. So it tries to tackle both things. And the third step that we want to do now is the scheduling: saying, if we know on which nodes in the cluster the datasets are cached, maybe we can direct the pod to be scheduled on the node that has the data cached, so as to achieve a bit more data locality.
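One conceivable way to express such a scheduler hint (entirely hypothetical; the label key below is invented for illustration and is not something the talk specifies) would be node affinity on a label marking where a dataset is cached:

```yaml
# Hypothetical: nodes holding a cache of "example-dataset" carry a label,
# and the pod prefers (but does not require) being scheduled on them.
apiVersion: v1
kind: Pod
metadata:
  name: training-pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: cache/example-dataset    # invented label key
                operator: In
                values: ["true"]
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
```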
B
The cases that we are handling: so we do support writes, right; if you write back on this PVC, it would be synchronized to the cloud. But the case that we're looking at more is when these are pre-populated. So imagine that you have data on an S3 bucket: the PVC that we create contains this bucket mounted, so it would reflect the contents of the remote S3 bucket. So it's not empty.
B
So it's a dynamic PVC, and we rely on the CSI S3 driver. There is a CSI plugin for S3 that we have modified a bit to suit our needs, and yeah, it's a dynamic PVC. Yeah, I can actually very quickly show you that. So this is an example Dataset, with my remote endpoint, and I apply it, okay, and if you do get datasets, you will see that it's there.
B
There you go, so we have this PVC; it was just created, 17 seconds ago. And if we go and use that: so we want to use the dataset as a mount. This part is optional to show here; you can mount it anywhere you want to, or you can leave it at the default. So we create a pod.
B
That's the convention that we use, so inside the pod it would be under /mnt/datasets, right? So yeah, this is how it would look. So, as I said, we're looking to optimize the flow for the end user as much as possible, so they won't have to deal with, you know... we don't have to change the workflow at all. The only thing they need to do, from the user's perspective, is to annotate their pods like this: the two labels.
C
Thanks, thanks. Yes, thanks.