From YouTube: Kubernetes VMware UG 20210107
Description
January 7, 2021 meeting of the Kubernetes VMware User Group. This meeting hosted a presentation on utilizing GPU resources from Kubernetes-hosted applications and services when Kubernetes runs on vSphere hosts with GPU hardware available on some or all of the hypervisor nodes.
A
Hi, welcome to the January 7th meeting of the Kubernetes VMware User Group. On the agenda today we have the item of GPU support for Kubernetes workloads when Kubernetes is hosted by vSphere.
A
My co-host is hopefully destined to be Myles Gray, co-chair of this group, but he's not present yet. I did communicate with him last night and he said he intended to be here and was checking his lab, so I'm hoping he shows up in the next few minutes, but right now he isn't here.
A
I have posted some links in the agenda notes document that relate to GPU support for Kubernetes on vSphere.
A
There is a YouTube video recording which is an introduction, and there's also some documentation which outlines the exact requirements. I'm not an authority on the subject, but the last time I looked at it, you can't take just any old GPU; there are certain specification requirements they'd have to meet. Most of the demonstrations I've seen relate to NVIDIA GPUs.
A
Okay, I'll take silence as no, or somebody is having trouble finding their unmute button, but that's okay. Do we have anybody here who has experience with running GPUs on vSphere?
B
I actually do. I'm with a consulting firm, a partner of VMware, and we have a customer that's very interested in machine learning, so obviously GPUs are going to play a role in that, and I'm really just trying to figure out what this means to them.
A
Yeah, I did just a little bit of research while putting those links in the agenda notes, and it seems like it's something that is definitely catching on. Certainly when Kubernetes is run in the public clouds, Amazon, Azure, Google Cloud, there is already support for using GPUs for workloads in those clouds, and I think that at least for some of them they can be exposed to Kubernetes.
A
Oh, I don't know if you'd call it multiple tenants or multiple workloads. In some circumstances the actual compute hardware is not necessarily shared across multiple accounts, but sharing across multiple apps is still a case where you don't necessarily want to dedicate a full hardware GPU, so the Bitfusion technology has a technique for taking a large physical GPU and carving it up into smaller virtual instances to create the illusion that workloads each get a GPU.
A
Even if, under the covers, at some level they're getting a fractional resource.
A
Okay, I just got Slacked by Myles, so he's on his way; I think he'll be here in a couple of minutes. But anyway, just as a summary of what's going on, I've heard that machine learning is definitely an app calling for GPUs, and there's two forms of it: training your machine learning model, and then executing it later.
A
I've read of a lot of use cases for people doing image recognition too, where they've got an application with cameras and they're trying to use the GPUs to filter that large data stream to find interesting deductions from it, whether that's recognizing people or license plates, or controlling machinery for IoT. I see Myles has now popped onto the participants list, so we'll give him a few minutes, I think, to get his rig set up.
C
Yeah, sorry folks, I've been on PTO, so my notifications have been off; I apologize for being late for this. I am in a state where I'm good to go, though, so I can just share my screen if you guys don't have anything ongoing.
A
Yeah, we've just been having a little chat about possible use cases, but go for it.
C
Sure thing. Okay, so "host has disabled"... okay, there we go, just one, okay. Can you see my screen? Yes? Awesome. Okay, so again, apologies for being late; I completely lost track of time. So what we've got here is a TKG cluster. It's a TKG Service cluster, so part of vSphere with Tanzu, but that's not important; it could run on any Kubernetes cluster. The way this demo and the way we wrote this app means that it should be portable across any distribution, so it's not specific to vSphere.
C
What's Tanzu or TKG? That's just the way that we have it set up. So on here you can see I've got this namespace, which is a vSphere with Kubernetes namespace for those of you that aren't familiar with it, and underneath that we've got a TKG cluster called endor, and inside our endor cluster we've got this application, which does counting of flowers. There's a whole story behind this that we made up for a VMware roadshow that we're doing, but the context isn't really that important.
C
In any case, we've got three master nodes and four worker nodes here, and there's a few things in this environment that you sort of need to be aware of that are actually in use here. So we've got Bitfusion installed, and if you're not aware, Bitfusion is a way that you can ingest GPUs into a server virtual machine. You can see we've got bitfusion-01, 02 and 03.
C
I'll show you those in the inventory here in a second. And then we've got clients, and you can see that we've got a whole bunch of clients here, because we've just had various iterations of this spun up over time. The reason it keeps them all is because the idea is that, as an MSP, you could bill for this: you can allocate a portion of a GPU and get charged per GPU-hour or something like that. So we've got Bitfusion installed.
C
So in Edit Settings you can see, under PCI device, we've got a Tesla T4 NVIDIA GPU passed through to this VM, and the same is true with bitfusion-02 and 03. So on bitfusion-01, 02 and 03 we've got the Bitfusion server software, which allows it to allocate a portion of a GPU over the network to one of these clients.
C
So what I'd like to show you next is our application itself; let me just move some of this Zoom stuff out of the way here. First of all, all the code is on GitHub. So if you want to actually build this, or if you want to run this in your own environment and you have access to GPUs and the Bitfusion bits (which you should have if you're a VMware vExpert), then you can just use our container images.
C
So that's all documented within the repository here, and if we go into the Bitfusion one, there's a few little bits and pieces that you need to add, like servers.conf, which is how the container finds the Bitfusion servers, and then client.yaml, which is like a token; it's an authentication mechanism between the container and the Bitfusion server itself.
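As a minimal sketch of how those two files might end up in the image, assuming a stock TensorFlow GPU base image and guessing at the config paths (neither is taken from the actual repo's Dockerfile):

```dockerfile
# Hypothetical sketch: bake the Bitfusion client config into the worker image.
# Base image and destination paths are assumptions, not the repo's real file.
FROM tensorflow/tensorflow:2.3.0-gpu
# servers.conf tells the Bitfusion client where the Bitfusion servers live
COPY servers.conf /etc/bitfusion/servers.conf
# client.yaml carries the auth token trusted by those servers
COPY client.yaml /root/.bitfusion/client.yaml
```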
C
So that's how we build the Docker container; there's this worker container, which actually does the image processing, and then we've got the Kubernetes manifests in a separate repository, just for cleanliness really. We deployed kube-prometheus; we've got the whole run book here. So if you want to deploy this, we've got the TKC YAML in here, the pod security policies, the namespaces.
C
So I've just documented how you would patch your credentials into each namespace, so you can get around that, and then deploy Prometheus and the application.
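For anyone reconstructing that step, a hedged sketch of one common way to patch registry credentials into a namespace; the secret and namespace names below are placeholders, not values from the run book:

```sh
# Create an image-pull secret and attach it to the namespace's default
# service account (names here are placeholders).
kubectl create secret docker-registry regcred \
  --docker-username=<user> --docker-password=<password> \
  --namespace flower-market
kubectl patch serviceaccount default --namespace flower-market \
  --patch '{"imagePullSecrets": [{"name": "regcred"}]}'
```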
The reason that we deployed Prometheus here is because we actually use a Prometheus external metric. If you're not familiar with Prometheus, it's got a concept of internal metrics, which are Kubernetes-only metrics like pods and namespaces, stuff like that; it's got custom metrics, which can be made up of those Kubernetes objects but are abstracted one layer; and then there's external metrics, which can come from a completely different outside system, and that's the way that we chose to run it for this one.
C
So there's a whole bunch of manifests in there that show you how we created those external metrics, and then we use those metrics in something called the Kubernetes horizontal pod autoscaler. So whenever we change the desired state of frames per second, or flowers per second, for this application to process, it'll automatically scale out the app and allocate more GPUs, give it more crunch, so that it can get through these things quicker.
C
So we thought this was quite a nice model: deploying the application and having it scale based on a desired state of performance, rather than just saying, okay, give it two GPUs and whatever it does, it does. It'll allocate GPUs until it meets the desired state, which we thought was quite clean. And then you deploy the application itself. So we're going to have a look at the manifests, and I'm going to open up Visual Studio Code to do this, just for better syntax highlighting and such. So under here, this is our deploy.yaml, and you can see everything is in this deploy.yaml.
C
This one is another deployment, and it deploys the actual workers, or, as we call them in this case, wookies, because we went for a Star Wars theme. You can see here that this app is called flower-counter-worker, and this one is just called flower-counter. So this is the dashboard one, and this is the actual thing that does the number crunching, the GPU work. The images, as you can see, are based off of these GitHub repositories.
C
I'll also note that the repos here have GitHub Actions enabled on them. So whenever we do a push, it automatically builds the container and then pushes it to Docker Hub. So if you clone the repo, I think it clones the actions as well, because it's under this workflows thing, so there would be a few bits and pieces in there that you would need to populate out. I think... there you go, you need to populate the repo, and you need to populate your... there's something else as well.
C
Anyway, if you have a look through the code, there is a secret key, so you'll get it. That's done in your repo settings here, so you can have the full CI setup the way that we have it too.
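A rough sketch of a workflow like the one described, with placeholder secret names you'd set under the repo settings mentioned above (this is not the repos' actual workflow file):

```yaml
# Hypothetical build-and-push workflow; secret names are placeholders.
name: build-and-push
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build and push to Docker Hub
        env:
          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
          DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
        run: |
          docker build -t "$DOCKER_USERNAME/flower-counter-worker:latest" .
          echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin
          docker push "$DOCKER_USERNAME/flower-counter-worker:latest"
```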
So we use those images, and we export port 8080 as metrics for Prometheus on each of these workers.
C
You can omit the dashboard deployment and it'll work fine. These two are kind of interesting, actually: partial GPU, and these are Bitfusion parameters. If you look through the Dockerfile for this image, there's descriptions of what all of these things do and what we've exposed. So you can see I'm asking for half of a GPU from one GPU, or you can change this to two; you could say I would like half a GPU from each of two. I don't know why you would do that; it's just an option.
C
That's there, but yeah, you can ask for a partial amount of a GPU, and what you're allocating here with Bitfusion is the amount of frame buffer. So say you have a GPU with 24 gigs of RAM: then you'll get 12 gigs of RAM allocated for that one, say, TensorFlow workload, which is what this is in this case. And then batches is the number of flowers that we're going to put through it.
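As a hedged illustration of those knobs in a pod spec; the environment variable names here are hypothetical stand-ins for whatever the Dockerfile actually documents:

```yaml
# Illustrative container spec fragment; env var names are hypothetical,
# the real ones are documented in the worker image's Dockerfile.
containers:
  - name: wookie
    image: <flower-counter-worker-image>   # placeholder
    env:
      - name: PARTIAL_GPU   # fraction of frame buffer to claim per GPU
        value: "0.5"        # e.g. 12 GiB of a 24 GiB card
      - name: NUM_GPUS      # how many GPUs to take that fraction from
        value: "1"
      - name: BATCHES       # benchmark batches to push through TensorFlow
        value: "2000"
```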
C
It's just a benchmark; it's not actually processing flower images, because we don't have a bank of stuff that big yet, so we just run a benchmark and we do 2000 batches to generate a frames-per-second number. So that is the deployment for the thing that actually does the TensorFlow work and the GPU work, and then we have two services, one for the dashboard. Again, you don't really need this.
C
This is just something we have for the roadshow, but what you do need is this service for the metrics. This takes the metrics on each port and then exposes them as a service internally to the cluster. All right, actually, this is a service of type LoadBalancer, so we could debug it externally, but this is what Prometheus then scrapes, and we'll look at this in Prometheus itself.
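Roughly, that metrics service is shaped like this; names and labels are best-effort reconstructions rather than the repo's exact manifest:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: flower-counter-worker   # reconstructed name; see the repo for the real one
  labels:
    app: flower-counter-worker
spec:
  type: LoadBalancer            # only so it can be debugged externally
  selector:
    app: flower-counter-worker
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
```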
C
It'll look for a label of flower-counter-worker on a service, find all pods behind that, and then scrape those for metrics, and you can see those are your Prometheus service monitors there. So we've got two: our dashboard FPS and our worker FPS. Our worker FPS is the one that we actually use in this; the dashboard one is just a remnant of us doing some testing here that we haven't cleaned up.
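That label-based discovery corresponds to a Prometheus Operator ServiceMonitor along these lines; the label key and names are guesses based on what's described here:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: worker-fps
spec:
  selector:
    matchLabels:
      app: flower-counter-worker   # the service label Prometheus matches on
  endpoints:
    - targetPort: 8080             # the workers' metrics port
      interval: 5s                 # scrape every five seconds
```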
C
So this takes the FPS from each of these instances of one of these containers that's doing GPU processing, and scrapes it every five seconds. You can see the interval is five seconds and the target port is 8080, which is the metrics port that we mentioned further up. And this is the thing that actually does the magic here: this is a horizontal pod autoscaler.
C
It is using autoscaling API version v2beta2; you need that to use this new syntax for external metrics. And what you'll see here is we've set a max replicas of four and a min replicas of one. It's going to target the deployment a-new-hope-wookie, which is the worker that actually does the TensorFlow stuff, and I've also changed some of these advanced overrides from their defaults, just to make it easier to demo whenever we were doing the roadshow.
C
The interesting piece here, though, is the actual metric itself. You can see the metric is type Object; it's nice and generic, because this is how we like to do things in Kubernetes land. We say the metric is going to be called flowers per second total, the object is in the namespace flower-market, and the value, the target value, is 500. And for whatever reason, and I have not been able to figure this out, there is no way to have a horizontal pod autoscaler make an inverse relationship to a metric.
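Putting the fields just described together, the HPA looks roughly like this; the numbers come straight from the talk, while the object names are transcribed best-effort rather than copied from the repo:

```yaml
apiVersion: autoscaling/v2beta2   # required for the Object/external metric syntax
kind: HorizontalPodAutoscaler
metadata:
  name: wookie-hpa                # reconstructed name
spec:
  minReplicas: 1
  maxReplicas: 4
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: a-new-hope-wookie       # the worker doing the TensorFlow/GPU work
  metrics:
    - type: Object
      object:
        metric:
          name: flowers_per_second_total
        describedObject:
          apiVersion: v1
          kind: Namespace
          name: flower-market
        target:
          type: Value
          value: "500"
```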
C
It is always 100% positive and linear; I cannot figure it out. So basically, if we say the target metric is a thousand frames per second, for example, Kubernetes will look at that, look at the current metric, and say, well, that's lower, so that's good. There's no way that we can actually say we would like this metric to target a higher value rather than a lower value.
C
It's always striving for a lower value. You'd probably be able to do some math thing in Prometheus and come up with a nice statement that inverts everything, but just know that at the minute, if I say 500, it's going to give me one replica, and then if I reduce this and say 10, which you'll see now, it's going to say, oh, I haven't met that, so it's going to start scaling up. Okay, so that's the dry stuff out of the way. What does it actually look like?
C
So this is the application itself. The scaling isn't playing too nice here at the minute; these are actually full numbers here, it's just that I presented this on a larger screen last time. So let me just get this like this, so you can see what's going on. At the minute you can see we've got one wookie, which is one worker. It's doing 114 flowers per wookie, because we've got one wookie and it's currently processing 114 flowers in total. So that makes sense, right?
C
There we've got one wookie, 114 flowers per second, so that means we're doing 114 per wookie. Here's the history, so you can see the number of flowers per second per wookie over time, and the number of wookies in total. So let's have a look at the Kubernetes side of things here. If I go into my iTerm, hopefully you can still see that; I've just gone full screen. Is that still there?
C
Yeah, it's still there? Okay, cool. So we're in our namespace. So if I do cube-cto... qctl... kubectl get pods (you can tell I haven't been back to work yet), we've got our dashboard and we've got our wookie. So if I do a kubectl get all, what we should see is the horizontal pod autoscaler at the bottom here, which is what we just looked at, and you can see it says 114, which is the number of flowers per second that we're currently processing, so that matches our metric.
C
So we're saying we're targeting 500 and it's got 114; min pods is one, max pods is four, replicas one. Okay. So what we're going to do is change this target from 500 to 10. Remember, I said there's an inverse relationship here, for whatever reason, but it is what it is. I'm going to change this to 10, and it's going to start spinning up new wookie nodes, so you'll see these start to increase. So at the minute: status Running, 1/1.
C
So let's go into the directory, which is this one, and let me just make sure I've actually saved that change; there we go, 10. We'll do a kubectl apply on the manifests, and what we should see is everything is unchanged except the horizontal pod autoscaler, which, you can see, has been configured. So now we do kubectl get hpa -w, which is get horizontal pod autoscalers with a watch on it.
C
What you should see now is it's going to realize that this metric, 10 (you'll notice the target's changed from 500 to 10), is now outside of its range, and you can see the replicas have just gone from one to four. So if I do kubectl get po -w, what we should see is there are indeed more workers being spun up. And if we go into our dashboard, you can now see that there are four workers.
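For reference, the demo steps just shown boil down to these commands (the manifest path is a placeholder):

```sh
kubectl get pods    # the dashboard pod and one wookie worker
kubectl get all     # the HPA shows at the bottom, current value 114
kubectl apply -f .  # re-apply the manifests with the target changed to 10
kubectl get hpa -w  # watch the replicas jump from one to four
kubectl get po -w   # watch the extra workers spin up
```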
C
What you'll see is there's a latency period here where it starts to ramp up before it actually starts processing the benchmark. What we can look at in the meantime: if we go into vSphere and Bitfusion, you'll see now that we've allocated half of this GPU, half of this GPU, and all of this GPU, because we've spun up more clients. Likewise, if I go into my clients here (and again, this is over VPN, so the UI is a little bit sluggish), there it is, and if we sort by allocated you can see.
C
Indeed, I've got four, and they're running half a GPU each as well. You can see the history of each one of these, so you can see this one spinning up, and if we go back in here... there we go, indeed, it started actually processing data. So we're now aggregating at 364 flowers across the four wookies, and you'll see that in the history here once it catches up; there it is.
C
It says we're doing 124 on one, 111 on another, zero on another, and 125 on the final one. So we've set this desired state where we said we would like you to achieve a certain number of flowers per second (granted, it's an inverse relationship, but we'll figure that out), and it has then allocated more and more worker nodes until they can achieve that, and also dynamically attached real physical GPUs to each of those containers. And, like I said, this will run on any distro; this is not TKG specific.
C
This will run on OpenShift, it'll run on vanilla Kubernetes. All you need to do is update your Dockerfile to include the Bitfusion bits like this. So this is all up on GitHub; if you want to have a look at this stuff, you can build this out yourself on whatever version of Kubernetes you're running. So, anyone got any questions on that, or anything they want me to poke at or have a look at?
D
Can I ask a question specific to Bitfusion?

C
Absolutely, go ahead.

D
How does Bitfusion compare to MIG?
C
How does it compare to MIG, yeah? You are outside of my area of expertise there; I thought it was going to be more Bitfusion specific. As far as I understand it, MIG has some changes to the networking layer as well, and it requires some specific bits to be installed on the host. I think it uses some kind of vGPU transport, or it uses GPU over RDMA.
C
Maybe; I could be totally off base there. But I know there were some considerations around MIG that made it more, quote-unquote, performant than Bitfusion. For ML-type batched workloads, though, where you're dispatching a batch to a GPU, letting it do a process, and it can give you a result back, that isn't as relevant; it's more for those real-time-type visual workloads that you would see a benefit there.
D
Okay, yeah, that was going to be my next question, because real-time inference with a frame-time return limit is one of our concerns.

C
Right, okay, yeah, and that would certainly be something to have a look at there.
A
I'm just curious about what the impact might be if you've got some of these workloads running and a vMotion occurs, just because I'm familiar with the history of potentially some issues with storage volume mounts, and I'm just wondering if perhaps these fractional GPU attachments to the workloads might be impacted by vMotions.
C
Right, okay, yes, so let me clarify; that's a good point, Steve, that I didn't quite make clear. So if we go into our hosts and clusters view here, and we have a look at our workers, what you'll see is that none of these workers, if I go into their Edit Settings here (if it lets me), actually have a PCI device attached. So the way Bitfusion works is it doesn't actually do a PCI attach; the PCI attachments are still on these Bitfusion VMs, and they don't change.
C
What the containers get is essentially just an Ethernet, an IP-address, path to a GPU. Bitfusion does some magic encapsulation of the CUDA API calls, transfers them over the network to the Bitfusion server, and the Bitfusion server then executes those commands locally. So it's not that we are mounting GPUs directly into VMs here, because that leads to its own complications around which drivers are installed, what OS you've got installed, all that kind of stuff; NVIDIA is very Ubuntu-centric with everything, so the distro would matter.
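To give a feel for that client side, the worker just wraps its process in the Bitfusion client CLI; a hedged sketch from memory (flag names may differ by version, so check bitfusion --help on your install):

```sh
# Run a TensorFlow job against half of one remote GPU via Bitfusion.
# Flags are a best-effort recollection, not verified against Bitfusion 2.5.
bitfusion run -n 1 -p 0.5 -- python benchmark.py
```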
C
Bitfusion is entirely Ethernet based. So if you do have one of your workers vMotion around, or even your Bitfusion servers vMotion around, you won't notice a difference. There is no PCI attachment; there's no Kubernetes understanding that there is a GPU allocated here. The GPU is connected straight to the container, and inside of that, the calls are just being transferred over the network to the Bitfusion server.
A
Okay, that sounds like a great architecture, then, for resiliency, where it should be pretty immune to things going on under the covers, as long as you don't drop network connectivity. And right, there's plenty of techniques for investing in a lot of redundancy there to keep your network connections live.
C
And that's what I really liked about the Bitfusion architecture, and why I chose it for this application: it is so dynamic. Say, for example, if you look at NVIDIA's offering today, you would have to add a dedicated GPU node to your Kubernetes cluster, and that cannot move from that Kubernetes cluster, so you have that GPU always pinned to that K8s cluster.
C
However, with Bitfusion, because the servers are centralized and the clients are decentralized, you can have it so that those clients don't have to come from the same cluster. I could have 10 or 20 different Kubernetes clusters all talking to the same three Bitfusion servers and getting their slices allocated to them, without having to build dedicated GPU nodes into my K8s clusters.
A
Okay, I'm just wondering, I know you already said you're not an authority on MIG, but I believe that's an alternate way of attaching GPUs to hypervisor nodes; is it the same, network attached, or do you even know?
D
Well, I'm not going to answer the question specifically, but it's locally attached to the host, and it is a VIB that gets installed to be able to do that. So...
D
Exactly: it's basically the new version of vGPU, and it does require Ampere architecture.
C
Oh, I think I know what the difference with MIG is: isn't it that, previous to Ampere, the only way they could do fractional GPUs via vGPU was frame buffer, whereas now with MIG they do frame buffer and compute isolation? So you get dedicated CUDA cores and VRAM.
D
Yeah, you can carve up the GPU however you want, specifically with timeshare and resources and everything, whereas with vGPU it was purely fractional.
A
Right. So another question, Myles, just for people who, as a learning exercise, might be trying to put together a lab or something to play around with this. I'm sure it's in the docs, but what are the hardware requirements for this related to the GPUs? Does it work with kind of any modern thing out there, or are there things to look out for?
C
Bitfusion only works with, quote-unquote, workstation or enterprise GPUs, as NVIDIA views them, and it's a legal thing. So you would have to have some kind of enterprise or workstation GPU that supports the vGPU standard, because that's what Bitfusion does to mount these GPUs into the server. So some kind of GPU like that; you could probably pick something up like a Tesla T4 on eBay relatively cheaply.
C
It's not going to be cheap, but it'd be very cheap compared to, like, the new Ampere stuff or whatever. So some kind of GPU that supports vGPU; a Tesla T4 is probably a good place to start, and there are lower models that you could probably pick up on eBay that would do the same thing. Aside from that...
C
...no other hardware requirements, really. You just need to be able to fit it into your host, and then the Bitfusion stuff is just an OVA that you deploy and mount the PCI device into, and then the Bitfusion bits, as you've seen, are just something that you slipstream into your Docker container. So it's really just that GPU; but sadly it can't be a consumer-grade gaming GPU.
C
You know, I do not know the compatibility matrix for Bitfusion, so I can tell you what this is running: this is Bitfusion 2.5, which is the latest, and this is 7.0 U1, which is the latest, on patch 2, the one that was released sometime in December, like December 18th. So it's the very latest vSphere and the very latest Bitfusion, in a lab. Probably not a problem, though, obviously, because this is running vSphere with Tanzu.
C
So it's actually kind of a stupid story that we made up for our VMUGs. The idea, because I was presenting with two Dutch guys, is that there is a flower market on Endor, and Luke and Leia work there; Luke is a data scientist and Leia is an infrastructure person, and they have to come together and figure out how they're going to make this new Kubernetes-style application that uses image inferencing to count the number of flowers going through this flower market at any one time.
C
So it's just kind of a dumb story behind it, but this is what we built. The truth is, it's just running a TensorFlow benchmark in the background, but it just makes it a little more interesting if you add a bit of a story to it, yeah.
A
And then some people in this meeting might not be familiar: you dropped that this was done for a VMUG roadshow, and they might not even know what VMUGs are. Those are like a different form of user group that's been associated with VMware for a long time, and they have sort of the equivalent of local and regional meetups going on.
C
So this was done for the Dutch VMUG, or, as they call it, the NLVMUG, and that was just before Christmas. It was all done in English, and there were three sessions, by a guy called Johann, by Niels, and by myself; it's a three-part series where we set the stage and it all builds up to the demo. So that was done before Christmas, but we will be doing this at other ones in the future.
C
We're just getting bookings and stuff sorted out at the minute, but there will be other VMware user groups that will be presented with the whole story behind this app.
C
No problem; again, sorry for being late. It completely slipped my mind and, like I said, no notifications because I'm on PTO, so yeah, I'm just glad that I could actually show it.
A
Well, last call for questions for Myles; just jump in if you've got them. If there aren't any, there was nothing else on the agenda, but one thing I'd like to bring up here, since we've got all these people on the call, is any suggestions or asks for content for the next meeting coming up in February, particularly since I know we've got a few users on the call.
A
It's a lot easier, speaking for both myself and Myles, if we get some hints as to what users want to see, rather than making the stuff up ourselves. And also, if anybody wants to volunteer to speak on subjects like use cases or user experience, we'd welcome that content.
D
When we had talked about that sometime last year, there was a question around making that a feature flag for the storage class, so that anything would work, because something like etcd doesn't have an enterprise partner to sell it through, right? So I'd be really interested to know if that is still being considered.
D
Since that's, I think, highly necessary for Kubernetes: to be able to do a feature flag on a storage class for that. If it is, or if it isn't, also how that feature works, and the performance of it and whatnot, right? Because there's no reason to replicate a piece of data nine times for something like etcd.
A
Okay, so on this ask, I think what I might do is find somebody who actually has worked on that storage. But is it to expose the capabilities of the underlying storage implementation up to the Kubernetes level, so that things can see it? Is that what you're asking?
D
And then it makes intelligent selections there: do I replicate, do I not, that kind of thing. The conversation originally was with enterprise, but in the conversations everybody seemed to agree that it made sense to be able to just expose that down for any application, given how many free, open source projects there are that support that type of functionality. So I'm curious if that was followed through on, and regardless, I would actually be interested to see that implementation, how it works, and the performance. Okay.
C
No, I do actually, yeah. So, Jordan, we had a presentation on that a while back from Gopala, and I believe you're talking about the Data Persistence platform. Sort of the problem with those kinds of presentations is they're very VMware-feature-centric and not Kubernetes-centric.
C
You know, they're proprietary technologies and they only apply if you're running on top of vSphere with Tanzu in particular, so it's not broadly applicable to everyone else on the call, and, this being a community meeting, we're sort of forbidden to talk about that kind of stuff. That said, there are a bunch of talks out there on the Data Persistence platform, the vSAN Data Persistence platform.
C
I can throw some into the agenda that you can have a look at, both by myself and by the product managers and engineers that actually built it. On your other question, with regard to the storage classes, I do know that there's a KEP that is open upstream, and it is there to expose storage features up to Kubernetes, or, you know, vice versa, down, so that the application can request...
C
...you know, "I don't need this replicated underneath", and if the storage understands that call, then it'll make sure it doesn't get replicated. At the minute, the only way that we do that is through the Data Persistence platform; that is the way that we've achieved it, and we have some certified partners, like you mentioned, that actually do that today. And we have had this question ourselves: there are plenty of open source offerings out there that do not have a commercial partner, for example Postgres, which we use at length inside VMware.
C
It doesn't have a commercial partner, so how would we offer that on top of the platform? Some of that remains unanswered, you know, just from ourselves at VMware, but the KEP itself is in progress, so there may be something coming to upstream Kubernetes that actually exposes those flags, which would make it a more generic thing rather than just partner- or enterprise-certified stuff.
A
Yeah, and just because I know we get some newcomers here, for acronym definition: KEP is Kubernetes Enhancement Proposal, and you can search the Kubernetes GitHub to go find those.
C
It should... the problem comes with, you know, scheduling it: what node does it get placed on? And then, if it is full storage, it has to be co-located with the local compute, which is not complex, but it's not a transparent operation today. You could do it with vVols pretty easily, because you just turn replication off, or ask for it to be placed on something that is not replicated. With vSAN...
C
...it's a little bit more complex, mainly not because you can't provision it; it will provision, you know, a faults-to-tolerate-zero object, and it will let the client access that storage. But you'll have non-linear failures: for example, if your storage is on one node and your compute is on another node, and it's a zero-copy thing, and you lose one or the other node, you'll lose your workload, and it is quite complex to try and figure that out.
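For context, the vVols and vSAN routes just described are driven by storage policy; a minimal sketch with the vSphere CSI provisioner, assuming an FTT=0 policy already created in vCenter (the policy name here is a placeholder):

```yaml
# Hypothetical StorageClass bound to a failures-to-tolerate-0 vSAN policy;
# the policy itself is defined in vCenter, its name below is illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsan-ftt0
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "FTT0-No-Replication"
```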
C
So we don't support that today, simply because the way that failures happen is just a real mess when you start to look at it like that; it's easier to pay the storage penalty than have to deal with the outages that would inevitably come from something like that. And that's why we built DPP, the Data Persistence platform, to try and deal with that stuff. But, like I say, Jordan, that's good feedback; I'll...
A
Yeah, and thanks, Jordan, for the input on future content. Anybody else got anything?
A
Okay, well, if we don't have anything else, we'll close this meeting a few minutes early, then. Thanks for coming, and the next meeting, as usual, will be the first Thursday of February. Bye everybody, thanks.