From YouTube: Machine Learning on OpenShift SIG Using Ceph for ML Workloads on OpenShift - Kyle Bader (Red Hat)
Host: Anything that might have a Spark operator component to it, but I'm not sure, so I'll let Kyle share his screen and walk us through that now.
Kyle: So one of the things I've been working on recently is putting together something like a tutorial for experimenting with Ceph object storage, and learning how you can use Ceph object storage with some of the tools coming out of radanalytics.io. If you're interested, you can try this at any time by just cloning this repo I have here, which I really only pulled together in this last week, so it's improving!
It runs on OpenShift, and the idea is that there's a micro Ceph called ceph-nano. I have configuration here that will create a set of credentials, use OpenShift secrets to store those credentials, and then create a StatefulSet running what is effectively a single-pod Ceph cluster.
That's just described here: during the bootstrap process, the ceph-nano StatefulSet will use the credentials from the secrets to create an additional set of users. I already have, in Minishift running on my desktop here, a Ceph cluster consisting of one pod. In a more robust environment there's Rook, a Ceph operator, and you'd probably want to use that in a real environment.
I was using the S3A connector and the pre-canned images when I was initially playing around, but the OpenShift Spark images from radanalytics were using an older version of Spark with an older version of Hadoop, and that was kind of problematic. One of the new things from the radanalytics community, though, is that they're publishing these OpenShift Spark builder images that you can run a build with and provide a tarball.
B
Basically,
a
link
to
spark
caramel
and
it'll
create
a
custom
open
ship
spark
image
for
you
with
that
particular
version
of
spark,
so
I
kind
of
put
together
a
spark.
The
latest
spark
and
a
new
2.8,
not
five
caramel
and
then
I'm
using
that
to
create
create
a
build
here.
A
built-in
chef
spark
and
then
I
can
go
ahead
and
use
that
that
resulting
image
as
the
base
for
for
a
notebook
notebook
here.
B
Though
I
did
that
earlier
here,
cluttered
it
up,
I
have
have
notebook
here:
I
threw
the
environmental
variable,
I
included
kind
of
the
notebook
in
in
this
particular
repository,
and
it's
it's
getting
the
rgw
end
point
kind
of
from
this.
This
Zelda
command
here
and
the
most
straightforward.
If
you
are
you're,
you
know
just
working
from
within
a
notebook,
and
you
want
to
interact
with
the
object,
store,
photo,
is
kind
of
the
AWS
library
and
choice
in
the
case
of
Python.
So if you want to be able to interact with an object store, it's really as easy as installing boto and then creating a boto object to interact with the object store. For this particular image I'm just using the base notebook image, so it doesn't include boto3.
You can always install conda packages from within the notebook, but if you do this a lot, you might want to build your own base notebook image.
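If you go the install-at-runtime route, a minimal sketch of pulling boto3 into a running notebook session (assuming the notebook image has conda or pip available) is just:

    # Install boto3 into the running notebook kernel's environment.
    # Fine as a one-off; bake it into a custom notebook image if you do this often.
    %conda install -y boto3      # or: %pip install boto3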
The secret key and the user key, which are the credentials for the Ceph cluster, and the endpoint are specified as I'm creating the boto object here, and those are being sourced from the environment. They make their way into the environment by being passed on the new-app command line, and the user key and secret key come from the OpenShift secret and are exported as environment variables inside the pod.
It would be very bad hygiene to have your S3 secret key and user key statically coded into the notebook, so this is kind of the current best approach that I've found, at least in connection with a Ceph object store.
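As a rough sketch of that pattern in a notebook cell (the environment variable names here are illustrative, not necessarily the ones the repo uses), the client is built entirely from values the secret injects into the pod:

    import os
    import boto3

    # Endpoint and credentials arrive as environment variables (populated from
    # the OpenShift secret), so nothing sensitive is hard-coded in the notebook.
    # The variable names below are assumptions for illustration.
    s3 = boto3.client(
        "s3",
        endpoint_url=os.environ["S3_ENDPOINT_URL"],          # ceph-nano RGW endpoint
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    )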
So we have this boto object now, and we can use it to create a bucket in the ceph-nano object store. We have the Ceph object store running in OpenShift here, and then I created a bucket, wrote a dummy object into that bucket, and listed the contents of the bucket, so we can see that it's there.
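A minimal sketch of those three steps with boto3 (the bucket and object names are made up for illustration; the client is created the same way as above):

    import os
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url=os.environ["S3_ENDPOINT_URL"],
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    )

    s3.create_bucket(Bucket="demo-bucket")                       # create a bucket in RGW
    s3.put_object(Bucket="demo-bucket", Key="hello.txt",
                  Body=b"dummy object")                          # write a dummy object
    resp = s3.list_objects_v2(Bucket="demo-bucket")              # list the bucket contents
    for obj in resp.get("Contents", []):
        print(obj["Key"])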
Now, that's all fine and well, but if you're using boto just within the confines of the notebook, that's obviously not a particularly scalable approach, depending on what you're doing. So if you have to do some heavier lifting to interact with the object store, that's where Spark comes in. Running this notebook, I can create a Spark context here. Of course, it's just running Spark locally within the pod that's running the notebook; this could, you know, be a real cluster
B
He
provision
with
the
Machine
Co
or
for
like
a
spark
operator,
I
suppose
I'm,
not
particularly
familiar
with
that.
Yet
so,
oh
that's!
It
yeah
learn
more
about
that
and
then
with
s3a
you
have
to
set-
and
you
know,
there's
a
number
of
things
you
need
to
set
similar
to,
as
you
had
to
set
a
photo
again,
we're
setting
the
end
point
and
then
the
credentials
here,
but
we're
also
telling
it
to
use
a
path
style
access
kind
of
by
default.
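In PySpark, that configuration looks roughly like the following; the property names are the standard Hadoop S3A ones, while the environment variable names are again just illustrative:

    import os
    from pyspark.sql import SparkSession

    # Point the S3A connector at the Ceph RGW endpoint instead of AWS, and use
    # path-style URLs (bucket in the path, not the hostname), which is what RGW
    # typically expects.
    spark = (
        SparkSession.builder.appName("ceph-s3a-demo")
        .config("spark.hadoop.fs.s3a.endpoint", os.environ["S3_ENDPOINT_URL"])
        .config("spark.hadoop.fs.s3a.access.key", os.environ["AWS_ACCESS_KEY_ID"])
        .config("spark.hadoop.fs.s3a.secret.key", os.environ["AWS_SECRET_ACCESS_KEY"])
        .config("spark.hadoop.fs.s3a.path.style.access", "true")
        .getOrCreate()
    )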
Now, one of the big reasons I wanted to use Hadoop 2.8, and why I built the custom OpenShift Spark image, was that it allows per-bucket configuration, where I can have a different set of credentials or a different endpoint for different buckets. Right here I'm drawing from one of the radanalytics.io community tutorials.
They have some data that they've made available in an Amazon S3 bucket, showing how you can interact from the same context with data both in the public cloud and in the private cloud, a public object store or a private object store. Here I'm saying that the radanalytics data bucket has a different endpoint than the default.
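Per-bucket configuration in the S3A connector (the feature Hadoop 2.8 brings) scopes a property to a single bucket by embedding the bucket name in the key; a sketch, with "radanalytics-data" standing in for whatever the tutorial bucket is actually called:

    # The defaults set above point at the local Ceph RGW; this override says that
    # one specific bucket lives at the public AWS endpoint instead.
    hconf = spark.sparkContext._jsc.hadoopConfiguration()
    hconf.set("fs.s3a.bucket.radanalytics-data.endpoint", "s3.amazonaws.com")
    # Credentials can be overridden per bucket the same way, e.g.
    # fs.s3a.bucket.<name>.access.key / fs.s3a.bucket.<name>.secret.key.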
So you very much have these same operational modalities whether you're using Ceph as an object store or Amazon S3. If developers are used to the experience in the public cloud and you want to replicate that in a private-cloud environment, it's really relatively seamless from a developer-experience standpoint. Here there's another bucket of mine, called b-dist, and I'm doing the same thing I did up here with the radanalytics bucket.
It's like a sanitized version of the reports that customers provide after a trip, and one of the things they do with it, which this notebook is programmed to do, is a sentiment analysis: building a model, training it, and then, you know, validating that model. So this is the data that we're sourcing there. What this command does is read the sample data out of the bucket that's in Amazon, so that I can work with it.
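A sketch of that read; the bucket name, object key, and file format here are placeholders, since the tutorial's actual layout may differ:

    # Read the sample trip-report data straight out of the public S3 bucket
    # into a Spark DataFrame, via the per-bucket endpoint configured above.
    df = spark.read.json("s3a://radanalytics-data/sample-trip-reports.json")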
You can show the schema, and going back to the data frame, you can take your data here and count it, or you can register it as a table. A lot of the data folks who are analyzing data are familiar with using SQL, so if they just want to use raw SQL and are less familiar with the Python methods for manipulating data, they can certainly do that.
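Roughly, those operations look like this on the data frame read above:

    df.printSchema()            # show the inferred schema
    print(df.count())           # count the rows

    # Register the data frame as a temporary view so folks who prefer SQL can
    # query it directly instead of using the Python DataFrame API.
    df.createOrReplaceTempView("trip_reports")
    spark.sql("SELECT count(*) FROM trip_reports").show()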
One of the things the Data Hub team inside our own company is doing is analyzing data that was stored in their local Ceph object store. So this is a Ceph bucket in ceph-nano: I can load the sample data set into a data frame (luckily I already installed the needed packages in the kernel), and it loads into the data frame. This is the data set, then, that it's coming from.
At the end of this notebook (I'm probably not going to get all the way through it and will probably stop here in a second) it saves the resulting model, tokenizer, and feature dimensions back into S3, and then it persists. You don't have to worry about reattaching a PV to something; it's in an object store that's available to anything that has the appropriate credentials, and this is really kind of a neat way of sharing data across multiple workloads.
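For a Spark ML model, persisting back to the object store is just a save to an s3a:// path; a sketch with placeholder names, where `model` stands for the fitted model produced earlier in the notebook:

    from pyspark.ml import PipelineModel

    # Write the trained model into the Ceph object store rather than onto a PV.
    # The bucket and key prefix are placeholders.
    model.write().overwrite().save("s3a://models/trip-report-sentiment")

    # Later, any pod with the right credentials can load it back.
    restored = PipelineModel.load("s3a://models/trip-report-sentiment")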
I didn't write this; this is from the folks on that team. But basically they train a machine learning model using this data, eventually generate some charts, and then save it all back into the Ceph cluster's object store. So they're showing the sentiment of these trip reports, based on whether the trip was successful versus unsuccessful. These are all sanitized, made-up people's names, and then there's a breakdown based on the persona, the audience or the role: customer, engineering, etc.