From YouTube: OpenShift Commons Briefing: Continuous Development and Deployment of AI/ML Models with Kubernetes
Description
OpenShift Commons Briefing
Continuous Development & Deployment of AI/ML Models with Containers and Kubernetes
Guest Speakers:
Will Benton (Red Hat)
Parag Dave (Red Hat)
Peter Brey (Red Hat)
hosted by Diane Mueller (Red Hat)
2020-06-04
A: All right, everybody, welcome back to another OpenShift Commons briefing. This week we seem to have an AI/ML theme going on, so we're really pleased to have Will Benton and Parag Dave from Red Hat with us today to talk about continuous development and deployment of AI models with containers and Kubernetes. I saw a preview of this earlier internally at Red Hat and thought it would be a great thing to bring here and have a discussion around, so I'm going to let the guys introduce themselves, talk about it, and do some demoing, and at the end we'll have live Q&A. If you have questions, you can type them in the chat, either here in BlueJeans or on Twitch, YouTube, Facebook, or wherever you're watching the live stream, and we'll relay them back here. So with that said, Will, please take it away and introduce yourselves, and let's rock and roll.
B: I'm Will Benton. I'm an engineering manager and an engineer at Red Hat in the Office of the CTO, and my focus has been on helping Red Hat's customers build machine learning systems in the cloud with Kubernetes and OpenShift. One of my particular passions lately has been figuring out how to use contemporary infrastructure to make data scientists' lives easier: how can we improve the machine learning workflow as well as run the system in production?
C: Hi, I'm Parag. I'm a member of the product management team in the developer tools BU. My focus has been aligned with what Will is looking at, which is: how can we enable developers to create and deliver applications from dev to test to prod in the fastest way possible, so you increase delivery frequency in the most efficient and optimal way? And then, what changes when it's a specific kind of workload, whether it's an AI/ML application versus an IoT application versus a traditional Java application?
C: So let's start with a few preambles around how businesses really benefit from AI/ML. We've talked with organizations globally, across the board, to understand what AI/ML initiatives they are chasing and what benefits they would like to derive from them. If you put those benefits into categories, you can see that by adopting AI-powered intelligent software, businesses can drive a lot of value in areas like customer satisfaction.
C: For example, you can gain knowledge about customer usage and the trends, through sentiment analysis and trend analysis, and you'll be able to actually increase satisfaction with your policies and services. You can also gain a competitive advantage by creating differentiated digital services that are AI-driven, with AI concepts behind them instead of a rule-driven or process-driven philosophy.
C: Obviously, you can also optimize: you can leverage AI and deep learning to optimize your current business services, and hence increase your revenue because you're optimizing them, and also drive some new revenue streams by offering similar services. Lastly, there's automation: if you're able to automate manual, repetitive, time-consuming business operations, you can reduce your operational costs, which allows you to be more efficient and yet offer higher customer satisfaction.
C: So AI/ML is being leveraged internally in organizations to drive increased value in these four areas, whether through out-of-the-box products or software built in-house. Here are some examples of how companies are leveraging AI and machine learning to achieve positive business outcomes. In financial services, which we're all part of, there are outcomes like reduced fraud: you've heard of credit card fraud detection engines.
C: These are systems that can predict whether a transaction is real versus fake. That's driven with AI and machine learning, and it's being pushed very strongly by the financial services markets. Similarly, in the medical field and in healthcare, we see a lot of medical diagnosis being done with AI today. There's a lot of work happening there, and they can speed up the time it takes to deliver a diagnosis and also increase the accuracy of the diagnosis by augmenting medical professionals with an AI/ML-driven application.
C: When we look at the insurance claims industry, we see a lot of automation happening around the processing of claims: the pool of claims that get approved has increased, and the amount of time the customer has to wait to actually get claims approved has decreased. And obviously there's autonomous driving: self-driving cars are all driven by lots and lots of data being processed on the edge with AI/ML applications.
C: If you look at what's driving all this, the question is really: why now? What is happening now is that the growth of AI is driven by easy access to abundant computing power, faster processing with specialized computing processors, rapid development with rich open source frameworks and technologies that you can actually use for AI and learning models, and widespread awareness and acceptance of AI amongst all of us.
C: We don't see AI interfering in our lifestyles; we see AI actually augmenting our lifestyles, and as that awareness and acceptance has spread, it has led to all of these initiatives being taken up by companies. Meanwhile, the computing power that used to require supercomputers can now be deployed to a cloud environment at a fraction of the cost and time. These two factors have combined to basically make AI/ML real.
C: Next up: now that we understand the benefits of AI/ML for a company, what is the end-to-end flow that is followed in order to create an AI/ML application? Let's take a look at a typical development lifecycle, and you will see that it exhibits many similarities to traditional software development. It starts with the business leaders.
C: The business leaders set the goals, and then the data scientists develop the AI/ML models, collaborating with engineers to make sure that their models are able to leverage the architecture and the systems, with the data they need. Once the AI/ML models are created, the next part is that they need to get deployed, so the data scientists typically collaborate with the app developers to integrate the deployment of these AI/ML models into the entire application development process.
C: The applications consume the models, productionize them, and put them into the application that the end user will be consuming to leverage these models. The app developers also lead the deployment of the applications: you have to deploy them out, and once deployed, the AI/ML models start running and serving their inference capabilities as new data comes in.
C: As new data comes in, they run inference on it and see how accurate it is. These models are monitored and managed together by the data scientists and the application developers, because they want to make sure that the models are delivering the desired outcomes. IT operations is typically continuously engaged across all aspects of this lifecycle, helping with the management, monitoring, and remediation of the entire system, while the models themselves are monitored by the data scientists and app developers to make sure that the correct predictions are being made.
C: That requires you to go back and retrain the models. So it's a big feedback loop: models get retrained as needed. Say, for example, you're making predictions; your goal is to increase the accuracy of the predictions. You deploy the model, new data comes in, you see what the outcomes are, and then you go back and end up retraining the model to make sure that the predictions stay accurate.
C: Similarly, when some new data comes in and it turns out we did not train on that kind of data, our AI/ML model isn't able to handle this new data. So now we have to go back, create the models again, and then do the entire deployment process. This continuous feedback loop is always happening, from training and developing the AI/ML model, to deployment, to actively retraining it, and as we covered, this involves personas from all over the place.
C: Now, if we take a look at this typical lifecycle, you will see that at the top we have the project lifecycle, setting the business goals, and then the data engineers and data scientists need to work together to gather the data, prepare the data, and make it ready. In order to execute all of this, you need what we'll call the machine learning software toolchain. For example, it starts with TensorFlow, Jupyter notebooks, and Python stacks for development.
C: This hybrid cloud platform basically empowers data scientists, data engineers, and developers to be agile and collaborate through the entire process, without depending too much on IT operations for individual tasks. Now, because it's self-service, this hybrid cloud platform also needs to be optimized for the kind of AI/ML application you're building. For example, hardware accelerators such as GPUs or TPUs can help you speed up the development of the ML models and run the inferencing tasks.
C: If it's processing data that's coming from an edge location, then that needs to be covered as well, and it has to be done in such a way that IT operations can manage it from a single place, in a singular, consistent way, rather than trying to adapt to each particular environment and infrastructure where it's landing. So if you take a look at the entire lifecycle and the toolchain, this is where all of it lives.
C: We start at the bottom with the infrastructure, where things are going to get deployed. You have a cloud platform that runs on top of it: it has the compute power, it has the knowledge of what goes onto GPUs and what doesn't, it provides the architecture for the data, and it provides the architecture for the tooling. That includes the CI/CD process for continuous, faster deployment, and all of this feeds into the end-to-end flow of delivering the applications.
C: So now that we know the tooling and the infrastructure that is needed, let's look at the benefits that a container-based architecture and Kubernetes bring to the development and deployment of models through the lifecycle we just covered. With containers, and Kubernetes driving the orchestration of those containers, data scientists and software developers can develop ML models and the associated intelligent applications powered by these models with a very high degree of agility, flexibility, portability, and scalability.
C: Think of leveraging the power of Kubernetes and infrastructure as code: you can now automatically set up your AI/ML environments across a hybrid cloud, whether that's public clouds or on-premise. You can set it up automatically because you're declaring it as code, and you can do on-demand provisioning of your compute resources during the development of the models, during the deployment of the models, and then while the models are running.
C: As your demand grows, you can scale out the serving side, or as your data demands grow and your training sets get bigger and bigger, you can bring more compute to bear to help you develop the models. So among the powers that Kubernetes and containers bring to the table, the biggest is scalability, because you can scale as you need. There's also high availability, because in a real-world environment, many applications are running.
C: If you have downtime or failures, whether network failures or hardware failures, your entire solution can keep on running; it can automatically be provisioned wherever it needs to go to provide uninterrupted service to your customers. When we look at portability, we talked about how the models need to run across a wide range of infrastructure, which means we don't want to create a model that can only run on-premise or only on a particular public cloud.
C: The model has to be easy to refactor and move, and the idea is to use containers and Kubernetes to allow these models to run no matter where the end environment ends up being. So you get just-in-time scaling, high availability, portability, and the ability to quickly deploy changes to very specific pieces of the product, versus updating a monolithic application for one change you had to make, maybe for a new data model, or maybe for some bug that was found.
C: It's much harder to do that when it's delivered as a single application versus a containerized set of applications, a set of microservices that make up the application, because then you can update the respective pieces as you need. It makes you more agile in how you respond to new requirements or bugs, and also to new computing requirements as you scale. So now, Will, why don't we take a look at this in action on OpenShift? Thanks.
B: Thanks, Parag. I wanted to do a deeper dive into that machine learning lifecycle from a practitioner's perspective and talk about how we'd use this to solve a concrete problem. Parag talked about a lot of problems that are actually driving business value. I'm not going to talk about such a problem today; I'm going to talk about a problem that everyone understands but that no one is looking to build a new solution for right now, and that problem is spam classification.
B: We're going to start with a hypothetical data set where we have two kinds of data sources: data on the top, which we're calling legitimate documents, and data on the bottom, which we're calling spam documents. If you look at these and think about them, you could probably say, well, I can see some differences between these things; I can see a way to tell them apart. If you really think about it, you might notice that the excerpt on the top sounds suspiciously like Jane Austen.
B: The excerpt on the bottom sounds suspiciously like it came from a user comment on a recipe site or a review of a food product, and that's in fact what our data sources are. Our legitimate documents are documents that have been generated by a generative model trained on Jane Austen's complete creative output, and our spam documents are documents generated by a model trained on fine food reviews from a large internet retailer.
B: The idea is that since we can tell these things apart by looking at them, we should also be able to write a program to tell them apart. So let's dive into that workflow and see what we do in this specific case to solve that problem. The first task that a data scientist is going to do, again in conjunction with business leaders and stakeholders, is figure out a way to formalize the problem.
B: We need to figure out what it means to succeed at this problem and turn success into a number. That could be metrics that we're already collecting or metrics that we need to invent and record. In the case of document classification or spam filtering, success could mean not missing spam messages: I never want to see a spam message in my inbox. Now, of course, we could guarantee I never see a spam message in my inbox by sending everything to the spam folder.
B: So that's obviously not the whole story. It could also mean that we don't misfile legitimate messages, that we don't have a lot of legitimate messages winding up in someone's spam folder. Those are metrics that we can test when we have a training set, when we know what the truth is. There are also business metrics we might care about, and in this case that could be feedback from our users: how many messages did we send to someone's inbox, for example, that they flagged as spam?
B: How many messages did we send to someone's spam folder that they moved back into the inbox? Obviously these aren't the whole story, because people aren't perfect. Someone is not going to go through every message in their spam folder and ask, did I really mean to read this? And even if they did, they might not give us the signal by moving it back. But these business metrics are an important part of the problem, and responsible data scientists will focus on the whole picture, looking at all of these metrics together.
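To make that concrete, here is a minimal sketch, assuming scikit-learn and hypothetical label arrays, of how the two technical metrics Will describes (not missing spam, not misfiling legitimate mail) can each be turned into a number:

```python
# A minimal sketch, not the presenters' actual code, of turning "success into
# a number" for a spam filter. `y_true` and `y_pred` are hypothetical label
# arrays where 1 means spam and 0 means legitimate.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 0, 0, 1, 0, 1, 1]   # ground-truth labels from the training set
y_pred = [1, 0, 0, 0, 1, 1, 1, 1]   # what the model predicted

# Recall on the spam class: what fraction of real spam did we catch?
spam_recall = recall_score(y_true, y_pred, pos_label=1)

# Precision on the spam class: of everything we flagged, how much was spam?
# Low precision means legitimate mail is being misfiled into the spam folder.
spam_precision = precision_score(y_true, y_pred, pos_label=1)

print(f"spam recall: {spam_recall:.2f}, spam precision: {spam_precision:.2f}")
```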
B: Once we have those metrics out of the way, our next step is to collect, clean, and label data. In this case, that means going from raw messages, where we have labels, to labeled messages in a regularized format, where we have individual example documents that we've labeled as either spam or legitimate.
B: Feature engineering is basically just summarizing documents as points in space. What I want to do is encode every document that I see as a point in space, in such a way that similar documents correspond to similar points. Then I can say interesting things like: oh, it looks like there are a lot of legitimate documents in this part of the space, so maybe my model can distinguish between things that are in this part of the space and things that are in other parts of the space.
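As an illustration of that encoding step, here is a minimal sketch using a TF-IDF vectorizer; this is a stand-in for whatever featurization the demo notebooks actually use, and the two-document corpus is hypothetical:

```python
# A minimal sketch of the "documents as points in space" idea using TF-IDF.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "It is a truth universally acknowledged ...",     # legitimate (Austen-like)
    "Great decaf K-Cups, fast shipping, five stars",  # spam (food-review-like)
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # each row is one document as a point in space
print(X.shape)                      # (n_documents, n_features)
```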
B: Once we have those features, we can use them as input to a model training algorithm. We take the labeled data, where we know the truth, and the approach that we used to turn that labeled data into feature vectors, and then we allow the model to identify patterns in those vectors that we can use to answer the question we care about, in this case: is this document spam or not? Really, at a high level, all the model training algorithm is doing is identifying good trade-offs in how it summarizes the data.
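Continuing the sketch above, the training step might look like the following; logistic regression is an assumption here, since this part of the talk doesn't name the demo's actual model:

```python
# Feed the feature vectors and labels into a model training algorithm.
# `X` and `vectorizer` come from the vectorization sketch above.
from sklearn.linear_model import LogisticRegression

y = [0, 1]  # 0 = legitimate, 1 = spam, matching the order of `docs`
model = LogisticRegression().fit(X, y)

# The trained model is now just "a function that makes a prediction".
print(model.predict(vectorizer.transform(["Tea and dog biscuits, would buy again"])))
```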
B: Maybe it was ads for online gambling one week, online gray-market pharmaceuticals the next week, and mortgage discounts the third, but there would be various topics that would elude the spam filter. Then someone would identify that these were getting through to the inbox and push out a new version of the spam filter that caught those things, and so the spammers and the spam filters were playing this cat-and-mouse game in the real world.
B: In general, models can start misbehaving, and the interesting thing about models is this: conventional software components, if we're lucky, break in obvious ways. They don't build, they don't deploy, they're obviously slow in production. With models, though, remember we just have a function that makes a prediction, and the way that this can misbehave is more insidious than the way our conventional web apps might misbehave: the model can keep giving you answers; they might just be wrong.
B: It might be wrong far more often than you can accept, so by monitoring the behavior of the model in production, we can identify this before it causes us a business problem. Now, as Parag said, this is not a waterfall; it is actually an iterative process, and at a lot of stages in the workflow we're backtracking and changing decisions we made earlier. Another really interesting thing about this lifecycle is that, because of all these loops, we need to be really careful about the latency between phases.
B: In a lot of organizations, data scientists who need new infrastructure to try a new approach to solving a problem have to file a ticket with IT; they have to get something supported. In a lot of environments, if data scientists want to build a model service that can be incorporated into an application, they're either going to develop that service themselves, using a skill set that's probably not where they'd rather be spending their time, or they're going to have a communication exercise with an app dev team: hey, look at this technique I developed.
B: Can you figure out how to turn it into a production application? Based on our experience of seeing this workflow in person, there are some teams where this works very well, but for some teams this turns into a lot of time spent at a whiteboard, a lot of raised voices, and a lot of eventual apologies.
B: So let's see how we solve this problem from end to end. The first thing I want to show you is the Open Data Hub operator, which is a community project sponsored by Red Hat that provides an end-to-end data science and data engineering discovery environment with a single click. Instead of filing a ticket with IT, if I'm a data scientist who has access to OpenShift, I can install this myself, and if it's already been installed by my organization, I don't even have to install it.
B: I can just go to an endpoint and get an interactive development environment for data science. Now, a lot of data scientists prefer to work in conventional IDEs, but a lot of data scientists also like to work in these so-called interactive notebook environments, and I'll show you what those look like. Here I have a directory of notebooks that I've launched from JupyterHub on the Open Data Hub, and this is basically just a way to do literate programming in a document.
B: So I have some prose here, and I have some code, and then I have the output of that code, and I can change this code as it runs and edit it. This is a really nice way to experiment with techniques interactively: I can say I want 23 rows of this data set instead of 50, and I get a different result. It's also a communication tool. For a lot of data scientists, a lot of the job is communicating results to stakeholders.
B: We want to explain what we're doing, show the code, and let people read the code and reproduce our work. The interesting thing is we can have these sorts of tables, and we can also have plots. So we could ask: is there a clean separation between the points in space for this problem? And we can see that, yeah, there basically is. So I could take this notebook, use it to develop the technique, and then hand it over to a stakeholder and use it as the basis for a presentation.
B: Now, for this concrete problem, we've looked at a couple of different approaches. I've run them already, but I can just restart and run this again. This is a feature-engineering approach where we're basically going to turn documents into vectors so that we can feed them into a machine learning algorithm, and you can see we do some sanity checking on our spam and legitimate documents. The spam document looks spammy: it's talking about K-Cups and decaf coffee.
B: The legitimate document is talking about things that upper-middle-class people in 19th-century England are doing, and this other one is talking about tea and dog biscuits and baby food and so on. So we see that there's some clear distinction between the kinds of things these documents are talking about. If we go on and look at the rest of this, we can see that we were able to transform these into large vectors and then save that pipeline.
B: There's nothing in this notebook that knows about OpenShift. Crucially, this is just a communication tool that a data scientist would work with. Now we're going to train a model. Again, I ran this in advance just before we started, but we can go through and look at it. Here are some metrics on how well our model is doing. This picture basically shows how many of the legitimate messages we actually predicted as legitimate and how many of the spam messages we actually predicted as spam.
B: On the other diagonal: how many spam messages did we call legitimate, and how many legitimate messages did we call spam? Again, this is just a communication tool; it doesn't look a lot like something you could immediately drop into production. In a conventional workflow, a data scientist would take these notebooks, send them to an app dev team, and have that team figure out how to implement them in a service.
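For illustration, here is a minimal sketch of computing that picture with scikit-learn, reusing the hypothetical label arrays from the earlier metrics sketch:

```python
# A confusion matrix: correct predictions on one diagonal, mistakes on the other.
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
# cm[0][0]: legitimate predicted legitimate   cm[0][1]: legitimate called spam
# cm[1][0]: spam called legitimate            cm[1][1]: spam predicted spam
print(cm)
```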
B: But we know that with OpenShift's developer experience, we can do better. We actually have a source-to-image build as part of a Tekton pipeline here that will take these notebooks, extract the code that trains the model, and build a microservice around it after training the model in a build. I've already run this in advance.
B: We've left some room to improve the model later; we're just showing you the first pass through this lifecycle. From there we go through and actually deploy a Knative service, based on the model that we trained in those notebooks, into production. What we've done is extract the code that does the feature engineering and the model training from these notebooks.
B: That's running right here in this pipeline service. We also have a parallel build that just uses regular source-to-image: I've built a version of this that uses a conventional source-to-image build too, so if you haven't adopted Tekton yet, you can still use similar techniques. We like to show the latest and greatest, though. All right, here's how you might interact with this in an actual application. I have a couple of different URLs here, and we're using one of them.
B: This is the one for the Knative service; we also have one for the conventional OpenShift service down here. So I'm defining the endpoint that I want to interact with, and I'm declaring a very simple client library where I just take the text that I pass in and post it to that REST service that I created.
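A minimal sketch of that client pattern, with a placeholder URL and an assumed request/response shape rather than the demo's actual contract:

```python
# Take some text and POST it to the model's REST endpoint.
import requests

ENDPOINT = "http://pipeline-service.example.com/predict"  # placeholder URL

def predict(text: str) -> dict:
    """Send one document to the model service and return its prediction."""
    response = requests.post(ENDPOINT, json={"text": text})
    response.raise_for_status()
    return response.json()

print(predict("It is a truth universally acknowledged ..."))
```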
B: We would hope that this would get predicted as Jane Austen, but again, we left some room for improvement in the model, so these both show up as spam. Let's try this with some more documents: I'm going to load in the training data I had, take a sample of it, and look at how well the model performs on these examples.
B: So we have a lot of examples here and a lot of predictions, and the interesting thing is that we can actually go back and track metrics about the predictions we've made; we can look at this service and see what it's done. Remember, we talked about data drift.
B: We may not know whether a given message is spam or legitimate in real life, but we may know that we expect a certain percentage of the messages we see to be spam. In the real world, maybe 90 or 95 percent of all email traffic is spam, and most of it just never makes it to your inbox. But if that distribution changes over time, we know that the data we're seeing in the real world no longer corresponds to the data that we trained our model on.
B: So if we start with, say, a hundred thousand examples, five percent legitimate and 95 percent spam, we should expect that the distribution of incoming messages stays roughly comparable: 95 percent spam and five percent legitimate. We're tracking these metrics from the model, and we can actually see them in Grafana. As soon as Grafana catches up, we'll see those metrics reflected in this dashboard here, but you can see how they've built up over time.
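One hedged sketch of how a model service could export those counts, assuming the prometheus_client library; the metric names are made up for illustration, not taken from the demo:

```python
# Export prediction counts so a dashboard can watch the spam/legitimate ratio.
from prometheus_client import Counter, start_http_server

PREDICTIONS = Counter(
    "model_predictions_total",
    "Predictions served, by predicted class",
    ["predicted_class"],
)

def record_prediction(label: str) -> None:
    """Call this with "spam" or "legitimate" after each prediction."""
    PREDICTIONS.labels(predicted_class=label).inc()

start_http_server(8000)  # exposes /metrics for Prometheus to scrape
```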
B: I've run some different experiments, and we'll see when Grafana catches up with those experiments. If we say 25 percent of messages are legitimate, we should see different curves in that graph. We're getting a little bit of a tick up here as the metrics system catches up, but we can see that these curves are going to catch up over time. We'll use a shorter time window so it's a little easier to see.
B: As we go on, we see a lot of legitimate messages and spam messages with this latest experiment, and that's not what we'd expect: we'd expect these to be growing at the same rate, because the proportion between them should stay the same. Now, in a real installation, we wouldn't just have a data scientist monitoring this dashboard waiting for something bad to happen; we'd want to let them do something more productive with their time. Instead, we could define an alerting rule.
B: For example: has this distribution changed? Or we could even have another model that detects anomalous behavior in these predictions.
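A minimal sketch of such an automated check, using a chi-squared goodness-of-fit test from SciPy; the counts and threshold here are hypothetical:

```python
# Has the observed spam/legitimate split drifted from the training-time split?
from scipy.stats import chisquare

expected_ratio = [0.95, 0.05]   # training-time split: spam, legitimate
observed = [680, 320]           # recent prediction counts from the service
total = sum(observed)
expected = [r * total for r in expected_ratio]

stat, p_value = chisquare(observed, f_exp=expected)
if p_value < 0.01:
    print("prediction distribution has drifted; consider retraining")
```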
B: So just to recap what we've seen in this demo end to end: we've seen how to use Open Data Hub on OpenShift to provision a self-service discovery environment, and how to use that Open Data Hub to do interactive development and produce a machine learning technique in an interactive notebook.
B: We've seen how we go from that interactive notebook, which is really a communication tool and not what we'd think of as a conventional software artifact, to an actual production service that we can incorporate into our application, using OpenShift's developer experience and Tekton build pipelines. And then we've seen how we can monitor the behavior of that model in production so that we can detect when it misbehaves.
A: That was great; thank you for that explanation. We have a couple of questions here, and I'm going to unmute Pete Brey, who's also from Red Hat and has been answering some of the questions in the chat as we've been going. One of the questions, and I think it's a good conversation, was around storage: Lead had asked about what is trending now in storage for AI and ML data. I wonder if you could address that, Pete.
D: Sure, and I'll paraphrase a little bit of what I wrote in the response. The answer is: it really depends. We are seeing some particular trends, but it really depends upon the types of data. In general, there are really two, actually three, large categories of data. Structured data is what you'd normally think of as things like customer records, things that would go into databases; they fit very nicely into a tabular, columnar type of format. But we know that not all data is nice and neat like that.
D: In fact, there's another category called semi-structured data, which is midway between being very structured and columnar and being very unstructured, which is actually the third category. In the unstructured category you have things like files, and I think Lead, who had asked the question, was specifically asking about unstructured data, basically files; it looks like he's using NFS for that today. I skipped over the middle category, semi-structured data, which is basically a combination of both structured and unstructured data.
D: A lot of what we're doing is helping people get to this new environment, and you might ask, well, why would you want to do that? There are a lot of different reasons. The primary reason is that S3 presents a very flat namespace, which is massively scalable, and when you're building a data lake that could potentially be hundreds of petabytes, that's very, very important. That's actually one of the challenges with traditional file systems like NFS: there are limits to their ability to scale, because it's more of a hierarchical type of namespace.
B: I'll take a crack at it, and Pete, I think you probably have some thoughts here too, so chime in if you'd like. There are a lot of technologies in this space that solve issues of data lineage. I'd really look at it in terms of managing the model lifecycle. A big concern here is reproducibility, and there are so many facets to reproducibility.
B: So I'm going to start by level-setting, and then I'll get to your question. With Jupyter notebooks, you saw how I went back and edited things and ran things in different orders; you can do that in a notebook. If I do that in a notebook, the output in the notebook is not going to be what someone else gets if I send it to a colleague and she tries to run it.
B: If I don't have the same libraries installed that you have installed, you will get different results than I will. If I have a library with soft dependencies, where it behaves one way if an optional package is installed and another way if it isn't, you may get different results than I do when running a model. And then, finally, there are all sorts of other concerns, like making sure that I specify random seeds.
B: I also need to make sure that I use random number generators in a way that's safe for the kind of parallelism I'm exploiting in my application, and make sure that any native libraries my Python or JVM code is calling out to are the same versions and have the same behavior. If you really need bit-level reproducibility of your model, which many people do, then you have a whole host of challenges in the code, and that's what we focused on today. You also have a whole host of challenges with the data.
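A minimal sketch of the seed-pinning part of that hygiene; which libraries need seeding depends on your stack, and numpy plus the standard library are shown only as common examples:

```python
# Pin every random number generator your code touches, and record the seed.
import random
import numpy as np

SEED = 20200604                      # arbitrary fixed seed, stored with the model
random.seed(SEED)
rng = np.random.default_rng(SEED)    # prefer explicit Generators over global state,
                                     # which is also friendlier to parallel workers
sample = rng.integers(0, 10, size=5)
print(sample)
```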
B: Your model is only as good as the data it gets, and your model is only reproducible if you know which data you used to train it and how you got that data. In terms of actual data lineage tracking, there are a lot of great projects in this space that address that component. It's not something we addressed in the demo today, but you can look at technologies like Pachyderm, for example.
B: There are other projects too: I think DVC is another good example, and the Quilt project has a metadata layer for machine learning data sets as well. It's a tricky problem. I think what a lot of people want to have is something that looks like a Git-style interface, where you have a content-addressable set of trees, so you can say: I built this model against the immutable data that I had in this particular hash of a tree.
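A minimal sketch of recording that kind of data identity, hashing a hypothetical dataset file so its digest can be stored with the model's metadata:

```python
# Record a SHA-256 digest of the exact training data alongside the model, so
# you can later say "this model was built against data with this digest".
import hashlib

def dataset_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return a SHA-256 digest of a dataset file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

print(dataset_digest("training-data.parquet"))  # store with model metadata
```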
B: Ceph, of course, is immutable by default; you're not overwriting things unless you have to. So it's a case where our platforms, the Red Hat platforms, provide a primitive that you could use to support this. But again, there are a lot of community projects that solve this problem really well, and I think those are all worth looking at in this case.
B: The way we've been thinking about this problem is that you don't just want to track your code and your libraries and your hyperparameter settings and your random seeds; you also want to track your data. And in terms of actually thinking about lineage in pipelines: if you have a classical data-lake-to-data-warehouse architecture, where you're going from raw data to incrementally cleaned data in multiple steps, you need a way to replay those pipelines, and you need to track the identities of the data you're dealing with at every stage. How about you, Pete?
D: You've made some really good points, William, at a very high level, about the code piece of the equation as well as the data piece. At a very high level, I think many of us have probably heard the statistic that, as Parag was presenting in the flow that's on the screen right now, the gather-and-prepare-data stage is actually probably one of the most problematic stages right now for data scientists; I think a lot of people cite it.
D: But what we're talking about here, and what William was talking about, is an even more specific case of this problem, because not only do I have the problem of gathering the data, but how do I ensure that there's reproducibility? I was going to answer in exactly the same way: there are lots of different ways to address this. There are obviously commercial packages, but there are also a lot of open source packages that can help you with this particular problem.
D: It is something that I think the industry is focusing on, because it is such a broad problem. With respect to my earlier comments about object storage technology becoming much more prevalent, this is an area where object storage as a technology can also help, because it has built-in versioning capabilities for objects, so you're able to maintain that data as the objects, the files, whatever it is, potentially change.
A: All right. Well, you mentioned a couple of potential open source projects and things like that, and there's another question in here, and maybe we can tease out a little bit about how to get started on OpenShift with all of this. Molly is asking: are there Open Data Hub cookbooks or recipes for all these AI/ML processes and steps that one can refer to? I think we've talked about it and you've demoed it, but how do people get started? Where are the resources and things of that nature?
B: Yeah, absolutely. I think opendatahub.io is a great place to learn about the Open Data Hub. We have a GitHub organization with several projects where we've collected some of these tutorial materials, and I'm happy to follow up offline with anyone who's interested in reproducing or trying some of these things out. As soon as I'm not sharing my screen, I can put a couple of links in the chat.
B: Remember, we talked about how a key aspect of reproducibility is having the right libraries installed. A great way to solve that problem is to have your development environment stored in a container image, because then I don't have to worry about whether the libraries I installed are even still available, which matters surprisingly often, or whether you installed the exact same versions.
B: If I go into one of those JupyterHub notebooks and try to allocate six gigabytes of memory in a way that might crowd out other people whose Jupyter notebooks happen to be running on the same VM or the same physical node, OpenShift, via the Linux kernel, would terminate my notebook kernel and tell me I'm using too much memory.
B: Now the question is: can I still get work done in this environment? The facility for that is that we can set resource limits automatically. These are profiles that you can configure as an administrator when you install the Open Data Hub, and we have basically t-shirt sizing: you can take whatever the default is, which is typically small.
B: Or you can pick small, medium, or large, and by requesting those environments you can get more or fewer resources. The idea is that people who need more to get their work done will request those resources, and ideally you have the sort of cultural mores where people don't take more resources than they need and release them when they're done with them.
B: But there are technical solutions to that problem as well. And while I'm in this launcher, we can talk about some other aspects. We have a way to preload things: we have a persistent volume, backed by Ceph, running in the Open Data Hub, and I can pre-populate that with the contents of a Git repository. We also have integration with Ceph object storage.
B: I didn't use it for the demo today, but if I had an object store, the Open Data Hub would actually fill in my user's credentials as environment variables, so I don't have to have those in a notebook; I just refer to them via environment variables and access them.
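A minimal sketch of that pattern with boto3; the variable names and endpoint are assumptions, so check what your Open Data Hub deployment actually injects:

```python
# Read object store credentials from environment variables instead of
# hard-coding them in the notebook.
import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["S3_ENDPOINT_URL"],  # e.g. a Ceph RGW endpoint
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)
s3.download_file("my-bucket", "training-data.parquet", "/tmp/training-data.parquet")
```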
There are a lot of other demos: if you go to opendatahub.io or search for Open Data Hub on YouTube, you can see other demos that show the Ceph integration more in-depth.
A: I think everyone's silent on all of the streams, which is amazing. So anyway, do you have a final slide that links to resources, or anything you can throw up, in case people want to find you or do some interesting research on top of OpenShift and test their theories and practices out?
A: Right, that's all good, we're good, and Rakesh says you answered his questions, so that was great. If there are any final words, we've got about five minutes left, from Peter or Parag or anyone, about what's next for ML on Kubernetes, and maybe specifically OpenShift: anything coming down the pipeline, new operators, new partnerships, or things we should watch for?
C: For the data scientists and the app developers, the personas we saw in the lifecycle: how can we make it easier for them, depending on the kind of AI/ML application being built? So, the tooling and the interactivity part of it, because you're touching a lot of points: you're looking at data and models in a Jupyter notebook, but then you also want to work at the IDE level rather than in a notebook, and you want to preview what you're bringing in. So that's what we're looking at.
C: Now that we have identified that the equipment is there in OpenShift, how do we make it better and easier for developers and data scientists to come in and start creating from scratch? If you're thinking, my company's got something going on, but how do I start, where do I go? How can we make it easier for them? We are focused on those tracks, so you should definitely see some good things coming.
A: So, you see the resources screen here. There's one last short question, and I hope it's a short question because we're almost at the end of the hour: how easy is it to customize the JupyterHub landing page? He's saying he's on-prem and would not need the AWS fields. That's a real question.
B: So those AWS fields actually apply if you're on-prem, because they're also credentials for Ceph. In the Open Data Hub we're deploying OpenShift Container Storage; as I said, it's used to back the persistent volume, so your workspace is basically backed by Ceph in that case, and you can also refer to larger data that is stored in Ceph, hosted on OpenShift as part of that Open Data Hub deployment. So those credentials apply on-prem to the storage backends that the Open Data Hub is provisioning.
A: All right, I'm going to give it a pause for a minute. I'll mention that in the not-too-distant future we are probably going to be hosting a virtual OpenShift Commons Gathering with an ML/AI focus. So if there are topics you want covered or people you want to hear from, reach out and let me know, and I'll try to curate a very interesting day for everybody and reach out to some of the folks that are on the call here today, and others, to make that happen. But I'm not seeing any more questions coming in anywhere.
A: So please do check out opendatahub.io and all of the AI Center of Excellence resources and tools. They're doing awesome work, and lots of end users and customers are doing really interesting things with this, from the Mass Open Cloud to Anthem and others, people doing some really interesting work in ML and AI and data science. Next week we have the folks from howsmyflattening.ca, a bunch of data scientists who are using the Ontario data sets for COVID, so take a look at what they're doing there.
A: They'll be coming in and talking about their work. There's a lot of interest in this use case on OpenShift, and we're learning as we go and hopefully enabling you to do what you need to on top of OpenShift. So Will, Parag, Pete, thank you very much for taking the time to give this talk and the demo today; always insightful and educational. And thanks again to Chris Short for producing it and making these live streams flow so nicely.