Description
Data Scientists and Red Hat: Better Together
Sherard Griffin (Red Hat)
OpenShift Commons Gathering on Data Science
January 28, 2021
https://commons.openshift.org/gatherings/OpenShift_Commons_Gathering_on_Data_Science.html
To find out more about OpenShift Commons, please visit: https://commons.openshift.org
I want to talk a little bit today about data scientists and Red Hat, and how we're better together. But before I dive into that, let's take a step back and look at why Red Hat decided it was best to get into the AI industry, and look at ways in which we can help our customers along those journeys.
We also saw that Red Hat itself needed AI in order to increase our open source development and production efficiency. We've looked at things like analyzing build logs with anomaly detection, to find interesting patterns or to discover things that may not be right with the way that we're building and developing the software. It's allowed us to increase that efficiency, get products out to customers a little bit faster, and adapt to the market quicker.
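The build-log anomaly detection mentioned above can be sketched in a few lines. This is a minimal illustration, not Red Hat's actual pipeline: the sample log lines are invented, and the model choice (an isolation forest over character n-gram features) is an assumption for the example.

```python
# Hypothetical sketch: flag unusual build-log lines with an isolation forest.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import IsolationForest

log_lines = [
    "INFO compiling module core",
    "INFO compiling module api",
    "INFO linking binaries",
    "INFO compiling module ui",
    "ERROR segfault in linker at 0x7f3a",  # the odd line in this invented sample
]

# Turn each line into character n-gram TF-IDF features so rare tokens stand out.
features = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)).fit_transform(log_lines)

# IsolationForest scores points by how easily they are isolated; predict() returns
# -1 for points it considers anomalous and 1 for inliers.
detector = IsolationForest(contamination=0.2, random_state=0).fit(features.toarray())
labels = detector.predict(features.toarray())

for line, label in zip(log_lines, labels):
    if label == -1:
        print("anomaly:", line)
```

In practice the features and thresholds would be tuned to the log format at hand; the point is only that unsupervised detection can surface "interesting patterns" without hand-written rules.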
The third thing here is: we saw that customers needed AI integrated into open source products and services to be able to leverage an intelligent platform. Now, what do I mean by that? If you think of a lot of the platforms that customers are using, like OpenShift, and also things like Red Hat Insights, where we're helping them manage their own infrastructure, AI allows those platforms to be smarter, to be more predictive, to be able to predict things before customers know about them. And it's been great.
In introducing those technologies, at the end of the day, customers benefit from these intelligent platforms, because they can react to their environment where perhaps humans aren't able to grasp all the data that's coming in, and make decisions as quickly as machine learning can. Now, how do we go about doing this? One of the foundational pieces of approaching the AI problem space was that we knew it had to be on an infrastructure that could run in a myriad of different areas.
So if you look at the bottom of this graph, we knew it had to run on physical, virtual, private clouds, public clouds, hybrid, as well as the edge. That had to be the baseline, where we needed to meet the customers where they were. On top of that, we also needed to bring hardware accelerators into the mix.
Some of the challenging machine learning initiatives that customers were embarking on benefit from the use of GPUs and FPGAs in that space, and also from being able to utilize self-service capabilities with the hybrid cloud. That's key with technologies like OpenShift and RHEL.
On top of all of that core infrastructure, that's where we started looking at where we need to work with the open source AI communities, as well as our partners, to provide the rest of that story. And so, when you look at the chart that shows a typical AI/ML initiative, it starts with setting the goals and preparing the data, goes all the way through developing and training the model, then deploying that model as a service, and getting some value and some data back from the model being generated.
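That lifecycle, from preparing data through deploying the model as a service and getting value back, can be sketched end to end. This is a minimal illustration under assumed tooling: scikit-learn for the model, a toy dataset, and pickling as a stand-in for handing the artifact to a real model server; it is not the actual product workflow.

```python
# Hypothetical end-to-end sketch of the AI/ML lifecycle described above.
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1. Set the goal and prepare the data (a toy dataset stands in for real data).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Develop and train the model.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 3. "Deploy": serialize the trained model; a serving layer would load this artifact.
blob = pickle.dumps(model)

# 4. Get value back: the reloaded model answers prediction requests.
served_model = pickle.loads(blob)
accuracy = served_model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

In a platform like the one described in the talk, steps 3 and 4 would be handled by a model-serving service rather than a pickle round-trip, but the shape of the loop is the same.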
But data scientists also need to use the tools of their trade, tools like TensorFlow, Jupyter notebooks, Spark, and Python, and all of our partner technologies, to be able to solve their own challenging problems. And so when we did that, we not only opened this up for customers to use, but we also started using it internally ourselves, to bake all that intelligence I mentioned in the previous slide into our technologies and into our business processes, at the core of all of this.
Once we provided those tools, we realized what we were truly doing was democratizing access to the tools and democratizing the data for the data scientists. No longer are they burdened with having to know where all of the data resides. They have one platform that can run in all of these different data centers and all of these different cloud providers, and they don't have to carefully craft their machine learning models to only run on certain technologies.
But the key part of this is that all of the access to the tools and all of the access to the data is still governed by IT, yet in a way that gives data scientists their own self-service capabilities. They can spin up their tools and get access to their data without bogging down IT and having to work with IT to get all of these things done. IT can curate that process in the platform itself, and then the data scientists have the freedom to make the choices that they want.
When we looked at how we needed to provide the tools, it wasn't in just one space. We recognized that for a data scientist to use tools from beginning to end, from data ingestion all the way through to deploying their model, we had to work with partners that helped them along that journey. Some of those partners focus on data governance and security, some on data processing, some on databases as a whole, and then also on the hardware accelerators.
This is just a glimpse of the partners that we've worked with today, and there are many more to come. Now, I've talked about what we've done in the past; I want to talk about where we're going in the future. We're starting to transition from empowering data scientists with the hybrid cloud and democratizing the data, and now we're moving into improving the data science experience across the hybrid cloud. That's very challenging, but we're hearing from customers on their journey, and it's really resonating with what we're trying to do in the space as well.
We're looking at ways in which we can optimize data governance across the hybrid cloud. That's an interesting problem, because no longer are companies storing all of their data in one place; in fact, no longer are they storing it with one cloud provider. Everything is becoming fragmented because of the need to be as close to where the data is generated as possible. But it's also becoming fragmented because enterprises are getting so big, and there are so many tools out there, that different organizations are simply running processes and generating data differently.
But in order to get access to all that data, it's very key that we work with the data scientists to figure out the ways in which they're trying to bring that data together, and lots of efforts are going on right now to improve the services around that, as well as the technologies to break down those data silos.
We're also working with partners specifically to decrease the maintenance burden of the machine learning tools that they're offering, through automation, intelligence, and additional services. And this is key, because we don't want IT departments to be bogged down with maintaining all of the tools that data scientists need.
But if you can imagine building more intelligence into those tools, being able to know when things aren't quite healthy, and having self-healing or self-diagnostic capabilities, those are critical to having a platform that runs on its own. And so by working hand in hand with our partners, we're providing better tools for data scientists and for IT departments, tools that work in a way that provides more intelligence around what's going on.
The third thing I want to talk about is the area in which we're improving the usability of machine learning tools by minimizing infrastructure management. When I think of this, I think of the job of a data scientist: ultimately, it's not their responsibility to maintain the infrastructure themselves.
The ideal experience would be that they go in and use their tools the way they need to, but they don't care where those tools are running. It doesn't matter whether it's on-prem or in the cloud, and it doesn't matter whether it's Kubernetes or OpenShift or some other technology.
They just want a certain experience. And so now what we're looking at are ways in which we can abstract the infrastructure from the tools themselves, so that data scientists don't have to worry about infrastructure management. There's some exciting work going on there, happening both through the platform itself and through looking at ways in which we can provide a better managed experience for the customers.
Now, another area in which we're innovating and working with data scientists is the need for bringing AI to the edge. This is interesting because we want data scientists to have the capability to train at the core and then deploy at the edge.
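One way to picture "train at the core, deploy at the edge" is a single model artifact trained centrally, then replicated to many edge sites that each run inference locally. This is a hedged sketch: the site names are invented, and a pickle round-trip stands in for whatever registry or delivery mechanism the actual blueprint uses.

```python
# Hypothetical sketch: train once at the core, serve the same model at many edge sites.
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Core data center: train a single model on centrally aggregated data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
artifact = pickle.dumps(RandomForestClassifier(random_state=0).fit(X, y))

# Edge sites (invented names): each receives the same artifact and serves
# predictions close to where its data is generated.
edge_sites = ["factory-eu", "factory-us", "factory-apac"]
results = {}
for site in edge_sites:
    local_model = pickle.loads(artifact)          # in practice: pulled from a model registry
    results[site] = int(local_model.predict(X[:1])[0])  # stand-in for local sensor data
    print(site, "->", results[site])
```

Every site loads the identical artifact, so the same input yields the same answer everywhere; the hard parts the talk alludes to are distributing, versioning, and updating that artifact across a vast ecosystem of clusters.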
This is very critical for some of the workloads where customers have data centers all over the world. Traditionally, it's been a challenge for data scientists to build a model in one place and deploy it into a vast ecosystem of clusters.
Now they have the capability of doing it in many, and you can follow along with that project; down below you see the link. We call it our blueprint for industrial edge and industrial manufacturing. The last thing I'll talk about today is a really interesting project that we have going on. It's called Operate First, and it's in conjunction with the Mass Open Cloud.
If you're not familiar with the Mass Open Cloud, it's a public cloud where the industry, together with a lot of research institutes, has worked to build out a cloud where anyone can go in and collaboratively do work. Now we've extended our philosophy at Red Hat of how we do open source technology, and we've moved into the space of operations.
That's operations in an open public cloud, for anyone to take a look at and anyone to get involved in, and we're bringing machine learning to that environment so that the data scientists, the operations teams, all of the stakeholders, and the application developers can all work together. Even our partners work together, in one open way, so that we can enrich and better the AI community.
So there are some fascinating things going on in that space as well. It's a great test bed for new technologies and new concepts that companies and open source communities are working on, and it's also a great way for the data scientists to provide a feedback loop of what they need, so that the companies participating can listen and help create more technology to fulfill those needs. You can look at the URL below as well to see what's going on in that community.
So that's just a few of the things that are happening, but it's really exciting. I'll end with this note: innovation, specifically in the AI space, happens when we work together. That's why we're really focusing on our open source communities and on how we can work together to take things to the next level, to really look at what data scientists need, and to help with that journey.