From YouTube: Building Ignite AI Platform on OpenShift Hongfei Cao & Kevin Martelli (KPMG) OpenShift Commons 2022
Description
Building the Ignite AI Platform using PostgreSQL and Kafka on OpenShift
Hongfei Cao & Kevin Martelli (KPMG)
OpenShift Commons Gathering on Databases held on 02/23/2022
Slides: https://bit.ly/3MeH4tV
Join OpenShift Commons: https://commons.openshift.org/index.html#join
Full Agenda here:
https://commons.openshift.org/gatherings/OpenShift_Commons_Gathering_on_Databases.html
A: Good morning and good afternoon, everyone. This is Hongfei from KPMG. I'm a cloud engineering director, co-presenting with Kevin Martelli; he's a principal in our cloud industry area. Our topic is how we leverage OpenShift data storage to deploy our Ignite machine learning platform.
B: Yes, as Hongfei was alluding to, as part of this database session we're having here at Commons, we plan on showing our KPMG client platform, Ignite. It's our data science, AI, and ML platform used for business, and it takes advantage of a lot of these technologies.
B: Some of the more robust models we were storing required object storage, or PVCs with a MinIO layer on top of them. That's not exactly a database, but it's a data-storage type of platform and application we thought would be interesting to share as well.
B: If you go down one slide, Hongfei. As I was mentioning, just to set the background on what KPMG Ignite is: many years ago at KPMG we built what we call our data science, AI, and ML platform, powered on top of OpenShift. It's a platform built in a very modular way to allow the use of the best pieces, no matter whether open-source, proprietary, or commercial software, that can plug in to build your use case or application.
B
Initially,
it
was
built
for
data
scientists
and
engineers.
However,
there's
a
hook
there
in
for
the
business
to
be
able
to
engage
with
and
interact
with
the
data
sets
that
are
coming
out
as
well
as
to
keep
that
human
loop
through
the
then
process.
B
And
finally,
it
was
built.
You
know
mainly
around
unlocking
the
value
of
unstructured
data.
It
since
has
changed
to
do
structure
data
as
well
as
semi-structured
data,
but
really
build
off
of
all
the
the
rich
text
that
needed
to
be
taken
out
of
these
unstructured
documents.
And
what
I
just
wanted
to
quickly
show
here
before
we
dive
into
the
details,
is
how
a
use
case
and
methodology
is
built
which
aligns
to
some
of
the
ways
that
we're
using
different
database
technologies,
so
use
cases
is
put
together
by
a
component.
B: A component can be something open source like an OCR engine, or a classification or data-extraction component, so there are many components that get strung together into a workflow to produce an output. As these components communicate back and forth, Kafka is the messaging channel, if you will, that allows them to talk to each other, and there are interfaces in that human loop so users can see the output and help retrain and re-update the models.
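The component-to-component flow described above can be sketched as a chain of named topics, with each component consuming the previous one's output. A minimal illustration in Python, with stdlib queues standing in for Kafka topics and entirely hypothetical component names:

```python
from queue import Queue

# Stand-ins for Kafka topics: one queue per component's input channel.
topics = {name: Queue() for name in ("ocr", "classify", "extract", "done")}

# Hypothetical components; each consumes a message and publishes downstream.
def ocr(doc):      return doc + "->text"
def classify(doc): return doc + "->invoice"
def extract(doc):  return doc + "->fields"

# (input topic, component, output topic) for each hop in the workflow.
pipeline = [("ocr", ocr, "classify"),
            ("classify", classify, "extract"),
            ("extract", extract, "done")]

def run(document):
    topics["ocr"].put(document)                 # seed the first topic
    for topic, component, next_topic in pipeline:
        msg = topics[topic].get()               # consume
        topics[next_topic].put(component(msg))  # publish result downstream
    return topics["done"].get()

print(run("doc1"))  # doc1->text->invoice->fields
```

In the real platform each component is a separate pod and each hop is a real Kafka topic; the point here is only the topology: components never call each other directly, they only consume and produce messages.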
B
If
you
get
that
one
slide-on
thing
and
then
finally,
this
is
the
last
slide
before
we
we
drive
into
the
content.
If
we
think
about
ignite,
we
think
about
it
as
a
layered
cake.
If
you
want
there's
sort
of
always
the
top
in
the
user
experience
part
of
it.
There's
interfaces
and
annotation
uis
and
management
consoles
of
how
people
can
interact
with
the
data
coming
out
of
the
platform.
We
have
what
we
call
in
that
middle
layer,
the
ignite
ai
platform
that
that
these
are
kind
of
like
the
ai
tooling.
B
That
enables
you
to
build
and
execute
pipelines.
As
I
was
mentioning
earlier
things
that
could
be
proprietary,
that
kpmg
has
built
where
we
call
it.
You
know
custom
type
of
capabilities,
things
that
may
be
open
sourced
in
the
market
like
atlantic
tester
act
or
things
that
we've
kind
of
built
as
part
of
our
overall,
like
drivers
of
certain
types
of
you
know
more
tactical
data
extractions
and
which
we
call
our
intelligent
domain
engine.
B
And
if
you
look
to
the
left,
it
talks
a
lot
about
some
of
the
core
fundamental
things
about
the
platform,
so
loom
is
a
way
that
we
store
data.
So
there's
a
consistency
of
where
you
put
something
into
a
particular.
You
know
component
and
how
something
come
comes
out
of
that
component
and
then
finally,
you
know,
as
one
would
expect.
B
We
have
the
the
the
orchestration
layer
which
is
really
powered
by
openshift,
and
we
have
some
workflow
engines
in
there,
but
I
wanted
to
highlight
this
core
infrastructure,
so
the
core
infrastructure
is
where
we're
going
to
focus
most
of
our
talk
on
today,
and
these
are
around
the
different,
I
would
say,
database
like
applications
that
we're
using
so
we're
using
kafka
we're
using
postgres.
You
know
we're
also
using
min
io
as
we
talked
about,
and
then
we
are
also
using
elasticsearch,
but
we
won't
go
into
that
for
timing,
but
we'll
go
through
the
types.
B
The
ways
that
we're
using
you
know
kafka
how
kafka
is
set
up
in
the
platform
pros
and
cons
and
then
we'll
also
talk
through.
You
know
how
progress
is
being
used
as
well.
A: Right, thank you, Kevin. For the rest of the presentation, let me introduce how we set up and leverage the OpenShift data store to deploy the databases for the Ignite platform, and also share some of our lessons learned, our best practices, and the benefits of deploying on top of OpenShift. The first component I'm going to introduce is Kafka. We leverage Kafka as a message broker to stream Ignite workflow metadata and job results to the multiple worker containers. To simplify, shown here is a three-node Kafka cluster with a high-availability setup, and each broker container pod has multiple persistent volume claims mounted to it. We have a customized storage class for these persistent volumes, which uses encrypted OpenShift Container Storage (OCS).
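The talk doesn't show the manifests, but a customized encrypted storage class and a broker volume claim of the kind described might look roughly like the following, expressed as Python dicts for brevity. The class name, provisioner, and sizes are assumptions, not KPMG's actual values:

```python
# Hypothetical encrypted OCS (Ceph RBD CSI) storage class; names illustrative.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "ocs-encrypted"},
    "provisioner": "openshift-storage.rbd.csi.ceph.com",
    "parameters": {"encrypted": "true"},  # per-volume encryption
    "reclaimPolicy": "Delete",
}

# PVC of the kind a Kafka broker pod would mount for its log directories.
broker_pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-kafka-0"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "ocs-encrypted",
        "resources": {"requests": {"storage": "100Gi"}},
    },
}
```

Each broker gets its own `ReadWriteOnce` claim; the shared `ReadWriteMany` pattern discussed later in the talk is a different case, used for model storage.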
A: The storage setup here is mainly for the distributed Kafka message data. Also, our current version of Kafka requires ZooKeeper to store the cluster information, so we also set up a high-availability ZooKeeper cluster. As one simple example, we have three ZooKeeper nodes as a minimum quorum cluster, and each ZooKeeper node, similar to Kafka, has multiple persistent volume claims mounted to it.
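A sketch of how the three-broker, three-ZooKeeper layout could be declared, assuming a Strimzi-style `Kafka` custom resource; the talk doesn't name the operator actually used, and the cluster name, sizes, and storage class are hypothetical:

```python
# Strimzi-style Kafka custom resource (assumption: the talk doesn't name
# the operator); values here are illustrative, not KPMG's actual config.
kafka_cluster = {
    "apiVersion": "kafka.strimzi.io/v1beta2",
    "kind": "Kafka",
    "metadata": {"name": "ignite-kafka"},
    "spec": {
        "kafka": {
            "replicas": 3,  # three brokers for high availability
            "storage": {"type": "persistent-claim", "size": "100Gi",
                        "class": "ocs-encrypted", "deleteClaim": False},
        },
        "zookeeper": {
            "replicas": 3,  # minimum quorum; tolerates one node failure
            "storage": {"type": "persistent-claim", "size": "10Gi",
                        "class": "ocs-encrypted", "deleteClaim": False},
        },
    },
}
```

With three ZooKeeper nodes, the ensemble keeps a majority (two of three) through any single-node failure, which is why three is the usual minimum for a quorum cluster.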
A: An advantage is that we can build a hybrid-cloud strategy with a cloud-agnostic approach using OpenShift. Out of the box, OpenShift gives us orchestration and failover by default, through StatefulSet deployments and built-in replica management.
A: Also, the whole deployment uses an automated CI/CD workflow, which helped us significantly with Kafka restores, rolling updates, patching, etc. Last but not least, with OpenShift we can easily scale our Kafka and ZooKeeper clusters up and down based on the workload needs.
B: A component could be something like a heuristic rule that's getting information out of a document, and if you're going across hundreds of thousands of documents, you have thousands of instances of these components spun up to operate on them. There's a lot of communication and traffic going back and forth through Kafka: one component's done, the next component takes it, that component's done, and so on. All that interchange in the process of executing component one, component two, component three, component four produces some type of output.
B: That put heavy throughput demands on how Kafka needed to be deployed and configured on the platform, both to meet the SLAs that needed to be in place and to keep the tooling resilient. There were a couple of things the team worked through; I think Hongfei will talk through them, but the initial challenge was how many messages were going back and forth because of the spin-up of the pods executing those individual components for selected workloads.
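One common way to reason about this kind of throughput pressure is to size topic partitions against the expected message rate, since within a consumer group each partition is read by at most one consumer. A back-of-the-envelope helper; this is a general rule of thumb, not KPMG's actual sizing, and the numbers below are made up:

```python
import math

def partitions_needed(target_msgs_per_sec, per_consumer_msgs_per_sec):
    """Rule of thumb: at least one partition per consumer needed to keep up."""
    return math.ceil(target_msgs_per_sec / per_consumer_msgs_per_sec)

# e.g. 50k msgs/s across component pods, each consumer handling ~4k msgs/s:
print(partitions_needed(50_000, 4_000))  # 13
```

Under-partitioned topics cap how many component pods can usefully consume in parallel, which is exactly the situation that shows up when thousands of component instances spin up at once.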
A
Yep
see
you
kevin
right,
the
next
component
I'm
going
to
talk
about
this
postgres,
so
we
as
a
united
platform,
we
use
a
postquest
to
store
our
internet
workflow
metadata
as
a
traditional
data
store
similar
to
kafka.
We
also
want
to
deploy
postgres
high
availability
in
cluster
setup,
and
what
we
found
out
is
openshift
offers
a
postgres
operator.
You
know
through
the
vendor,
you
know
the
implementation,
so
it
significantly
reduce
the
complexity
of
deploying
the
high
variability
postgres
cluster.
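The vendor isn't named in the talk; one Postgres operator commonly available on OpenShift is Crunchy Data's, whose `PostgresCluster` resource might be sketched roughly like this (cluster name, version, and sizes are illustrative, not KPMG's configuration):

```python
# Crunchy-style PostgresCluster resource (assumption: the talk only says
# the operator comes "through the vendor"); all values are illustrative.
postgres_cluster = {
    "apiVersion": "postgres-operator.crunchydata.com/v1beta1",
    "kind": "PostgresCluster",
    "metadata": {"name": "ignite-metadata"},
    "spec": {
        "postgresVersion": 14,
        "instances": [{
            "name": "instance1",
            "replicas": 2,  # primary plus replica for HA failover
            "dataVolumeClaimSpec": {
                "accessModes": ["ReadWriteOnce"],
                "storageClassName": "ocs-encrypted",  # hypothetical class
                "resources": {"requests": {"storage": "20Gi"}},
            },
        }],
    },
}
```

The operator's value here is exactly what the talk describes: declaring the desired replica count and storage, and letting the operator handle provisioning, failover, and upgrades.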
A: Also, we have built some customized solutions for backing up the Postgres data, which leverage MinIO object storage as a landing zone. We dump the Postgres data there, and once a Postgres cluster is backed up or restored, we can share the data across clusters.
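The dump-and-land pattern described might be sketched as follows; the database, bucket, and alias names are hypothetical, `pg_dump` is the standard PostgreSQL dump tool, and `mc` is the MinIO client:

```python
from datetime import date

def backup_commands(db, bucket, alias="minio"):
    """Build the pg_dump + MinIO upload commands for one database.

    Names are illustrative; in practice these would run inside a backup
    job pod with credentials mounted from a secret.
    """
    dump_file = f"{db}-{date.today().isoformat()}.dump"
    dump = ["pg_dump", "--format=custom", f"--dbname={db}",
            f"--file={dump_file}"]
    upload = ["mc", "cp", dump_file, f"{alias}/{bucket}/{dump_file}"]
    return dump, upload

dump, upload = backup_commands("ignite", "pg-backups")
print(dump[0], upload[0])  # pg_dump mc
```

Restore runs the same path in reverse: `mc cp` the dump back down from the landing zone, then `pg_restore` it into the target cluster.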
A
You
know
the
to
the
different
cluster
or
you
know
you
know
backup,
restores
the
data
to
the
new
postgres
cluster
when
we
deploy
the
postgres
on
openshift,
we
found
you
know
below
advantage
right
benefits,
including
the
easy
deployment
through
the
operator,
and
it
has
a
you
know,
a
very
good
integration
with
storage
support.
A
Also,
the
building
enterprise
grade
level
high
availability
in
orchestration
failover
help
significantly
on
the
database
deployment
it
also
similar
to
kafka.
It
provides
a
cloud
agnostic,
hybrid
cloud
approach
of
the
deployment,
and
you
know
easy
migration
csd
high
integrated
using
the
existing
ccd
like
jenkins
ansible
paper
gorilla
tikton,
so
it
can
even
reduce
riser
deployment
time
last
but
not
least,
the
building
security
module
to
support
the
policy
and
hardening
our
deployment.
A
Next,
I'm
going
to
quickly
talk
about.
Another
type
of
you
know:
storage,
we're
duty
library
for
ignite
machine
learning
model
different
from
postgres
kafka.
Here
we
directly
leverage
the
standalone
precision
volume
claim
running
on
top
of
the
openshift
cluster
storage.
Like
many
other
machine
learning
platform,
ecosystems
ignite
also
has
a
model
database
or
model
inventory
to
store
the
trained
model,
and
sometimes
the
model
could
be
a
very
large
scale.
A: If it involves, for example, a deep learning or neural-network model, it can be several gigabytes in size. To speed up the model prediction or classification process when we serve the model, and to reduce downloading the model from the model database or model inventory each time, we set up a centralized, shared ReadWriteMany persistent volume claim to store those large objects and models, which is then shared across multiple machine learning worker containers or pods. This minimizes data download time, since there is only a one-time data load, and it significantly reduces network traffic between the model database and the OpenShift cluster. And given that the model itself is relatively static compared to the other data we store in Kafka or Postgres, we can do a separate deployment that loads the model at the beginning of the model-serving job, and it only requires infrequent updates.
A: We have a separate deployment job for those model updates. The right-hand side shows that before the deployment, we mount the ReadWriteMany persistent volume claim to our deployment pod, and it downloads the model from MLflow, which serves as our model inventory. Once the model is persisted there, any model-serving pod or worker job can mount this persistent volume claim as ReadWriteMany and avoid the repeated network traffic.
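The one-time-download pattern can be sketched like this; the claim name is hypothetical, and the `fetch` callable stands in for the MLflow download, which the talk doesn't detail:

```python
from pathlib import Path
import tempfile

# Shared ReadWriteMany claim that every model-serving pod mounts at one path.
model_pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "shared-models"},  # hypothetical name
    "spec": {"accessModes": ["ReadWriteMany"],
             "resources": {"requests": {"storage": "50Gi"}}},
}

def load_model(name, cache_dir, fetch):
    """Return the model path, downloading only if the shared volume lacks it."""
    path = Path(cache_dir) / name
    if not path.exists():              # first pod pays the download cost ...
        path.write_bytes(fetch(name))
    return path                        # ... later pods just read the volume

# Usage sketch with a fake fetcher counting how often a download happens.
calls = []
def fake_fetch(name):
    calls.append(name)
    return b"weights"

with tempfile.TemporaryDirectory() as d:
    load_model("ner.bin", d, fake_fetch)
    load_model("ner.bin", d, fake_fetch)
print(len(calls))  # 1
```

Because the volume is `ReadWriteMany`, the deployment job writes once and every serving pod reads the same files, which is what keeps the multi-gigabyte models off the network after the first load.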
A
Last
but
not
least,
I'm
going
to
quickly
touch
on
the
object,
storage
setup
inside
ignite,
so
we
also
leveraged
the
miao
as
a
hardware
file
system
on
top
of
the
openshift
ocs
storage
container.
Here
the
miao
is
occupied
as
a
state
force
assad
and
each
male
safer
side
has
a
multiple
versus
volume
claim,
with
a
customized
storage
class
to
benefit
us
ignite.
The
miao
is
support,
supports
rerun
money
and
has
the
api
with
a
secured
access
key
to
allow.
A
You
know
different
worker
container
to
access
the
mail
data.
For
example,
we
can
store
the
runtime
log,
organize
job
input,
you
know
the
documentation
list,
etc.
On
top
of
the
mail
as
our
shared
object,
storage,
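A rough sketch of the MinIO StatefulSet shape described, again as a Python dict; the replica count, secret name, and storage class are assumptions, not KPMG's actual values:

```python
# Hypothetical distributed MinIO StatefulSet; replica count, secret name,
# and storage class are illustrative only.
minio_statefulset = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "minio"},
    "spec": {
        "serviceName": "minio",
        "replicas": 4,
        "selector": {"matchLabels": {"app": "minio"}},
        "template": {
            "metadata": {"labels": {"app": "minio"}},
            "spec": {"containers": [{
                "name": "minio",
                "image": "minio/minio",
                # Distributed mode spanning the four pods' volumes.
                "args": ["server", "http://minio-{0...3}.minio/data"],
                # Access/secret keys come from a secret, not the manifest.
                "envFrom": [{"secretRef": {"name": "minio-access-keys"}}],
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }]},
        },
        "volumeClaimTemplates": [{
            "metadata": {"name": "data"},
            "spec": {"accessModes": ["ReadWriteOnce"],
                     "storageClassName": "ocs-encrypted",
                     "resources": {"requests": {"storage": "500Gi"}}},
        }],
    },
}
```

Each pod's claim is `ReadWriteOnce`; the shared, many-clients access is provided by MinIO's S3-style API on top, which is how the worker containers reach the same data concurrently.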
A: Okay, so finally, to conclude our Ignite deployment-on-OpenShift work: we found that leveraging OpenShift, especially its operators, is key for our enterprise-grade deployment of the Postgres and machine-learning platform storage. We found OpenShift offers a lot of out-of-the-box functionality to support high availability, failover, and CI/CD pipelines. Also, to have better high-availability support, we prefer to deploy our platform to multiple clusters in different regions and data centers. Enabling the ReadWriteMany persistent volume claim is the key to reducing our network traffic for large-scale machine learning.
A: That applies to pre-trained models, like deep learning models for NLP, shared across multiple NLP or model-serving jobs. Customized persistent-volume-claim backup utilities are also key to helping us quickly rotate or update our existing databases like Postgres or Kafka.
A: Last but not least, migrating from the old storage class to the OCS encrypted storage class gave us better throughput and encryption from the OpenShift storage perspective.