From YouTube: IBM Db2 Warehouse MPP on OpenShift with Jana Wong (IBM), Michael St-Jean and Sagy Volkov (Red Hat)
Description
IBM Db2 Warehouse MPP on OpenShift Container Storage
Jana J Wong, Db2 Performance Lead Architect (IBM)
with Michael St-Jean and Sagy Volkov (Red Hat)
OpenShift Commons Briefing
July 1, 2020
A
Hello everyone, my name is Karina Angel and I'm here with Michael St-Jean and Sagy Volkov from the Red Hat storage business unit, as well as Jana Wong from IBM, and we are here to talk about IBM Db2 Warehouse on OpenShift Container Storage. I'm really excited about the performance testing they've been doing on OCP. Michael, please take it away.
B
Thanks a lot, Karina. So today I'll give a little introduction to Db2 with Red Hat OpenShift, and then I'm going to turn it over to Sagy and Jana to talk a little bit more about the testing that was done and the performance results. So let's go ahead and get started. If you're not familiar with Db2, well, I think you should be, because Db2 is actually one of the industry's leading database platforms.
B
Db2 is a great piece for that, and as well, Db2 can be used as a data warehouse, in a symmetric multiprocessing (SMP) type deployment as well as a massively parallel processing (MPP) deployment. They also have Event Store, which is an in-memory database, and it can be deployed on the cloud or through Cloud Pak for Data.
B
So for our testing we looked at Db2 Warehouse. Db2 Warehouse includes built-in machine learning, as I mentioned both SMP and MPP processing, and in-database analytics combined with IBM BLU Acceleration.
B
What we did here was use Db2 Warehouse massively parallel processing for our tests. Typically, as we look at that, we have a lot of customers who are looking at business intelligence type workloads and AI/ML type workloads, and so we wanted to focus in on that. But as well, we felt that the Db2 Warehouse MPP tests would give us a good indication because they're very complex workloads.
B
We thought that it would give us a good indication of how well Db2 works in an OpenShift environment with OpenShift Container Storage. And then as well, we're looking at this from a hybrid data warehouse perspective, so you're able to do rapid data retrieval, and you have flexible deployment and scalability regardless of how you're deploying that implementation.
B
So why are people considering running Db2 this way? Typically you think about some of these databases running in a traditional on-prem type of environment, so why are people looking to modernize their data infrastructure and move Db2 to more of a containerized development environment?
B
Well, right now in the industry about 71 percent or more of organizations are planning to containerize existing applications, and so for the past few years IBM has been doing a lot of work to make sure all of their software applications run in a modern, containerized environment. You see this a lot with customers who are trying to modernize and go cloud native with their application development and application deployment, and now with a lot of those machine learning and data intelligence types of applications.
B
First of all, we have the ability to do very rapid deployment. If you're familiar with OpenShift and OpenShift Operators, you're able to deploy your applications much faster. With the Db2 operator, you're able to do a very quick deployment to a worker node, and you have simplified lifecycle management. So in your day-two operations, as new application versions come out, they can be automatically updated.
B
So you have a much easier update process, which gives you, as you see in the middle and at the bottom, faster delivery of new features. Db2 services can be deployed as microservices and, as I mentioned with these day-two operations, they can be developed, spun up, updated, and scaled independently. And then we have that flexibility that I alluded to, for on-prem or across private cloud and public cloud types of deployments.
B
So what are some of the key benefits? Typically, if you're looking at an OpenShift containerized type of platform, this rings true with delivering your Db2 environment as well. It's all around agility: being able to deploy Db2 when you need it, where you need it. One of the key benefits now is that your data scientists or application developers don't need to go back to an administrator to get resources for their projects. They can spin up projects very quickly and very easily.
B
They can spin up sandbox types of environments; if they don't like something, they can just trash it, and it all gets done automatically within the Kubernetes infrastructure. So this is a great deployment strategy for people who want to take advantage of Db2. Then, from a scalability perspective, running on OpenShift Container Storage, we see, and we'll show you in some of the testing that we did, that we meet or exceed resource utilization, scalability, and performance expectations across the board. And then it's also about reducing complexity.
B
So, with OpenShift Container Storage, we provide data services that are provisioned just like you would provision compute resources for the Db2 application, and it's all done within one unified control plane. So what is some of the technical differentiation that we see? Well, by doing a lot of the testing internally with IBM, we have a validated solution, so you can trust in the reliability and the performance of the environment. So, for example:
B
IBM has done a lot of testing around these solutions with other storage environments, so they have a good idea of what works, what doesn't work, and what performs well. This is why we're coming to you today: we want to tell you about the great performance and the exceptional deployment experience that we had. And then, from a security perspective, there's a lot of security already built into IBM Db2, but a lot also comes from using OpenShift and OpenShift Container Storage.
B
We have very strict security standards, so you have a very secure storage layer for Db2 across that entire environment. And then, from a lifecycle management perspective, I already talked a little bit about this, but OpenShift Container Storage is designed and tightly integrated with Red Hat OpenShift. So you have consistency across your user experience in that type of environment, and you're able to manage your compute resources for the application, in this case Db2, as well as your storage layer, independently.
B
And then what does it mean to you, typically? If you take a look at it from a big data, analytics, or AI/ML director's perspective, running IBM Db2 on Red Hat OpenShift Container Storage gives a big data director the ability to scale storage as their needs increase, with reliability and performance, and the ability to utilize Red Hat OpenShift to run both Db2 and the storage that supports it.
B
It makes operations more efficient and better utilizes existing IT skills that you might already have in your IT department. From a data architect or data engineer perspective, think about OpenShift and OpenShift Container Storage providing more of a modern data architecture that's based on containers and Kubernetes orchestration.
B
So with Kubernetes operators, data scientists can work entirely within Red Hat OpenShift to program their infrastructure for both the Db2 application and Red Hat OpenShift Container Storage. They can focus on innovating, focus on their solutions, and not worry about the underlying infrastructure. And here we have a quote from Piotr Mierzejewski; he's the director of Db2 development for IBM Data and Artificial Intelligence.
B
One of the great things about this implementation is that, with no prior experience with OpenShift Container Storage, they were able to ramp it up within a couple of weeks. They actually thought it was going to take a lot longer without any prior experience; it took them about a week to set everything up and another week to get everything tuned and the tests ready, and you'll hear more about that from Jana and Sagy. So I'll pass this on to them to talk a little bit more about the test layout.
C
Thank you, Michael. Let me share. I like how we all look very young in the picture that you put in with the last quote. Jana and I are going to talk about the actual testing that was done: I'll concentrate on the layout of the OpenShift and OCS cluster, and Jana will talk about the results. So, we decided to use the data warehouse version of Db2 for a few reasons.
C
First of all, it's designed to do massively parallel processing of data, so we wanted something that would hammer our storage, the OpenShift Container Storage, as much as possible. It's also pretty much what IBM usually uses when they test a new storage subsystem, the data warehouse version. The "oc" at the end of the long name, Db2 WH OC, means "on cloud", and the reason we decided on the cloud was, well,
C
everyone is doing something on the cloud, and also, in terms of the constraints on what we could use at this point in time, the cloud was easier to do. As you can see in this slide, there are a few calculations that are done to basically match a Db2 data warehouse to the cluster that you are running on. Each host, as in the past, would run a partition of the data warehouse,
C
what is called a logical node, or multiple logical nodes (MLN), and there are a few calculations that need to be done: a minimum of eight gigabytes of RAM per core of whatever you are running. You can see it all here. The other part of this equation is that we also looked at the OpenShift best practices, which say to leave two CPUs and eight gigabytes of RAM per node for OpenShift; the rest you can use for your application.
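A rough sketch of that sizing arithmetic, under the assumptions stated above (8 GB of RAM per core per MLN, 2 CPUs and 8 GB reserved per node for OpenShift, r5a.4xlarge workers with 16 vCPUs and 128 GB), might look like the following. This is an illustration only; the published test ended up requesting slightly more conservative values, about 52 CPUs and roughly 460 GB of RAM across the four MLNs.

```python
# Illustrative sizing sketch for Db2 Warehouse MPP on OpenShift worker nodes.
NODE_CPUS = 16          # vCPUs on an r5a.4xlarge worker (assumption from the talk)
NODE_RAM_GB = 128       # RAM on an r5a.4xlarge worker
OCP_CPU_RESERVE = 2     # leave for OpenShift per node (best practice cited above)
OCP_RAM_RESERVE_GB = 8  # leave for OpenShift per node
RAM_PER_CORE_GB = 8     # Db2 MPP minimum: 8 GB of RAM per core
DB2_WORKER_NODES = 4    # one MLN per Db2 worker node in this layout

cpus_for_db2 = NODE_CPUS - OCP_CPU_RESERVE
ram_for_db2 = NODE_RAM_GB - OCP_RAM_RESERVE_GB

# An MLN can only use as many cores as the remaining RAM can feed at 8 GB per core.
usable_cores = min(cpus_for_db2, ram_for_db2 // RAM_PER_CORE_GB)

print(f"per MLN: {usable_cores} cores, {ram_for_db2} GB RAM")
print(f"total:   {usable_cores * DB2_WORKER_NODES} cores, "
      f"{ram_for_db2 * DB2_WORKER_NODES} GB RAM across {DB2_WORKER_NODES} MLNs")
```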
C
So with that in mind, as I said, we decided on AWS. This is a seven-node OpenShift 4.3 cluster. It's seven because there's only one master; don't try this in production, but from a budget perspective it's easier.
C
We decided to use four r5a.4xlarge nodes for the worker nodes that are going to run the Db2 pods, or the MLNs. These nodes are known to have a very good ratio and communication bandwidth between the cores and the memory; these are AMD nodes, if I'm not mistaken. And we are using three instances of what I think AWS calls storage-optimized instances, i3en.2xlarge.
C
These are basically AWS instances that have directly attached to them, or supposedly directly attached, two NVMe devices, two 2.3-terabyte NVMe devices each. So our OpenShift Container Storage cluster is basically formed out of these three nodes. Each of them has two storage devices, which gives us a total of six devices, and we ran our initial tests with everything in a single availability zone.
C
So doing all of this calculation of how much we need to keep on, let's call them, the Db2 worker nodes and how much we need to keep for OpenShift basically gives us the amount of resources that was used for each of the MLNs, or the Db2 partition pods, and the total Db2 capacity:
C
four Db2 compute nodes, four MLNs, 52 CPUs, and about 460 gigabytes of RAM. This is how it looks in a nicer diagram: on the top, the four OpenShift nodes that are going to run Db2; on the bottom, the three OpenShift Container Storage nodes that are going to provide the storage; and our single master on the side.
C
For the setup itself, we basically needed what I guess in the Db2 world is called a storage zone, and there are two types of storage zone. One is a shared storage zone, and this shared storage zone is using CephFS.
C
Ceph is the building block of OpenShift Container Storage, and a portion of CephFS is used directly to share information between the partitions, the Db2 pods, or the instances of Db2 that are running on different nodes.
C
Also, the test data was created and stored on this CephFS directory, in order to create it once and then load it, using external tables, into all the database pods. And then we also have a storage zone that is not shared, that is per database instance or per database pod, and this is using Ceph RBD, the block device option of Ceph. As stated, this zone needs very high-performance storage, and that's why we chose Ceph RBD for that.
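As a minimal sketch of those two storage zones, the following uses the upstream kubernetes Python client to request one shared ReadWriteMany volume backed by CephFS (for data every partition must see, such as the generated test data loaded via external tables) and one ReadWriteOnce RBD block volume per Db2 partition pod. In the real deployment the Db2 stateful sets request these through their volume claim templates; the storage class names are the usual OCS 4.x defaults, and the sizes, names, and namespace here are placeholders.

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

def make_pvc(name: str, storage_class: str, access_mode: str, size: str) -> client.V1PersistentVolumeClaim:
    """Build a PVC object for the given storage class, access mode, and size."""
    return client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=[access_mode],
            storage_class_name=storage_class,
            resources=client.V1ResourceRequirements(requests={"storage": size}),
        ),
    )

# Shared zone: CephFS, ReadWriteMany, mounted by every MLN pod.
shared = make_pvc("db2wh-shared", "ocs-storagecluster-cephfs", "ReadWriteMany", "1Ti")

# Per-partition zone: Ceph RBD block storage, ReadWriteOnce, one claim per MLN pod.
per_mln = [
    make_pvc(f"db2wh-data-mln{i}", "ocs-storagecluster-ceph-rbd", "ReadWriteOnce", "1Ti")
    for i in range(4)
]

for pvc in [shared, *per_mln]:
    core.create_namespaced_persistent_volume_claim(namespace="db2", body=pvc)
```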
C
This is a little bit of how Db2 looks in the Kubernetes/OpenShift world. Db2, or rather the Db2 MLNs, are installed as stateful sets. There's also another instance of etcd that runs as a stateful set, basically to track information and heartbeat between the different partitions, the different Db2 pods.
C
There are other pods running in the background, some for management, some as a toolbox, but the two most important ones are at the top. As for the OpenShift Container Storage layout and configuration,
C
as I said, we used those NVMe devices that the AWS instances provide via what we call direct-attached storage. We're using the Local Storage Operator to basically hand out these NVMe devices as PVs, and then OpenShift Container Storage, in turn, uses those as the building blocks for the Ceph cluster and provides the storage from there. Because with the nodes that we used we wanted to keep things as cheap as possible, I had to tweak a little bit the CPUs that I gave to other components of OpenShift Container Storage, because we are mainly going to use a little bit of CephFS and mostly RBD block.
C
So you have the resources that I kind of limited. In future versions we might have the ability to control these dynamically, the resources for the different components of OpenShift Container Storage, which will basically mean that we will be able to provide even more performance, or more resources, just to the RBD portion, allowing Db2 to get even more performance from the same layout.
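A rough sketch of the kind of tweak described above is shown below: shifting CPU away from components this workload barely uses by patching the StorageCluster custom resource with the kubernetes Python client. The group, version, plural, and default resource name are the usual OCS 4.x values, but the exact component keys and which fields are tunable vary by release, so treat this purely as an illustration rather than a documented procedure.

```python
from kubernetes import client, config

config.load_kube_config()
crd = client.CustomObjectsApi()

# Hypothetical patch: trim the CPU given to the CephFS metadata server, since this
# workload uses CephFS only lightly and RBD heavily. Values are placeholders.
patch = {
    "spec": {
        "resources": {
            "mds": {"requests": {"cpu": "1"}, "limits": {"cpu": "1"}},
        }
    }
}

crd.patch_namespaced_custom_object(
    group="ocs.openshift.io",
    version="v1",
    namespace="openshift-storage",
    plural="storageclusters",
    name="ocs-storagecluster",
    body=patch,
)
```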
C
Just another quick diagram of how OpenShift Container Storage basically looks. Those at the top are basically our, excuse me, our Db2 pods that are running on some nodes; the red ones are the OpenShift Container Storage pods, and we have several of them. Those many OSDs that you are seeing are basically pods that get attached to a storage device that the Ceph cluster will use. So in our case we had six NVMes, so we had six OSD daemons.
C
There are also monitor pods that keep track of these OSDs and the metadata on them, and provide the information on where to read from and write to, and there are other pods that are also part of OpenShift Container Storage.
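A small sketch of how you could see the pods just described, using the kubernetes Python client: each OSD pod in the openshift-storage namespace maps to one of the six NVMe devices, alongside the monitor pods. The label selectors are the usual Rook/OCS ones, but namespaces and labels can differ slightly between releases, so treat this as an assumption.

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# List the OSD and monitor pods and the nodes they landed on.
for selector in ("app=rook-ceph-osd", "app=rook-ceph-mon"):
    pods = core.list_namespaced_pod("openshift-storage", label_selector=selector)
    print(selector)
    for pod in pods.items:
        print(f"  {pod.metadata.name} on {pod.spec.node_name}")
```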
A
D
All right, let's take a look at performance. When we looked at performance, we basically wanted to answer four basic questions. One is: how well does Db2 on Red Hat OpenShift Container Storage perform in general? How well are we utilizing our system resources?
D
How does the system scale as we increase the size of our workload? And how do we compare to existing cloud-based storage solutions? Those are the four questions we were going after. In order to answer these questions and to test the performance, we utilized a workload called BDI. I would like to give a little background on what this workload represents, to show that it is really relevant as a typical data warehouse application.
D
The schema of this BDI workload follows that of the TPC-DS benchmark specification, a standard industry benchmark. It comes with seven fact tables, like store sales and store returns, catalog sales and catalog returns, web sales and web returns, and an inventory table, in addition to 17 dimension tables where we store information about the customers, the items, the products, and so on. What's interesting is that we can generate this database at any scale factor in order to analyze the performance of Db2.
D
In order to also see how well the system scales, we set up a two-terabyte BDI workload to see what happens if we increase the size of the data by 2x: how does performance change? Now to the query side. The workload is a query-only workload; it comes with 100 queries that were inspired by Cognos-generated SQL for dashboards and reports, and there are basically three types of users represented in this workload. For one, we have the returns
D
dashboard analyst, so that's a person who would investigate the rate of returns and the impact on the bottom line of the business. These users typically run very simple queries that can be answered in sub-seconds; 17 out of our 100 workload queries fall into this category, and we consider them simple queries.
D
The second user that is represented is a sales report analyst. He would generate sales reports to understand the profitability of the retail enterprise.
D
These users run more intermediate queries, with runtimes of up to one minute; 25 out of our hundred queries fall into this category, and we call them intermediate queries. And then we have a third user, the deep-dive analyst, so the data scientist. They use handcrafted deep-dive analysis to answer questions identified by both the returns dashboard analyst and the sales report analyst. These are very complex queries, with several minutes of runtime, and we have five of those very complex queries.
D
Now, there are two different ways we can run the workload, and we utilized both of them during our testing. For one, there's the serial mode, where we have a single user that basically runs through all 100 queries from beginning to end, and we measure how long it takes on this particular system to finish running all 100 queries. And then there's a second mode that we can run, and that's the concurrent or throughput test mode, where we have a given number of users.
D
In our case, we used 16 and 32 concurrent users that submit a number of queries, or an unending stream of queries of a certain category, to the database for a period of, in our case, one hour. So we want to see how much work we can get done within a one-hour time frame.
D
The queries we chose are only intermediate and heavy queries, not the simple ones, because obviously those can skew the throughput picture quite a bit. So we're looking to stress-test the system by allowing 16 or 32 concurrent users to just run intermediate and heavy queries against the database for a period of one hour, and we get a throughput in queries per hour.
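A minimal sketch of that throughput mode (not the actual IBM test harness) might look like the following: N concurrent users each loop over the intermediate and heavy query set against the database for a fixed one-hour window, and the result is queries completed per hour. The connection string and query list are placeholders you would replace with your own.

```python
import time
import threading
import ibm_db

DSN = "DATABASE=bludb;HOSTNAME=db2-host;PORT=50000;UID=user;PWD=secret;"  # placeholder
QUERIES = ["SELECT 1 FROM SYSIBM.SYSDUMMY1"]  # placeholder for the intermediate/heavy set
WINDOW_SECONDS = 3600                         # one-hour test window, as described above
USERS = 16                                    # 16 or 32 concurrent users were used

completed = 0
lock = threading.Lock()

def user_stream(user_id: int) -> None:
    """One simulated user: loop over the query set until the window closes."""
    global completed
    conn = ibm_db.connect(DSN, "", "")
    i = user_id  # stagger starting points so users do not run in lockstep
    deadline = start + WINDOW_SECONDS
    while time.time() < deadline:
        ibm_db.exec_immediate(conn, QUERIES[i % len(QUERIES)])
        i += 1
        with lock:
            completed += 1
    ibm_db.close(conn)

start = time.time()
threads = [threading.Thread(target=user_stream, args=(u,)) for u in range(USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

elapsed_hours = (time.time() - start) / 3600.0
print(f"throughput: {completed / elapsed_hours:.1f} queries per hour")
```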
D
So to summarize what we did for our runs, both on the one-terabyte and the two-terabyte setup: we did a serial warm-up run, on a cold buffer pool, of just the 100 queries. We did a serial three-iteration run, where we run each query three times and measure the total elapsed time. And then we have two concurrent or throughput tests, with 16 heavy users and with 32 heavy users, where again we're using the intermediate and complex queries.
D
All right, you can move to the next slide. One more. That's right. So here on this graph you can see the overall performance summary.
D
How long does it take to run all 100 queries? The warm-up run for one terabyte took 3.8 minutes, and the three-iteration serial run 10.87 minutes. For the two terabyte, the warm-up was 7.7 minutes, so about 2x the time of the one-terabyte workload, which we would expect because we doubled the data, and for the two-terabyte serial three-iteration run we had 22.6 minutes, which is also about two times the one-terabyte three-iteration run. Now to the right graph.
D
What is nice to note here, and which we will talk a little bit more about later: as we increase the data size by a factor of 2x, the performance only goes down by a factor of 1.7x, so we don't even see a 2x drop, which means that the system scales pretty well. All right, let's move to the next slide.
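The numbers quoted above reduce to simple ratios, shown in the small calculation below: the serial runs roughly double when the data doubles, while the multi-user throughput discussed later drops by only about 1.7x to 1.75x.

```python
# Elapsed times (minutes) from the runs described above: (1 TB, 2 TB).
runs = {
    "warm-up (serial, cold buffer pool)": (3.8, 7.7),
    "serial, 3 iterations":               (10.87, 22.6),
}
for name, (t_1tb, t_2tb) in runs.items():
    print(f"{name}: {t_2tb / t_1tb:.2f}x longer at 2 TB")
```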
D
Now, that was just the overall overview. The numbers we just saw, in themselves, on one system, don't mean that much other than the scalability factor. We need to compare to something, and we also need to answer: how well does the system utilize its resources?
D
So we took a look at three things: CPU utilization, disk utilization, as well as memory and network utilization. One of the most important things we often look at is CPU utilization, so we look at how busy our CPUs are during our runs. We did this for all serial and multi-user runs; we're going to focus here on the multi-user runs, because that is more interesting. So in this slide, on the top,
D
the top two graphs represent the one-terabyte CPU utilization, and here you see the Db2 nodes are averaging about 65 percent during the 16-heavy-user run, so there's still room; we're not totally maxing it out. The OCS node CPU utilization is also fairly low. But as we increase our data volume to two terabytes, we see that CPU utilization goes up from that 65 percent to 90 percent for the Db2 nodes, and we also see an increase in CPU utilization on the OCS node side.
D
Now let's take a look at disk utilization. Again, the top two graphs represent the one-terabyte run results, the disk utilization on both the Db2 nodes and the OCS nodes, and on the bottom the two little graphs show the two-terabyte ones. Now, I understand that the picture is fairly small, but that's okay, because we only need to see what has changed between the upper graphs and the lower graphs.
D
So on the top, for the one-terabyte runs, you can see that we have fairly low disk utilization, I would say maybe around 25 percent, with a few spikes here and there. One thing to note is that the one-terabyte setup occupied about 42 percent of the available disk space that we had, and almost all the data fit into the buffer pool.
D
So we would expect not to see too much disk I/O, because most of the data is in the buffer pool and we can read straight from there. But that changes as we move to the two-terabyte setup. When we set up the two-terabyte workload, about 85 percent of the space was occupied and we now only fit about 25 percent of our data into the buffer pool, which means we have lots more disk I/O going on: we need to clean pages from the buffer pool.
D
We need to read new pages into the buffer pool, and that is very much reflected in these graphs. If you compare the bottom-left to the top-left graph, you can see that we many times reach a busyness of around 100 percent; it spikes here and there, but it's definitely much higher. We also see on the right side, on the OCS nodes, that the disk utilization increased. One thing to notice, and I think Sagy mentioned this earlier:
D
we have run this on four Db2 nodes and three OCS nodes. The graph here is a representation of one of those, but it's also good to note that the disk utilization is very similar across all four Db2 nodes and across all three Red Hat OCS nodes. So that's a good sign that there's no imbalance going on. All right, next slide please.
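The buffer-pool behavior described above (the 1 TB data set mostly fitting in the buffer pool, the 2 TB set only about 25 percent) could be checked from Db2 itself. The sketch below compares logical and physical data page reads per buffer pool via the MON_GET_BUFFERPOOL table function using the ibm_db driver; the connection string is a placeholder, and this is only an illustration, not part of the published test.

```python
import ibm_db

conn = ibm_db.connect("DATABASE=bludb;HOSTNAME=db2-host;PORT=50000;UID=user;PWD=secret;", "", "")
sql = """
SELECT bp_name,
       pool_data_l_reads AS logical_reads,
       pool_data_p_reads AS physical_reads
FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2))
"""
stmt = ibm_db.exec_immediate(conn, sql)
row = ibm_db.fetch_assoc(stmt)
while row:
    logical = row["LOGICAL_READS"] or 0
    physical = row["PHYSICAL_READS"] or 0
    # A high hit ratio means reads are served from memory rather than disk.
    hit = 100.0 * (1 - physical / logical) if logical else 0.0
    print(f"{row['BP_NAME']}: hit ratio {hit:.1f}% "
          f"({physical} physical / {logical} logical reads)")
    row = ibm_db.fetch_assoc(stmt)
ibm_db.close(conn)
```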
D
Now let's just take a quick look at memory and network utilization. They overall appear healthy and didn't represent any performance bottleneck. We have memory available to the OCS nodes as well as the Db2 nodes, and again they don't represent any problems.
D
Overall, the RAM is utilized as expected, and again we saw that memory and network utilization is very similar across all four Db2 nodes and all three OCS nodes, which shows that we don't have any skew going on; we really have a good balance in how everything is working. Again, I understand that the picture is very small here, but I'll walk you through it. So, for the graph in the middle:
D
the blue bars show the run results for the one-terabyte setup and the purple ones the run results for the two-terabyte runs, and you can see that as we increase, or double, the amount of data, the throughput reduction is only 1.75x, which suggests good scalability. The reason for that is that during the one-terabyte run we're not pushing our resources to the max; we saw that CPU utilization was around 60 percent, and as we increase our data we push it further, to 90 to 100 percent.
D
We see on the disk busyness that we utilize our system resources really well; we are often reaching 90 to 100 percent disk utilization, and our OCS nodes also show this increase. This is pretty significant. We have been testing with other systems in the past and have not seen this great a scalability, so this is really good news for us.
D
Another thing that I want to mention here is that during the four-day test window that we had on the system, the system performed really well; we had no unexpected outages, so the resiliency seemed very good, and that's a very good thing. Now, in order to also evaluate how the performance of Db2 on Red Hat OCS compares to existing cloud-based storage offerings, we have run the same set of tests on different configurations.
D
The one pictured here is the one that comes closest in terms of number of MLNs, number of CPUs, and amount of RAM in comparison to the Red Hat OpenShift Container Storage setup. We measured the same type of tests, the one-terabyte and two-terabyte BDI workloads. Pictured here is the one-terabyte output. We ran the warm-up and serial runs as well as the throughput runs; pictured here are the throughput runs, which again are more relevant, and in the end we normalized them.
D
The cloud-based storage solution had about 50 percent more RAM, so we ended up normalizing the numbers: what would it look like if we had about the same amount of RAM? The number of cores, or CPUs, was the same to start with. When we do that, we can see that we are pretty much on par in those two cases, which suggests the performance is as expected; it's doing really well.
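The sketch below shows one simple way such a normalization could be done, scaling the comparison system's measured throughput linearly by the memory difference before comparing. This is only an illustration of the idea described above, not necessarily the exact adjustment used in the white paper, and the throughput figure is a placeholder, not a published result.

```python
# Normalize the cloud-storage configuration's throughput to the OCS configuration's RAM.
ocs_ram_gb = 460             # approximate RAM available to the four MLNs on OCS
cloud_ram_gb = 690           # placeholder: about 50% more RAM on the comparison config
cloud_throughput_qph = 1000  # placeholder measured throughput (queries/hour)

# Simple linear normalization by memory; CPU counts were already equal.
normalized = cloud_throughput_qph * (ocs_ram_gb / cloud_ram_gb)
print(f"normalized cloud throughput: {normalized:.0f} queries/hour")
```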
C
This is just the beginning of a journey between the OpenShift Container Storage platform and Db2. All this data is already in a published white paper, and the next white papers are going to concentrate on all sorts of failover scenarios, which, from the Db2 customer's perspective, are super important.
C
We're also going to do some bare-metal performance testing and use not only the data warehouse but OLTP and OLAP workloads to test everything, and then also IBM Cloud Pak for Data version 3 will have support for OpenShift Container Storage. I think that's about it. These are the people that helped us, besides me and Jana: Manny Luik, Rishi, and Peter, and we want to thank them as well.
C
Yes, to my understanding it does need its own etcd. It might in the future not have to use that, but right now it does, and it is super lightweight.
C
So I don't know if AWS considers those storage instances as part of EBS; I don't think so. Think of it as an instance that has two storage devices directly attached to it, and OCS, with Ceph, basically creates a cluster from this and manages the storage, protects the data, replicates the data, and all of that.
B
A
C
You can; this actually goes more to the requirements for Db2, and right now the requirements are for a Db2 pod to consume all the resources on a particular OpenShift node minus all the resources that OpenShift needs. So this is a Db2 requirement. I'm not a Db2 expert on that, and I don't know if they are going to change it.
C
I think it's more a reflection of the migration from bare metal, where these Db2 processes just want to consume as many resources as possible on each server; moving into the OpenShift world, or even to the cloud, it's kind of continuing with the same line of thought right now.
C
So technically, for sure you can do this, but right now I think the Db2 requirement is to have the Db2 pod consume all the resources on each node.
A
All right, another question: so Db2, you're testing it on OpenShift; does this also run in IBM Cloud Pak for Data?
B
Yeah, maybe for those that aren't familiar with IBM Cloud Pak for Data, it's one of the ways that you can purchase services for Db2. So with the Cloud Pak,
B
you have a single bundle where you purchase that one bundle and then you can have multiple IBM applications, and Db2 is one of them, both Db2 OLTP and Db2 Warehouse, and that can all be run within that license, and they can all run within the OpenShift environment. And then, in addition, there's the IBM Storage Suite for IBM Cloud Paks that gives you the ability to deploy all your data services.
B
It does include Red Hat OpenShift Container Storage, as well as Red Hat Ceph Storage, for any of the Cloud Pak environments. So that's an interesting way of purchasing overall IBM services for your environment.
C
Yeah, so this was OCP and OCS 4.3. The next white paper, the one that's going to come out on failovers, will be with 4.4.
C
We might do 4.5; it depends on the OCS side, the OpenShift Container Storage side, but it's definitely going to be 4.4 or something higher. Okay.
A
C
Of course it can be; you can run the OpenShift cluster either on bare metal or on some on-prem virtualized environment, and you can do literally the same setup in terms of OpenShift Container Storage: provide your bare-metal devices to the OpenShift Container Storage pods, and you're only going to get better performance.
A
All right: I haven't run OpenShift Container Storage before; do I need to go to training, or send my admins to training? What's the barrier to entry here?
B
Well, I think that's one of the great things about deploying in the OpenShift environment. The operators really streamline the day-one and day-two operations, and, as you saw in the quote from Piotr, getting the environment up and running and performing to scale was very simple, even for a team with no prior experience. Of course, Red Hat does offer services and expertise if you do need help, but getting up and running on day
B
one is pretty fast and easy. And maybe, I don't know if Sagy or Jana have something to add to that, since they ran the environment.
C
Yeah, well, I just want to say that the quote that you showed from Piotr is actually from when the Db2 Cloud Pak for Data team was doing their own initial testing, and there was actually no involvement from Red Hat.
C
At that point they basically installed their own OpenShift cluster and installed OCS on their own, and that's actually where the quote is coming from. As for the environment that we tested on, obviously I know what I'm doing, so I know how to install OCS, but the quote actually comes from a completely separate, Db2-only team doing their own testing.
B
D
Yeah, I had the same experience as what you were describing. I had not been on Red Hat OCS before, but it was really easy to just get up and running. I mean, I probably asked a few questions, how do you do this and that, but it was very simple overall, and we only had four days on the system, and I think we got more done than what we expected we would be able to do in that time.
A
C
Specifically for this, for a Db2 data warehouse environment, I mean from an OCS perspective, we are, and correct me if I'm wrong, Michael, going to come up with some kind of a sizing and configuration guide. And of course, when it comes to storage, things change and vary from cloud to cloud, with each cloud provider and their own specific storage capabilities, and then to on-prem, whether it's bare metal or virtualized.
C
So I do think we're going to come up with a sizing guide to help people understand that.
B
Yes, I think that would be good for us to take a look at. We currently have an internal sizing guide that's kind of based on capacity measures and how to configure the solution across different clusters, but adding that perspective of what you need for performance, what your performance expectations are, how you should configure it, what disks you should use, and so on, would be helpful.
B
I know that there's a knowledge base article out there right now about deploying IBM Db2 on OpenShift. I don't know that it really gets into the storage perspective; perhaps the white paper will help with that, but I agree with Sagy. We should probably look at adding some more information into our OCS documentation as it pertains to sizing for performance.
A
B
Yeah, I just wanted to mention that there is a panel discussion with some of the IBM and Red Hat executives that's scheduled for the 28th of July. Unfortunately I don't have a link to that yet, but I believe it's probably going to be posted on ibm.com events. In any case, if there are questions around that, you can get back in touch with any of us and we can give you some additional information.