Description
During this episode we're going to discuss two of the most common questions we receive: "How large should my nodes be?" and "How many pods can I fit in my cluster?" We'll look at how to determine node sizes, how node size affects the architecture, and what considerations need to be made for cluster sizing during this can't-miss episode!
A: Good morning, good afternoon, good evening, and welcome to another episode of the OpenShift Administrator Office Hours. I'm Chris Short, executive producer of OpenShift.tv and principal technical marketing manager here at Red Hat, joined by my teammate Andrew Sullivan. Welcome, Andrew. You just got off a customer call, how's it going?
B: I am, I am happy to be here, as always. Yeah, it's been a crowded morning.
B: So, apologies to all of the people who are watching and listening for my tardiness. I was on a call that ran, unfortunately, a couple of minutes late, so I do appreciate you sticking around and waiting for us to start. It means a lot to me. It makes me happy on this cold, rainy, maybe snowy Wednesday here in...
A: Here in North Carolina it is actually very sunny and cold today. Like, I woke up and it was 16 degrees Fahrenheit, which is several degrees below zero Celsius. So yeah, but it has now warmed up to a beautiful negative six Celsius, which is 21 Fahrenheit. Yeah, good stuff.
B: It's already causing, you know, massive amounts of panic. I haven't been out of the house in like a month and a half. Wow. Thank you, COVID. Aside from, you know, taking the dog outside and going for a walk and stuff like that. So I can only imagine that the grocery stores right now are completely out of bread and milk, because, you know, milk sandwiches.
B: So, all right. Due to my tardiness, I won't waste any more time with the small talk that you and I can do at any time. You know, right, we do work on the same team after all. Yes, indeed. So I don't think that there were any follow-ups from last week. I don't recall any; I know there probably was something that I am forgetting. I'm trying to get better about...
B: ...notes to myself to actually cover those things. But yeah, it was funny: I did an ask-me-anything session for our field yesterday and came away with almost a dozen questions where it was, wow, I don't know the answer to that, I'll have to follow up, so let me get back to you on that one.
B: Which is also interesting: Christian, who is supposed to be our guest today, is in today's session, today's version of the same thing I did yesterday, so we'll see how it goes for him. And I did just see a Slack pop up right underneath your name here, from Christian, that said "Andrew was right."
B: So, new things this week. I specifically wanted to call this out in case, like me, you filter emails from Red Hat, which I know sounds strange as an employee, but, you know, we have a bunch of distro lists and one of the noisiest ones is the errata and security one, because every time we release an errata I get an email about it, even if it's a product I have no interest in. So, very importantly, we announced a CVE this year. Yes. So I'm going to, you know, I...
B: Yeah, so I know, you know, this is the OpenShift Administrator Office Hour, sure, and at first glance you think, well, sudo? OpenShift? CoreOS? So, CoreOS does have sudo in it. And yes, even though everything is deployed as a container, we still rely on the security mechanisms of the underlying OS to be intact, right? If you deploy a FIPS cluster, it relies on, for example, those FIPS encryption libraries at the OS level to provide that functionality.
B: I wanted to make sure that our audience here is aware of this and is paying attention. Thank you. So, a couple of things: make sure that your container images are up to date, right? If you're using UBI, if you're using a RHEL image, you know, make sure that you update those regularly, and as continuously as possible.
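To make that concrete, here is a minimal sketch of rebuilding an image on the latest UBI base so the rebuild picks up the patched sudo; the image name and tag are placeholders, not anything from the episode:

```
# Pull the latest UBI 8 base so the rebuild starts from current packages
podman pull registry.access.redhat.com/ubi8/ubi:latest

# Rebuild on top of it; 'dnf -y update' pulls in released errata, sudo included
podman build -t myapp:patched -f - . <<'EOF'
FROM registry.access.redhat.com/ubi8/ubi:latest
RUN dnf -y update && dnf clean all
EOF
```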
B: I strongly suspect that we will see an update for CoreOS, or an update for OpenShift which will include an update for CoreOS, to address this, but I do not know the time frame for that. I...
A: I do not either. I know that there's a 4.6.13 release in the fast channel right now; I don't know if that includes the sudo patch or...
B
Not
it,
it
does
not
so
as
soon
as
I
can
remember,.
B: You go to the Cluster Settings tab inside of your cluster, so you can click on the fast channel and...
B: So, typically we release these updates on a bi-weekly cadence, every other week, every two weeks. That would mean that 4.6.14 would ship, Andrew's guess, probably a week from this Monday, a week and a half from today, right. And there are also nightly releases; I have not looked at those to see if it's been addressed inside of there, and I'm not sure anybody would want to run a nightly release anyway.
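If you would rather check from a terminal than the Cluster Settings page, the equivalent stock commands are:

```
# Show the cluster's current version and update status
oc get clusterversion

# Show the channel and the updates currently available in it
oc adm upgrade

# Example of switching the cluster to the 4.6 fast channel
oc patch clusterversion version --type merge -p '{"spec":{"channel":"fast-4.6"}}'
```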
B: There are some, and I scrolled past it up here, for RHEL nodes there are some workarounds and mitigating factors that you can apply. I haven't evaluated these to determine whether or not they are suitable for CoreOS, but at a minimum, please be aware, please keep an eye on the information and, of course, keep an eye out for an update to OpenShift, which will include an update to CoreOS, to address the security vulnerability.
A: Well, thank you for covering that. The biggest thing right now is: if you have anything public-facing with sudo in it, make sure it is patched and updated, including container images. So please make sure you do that. Yes, post-haste, as they would say. Yes.
B: Very, very important. So, cool. I've got two other things that have popped up, strangely, more than once in the last week, and these are very random things sometimes. So Monday there was an internal thread that was basically asking: how does a CoreOS node get its host name?
B: The DHCP host was handing out host names, so when it DHCP'd, does...
B: Yeah, so the core of it... and actually, I forgot to grab the link to the code where I had this up. Let me stop sharing and see if I can grab that, because it's on the same...
B: Yeah, so the core of what we're talking about here is that there's a service inside of CoreOS that runs on startup and basically uses hostnamectl to set the host name, nice, and it pulls that host name from /proc/sys/kernel or wherever inside of there, yeah. So the question is: how does that value get assigned, or how is that value assigned?
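On a node, you can inspect what that startup service ended up setting with a couple of generic commands:

```
# Static, transient, and pretty host names as systemd sees them
hostnamectl status

# The kernel's current host name, the value being discussed here
cat /proc/sys/kernel/hostname
```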
B: And there's a bit of a hierarchy here. So if I'm doing IPI and I use IPI to deploy new virtual machines via a machine set, what's going to happen is the machine set and machine API will create a new VM, you know, hostname cluster-randomness-worker-12, right. That is the name of the VM in vCenter, and when the VM powers on, VMware Tools is used to determine that name and feed it the first host name that it uses. Interesting.
B: So, for example, if I am doing DHCP and not doing dynamic DNS updates, so CoreOS...
That name will take hold, right, so that will override it. So why is this important? A couple of different reasons. The biggest one, particularly with IPI, is that it determines which CSRs to auto-approve based on that host name.
B: So when the node is created, right, machine API creates a node and gives it the host name; it comes up, pulls down its Ignition config, and goes through the initial configuration stuff. Then, when it comes time to join the cluster, there's an operator that says: there's a CSR for a node named xyz, and I see a node trying to join, or rather, I created a node named xyz; those two match, so I'm going to approve the CSR. Nice. If they don't match, then it's going to say: I don't know who this is, I'm not approving that.
B: You know, you can go in and manually approve those CSRs, right. That would bring the node in; that would get it up and running. The long-term ramifications of that are not known to me. You know, every time you do a certificate refresh, I don't know if it's going to require manual approval, right. Remember, OpenShift will automatically renew certificates for nodes that it knows; because that one has a mismatch, you know, does that apply? I don't know.
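The manual approval being described is the standard CSR workflow; for example:

```
# List certificate signing requests; ones with no decision show as Pending
oc get csr

# Approve a specific pending CSR by name (csr-xxxxx is a placeholder)
oc adm certificate approve csr-xxxxx
```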
B: And, of course, node auto scaling, where the whole point is that I want it to do it by itself, I don't want to have to go in and tell it to do it, would now require manual intervention, right. If at 3 a.m., when everything's going haywire and it's trying to scale up from one node to 400 nodes, now you've got to get out of bed and come into the office and approve all of those things, instead of letting OpenShift do its job.
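For context, node autoscaling in OpenShift is configured with a ClusterAutoscaler plus a MachineAutoscaler per machine set; a minimal sketch, where the machine set name and the limits are placeholder values:

```
# Create a cluster-wide autoscaler plus a per-machineset autoscaler
cat <<'EOF' | oc apply -f -
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec:
  resourceLimits:
    maxNodesTotal: 20
---
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-us-east-1a
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 12
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: worker-us-east-1a
EOF
```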
B: All right, so I will paste this in here, and find our Twitch here and paste the link in there as well. So this is, as we can see here, or maybe you can see, this is probably a little small: I'm in the machine-config-operator GitHub repo, specifically the 4.6 release, and I'm underneath the templates common base units directory.
B: So the way that this works is that the Machine Config Operator includes a number of files, a number of things, based on the infrastructure that you're using and the deployment type that you're using. "Common" and "base" effectively mean that it's going to be included in every node, regardless of the infrastructure. So, real quickly, if we go to, for example, worker and select, you know, I'll do the 00 one: if I'm deploying to vSphere, it's going to include these.
B: So if I'm doing a worker deployment to vSphere, right, it'll include this mDNS piece, and basically it determines: if this is an IPI deployment, then it'll output this data as a part of that. So this is kind of how we do specific actions during the install process, or during the node stand-up process, based off of the infrastructure and the other things that are happening there.
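You can see the result of that template selection on a running cluster with the stock commands:

```
# List the MachineConfigs the operator assembled for each pool
oc get machineconfig

# Inspect the rendered worker config to see which units and files made it in
# (rendered-worker-<hash> is a placeholder for the actual rendered name)
oc get machineconfig rendered-worker-<hash> -o yaml
```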
B: So the question is: how does this get populated? Whatever populates that, or rather the last thing to populate it, becomes what ultimately sets the host name that is returned back for this particular host. So again, that could be something early on through VMware Tools setting the host name, or it could be DHCP.
A: First question: do we have any plans to develop a PowerShell module to administer OpenShift? You're a PowerShell guy, yeah?
B: I wish. So, I am a PowerShell guy. I've been a PowerShell guy since the very first versions of PowerCLI with VMware; I was one of the co-creators of NetApp's PowerShell cmdlets and one of the advocates over there for them. So I do, I am a PowerShell guy, and unfortunately, as far as I know, there is no intention of doing that. I have done a little bit of research, you know, in all my copious amounts of spare time, of...
B: ...could I take, you know, the Kubernetes API, the OpenShift API, and effectively auto-generate some PowerShell cmdlets, PowerShell modules, based off of that? But I haven't actually tested or tried that. I think there are some community-based Kubernetes modules for PowerShell; again, I haven't had the time or the opportunity to check those out and test those.
A: ...probably not going to happen, but entirely possible to make.
B: Yeah, I'll poke around, I'll see if I can poke around and find a Jira issue on that.
B: We did finish, so yes and no. Update manager was where we found that it was not fully released yet, but for OLM and the rest of it, all of that is functional and should work as expected, as far as I know, and I tested it in my lab. I think we do have a livestream with Christian, and maybe myself, where we spent the entire hour or two hours covering that.
B: So I'll dig that up as well. You'll notice I'm taking notes so I don't forget to include these things in the show notes. If you didn't see, last Friday I think we published the show notes blog post on the openshift.com blog, so any of the links and other things that we used last week you can find inside of those blog posts, and this week will be the same. I don't know, I haven't talked with Alex.
A: No, it's a very cool thing you're doing, and I greatly appreciate it, as it probably helps more people, and as we do them more and more it will help even more people as we go. So yeah, that'll be super cool, so look for follow-up in the OpenShift blog, which I just linked to. Next question, and I'm gonna try and say this one, because Rapscallion Reeves is just something that rolls right off the tongue: is there a way to control which node gets deployed onto which oVirt host?
B: Yeah, so real quick: I see it was Killer Goalie who was asking about disconnected OLM, yeah? Sorry, if there are things that you would like to see or that are missing, please reach out, just let me know, andrew.sullivan@redhat.com, and we'll be sure to specifically cover that, you know, next week, if you let me know.
B: So, controlling node placement, right. This will apply, as far as I know, to all of the IPI deployments, and it is essentially this: there is no specific mechanism to control where in the cluster a particular virtual machine lands, whether it's RHV, whether it's vSphere, whether it's OpenStack, etc.
B: So you can go in after the fact and apply those rules: you know, create an affinity group for the hosts and an affinity group for the virtual machines, and it'll manage them that way. Actually, now that I think about it, I wonder if you could assign a group to the template that it uses and...
B: Okay, cool. So this is my Red Hat Virtualization Manager environment. You can see I just went to the cluster, my cluster name here, and I'm looking at affinity groups. So I can create an affinity group and assign VMs to hosts, and I can create these rules at the same time on virtual machines, and I'm just going to edit one of these virtual machines.
B: With OpenShift, or excuse me, Red Hat Virtualization 4.4, I can specify affinity groups and labels directly in the machine definition. So what I'm thinking out loud, having not tested this at all, is: I wonder if you could, for the template that's used with IPI, basically specify this information so that any VMs created from it automatically inherit it. For master nodes, because they're not dynamically provisioned, you would effectively...
B: ...go in and assign this information as a day-two operation through RHV Manager to assign them to the specific hosts, and then for the worker nodes, have each worker node machine set's template specify whatever that affinity information is. If you happen to try that out, please let me know whether or not it works.
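As a starting point for anyone who does try it, this is roughly where that information would live. This is untested, as said above, and the affinityGroupsNames field is an assumption to verify against the oVirt provider spec in your versions:

```
# Untested sketch: merge an assumed 'affinityGroupsNames' field into the
# oVirt providerSpec of a worker MachineSet (all names here are placeholders)
oc -n openshift-machine-api patch machineset mycluster-worker-0 --type merge \
  -p '{"spec":{"template":{"spec":{"providerSpec":{"value":{"affinityGroupsNames":["worker-anti-affinity"]}}}}}}'
```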
A: What is the way to calculate sizing based on knowing how many pods we are creating? And I mentioned to them, right, like, yeah, you know, it's very dependent upon the needs of those pods. But if you know, say, that you have 500 pods, is there like a magic number for the number of worker nodes?
B: ...which resulted in me also doing a presentation for IBM Fast Start around the same topic. Oh, lucky you. So yeah, it's a fireside chat; I think I was asking you about putting a fireplace in behind me, you know, like you've got...
B: So it really comes down to, okay, it comes down to a couple of different things, and by "a couple" I mean it varies based off of what you're doing. So first I want to explain two terms: characterized and uncharacterized.
B: I haven't read this page in enough detail or asked that question, so we may just need to double-check on that. But we want to look at, importantly, the maximum number of pods per node, and then whether or not there are any size restrictions or limits. So, for example, continuing on down the page here, you can see the AWS instance sizes that we test with, right, things like how much CPU, how much RAM, so on and so forth. This is not the list of the only supported instance types.
B: These are just the ones that we test with. So essentially, what I'm trying to discover here is: is there anything that would artificially limit or change the number of nodes or the number of pods that I have in my cluster, right? If I've got a pod that needs half a terabyte of RAM, right, needs half a terabyte of RAM, that can pretty dramatically change how I size my nodes and how I interact with my cluster.
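On the earlier point about the maximum number of pods per node: that per-node limit can be raised with a KubeletConfig. A minimal sketch, assuming a machine config pool already labeled custom-kubelet: large-pods (the label and the 500 value are examples, not from the episode):

```
# Raise the kubelet's pod limit on nodes in a labeled machine config pool
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: 500
EOF
```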
B: So let's assume, in my first example, they're one CPU and two gigabytes of RAM each. It's pretty straightforward, right: 500 pods easily fits within, you know, a reasonable node size, even though we wouldn't want to have just one node, for availability purposes, etc. So now we can do kind of a mental exercise: what happens if I have two nodes? Effectively, I will have two nodes, each one being equally sized, so 256 CPUs and 500 gigabytes of RAM, from an application perspective.
B: So I now really have to have two nodes, and each one is capable of hosting the entire workload at any one point in time. So let's expand it out: three nodes, four nodes, five nodes, eight nodes, ten nodes, twelve nodes. Effectively, what you're trying to do here is figure out what's the right balance of distributing the workload across the nodes in your cluster for maximum performance, maximum availability, and maximum flexibility.
B: Flexibility here is an interesting one, and one that is quite subjective. So flexibility here could be: well, I'm only ever going to take one node down for updates at a time, so the other nodes only need to have enough spare capacity, extra capacity, to accommodate that. It could also be failure domain.
B: My failure domain: maybe I'm running in a physical data center, you know, on-premises. Maybe it's running in, I'll pick on RHV, right: I've got four massive RHV nodes, each one is, I don't know, eight terabytes of RAM and 500 CPUs, and I could easily fit, you know, 30 of my OpenShift nodes onto those four hosts.
B: Okay, but what's the failure domain? Because now, if I have one physical node that has, you know, 10 virtual nodes, I haven't solved that problem. I have to be able to accommodate that amount of infrastructure failing at any point in time. So we have to be aware of those things; we have to work with our underlying infrastructure, our underlying service provider...
B: ...if you will, to understand what's happening there and be able to accommodate that at the infrastructure level. We also, from an application perspective, want to be aware of what those failure domains are. If the application is architected so that there's a single pod that is a single point of failure, that could be bad, right, and then none of this planning around failure domains, et cetera, is going to be particularly useful. So basically, in a nutshell, over the last six minutes we've talked about workload sizing, but workload...
B: So node sizing also has to accommodate not just the workload but the other things that are happening. So what are the other things that are happening? The kubelet itself, you know, the other kinds of services, so think things like CSI.
B: So if we scroll down here, and I'll post this link as soon as I make sure it's the right one, yeah, here: platform tested cluster maximums. This is the link that I just posted a minute ago. As of 4.6, half of a CPU, 500 millicores, is reserved for the system, compared to 3.11 and previous versions.
B: So if you expect those system-level, right, OpenShift functions or services to consume more than half of a CPU, you need to take that into account.
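You can see exactly what a node holds back by comparing its capacity and allocatable resources; a stock command:

```
# Compare a node's raw capacity to what pods may actually use; the gap is
# what's reserved for the system and kubelet (<node-name> is a placeholder)
oc describe node <node-name> | grep -A 6 -E 'Capacity|Allocatable'
```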
B: In particular, metrics, Prometheus, can be a huge consumer of CPU and memory on the host. Now, when does that happen? The more pods, the more containers we have running on that host, the more effort, CPU, and memory Prometheus is going to have to put in to collect all of those metrics and then serve them back up to the metrics service.
B: So it becomes a little bit of a self-fulfilling thing, or what we used to call a "traffic trombone," if you've ever heard that term on the networking side: the more pods I put on the host, the more non-application resources I need on the node to accommodate the other things that are happening.
B: Don't discount things like network and storage traffic as well, especially if you're using iSCSI PVCs and other things that are known to consume CPU resources at high throughput. Say I've got 40 gig going into my servers, and I've got all of these pods with a bunch of iSCSI PVCs, and, you know, they're pushing 30 gigabits of traffic: that's a lot of CPU going into processing those packets and doing the things it needs to do.
B: We just need to be aware of that, plan for that, and accommodate all of that type of traffic. And of course, it's okay if you don't get it right the first time; there's nothing wrong with that. This is the beauty of OpenShift, the beauty of Kubernetes, right: we can add nodes at any point in time, so we can kind of temporarily scale out with bigger nodes and then go back and remove the smaller nodes.
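That swap is just machine set arithmetic; a sketch, with placeholder machine set and node names:

```
# Scale a machine set with larger instances up...
oc -n openshift-machine-api scale machineset bigger-workers --replicas=3

# ...then drain and scale the old, smaller set away
oc adm drain small-worker-node-1 --ignore-daemonsets
oc -n openshift-machine-api scale machineset smaller-workers --replicas=0
```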
A: So, all right, cool. I tried to explain to Islam how to do the math, essentially, for his worker nodes, right. So he has two worker nodes and wants to put the workloads on there. Those worker nodes have a baseline of system requirements, right, and then your workloads have their requirements.
B: I just wanted to add, on that excess, you said, you know, 20 percent of extra overhead: that number is dependent on, in Andrew's opinion, two things. One, burst capacity for things like node failure, right, and two, burst capacity for things like, you know, the Slashdot effect or the Reddit effect, or something like a Super Bowl ad.
B: You just had this huge burst of traffic, and how can I accommodate that? How to actually determine that number is based on, again, my perspective, your ability to react to that scenario, right. What do I mean by that? If you can react, right, you know, auto scaling will take effect, and if it takes four minutes for me to get a new node up and operational and joined to the cluster and ready to accept workload, then you need enough capacity to accommodate four minutes of burst, right.
B: Well, if I'm growing at x bytes and it takes me six months to get new hard drives in, but I'm going to have an issue in four days based on my alert threshold, that's not going to work, right. I have to have this balance of how quickly I can add capacity, and then work backwards from there to determine what my alarm threshold should be.
B: To be clear, that's the bare metal installation method, not just, right, physical servers. Yeah, so the minimums for compact clusters are effectively the combination of control plane and worker node minimums. The bare minimum for a control plane node is 4 CPUs and 16 gigabytes of RAM, and that is, if we go to Installing and we go to, excuse me, Bare Metal...
B: ...you know, the storage drive for those compact nodes. But note that that doesn't include any workload, so however much application capacity you need, add it on top of that. Now, that's two CPUs and eight gigabytes of RAM here; I think it's safe to always build on top of that, because you're going to have the metrics service, you're going to have those other things that are deployed inside of there that are consuming resources.
A: Cool, makes sense, and thank you for that. JP Dave says he got his 4.6.9 problems figured out. It was a CSR issue; it looks like one of the nodes wasn't joined or didn't have its certificates approved. So that's good.
A: Yes, yes, yeah. So JBJ says, when in doubt, do an oc get csr. Yeah, good point; that is a very common troubleshooting step that I use: is everything issued, right?
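For that "when in doubt" step, a commonly used variant lists only the pending CSRs and approves them in one pass:

```
# Approve every CSR that has no status yet (i.e., is still pending)
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
  | xargs --no-run-if-empty oc adm certificate approve
```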
A: So, the three-node compact cluster: you mentioned the bare metal installation.
B: So, this is... Andrew has issues that I know my team is well aware of; we've raised these. We overload terms when we talk about this.
B: So IPI, installer-provisioned infrastructure, is also called full-stack automation; UPI is also called user-provisioned infrastructure or pre-existing infrastructure. Those are fine, those are great, right; we understand that there are those installation methods for all the various platforms. Bare metal is where it gets confusing.
B: So, when you see me, especially in written communication, I will refer to what the documentation calls, and let me scroll up here, what the installation docs call a bare metal install, including basically all of these, this entire subset of installing on bare metal. I call these non-integrated installs.
B: So if you're deploying to vSphere and you use the bare metal method, or platform: none in the install-config, then essentially it's saying: I don't know that this is vSphere, I don't care that it's vSphere, I have no integration with vSphere whatsoever. So you can't use things like the dynamic storage provisioner, right; you can't use things like NSX, right; all that other stuff. It is infrastructure-agnostic, right.
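For reference, that choice is made in install-config.yaml. A minimal skeleton of the relevant stanza, with placeholder names (a real file needs more fields, such as the pull secret and networking):

```
# Minimal install-config.yaml skeleton for a non-integrated install;
# 'platform: none' tells the installer to assume nothing about the infrastructure
cat <<'EOF' > install-config.yaml
apiVersion: v1
baseDomain: example.com
metadata:
  name: mycluster
platform:
  none: {}
EOF
```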
B: So this is the installation method that you want to use with physical servers that are not using bare metal IPI, right. It is also the installation method that you want to use when you are doing a mixed-infrastructure deployment: some nodes are virtual machines in vSphere, some nodes are physical servers, right. I can't mix those infrastructure types otherwise, and that is a Kubernetes limitation, not an OpenShift limitation, right. And hilariously, because this comes up, I have this GitHub issue bookmarked. I love it.
B: I just posted the GitHub issue into the chat. So that's the GitHub issue for Kubernetes that prevents us from mixing infrastructure types in a cluster.
A: Damn, it's still open too, yeah, and it has been for a while. Yeah: lifecycle frozen, milestone 1.19. Okay, and 1.21 is being worked on right now. All right, so share this one out and get some more eyeballs on it.
B: All right, so I see a question: is there virtual RAM, do we support swap space? So this is a yes and a no. Generally, right, Kubernetes always recommends that you disable swap space; if you've ever installed a cluster with, like, kubeadm or something like that, it'll say swap space is not disabled, and it will force you to disable it before you continue, right. So why is this important? Because, yes, technically you can turn on swap; you can turn on all of those other things.
B: OpenShift Virtualization has brought this to light, you know: do I want to turn on things like kernel same-page merging, right, KSM, to help consolidate and get more overcommitment of those resources?
B: So this is a choice that you have to make, but I can tell you why it is strongly discouraged in the Kubernetes community, and that's because Kubernetes doesn't know when those resources are being overcommitted. So, for example, my host has 16 gigabytes of RAM and it's struggling, it's hurting, right, it's swapping.
B: It's sending memory pages to swap and application performance is just suffering, but with it using that swap space, Kubernetes isn't aware of it, so it just looks at it and says: oh, your memory pressure looks fine, you're at, like, 80 or 85 percent memory pressure. So it'll keep assigning workloads to it, which just exacerbates the situation, right; it keeps getting new pods. It's masking this underlying resource contention issue, right. So you want to be very careful any time you use something like swap or other resource overcommitment technologies on your nodes.
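Checking for this on a node is quick; generic Linux commands:

```
# Verify swap is off on the node; both commands should show no swap in use
free -h
swapon --show

# Disable it immediately if found (persist by removing swap from /etc/fstab)
swapoff -a
```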
B: Basically, if the hypervisor, you know, you've got your hypervisor node that's running your Kubernetes nodes, and the hypervisor is way overcommitted: it's swapping, or it's having to, you know... vSphere has CPU ready time, right, when a VM is waiting for time and can't get time on the CPU. So the vSphere host, or the hypervisor, is really hurting, and Kubernetes doesn't know that, right. So it's saying: I need to auto scale, I need to auto scale.
B: I need to auto scale, because the application, you know, maybe whatever metrics you've got set up on the application, is saying everything's slow, I need to scale up, I need to make my application perform better. So you end up with this kind of whirlpool, right, this circling drain: the application needs to scale up, it's adding more resources, the underlying hypervisors can't accommodate those, it's just making everything worse, and it just leads to disaster.
B: The recommendation I always make is: if you must use overcommitment, make that overcommitment happen as close as possible to the application. What that means is, whichever scheduler is closest to the application, in this instance Kubernetes, OpenShift, let it handle that overcommitment; don't do it at both the hypervisor and Kubernetes, don't do it at multiple layers, so on and so forth, right.
A: Cool. So we are approaching the top of the hour. Let's see, so somebody dropped "OCP vSphere UPI automation": the project is easier to use than the IPI way, is what our friend...
B: Yeah, there are scenarios. So, for example, OpenShift, I think it was 4.3 or 4.4, we introduced automatic certificate renewal. If you used an early 4.x version, you remember: you'd deploy the cluster, and then within 24 hours it would rotate the certificate, and if you shut down the cluster within that 24-hour period, before it rotated the certificate, and then turned it back on after that period of time...
B: ...everything would just be in chaos, and it was this long, complicated process of going in and reissuing and reapproving certificates and getting everything back up and running. So we fixed that; it now does automatic certificate rotation and approval and all those other things, except when the cluster's been down for a very long time, I'm talking weeks. Sometimes you will need to go in and just reapprove those CSRs, even with IPI, yeah. And reapproving those CSRs, basically to get the nodes joined again, resets that whole process, and then it'll do it itself.
A: Yeah, it used to be a bear, and now it's a little bit easier, you're right. So, there's a lot of chat, so sorry if I missed something. There's one question: did we answer the one about scale up versus scale out?
A: We didn't answer that yet, right. So, okay: would you say it's better to scale up worker nodes or scale out, vertically versus horizontally? Scale up makes more sense to me, but scale out means I don't have any configuration changes, right; it's just adding another node, for example. And, you know, my answer to that was kind of like, it really depends on what you're...
A: ...right, like, what is faster in your instance, right? If you're on AWS, then changing, you know, like, RAM is pretty, you know, interesting, yeah, and just quick sometimes, right. But your system has to be able to acknowledge...
B: ...you know, node failure, for that burstiness, or that burst of new workload in the event of a node failure. Awesome. So initially, maybe it makes sense to do scale out instead of scale up. On the other hand, you know, if your application has fundamentally changed, you know, hey, we thought the largest pods were gonna have to accommodate...
B: ...were, you know, and I know they sound an awful lot like VM sizes, because sometimes they are an awful lot like VM sizes, you know, two CPUs and eight gigabytes of RAM. But really, you know, after running for a couple of months, the app guys figured out that we really need 16 gigabytes of RAM. That can dramatically change your strategy of, hey, I still want to keep x number of instances of the application per node; so now I need to scale up, scale vertically, to keep my ratio in check.
B: So that's one thing that I have not discussed at all, and this is a concept that was introduced to me in the storage world: it's called stranded resources. Wow. So with storage, we hear of stranded resources when I have an IOPS-to-gigabytes mismatch, right. Spinning media is a really good example: I can have 10 terabytes on a single hard drive, but that hard drive can only deliver 100 IOPS.
B: I can't use it all, because I don't have the IOPS to deliver that, right. And flash media, SSD, and especially NVMe, almost have the inverse problem, and this is why, you know, particularly storage vendors that do deduplication, compression, and stuff like that see a big benefit from flash media: it concentrates IOPS onto that media, and that media has a much higher IOPS-per-gigabyte density, so packing more onto it is beneficial for the media. The same thing is true with virtualization, with Kubernetes, with OpenShift.
B: So, you know, when I'm creating my nodes: say the workload wants a one-to-four CPU-to-RAM ratio. If I deploy a virtual machine that has eight CPUs and maybe 48 gigabytes of RAM, right, that ratio is off, right. At one to four, eight CPUs pairs with 32 gigabytes of RAM, since eight times four is 32, so I would want a...
B: ...one-to-four, eight-CPU, 32-gigabytes-of-RAM virtual machine, virtual node, in my OpenShift cluster to effectively accommodate that workload. With 48 gigabytes of RAM, I've consumed all of my CPU but now have an extra, what was that, 16 gigs of RAM that's basically inaccessible as a result. So you want to be cognizant of those ratios and keep them balanced so that you don't strand resources accidentally, right.
B: So I think it's gonna depend on the storage type, right, for one. So, talking OCS: OCS does distribute data across the nodes, right. I did, and I have the link up here. I know everybody's still looking at my shared screen, so you get a lovely picture of Chris and I, hey Chris, talking and me not paying attention.
B: I call that a Wednesday. So, we did talk about storage, or sizing storage for your nodes, so I'll include a link, I'll post it here in the chat real quick, but I'll also include a link to that in the show notes, if you want to go back and listen to that episode where we talked about sizing the disks that are used by OpenShift nodes, to maybe help cover that.
A: Awesome, cool. So we are at the top of the hour; we have another show coming up here in 30 seconds. The OpenShift Commons briefing is going to include the team at Kong. If you're... or wait, nope, yes, Kong. If you're familiar with the folks at Kong, they have the big gorilla logo. So yeah, we're gonna be jumping to that here in a few seconds. So thank you all for joining, thank you for your questions. Andrew, go back through the Discord chat and see, yeah...
B: And please, if you've got a thing for Discord, please feel free to join the Discord and ask questions at any time. Also, if you have a question that we didn't get answered today, follow up on social media, @practicalAndrew on Twitter, or andrew.sullivan@redhat.com.