From YouTube: OpenShift at Point72
Description
William Shaw from Point72 and Dan Foley from Agio discuss Point72's production deployment of OpenShift at the OpenShift Commons Gathering Boston on May 1, 2017.
Learn more and see the slides here: https://blog.openshift.com/openshift-commons-gathering-at-red-hat-summit-2017-video-recap-with-slides/
A: We are a private investment company. We manage the assets of our founder, Steve Cohen; that's about eleven billion dollars in assets under management. Our primary offices are located in Stamford, with offices in London, Hong Kong, Tokyo, New York and Singapore. We have about a thousand employees across the globe. Our mission is to be the industry's premier asset management firm: we want to deliver superior risk-adjusted returns and adhere to the highest ethical standards.
A: I want to talk a little bit about some of our values, because our values actually drive how we do everything across the business, whether it's technology, accounting, the way we trade, everything we do. Ethics and integrity are at the forefront for our firm. First, we want to make sure that we succeed together, so what we deliver in IT and development needs to make sure that the firm comes first. We want to be innovative; we want to strive for excellence; we're not satisfied with the status quo.
A: We want to constantly try to make things better as a company. We work hard to work together, and we listen to our peers. We listen to the business, and everything we do is driven to meet the business requirements, in order to deliver the experience that they're expecting and the service levels that they need. And the last item, which is important to me, is that we're very community focused, so we actively participate in our communities, whether that's the technology communities or the communities we live in.
A: So just a quick introduction: my name is Billy Shaw, the director of systems engineering. I have a long career with UNIX and Linux, starting early on. I worked in San Francisco for a company called Organic which, if anyone's heard of the Apache web server, it kind of came out of us there back in the 90s; I worked for Chase Bank for a while, and I've been with Point72 since 2004.
B: My name is Daniel Foley. I'm a systems engineer at Agio; I've been working with Linux and UNIX for seven, eight years now. I have been working at Agio since 2013 on the Linux support and implementation team, doing deployments, designing infrastructure, and supporting our client environments. Since March of 2016 I have been a dedicated resource for the Point72 Linux engineering team, specifically also working on the OpenShift Enterprise deployment.
A: So how did we get started with OpenShift? Last February I was part of our technology steering committee, and we started discussing how we were going to change our trade aggregation and processing platform. Through that planning we came up with a series of principles to guide us on our journey towards microservices: we wanted to be open source first and cloud first; we wanted reactive design, CI/CD, and elastic scale. Resiliency is always important to us; we always have to be up.
A: We want everything to be secure, streaming, and distributed, and we want to build big things from small things. We want test-driven development, and we want to move code, not data. We'll talk about each of these and what they mean to us. So, open source first: we want to use open source tools wherever possible. That's a shift from where we had been in the past, and as part of our community we're making it a practice to also start contributing back to the open source projects that we use to run our business.
A: Cloud first: we want to always look to the cloud for our platform, for our scale, for elasticity. And we want to be reactive, so we want to use asynchronous messaging and components with loose coupling, so that we can swap things in and out of our infrastructure and our microservices without a huge chain of dependencies across them. CI/CD processes are key; I'll talk a little bit more about those. The principle there is to automate whenever possible. Elastic scale: that's where the cloud piece comes in.
A: Resiliency is pretty obvious: we've set up SLAs and we want to stay within them, especially as the workload increases. And we want to be decoupled, very similar to keeping things reactive: we want to be able to swap technology in and out. Today we may be using one technology for streaming, and tomorrow we may want to use another; we want to be able to bring those in and out, test them, and bring them in with a full automation stack.
A: We want it to be secure; that's not optional for us. It's important to us, and we apply it to every layer of our CI/CD stack. Let's talk about streaming: we do a lot of processing, and in our analysis and our experience it's a lot easier to treat a batch as a stream than it is to treat a stream as a batch, so that's part of our processing technique. We want everything to be distributed, and that's where OpenShift came in for us; it was a huge part of our selection.
A: A little bit on building big things with small things: we do follow agile practices. We have chapters so that, while the agile teams are cross-functional, there's a DevOps member who's familiar with both OpenShift and the CI/CD, and our chapters meet weekly to discuss the needs of each of the agile teams. By building those small things and following agile methodologies, we're able to do our frequent release cycles and provide business value very quickly at the end of each sprint, together with test-driven development.
A: So, as we talk about automation and we talk about our pipelines, we want everything to be automated, so that a developer who starts has to write all of his unit tests first. Code will not pass the CI/CD process without proper unit testing: it will not compile, it will not go anywhere else, it'll just be kicked back until they do that. And that's also reviewed as part of a pull request.
A: So if they try to sneak something by, or an approver says, "well, let's just get it in," the CI/CD will kick it back out. Move code, not data: like most shops, our code should be much smaller than our data, and we had a bad habit in our old monolithic architecture where we were pushing data all over the place to get to the code, whether that was moving copies of large SQL databases or something like that.
A: So, on to our adoption of OpenShift. We started the journey back in June of 2016, when we said we wanted to look at platform as a service. It was important to us to make sure we weren't locked into a cloud vendor. We wanted to use their functionality, but we wanted the flexibility so that today it may be AWS, tomorrow it could be GCE, and five years from now it could be something that none of us even know about yet.
A: It worked out really well: it passed all the tests, far and beyond anything else that we had looked at, and we had narrowed it down to a total of three that we used in the POC. Shortly after that, our sprints started for a Minimum Viable Product in the middle of August, and it was going so well that by mid November we decided we wanted to be on OpenShift Enterprise. We started off with Origin, because we wanted to understand everything before we made the investment, and it went really well.
A: Through that, we also looked at the EFK stack that's provided with OpenShift, because logging is a huge part of monitoring. So we deployed that, and we completed our MVP at the end of the year, demonstrated it to the company, and it went really well. We were really able to strongly demonstrate the scale, I think by processing two years' worth of trades, and we do massive volumes of trades, millions per day.
A: We were able to aggregate, I think, 800 million trades in a few hours with three pods, and as we scaled up, by the time we got to 50 pods it was almost linear: we were able to do the 800 million trades in 40 minutes. That was pretty huge for us. Shortly after that, we worked with Red Hat Consulting in March and did an OpenShift Container Platform installation as part of our enterprise, and we just recently installed Jenkins and CloudForms to help supplement the work that we're doing.
A: So, our deployment strategy today: we're using the Atlassian stack, and since we have Jenkins we're in the process of replacing it. A developer will check their code into Git; when they've had an approved pull request, we will do a build and unit tests. That build for us consists of taking our code from our Git repo and compiling anything that's necessary.
A: If they're working on a release branch, they can make some changes to their namespace; they're able to work in dev and have multiple features or bug fixes going on concurrently from the code base before it goes out into our dev environment. This is just a little peek at what the release would look like: we've set it up so that right now they can push a button, or it can happen automatically, and then their code gets deployed via an API call into OpenShift.
A: So, in general, this is the reference architecture that we're working with on the Point72 networks; it should look pretty familiar to anyone who has done a deployment. We have our HAProxy load balancer with three masters, and behind them come our nodes. We've opted to specify nodes and use node selectors for deployments. So as developers are working, as part of their sprint, through our chapter meetings on the DevOps side they'll talk to us ahead of time: "I know in this sprint I'm going to need some amount of compute resource."
A
Some
of
our
applications
will
allocate
dedicated
CPU
or
memory
and
and
that
process
will
make
sure
that
nodes
get
deployed,
have
the
proper
labels
applied
and
in
their
code
as
part
of
their
GUID
repository,
they
can
specify
a
node
selector.
That
gives
us
a
real
big
advantage
in
controlling
our
cost
in
the
cloud
so
that
we're
only
deploying
to
high
end
servers
that
are
doing
a
higher
end
workload.
That
may
require
a
more
secure
memory.
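A node selector of the kind described here sits on the pod template and is matched against labels applied to the nodes. A minimal sketch (the names, labels, and image are hypothetical examples, not from the talk):

```yaml
apiVersion: v1
kind: DeploymentConfig            # OpenShift 3.x deployment object
metadata:
  name: trade-aggregator          # hypothetical application
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: trade-aggregator
    spec:
      nodeSelector:
        workload: high-compute    # only nodes carrying this label are eligible
      containers:
        - name: trade-aggregator
          image: registry.example.com/trade-aggregator:latest   # placeholder image
```

The matching label would be applied to the dedicated nodes, for example with oc label node NODE workload=high-compute.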
A: We also have a Direct Connect up to our cloud providers, with an IPsec tunnel in between, so that we have a high level of security and confidence there. For multiple environments we're leveraging router sharding, so that we can have dev, QA, and UAT within the same cluster using different subdomains, and make sure that as developers are pushing through and the CI/CD kicks off, everything is organized and structured and we don't have any blending. We do keep production as a separate cluster, so that something that may go awry here doesn't affect our SLA for production.
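With router sharding, each router only admits routes matching its shard's labels, which is what lets dev, QA, and UAT share a cluster under separate subdomains. A minimal sketch of a dev-sharded route (the names and hostname are hypothetical); a dev-only router would then be configured to select that label, for example with oc set env dc/router-dev ROUTE_LABELS='env=dev':

```yaml
apiVersion: v1
kind: Route
metadata:
  name: trade-app                 # hypothetical route
  labels:
    env: dev                      # sharding label the dev router selects on
spec:
  host: trade-app.dev.example.com # dev-environment subdomain (placeholder)
  to:
    kind: Service
    name: trade-app               # hypothetical backing service
```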
A
Next,
oh
and
I
talked
about
the
node
selectors
already
and
we
and
I
talked
about
how
we
guarantee
compute
resources
by
leveraging
those
node
selectors.
So
we
do
some
monitoring
reporting
and
where
we're
getting
through
it.
So
some
of
the
requirements
we
wanted
to
be
able
to
have
a
charge
back
and
show
back
multiple
cloud
providers
provisioning
a
view
of
our
resources
both
for
the
cloud
and
OpenShift,
so
forms
seem
to
fit
that
pretty
nicely
for
us.
A: We have it running inside of OpenShift in its own namespace, and we've had success with both Origin and the OCP platform, so we've had no problems there. Just as an example, from one of our sandbox environments we are able to get some high-level stats, and you can see in the top left we have both our Origin and OCP clusters.
A: We also leverage the EFK stack (Elasticsearch, Fluentd, Kibana) pretty heavily. We wanted to make sure we had a UI for dashboards, and separation of the data so that we could run fast queries and scale it quickly. But one thing that was really important to me is that I didn't want my monitoring and logging platform to be on the same system that it's monitoring and logging. So we opted to separate the EFK stack: we use secure forward for Fluentd, and we send everything outside of the cluster.
A: So if there's a problem, and that problem is large enough that I can't get into OpenShift (we haven't seen that, but if there were), then I still have access to the data right away. The other advantage this has brought us is that we can run the latest releases, and anyone who's using the Elasticsearch stack knows that they have a very frequent release cycle and are providing a lot of features that we've been looking for.
A: The last thing from our developers was the ability to leverage different plugins, whether in Elasticsearch, Fluentd, or Kibana. We faced the challenge that the new container image coming from Red Hat and OpenShift would update and overwrite some of our customizations, so this approach proved out really well for us and gave us exactly what we were looking for. So this is an overview of what we have set up: we have a Fluentd daemonset that leverages secure forward to a Fluentd server.
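The secure-forward hop from the in-cluster Fluentd daemonset to the external aggregator is configured in Fluentd's own config format. A minimal sketch, assuming the secure_forward output plugin; the hostnames and shared key are placeholders, not values from the talk:

```
<match **>
  @type secure_forward
  self_hostname fluentd-node.example.com   # placeholder sender identity
  shared_key CHANGE_ME                     # placeholder secret shared with the aggregator
  secure yes
  <server>
    host fluentd-aggregator.example.com    # the external Fluentd/Elasticsearch host
    port 24284
  </server>
</match>
```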
A
That
also
is
running
our
Cabana
interface
with
elasticsearch
5.3,
and
then
this
little
dashboard
at
the
end
is
one
that
I
contrived
with
a
ton
of
errors,
but
just
to
show
a
dashboard
code.
Bhama
in
terms
of
the
number
of
you
know,
what
do
we
have?
Some
of
our
own
stats
that
were
skipped?
Cpu
failed
error,
so
we
were
messing
around
and
taking
things
down
to
create
a
dashboard.
That's
interesting,
because
the
life
it
really
doesn't
show
us
very
much
so
Prometheus,
which
was
mentioned
earlier,
was
also
important
to
us
in
our
stack.
A
We
looked
around
at
some
commercial
products
and
we
of
course
open
source
first
and
found
that
previous
met
our
needs
quite
well.
It
does
a
good
job
of
scraping.
The
data
gives
us
a
nice
life
speed
for
a
restful
api.
As
we
get
into
Jenkins,
we
can
pull
metrics
in
from
Jenkins
as
well.
So
we
decided
to
do
that.
It's
in
a
custom
container
that
we
created
for
OpenShift
and
we
don't
expose
Prometheus
data
to
any
of
the
users.
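Pulling Jenkins metrics into Prometheus is a scrape job in Prometheus's YAML configuration. A minimal sketch, assuming the Jenkins Prometheus plugin is exposing metrics; the target host is a placeholder:

```yaml
scrape_configs:
  - job_name: jenkins
    metrics_path: /prometheus/                  # path typically exposed by the Jenkins Prometheus plugin
    static_configs:
      - targets: ['jenkins.example.com:8080']   # placeholder Jenkins host
```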
B: You can create different organizations and different levels of access control, which was nice, plus the ability to export graphs and data and to create custom views and dashboards. Grafana is likewise deployed in a custom container. For access to the data feeds, and to keep Prometheus inaccessible from outside, we removed the Prometheus route and used the internal DNS name listed there. In Grafana we use multiple data feeds, and we're able to control user access, create custom dashboards and queries, and export data for analysis.
B
Here
are
some
examples,
different
dashboards
that
we
created,
as
you
can
see,
we're
not
just
using
this
for
open
ships.
Specifically,
we
can
also
use
it
for
our
Hadoop
cluster
as
well.
As
you
know,
managing
that
the
host
metrics
physically
pretty
much
anything
that
we
need,
which
is
why
we
chose
Griffin.
B
So
troubleshooting
we
just
wanted
to
share
with
you
guys
some
of
the
troubleshoot,
like
some
of
the
issues
that
we
ran
into
some
of
the
troubleshooting
steps
that
we
took,
how
we
resolved
them
and
just
a
different
couple,
different
scenarios,
which
may
be
useful
for
some.
If
you
guys
run
into
it
in
the
future,
see
we
use
node
selectors
as
part
of
our
deployments,
we've
had
to
change
the
cloud
instance
type
to
match
the
business
workloads
used
by
OpenShift.
We
found
there
are
times
where
node
labels
were
no
longer
applied
after
the
instance.
A: Yeah, sure. One of the key things in managing the team that I'm responsible for, of course, is cost. So we've made it a point to turn off our dev and POC environments through the process so that we'd be able to save money, because those are running on hourly instances. As part of that, and this is just a simple thing, we wrote some jobs that would run through the Amazon or Google APIs and shut everything down on schedule and bring it up on schedule; it was flexible and adjustable.
A
We
didn't
give
the
developers
access
to
decide
when,
but
we
made
sure
that
it
was.
It
was
consistent
and
we
did
change
it
over
time,
as
our
development
teams
became
more
global
spreading
everywhere
for
the
Ukraine
all
the
way
through
India,
so
I'm
oftentimes,
sometimes
just
they
often
have
sometimes
we'll
see
a
cluster
not
ready.
If
we're
running
a
OC
get
nodes-
and
this
can
have
any
number
of
reasons.
Typically,
we
found
the
best
thing
to
do
is
to
go
through
and
just
start
doing,
some
good
old-fashioned
troubleshooting
on
the
node.
If
it's
not
ready.
A: This did not translate well from the PowerPoint (one of these was supposed to fade into the background), but at any rate, in this example, if we do an oc describe node, you can start to see, on the top part, utilization figures for all the pods inside your node.
A
So
we
also
find
that
very
useful
in
troubleshooting,
because
if
the
nodes
busy,
because
there's
a
heavy
workload
which
pot
is
causing
it
what's
what's
really
driving
the
workload
in
the
utilization
which
in
turn
we
can
go
back
to
the
developer
and
say:
hey.
You
just
released
this
code
and
it's
really
messing
things
up.
Please
go
take
a
look
and
we're
going
to
ask
you
to
deploy
a
new
build,
so
so
that's
pretty
pretty
nice
and
then,
of
course,
on
the
bottom.
A
If
you
haven't
looked
at
that,
you
can
get
the
events
that
have
happened
for
that
node
for
the
for,
though,
since
the
since
of
this
case
of
for
the
last
seven
and
ten
hours,
so
it
was
first
seen,
but
it's
checking
frequently
every
10
seconds.
It
seems
so
we
leverage
this
pretty
much
we're
command-line
junkies,
the
UI
is
great
and
our
developers
use
it,
but
we
we
do
everything
to
the
API
and
UI,
so
so
our
routes,
we
we
have
a
distant
experimentation
and
our
POC,
where
we
were
taking
routers
in
and
out.
A
One
of
the
things
we
were
looking
for
was
to
understand
for
production.
If
we
hit
a
certain
level
of
requests
and
workload
flowing
through,
therefore,
the
applications
were
running.
How
is
the
behavior?
How
do
we
extend
our
available
routers
on
the
infrastructure?
Those
in
order
to
make
sure
we
can
service
requests?
So
during
that
process
we
we
really
found
that
this
was
self
inflicted.
A
But
the
fact
is
that
we
didn't
have
wildcard
DNS
entries
because
we're
not
using
an
f5
load
balancer
in
front
of
it
for
our
routes
so
going
in
and
just
paying
attention
to.
Other
parts
of
your
infrastructure
is
really
the
point
here.
Don't
assume
everything
could
be
isolated
inside
up
OpenShift
because
there
are
upstream
dependencies
in
your
environment
as
we
went
through,
Skype
ENS
was
important
to
us,
and
I
dan
has
done
a
lot
of
work
with
some
custom
ation
there.
So
you
want
to
talk
a
little
about
that.
B: Well, yeah, sure. Basically, we had to find a way to assist our developers: they have multiple applications and custom applications that need to communicate with each other, but they're not necessarily in the same project or namespace. This was a pretty simple fix; looking it up in the documentation, SkyDNS functions by default in OpenShift, but you address services across namespaces using their full names, built from the service, the project namespace, and svc.cluster.local, i.e. service.namespace.svc.cluster.local.
B: To take this further: we found that applications will expose ports other than HTTP and HTTPS, and our developers need access to those ports. That was one of the issues that we ran into at first, and we were actually banging our heads on it for a little while: the port is open, but nobody can connect. This seemed kind of weird, but again it was fairly simple: the load balancers don't handle non-HTTP or HTTPS traffic. There is a solution for this, node ports, which is also fairly simple.
B
You
can
go
into
the
service.
Template,
modify
the
amyl
right
there
in
the
UI
change
the
type
from
cluster
IP
to
node
port,
and
it
will
either
automatically
assign
a
node
port,
or
you
can
specify
one
yourself.
The
range
by
default
is
30,000
to
32,000.
We
in
our
CITV
process.
We
have
a
port
management.
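The change described here is a one-field edit to the service definition. A minimal sketch (the service name, selector, and port numbers are hypothetical examples):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: messaging-svc        # hypothetical service
spec:
  type: NodePort             # changed from the default ClusterIP
  selector:
    app: messaging           # hypothetical pod label
  ports:
    - port: 61616            # example non-HTTP port
      targetPort: 61616
      nodePort: 30100        # optional; omit to let the cluster pick from 30000-32767
```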
B
One
of
our
developers
helped
me
out
with
this,
but
it
will
go
through
and
select
the
next
open
port
in
the
list
out
of
32,000
and
automatically
assign
it
and,
as
you
can
see,
you
can
like
once
the
node
port
is
assigned
to
sign
to
all
the
nodes
in
your
cluster,
not
just
one.
So
it's
kind
of
a
reservation
that
no
matter
when
the
pod
is
running
on
any
one
of
your
nodes.
That
port
is
specifically
reserved
for
that
pod
everywhere.
B: So, sorry, node affinity. That was something that we also had an issue with prior to 3.4. Node anti-affinity, excuse me: we wanted to ensure specific applications and pods were all running on different nodes, so that in case one of the nodes goes down, we are 100% certain that the application is running on a different node. Well, prior to 3.4...
A
If
I
can
add
to
that
to
is
we
were
shutting
things
down
every
day
and
bringing
it
up
as
the
cluster
would
come
up
if
a
node
came
up
and
it
was
the
selector
matched
it
would
try.
Open
shape
would
try
to
deploy
everything
there,
because
it's
sought
first
and
we
would
be
able
to
go
back
and
rebalance
using
this
technique
as
well.
Right.
B
Exactly
and
so
that
that
was
an
issue,
3.4
has
included
the
node
anti
affinity,
which
has
solved
our
problem,
but,
as
you
can
see,
one
workaround
that
we
used
was
just
multiple
labels.
It's
a
label
and
it's
like
a
secondary
group
label,
so
you
have
your
type,
which
is
in
this
case.
You
know
our
messaging
and
then
a
secondary
group,
which
was
our
messaging
3.
B
So
the
way
that
that
you
know
in
the
example
here
is
say
we
have
three
pods
that
we
want
to
ensure
is
running
on
our
high
compute
nodes
and
they
have
to
be
evenly
distributed.
To
do
this,
you
have
three
or
six
nodes
specifically
and
then
you
will
have
one
like
they're.
All
the
nodes
are
labeled
with
compute
and
then
you'll
have
your
subgroup
two
of
the
nodes,
or
you
know,
Group.
One
group,
two
group
three
and
you
just
ensure
that
your
application
is
labeled
appropriately
to
a
specific
group,
so
they
are
evenly
distributed.
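On releases that support anti-affinity, the same spread-across-nodes guarantee can be expressed in the pod spec itself instead of with group labels. A minimal sketch using the current field syntax (the app label is a hypothetical example; early releases expressed this via an annotation rather than these fields):

```yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: messaging                    # hypothetical label shared by the replicas
          topologyKey: kubernetes.io/hostname   # forces replicas onto distinct nodes
```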
A: So one of the things that we also run into, which I'm sure everybody has, is troubleshooting networking issues. Things that we don't want to put out there in any of our containers or pods are tools that we don't need: ping, netcat, nmap, curl, etc. But we get a lot of requests for them, so we came up with some other techniques, and one of them is a handy little device called /dev/tcp.
A: As you see in the top left, with just three lines in a bash script we're able to exec the device, do an echo statement, and feed in, in this case, a very simple HTTP request; we get the response back, and then we can parse that response with our monitoring systems or other tools, and very quickly and very easily use what's already provided to us without adding anything extra. So we make that a key part of our troubleshooting whenever we think we have network issues.
A
So
sometimes
we
saw
some
things
with
with
our
docker
registry
early
on
and
really
that
just
came
to
us
being
fairly
new
with
it
and
following
some
good
practices
about
cleaning
things
up.
So,
oh
I'm,
sorry
a
little
head
of
my
site,
so
we
yeah.
We
had
multiple
images
that
we
wanted
a
private
registry,
so
we
do
use
artifactory.
All
of
our
images
are
source
to
from
Red
Hat.
We
feed
them
into
there
and
then
we
use
that
when
we
go
through
our
CI
CD
process
in
order
to
make
sure
that
we
have
an
application.
A
That
is,
it
is
purpose-built
for
what's
being
run,
and
then
this
is
just
a
list
of
some
of
the
images
again
there's
pages
of
them,
as
our
developers
like
to
deploy
early
and
often
so.
Some
of
the
other
problems
that
we
ran
into
with
we've
actually
run
into
this
everywhere
we
run
docker,
is
that
our
networking
team
many
years
ago
use
one
7200
16
as
the
network
locks
and
10%.
We
people
have
seen
that
so
we've
run
into
that.
So
we
we
have
a
practice
to
that.
B: Oh, what do I do now? Okay, the cluster is down, I have no more storage, and I have EBS volumes attached: how am I supposed to work with OpenShift in expanding these volumes? EBS volumes are fairly simple to expand anyway, right: you create a snapshot, and then you create a volume from that snapshot that's a larger size. Fairly simple. Well, then I was thinking...
B
What
about
an
open
shift
like
how
do
I
get
my
pod
to
see
this
disc
and
expand
the
file
system
like
is
openshift
going
to
handle
this
properly?
Is
it
going
to
like
recognize
this,
this
new
disk
and
like
reformat
it
since
I
have
XFS
specified
in
my
physical
volume
like
I?
Just
wasn't
sure
so.
This
took
some
some
testing
and
some
hoping
had.
Nothing
was
going
to
happen,
but
so
it
turns
out
what
I
could
do
was
expand
the
volumes,
the
EBS
volumes
itself
and
then
attach
I'm.
Sorry
then
actually
go
in
and
update.
B
The
physical
volume
object
in
open
shift
itself,
which
is
just
changing
the
volume
ID
and
the
size,
leaving
everything
else.
The
same
actually
spin
up
the
pod.
It's
going
to
fail,
obviously,
because
it's
still
out
of
space
but
I
can
check
and
see
where
that
pod
was
currently
what
node
it's
running
on,
login
to
that
node
and
and
simple
enough,
the
disk
is
mounted
and
you
can
just
do
a
XF
escrow
FS
and
grow
it
on
the
fly.
B: The API; my wording here, I think I messed it up a little bit. So again, in Origin specifically, prior to 3.4, we were having issues with the AWS storage classes. I could not find very much in regards to our specific issue, other than that it was really weird. Basically, the issue was that even with the zone specified, whenever we would go to auto-provision an EBS volume, it would be created in a random zone, whether 1a, 1b, 1c, or 1d; it was completely random.
B
Therefore,
if
it's
not
in
the
same
zone,
it's
not
going
to
connect
so
to
address
the
issue.
We
created
an
API
to
kind
of
interface
between
OpenShift
as
well
as
AWS,
so
the
API
has
to
it
does
multiple
things,
but
this
was
specifically
in
regards
to
fixing
this
one
specific
issue.
I
just
wanted
to
add
a
bunch
of
other
stuff
for
fun
and
getting
a
token.
B
That
is
another
thing
that
I
had
to
add
for
our
developers,
assisting
them
with
some
other
issues
they
were
having
during
the
CI
CD
process,
but
so
the
API
would
make
a
query
to
AWS
to
create
the
EBS
volume,
which
would
in
turn
return
the
volume
ID
using
that
I
was
able
to
then
have
my
API
query.
The
openshift
API
to
create
a
physical
volume
object
with
The
Associated
volume,
ID
size
and
name,
and
this
was
something
that
we
added
into
our
CI
CD
process.
A: Right, so the last thing we'll cover on our troubleshooting is backups, which are critical to everything. In addition to managing the Linux team and being responsible for our cloud, I also own backups and all the storage at the firm, so it's at the forefront of my mind. We came up with a simple technique where we want to back up our certs and any of our keys.
A
We
want
to
step
through
and
make
sure
that
we
hit
SPD
and
we
hit
any
of
the
namespaces
and
deployments
that
we
have
so
to
get
there
right
now
and
all
of
this
will
run
it
will
drop
to
a
directory.
We
check
that
in
to
get
every
day
and
it
gives
us
the
ability
to
be
able
to
restore
from
any
tag
or
version
that
we
have
inside
of
get.
So
we
will
do
our
sed
CTL
backups.
We
will
log
in
make
sure
we
get
a
list
of
our
projects
and
from
the
list
of
projects.
A
We
export
each
step
of
the
way
as
an
open
shift
template
and
that
template
is
what
we
bring
inside
of
our
of
our
git
repo
for
backups,
our
git
repos,
of
course,
our
backup
in
our
enterprise
backup
system.
But
it
gives
us
a
lot
of
flexibility
as
we
look
to
the
horizon
for
next
steps.
We
also
want
to
be
able
to
create
multiple
environments
very
quickly
that
are
identical
and
if
we
can
pull
them
out
of
here,
along
with
the
git
repos
for
the
applications,
we
feel
like.
We
have
a
very
good
start
on
that.