From YouTube: OpenShift Case Study: Rackspace - Greg Swift, Rackspace
Description
OpenShift Commons Gathering, December 5th, 2017, Austin, Texas
Greg Swift, Rackspace
Greg Swift: So, my name is Greg Swift. I've been at Rackspace for about five years. I actually used to work with Joel at the US courts before coming to Rackspace, and I'm excited to hear that group is doing such good work with OpenShift now. For anybody that doesn't know Rackspace well, an overview. These are the things I'm going to talk about: a quick overview of Rackspace, what we do, what we need, how we're going to get there, and then the kinds of things we learned along the way with OpenShift.
So first off, Rackspace is really about managed services and providing fanatical support. It seems like we're selling a lot of individual products, but really, at the end of the day, that's what we're trying to provide, and we do that over a huge breadth of products. Pretty much, if you want it, we try and provide it.
This can lead to some problems for us, because it means that we are a collection of hundreds of IT departments, all highly skilled, highly intelligent people trying to make things happen as fast as possible inside their own domain. So you've got the AWS guys over here, the other guys over there, and they're all just going their own way.
We end up following a lot of good practices, a lot of best practices. If you want an example of a best practice, come to us and we will find one of our thousands of people that knows that practice really well. But what that means for us as an enterprise is that it makes the internal use of products a little bit more difficult.
It can feel like you're changing companies if you switch from one operations team to another. One of the groups that I work with supports several hundred apps, and they are switching companies every 15 minutes sometimes, if there's something big going on. And then compliance time can be a mad rush, because of that: 200 different variances to accomplish the same thing.
So what we needed was to be able to come back and say: okay, for the internal things, for the things that are not our bread-and-butter support, the services that we're providing out to our customers, how do we solve those problems? What we needed to do was realize that best practices needed to find their way up to a standard practice: here's the commonality that we need to be following.
We need to get to where all the people that don't need to be managing the entire stack have a good option for somebody else to do it for them, and then realize that not everybody's going to get their problem solved. You're still going to have that 10% that's running off to the side, and that's not necessarily a bad thing. That's where innovation can happen. That's, you know, sometimes just the cost of doing business.
So we can go further as a company together. Because when you're working for a company, that's who it's about: technically it's about that company, making sure the product is good for them, and not hurting your co-workers, who are part of that company. So we can go further together instead of faster apart.
So, our goals: developers are the SMEs, so let them be the SMEs. Let them know about prod; let them know how prod runs. We were trying to get to a point where we can just say developers have access to prod, even in compliant environments. We had to implement some significant controls to make that acceptable, but it is possible. Get operations out of that path; make it so that the dev team, your product team, at that point just doesn't have to worry about a standalone operations team.
A really fancy way to say: whenever PCI comes around, we can give them that report a lot quicker, with a lot less resources, and then actually move faster. Because the trick about going further together is that once you get past a certain point in that race, you're actually going faster as well. You've got to find that point, but you will get there as long as you follow through. So how are we getting there? For IaaS, we decided to utilize one of our largest IaaS products.
Rackspace does several, as I mentioned earlier, or as I had on the slide earlier. What we went with was Rackspace Private Cloud powered by VMware. It's one of our larger products, and so it was an easy win: we have a lot of internal support for it, we've got a lot of experts on it, and we've been providing that product for probably almost the life of the company. So then the only real problem became how to stay ahead of demand, because everybody needs a place to put their stuff.
So then, our first pass. I thought I had updated the top of the slide for a nice little pun, but apparently not: our first "casa de PaaS" was actually started about two years ago. It was an in-house app written in Ruby called Maestro, and it was built on top of Marathon and Mesos. It was intended to be very Heroku-like: buildpacks, curls to the APIs, those kinds of things. It worked for the most part, but when you have developer churn, then you have a hard time.
We didn't have a team supporting it after a year, and once we did start getting more resources into it, it was still like, well, maybe going to OpenShift is a better idea. So we went on our second pass, and we started building out an OpenShift environment. We're working on our third region right now. We started off with 1.4 and upgraded to 1.5. That was, unfortunately, a painful upgrade for us, primarily because of logging and some custom changes that we had internally.
So we haven't gone to 1.6 yet; we're about to try that out. Storage was a little bit of a hiccup for us as well. We started with GlusterFS, but Elasticsearch did not like it for the aggregated logging. I didn't see anybody else complain about that, so I don't know if it was still something we were doing, but we moved that to Ceph. We still occasionally run into issues with that, and we're going to just move Elasticsearch outside of the cluster.
Jenkins did very much become a top consumer, both in number of instances and actual resources. I think we had a couple that had a minimum memory footprint of four gigabytes for their app. But the successes: we had a new ticketing API, at a demo stage right now, that was able to get all the way out to production within a couple of months with minimal operations involvement, which has been great. And several months ago our QE team migrated over their testing for our internal identity system.
They put 15 million requests from that testing suite through within a couple of days. The guy who implemented it was very happy and impressed with that; he's sitting over there somewhere. So right now we're at a couple hundred projects. Half of them are sandbox playgrounds, and about 15% are CI/CD projects. We've only got one customer-facing production system on it right now. We've got several production services that technically aren't production as far as I'm concerned.
So, some lessons learned. These were just some points that I thought it'd be nice to share, especially if you haven't done this before, as we ran through the things we ran into. It took a while to fully learn this lesson: in the Ansible inventory you've got the LB nodes, and because the routers are similar to the LB, and because they both run HAProxy, it's real easy to just kind of sit in your head and go, oh, they're the same thing. And they're not at all.
By default, the routers are pods running HAProxy that run on any nodes that are inside your router selector, which defaults to the infra region; basically, by default there's an infra region. If you don't have schedulable nodes in that infra region (say you just have your three masters and they're all set to unschedulable, because that's what the instructions tell you to do), you're never going to get anything running.
It took me like a week to figure out that that's why those pods weren't coming up. Once you add additional nodes into that infra region that can be scheduled on, those will come up. Where I actually ran into the problem was: I had two nodes, but the default for the router replicas is five, I think, and so with only two nodes it just was never coming up.
Once we went in there and shifted that down to two, everything was fine. So in our hosts inventory there's a nice big comment section now that says, you know, make sure that router replicas is no more than the number of nodes in the router region. We set aside a separate region for the routers.
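For anyone setting this up, the pieces involved look roughly like this in an openshift-ansible 3.x inventory. This is a minimal sketch with hypothetical hostnames and counts, not Rackspace's actual inventory:

    # Sketch of the relevant openshift-ansible (3.x) inventory settings.
    [OSEv3:vars]
    # Router pods only schedule onto nodes matching this selector.
    openshift_hosted_router_selector='region=infra'
    # Keep this <= the number of schedulable nodes in that region,
    # or the router deployment never finishes rolling out.
    openshift_hosted_router_replicas=2

    [nodes]
    master[1:3].example.com openshift_schedulable=false
    infra[1:2].example.com openshift_node_labels="{'region': 'infra'}"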
So right now our two primary LBs, which run the un-containerized HAProxy, are also running the router. I'd like to change that at some point and just keep them completely separate; I think it would be easier to manage over time, because you have that distinction of what they are. Quotas: one of the things that I'm happy we did was start off with quotas from the get-go. Every project that you create gets a very default, kind of minimal, quota.
We don't really put a high barrier to entry on requesting a higher quota, except that we prefer to only give them to you if you are following our conventions for naming and such, to prove that it's not just your personal playground. But even if it is your personal playground, if you want a higher quota, we're likely to give it to you. We just kind of want to keep a lid on things; we're not trying to be overly restrictive.
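An object-count quota of the kind described, limiting how many things a project can hold without yet touching CPU or memory, looks roughly like this. The numbers are illustrative, not the actual defaults from the talk:

    # Illustrative default project quota: object counts only, no CPU/memory.
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: default-quota
    spec:
      hard:
        pods: "10"
        services: "5"
        replicationcontrollers: "10"
        persistentvolumeclaims: "4"

Applied per project, for example with oc create -f quota.yaml -n some-project.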
The one thing that we didn't include from the get-go, or rather tried and didn't end up implementing, was resource limiting, like CPU and memory; you can add those into the quotas. Instead, we just restricted the number of items that you could have: the number of pods, the number of storage containers, things like that. When we added the resource limiting, anybody that went to go load a new app failed, because the QuickStart templates don't have any default resource requests.
And if your template doesn't request the resource, then it fails. So, with laziness and time and all of what was going on at the time, we were like, okay, well, we'll revisit that later, because it means we have to go edit all of the templates that came with OpenShift to include those requests. We've got a story on our backlog to go implement that everywhere, and it does work; we did play with it a little bit on something that I'll be getting to in a second.
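One common workaround, sketched here rather than claimed as what Rackspace shipped, is a per-project LimitRange that injects default requests, so templates that omit them can still schedule under a compute quota. Values are hypothetical:

    # Hypothetical LimitRange: fills in requests/limits that templates omit.
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: default-limits
    spec:
      limits:
      - type: Container
        defaultRequest:        # used when a container omits resources.requests
          cpu: 100m
          memory: 128Mi
        default:               # used when a container omits resources.limits
          cpu: "1"
          memory: 512Mi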
I don't remember why we hard-coded it in here, but that's based on the documentation, the size of the nodes, and the number of pods that they can handle; our nodes are pretty small, intentionally. So then, the garbage collection thresholds, high and low. What this is: the local image repository on each of the nodes takes up a certain amount of disk. The high threshold is where the garbage collection kicks in, and then it tries to clear out until it's lower than the low threshold. Fairly easy.
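On OpenShift 3.x those thresholds are kubelet arguments in the node config; a sketch with example percentages:

    # /etc/origin/node/node-config.yaml (excerpt) - thresholds are examples
    kubeletArguments:
      image-gc-high-threshold:
      - '85'   # disk usage % at which image garbage collection starts
      image-gc-low-threshold:
      - '80'   # GC deletes images until usage drops below this %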
We've actually still seen that error once or twice, but it's very rare now, so it's definitely something to keep an eye on. Then the other one: our first major incident came from nodes starting to OOM-kill on us. It was decidedly not fun. We didn't have system-reserved defined, or the secret driver; I'm not a hundred percent sure the secret driver actually has to be in there, reading through the docs.
We thought all three of those bottom ones needed to be there, but the bottom two actually break origin-node when they're there. It's worth it, though; I left them on the slide so you can see: don't add those, because they will stop origin-node from working. Basically, the goal there is to reserve an amount of memory on the system so that OpenShift doesn't kill itself, which is what happened.
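The reservation itself is another kubelet argument in the node config; a minimal sketch, with a made-up size that you would tune to your nodes:

    # /etc/origin/node/node-config.yaml (excerpt) - reservation size is hypothetical
    kubeletArguments:
      system-reserved:
      - 'cpu=200m,memory=1Gi'   # held back for the OS and system daemons,
                                # so pod pressure can't OOM the node itself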
This is where we actually played with the resource limits inside our quotas. Red Hat has put together this awesome set of resources, and they've gone around the country, probably the globe, giving free workshops where you can come in. I totally thought it was going to be a sales pitch, and I went in and we got to sit down and do all-day labs. It was amazing; it was so much fun. Excuse me. The content is all out on a public GitHub, and it's fairly easy to kick off your own version of it.
I run this internally, and it's just up and running now, and we have a special quota set aside that has resource limits. The roadshow that they did in San Antonio, we blew out their reservations, and halfway through the day we overloaded their system. So I had that in mind when I went to go do the big version of it internally at Rackspace.
We made sure we had pretty good resource quotas in place before we let people on it, and we were able to handle a good hundred-plus people, which was about what was in the roadshow, without it affecting any of our production workloads or anything. We ran it on our main system, and it was pretty good. So internally, we've got several teams that are working on using Helm to manage things, basically trying to provide a little bit more composable templates for reuse.
This is not fully embraced for everything yet, mainly because Helm is single-tenant at this point, but there is work upstream to change that. It does appear that this is eventually going to be a little bit more of a thing, and we've seen a lot of success with the teams that have been using it. If anybody wants to talk to one of the people that uses it, come find me and I'll introduce you.
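As a concrete picture of what "composable templates" means here, this is a toy Helm chart excerpt: one shared template, reused by pointing different values files at it. Names and values are hypothetical, and on older clusters the Deployment apiVersion may differ:

    # values.yaml (hypothetical, overridden per team)
    replicaCount: 2
    image:
      repository: registry.example.com/myapp
      tag: "1.0"

    # templates/deployment.yaml (renders per-team values into one shared shape)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: {{ .Release.Name }}-myapp
    spec:
      replicas: {{ .Values.replicaCount }}
      selector:
        matchLabels:
          app: {{ .Release.Name }}-myapp
      template:
        metadata:
          labels:
            app: {{ .Release.Name }}-myapp
        spec:
          containers:
          - name: myapp
            image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"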
We had deployed using a 10-dot network. Well, we already use 10-dot everywhere inside; we're a big hosting provider with big private networks, and we're using pretty much all of 10.0.0.0/8. I had deployed all of our POCs using 172, and then I went to go do prod and we deployed it with 10-dot, and the night before, my coworker goes...
That could lead to some weird wonkiness here and there.
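Those ranges are fixed at install time in the inventory, which is why this is painful to discover late. The variable names below are from openshift-ansible 3.x; the values are examples, not the ones from the talk:

    [OSEv3:vars]
    # Pod overlay network - pick something clear of the data center's 10.0.0.0/8
    osm_cluster_network_cidr=172.16.0.0/14
    # Service (portal) network
    openshift_portal_net=172.30.0.0/16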
Another thing that bit us in the long run: we deployed using openshift-ansible, obviously, and when we went to go extend it, we had no idea what hash we had deployed from. We went to go deploy again using whatever was current, and things acted weird, because the cluster was not exactly at the right state. For whatever reason, we eventually got to a point where we don't know what hash we're working with. So definitely keep track of that; it's also helpful.
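One low-tech way to do that, purely illustrative:

    # Before each deploy, record which openshift-ansible commit you ran.
    cd openshift-ansible
    git rev-parse HEAD | tee -a ~/deploy-log/openshift-ansible.hashes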
We've also been working to start handling a lot of our post-deployment changes, like adding quotas to things, using the Ansible oc module, so it's a lot more programmatic. Instead of somebody just doing oc create over a bunch of files that are in a repo, it's at least a little bit more automatic, even though it's the same result.
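A sketch of what that looks like as a task. This uses Ansible's k8s module, whose state/definition interface I can vouch for; the oc module mentioned above has a similar declarative shape, and the project name here is hypothetical:

    # Playbook task: declaratively ensure a quota, instead of 'oc create -f'.
    - name: Ensure the default quota exists in the project
      k8s:
        state: present
        definition:
          apiVersion: v1
          kind: ResourceQuota
          metadata:
            name: default-quota
            namespace: my-project   # hypothetical project name
          spec:
            hard:
              pods: "10"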
And then I left that last line in the wrong spot, so: I'm Greg Swift, and those are the ways to get a hold of me. So, thank you.
Moderator: I loved the shout-out in this to the Roadshow stuff, because that's been one of the things that we've used to get people started really quickly, and it came out of the Evangelist team. So it's great that you're taking advantage of it, and I hope other people will too. Does anyone else have any questions for Greg while he's still standing? There's one over here.
Greg Swift: So, Rackspace historically has been about building up the internal knowledge space. We explore a lot of options and avenues over time, and we'd been having conversations with Red Hat.
A big part of it was just getting started and ramped up, knowing that we were going to be building out expertise on it internally and hoping to be a contributor back to the community.
[Several short untranscribed exchanges with the audience.]
Greg Swift: We want to look at having the Federation layer going on, but basically we're taking the approach of, kind of, if you go to use AWS: they don't sync your products between regions; you still go deploy your stuff to them. So basically, we don't sync. We will worry about making sure templates are there and all those other things are there, but we're not helping anybody make sure that their application is deployed across multiple regions.
Moderator: Any final questions for Greg? He will be here this afternoon and through all of KubeCon too, so please reach out. I will set up my laptop in the reception this evening, while we're all drinking beer, and anyone who wants to get on the Slack channel, I will sign them up, so come and find me. All right, thank you very much, Greg.