From YouTube: OpenShift Commons Briefing Cloud Native Operating Models with Andrew Clay Shafer Red Hat
Description
OpenShift Commons Briefing
Cloud Native Operating Models
Guest Speaker: Andrew Clay Shafer (Red Hat)
Hosted by: Diane Mueller (Red Hat)
A
All right, welcome again to another OpenShift Commons Briefing. Today we have Andrew Clay Shafer here again. This is his second round on our Fridays with DTO, and he's going to talk a little bit about cloud native operating models and what that means to him. I kind of love the topic and the title, but I don't quite know exactly what he's going to come up with today. So Andrew, take it away, and we'll have live Q&A at the end of this.
B
C
B
So, going back over the last 10 years, it's really been focused on open source infrastructure and products, back to Puppet, OpenStack, and now more and more Kubernetes. In parallel to that, I was part of organizing DevOpsDays globally, and these communities of practice around the Velocity conference. Those conversations, and being part of those projects, gave me what I think was a pretty unique vantage point to try to help the customers and the communities around these different projects.
B
So the overarching theme of a lot of this is: there's the code and there are the git repos, and then there's what you actually do with that and how you make it work. This is me trying to distill some of the thoughts and conversations I've had in the last few months and to at least remove some confusion. I'm not sure I'm going to answer everyone's questions. I mean, in some ways...
B
New technology is limited by the fact that you're imagining it in these old ways. The new tools sort of require, or not necessarily require, but they're optimized by, new thoughts and new behaviors. In the old world, we used to manage servers, and as sysadmins we would brag about the uptime on servers that had been up forever and never restarted. Now we live in a world where the life of a container, by design, might be on the order of seconds.
B
Or minutes; certainly not days or years like we used to be so proud of. The other thing that's happening is that more and more of this work is automation enforced by APIs, not done by human toil, and that's a theme I'm going to come back to over and over. The other thing I think is worth pointing out, particularly in enterprises as they adopt...
B
But technology is the business, right? So that's a kind of fundamental mind shift as well. So here are some questions and, like I said, I'm not necessarily going to answer every question, but I'm kind of going to answer questions, and we'll have more questions by the end. This is relevant to this narrative around...
B
You know, what we've seen over the last 10 years and what people are trying to do. What I see, across the Kubernetes community in general and definitely in the OpenShift community, is that people are in very different places with respect to how they take that technology and put it into their workflows and their behaviors, and where they let it change their behaviors or not.
B
And you see this in a lot of cases: they're adopting a new technology, but they have a process that came from how they managed their VMs, which came from how they managed their bare metal, and they haven't really rethought from first principles what promises they're able to keep with those processes versus the opportunity they're losing by slowing themselves down. So I'm going to give you my definition. There's this whole sub-genre of people debating the definition of words, but this is...
B
This is Andrew's version of what DevOps is, and I've given this, or said this, on stage multiple times over the last few years. To me, and this is in the framing of "software is eating the world", right? So there's this notion that software is eating the world, and to me that means software is going to optimize everything that it can over...
B
...some timeline. To me, DevOps is really about optimizing the human experience and performance of operating software, and doing it with software, so that's the "software eating the world" part, and then recognizing that these are sociotechnical systems. It's not just the technology by itself; it's for us, by us, so you're going to do this work with humans. The last ten years of my career have been going around the world, talking about and helping people try to adopt these processes and these technologies.
B
At the same time, in parallel to this, there's this conversation, and I'm going to come back to SRE in a few ways over the next few minutes, thirty minutes or whatever. But this is the quote-unquote beginning. This person coined the term SRE at Google, and this is a quote from Benjamin: SRE is what happens when a software engineer is tasked with what used to be called operations. And...
B
So site reliability engineering is, quote-unquote, the Google implementation of DevOps. There's a free book you can read: if you google "SRE book" you'll get to a link, hosted by Google, that has all of the text for free. If you like to buy dead trees or whatever, there are hard copies as well. The original SRE book was 2016, and I'm going to talk to some of the points from that book, and then there's an SRE workbook that came out.
B
Why does this even matter? Okay, so there are words: you have DevOps, you have SRE. What does it actually do? It's interesting watching this. It's a common theme with agile, DevOps, transformation, SRE, where you see lots of people adopt the vocabulary and maybe change their titles, but they don't really change their process. So let's try to do better than that.
B
So these models, together with the technology, are what enable all these things that we kind of take for granted, right? There are lots of things we take for granted. We all walk around with these little supercomputers in our pockets, and we sort of assume that, for the most part, Google is going to be available, Gmail is going to be available, Facebook is going to be available.
B
I'm going to talk about platform patterns, then I'm going to compare and contrast this notion of "you build it, you run it", which is sort of famously the Amazon way of doing this and is slightly different from the Google way. I'm going to show how they're not that different in some ways, and then go back to the more specific SRE language and practices.
B
...that might be months. Then at some point there's an operating system, then you start putting things together, and eventually you get an app, and that's long-lived and all tied together, so you kind of have to refresh them. Then we get a little better as we get to some automation and virtualization. But what you see happening in cloud native organizations is that they collapse a lot of the complexity of the infrastructure layers, and they want to spend more and more of their quote-unquote complexity budget delivering value in the application.
B
So if you go to a lot of enterprise data centers or colos or whatever, you walk around and every rack has a slightly different configuration of gear. If you go to a cloud native, quote-unquote, data center (there's the open hardware stuff and then there's the Google stuff, and not all of it is open), if you get an opportunity to go see one of these, it's football-field-sized data centers filled with racks and racks of identical gear, because they're really collapsing the complexity of what they have to manage.
B
I think it's also worth noting here: people talk about the ratio of machines managed per sysadmin at one of these companies versus what you see in the enterprise, and a lot of that ratio comes from collapsing that complexity. It's easier to manage a thousand things that are identical than it is to manage, in some cases, ten things that are not identical, right? So that ratio and that efficiency come from collapsing the complexity at the bottom of the stack.
B
So this is a pattern you see over and over. Then you look at what comes up the stack into what I'll call the platform services. Every single one of these organizations, and there's a list of more that built various aspects of it, built this sort of self-service provisioning platform for their developers to be able to do work, right? And Google's been very public about how they did that and what they did.
B
Amazon hasn't necessarily been public, but they made some aspects of what they learned and built very public by launching EC2 in 2006. And then Google... obviously everyone's trying to play the cloud provider game now, taking all the lessons from that. So everyone built these things from first principles, in slightly different ways, but they built them because they had to. There was no community project helping solve these things.
B
But if you go look at what Netflix built, circa I'd say 2010-ish, it basically looks kind of Kubernetes-esque, but on top of VMs on top of Amazon. You have something where you push a thing; they basically baked images, their AMIs, just like we bake containers. They had these...
B
...these Java projects. You can walk through the open source projects from Netflix and see the routing and the log aggregation and all these pieces that were the Netflix, Java-specific way of doing that. And now you can map a lot of those same capabilities straight to CNCF projects, so we don't all need to rebuild these things from scratch.
A
Go back to that slide just for a second, because it's interesting. Around 2010 they did come out with platforms there, but just prior to that was when, like, a thousand platform-as-a-service offerings bloomed to use their infrastructure. So it was almost like they saw the need for a platform from the thousands of small platform-as-a-service offerings that were there, and that kind of drove them to do some uniformity there.
A
B
Well, I think the first thing they did... this is like a deeper philosophical thing: a lot of the platform-as-a-service offerings failed, right? Google's first foray into cloud as a service offering was App Engine, which was a platform as a service that didn't appeal to a lot of people because they had made it so Google-specific.
B
You basically had to remap the concepts you were used to onto Bigtable or whatever the Google version of it was. They could keep all these promises about scale, but it wasn't necessarily the paradigm people wanted to use. So there's, like, another hour-long conversation about revolution.
B
B
A
B
So I'm just going to read this out loud, and there's a reveal, but all this stuff sounds great, right? Remove friction from product development. High trust, low process, no handoffs between teams. Do not do your own undifferentiated heavy lifting. Use simple patterns automated by tooling. Self-service cloud makes impossible things instant. These are great words; they sound great to me. I did not write them. They are actually stolen, word for word, from a conference presentation on the Netflix lessons learned by Adrian Cockcroft, from when he used to work for Netflix.
B
So to me, when you're talking about something like OpenShift and the platform and the goals that you have as an organization, it should map more or less to this, as much as you can, right? Obviously we're all in different parts of that journey, but I don't think a lot of organizations necessarily have this as a North Star, or at least their behaviors don't indicate that. So the more that we can help each other get...
B
...there, then I think the better off everyone's going to be, because I like nice things, don't you? I made this point earlier, but this cloud conversation evolved from the lessons learned building and operating these services, and that's key here. These are services: software services, platform services, infrastructure services. Now, the bad thing is you actually have to operate those...
B
So what is operations? We keep saying this word, and there's this whole body of "operations" that means something in a business context that's totally different from what I'm talking about today. For me, operations is really about system operations: building this kind of technical infrastructure and delivering things that way. This is DevOpsDays, this is the Velocity conference, this is the SRE book; all these things are part of it, right? So it's things like metrics, which are pretty much key to understanding...
B
...what's going on. Now you have some stuff, you can hopefully determine whether things are good or bad, and when things are bad you hopefully alert people. When things need to change, you hopefully aren't doing everything manually; you've got some automation there. A lot of what I consider operations really comes down to having a mental understanding of the system and getting into the middle of it.
B
Basically, this has been the focus of DevOpsDays conversations for the last ten years, and there's lots of meaningful material available for you to go dig into on that, but these are kind of a baseline set of capabilities. So this is a slide from 2007, and I've used it in a lot of conversations and a lot of presentations.
B
It was made by one of my friends who, at the time, worked at Amazon. This is sort of the golden age of Puppet, coming into the beginning of the DevOps conversations. You have traditional operations on one side and the new, quote-unquote, secret sauce operations on the other side, and the argument here is that the colors on the graph represent, quote-unquote, toil: the humans doing work.
B
B
So in 2020 we should be able to compress that curve even more, given the platforms and the tools that we have available to us today. But the thing I want to draw out here, as we go into the rest of it, is this notion that operations is the secret sauce, a competitive advantage. I would argue the defining advantage of the cloud natives is their operational excellence. So this is 2006, and this is a pretty famous interview. I'll just read it as well.
B
"The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service." So this is an interesting quote in time. This is the...
B
This is a blog post that John and Damon wrote after the first DevOpsDays in the US. They identified culture, automation, metrics and sharing, and Jez very quickly after that added lean, with this notion of continuous improvement, kaizen. There's no shortage of DevOps content online, but this is sort of a framing for some of the other stuff we're talking about with the capabilities. So, going back to this notion of "you build it, you run it".
B
What is Werner saying when he says "you build it, you run it"? The software teams are not building all of these other services. Those exist for them. Inside of Amazon in 2006, for a developer or development team, provisioning infrastructure was an API; provisioning a database was an API, right? So you've got all these built-in platform and infrastructure services available to the developer.
B
The developers are not building those and they're not running those. They have insight into them, but what Werner actually means, for the quotable-quote two-pizza team running their software, is that they run their software, and all these other things are taken care of for them by others with those responsibilities. That's something I think sometimes gets lost in translation.
B
You see groups of people who are like, "Oh, you build it, you run it", and so, "Well, you've got to install the OS", and you're giving developers who may not have that kind of context and expertise a bunch of things they're not necessarily prepared to do well. So there is some value to that sort of specialization and stratification. So this is Google.
B
SRE is not really part of the lexicon until 2016, so this is ten years after that Werner quote. I already kind of said this earlier, but this is essentially Google's DevOps implementation. One of the reasons I share this is that if you go through and read this book, they pretty much check off all these boxes. You can read it for free; search for "SRE book". And this is my recommendation for everyone: good DevOps copy, great DevOps steal.
B
So wherever you find good ideas, you should take full advantage of them. For the next section, I'm going to talk in more specifics about SRE in practice, and this is straight from the book. This is Dickerson's hierarchy of reliability from the SRE book, and here you can see that the foundation of reliability, from a Google perspective, is monitoring, right? So you have monitoring.
B
You can figure out a little bit more about what's going on. Now you know something's wrong, so then you respond to incidents. Okay, we respond to those incidents, we learn some things, we do some analysis that feeds back into it, and then at the very top you get up to this notion of the product, like...
B
Why does this infrastructure even exist? I'm not going to go through it all; the SRE book is a 500-page book. So for now we'll just keep going, but I'm going to come back to this notion of monitoring as the central thing. If anyone hasn't read the Borg paper, I think the Borg paper was, like, 2015. They published this paper about Borg, and it talks about the evolution of Borg, and it talks about Kubernetes and some of this stuff.
B
B
...this shift in the understanding of what is a competitive advantage, and that's a fundamental shift. You see Kubernetes and the ecosystem that was built around it as a reflection of them reframing some of those things that they thought were secrets. The Borg paper has this gold nugget that I think everyone misses, because they get focused on container scheduling and fancy algorithms. This is straight from that paper.
B
"Almost every task run under Borg contains a built-in HTTP server that publishes information about the health of the task and thousands of performance metrics." I have a standing wager, and I don't think I'll ever lose this, that you will get more operational benefit from building instrumentation and observability into your software, into your applications, into your services.
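To make the idea in that quote concrete, here is a minimal sketch of a service carrying its own built-in HTTP endpoint for health and metrics. It is not Borg's implementation; the paths `/healthz` and `/metrics`, the counter names, and the helper functions are all assumptions chosen for illustration, using only the Python standard library.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real service would track many more.
METRICS = {"requests_total": 0, "errors_total": 0}
METRICS_LOCK = threading.Lock()
START_TIME = time.time()


def record_request(error: bool = False) -> None:
    """Called by application code each time it handles a request."""
    with METRICS_LOCK:
        METRICS["requests_total"] += 1
        if error:
            METRICS["errors_total"] += 1


def snapshot() -> dict:
    """Return a copy of the current counters plus uptime in seconds."""
    with METRICS_LOCK:
        data = dict(METRICS)
    data["uptime_seconds"] = round(time.time() - START_TIME, 3)
    return data


class IntrospectionHandler(BaseHTTPRequestHandler):
    """Serves /healthz and /metrics as JSON (paths are illustrative)."""

    def do_GET(self):
        if self.path == "/healthz":
            self._reply(200, {"status": "ok"})
        elif self.path == "/metrics":
            self._reply(200, snapshot())
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_introspection_server(port: int = 0) -> HTTPServer:
    """Serve on a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), IntrospectionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Application code would call `record_request()` on each request and `start_introspection_server()` once at startup; a monitoring system could then scrape the two endpoints.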
B
...is the foundation of reliability from Google's perspective, right? People miss this, but what you can do is monitor, monitor, monitor. This is all straight from the book, and we don't necessarily need to go through it in depth, but there's the service level terminology. You have to build up to that to get to talk about SLOs, and to me SLOs are the defining feature of SRE, so it's worth building up. So you have service level indicators, which basically means now you have some monitoring.
B
...what you're doing. Not like a cookie-cutter, paint-by-the-dots thing, but be thoughtful about what a service level should mean for the particular service, and that drives a bunch of decisions about monitoring. This is also straight from the SRE book: these are the golden signals.
B
The four golden signals, according to Google's SRE book, are latency, traffic, errors and saturation. So if you are not monitoring those things right now, this is maybe a good opportunity to steal a good idea from someone else, and then think very hard: if you're not monitoring this stuff now, why not? And if you aren't, what would it take to get this kind of information and start to think about the meaning of each of these for your particular context?
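As a rough illustration of what "monitoring the four golden signals" can mean in code, the sketch below summarizes one observation window of request records. The `Request` type, the choice of a nearest-rank 99th percentile for latency, and the definition of saturation as in-flight work divided by capacity are all assumptions; each service would need its own definitions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    latency_ms: float  # how long the request took
    ok: bool           # False if the request failed


def golden_signals(requests: List[Request], window_s: float,
                   in_flight: int, capacity: int) -> dict:
    """Summarize one observation window into the four golden signals."""
    if not requests:
        raise ValueError("need at least one request in the window")
    if window_s <= 0 or capacity <= 0:
        raise ValueError("window_s and capacity must be positive")

    # Latency: nearest-rank 99th percentile of observed latencies.
    latencies = sorted(r.latency_ms for r in requests)
    rank = max(1, math.ceil(0.99 * len(latencies)))
    latency_p99 = latencies[rank - 1]

    # Traffic: requests per second over the window.
    traffic_rps = len(requests) / window_s

    # Errors: fraction of requests that failed.
    error_rate = sum(1 for r in requests if not r.ok) / len(requests)

    # Saturation: how full the service is (assumed: in-flight / capacity).
    saturation = in_flight / capacity

    return {
        "latency_p99_ms": latency_p99,
        "traffic_rps": traffic_rps,
        "error_rate": error_rate,
        "saturation": saturation,
    }
```

For example, four requests in a two-second window, one of them failed, with 8 of 10 worker slots busy, gives a traffic of 2 requests per second, an error rate of 0.25 and a saturation of 0.8.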
B
Now that you have SLIs and you can measure these things, you move on to this notion of an SLO, the service level objective. To me, the defining quality of SRE is really centered on SLOs and this contract. You say we want this many nines, or this many whatever, for these particular indicators, and that establishes an error budget.
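The arithmetic behind "this many nines establishes an error budget" is short enough to sketch. The function names and the 30-day window below are assumptions for illustration; the relationship itself is simply that the budget is the fraction of the window (or of requests) that the SLO allows to be bad.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if blown)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo) * total_requests
    return 1.0 - failed_requests / allowed_failures
```

Under these assumptions, a 99.9% availability SLO over 30 days leaves roughly 43.2 minutes of budget, and a 99.99% SLO leaves roughly 4.3 minutes, which is why each additional nine changes what the team can afford to do.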
B
You don't necessarily want to have a single dimension for an SLO on a service, but you also don't want to make it too complicated. The last point here is worth pointing out: progress is more important than perfection, right? What do we have measured, what are we looking at today, and what can we do to improve that system and make it better tomorrow? That's more important than getting it perfect.
B
So this SLO is really a three-way contract between the developers, the business and operations. The business is saying reliability is important to us: if this thing is not available, then it's not delivering value. Developers are pushing the code, and then operations, or the SRE, is responsible for that reliability. What that establishes is this notion of a service level objective, which gives you an error budget, and then, in the context of the error budget...
B
The idea, at least in the aspirational, kind of perfected version at Google, is that you can do things with that error budget. I think it's worth pointing out that 100% is not realistic; I'll argue it's actually impossible as you get to these nines. So now you have an error budget established, and you can talk about this acceptable level of unreliability, right? For some things that might be minutes or seconds, and it especially gets interesting...
B
...when you talk about building services that can operate with continuous partial failure, right? So you have some isolation, some concurrency, so that some fraction of your system can be down. That dovetails into another interesting sub-genre of DevOps around chaos engineering and injecting failures, and that's all good fun, but it's not what we're going to talk about today. So now you have an SLO, and you have these consequences.
B
So when you're below your error budget, in the perfected version of this, you have this notion of developer self-service access; they can do all this stuff. When you blow your budget, and we'll go to this next slide, then it changes the dynamic of these things. In the quote-unquote aspirational Google sense of this, when you're below your error budget for reliability, the dev team delivers features.
B
B
...by demonstrating their value, and then SRE take over the operational responsibility, the on-call and the troubleshooting, after services have gone through the production reliability review and the application reliability review, to retool the architecture to match the promises of the SLO that the SREs are signing up to keep. Just to make this point, this is something I think gets lost, because people see SRE and they're like, "Oh, it's just like traditional operations."
B
"We'll just change the name of our sysadmins to SRE." The SRE are not there to take toil away from the software engineers. The SRE are there to drive toil out of the system. That's the whole point of, one, the "E" part of SRE, and two, this reliability assessment to take on the burden of it. And, at least aspirationally, from the book...
B
There's this notion of the toil that's being created by a service, and according to the book, if you, as a software engineering team, exceed the SLO error budget too much, then the SRE have the right to push all of the operational burden back onto the software team. So it's like, "You can't get your stuff together, you're causing me a lot of problems and work." Like, okay...
B
"Now it's all your problem until you get back into..." You know, you need to go outside to use the bathroom; you can't be a puppy, right? We've got to train you to do this right. So there's a little bit of a power dynamic where the SRE can push the operational burden back on the software engineers.
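The consequences described here (ship features while within budget, change the dynamic once the budget is blown, hand operations back if it is exceeded too much) can be written down as an explicit policy. This is only a sketch: the `Action` names and the `hand_back_threshold` value are hypothetical, and neither the talk nor the book is being quoted for specific numbers.

```python
from enum import Enum


class Action(Enum):
    SHIP_FEATURES = "dev team has self-service access and ships features"
    FOCUS_ON_RELIABILITY = "budget is blown; prioritize reliability work"
    HAND_BACK_OPERATIONS = "SRE push the operational burden back to the dev team"


def error_budget_policy(budget_consumed: float,
                        hand_back_threshold: float = 2.0) -> Action:
    """Map error budget consumption to a consequence.

    budget_consumed is the fraction of the window's error budget used
    (1.0 means exactly spent). hand_back_threshold is an assumed cutoff
    for "exceeded the budget too much".
    """
    if budget_consumed < 0:
        raise ValueError("budget_consumed cannot be negative")
    if hand_back_threshold <= 1.0:
        raise ValueError("hand_back_threshold must be greater than 1.0")
    if budget_consumed < 1.0:
        return Action.SHIP_FEATURES
    if budget_consumed < hand_back_threshold:
        return Action.FOCUS_ON_RELIABILITY
    return Action.HAND_BACK_OPERATIONS
```

Writing the policy down, even this crudely, forces the business, developers and operations to agree in advance on what happens at each level of consumption rather than negotiating during an incident.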
I think it's also worth pointing out, and I've had a lot of conversations...
B
...philosophical conversations with people at Google about this, but SRE are effectively the architects of Google's platform, those platform services and those data services in particular. In a sense they were also essentially product managers there. This is straight from the book as well: "SRE builds framework modules to implement canonical solutions for the concerned production area. As a result, development teams can focus on the business logic, because the framework already takes care of correct infrastructure use."
B
So when you're thinking about adopting a container platform and building up these platform services for your own organization, maybe you're not going to adopt the SRE model wholesale, but think thoughtfully about what promises the services we're building can keep with respect to this infrastructure use as we make them available to our developers. To go back to the lessons learned from Netflix, we really want to unlock that approach to development. So we're kind of coming to the end of this.
B
So my advice is: think thoughtfully about these different services that you have to operate. Think thoughtfully about who has the accountability to operate them and who has the tools to operate them. I really like SLOs. If you think about the way Google has architected itself and built this up, you have these infrastructure services, each of which has SLOs and keeps promises to the platform services, right? At the bottom you have the container scheduling, you have Colossus, you have...
B
...some thoughtful things about how you're going to schedule jobs and store data and the rest of that, and then you build higher-level services on top of that. They also have SLOs, promises kept to the applications and the software on top of that. And then, last but not least, you have the customer-facing SLOs, because hopefully there's some business at some point. So the advice here is not necessarily...
B
If you don't have monitoring, if you don't have great monitoring, if you can't yet think about the four signals or what's appropriate for your services, that's a great place to start investing. As you build monitoring capabilities, now, okay, we no longer need the customers to tell us something's wrong; we can detect that things are wrong. How do we respond to that? Be thoughtful about incident response and the way you're going to manage those, and then build yourself up through this pyramid of reliability.
B
So what is DevOps? What is SRE? I promised I would give you more questions, not necessarily more answers. Honestly, who cares? I honestly don't care. These are just words. What works is the more interesting question, in my opinion, and in particular what works for you: what works today, and what could work better for you tomorrow. And then, really, at the end of the day...
B
If it doesn't go to prod, it doesn't matter. Production or it didn't happen. Can you put code into production, and how fast? And then, once you get it there, can you keep it running? So those are some questions for you. Thank you for your time. That was a quick run through some thoughts and conversations I've been having recently about how to optimize operational practice and process around your OpenShift investment.
A
There was one diagram you had in there that was the three-way conversation between business, ops and development. I think there was one visual there, and when I saw that, the other thing was the other component, which you added in later, of the customers. What you often see is this tension inside development groups and product management groups that are trying to deliver more features...
A
...versus the stability, and dealing with the tensions of accountability for developing and maintaining stable services in order to get to that optimal goal of the SREs coming in and taking responsibility for the operation of the software services. That tension, I think, is one of the things you have to tease out inside your organization: how you're going to... I don't know whether it's the Pavlovian reward-them-for-good-behaviors kind of thing.
A
B
That's what it comes back down to. Really, where the organizational misalignment or conflict comes from is that you have executives with slightly different compensation models, right? If you can't align that higher-level mission at the top level of an org, the chance that your front-line developers and operators are going to be aligned goes to zero. So you really have to revisit fundamental assumptions about some of those organizational power dynamics to get to these models.
C
A
Absolutely. I think the reliability of your service and your software is key, but the tension of delivering... I see it every day, inside Red Hat and in the companies we work with: wanting to deliver more features, more functionality at higher scale, and that pressure to deliver more while trying not to ignore the stability of it. I think that's really, for me...
A
The other thing that was interesting to me, early on in the slide deck, was the artisanal versus industrial conversation: that traditional was artisanal and today is industrial. And then there was another one about delivering the impossible; I think it was Adrian's quote there. The myth, or at least the hope that I have, is that this industrial-strength infrastructure and these new practices will allow us to do the creative things that we want to, right, and empower developers to deliver these new things...
A
...these new features, these new services, but there's the complexity of having to understand what's under the hood. When you see someone now come to the table with a new offering, a new service, but they also have to understand what Kubernetes is, right, that's different from just having to be able to build a web application or a service or a database offering. So there's all this extra that you're asking developers to understand, and that's really, I think, another cultural shift that we've seen.
B
In some ways I think that's actually wrong. If your developers need to understand more and more infrastructure to do their work... it's not that you want them to be ignorant of it, but you want your developers to be focused on the creative aspect of their domain and the value they're creating, not sifting through YAML. There's definitely an aspect of understanding...
B
D
B
I think this is a very interesting question, and a lot of organizations are being kind of forced to ask it. I think you have to look at what you're trying to accomplish, and I'm not from the school of thought that microservices are always better than monoliths, right? So thinking mindfully about what kind of promises you can keep and why you want to move to these architectures is key. Now, when you think about the operating model and these tools and these other capabilities that I talked about for operations...
B
One thing to keep in mind from the very beginning: let's say you have an aspiration of a microservices architecture. When you have microservices, you have more deployments and you have more things that need monitoring, right? So if you have a high fixed cost of deployment in terms of the work, because the automation isn't there, or the testing, or whatever gives you confidence in the quote-unquote release, or you have these unmonitored systems, and then you go to a microservice architecture...
B
If you don't have good, well-factored monoliths that are monitored in a meaningful way, the chance that you're going to end up with a well-factored microservice architecture that's monitored in a meaningful way is quite low, right? So let's build up that organizational competency, that muscle memory, around those things with the monolith that we have, and then meaningfully take pieces of functionality out of the monolith over time, because I also think the big-bang rewrite approach to going from monolith to microservices tends to lead to catastrophic failure.
A
C
B
To me, I kind of break things into a few buckets. If something is not causing operational burden, doesn't need to scale to keep the promises I need for my org, and doesn't have a need to change rapidly, right? If something just needs to stay the same, isn't a scaling problem and isn't expensive to operate, I'll just leave it alone.
D
B
Let's move that into architectures where we can have more rapid feedback cycles with that customer engagement, right? That's a motivator. If I know I'm having problems keeping up the reliability or the scaling of that particular architecture, let's get to the new event-driven design, or whatever your vision is for that architecture, to keep those promises. Or if it's expensive for other reasons, in terms of human costs or licensing costs or whatever, that could be a motivator. But if it's not one of those three things: monolith for life, baby.
A
You can tweet at him at @littleidea, and we will post this video with his credentials and how to get hold of him. We are also launching a transformation SIG, so there'll be a landing page up soon on commons.openshift.org, with links to this video and others, as well as a place to sign up to join us and get announcements about who's coming on deck.